multiband-diffusion

Source: https://modelscope.cn/models/AI-ModelScope/multiband-diffusion
License: cc-by-nc-4.0

MultiBand Diffusion

This repository contains the weights for Meta's MultiBand Diffusion models, described in the research paper From Discrete Tokens to High-Fidelity Audio Using Multi-Band Diffusion.

MultiBand Diffusion is a collection of four models that decode tokens from the EnCodec tokenizer into waveform audio.

Model Details

Model Description

  • Developed by: Meta
  • Model type: Diffusion Models
  • License: The model weights in this repository are released under the CC-BY-NC 4.0 license.

Installation

Please follow the AudioCraft installation instructions from the README.

Usage

The AudioCraft library offers several ways to use MultiBand Diffusion:

  1. The MusicGen demo includes a toggle to try the diffusion decoder. You can run the demo locally with python -m demos.musicgen_app --share, or through a MusicGen Colab.
  2. You can play with MusicGen by running the Jupyter notebook at demos/musicgen_demo.ipynb locally (if you have a GPU).

API

The AudioCraft library provides a simple API and pre-trained models for MusicGen and for EnCodec at 24 kHz at three bitrates (1.5 kbps, 3 kbps, and 6 kbps).
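For intuition about these operating points, here is a back-of-the-envelope sketch (plain arithmetic, not AudioCraft code). Assuming EnCodec's 24 kHz setting of a 320-sample hop (75 token frames per second) and 1024-entry (10-bit) codebooks, each bitrate fixes the number of parallel codebooks MultiBand Diffusion has to decode:

```python
# Illustrative arithmetic only: relates EnCodec bitrates to token-stream shape.
# Assumes the 24 kHz EnCodec settings (hop of 320 samples, 1024-entry codebooks).

SAMPLE_RATE = 24_000
HOP_LENGTH = 320                        # encoder downsampling factor (2*4*5*8)
FRAME_RATE = SAMPLE_RATE // HOP_LENGTH  # 75 token frames per second
BITS_PER_CODE = 10                      # log2(1024) entries per codebook

def num_codebooks(bitrate_kbps: float) -> int:
    """Number of residual codebooks needed to reach a target bitrate."""
    return int(bitrate_kbps * 1000 / (FRAME_RATE * BITS_PER_CODE))

for kbps in (1.5, 3.0, 6.0):
    k = num_codebooks(kbps)
    print(f"{kbps} kbps -> {k} codebooks, token tensor [B, {k}, {FRAME_RATE} * seconds]")
```

So the 1.5/3/6 kbps models consume 2, 4, and 8 codebooks respectively, which is why the token tensors in the examples below carry a codebook dimension.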

Below is a quick example of using MultiBandDiffusion with the MusicGen API:

```python
import torchaudio
from audiocraft.models import MusicGen, MultiBandDiffusion
from audiocraft.data.audio import audio_write

model = MusicGen.get_pretrained('facebook/musicgen-melody')
mbd = MultiBandDiffusion.get_mbd_musicgen()
model.set_generation_params(duration=8)  # generate 8 seconds.
wav, tokens = model.generate_unconditional(4, return_tokens=True)  # generates 4 unconditional audio samples and keeps the tokens for MBD decoding
wav_diffusion = mbd.tokens_to_wav(tokens)
descriptions = ['happy rock', 'energetic EDM', 'sad jazz']
wav, tokens = model.generate(descriptions, return_tokens=True)  # generates 3 text-conditioned samples and keeps the tokens.
wav_diffusion = mbd.tokens_to_wav(tokens)
melody, sr = torchaudio.load('./assets/bach.mp3')
# Generate using the melody from the given audio and the provided descriptions; return audio and audio tokens.
wav, tokens = model.generate_with_chroma(descriptions, melody[None].expand(3, -1, -1), sr, return_tokens=True)
wav_diffusion = mbd.tokens_to_wav(tokens)

for idx, one_wav in enumerate(wav):
    # Saves under {idx}.wav and {idx}_diffusion.wav, with loudness normalization at -14 dB LUFS for comparing the methods.
    audio_write(f'{idx}', one_wav.cpu(), model.sample_rate, strategy="loudness", loudness_compressor=True)
    audio_write(f'{idx}_diffusion', wav_diffusion[idx].cpu(), model.sample_rate, strategy="loudness", loudness_compressor=True)
```

For the compression task (and to compare with EnCodec):

```python
import torch
from audiocraft.models import MultiBandDiffusion
from encodec import EncodecModel
from audiocraft.data.audio import audio_read, audio_write

bandwidth = 3.0  # 1.5, 3.0, 6.0
mbd = MultiBandDiffusion.get_mbd_24khz(bw=bandwidth)
encodec = EncodecModel.encodec_model_24khz()
encodec.set_target_bandwidth(bandwidth)  # match the bitrate used by MBD

somepath = ''
wav, sr = audio_read(somepath)
with torch.no_grad():
    compressed_encodec = encodec(wav[None])  # add a batch dimension
    compressed_diffusion = mbd.regenerate(wav[None], sample_rate=sr)

audio_write('sample_encodec', compressed_encodec.squeeze(0).cpu(), mbd.sample_rate, strategy="loudness", loudness_compressor=True)
audio_write('sample_diffusion', compressed_diffusion.squeeze(0).cpu(), mbd.sample_rate, strategy="loudness", loudness_compressor=True)
```
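As a sanity check on what these bitrates buy (simple arithmetic, not library code): mono 16-bit PCM at 24 kHz is 384 kbps, so each codec operating point corresponds to a large fixed compression ratio:

```python
# Illustrative arithmetic: compression ratio of each codec bitrate
# relative to raw mono 16-bit PCM at 24 kHz.
PCM_KBPS = 24_000 * 16 / 1000  # 384.0 kbps

ratios = {kbps: PCM_KBPS / kbps for kbps in (1.5, 3.0, 6.0)}
for kbps, ratio in ratios.items():
    print(f"{kbps:>4} kbps -> {ratio:.0f}x smaller than raw PCM")
```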

Training

A DiffusionSolver implements Meta's diffusion training pipeline. It generates waveform audio conditioned on the embeddings extracted from a pre-trained EnCodec model (see the EnCodec documentation in the AudioCraft library for more details on how to train such a model).
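The actual training loop lives in AudioCraft's DiffusionSolver; purely to illustrate the shape of the idea (regress injected noise from a noisy waveform plus codec-derived conditioning), here is a toy, self-contained sketch with random tensors, a stand-in one-layer model, and a simplified DDPM-style noising step. None of the names below are AudioCraft API, and this is not the paper's architecture:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-in shapes: a batch of 1-second 24 kHz mono clips and
# matching EnCodec-style embeddings (128-dim at 75 frames/s).
wav = torch.randn(4, 1, 24_000)
emb = torch.randn(4, 128, 75)

class ToyDenoiser(nn.Module):
    """Stand-in denoiser: predicts noise from the noisy waveform plus
    the (upsampled) conditioning embedding."""
    def __init__(self):
        super().__init__()
        self.cond = nn.Conv1d(128, 1, kernel_size=1)
        self.net = nn.Conv1d(2, 1, kernel_size=9, padding=4)

    def forward(self, noisy, emb):
        # Upsample conditioning to the waveform length, then fuse.
        c = torch.nn.functional.interpolate(self.cond(emb), size=noisy.shape[-1])
        return self.net(torch.cat([noisy, c], dim=1))

model = ToyDenoiser()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# One DDPM-style training step: noise the clean signal at a random
# level, then regress the injected noise.
alpha = torch.rand(4, 1, 1)  # per-example noise level
noise = torch.randn_like(wav)
noisy = alpha.sqrt() * wav + (1 - alpha).sqrt() * noise
loss = (model(noisy, emb) - noise).pow(2).mean()
loss.backward()
opt.step()
print(f"toy diffusion step loss: {loss.item():.3f}")
```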

Note that the library does NOT provide any of the datasets used for training our diffusion models. We provide a dummy dataset containing just a few examples for illustrative purposes.

Example configurations and grids

One can train diffusion models as described in the paper by using this dora grid:

```shell
# 4-band MBD training
dora grid diffusion.4_bands_base_32khz
```

Learn more

Learn more about AudioCraft training pipelines in the dedicated section.

Citation

@article{sanroman2023fromdi,
  title={From Discrete Tokens to High-Fidelity Audio Using Multi-Band Diffusion},
  author={San Roman, Robin and Adi, Yossi and Deleforge, Antoine and Serizel, Romain and Synnaeve, Gabriel and Défossez, Alexandre},
  journal={arXiv preprint arXiv:},
  year={2023}
}

