Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis
Audio samples | Paper [abs] [pdf]
Vocos is a fast neural vocoder designed to synthesize audio waveforms from acoustic features. Trained using a Generative Adversarial Network (GAN) objective, Vocos can generate waveforms in a single forward pass. Unlike other typical GAN-based vocoders, Vocos does not model audio samples in the time domain. Instead, it generates spectral coefficients, facilitating rapid audio reconstruction through inverse Fourier transform.
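To illustrate the idea of producing spectral coefficients instead of time-domain samples, here is a minimal sketch that rebuilds one frame of a toy signal from its magnitude and phase with an inverse discrete Fourier transform. It uses only the Python standard library. The helper names `dft` and `idft_from_mag_phase` are hypothetical and are not part of the Vocos API, and the sketch covers a single frame only, whereas a vocoder works on a sequence of overlapping frames.

```python
import cmath
import math


def dft(frame):
    """Forward DFT of a real-valued frame (naive O(N^2) version)."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft_from_mag_phase(magnitude, phase):
    """Rebuild time-domain samples from per-bin magnitude and phase."""
    n = len(magnitude)
    # Combine magnitude and phase into complex spectral coefficients.
    coeffs = [cmath.rect(m, p) for m, p in zip(magnitude, phase)]
    # Inverse DFT; the source frame was real, so the imaginary part is numerical noise.
    return [
        (sum(coeffs[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


# One frame of a toy signal: a sine wave with 3 cycles per 32-sample frame.
frame = [math.sin(2 * math.pi * 3 * t / 32) for t in range(32)]
spectrum = dft(frame)
magnitude = [abs(c) for c in spectrum]
phase = [cmath.phase(c) for c in spectrum]

reconstructed = idft_from_mag_phase(magnitude, phase)
max_error = max(abs(a - b) for a, b in zip(frame, reconstructed))
print(f"max reconstruction error: {max_error:.2e}")
```

Because the inverse transform is a fixed, closed-form operation, a model only has to supply the coefficients; no sample-by-sample generation is needed in this step.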
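Installation
To use Vocos only in inference mode, install it using:
pip install vocos
If you wish to train the model, install it with additional dependencies:
pip install vocos[train]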
Usage
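Reconstruct audio from mel-spectrogram
import torch
from vocos import Vocos

# Load the pretrained 24 kHz mel-spectrogram model
vocos = Vocos.from_pretrained("charactr/vocos-mel-24khz")

# Random tensor standing in for a real mel-spectrogram
mel = torch.randn(1, 100, 256)  # B, C, T
audio = vocos.decode(mel)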
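The `T` dimension of the mel tensor counts spectrogram frames, so it determines how much audio is decoded. The sketch below estimates that duration. The helper `approx_duration_seconds` is hypothetical, and `hop_length=256` is an assumed value used for illustration only; the sample rate of 24000 Hz comes from the model name. The exact output length also depends on how the frames are padded, so treat the result as approximate and check the configuration of the checkpoint you load.

```python
def approx_duration_seconds(num_frames, hop_length=256, sample_rate=24000):
    """Approximate audio duration covered by `num_frames` spectrogram frames.

    hop_length=256 is an assumed value for illustration; verify it against
    the configuration of the checkpoint you actually use.
    """
    if num_frames < 0 or hop_length <= 0 or sample_rate <= 0:
        raise ValueError("num_frames must be >= 0; hop_length and sample_rate must be > 0")
    return num_frames * hop_length / sample_rate


# Shape used in the mel-spectrogram example: (B, C, T) = (1, 100, 256)
batch, channels, frames = 1, 100, 256
duration = approx_duration_seconds(frames)
print(f"{frames} frames cover roughly {duration:.2f} s of audio")
```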
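Copy-synthesis from a file:
import torchaudio

y, sr = torchaudio.load(YOUR_AUDIO_FILE)
if y.size(0) > 1:  # mix to mono
    y = y.mean(dim=0, keepdim=True)
# Resample to the 24 kHz rate the model expects
y = torchaudio.functional.resample(y, orig_freq=sr, new_freq=24000)
# Extract features and resynthesize the waveform in one call
y_hat = vocos(y)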
Citation
If this code contributes to your research, please cite our work:
@article{siuzdak2023vocos,
  title={Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis},
  author={Siuzdak, Hubert},
  journal={arXiv preprint arXiv:2306.00814},
  year={2023}
}
License
The code in this repository is released under the MIT license.