Bel Canto and Folk Singing Style Classification Model

Anonymous user, July 31, 2024
Technology: PyTorch
Category: AI
Repository: https://modelscope.cn/models/ccmusic-database/bel_canto
License: MIT License

Project details

This model classifies vocal recordings as bel canto (classical) or folk singing. All audio samples were performed by professional singers. The model is fine-tuned on a four-category audio dataset whose recordings have been converted to spectrograms. The backbone network was first pre-trained on computer vision (CV) tasks and then fine-tuned for vocal style classification. Pre-training on images gives the network general-purpose visual features, and because a spectrogram is a two-dimensional image of an audio signal, fine-tuning can adapt those features to the subtle differences between classical and folk vocal styles. The dataset covers classical singing and a variety of folk singing traditions, so the model can learn the patterns characteristic of each style. Using spectrograms as the input representation lets the model analyze both the temporal and the spectral content of the signal.

The model has potential applications in the music industry and in cultural preservation, wherever vocal performances need to be sorted into these two broad styles. It also illustrates how a network pre-trained in one domain can be adapted to another.
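
To make the "audio converted into spectrograms" step concrete, here is a minimal, dependency-free sketch of a short-time Fourier transform magnitude spectrogram. It is not the preprocessing used for this dataset, which the description does not specify; the function name, frame size, hop size, and test tone are all assumptions chosen for illustration. A real pipeline would more likely use an audio library and a mel-scaled spectrogram.

```python
import cmath
import math


def magnitude_spectrogram(signal, frame_size=64, hop=32):
    """Return a list of frames; each frame lists DFT magnitudes for bins 0..frame_size//2."""
    # Periodic Hann window to reduce spectral leakage at the frame edges
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_size)
              for n in range(frame_size)]
    n_bins = frame_size // 2 + 1
    frames = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_size)]
        # Naive O(N^2) DFT per frame: adequate for a demo, too slow for real audio
        mags = []
        for k in range(n_bins):
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                      for n in range(frame_size))
            mags.append(abs(acc))
        frames.append(mags)
    return frames


# Hypothetical test signal: a 1 kHz sine sampled at 8 kHz (not from the dataset)
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(512)]
spec = magnitude_spectrogram(tone)

# Index of the strongest frequency bin in the first frame
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
```

Each row of `spec` is one time frame and each column one frequency bin, which is why the result can be treated as an image and fed to a backbone pre-trained on CV tasks.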

Demo

https://www.modelscope.cn/studios/ccmusic-database/bel-canto

Usage

# Download the model snapshot from ModelScope; returns the local directory path
from modelscope import snapshot_download
model_dir = snapshot_download('ccmusic-database/bel_canto')
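
The layout of the downloaded snapshot is not described here, so a reasonable next step is to inspect it. The helper below is a small sketch that uses only the standard library; `summarize_snapshot` is a hypothetical name, and it makes no assumption about which files the snapshot contains.

```python
import os


def summarize_snapshot(model_dir):
    """Return sorted (relative_path, size_in_bytes) pairs for every file under model_dir."""
    entries = []
    for root, _dirs, files in os.walk(model_dir):
        for name in files:
            full_path = os.path.join(root, name)
            rel_path = os.path.relpath(full_path, model_dir)
            entries.append((rel_path, os.path.getsize(full_path)))
    return sorted(entries)
```

Calling `summarize_snapshot(model_dir)` on the directory returned by `snapshot_download` and printing each pair shows which weight and configuration files were fetched.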

Maintenance

# Clone the repository without downloading Git LFS files (only pointer files are fetched)
GIT_LFS_SKIP_SMUDGE=1 git clone https://www.modelscope.cn/ccmusic-database/bel_canto.git
cd bel_canto

Results

Fine-tuning results for a SqueezeNet network:

Loss curve
Training and validation accuracy
Confusion matrix
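
For readers unfamiliar with the last item, the sketch below shows how a confusion matrix and the overall accuracy are computed from true and predicted labels. The class names and predictions are invented for illustration and are not this model's results; the function names are likewise hypothetical.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for truth, pred in zip(y_true, y_pred):
        matrix[index[truth]][index[pred]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


# Made-up labels and predictions for a four-class problem (illustration only)
labels = ["class_a", "class_b", "class_c", "class_d"]
y_true = ["class_a", "class_a", "class_b", "class_c", "class_d", "class_d"]
y_pred = ["class_a", "class_b", "class_b", "class_c", "class_d", "class_c"]

cm = confusion_matrix(y_true, y_pred, labels)
acc = accuracy(cm)
```

Off-diagonal entries show which classes are confused with each other, which is more informative than the accuracy figure alone when some styles sound similar.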

Dataset

https://www.modelscope.cn/datasets/ccmusic-database/bel_canto

Mirror

https://huggingface.co/datasets/ccmusic-database/bel_canto

Evaluation

https://github.com/monetjoe/ccmusic_eval

Cite

@dataset{zhaorui_liu_2021_5676893,
  author       = {Monan Zhou and Shenyang Xu and Zhaorui Liu and Zhaowen Wang and Feng Yu and Wei Li and Baoqiang Han},
  title        = {CCMusic: an Open and Diverse Database for Chinese and General Music Information Retrieval Research},
  month        = {mar},
  year         = {2024},
  publisher    = {HuggingFace},
  version      = {1.2},
  url          = {https://huggingface.co/ccmusic-database}
}