Whisper-LID Model Introduction
Highlights
- Supports language identification (LID) for multiple languages, including Chinese, English, Japanese, and Korean.
- The LID result can be combined with the Whisper model to perform speech recognition when the spoken language is unknown.
About the FunASR Open-Source Project
FunASR aims to build a bridge between academic research and industrial applications of speech recognition. By releasing training and fine-tuning recipes for industrial-grade speech recognition models, it makes it easier for researchers and developers to study and productionize speech recognition models, and helps the speech recognition ecosystem grow. Make speech recognition fun!
Model Overview
Whisper is a multilingual speech recognition system released by OpenAI that achieves strong performance on many benchmarks. Here, we take the speech features produced by the Whisper encoder and feed them into an EResNet-based language identification module, providing more accurate language identification than the original Whisper.
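The pipeline described above can be sketched as follows. This is a minimal illustrative mock-up, not the actual implementation: random arrays stand in for the Whisper encoder output, and a toy linear classifier stands in for the EResNet LID module; the language subset, pooling choice, and classifier weights are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

LANGUAGES = ["zh", "en", "ja", "ko"]  # illustrative subset of supported languages

# whisper-large encoder output: 1500 frames x 1280-dim features.
# A random array stands in for real encoder features here.
T, D = 1500, 1280
encoder_features = rng.standard_normal((T, D))

# Mean-pool over time to get a single utterance-level embedding.
embedding = encoder_features.mean(axis=0)  # shape (D,)

# Toy linear classifier standing in for the EResNet LID module.
W = rng.standard_normal((len(LANGUAGES), D)) * 0.01
logits = W @ embedding  # one logit per candidate language

# Softmax over languages, then pick the most probable one.
probs = np.exp(logits - logits.max())
probs /= probs.sum()
predicted = LANGUAGES[int(np.argmax(probs))]
print(predicted, probs.round(3))
```

With the real model, the embedding comes from actual Whisper encoder features and the classifier is the trained EResNet head, but the flow (encode, pool, classify) is the same.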
Inference with ModelScope
```python
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks

multilingual_wavs = [
    "https://www.modelscope.cn/api/v1/models/iic/speech_whisper-large_lid_multilingual_pytorch/repo?Revision=master&FilePath=examples/example_zh-CN.mp3",
    "https://www.modelscope.cn/api/v1/models/iic/speech_whisper-large_lid_multilingual_pytorch/repo?Revision=master&FilePath=examples/example_en.mp3",
    "https://www.modelscope.cn/api/v1/models/iic/speech_whisper-large_lid_multilingual_pytorch/repo?Revision=master&FilePath=examples/example_ja.mp3",
    "https://www.modelscope.cn/api/v1/models/iic/speech_whisper-large_lid_multilingual_pytorch/repo?Revision=master&FilePath=examples/example_ko.mp3",
]

inference_pipeline = pipeline(
    task=Tasks.auto_speech_recognition,
    model='iic/speech_whisper-large_lid_multilingual_pytorch',
    model_revision="v2.0.4")

for wav in multilingual_wavs:
    rec_result = inference_pipeline(input=wav)
    print(rec_result)
```
Inference with FunASR
```python
from funasr import AutoModel

multilingual_wavs = [
    "example_zh-CN.mp3",
    "example_en.mp3",
    "example_ja.mp3",
    "example_ko.mp3",
]

model = AutoModel(model="iic/speech_whisper-large_lid_multilingual_pytorch",
                  model_revision="v2.0.4")

for wav_id in multilingual_wavs:
    wav_file = f"{model.model_path}/examples/{wav_id}"
    res = model.generate(input=wav_file, data_type="sound")
    print("detect sample {}: {}".format(wav_id, res))
```
Related Papers and Citation
```
@inproceedings{radford2023robust,
  title={Robust speech recognition via large-scale weak supervision},
  author={Radford, Alec and Kim, Jong Wook and Xu, Tao and Brockman, Greg and McLeavey, Christine and Sutskever, Ilya},
  booktitle={International Conference on Machine Learning},
  pages={28492--28518},
  year={2023},
  organization={PMLR}
}
```