RWKV-BAT Model Introduction

Project Overview

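Boundary Aware Transducer (BAT) is a computationally efficient, low-latency speech recognition model developed by the DAMO Academy speech team as an improvement on the conventional RNN-Transducer (RNN-T).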
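To make the efficiency claim more concrete, the sketch below counts the elements of the joint-network output that an RNN-T loss has to materialize. It assumes, following the description in the BAT paper, that BAT computes the loss only over a narrow region of the lattice around label boundaries predicted by a CIF (continuous integrate-and-fire) alignment; here that region is modeled as a hypothetical fixed window of `window` label positions per frame. The function name, batch size, lengths and window size are illustrative only; the vocabulary size 5003 is taken from the model name.

```python
def joint_tensor_elements(batch, frames, labels, vocab, window=None):
    """Number of elements in the joint-network output used by the loss.

    window=None: full RNN-T lattice, every frame is paired with all
                 labels + 1 prediction-network states.
    window=S:    restricted lattice, every frame is paired with only S
                 label positions (hypothetical fixed-width window).
    """
    label_positions = labels + 1 if window is None else window
    return batch * frames * label_positions * vocab


# Hypothetical sizes: 8 utterances, 500 encoder frames, 100 output tokens,
# and the 5003-token vocabulary from the model name.
full = joint_tensor_elements(8, 500, 100, 5003)
restricted = joint_tensor_elements(8, 500, 100, 5003, window=4)
bytes_per_float = 4  # float32
print(f"full lattice:       {full * bytes_per_float / 2**30:.2f} GiB")
print(f"restricted lattice: {restricted * bytes_per_float / 2**30:.2f} GiB")
```

The full lattice grows with the product of the number of frames and the label length, while the restricted one grows only with the number of frames, so the saving increases for longer utterances.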

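RWKV-BAT is a streaming BAT model that uses RWKV as its encoder. Compared with streaming models based on chunk Conformer, the RWKV-based model has lower latency, because it does not need future (look-ahead) context, and needs less memory at inference time, because it does not have to keep a KV cache.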
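The memory argument can be illustrated with a simplified, single-channel RWKV-4-style WKV recurrence. This is an assumption for illustration: the exact RWKV variant and dimensions used in this model's encoder are not stated here, and real implementations operate on vectors and use a numerically stabilized form. Each output depends only on the current and past frames, and the past is summarized in a fixed-size state `(a, b)`, whereas attention would have to keep keys and values for every earlier frame. `wkv_recurrent` and `wkv_full_history` are hypothetical names; the second recomputes the same quantity from the whole stored history to show that the two agree.

```python
import math


def wkv_recurrent(ks, vs, w, u):
    """Single-channel WKV computed frame by frame with a fixed-size state.

    ks, vs: per-frame key and value scalars for one channel.
    w: positive decay rate; u: extra weight ("bonus") for the current frame.
    Illustrative only: real RWKV kernels work on vectors and use a
    numerically stabilized version of this recurrence.
    """
    a, b = 0.0, 0.0  # decayed sums of exp(k) * v and of exp(k) over past frames
    decay = math.exp(-w)
    outputs = []
    for k, v in zip(ks, vs):
        bonus = math.exp(u + k)
        outputs.append((a + bonus * v) / (b + bonus))
        # Fold the current frame into the state; nothing else is stored.
        a = decay * a + math.exp(k) * v
        b = decay * b + math.exp(k)
    return outputs


def wkv_full_history(ks, vs, w, u):
    """Reference: the same outputs recomputed from every stored past frame."""
    outputs = []
    for t in range(len(ks)):
        num = math.exp(u + ks[t]) * vs[t]
        den = math.exp(u + ks[t])
        for i in range(t):  # only past frames i < t, never future ones
            weight = math.exp(-(t - 1 - i) * w + ks[i])
            num += weight * vs[i]
            den += weight
        outputs.append(num / den)
    return outputs


ks = [0.1, -0.3, 0.5, 0.2]
vs = [1.0, 2.0, -1.0, 0.5]
streamed = wkv_recurrent(ks, vs, w=0.5, u=0.3)
reference = wkv_full_history(ks, vs, w=0.5, u=0.3)
print(all(abs(x - y) < 1e-12 for x, y in zip(streamed, reference)))
```

The streaming version keeps two numbers per channel regardless of how many frames have been processed, while the reference version needs the whole history, which is the role a KV cache plays in attention-based encoders.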

How to Quickly Try the Model

Developing in a Notebook

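For developers, we particularly recommend using a Notebook for offline processing. Log in to your ModelScope account, then click the "Open in Notebook" button at the top right of the model page to open the dialog. The API can be called as in the following example: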

from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks

# Build an ASR pipeline from the streaming RWKV-BAT model on ModelScope
inference_pipeline = pipeline(
    task=Tasks.auto_speech_recognition,
    model='damo/speech_rwkv_bat_asr-en-16k-librispeech-vocab5003-pytorch-online',
    model_revision="v2.0.2",
)

# Transcribe an English example recording (16 kHz WAV) hosted online
rec_result = inference_pipeline('https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_en.wav')
print(rec_result)

How to Train Your Own BAT Model

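The BAT model provided in this project is a recognition model trained on LibriSpeech. Developers can use the corresponding GitHub repository to further customize the model for their own domain.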

Training and Inference with the GitHub Repository

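The FunASR framework supports training and fine-tuning of the industrial-grade speech recognition models open-sourced on the ModelScope community, making it easier for researchers and developers to research and deploy speech recognition models. It is open source on GitHub: https://github.com/alibaba-damo-academy/FunASR.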

Installing the FunASR Framework

Install FunASR and ModelScope

# Clone the repo and enter it (the editable install below runs from the repo root):
git clone https://github.com/alibaba-damo-academy/FunASR.git
cd FunASR

# Install Conda:
wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
conda create -n funasr python=3.7
conda activate funasr

# Install PyTorch (version >= 1.7.0):
conda install pytorch==1.7.0 torchvision==0.8.0 torchaudio==0.7.0 cudatoolkit=9.2 -c pytorch  # For other versions, see https://pytorch.org/get-started/locally/

# Install ModelScope:
pip install "modelscope[audio]" -f https://modelscope.oss-cn-beijing.aliyuncs.com/releases/repo.html

# Install FunASR and its remaining dependencies from the cloned repo:
pip install --editable ./
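After installation, a quick sanity check can confirm that the packages import and that PyTorch meets the minimum version listed above. This is a minimal sketch: `check_packages` and `parse_version` are hypothetical helpers, and it assumes the import names `torch`, `modelscope` and `funasr`. It relies only on `importlib.util.find_spec` and `importlib.import_module`, which are available in the Python 3.7 environment created above.

```python
import importlib
import importlib.util
import re


def parse_version(text):
    """Turn '1.7.0+cu92' into (1, 7, 0); non-numeric suffixes are ignored."""
    release = text.split("+")[0]
    parts = []
    for piece in release.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    return tuple(parts)


def check_packages(names):
    """Map each top-level module name to its __version__, or None if missing."""
    report = {}
    for name in names:
        if importlib.util.find_spec(name) is None:
            report[name] = None
            continue
        try:
            module = importlib.import_module(name)
        except Exception as exc:  # installed but failed to import
            report[name] = "import failed: " + exc.__class__.__name__
            continue
        report[name] = getattr(module, "__version__", "unknown")
    return report


if __name__ == "__main__":
    report = check_packages(["torch", "modelscope", "funasr"])
    for name, version in report.items():
        print(f"{name:12s} {version if version else 'NOT INSTALLED'}")
    torch_version = report.get("torch")
    if torch_version and torch_version[0].isdigit():
        print("PyTorch >= 1.7.0:", parse_version(torch_version) >= (1, 7, 0))
```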

Evaluation Results

Model	test-clean WER (%)	test-other WER (%)
RNN-T	3.90	9.56
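The columns report word error rate (WER) on the LibriSpeech test-clean and test-other sets. WER is the word-level edit distance, counting substitutions S, deletions D and insertions I, divided by the number of reference words N, i.e. WER = (S + D + I) / N. The sketch below computes a corpus-level WER in percent by summing errors and reference words over all utterances (rather than averaging per-utterance WERs). It is a plain illustration with hypothetical function names, not the scoring script used to produce the table, and it applies no text normalization.

```python
def word_edit_distance(ref, hyp):
    """Minimum substitutions + deletions + insertions turning ref into hyp."""
    # prev[j]: distance between the first i-1 reference words and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion of ref[i-1]
                          curr[j - 1] + 1,     # insertion of hyp[j-1]
                          prev[j - 1] + cost)  # match or substitution
        prev = curr
    return prev[len(hyp)]


def corpus_wer(pairs):
    """Corpus-level WER in percent over (reference, hypothesis) string pairs."""
    errors = sum(word_edit_distance(r.split(), h.split()) for r, h in pairs)
    words = sum(len(r.split()) for r, _ in pairs)
    return 100.0 * errors / words


pairs = [("a b c", "a x c"),   # one substitution
         ("d e", "d e f")]     # one insertion
print(corpus_wer(pairs))
```

Here 2 errors over 5 reference words give a WER of 40%.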

Usage and Scope

Supported Platforms

Currently the model runs only on Linux-x86_64; macOS and Windows are not supported.

How to Use

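Direct inference: decode input audio directly and output the transcribed text.
Fine-tuning: load the trained model and continue training it on private or open-source data.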

Scope and Target Scenarios

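The model is suited to online speech recognition scenarios such as transcribing recorded audio files. Inference works better on a GPU, and input audio shorter than 20 s is recommended.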
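To stay within the recommended input length, a longer recording can be cut into consecutive chunks of at most 20 s before transcription. The sketch below only computes sample ranges, assuming 16 kHz audio as indicated by "16k" in the model name; `fixed_length_segments` is a hypothetical helper. Cutting at fixed positions can split a word across two chunks, so a voice-activity-detection (VAD) based segmentation would usually be a better choice in practice.

```python
def fixed_length_segments(num_samples, sample_rate=16000, max_seconds=20.0):
    """(start, end) sample indices of consecutive chunks no longer than max_seconds."""
    max_len = int(max_seconds * sample_rate)
    if max_len <= 0:
        raise ValueError("max_seconds * sample_rate must be at least one sample")
    return [(start, min(start + max_len, num_samples))
            for start in range(0, num_samples, max_len)]


# A 50 s recording at 16 kHz splits into 20 s + 20 s + 10 s.
for start, end in fixed_length_segments(50 * 16000):
    print(f"{start / 16000:5.1f}s - {end / 16000:5.1f}s")
```

Each range can then be sliced out of the waveform and transcribed separately.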

Model Limitations and Possible Biases

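Differences in feature-extraction pipelines and tools, as well as in training tools, can cause small differences in WER (< 0.1%), and differences in the inference GPU environment lead to differences in the measured RTF (real-time factor).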
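RTF is the processing time divided by the duration of the audio processed, so values below 1.0 mean faster than real time; because it is a wall-clock measurement, it depends directly on the hardware. The sketch below shows the arithmetic plus a small timing wrapper; `real_time_factor` and `timed` are hypothetical helpers, not part of ModelScope or FunASR.

```python
import time


def real_time_factor(processing_seconds, audio_seconds):
    """RTF = processing time / audio duration; below 1.0 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def timed(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


# Example: 2 s of processing for a 20 s clip gives an RTF of 0.1.
print(real_time_factor(2.0, 20.0))
```

To measure the model itself, one could wrap the pipeline call with `timed` and divide by the clip's duration. The first call may include warm-up overhead, so timing several calls and discarding the first tends to give a fairer figure.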

RWKV-BAT Speech Recognition - English - LibriSpeech - 16k - Online
