Tencent Chinese HuBERT Base Model

Anonymous user, July 31, 2024
Category: AI
Open-source address: https://modelscope.cn/models/innnky/chinese-hubert-base-tencent
License: MIT

Project details

Pretrained on the 10k-hour WenetSpeech L subset. More details are available in TencentGameMate/chinese_speech_pretrain.

This model does not have a tokenizer, as it was pretrained on audio alone. To use this model for speech recognition, a tokenizer should be created and the model should be fine-tuned on labeled (transcribed) speech data.
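As a rough illustration of the "create a tokenizer" step, the sketch below builds a character-level vocabulary from transcripts using only the standard library. The function names, the special tokens `[PAD]`, `[UNK]` and `|`, and the sample transcripts are assumptions chosen for illustration; they are not part of this model's release.

```python
import json
from pathlib import Path


def build_char_vocab(transcripts):
    """Map every distinct non-space character in the transcripts to an id."""
    # Special tokens come first (an assumed convention): "[PAD]" is commonly
    # reused as the CTC blank, and "|" stands in for word boundaries.
    vocab = {"[PAD]": 0, "[UNK]": 1, "|": 2}
    chars = set()
    for text in transcripts:
        chars.update(ch for ch in text if not ch.isspace())
    # Sort for a reproducible id assignment across runs.
    for ch in sorted(chars - set(vocab)):
        vocab[ch] = len(vocab)
    return vocab


def save_vocab(vocab, path):
    """Write the vocabulary as UTF-8 JSON, keeping Chinese characters readable."""
    Path(path).write_text(
        json.dumps(vocab, ensure_ascii=False, indent=2), encoding="utf-8"
    )


# Hypothetical transcripts; real ones would come from the labeled dataset.
example_vocab = build_char_vocab(["你好", "你们 好"])
```

A vocabulary file written by `save_vocab` could then be passed to a CTC tokenizer such as `Wav2Vec2CTCTokenizer` in transformers, and the checkpoint loaded into a CTC head such as `HubertForCTC` for fine-tuning. Both pairings are suggestions rather than something the model card prescribes.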

python package: transformers==4.16.2

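import torch
import torch.nn.functional as F
import soundfile as sf

from transformers import (
    Wav2Vec2FeatureExtractor,
    HubertModel,
)


model_path = ""  # path to the downloaded model directory
wav_path = ""    # path to the input audio file

# `device` was undefined in the original snippet; use the GPU when available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
use_half = device.type == "cuda"  # fp16 inference is only reliable on GPU

feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(model_path)
model = HubertModel.from_pretrained(model_path)

# for pretrain: Wav2Vec2ForPreTraining
# model = Wav2Vec2ForPreTraining.from_pretrained(model_path)

model = model.to(device)
if use_half:
    model = model.half()
model.eval()

wav, sr = sf.read(wav_path)
# Passing sampling_rate lets the extractor reject audio at an unexpected rate
input_values = feature_extractor(wav, sampling_rate=sr, return_tensors="pt").input_values
if use_half:
    input_values = input_values.half()
input_values = input_values.to(device)

with torch.no_grad():
    outputs = model(input_values)
    last_hidden_state = outputs.last_hidden_state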
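
To make the shape of `last_hidden_state` easier to reason about, the sketch below estimates how many frames the convolutional feature encoder produces for a given number of input samples. It assumes this checkpoint keeps the default HuBERT base kernel sizes and strides from transformers' `HubertConfig`; the function name `num_output_frames` is made up for illustration, and the checkpoint's own config file is the authority.

```python
# Kernel sizes and strides of the seven convolutional layers in the
# feature encoder (assumed to match the HubertConfig defaults).
CONV_KERNEL = (10, 3, 3, 3, 3, 2, 2)
CONV_STRIDE = (5, 2, 2, 2, 2, 2, 2)


def num_output_frames(num_samples):
    """Return the number of frames left after all convolutional layers."""
    length = num_samples
    for kernel, stride in zip(CONV_KERNEL, CONV_STRIDE):
        # Standard output-length formula for an unpadded 1-D convolution.
        length = (length - kernel) // stride + 1
    return length


# One second of 16 kHz audio yields 49 frames, i.e. roughly one per 20 ms.
frames_per_second = num_output_frames(16000)
```

Under these assumptions, a one-second clip would give a `last_hidden_state` of shape `(1, 49, hidden_size)`, where `hidden_size` is read from the model config.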