Pretrained on 10k hours of the WenetSpeech L subset. More details in TencentGameMate/chinese_speech_pretrain. This model does not have a tokenizer as it was pretrained on audio alone.
In order to use this model for speech recognition, a tokenizer should be created and the model should be fine-tuned on labeled text data.
Python package:
transformers==4.16.2
import torch
import torch.nn.functional as F
import soundfile as sf
from transformers import (
    Wav2Vec2FeatureExtractor,
    HubertModel,
)
model_path=""
wav_path=""
# device was not defined in the original snippet; half precision below is intended for GPU use
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(model_path)
model = HubertModel.from_pretrained(model_path)
# for pretrain: Wav2Vec2ForPreTraining
# model = Wav2Vec2ForPreTraining.from_pretrained(model_path)
model = model.to(device)
model = model.half()
model.eval()
# sr should match the sampling rate the feature extractor is configured for
wav, sr = sf.read(wav_path)
input_values = feature_extractor(wav, return_tensors="pt").input_values
input_values = input_values.half()
input_values = input_values.to(device)
with torch.no_grad():
    outputs = model(input_values)
    last_hidden_state = outputs.last_hidden_state
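The note above says a tokenizer has to be created before this model can be fine-tuned for speech recognition. As a rough sketch of the first step only, the snippet below builds a character-level vocabulary from labeled transcripts. The sample transcripts, the function name build_char_vocab, and the special tokens "[PAD]", "[UNK]" and "|" are illustrative assumptions, not part of this model's release.

```python
import json
from collections import Counter

# Hypothetical labeled transcripts; real fine-tuning data would come from your own corpus.
transcripts = ["今天天气很好", "今天很冷"]

def build_char_vocab(texts, specials=("[PAD]", "[UNK]", "|")):
    """Map each special token and each distinct non-space character to an integer id."""
    counts = Counter(ch for text in texts for ch in text if not ch.isspace())
    # Special tokens come first so that the padding token gets id 0;
    # characters are sorted so the mapping is stable across runs.
    tokens = list(specials) + sorted(counts)
    return {tok: idx for idx, tok in enumerate(tokens)}

vocab = build_char_vocab(transcripts)

# Serialize to JSON; saving this string as vocab.json gives a file a CTC tokenizer can load.
vocab_json = json.dumps(vocab, ensure_ascii=False, indent=2)
```

One possible next step, not verified against this checkpoint, is to save the JSON as vocab.json, load it with Wav2Vec2CTCTokenizer from transformers, and fine-tune a HubertForCTC model whose vocab_size matches len(vocab).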