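Semantic Speaker Turn Detection Model

The speaker turn detection task identifies where the speaker changes. Our model works on the text modality and predicts the positions of speaker turn points. This information can then support a speaker diarization model.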

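Model Description

Our model is trained on top of BERT. At its core is a binary token classification task. Note that this model only checks whether a turn point exists after a punctuation mark; a more fine-grained model will be released later.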
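To make the "only after punctuation" restriction concrete, the sketch below lists the character offsets that would qualify as candidate turn points in a piece of text. The punctuation set and the helper name `candidate_turn_positions` are assumptions made for illustration; the model's own tokenizer and punctuation handling may differ.

```python
from typing import List

# Assumed punctuation set for illustration; the real model may use another one.
PUNCTUATION = set("，。？！；、,.?!;")


def candidate_turn_positions(text: str) -> List[int]:
    """Return the character offsets that directly follow a punctuation mark.

    The end of the text is skipped, since no speaker can follow it.
    """
    return [
        i + 1
        for i, ch in enumerate(text)
        if ch in PUNCTUATION and i + 1 < len(text)
    ]


text = "你好吗？我很好，谢谢。"
positions = candidate_turn_positions(text)
print(positions)  # [4, 8]
```

Each returned offset is a place where the binary classifier would answer "turn" or "no turn"; every other position in the text is never considered.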

For details of this model and how to apply it downstream, please refer to our paper:

Exploring Speaker-Related Information in Spoken Language Understanding for Better Speaker Diarization

Datasets

Our model is trained on the following datasets:

AISHELL-4
ALIMEETING

We use a sliding-window strategy to construct the training and test data.
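The exact windowing parameters are not given here, so the following is only a rough sketch of what a sliding-window construction could look like. It assumes the transcript is a list of (speaker, sentence) pairs, and the names `sliding_windows`, `window`, and `stride` are hypothetical. For simplicity it labels only the boundaries between utterances, whereas the actual model scores every punctuation mark.

```python
from typing import List, Tuple

Utterance = Tuple[str, str]  # (speaker id, sentence text)


def sliding_windows(
    utterances: List[Utterance], window: int = 3, stride: int = 1
) -> List[Tuple[str, List[int]]]:
    """Build (text, labels) samples from consecutive utterances.

    labels[i] is 1 when the speaker changes between utterance i and
    utterance i + 1 inside the window, otherwise 0.
    """
    samples = []
    last_start = max(len(utterances) - window, 0)
    for start in range(0, last_start + 1, stride):
        chunk = utterances[start:start + window]
        text = "".join(sentence for _, sentence in chunk)
        labels = [int(a[0] != b[0]) for a, b in zip(chunk, chunk[1:])]
        samples.append((text, labels))
    return samples


transcript = [
    ("A", "你好吗？"),
    ("B", "我很好。"),
    ("B", "你呢？"),
    ("A", "也不错。"),
]
samples = sliding_windows(transcript, window=3, stride=1)
for text, labels in samples:
    print(text, labels)
```

With a stride smaller than the window, neighbouring samples overlap, so the same turn point is seen in several different contexts.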

Evaluation

Test set	Precision	Recall	F1	Accuracy
ALIMEETING Test	0.7945	0.8779	0.8341	0.9146
AISHELL-4 Test	0.8814	0.7317	0.7996	0.925
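As a quick sanity check on the table, F1 is the harmonic mean of precision and recall. The sketch below recomputes it from the reported precision and recall, assuming the table uses the standard definition; the helper name `f1_score` is our own.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the table above.
reported = {
    "ALIMEETING Test": (0.7945, 0.8779, 0.8341),
    "AISHELL-4 Test": (0.8814, 0.7317, 0.7996),
}

for name, (p, r, f1) in reported.items():
    print(f"{name}: recomputed F1 = {f1_score(p, r):.4f}, reported F1 = {f1}")
```

Both recomputed values agree with the reported F1 to four decimal places.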

Local Inference with ModelScope

You can quickly try this model with modelscope:

from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks

# build pipeline
speaker_turn_detection = pipeline(
    task=Tasks.speaker_diarization_semantic_speaker_turn_detection,
    model="damo/speech_bert_semantic-spk-turn-detection-punc_speaker-diarization_chinese",
    model_revision="v0.5.0"
)
sentence = "你是如何看待这个问题的呢？这个问题挺好解决的，我们只需要增加停车位就行了。嗯嗯，好，那我们业主就放心了。"
predict_results = speaker_turn_detection(sentence)
print(predict_results)
# predict_results is a dict with the keys 'text', 'logits', and 'prediction'
# use the logits to build the final result
# For the sentence above, the detected turns look like:
# "你是如何看待这个问题的呢？|这个问题挺好解决的，我们只需要增加停车位就行了。|嗯嗯，好，那我们业主就放心了。"

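In this example, | marks the speaker turn points in the text. Actual inference does not add any extra characters; use logits and prediction to obtain the concrete output.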
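If you want the |-separated view for inspection, a small helper can produce it once the turn positions are known. This is a minimal sketch that assumes you have already converted the model output into character offsets at which a new speaker starts; that conversion depends on the exact layout of logits and prediction, which is not described here, and the name `mark_turns` is hypothetical.

```python
from typing import Iterable


def mark_turns(text: str, turn_offsets: Iterable[int], marker: str = "|") -> str:
    """Insert `marker` at each character offset where a new speaker starts."""
    pieces = []
    previous = 0
    for offset in sorted(set(turn_offsets)):
        pieces.append(text[previous:offset])
        previous = offset
    pieces.append(text[previous:])
    return marker.join(pieces)


text = "你好吗？我很好，谢谢。"
# Assumed offsets: a turn is predicted after the first question mark only.
marked = mark_turns(text, [4])
print(marked)  # 你好吗？|我很好，谢谢。
```

The helper leaves the original text untouched and returns a new string, so it can be used purely for display.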

