speaker-diarization-3.1

Posted by an anonymous user on 2024-07-31
Categories: ai, Pytorch, automatic-speech-rec, overlapped-speech-de, voice-activity-detec, speaker-change-detec, speaker-diarization, speaker, speech, voice
Source: https://modelscope.cn/models/mirror013/speaker-diarization-3.1
License: MIT

Using this open-source pipeline in production?
Make the most of it thanks to our consulting services.

Speaker diarization 3.1

This pipeline is the same as pyannote/speaker-diarization-3.0 except it removes the problematic use of onnxruntime.
Both speaker segmentation and embedding now run in pure PyTorch. This should ease deployment and possibly speed up inference.
It requires pyannote.audio version 3.1 or higher.

It ingests mono audio sampled at 16kHz and outputs speaker diarization as an Annotation instance:

  • stereo or multi-channel audio files are automatically downmixed to mono by averaging the channels.
  • audio files sampled at a different rate are resampled to 16kHz automatically upon loading.
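To make the channel-averaging rule concrete, here is a minimal pure-Python sketch of downmixing. The function name `downmix_to_mono` and the list-of-lists layout are assumptions made for illustration only; the pipeline performs this step internally when it loads a file, so nothing like this needs to be called in practice. Resampling is not shown.

```python
def downmix_to_mono(channels):
    """Average equally long channels sample by sample (hypothetical helper)."""
    if not channels:
        raise ValueError("expected at least one channel")
    length = len(channels[0])
    if any(len(channel) != length for channel in channels):
        raise ValueError("all channels must have the same number of samples")
    count = len(channels)
    # zip(*channels) yields one tuple per time step, holding that step's
    # sample from every channel; the mono sample is their mean.
    return [sum(samples) / count for samples in zip(*channels)]


# Two made-up channels of three samples each.
stereo = [[0.5, 1.0, -1.0], [0.5, 0.0, 1.0]]
mono = downmix_to_mono(stereo)
```

A single-channel input passes through unchanged, since averaging over one channel is the identity.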

Requirements

  1. Install pyannote.audio 3.1 with pip install pyannote.audio
  2. Accept pyannote/segmentation-3.0 user conditions
  3. Accept pyannote/speaker-diarization-3.1 user conditions
  4. Create access token at hf.co/settings/tokens.
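Once the token exists, one option is to keep it out of source code by reading it from an environment variable. The sketch below is only a suggestion: the helper name `get_hf_token` and the variable name `HF_TOKEN` are assumptions, not something this model card prescribes.

```python
import os


def get_hf_token(env_var="HF_TOKEN"):
    """Return the access token stored in `env_var` (hypothetical helper).

    Raises RuntimeError with a readable message when the variable is
    missing or empty, instead of failing later during model download.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your access token"
        )
    return token


# Intended use (not executed here):
#   Pipeline.from_pretrained("pyannote/speaker-diarization-3.1",
#                            use_auth_token=get_hf_token())
```

This keeps the token out of version control and lets the same script run unchanged on different machines.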

Usage

# instantiate the pipeline
from pyannote.audio import Pipeline
pipeline = Pipeline.from_pretrained(
  "pyannote/speaker-diarization-3.1",
  use_auth_token="HUGGINGFACE_ACCESS_TOKEN_GOES_HERE")

# run the pipeline on an audio file
diarization = pipeline("audio.wav")

# dump the diarization output to disk using RTTM format
with open("audio.rttm", "w") as rttm:
    diarization.write_rttm(rttm)
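The RTTM file written above is plain text with one `SPEAKER` line per speech turn: the file identifier, a channel number, the start time and duration in seconds, and the speaker label, with `<NA>` in the unused fields. The following is a minimal sketch of reading such a file back with the standard library only. The names `Turn`, `parse_rttm`, and `speaking_time` are hypothetical, and the sample lines are made up for illustration.

```python
from collections import namedtuple

# One speech turn: file identifier, start and duration in seconds, label.
Turn = namedtuple("Turn", ["uri", "start", "duration", "speaker"])


def parse_rttm(lines):
    """Parse RTTM SPEAKER lines into Turn records (hypothetical helper)."""
    turns = []
    for line in lines:
        fields = line.split()
        # Skip blank lines and any record that is not a SPEAKER entry.
        if len(fields) < 8 or fields[0] != "SPEAKER":
            continue
        turns.append(
            Turn(fields[1], float(fields[3]), float(fields[4]), fields[7])
        )
    return turns


def speaking_time(turns):
    """Total speech duration per speaker label, in seconds."""
    totals = {}
    for turn in turns:
        totals[turn.speaker] = totals.get(turn.speaker, 0.0) + turn.duration
    return totals


# Made-up sample content in the layout described above.
sample = [
    "SPEAKER audio 1 0.500 2.000 <NA> <NA> SPEAKER_00 <NA> <NA>",
    "SPEAKER audio 1 3.000 1.500 <NA> <NA> SPEAKER_01 <NA> <NA>",
    "SPEAKER audio 1 5.000 1.000 <NA> <NA> SPEAKER_00 <NA> <NA>",
]
totals = speaking_time(parse_rttm(sample))
```

Because turns from different speakers may overlap, the per-speaker totals can add up to more than the duration of the file.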

Processing on GPU

pyannote.audio pipelines run on CPU by default. You can send them to GPU with the following lines:

import torch
pipeline.to(torch.device("cuda"))

Processing from memory

Pre-loading audio files in memory may result in faster processing:

import torchaudio

# load the whole file into memory once, then pass the tensor to the pipeline
waveform, sample_rate = torchaudio.load("audio.wav")
diarization = pipeline({"waveform": waveform, "sample_rate": sample_rate})

Monitoring progress

Hooks are available to monitor the progress of the pipeline:

from pyannote.audio.pipelines.utils.hook import ProgressHook
with ProgressHook() as hook:
    diarization = pipeline("audio.wav", hook=hook)

Controlling the number of speakers

In case the number of speakers is known in advance, one can use the num_speakers option:

diarization = pipeline("audio.wav", num_speakers=2)

One can also provide lower and/or upper bounds on the number of speakers using min_speakers and max_speakers options:

diarization = pipeline("audio.wav", min_speakers=2, max_speakers=5)
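When these options come from user input or a configuration file, it can help to validate them before calling the pipeline. The helper below is a hypothetical sketch, not part of pyannote.audio. In particular, rejecting `num_speakers` together with bounds is this helper's own convention and is not claimed to be the library's behavior.

```python
def speaker_options(num_speakers=None, min_speakers=None, max_speakers=None):
    """Build keyword arguments for the pipeline call (hypothetical helper)."""
    given = {
        "num_speakers": num_speakers,
        "min_speakers": min_speakers,
        "max_speakers": max_speakers,
    }
    for name, value in given.items():
        if value is not None and value < 1:
            raise ValueError(f"{name} must be at least 1, got {value}")
    bounded = min_speakers is not None or max_speakers is not None
    if num_speakers is not None and bounded:
        raise ValueError("give either num_speakers or min/max bounds, not both")
    if (
        min_speakers is not None
        and max_speakers is not None
        and min_speakers > max_speakers
    ):
        raise ValueError("min_speakers cannot exceed max_speakers")
    # Only forward the options that were actually set.
    return {name: value for name, value in given.items() if value is not None}


# Intended use (not executed here):
#   diarization = pipeline("audio.wav",
#                          **speaker_options(min_speakers=2, max_speakers=5))
options = speaker_options(min_speakers=2, max_speakers=5)
```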

Benchmark

This pipeline has been benchmarked on a large collection of datasets.

Processing is fully automatic:

  • no manual voice activity detection (as is sometimes the case in the literature)
  • no manual number of speakers (though it is possible to provide it to the pipeline)
  • no fine-tuning of the internal models nor tuning of the pipeline hyper-parameters to each dataset

… with the least forgiving diarization error rate (DER) setup (named "Full" in this paper):

  • no forgiveness collar
  • evaluation of overlapped speech

Benchmark                            DER%   FA%   Miss%  Conf%  Expected output  File-level evaluation
AISHELL-4                            12.2   3.8    4.4    4.0   RTTM             eval
AliMeeting (channel 1)               24.4   4.4   10.0   10.0   RTTM             eval
AMI (headset mix, only_words)        18.8   3.6    9.5    5.7   RTTM             eval
AMI (array1, channel 1, only_words)  22.4   3.8   11.2    7.5   RTTM             eval
AVA-AVD                              50.0  10.8   15.7   23.4   RTTM             eval
DIHARD 3 (Full)                      21.7   6.2    8.1    7.3   RTTM             eval
MSDWild                              25.3   5.8    8.0   11.5   RTTM             eval
REPERE (phase 2)                      7.8   1.8    2.6    3.5   RTTM             eval
VoxConverse (v0.3)                   11.3   4.1    3.4    3.8   RTTM             eval
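As a reading aid for the columns: diarization error rate is conventionally the sum of false alarm (FA), missed detection (Miss), and speaker confusion (Conf), each expressed as a percentage of the total reference speech duration. The short sketch below checks that relation against the figures reported in the table. The variable and function names are illustrative, and the 0.15 tolerance is an assumption that allows for each column having been rounded to one decimal independently.

```python
# (DER%, FA%, Miss%, Conf%) copied from the benchmark table.
RESULTS = {
    "AISHELL-4": (12.2, 3.8, 4.4, 4.0),
    "AliMeeting (channel 1)": (24.4, 4.4, 10.0, 10.0),
    "AMI (headset mix, only_words)": (18.8, 3.6, 9.5, 5.7),
    "AMI (array1, channel 1, only_words)": (22.4, 3.8, 11.2, 7.5),
    "AVA-AVD": (50.0, 10.8, 15.7, 23.4),
    "DIHARD 3 (Full)": (21.7, 6.2, 8.1, 7.3),
    "MSDWild": (25.3, 5.8, 8.0, 11.5),
    "REPERE (phase 2)": (7.8, 1.8, 2.6, 3.5),
    "VoxConverse (v0.3)": (11.3, 4.1, 3.4, 3.8),
}


def rounding_gap(der, fa, miss, conf):
    """Absolute difference between reported DER and FA + Miss + Conf."""
    return abs(der - (fa + miss + conf))


# Largest mismatch across all benchmarks, in percentage points.
worst_gap = max(rounding_gap(*row) for row in RESULTS.values())
consistent = worst_gap <= 0.15
```

Small mismatches are expected, because the three components and the total are each rounded separately.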

Citations

@inproceedings{Plaquet23,
  author={Alexis Plaquet and Hervé Bredin},
  title={{Powerset multi-class cross entropy loss for neural speaker diarization}},
  year=2023,
  booktitle={Proc. INTERSPEECH 2023},
}
@inproceedings{Bredin23,
  author={Hervé Bredin},
  title={{pyannote.audio 2.1 speaker diarization pipeline: principle, benchmark, and recipe}},
  year=2023,
  booktitle={Proc. INTERSPEECH 2023},
}