Using this open-source pipeline in production? Make the most of it thanks to our consulting services.

Speaker diarization 3.1

This pipeline is the same as pyannote/speaker-diarization-3.0 except it removes the problematic use of onnxruntime. Both speaker segmentation and embedding now run in pure PyTorch. This should ease deployment and possibly speed up inference. It requires pyannote.audio version 3.1 or higher. It ingests mono audio sampled at 16kHz and outputs speaker diarization as an Annotation instance.

Requirements

1. Install pyannote.audio 3.1 with pip install pyannote.audio
2. Accept pyannote/segmentation-3.0 user conditions
3. Accept pyannote/speaker-diarization-3.1 user conditions
4. Create an access token at hf.co/settings/tokens.

Usage
# instantiate the pipeline
from pyannote.audio import Pipeline
pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",
    use_auth_token="HUGGINGFACE_ACCESS_TOKEN_GOES_HERE")

# run the pipeline on an audio file
diarization = pipeline("audio.wav")

# dump the diarization output to disk using RTTM format
with open("audio.rttm", "w") as rttm:
    diarization.write_rttm(rttm)
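Each line the dump produces is a space-separated SPEAKER record in the standard RTTM layout (record type, file ID, channel, onset, duration, then the speaker label in the eighth field). As a minimal sketch of reading such a file back, assuming that standard field layout (the sample line below is made up for illustration, not actual pipeline output):

```python
# Parse a SPEAKER record from an RTTM file into (onset, duration, speaker).
# Assumed field layout (standard RTTM):
#   type, file ID, channel, onset, duration, <NA>, <NA>, speaker, <NA>, <NA>
def parse_rttm_line(line):
    fields = line.split()
    assert fields[0] == "SPEAKER", "only SPEAKER records are expected here"
    return float(fields[3]), float(fields[4]), fields[7]

# Hypothetical example line in that layout:
line = "SPEAKER audio 1 6.700 2.300 <NA> <NA> SPEAKER_00 <NA> <NA>"
onset, duration, speaker = parse_rttm_line(line)
print(onset, duration, speaker)  # 6.7 2.3 SPEAKER_00
```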
Processing on GPU

pyannote.audio pipelines run on CPU by default. You can send them to GPU with the following lines:

import torch
pipeline.to(torch.device("cuda"))
Processing from memory

Pre-loading audio files in memory may result in faster processing:

import torchaudio
waveform, sample_rate = torchaudio.load("audio.wav")
diarization = pipeline({"waveform": waveform, "sample_rate": sample_rate})
Monitoring progress

Hooks are available to monitor the progress of the pipeline:

from pyannote.audio.pipelines.utils.hook import ProgressHook
with ProgressHook() as hook:
    diarization = pipeline("audio.wav", hook=hook)
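ProgressHook is one ready-made hook, but any callable with a compatible signature can be passed instead. The sketch below shows a plain logging hook; the `(step_name, step_artifact, file=None, total=None, completed=None)` calling convention is an assumption modeled on ProgressHook, not documented API, so verify it against your pyannote.audio version:

```python
# A minimal custom hook: builds and prints a one-line status per pipeline step.
# The calling convention is assumed, not verified against pyannote.audio docs.
def logging_hook(step_name, step_artifact, file=None, total=None, completed=None):
    if total is not None and completed is not None:
        msg = f"{step_name}: {completed}/{total}"   # chunked step with progress
    else:
        msg = f"{step_name}: done"                  # one-shot step
    print(msg)
    return msg

# Simulate how a pipeline might invoke the hook on a three-chunk step:
for chunk in range(3):
    logging_hook("segmentation", None, total=3, completed=chunk + 1)
```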
Controlling the number of speakers

In case the number of speakers is known in advance, one can use the num_speakers option:

diarization = pipeline("audio.wav", num_speakers=2)

One can also provide lower and/or upper bounds on the number of speakers using the min_speakers and max_speakers options:

diarization = pipeline("audio.wav", min_speakers=2, max_speakers=5)
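Conceptually, fixing num_speakers behaves like setting identical lower and upper bounds, and a bound-constrained run keeps the estimated speaker count inside the given range. The toy sketch below illustrates only that option semantics as I read it, not pyannote's actual clustering code:

```python
# Toy illustration of speaker-count constraints: an unconstrained estimate
# is clamped into [min_speakers, max_speakers]; num_speakers=k acts like
# min_speakers=max_speakers=k. Not pyannote internals, just the semantics.
def constrain_speaker_count(estimated, min_speakers=1,
                            max_speakers=float("inf"), num_speakers=None):
    if num_speakers is not None:
        min_speakers = max_speakers = num_speakers
    return max(min_speakers, min(estimated, max_speakers))

print(constrain_speaker_count(7, min_speakers=2, max_speakers=5))  # 5
print(constrain_speaker_count(7, num_speakers=2))                  # 2
```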
Benchmark

This pipeline has been benchmarked on a large collection of datasets. Processing is fully automatic, with the least forgiving diarization error rate (DER) setup (named "Full" in this paper):
| Benchmark | DER% | FA% | Miss% | Conf% | Expected output | File-level evaluation |
| --- | --- | --- | --- | --- | --- | --- |
| AISHELL-4 | 12.2 | 3.8 | 4.4 | 4.0 | RTTM | eval |
| AliMeeting (channel 1) | 24.4 | 4.4 | 10.0 | 10.0 | RTTM | eval |
| AMI (headset mix, only_words) | 18.8 | 3.6 | 9.5 | 5.7 | RTTM | eval |
| AMI (array1, channel 1, only_words) | 22.4 | 3.8 | 11.2 | 7.5 | RTTM | eval |
| AVA-AVD | 50.0 | 10.8 | 15.7 | 23.4 | RTTM | eval |
| DIHARD 3 (Full) | 21.7 | 6.2 | 8.1 | 7.3 | RTTM | eval |
| MSDWild | 25.3 | 5.8 | 8.0 | 11.5 | RTTM | eval |
| REPERE (phase 2) | 7.8 | 1.8 | 2.6 | 3.5 | RTTM | eval |
| VoxConverse (v0.3) | 11.3 | 4.1 | 3.4 | 3.8 | RTTM | eval |
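As a sanity check on the table, DER decomposes additively into its three error components, so each DER% value should equal FA% + Miss% + Conf% up to rounding. A quick check over a few rows, with the numbers copied from the table above:

```python
# DER = false alarm + missed detection + speaker confusion.
# Verify the additive decomposition on rows copied from the benchmark table.
rows = {
    "AISHELL-4":             (12.2, 3.8, 4.4, 4.0),
    "AliMeeting (channel 1)": (24.4, 4.4, 10.0, 10.0),
    "DIHARD 3 (Full)":       (21.7, 6.2, 8.1, 7.3),
}
for name, (der, fa, miss, conf) in rows.items():
    # allow a small tolerance since each column is rounded independently
    assert abs(der - (fa + miss + conf)) < 0.15, name
    print(f"{name}: {fa} + {miss} + {conf} ≈ {der}")
```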
Citations

@inproceedings{Plaquet23,
  author={Alexis Plaquet and Hervé Bredin},
  title={{Powerset multi-class cross entropy loss for neural speaker diarization}},
  year=2023,
  booktitle={Proc. INTERSPEECH 2023},
}
@inproceedings{Bredin23,
  author={Hervé Bredin},
  title={{pyannote.audio 2.1 speaker diarization pipeline: principle, benchmark, and recipe}},
  year=2023,
  booktitle={Proc. INTERSPEECH 2023},
}