speaker-diarization-3.1

Anonymous user, July 31, 2024

Technical information

Open-source repository
https://modelscope.cn/models/mirror013/speaker-diarization-3.1
License
MIT

Project details

Using this open-source pipeline in production?
Make the most of it thanks to our consulting services.

Speaker diarization 3.1

This pipeline is the same as pyannote/speaker-diarization-3.0 except that it removes the problematic use of onnxruntime.
Both speaker segmentation and embedding now run in pure PyTorch. This should ease deployment and possibly speed up inference.
It requires pyannote.audio version 3.1 or higher.

It ingests mono audio sampled at 16kHz and outputs speaker diarization as an Annotation instance:

  • stereo or multi-channel audio files are automatically downmixed to mono by averaging the channels.
  • audio files sampled at a different rate are resampled to 16kHz automatically upon loading.
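The pipeline performs both conversions itself, so nothing below is needed to use it. As an illustration of what "averaging the channels" and "resampling to 16kHz" mean, here is a minimal stdlib sketch. The function names are hypothetical, and the linear-interpolation resampler is a deliberately naive stand-in: real resamplers apply an anti-aliasing filter, and this is not the code pyannote.audio runs.

```python
from typing import List, Sequence

TARGET_RATE = 16_000  # the pipeline works on 16 kHz mono audio


def downmix(channels: Sequence[Sequence[float]]) -> List[float]:
    """Average a (channels x samples) signal into a single mono channel."""
    n_channels = len(channels)
    # zip(*channels) yields one tuple per time step, holding every channel's sample
    return [sum(frame) / n_channels for frame in zip(*channels)]


def resample_linear(samples: Sequence[float], src_rate: int,
                    dst_rate: int = TARGET_RATE) -> List[float]:
    """Naive linear-interpolation resampler (illustration only: no anti-aliasing filter)."""
    if src_rate == dst_rate or not samples:
        return list(samples)
    n_out = int(len(samples) * dst_rate / src_rate)
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate  # fractional position in the input signal
        left = int(pos)
        right = min(left + 1, last)
        frac = pos - left
        out.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return out


# Usage: a two-channel 8 kHz clip becomes a 16 kHz mono clip.
mono_16k = resample_linear(downmix([[1.0, 3.0], [3.0, 5.0]]), src_rate=8_000)
```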

Requirements

  1. Install pyannote.audio 3.1 with pip install pyannote.audio
  2. Accept pyannote/segmentation-3.0 user conditions
  3. Accept pyannote/speaker-diarization-3.1 user conditions
  4. Create an access token at hf.co/settings/tokens.
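Rather than pasting the access token from step 4 into source code, one option is to keep it in an environment variable and read it at startup. The sketch below assumes a variable named HF_TOKEN (our choice here; any name works) and a hypothetical helper get_hf_token; it reads the token itself so it can be passed explicitly, without relying on any implicit token lookup.

```python
import os
from typing import Mapping, Optional


def get_hf_token(var: str = "HF_TOKEN",
                 env: Optional[Mapping[str, str]] = None) -> str:
    """Return the Hugging Face access token stored in an environment variable.

    HF_TOKEN is an assumed default name; pass `env` to read from another mapping.
    """
    source = os.environ if env is None else env
    token = source.get(var, "").strip()
    if not token:
        raise RuntimeError(
            f"{var} is not set; create a token at hf.co/settings/tokens and export it as {var}"
        )
    return token
```

It would then be used as `use_auth_token=get_hf_token()` in the `Pipeline.from_pretrained` call, so the token never appears in the script.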

Usage

# instantiate the pipeline
from pyannote.audio import Pipeline
pipeline = Pipeline.from_pretrained(
  "pyannote/speaker-diarization-3.1",
  use_auth_token="HUGGINGFACE_ACCESS_TOKEN_GOES_HERE")

# run the pipeline on an audio file
diarization = pipeline("audio.wav")

# dump the diarization output to disk using RTTM format
with open("audio.rttm", "w") as rttm:
    diarization.write_rttm(rttm)
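The RTTM file written above can be post-processed without pyannote.audio. The sketch below assumes the standard NIST RTTM layout, where each speaker turn is a line such as `SPEAKER audio 1 0.500 2.000 <NA> <NA> SPEAKER_00 <NA> <NA>` (field 4 is the onset in seconds, field 5 the duration, field 8 the speaker label). parse_rttm and speaking_time are hypothetical helpers, not part of the library.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Turn = Tuple[float, float, str]  # (start, end, speaker label), times in seconds


def parse_rttm(lines: Iterable[str]) -> List[Turn]:
    """Parse SPEAKER lines of an RTTM file into (start, end, speaker) turns."""
    turns: List[Turn] = []
    for line in lines:
        fields = line.split()
        if not fields or fields[0] != "SPEAKER":
            continue  # skip blank lines and other RTTM record types
        onset, duration = float(fields[3]), float(fields[4])
        turns.append((onset, onset + duration, fields[7]))
    return turns


def speaking_time(turns: Iterable[Turn]) -> Dict[str, float]:
    """Total speech duration per speaker (overlapping turns count for each speaker)."""
    totals: Dict[str, float] = defaultdict(float)
    for start, end, speaker in turns:
        totals[speaker] += end - start
    return dict(totals)


# Usage with the file written above:
# with open("audio.rttm") as f:
#     print(speaking_time(parse_rttm(f)))
```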

Processing on GPU

pyannote.audio pipelines run on CPU by default. You can send them to GPU with the following lines:

import torch
pipeline.to(torch.device("cuda"))

Processing from memory

Pre-loading audio files in memory may result in faster processing:

import torchaudio
waveform, sample_rate = torchaudio.load("audio.wav")
diarization = pipeline({"waveform": waveform, "sample_rate": sample_rate})
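torchaudio.load returns the waveform with shape (channel, time) together with its sample rate, and that is the layout the dictionary above carries. To make that layout concrete without extra dependencies, here is a stdlib sketch that reads a 16-bit PCM WAV into per-channel sample lists. load_pcm16 and encode_pcm16 are hypothetical helpers; the pipeline itself still expects a torch tensor, so converting with something like `torch.tensor(channels)` is an assumption left to the reader.

```python
import io
import struct
import wave
from typing import List, Sequence, Tuple


def load_pcm16(source) -> Tuple[List[List[float]], int]:
    """Read a 16-bit PCM WAV (path or binary file object) into ((channel, time) samples, rate)."""
    with wave.open(source, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        n_channels = wav.getnchannels()
        sample_rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())
    ints = struct.unpack(f"<{len(raw) // 2}h", raw)  # little-endian int16 samples
    # WAV frames are interleaved (c0, c1, c0, c1, ...): slice out each channel
    channels = [[s / 32768.0 for s in ints[c::n_channels]] for c in range(n_channels)]
    return channels, sample_rate


def encode_pcm16(channels: Sequence[Sequence[float]], sample_rate: int) -> bytes:
    """Inverse of load_pcm16, handy for building test input in memory."""
    ints = [max(-32768, min(32767, round(x * 32768)))
            for frame in zip(*channels) for x in frame]  # interleave frame by frame
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(len(channels))
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(struct.pack(f"<{len(ints)}h", *ints))
    return buf.getvalue()
```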

Monitoring progress

Hooks are available to monitor the progress of the pipeline:

from pyannote.audio.pipelines.utils.hook import ProgressHook
with ProgressHook() as hook:
    diarization = pipeline("audio.wav", hook=hook)
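A hook can also be any callable. The sketch below assumes the calling convention used by ProgressHook in pyannote.audio 3.1, namely `hook(step_name, step_artifact, file=None, total=None, completed=None)`; check it against your installed version before relying on it. PercentHook is a hypothetical class, and the step names in the simulated calls are illustrative only.

```python
from typing import Any, Dict, Mapping, Optional


class PercentHook:
    """Hypothetical progress hook that records the latest completion percentage per step."""

    def __init__(self) -> None:
        self.progress: Dict[str, float] = {}

    def __call__(self, step_name: str, step_artifact: Any,
                 file: Optional[Mapping] = None,
                 total: Optional[int] = None,
                 completed: Optional[int] = None) -> None:
        if total is None or completed is None:
            # a step reported without counters is treated as finished
            self.progress[step_name] = 100.0
        else:
            self.progress[step_name] = 100.0 * completed / max(total, 1)
        print(f"{step_name}: {self.progress[step_name]:.0f}%")


# Simulated calls, in the order a pipeline might report them:
demo = PercentHook()
demo("segmentation", None, total=4, completed=1)
demo("segmentation", None, total=4, completed=4)
demo("discrete_diarization", None)
```

Under that assumption it would be passed the same way as ProgressHook: `pipeline("audio.wav", hook=PercentHook())`.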

Controlling the number of speakers

In case the number of speakers is known in advance, one can use the num_speakers option:

diarization = pipeline("audio.wav", num_speakers=2)

One can also provide lower and/or upper bounds on the number of speakers using the min_speakers and max_speakers options:

diarization = pipeline("audio.wav", min_speakers=2, max_speakers=5)
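The semantics of these three options can be summarized as a range: num_speakers pins the count exactly, while min_speakers and max_speakers bound whatever the pipeline would otherwise estimate. The sketch below illustrates only that constraint logic; it is not pyannote.audio's internal code, both function names are hypothetical, and the default lower bound of 1 is an assumption.

```python
from typing import Optional, Tuple


def resolve_speaker_bounds(num_speakers: Optional[int] = None,
                           min_speakers: Optional[int] = None,
                           max_speakers: Optional[int] = None) -> Tuple[int, float]:
    """Turn the three options into an inclusive (lower, upper) range."""
    if num_speakers is not None:
        return num_speakers, float(num_speakers)  # exact count: lower == upper
    lower = 1 if min_speakers is None else min_speakers  # assumed default lower bound
    upper = float("inf") if max_speakers is None else float(max_speakers)
    if lower > upper:
        raise ValueError("min_speakers must not exceed max_speakers")
    return lower, upper


def constrain_speaker_count(estimated: int, lower: int, upper: float) -> int:
    """Clamp an estimated speaker count into [lower, upper]."""
    return int(min(max(estimated, lower), upper))
```

For example, with min_speakers=2 and max_speakers=5, an estimate of 7 speakers would be reduced to 5 and an estimate of 1 raised to 2.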

Benchmark

This pipeline has been benchmarked on a large collection of datasets.

Processing is fully automatic:

  • no manual voice activity detection (as is sometimes the case in the literature)
  • no manual number of speakers (though it is possible to provide it to the pipeline)
  • no fine-tuning of the internal models or tuning of the pipeline hyper-parameters to each dataset

… with the least forgiving diarization error rate (DER) setup (named "Full" in this paper):

  • no forgiveness collar
  • evaluation of overlapped speech
Benchmark                            DER%   FA%    Miss%   Conf%   Expected output   File-level evaluation
AISHELL-4                            12.2   3.8    4.4     4.0     RTTM              eval
AliMeeting (channel 1)               24.4   4.4    10.0    10.0    RTTM              eval
AMI (headset mix, only_words)        18.8   3.6    9.5     5.7     RTTM              eval
AMI (array1, channel 1, only_words)  22.4   3.8    11.2    7.5     RTTM              eval
AVA-AVD                              50.0   10.8   15.7    23.4    RTTM              eval
DIHARD 3 (Full)                      21.7   6.2    8.1     7.3     RTTM              eval
MSDWild                              25.3   5.8    8.0     11.5    RTTM              eval
REPERE (phase 2)                     7.8    1.8    2.6     3.5     RTTM              eval
VoxConverse (v0.3)                   11.3   4.1    3.4     3.8     RTTM              eval
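DER is the sum of false alarm (FA), missed speech (Miss) and speaker confusion (Conf), each expressed as a fraction of total reference speech. The table respects this up to rounding (AISHELL-4: 3.8 + 4.4 + 4.0 = 12.2; a few rows differ by 0.1). The frame-level sketch below shows how the three terms are counted under the "Full" setup: there is no collar, and an overlapped frame contributes once per active speaker. It assumes hypothesis labels have already been mapped onto reference labels; real scorers such as pyannote.metrics first find the optimal speaker mapping and work on continuous time, so treat this as an illustration only. der_components is a hypothetical name.

```python
from typing import Sequence, Set, Tuple


def der_components(reference: Sequence[Set[str]],
                   hypothesis: Sequence[Set[str]]) -> Tuple[float, float, float, float]:
    """Frame-level (DER, FA, Miss, Conf) in percent, with no collar and overlap scored.

    Each frame holds the set of active speakers; hypothesis labels are assumed
    to be already mapped onto reference labels.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same number of frames")
    false_alarm = missed = confusion = total = 0
    for ref, hyp in zip(reference, hypothesis):
        n_ref, n_hyp, n_correct = len(ref), len(hyp), len(ref & hyp)
        total += n_ref  # an overlapped frame counts once per active speaker
        missed += max(0, n_ref - n_hyp)
        false_alarm += max(0, n_hyp - n_ref)
        confusion += min(n_ref, n_hyp) - n_correct
    if total == 0:
        raise ValueError("the reference contains no speech")

    def pct(x: int) -> float:
        return 100.0 * x / total

    return pct(false_alarm + missed + confusion), pct(false_alarm), pct(missed), pct(confusion)
```

In the usage case of four frames where one overlap is half-missed, one speaker is confused and one silent frame gets a speaker, each error type is 1 of 4 reference speaker-frames, giving (75.0, 25.0, 25.0, 25.0).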

Citations

@inproceedings{Plaquet23,
  author={Alexis Plaquet and Hervé Bredin},
  title={{Powerset multi-class cross entropy loss for neural speaker diarization}},
  year=2023,
  booktitle={Proc. INTERSPEECH 2023},
}
@inproceedings{Bredin23,
  author={Hervé Bredin},
  title={{pyannote.audio 2.1 speaker diarization pipeline: principle, benchmark, and recipe}},
  year=2023,
  booktitle={Proc. INTERSPEECH 2023},
}
