whisper.cpp: an open-source C/C++ port of OpenAI's Whisper model

Anonymous user, March 9, 2023
Technology: C/C++
Category: artificial intelligence, machine learning / deep learning
License: MIT

Project details

whisper.cpp is a C/C++ port of OpenAI's Whisper automatic speech recognition (ASR) model.

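Features

- Plain C/C++ implementation without dependencies
- Apple silicon as a first-class citizen: optimized via Arm Neon and the Accelerate framework
- AVX intrinsics support for x86 architectures
- VSX intrinsics support for POWER architectures
- Mixed F16 / F32 precision
- Low memory usage (Flash Attention)
- Zero memory allocations at runtime
- Runs on the CPU
- C-style API

Supported platforms:

- Mac OS (Intel and Arm)
- iOS
- Android
- Linux / FreeBSD
- WebAssembly
- Windows (MSVC and MinGW)
- Raspberry Pi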

The entire implementation of the model is contained in 2 source files:

- Tensor operations: ggml.h / ggml.c
- Transformer inference: whisper.h / whisper.cpp

This lightweight implementation makes it easy to integrate OpenAI's Whisper model into different platforms and applications.

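Implementation details

- The core tensor operations are implemented in C (ggml.h / ggml.c)
- The transformer model and the high-level C-style API are implemented in C++ (whisper.h / whisper.cpp)
- Sample usage is demonstrated in main.cpp
- Real-time audio transcription from the microphone is demonstrated in stream.cpp
- Various other examples are available in the examples folder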

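The tensor operators are heavily optimized for the CPUs in Apple silicon. Depending on the computation size, either Arm Neon SIMD intrinsics or CBLAS Accelerate framework routines are used. The latter are especially effective for larger sizes, because the Accelerate framework uses the special-purpose AMX coprocessor available in modern Apple products.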
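The size-dependent choice between two code paths described above can be sketched as follows. This is only an illustration of the dispatch idea, not code from ggml: the names `choose_kernel`, `matmul_simple`, `matmul_blocked`, and the cutoff `kSmallWorkLimit` are hypothetical, the plain loop merely stands in for a SIMD kernel, and the blocked loop stands in for a BLAS call.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

enum class Kernel { Simple, Blocked };

// Hypothetical cutoff in multiply-add operations; a real value would be
// chosen by benchmarking on the target hardware.
constexpr std::size_t kSmallWorkLimit = 32 * 32 * 32;

// Pick a code path from the problem size (A is m x k, B is k x n).
Kernel choose_kernel(std::size_t m, std::size_t n, std::size_t k) {
    return (m * n * k < kSmallWorkLimit) ? Kernel::Simple : Kernel::Blocked;
}

// Plain triple loop. Stand-in for a hand-written SIMD kernel.
void matmul_simple(const float* a, const float* b, float* c,
                   std::size_t m, std::size_t n, std::size_t k) {
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            float sum = 0.0f;
            for (std::size_t p = 0; p < k; ++p) {
                sum += a[i * k + p] * b[p * n + j];
            }
            c[i * n + j] = sum;
        }
    }
}

// Cache-blocked loop. Stand-in for a call into a BLAS routine.
// Accumulates into c, so c must be zero-initialized by the caller.
void matmul_blocked(const float* a, const float* b, float* c,
                    std::size_t m, std::size_t n, std::size_t k) {
    constexpr std::size_t kBlock = 16;
    for (std::size_t i0 = 0; i0 < m; i0 += kBlock) {
        for (std::size_t p0 = 0; p0 < k; p0 += kBlock) {
            for (std::size_t j0 = 0; j0 < n; j0 += kBlock) {
                const std::size_t i1 = std::min(i0 + kBlock, m);
                const std::size_t p1 = std::min(p0 + kBlock, k);
                const std::size_t j1 = std::min(j0 + kBlock, n);
                for (std::size_t i = i0; i < i1; ++i) {
                    for (std::size_t p = p0; p < p1; ++p) {
                        const float av = a[i * k + p];
                        for (std::size_t j = j0; j < j1; ++j) {
                            c[i * n + j] += av * b[p * n + j];
                        }
                    }
                }
            }
        }
    }
}

// C (m x n) = A (m x k) * B (k x n), all row-major.
std::vector<float> matmul(const std::vector<float>& a, const std::vector<float>& b,
                          std::size_t m, std::size_t n, std::size_t k) {
    std::vector<float> c(m * n, 0.0f);
    if (choose_kernel(m, n, k) == Kernel::Simple) {
        matmul_simple(a.data(), b.data(), c.data(), m, n, k);
    } else {
        matmul_blocked(a.data(), b.data(), c.data(), m, n, k);
    }
    return c;
}
```

Callers only see `matmul`; which path runs is an internal decision, so the cutoff can be retuned per platform without touching calling code.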

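Quick start

First, download one of the Whisper models converted to ggml format. For example:

    bash ./models/download-ggml-model.sh base.en

Then build the main example and transcribe an audio file like this:

    # build the main example
    make

    # transcribe an audio file
    ./main -f samples/jfk.wav

For a quick demo, simply run make base.en: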

    $ make base.en

    cc  -I.              -O3 -std=c11   -pthread -DGGML_USE_ACCELERATE -c ggml.c -o ggml.o
    c++ -I. -I./examples -O3 -std=c++11 -pthread -c whisper.cpp -o whisper.o
    c++ -I. -I./examples -O3 -std=c++11 -pthread examples/main/main.cpp whisper.o ggml.o -o main -framework Accelerate
    ./main -h

    usage: ./main [options] file0.wav file1.wav ...

    options:
      -h,        --help              [default] show this help message and exit
      -t N,      --threads N         [4      ] number of threads to use during computation
      -p N,      --processors N      [1      ] number of processors to use during computation
      -ot N,     --offset-t N        [0      ] time offset in milliseconds
      -on N,     --offset-n N        [0      ] segment index offset
      -d N,      --duration N        [0      ] duration of audio to process in milliseconds
      -mc N,     --max-context N     [-1     ] maximum number of text context tokens to store
      -ml N,     --max-len N         [0      ] maximum segment length in characters
      -bo N,     --best-of N         [5      ] number of best candidates to keep
      -bs N,     --beam-size N       [-1     ] beam size for beam search
      -wt N,     --word-thold N      [0.01   ] word timestamp probability threshold
      -et N,     --entropy-thold N   [2.40   ] entropy threshold for decoder fail
      -lpt N,    --logprob-thold N   [-1.00  ] log probability threshold for decoder fail
      -su,       --speed-up          [false  ] speed up audio by x2 (reduced accuracy)
      -tr,       --translate         [false  ] translate from source language to english
      -di,       --diarize           [false  ] stereo audio diarization
      -nf,       --no-fallback       [false  ] do not use temperature fallback while decoding
      -otxt,     --output-txt        [false  ] output result in a text file
      -ovtt,     --output-vtt        [false  ] output result in a vtt file
      -osrt,     --output-srt        [false  ] output result in a srt file
      -owts,     --output-words      [false  ] output script for generating karaoke video
      -ocsv,     --output-csv        [false  ] output result in a CSV file
      -of FNAME, --output-file FNAME [       ] output file path (without file extension)
      -ps,       --print-special     [false  ] print special tokens
      -pc,       --print-colors      [false  ] print colors
      -pp,       --print-progress    [false  ] print progress
      -nt,       --no-timestamps     [true   ] do not print timestamps
      -l LANG,   --language LANG     [en     ] spoken language ('auto' for auto-detect)
                 --prompt PROMPT     [       ] initial prompt
      -m FNAME,  --model FNAME       [models/ggml-base.en.bin] model path
      -f FNAME,  --file FNAME        [       ] input WAV file path

    bash ./models/download-ggml-model.sh base.en
    Downloading ggml model base.en ...
    ggml-base.en.bin    100%[========================>] 141.11M  6.34MB/s    in 24s
    Done! Model 'base.en' saved in 'models/ggml-base.en.bin'
    You can now use it like this:

      $ ./main -m models/ggml-base.en.bin -f samples/jfk.wav

    ===============================================
    Running base.en on all samples in ./samples ...
    ===============================================

    ----------------------------------------------
    [+] Running base.en on samples/jfk.wav ... (run 'ffplay samples/jfk.wav' to listen)
    ----------------------------------------------

    whisper_init_from_file: loading model from 'models/ggml-base.en.bin'
    whisper_model_load: loading model
    whisper_model_load: n_vocab       = 51864
    whisper_model_load: n_audio_ctx   = 1500
    whisper_model_load: n_audio_state = 512
    whisper_model_load: n_audio_head  = 8
    whisper_model_load: n_audio_layer = 6
    whisper_model_load: n_text_ctx    = 448
    whisper_model_load: n_text_state  = 512
    whisper_model_load: n_text_head   = 8
    whisper_model_load: n_text_layer  = 6
    whisper_model_load: n_mels        = 80
    whisper_model_load: f16           = 1
    whisper_model_load: type          = 2
    whisper_model_load: mem required  = 215.00 MB (+ 6.00 MB per decoder)
    whisper_model_load: kv self size  = 5.25 MB
    whisper_model_load: kv cross size = 17.58 MB
    whisper_model_load: adding 1607 extra tokens
    whisper_model_load: model ctx     = 140.60 MB
    whisper_model_load: model size    = 140.54 MB

    system_info: n_threads = 4 / 10 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 |

    main: processing 'samples/jfk.wav' (176000 samples, 11.0 sec), 4 threads, 1 processors, lang = en, task = transcribe, timestamps = 1 ...

    [00:00:00.000 --> 00:00:11.000]   And so my fellow Americans, ask not what your country can do for you, ask what you can do for your country.

    whisper_print_timings: fallbacks   = 0 p / 0 h
    whisper_print_timings: load time   = 113.81 ms
    whisper_print_timings: mel time    = 15.40 ms
    whisper_print_timings: sample time = 11.58 ms / 27 runs (0.43 ms per run)
    whisper_print_timings: encode time = 266.60 ms / 1 runs (266.60 ms per run)
    whisper_print_timings: decode time = 66.11 ms / 27 runs (2.45 ms per run)
    whisper_print_timings: total time  = 476.31 ms

This command downloads the base.en model converted to the custom ggml format and runs inference on all .wav samples in the samples folder.
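For readers who want to post-process this output, here is a minimal sketch that parses a transcript line of the form shown above and works out how much faster than real time the run was. The helper names (`parse_segment`, `duration_sec`, `real_time_factor`) are hypothetical and not part of whisper.cpp, and the line layout is assumed from the sample log only. The 16000 Hz sample rate is taken from the ffmpeg command used to prepare input.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// One transcript line, e.g. "[00:00:00.000 --> 00:00:11.000]   some text".
struct Segment {
    bool ok = false;          // true if the line matched the expected layout
    long long start_ms = 0;
    long long end_ms = 0;
    std::string text;
};

// Convert hours, minutes, seconds, milliseconds to a millisecond count.
static long long to_ms(int h, int m, int s, int ms) {
    return ((static_cast<long long>(h) * 60 + m) * 60 + s) * 1000 + ms;
}

// Parse a timestamped line; returns ok == false for any other line.
Segment parse_segment(const std::string& line) {
    Segment seg;
    int h0, m0, s0, ms0, h1, m1, s1, ms1;
    int consumed = 0;  // set by %n only if the closing ']' was matched
    const int fields = std::sscanf(line.c_str(),
                                   "[%d:%d:%d.%d --> %d:%d:%d.%d]%n",
                                   &h0, &m0, &s0, &ms0,
                                   &h1, &m1, &s1, &ms1, &consumed);
    if (fields != 8 || consumed == 0) {
        return seg;
    }
    seg.start_ms = to_ms(h0, m0, s0, ms0);
    seg.end_ms = to_ms(h1, m1, s1, ms1);
    // Skip the padding between the closing bracket and the text.
    const std::size_t first =
        line.find_first_not_of(" \t", static_cast<std::size_t>(consumed));
    if (first != std::string::npos) {
        seg.text = line.substr(first);
    }
    seg.ok = true;
    return seg;
}

// Audio length in seconds from a sample count and a sample rate.
double duration_sec(long long n_samples, int sample_rate_hz) {
    return static_cast<double>(n_samples) / sample_rate_hz;
}

// How many times faster than real time the processing ran.
double real_time_factor(double audio_ms, double processing_ms) {
    return audio_ms / processing_ms;
}
```

With the numbers from the log, `duration_sec(176000, 16000)` gives the reported 11.0 seconds, and `real_time_factor(11000.0, 476.31)` comes out at roughly 23, meaning the 11-second clip was processed about 23 times faster than real time on that machine.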

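For detailed usage instructions, run: ./main -h

Note that the main example currently runs only on 16-bit WAV files, so make sure to convert your input before running the tool. For example, you can use ffmpeg like this: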

    ffmpeg -i input.mp3 -ar 16000 -ac 1 -c:a pcm_s16le output.wav

Memory usage

    Model    Disk     Mem       SHA
    tiny     75 MB    ~125 MB   bd577a113a864445d4c299885e0cb97d4ba92b5f
    base     142 MB   ~210 MB   465707469ff3a37a2b9b8d8f89f2f99de7299dac
    small    466 MB   ~600 MB   55356645c2b361a969dfd0ef2c5a50d530afd8d5
    medium   1.5 GB   ~1.7 GB   fd9727b6e1217c2f614f9b698455c4ffd82463b4
    large    2.9 GB   ~3.3 GB   0f4c8e34f21cf1a914c59d8b3ce882345ad349d6
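Returning to the input requirement noted earlier: before handing a file to the tool, it can be useful to check that it really is 16-bit PCM WAV. The sketch below reads the standard RIFF/WAVE "fmt " chunk from a byte buffer. The function names are hypothetical and this is not how whisper.cpp itself loads audio. The text only states the 16-bit requirement; the 16 kHz mono check is an assumption taken from the ffmpeg flags (-ar 16000 -ac 1).

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Fields of the WAVE "fmt " chunk that matter for this check.
struct WavFormat {
    std::uint16_t audio_format = 0;     // 1 means uncompressed PCM
    std::uint16_t channels = 0;
    std::uint32_t sample_rate = 0;
    std::uint16_t bits_per_sample = 0;
};

// WAV stores integers in little-endian byte order.
static std::uint16_t read_u16(const std::uint8_t* p) {
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}

static std::uint32_t read_u32(const std::uint8_t* p) {
    return static_cast<std::uint32_t>(p[0]) |
           (static_cast<std::uint32_t>(p[1]) << 8) |
           (static_cast<std::uint32_t>(p[2]) << 16) |
           (static_cast<std::uint32_t>(p[3]) << 24);
}

// Walk the RIFF chunks until "fmt " is found; false on malformed input.
bool parse_wav_format(const std::vector<std::uint8_t>& data, WavFormat& out) {
    if (data.size() < 12) return false;
    if (std::memcmp(data.data(), "RIFF", 4) != 0) return false;
    if (std::memcmp(data.data() + 8, "WAVE", 4) != 0) return false;

    std::size_t pos = 12;
    while (pos + 8 <= data.size()) {
        const std::uint8_t* chunk = data.data() + pos;
        const std::uint32_t size = read_u32(chunk + 4);
        if (std::memcmp(chunk, "fmt ", 4) == 0) {
            // The basic PCM format block is 16 bytes long.
            if (size < 16 || data.size() - (pos + 8) < 16) return false;
            const std::uint8_t* f = chunk + 8;
            out.audio_format = read_u16(f);
            out.channels = read_u16(f + 2);
            out.sample_rate = read_u32(f + 4);
            out.bits_per_sample = read_u16(f + 14);
            return true;
        }
        // Skip this chunk; chunk data is padded to an even length.
        const std::uint64_t advance = 8ull + size + (size & 1u);
        if (advance > data.size() - pos) return false;
        pos += static_cast<std::size_t>(advance);
    }
    return false;
}

// Assumed target: 16-bit PCM, 16 kHz, mono (see the ffmpeg flags).
bool is_expected_input(const WavFormat& f) {
    return f.audio_format == 1 && f.bits_per_sample == 16 &&
           f.sample_rate == 16000 && f.channels == 1;
}
```

In practice the buffer would hold the first bytes of the file; a file that fails `is_expected_input` can be converted with the ffmpeg command shown earlier.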

 
