The entire implementation of the model is contained in 2 source files:
- Tensor operations: ggml.h / ggml.c
- Transformer inference: whisper.h / whisper.cpp

This lightweight model implementation makes it easy to integrate OpenAI's Whisper model into different platforms and applications.
## Implementation details

- The core tensor operations are implemented in C (ggml.h / ggml.c)
- The transformer model and the high-level C-style API are implemented in C++ (whisper.h / whisper.cpp); a usage sketch is shown below
- Sample usage is demonstrated in main.cpp
- Sample real-time audio transcription from the microphone is demonstrated in stream.cpp
- Various other examples are available in the examples folder

The tensor operators are heavily optimized for Apple Silicon CPUs. Depending on the computation size, Arm Neon SIMD intrinsics or CBLAS Accelerate framework routines are used. The latter are especially effective for larger sizes, since the Accelerate framework utilizes the special-purpose AMX coprocessor available in modern Apple products.
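As a rough sketch of how that high-level C-style API is typically driven (this is an illustration, not a copy of main.cpp), assuming the audio has already been decoded to 16 kHz mono float samples and using the `whisper_init_from_file` entry point that also appears in the log output further below:

```cpp
// Minimal sketch of using the high-level API declared in whisper.h.
// The WAV decoding step is omitted here; main.cpp handles that part.
#include <cstdio>
#include <vector>

#include "whisper.h"

int main() {
    // 16 kHz mono samples in [-1.0, 1.0]; loading them is left out of this sketch
    std::vector<float> pcmf32;

    // load the ggml model and create an inference context
    struct whisper_context * ctx = whisper_init_from_file("models/ggml-base.en.bin");
    if (ctx == nullptr) {
        fprintf(stderr, "failed to load the model\n");
        return 1;
    }

    // run the full encode/decode pipeline with default (greedy) parameters
    whisper_full_params wparams = whisper_full_default_params(WHISPER_SAMPLING_GREEDY);
    if (whisper_full(ctx, wparams, pcmf32.data(), (int) pcmf32.size()) != 0) {
        fprintf(stderr, "failed to process the audio\n");
        whisper_free(ctx);
        return 1;
    }

    // print the decoded text segments
    const int n_segments = whisper_full_n_segments(ctx);
    for (int i = 0; i < n_segments; ++i) {
        printf("%s\n", whisper_full_get_segment_text(ctx, i));
    }

    whisper_free(ctx);
    return 0;
}
```

Like the main example in the quick start below, such a program would be linked against whisper.o and ggml.o (and the Accelerate framework on Apple platforms).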
## Quickstart

First, download one of the Whisper models converted to ggml format. For example:

```bash
./models/download-ggml-model.sh base.en
```

Now build the main example and transcribe an audio file like this:

```bash
# build the main example
make

# transcribe an audio file
./main -f samples/jfk.wav
```

For a quick demo, simply run `make base.en`:

```
$ make base.en

cc  -I.              -O3 -std=c11   -pthread -DGGML_USE_ACCELERATE        -c ggml.c -o ggml.o
c++ -I. -I./examples -O3 -std=c++11 -pthread -c whisper.cpp -o whisper.o
c++ -I. -I./examples -O3 -std=c++11 -pthread examples/main/main.cpp whisper.o ggml.o -o main -framework Accelerate
./main -h

usage: ./main [options] file0.wav file1.wav ...

options:
  -h,        --help              [default] show this help message and exit
  -t N,      --threads N         [4      ] number of threads to use during computation
  -p N,      --processors N      [1      ] number of processors to use during computation
  -ot N,     --offset-t N        [0      ] time offset in milliseconds
  -on N,     --offset-n N        [0      ] segment index offset
  -d  N,     --duration N        [0      ] duration of audio to process in milliseconds
  -mc N,     --max-context N     [-1     ] maximum number of text context tokens to store
  -ml N,     --max-len N         [0      ] maximum segment length in characters
  -bo N,     --best-of N         [5      ] number of best candidates to keep
  -bs N,     --beam-size N       [-1     ] beam size for beam search
  -wt N,     --word-thold N      [0.01   ] word timestamp probability threshold
  -et N,     --entropy-thold N   [2.40   ] entropy threshold for decoder fail
  -lpt N,    --logprob-thold N   [-1.00  ] log probability threshold for decoder fail
  -su,       --speed-up          [false  ] speed up audio by x2 (reduced accuracy)
  -tr,       --translate         [false  ] translate from source language to english
  -di,       --diarize           [false  ] stereo audio diarization
  -nf,       --no-fallback       [false  ] do not use temperature fallback while decoding
  -otxt,     --output-txt        [false  ] output result in a text file
  -ovtt,     --output-vtt        [false  ] output result in a vtt file
  -osrt,     --output-srt        [false  ] output result in a srt file
  -owts,     --output-words      [false  ] output script for generating karaoke video
  -ocsv,     --output-csv        [false  ] output result in a CSV file
  -of FNAME, --output-file FNAME [       ] output file path (without file extension)
  -ps,       --print-special     [false  ] print special tokens
  -pc,       --print-colors      [false  ] print colors
  -pp,       --print-progress    [false  ] print progress
  -nt,       --no-timestamps     [true   ] do not print timestamps
  -l LANG,   --language LANG     [en     ] spoken language ('auto' for auto-detect)
             --prompt PROMPT     [       ] initial prompt
  -m FNAME,  --model FNAME       [models/ggml-base.en.bin] model path
  -f FNAME,  --file FNAME        [       ] input WAV file path

bash ./models/download-ggml-model.sh base.en
Downloading ggml model base.en ...
ggml-base.en.bin               100%[========================>] 141.11M  6.34MB/s    in 24s
Done! Model 'base.en' saved in 'models/ggml-base.en.bin'
You can now use it like this:

  $ ./main -m models/ggml-base.en.bin -f samples/jfk.wav


===============================================
Running base.en on all samples in ./samples ...
===============================================

----------------------------------------------
[+] Running base.en on samples/jfk.wav ... (run 'ffplay samples/jfk.wav' to listen)
----------------------------------------------

whisper_init_from_file: loading model from 'models/ggml-base.en.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51864
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 512
whisper_model_load: n_audio_head  = 8
whisper_model_load: n_audio_layer = 6
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 512
whisper_model_load: n_text_head   = 8
whisper_model_load: n_text_layer  = 6
whisper_model_load: n_mels        = 80
whisper_model_load: f16           = 1
whisper_model_load: type          = 2
whisper_model_load: mem required  = 215.00 MB (+ 6.00 MB per decoder)
whisper_model_load: kv self size  =   5.25 MB
whisper_model_load: kv cross size =  17.58 MB
whisper_model_load: adding 1607 extra tokens
whisper_model_load: model ctx     = 140.60 MB
whisper_model_load: model size    = 140.54 MB

system_info: n_threads = 4 / 10 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 |

main: processing 'samples/jfk.wav' (176000 samples, 11.0 sec), 4 threads, 1 processors, lang = en, task = transcribe, timestamps = 1 ...


[00:00:00.000 --> 00:00:11.000]   And so my fellow Americans, ask not what your country can do for you, ask what you can do for your country.


whisper_print_timings:     fallbacks =   0 p /   0 h
whisper_print_timings:     load time =   113.81 ms
whisper_print_timings:      mel time =    15.40 ms
whisper_print_timings:   sample time =    11.58 ms /    27 runs (    0.43 ms per run)
whisper_print_timings:   encode time =   266.60 ms /     1 runs (  266.60 ms per run)
whisper_print_timings:   decode time =    66.11 ms /    27 runs (    2.45 ms per run)
whisper_print_timings:    total time =   476.31 ms
```

The command downloads the base.en model converted to the custom ggml format and runs inference on all the .wav samples in the folder samples.
For detailed usage instructions, run: `./main -h`
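For example, a few of the options listed in the help output above can be combined like this (the input file name here is only a placeholder):

```bash
# transcribe a recording with 8 threads and write an SRT subtitle file
# next to it (my_recording.wav / my_recording are placeholder names)
./main -m models/ggml-base.en.bin -f my_recording.wav -t 8 -osrt -of my_recording
```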
Note that the main example currently runs only with 16-bit WAV files, so make sure to convert your input before running the tool. For example, you can use ffmpeg like this:

```bash
ffmpeg -i input.mp3 -ar 16000 -ac 1 -c:a pcm_s16le output.wav
```

A combined conversion-and-transcription example is shown after the table below.

## Memory usage

| Model  | Disk   | Mem     | SHA                                      |
| ------ | ------ | ------- | ---------------------------------------- |
| tiny   |  75 MB | ~125 MB | bd577a113a864445d4c299885e0cb97d4ba92b5f |
| base   | 142 MB | ~210 MB | 465707469ff3a37a2b9b8d8f89f2f99de7299dac |
| small  | 466 MB | ~600 MB | 55356645c2b361a969dfd0ef2c5a50d530afd8d5 |
| medium | 1.5 GB | ~1.7 GB | fd9727b6e1217c2f614f9b698455c4ffd82463b4 |
| large  | 2.9 GB | ~3.3 GB | 0f4c8e34f21cf1a914c59d8b3ce882345ad349d6 |
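Putting the conversion and transcription steps together, a typical workflow could look like the sketch below (episode.mp3 / episode are placeholder names; the flags used are the ones listed in the help output above):

```bash
# convert any input ffmpeg understands to 16 kHz, mono, 16-bit PCM WAV
ffmpeg -i episode.mp3 -ar 16000 -ac 1 -c:a pcm_s16le episode.wav

# transcribe it and write a plain-text transcript to episode.txt
./main -m models/ggml-base.en.bin -f episode.wav -otxt -of episode
```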