FPGA-Speech-Recognition Open-Source Project

Anonymous user, November 23, 2021
Category: Application Tools, Scientific Computing Tools
License: GPL-3.0 License

Project Details

FPGA Speech Recognition

Simple Speech Recognition System using MATLAB and VHDL on Altera DE0. Demo Video here

Introduction

This project is a trial to develop a simple speech recognition engine on low-end and educational FPGAs like the Altera DE0. It is also a simple challenge to exhaust the limits of low-end FPGAs and tame them to do advanced stuff. The system was designed to recognize the digit (1 or 0) being spoken into the microphone of a laptop and then transferred to the FPGA over UART. Both industry and academia have spent considerable effort in this field developing software and hardware to come up with a robust solution. However, because of the large number of accents spoken around the world, this conundrum still remains an active area of research.

Speech recognition finds numerous applications, including health care, artificial intelligence, human-computer interaction, Interactive Voice Response systems, military, avionics, etc. Another very important application is helping physically challenged people interact with the world in a better way.

Theory

Speech recognition systems can be classified into several models by describing the types of utterances to be recognized. These classes take into consideration the ability to determine the instant when the speaker starts and finishes the utterance. In this project I aimed to implement an Isolated Word Recognition System, which usually uses a Hamming window over the word being spoken.

Speech recognition engines are broadly classified into 2 types, namely Pattern Recognition and Acoustic Phonetic systems. While the former use known/trained patterns to determine a match, the latter use attributes of the human body to compare speech features (phonetics such as vowel sounds). Pattern recognition systems combine with current computing techniques and tend to have higher accuracy.

The basic structure of a speech recognition system goes as follows:

1. Speech Signal Recording.
2. Spectral Analysis (FFT, Windowing, MFCC, Power Spectrum).
3. Probability Estimation (Neural Networks, Hidden Markov Model, VQ).
4. Signal Decoding and Decision Making.

Audio signals are captured using microphones and recorded in the time domain (i.e. the signal varies with time). The problem with human voice signals is that they are not stationary, and the analysis of such signals in the time domain is a very complicated and computationally costly problem.

Here comes the role of spectral analysis: by applying a set of transformations and processing algorithms to the incoming signal, it is converted into a usable form on which further analysis can be done.

For this I am using:

DFT: The discrete Fourier transform (DFT) converts a finite sequence of equally-spaced samples of a function into an equivalent-length sequence of equally-spaced samples of the discrete-time Fourier transform (DTFT), which is a complex-valued function of frequency.
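To make the definition concrete, here is a minimal sketch of a naive DFT in Python (chosen for illustration; the project itself uses MATLAB built-ins). The function name `dft` and the test signal are illustrative assumptions, not code from the repository.

```python
import cmath

def dft(samples):
    """Naive O(N^2) discrete Fourier transform of a finite sequence."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

# A constant signal has all of its energy in bin 0 (the DC bin).
spectrum = dft([1.0, 1.0, 1.0, 1.0])
magnitudes = [round(abs(x), 6) for x in spectrum]
```

The output has the same length as the input, and each entry is a complex number whose magnitude gives the strength of that frequency bin.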

Hamming Window: Whenever you do a finite Fourier transform, you are implicitly applying it to an infinitely repeating signal. So, if the start and end of the finite sample don't match, that will look just like a discontinuity in the signal and show up as lots of high-frequency nonsense in the Fourier transform, which you don't want.

And if the sample happens to be a beautiful sinusoid but an integer number of periods doesn't happen to fit exactly into the finite sample, your FT will show appreciable energy in all sorts of places nowhere near the real frequency.

Windowing the data makes sure that the ends match up while keeping everything reasonably smooth; this greatly reduces this sort of "spectral leakage".
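As a rough illustration of how a window tapers the ends of a frame, the sketch below builds Hamming coefficients with the common symmetric formula w[i] = 0.54 - 0.46*cos(2*pi*i/(N-1)). The helper names `hamming` and `apply_window` are assumptions made for this example; MATLAB provides its own `hamming` function.

```python
import math

def hamming(n):
    """Symmetric Hamming window coefficients for a frame of n > 1 samples."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def apply_window(frame):
    """Multiply each sample by the matching window coefficient."""
    return [x * w for x, w in zip(frame, hamming(len(frame)))]

# A frame of constant amplitude: after windowing, both ends are pulled
# down to the same small value, so the implied periodic extension is smooth.
windowed = apply_window([1.0] * 8)
```

Both ends of `windowed` end up at 0.08 of the original amplitude, while the middle of the frame is left almost untouched.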

Euclidean Distance: The Euclidean distance or Euclidean metric is the "ordinary" straight-line distance between two points in Euclidean space. With this distance, Euclidean space becomes a metric space. The associated norm is called the Euclidean norm. Older literature refers to the metric as the Pythagorean metric.
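For vectors of equal length, the definition reduces to the square root of the summed squared differences. A minimal sketch follows; the function name is an assumption for illustration.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

d = euclidean_distance([0, 0], [3, 4])  # the classic 3-4-5 triangle
```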

Hamming Distance: In information theory, the Hamming distance between two strings of equal length is the number of positions at which the corresponding symbols are different. In other words, it measures the minimum number of substitutions required to change one string into the other, or the minimum number of errors that could have transformed one string into the other. In a more general context, the Hamming distance is one of several string metrics for measuring the edit distance between two sequences.
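The definition translates almost directly into code. This is a small sketch with an assumed function name; it works on any two equal-length sequences, not only strings.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(1 for x, y in zip(a, b) if x != y)

d = hamming_distance("karolin", "kathrin")  # differs at positions 2, 3 and 4
```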

FFT: The FFT is a fast, O[N log(N)] algorithm to compute the Discrete Fourier Transform (DFT), which naively is an O[N^2] computation. The FFT operates by decomposing an N-point time domain signal into N time domain signals, each composed of a single point. The second step is to calculate the N frequency spectra corresponding to these N time domain signals. Lastly, the N spectra are synthesized into a single frequency spectrum.
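The decompose-then-synthesize idea can be sketched as a recursive radix-2 Cooley-Tukey FFT. This is one common FFT variant, shown here as an illustration only; it assumes the input length is a power of two, and it is not the implementation MATLAB uses internally.

```python
import cmath

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]  # a single point is its own spectrum
    even = fft(x[0::2])  # spectrum of the even-indexed samples
    odd = fft(x[1::2])   # spectrum of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle            # synthesize lower half
        out[k + n // 2] = even[k] - twiddle   # synthesize upper half
    return out

# A unit impulse has a flat spectrum: every bin equals 1.
spectrum = fft([1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
```

Each level of recursion halves the problem, which is where the log(N) factor in the running time comes from.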

Implementation

The system was first intended to be developed on the FPGA only, without external equipment, but that was impossible due to the limited capabilities of the board I have. So I divided the project into 2 stages: the front-end (signal acquisition and analysis) and the back-end (pattern matching and estimation, decision making, and UI).

Frontend (MATLAB): The frontend is built in MATLAB due to the ease of doing DSP with its built-in functions. There are 2 programs, one for training and obtaining a mean signal and the other for real-time operation. The steps done in MATLAB are:

1. Data Acquisition using microphone.
2. Windowing & Fast Fourier Transform.
3. Plotting & Data Transmission.
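The steps above could be sketched roughly as follows, in Python rather than MATLAB. Everything specific here is an assumption for illustration: the function name `frontend_features`, the 8-bit quantization, and the normalization by the peak magnitude are not taken from train.m or recorder.m. Acquisition and plotting are left out, and a synthetic sine stands in for the microphone frame.

```python
import cmath
import math

def frontend_features(frame, keep):
    """Window a frame, take DFT magnitudes, keep the first `keep` bins,
    and quantize them to bytes ready for serial transmission."""
    n = len(frame)
    window = [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]
    windowed = [x * w for x, w in zip(frame, window)]
    mags = []
    for k in range(keep):
        acc = sum(windowed[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                  for t in range(n))
        mags.append(abs(acc))
    peak = max(mags) or 1.0  # avoid dividing by zero on silence
    # Assumed encoding: scale so the strongest bin maps to 255.
    return bytes(round(255 * m / peak) for m in mags)

# Synthetic stand-in for a recorded frame: 4 cycles of a sine in 32 samples.
frame = [math.sin(2 * math.pi * 4 * t / 32) for t in range(32)]
payload = frontend_features(frame, keep=16)
```

The resulting `payload` is a byte string with one byte per retained frequency bin, which is a convenient unit for sending over a serial link.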

Files in the Frontend: [train.m, recorder.m]

Backend (Altera DE0): Due to the lack of an ADC on the Altera DE0, I'm transmitting the data from the computer's microphone through a USB-to-TTL module over the UART protocol. The received data, 1000 samples long, are then compared with the saved vectors from the training in MATLAB. The Euclidean distances are calculated, and the vector with the higher probability of being the right one is given a bigger weight. The weights are then compared, and the final results are displayed on the 7-segment displays and LEDs.
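The comparison against stored training vectors amounts to nearest-template matching. The sketch below shows that idea in Python with tiny made-up vectors; the function name, the template values, and the use of the squared distance (which avoids a square root and gives the same ranking) are assumptions for illustration and do not reproduce the VHDL design.

```python
def classify(received, templates):
    """Return the label of the template closest to the received vector."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(templates, key=lambda label: squared_distance(received, templates[label]))

# Hypothetical 4-point templates standing in for the 1000-point trained vectors.
templates = {"0": [10, 200, 30, 5], "1": [180, 20, 90, 60]}
result = classify([12, 190, 35, 0], templates)
```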

The backend was modelled as a Moore Finite State Machine with 4 states: (Receiving, Calculating Distance, Decision Making, Displaying Results).

Files in the Backend: [Voice_Recognition.vhd, uart_tx.vhd, uart_rx.vhd, uart_parity.vhd, uart.vhd]
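A Moore machine is one whose outputs depend only on the current state. The four states named above can be modelled as in the sketch below. The state names come from the text; the output names, the `stage_done` flag, and the return to Receiving after display are hypothetical choices made for this example, not details read from Voice_Recognition.vhd.

```python
from enum import Enum

class State(Enum):
    RECEIVING = "Receiving"
    CALCULATING_DISTANCE = "Calculating Distance"
    DECISION_MAKING = "Decision Making"
    DISPLAYING_RESULTS = "Displaying Results"

# Moore property: one fixed output per state (placeholder names).
OUTPUTS = {
    State.RECEIVING: "uart_rx_active",
    State.CALCULATING_DISTANCE: "compare_active",
    State.DECISION_MAKING: "decide_active",
    State.DISPLAYING_RESULTS: "display_active",
}

ORDER = list(State)  # Enum members iterate in definition order

def next_state(state, stage_done):
    """Stay put until the current stage reports completion (assumed flag)."""
    if not stage_done:
        return state
    return ORDER[(ORDER.index(state) + 1) % len(ORDER)]

state = State.RECEIVING
trace = [OUTPUTS[state]]
for done in (False, True, True, True, True):
    state = next_state(state, done)
    trace.append(OUTPUTS[state])
```

Because the output is looked up from the state alone, the machine needs no user input to move through a full recognition cycle.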

Design Choices and Workarounds

Euclidean Distance Calculation: Calculating the Euclidean distance for a 1000-point vector is very expensive to do on the FPGA directly using for loops, so I did a little trick and calculated the weights of the vectors indirectly, by only counting the cases where the distance equals zero. This approach is similar to using K-nearest neighbour in machine learning. In other words, we are really calculating the Hamming distance inversely.
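Counting zero-distance positions is the same as counting exact matches, which equals the vector length minus the Hamming distance. A minimal sketch of that weighting, with assumed function names and made-up vectors, might look like this:

```python
def match_count(received, template):
    """Weight of a template: number of positions with zero difference."""
    return sum(1 for r, t in zip(received, template) if r == t)

def classify_by_matches(received, templates):
    """Pick the template with the largest weight (most exact matches)."""
    return max(templates, key=lambda label: match_count(received, templates[label]))

# Hypothetical quantized vectors standing in for the 1000-point data.
templates = {"0": [3, 7, 7, 1, 0, 2], "1": [5, 5, 2, 1, 4, 4]}
received = [3, 7, 6, 1, 0, 4]
weight_zero = match_count(received, templates["0"])
weight_one = match_count(received, templates["1"])
label = classify_by_matches(received, templates)
```

This needs only an equality comparator and a counter per template, with no multipliers or square roots, which is presumably what makes it cheap in logic.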

FFT Points Discarding: Since not all frequencies are relevant, I only took 1000 points and discarded the rest of the signal. Also, while taking the FFT I discarded half of the output points due to the symmetry of the output.
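The symmetry referred to here is the conjugate symmetry of the DFT of a real-valued signal. As a short derivation, with N the frame length, x[n] the real samples, and X[k] the DFT bins (notation introduced for this note):

```latex
\[
\begin{aligned}
X[N-k] &= \sum_{n=0}^{N-1} x[n]\, e^{-2\pi i (N-k) n / N}
        = \sum_{n=0}^{N-1} x[n]\, e^{-2\pi i n}\, e^{2\pi i k n / N} \\
       &= \sum_{n=0}^{N-1} x[n]\, e^{2\pi i k n / N}
        = \overline{X[k]}
\quad\Longrightarrow\quad |X[N-k]| = |X[k]|
\end{aligned}
\]
```

Since e^{-2 pi i n} = 1 for every integer n, the upper half of the magnitude spectrum mirrors the lower half, so dropping it loses no magnitude information.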

Moore FSM: The design was made as a Moore machine for automatic recognition and to decrease user interaction with the system, and also to reduce complexity.

UART Module: UART was used for transmitting data due to the limitations of the FPGA board, the simplicity of implementation, and the availability of conversion modules on the market.
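Part of UART's simplicity is its framing: each byte travels as a start bit, the data bits (least significant first), an optional parity bit, and a stop bit. The sketch below builds such a frame in Python. The 8-data-bit format and the even-parity default are assumptions for illustration; the text does not state which settings uart.vhd and uart_parity.vhd actually use.

```python
def uart_frame(byte, parity="even"):
    """Bit sequence for one UART frame: start, 8 data bits LSB first,
    parity, stop. Frame format is assumed, not read from the VHDL."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte must be in 0..255")
    if parity not in ("even", "odd"):
        raise ValueError("parity must be 'even' or 'odd'")
    data = [(byte >> i) & 1 for i in range(8)]  # LSB first
    ones = sum(data)
    # Even parity makes the total number of 1s (data + parity) even.
    parity_bit = ones % 2 if parity == "even" else 1 - ones % 2
    return [0] + data + [parity_bit] + [1]  # start bit is 0, stop bit is 1

frame = uart_frame(0x41)  # 0x41 = 0b01000001
```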

Results

- RAM consumption is around 380 MB on Ubuntu 16.04 LTS for the frontend.
- Logic element consumption is 13,757 LEs.
- Consumes 9,144 registers and 10,450 logic functions.
- Uses 46 pins for the UI and data interface.
- Accuracy is 90% for the same speaker and decreases when the speaker changes.
- Can detect 2 numbers (one and zero).

Conclusion

It was shown here that it is possible to implement a basic speech recognition system on the Altera DE0, and that it's possible to overcome the limited capabilities of the hardware with many software workarounds.

The system is able to successfully recognize two digits (1 and 0) with high accuracy for the same speaker. The system is speaker-dependent to a great extent due to the low number of testing samples. This can be improved by building a bigger dataset from various speakers. Also, calculating and comparing MFCCs along with the FFT should make the application more effective and more accurate.

The availability of more powerful hardware will allow me to implement more robust algorithms like Hidden Markov Models and to use more capable ADC chips to record sound more cleanly, resulting in more accurate results.
