FPGA Speech Recognition

Simple Speech Recognition System using MATLAB and VHDL on Altera DE0. Demo Video here

Introduction

This project is an attempt to develop a simple speech recognition engine on low-end, educational FPGAs like the Altera DE0. It is also a small challenge: to push low-end FPGAs to their limits and tame them into doing advanced work. The system was designed to recognize a digit (1 or 0) spoken into a laptop microphone and then transferred to the FPGA over UART. Both industry and academia have spent considerable effort in this field developing software and hardware to arrive at a robust solution. However, the large number of accents spoken around the world is one reason this problem remains an active area of research.

Speech recognition finds numerous applications, including healthcare, artificial intelligence, human-computer interaction, Interactive Voice Response systems, military, and avionics. Another important application is helping physically challenged people interact with the world more easily.

Theory

Speech recognition systems can be classified into several models by describing the types of utterances to be recognized. These classes take into consideration the ability to determine the instant when the speaker starts and finishes the utterance. In this project I aimed to implement an Isolated Word Recognition system, which usually uses a Hamming window over the word being spoken.

Speech recognition engines are broadly classified into 2 types, namely Pattern Recognition and Acoustic Phonetic systems. While the former uses known/trained patterns to determine a match, the latter uses attributes of the human body to compare speech features (phonetics such as vowel sounds). Pattern recognition systems combine well with current computing techniques and tend to have higher accuracy.

The basic structure of a speech recognition system is as follows:

1. Speech Signal Recording.
2. Spectral Analysis (FFT, Windowing, MFCC, Power Spectrum).
3. Probability Estimation (Neural Networks, Hidden Markov Model, VQ).
4. Signal Decoding and Decision Making.

Audio signals are captured using microphones and recorded in the time domain (i.e. the signal varies with time). The problem with human voice signals is that they are not stationary, and analyzing such signals in the time domain is complicated and computationally costly.

This is where spectral analysis comes in: by applying a set of transformations and processing algorithms to the incoming signal, it is converted into a usable form on which further analysis can be done.

For this I am using:

DFT: The discrete Fourier transform (DFT) converts a finite sequence of equally-spaced samples of a function into an equivalent-length sequence of equally-spaced samples of the discrete-time Fourier transform (DTFT), which is a complex-valued function of frequency.
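The definition above can be written out directly. The following is a minimal sketch in Python (not the project's MATLAB code), using only the standard library; the function name `dft` and the test signal are illustrative assumptions.

```python
import cmath
import math

def dft(samples):
    """Naive O(N^2) discrete Fourier transform of a finite sequence."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

# Illustrative input: exactly one cosine cycle across 8 samples.
signal = [math.cos(2 * math.pi * t / 8) for t in range(8)]
spectrum = dft(signal)

# Magnitude per frequency bin; the energy sits in bins 1 and 7 (the mirror bin).
magnitudes = [abs(x) for x in spectrum]
```

Each output bin is a complex number; its magnitude tells how strongly that frequency is present in the input.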

Hamming Window: Whenever you do a finite Fourier transform, you are implicitly applying it to an infinitely repeating signal. So, if the start and end of the finite sample don't match, that will look just like a discontinuity in the signal, and show up as lots of high-frequency nonsense in the Fourier transform, which you don't want.

And if the sample happens to be a beautiful sinusoid but an integer number of periods doesn't happen to fit exactly into the finite sample, your FT will show appreciable energy in all sorts of places nowhere near the real frequency.

Windowing the data makes sure that the ends match up while keeping everything reasonably smooth; this greatly reduces this sort of "spectral leakage".
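The leakage effect described above can be checked numerically. This is a minimal sketch in Python, assuming the common symmetric Hamming definition w[i] = 0.54 - 0.46*cos(2*pi*i/(N-1)); the frame length, the tone frequency, and the bin used for the comparison are arbitrary choices for illustration.

```python
import cmath
import math

def hamming(n):
    """Symmetric Hamming window: 0.54 - 0.46 * cos(2*pi*i / (n - 1))."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def dft_magnitudes(samples):
    """Magnitudes of the first half of a naive DFT (enough for a real signal)."""
    n = len(samples)
    return [
        abs(sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2)
    ]

N = 64
# 10.5 cycles do not fit the frame exactly, so the frame ends do not match up.
tone = [math.sin(2 * math.pi * 10.5 * t / N) for t in range(N)]
window = hamming(N)
windowed = [s * w for s, w in zip(tone, window)]

raw = dft_magnitudes(tone)
smooth = dft_magnitudes(windowed)

# Energy far from the tone (bin 25), relative to each spectrum's own peak.
raw_leak = raw[25] / max(raw)
windowed_leak = smooth[25] / max(smooth)
```

With the window applied, the relative energy far away from the true frequency is smaller than in the unwindowed spectrum, at the cost of a slightly wider main peak.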

Euclidean Distance: The Euclidean distance or Euclidean metric is the "ordinary" straight-line distance between two points in Euclidean space. With this distance, Euclidean space becomes a metric space. The associated norm is called the Euclidean norm. Older literature refers to the metric as the Pythagorean metric.
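For two feature vectors of equal length, the straight-line distance is the square root of the summed squared differences. A short Python sketch (the function name is an illustrative assumption):

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two points given as equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# The classic 3-4-5 right triangle.
d = euclidean_distance([0, 0], [3, 4])
```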

Hamming Distance: In information theory, the Hamming distance between two strings of equal length is the number of positions at which the corresponding symbols are different. In other words, it measures the minimum number of substitutions required to change one string into the other, or the minimum number of errors that could have transformed one string into the other. In a more general context, the Hamming distance is one of several string metrics for measuring the edit distance between two sequences.
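Counting differing positions takes only a comparison per symbol, with no multiplications. A short Python sketch (the function name and the example strings are illustrative):

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))

d_text = hamming_distance("karolin", "kathrin")
d_bits = hamming_distance("1011101", "1001001")
```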

FFT: The FFT is a fast, O[N log(N)] algorithm to compute the Discrete Fourier Transform (DFT), which naively is an O[N^2] computation. The FFT operates by decomposing an N point time domain signal into N time domain signals each composed of a single point. The second step is to calculate the N frequency spectra corresponding to these N time domain signals. Lastly, the N spectra are synthesized into a single frequency spectrum.
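The decompose-then-recombine idea can be sketched as a recursive radix-2 FFT. This Python version is for illustration only; it assumes the input length is a power of two and is not the routine used by MATLAB's built-in `fft`.

```python
import cmath
import math

def fft(samples):
    """Recursive radix-2 decimation-in-time FFT; length must be a power of two."""
    n = len(samples)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        # A single point is its own spectrum.
        return [complex(samples[0])]
    even = fft(samples[0::2])  # spectrum of the even-indexed samples
    odd = fft(samples[1::2])   # spectrum of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

# A unit impulse has a flat spectrum; one cosine cycle fills bins 1 and N-1.
impulse_spectrum = fft([1, 0, 0, 0, 0, 0, 0, 0])
cosine_spectrum = fft([math.cos(2 * math.pi * t / 8) for t in range(8)])
```

The recursion splits the signal until single points remain, then merges the partial spectra pairwise, which gives the O[N log(N)] cost.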

Implementation

The system was first intended to be developed on the FPGA alone, without external equipment, but that proved impossible because of the limited capabilities of the board I have. So I divided the project into 2 stages: the front-end (signal acquisition and analysis) and the back-end (pattern matching and estimation, decision making, and UI).

Frontend (MATLAB): The frontend is built in MATLAB because its built-in functions make DSP easy. There are 2 programs, one for training and obtaining a mean signal, and the other for real-time operation. The steps done in MATLAB are:

1. Data acquisition using the microphone.
2. Windowing & Fast Fourier Transform.
3. Plotting & data transmission.

Files in the Frontend: [train.m, recorder.m]
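The training program is described as producing a mean signal. One plausible reading, sketched here in Python rather than MATLAB, is an element-wise average over the feature vectors of several recordings of the same word. The contents of train.m are not shown in this document, so the function name and the toy data below are assumptions.

```python
def mean_template(spectra):
    """Element-wise mean of equally long feature vectors (one per recording)."""
    if not spectra:
        raise ValueError("need at least one recording")
    length = len(spectra[0])
    if any(len(s) != length for s in spectra):
        raise ValueError("all recordings must have the same length")
    return [sum(column) / len(spectra) for column in zip(*spectra)]

# Hypothetical magnitude spectra from three recordings of the same digit.
recordings_of_one = [
    [2.0, 8.0, 1.0],
    [4.0, 6.0, 1.0],
    [3.0, 7.0, 4.0],
]
template_one = mean_template(recordings_of_one)
```

Averaging smooths out variation between recordings, so the stored template reflects what the recordings have in common.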

Backend (Altera DE0): Because the Altera DE0 lacks an ADC, I transmit the data from the computer's microphone through a USB-to-TTL module over the UART protocol. The received data, 1000 samples long, is then compared with the saved vectors from the MATLAB training. The Euclidean distances are calculated, and the vector more likely to be the right one is given a bigger weight. The weights are then compared and the final result is displayed on the 7-segment displays and LEDs.
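The comparison against stored vectors amounts to nearest-template classification. Below is a minimal Python sketch of that idea, not the VHDL implementation; the template values are made up, and the squared distance is used because it ranks candidates the same way as the true Euclidean distance while avoiding a square root.

```python
def squared_distance(a, b):
    """Sum of squared differences; preserves the ordering of Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def classify(sample, templates):
    """Return the label whose stored template is closest to the sample."""
    return min(templates, key=lambda label: squared_distance(sample, templates[label]))

# Hypothetical 4-point templates; the real system uses 1000-sample vectors.
templates = {"0": [10, 40, 5, 0], "1": [0, 5, 35, 20]}
received = [1, 8, 30, 18]
digit = classify(received, templates)
```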

The backend was modelled as a Moore Finite State Machine with 4 states: (Receiving, Calculating Distance, Decision Making, Displaying Results).

Files in the Backend: [Voice_Recognition.vhd, uart_tx.vhd, uart_rx.vhd, uart_parity.vhd, uart.vhd]
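The four-state Moore machine can be modelled in a few lines of Python to show the control flow. Only the state names come from the description above; the transition conditions (`frame_complete`, `distances_ready`) and the output names are assumptions, since the VHDL source is not reproduced here.

```python
from enum import Enum, auto

class State(Enum):
    RECEIVING = auto()
    CALCULATING_DISTANCE = auto()
    DECISION_MAKING = auto()
    DISPLAYING_RESULTS = auto()

# Moore machine: the output depends on the current state only (names are hypothetical).
OUTPUT = {
    State.RECEIVING: "uart_rx_enabled",
    State.CALCULATING_DISTANCE: "distance_unit_busy",
    State.DECISION_MAKING: "comparing_weights",
    State.DISPLAYING_RESULTS: "seven_segment_updated",
}

def next_state(state, frame_complete, distances_ready):
    """Assumed transition function; each state advances when its work is done."""
    if state is State.RECEIVING:
        return State.CALCULATING_DISTANCE if frame_complete else state
    if state is State.CALCULATING_DISTANCE:
        return State.DECISION_MAKING if distances_ready else state
    if state is State.DECISION_MAKING:
        return State.DISPLAYING_RESULTS
    return State.RECEIVING  # after displaying, wait for the next word

# Drive the machine with one (frame_complete, distances_ready) pair per clock tick.
inputs = [(False, False), (True, False), (False, False),
          (False, True), (False, False), (False, False)]
state = State.RECEIVING
trace = []
for frame_complete, distances_ready in inputs:
    state = next_state(state, frame_complete, distances_ready)
    trace.append(state)
```

Because the outputs are tied to states rather than to inputs, the machine cycles through a full recognition pass without any user interaction once a frame arrives.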

Design Choices and Workarounds

Euclidean Distance Calculation: Calculating the Euclidean distance for 1000-point vectors is very expensive to do directly on the FPGA using for loops, so I used a little trick and calculated the weights of the vectors indirectly, by counting only the positions where the distance equals zero. This approach is similar in spirit to K-nearest neighbour in machine learning. In other words, we are really calculating the Hamming distance inversely.
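The counting trick can be sketched as follows in Python. The vectors here are short and made up; the sketch also assumes the samples are coarsely quantized integers, since exact equality between positions would rarely occur otherwise.

```python
def match_count(sample, template):
    """Number of positions where the per-point distance is zero."""
    return sum(1 for s, t in zip(sample, template) if s == t)

def classify_by_matches(sample, templates):
    """Weight each template by its match count and pick the heaviest."""
    scores = {label: match_count(sample, vec) for label, vec in templates.items()}
    return max(scores, key=scores.get), scores

# Hypothetical quantized vectors; the real system compares 1000-sample vectors.
templates = {"0": [3, 7, 7, 1, 0, 2], "1": [0, 1, 6, 6, 2, 2]}
sample = [0, 1, 7, 6, 2, 3]
digit, scores = classify_by_matches(sample, templates)

# The match count is the vector length minus the Hamming distance.
hamming_to_one = len(sample) - scores["1"]
```

Each comparison is a single equality test plus a counter increment, which maps to far less logic than a subtract, square, and accumulate per point.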

FFT Points Discarding: Because not all frequencies are relevant, I kept only 1000 points and discarded the rest of the signal. Also, when taking the FFT, I discarded half of the output because of its symmetry.
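The symmetry that justifies dropping half of the output holds for any real-valued input: bin N-k is the complex conjugate of bin k, so the magnitudes mirror. A small Python check, using a naive DFT and an arbitrary test signal (both illustrative assumptions):

```python
import cmath
import math

def dft(samples):
    """Naive discrete Fourier transform, sufficient for a short demonstration."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

signal = [0.0, 1.0, 3.0, -2.0, 0.5, 4.0, -1.0, 2.0]  # real-valued samples
spectrum = dft(signal)
n = len(signal)

# For real input, X[n - k] equals the conjugate of X[k].
mirror_error = max(abs(spectrum[k] - spectrum[n - k].conjugate()) for k in range(1, n))

# Keep the first half of the magnitudes; the second half carries no new information.
half = [abs(x) for x in spectrum[: n // 2]]
```

Halving the spectrum also halves the data sent over UART and the number of points the FPGA has to compare.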

Moore FSM: The design was built as a Moore machine to make recognition automatic, to reduce user interaction with the system, and to reduce complexity.

UART Module: UART was used for transmitting data because of the limitations of the FPGA board, the simplicity of implementation, and the availability of conversion modules on the market.
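Part of UART's simplicity is its framing: each byte travels as a start bit, the data bits (least significant first), an optional parity bit, and a stop bit. The Python sketch below assumes 8 data bits, even parity, and one stop bit; the actual settings of the project's UART modules are not stated in this document.

```python
def uart_frame(byte, parity="even"):
    """Bit sequence for one byte: start bit, 8 data bits LSB first, parity, stop bit."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte must be in the range 0..255")
    data = [(byte >> i) & 1 for i in range(8)]  # least significant bit first
    ones = sum(data)
    if parity == "even":
        parity_bit = ones % 2        # makes the total number of ones even
    elif parity == "odd":
        parity_bit = 1 - ones % 2    # makes the total number of ones odd
    else:
        raise ValueError("parity must be 'even' or 'odd'")
    return [0] + data + [parity_bit] + [1]  # start bit is 0, stop bit is 1

frame = uart_frame(0x41)
```

With these assumed settings every sample byte costs 11 bit times on the line, which bounds how quickly a 1000-sample frame can reach the FPGA at a given baud rate.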

Results

- RAM consumption of around 380 MB on Ubuntu 16.04 LTS for the frontend.
- Logic element consumption is 13,757 LE.
- Consumes 9144 registers and 10,450 logic functions.
- Uses 46 pins for the UI and data interface.
- Accuracy of 90% for the same speaker; decreases when the speaker changes.
- Can detect 2 numbers (one and zero).

Conclusion

It was shown here that it is possible to implement a basic speech recognition system on the Altera DE0, and that the limited capabilities of the hardware can be overcome with software workarounds.

The system successfully recognizes two digits (1 and 0) with high accuracy for the same speaker. It is speaker dependent to a great extent because of the small number of testing samples. This can be improved by building a bigger dataset from various speakers. Calculating and comparing MFCCs along with the FFT should also make the application more effective and more accurate.

More powerful hardware would allow me to implement more robust algorithms like Hidden Markov Models, and to use more capable ADC chips to record cleaner sound, resulting in more accurate results.
