This repository contains the lecture slides and course description for the Deep Natural Language Processing course offered in Hilary Term 2017 at the University of Oxford.
This is an advanced course on natural language processing. Automatically processing natural language inputs and producing language outputs is a key component of Artificial General Intelligence. The ambiguities and noise inherent in human communication render traditional symbolic AI techniques ineffective for representing and analysing language data. Recently, statistical techniques based on neural networks have achieved a number of remarkable successes in natural language processing, leading to a great deal of commercial and academic interest in the field.
This is an applied course focussing on recent advances in analysing and generating speech and text using recurrent neural networks. We introduce the mathematical definitions of the relevant machine learning models and derive their associated optimisation algorithms. The course covers a range of applications of neural networks in NLP including analysing latent dimensions in text, transcribing speech to text, translating between languages, and answering questions. These topics are organised into three high level themes forming a progression from understanding the use of neural networks for sequential language modelling, to understanding their use as conditional language models for transduction tasks, and finally to approaches employing these techniques in combination with other mechanisms for advanced applications. Throughout the course the practical implementation of such models on CPU and GPU hardware is also discussed.
This course is organised by Phil Blunsom and delivered in partnership with the DeepMind Natural Language Research Group.
## Lecturers
* Phil Blunsom (Oxford University and DeepMind)
* Chris Dyer (Carnegie Mellon University and DeepMind)
* Edward Grefenstette (DeepMind)
* Karl Moritz Hermann (DeepMind)
* Andrew Senior (DeepMind)
* Wang Ling (DeepMind)
* Jeremy Appleyard (NVIDIA)

## TAs
* Yannis Assael
* Yishu Miao
* Brendan Shillingford
* Jan Buys

## Timetable

### Practicals
* Group 1 - Monday, 9:00-11:00 (Weeks 2-8), 60.05 Thom Building
* Group 2 - Friday, 16:00-18:00 (Weeks 2-8), Room 379

1. Practical 1: word2vec
2. Practical 2: text classification
3. Practical 3: recurrent neural networks for text classification and language modelling
4. Practical 4: open practical

### Lectures
Public Lectures are held in Lecture Theatre 1 of the Maths Institute, on Tuesdays and Thursdays (except week 8), 16:00-18:00 (Hilary Term Weeks 1, 3-8).
## Lecture Materials

### 1. Lecture 1a - Introduction [Phil Blunsom]
This lecture introduces the course and motivates why it is interesting to study language processing using Deep Learning techniques.
[slides] [video]
### 2. Lecture 1b - Deep Neural Networks Are Our Friends [Wang Ling]
This lecture revises basic machine learning concepts that students should know before embarking on this course.
[slides] [video]
### 3. Lecture 2a - Word Level Semantics [Ed Grefenstette]
Words are the core meaning-bearing units in language. Representing and learning the meanings of words is a fundamental task in NLP, and in this lecture the concept of a word embedding is introduced as a practical and scalable solution.
[slides] [video]
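For a concrete picture of the embedding idea (and of the word2vec model used in Practical 1), below is a minimal sketch of skip-gram training with negative sampling in NumPy. The toy corpus, embedding dimension, and hyperparameters are illustrative assumptions, not values specified by the course.

```python
import numpy as np

# Skip-gram with negative sampling, as a toy illustration of learning
# word embeddings from co-occurrence. All sizes here are assumptions.
rng = np.random.default_rng(0)
corpus = "the cat sat on the mat the dog sat on the rug".split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}
V, D = len(vocab), 8                 # vocabulary size, embedding dimension

W_in = rng.normal(0, 0.1, (V, D))    # target-word embeddings
W_out = rng.normal(0, 0.1, (V, D))   # context-word embeddings

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

lr, window, k = 0.05, 2, 3           # learning rate, context window, negatives
for _ in range(200):
    for t, word in enumerate(corpus):
        for c in range(max(0, t - window), min(len(corpus), t + window + 1)):
            if c == t:
                continue
            # one observed (positive) pair plus k random negative samples
            pairs = [(idx[corpus[c]], 1.0)] + [(int(rng.integers(V)), 0.0) for _ in range(k)]
            for out, label in pairs:
                score = sigmoid(W_in[idx[word]] @ W_out[out])
                grad = score - label              # logistic-loss gradient
                g_in = grad * W_out[out]          # cache before updating W_out
                W_out[out] -= lr * grad * W_in[idx[word]]
                W_in[idx[word]] -= lr * g_in

# Rows of W_in are now the learned word vectors; similar words end up nearby.
```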
#### Reading
**Embeddings Basics**
* Firth, John R. "A synopsis of linguistic theory, 1930-1955." (1957): 1-32.
* Curran, James Richard. "From distributional to semantic similarity." (2004).
* Collobert, Ronan, et al. "Natural language processing (almost) from scratch." Journal of Machine Learning Research 12.Aug (2011): 2493-2537.
* Mikolov, Tomas, et al. "Distributed representations of words and phrases and their compositionality." Advances in Neural Information Processing Systems. 2013.

**Datasets and Visualisation**
* Finkelstein, Lev, et al. "Placing search in context: The concept revisited." Proceedings of the 10th International Conference on World Wide Web. ACM, 2001.
* Hill, Felix, Roi Reichart, and Anna Korhonen. "SimLex-999: Evaluating semantic models with (genuine) similarity estimation." Computational Linguistics (2016).
* Maaten, Laurens van der, and Geoffrey Hinton. "Visualizing data using t-SNE." Journal of Machine Learning Research 9.Nov (2008): 2579-2605.

**Blog posts**
* Deep Learning, NLP, and Representations, Christopher Olah.
* Visualizing Top Tweeps with t-SNE, in Javascript, Andrej Karpathy.

**Further Reading**
* Hermann, Karl Moritz, and Phil Blunsom. "Multilingual models for compositional distributed semantics." arXiv preprint arXiv:1404.4641 (2014).
* Levy, Omer, and Yoav Goldberg. "Neural word embedding as implicit matrix factorization." Advances in Neural Information Processing Systems. 2014.
* Levy, Omer, Yoav Goldberg, and Ido Dagan. "Improving distributional similarity with lessons learned from word embeddings." Transactions of the Association for Computational Linguistics 3 (2015): 211-225.
* Ling, Wang, et al. "Two/Too Simple Adaptations of Word2Vec for Syntax Problems." HLT-NAACL. 2015.

### 4. Lecture 2b - Overview of the Practicals [Chris Dyer]
This lecture motivates the practical segment of the course.
[slides] [video]
### 5. Lecture 3 - Language Modelling and RNNs Part 1 [Phil Blunsom]
Language modelling is an important task of great practical use in many NLP applications. This lecture introduces language modelling, including traditional n-gram based approaches and more contemporary neural approaches. In particular the popular Recurrent Neural Network (RNN) language model is introduced and its basic training and evaluation algorithms described.
[slides] [video]
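To make the model concrete, here is a minimal sketch of a single step of a vanilla (Elman-style) RNN language model in NumPy, followed by a toy perplexity computation. The vocabulary and hidden sizes are illustrative assumptions.

```python
import numpy as np

# One step of an RNN language model:
#   h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h),  p(w_{t+1}) = softmax(W_hy h_t + b_y)
rng = np.random.default_rng(0)
V, H = 10, 16                        # vocabulary and hidden sizes (assumed)
W_xh = rng.normal(0, 0.1, (H, V))    # input-to-hidden weights
W_hh = rng.normal(0, 0.1, (H, H))    # recurrent weights
W_hy = rng.normal(0, 0.1, (V, H))    # hidden-to-output weights
b_h, b_y = np.zeros(H), np.zeros(V)

def step(h_prev, token):
    x = np.zeros(V); x[token] = 1.0                # one-hot input
    h = np.tanh(W_xh @ x + W_hh @ h_prev + b_h)    # recurrent state update
    logits = W_hy @ h + b_y
    p = np.exp(logits - logits.max())
    return h, p / p.sum()                          # next-token distribution

# Perplexity of a toy token sequence under the (untrained) model.
tokens = [1, 4, 2, 7, 3]
h, nll = np.zeros(H), 0.0
for prev, nxt in zip(tokens, tokens[1:]):
    h, probs = step(h, prev)
    nll -= np.log(probs[nxt])
print("perplexity:", np.exp(nll / (len(tokens) - 1)))
```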
#### Reading
**Textbook**
* Deep Learning, Chapter 10.

**Blogs**
* The Unreasonable Effectiveness of Recurrent Neural Networks, Andrej Karpathy.
* The unreasonable effectiveness of Character-level Language Models, Yoav Goldberg.
* Explaining and illustrating orthogonal initialization for recurrent neural networks, Stephen Merity.

### 6. Lecture 4 - Language Modelling and RNNs Part 2 [Phil Blunsom]
This lecture continues on from the previous one and considers some of the issues involved in producing an effective implementation of an RNN language model. The vanishing and exploding gradient problem is described and architectural solutions, such as Long Short-Term Memory (LSTM), are introduced.
[slides] [video]
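On the practical side, a standard remedy for the exploding-gradient half of the problem (analysed in the Pascanu et al. paper below) is to rescale gradients whose global norm exceeds a threshold. A minimal sketch, with an assumed threshold of 5.0:

```python
import numpy as np

def clip_gradient_norm(grads, max_norm=5.0):
    """Rescale a list of gradient arrays so that their combined L2 norm
    does not exceed max_norm. The threshold value is an assumption."""
    total_norm = np.sqrt(sum(np.sum(g * g) for g in grads))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        grads = [g * scale for g in grads]
    return grads

# A deliberately huge gradient is scaled back to the threshold norm.
g = [np.full((3, 3), 100.0)]
print(np.linalg.norm(clip_gradient_norm(g)[0]))  # -> 5.0
```

Note that clipping addresses exploding gradients only; the vanishing-gradient problem is what motivates the gated architectures (LSTMs) covered in the lecture.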
#### Reading
**Textbook**
* Deep Learning, Chapter 10.

**Vanishing gradients, LSTMs etc.**
* On the difficulty of training recurrent neural networks. Pascanu et al., ICML 2013.
* Long Short-Term Memory. Hochreiter and Schmidhuber, Neural Computation 1997.
* Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation. Cho et al., EMNLP 2014.
* Blog: Understanding LSTM Networks, Christopher Olah.

**Dealing with large vocabularies**
* A scalable hierarchical distributed language model. Mnih and Hinton, NIPS 2009.
* A fast and simple algorithm for training neural probabilistic language models. Mnih and Teh, ICML 2012.
* On Using Very Large Target Vocabulary for Neural Machine Translation. Jean et al., ACL 2015.
* Exploring the Limits of Language Modeling. Jozefowicz et al., arXiv 2016.
* Efficient softmax approximation for GPUs. Grave et al., arXiv 2016.
* Notes on Noise Contrastive Estimation and Negative Sampling. Dyer, arXiv 2014.
* Pragmatic Neural Language Modelling in Machine Translation. Baltescu and Blunsom, NAACL 2015.

**Regularisation and dropout**
* A Theoretically Grounded Application of Dropout in Recurrent Neural Networks. Gal and Ghahramani, NIPS 2016.
* Blog: Uncertainty in Deep Learning, Yarin Gal.

**Other stuff**
* Recurrent Highway Networks. Zilly et al., arXiv 2016.
* Capacity and Trainability in Recurrent Neural Networks. Collins et al., arXiv 2016.

### 7. Lecture 5 - Text Classification [Karl Moritz Hermann]
This lecture discusses text classification, beginning with basic classifiers, such as Naive Bayes, and progressing through to RNNs and Convolutional Networks.
[slides] [video]
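As a baseline for the neural models, the lecture's starting point, Naive Bayes, fits in a few lines. Below is a minimal multinomial Naive Bayes sketch with add-one smoothing; the toy documents and labels are made up for illustration.

```python
from collections import Counter
import math

# Multinomial Naive Bayes with add-one (Laplace) smoothing on a toy corpus.
docs = [("spam", "buy cheap pills now"), ("spam", "cheap pills cheap offer"),
        ("ham", "meeting notes attached"), ("ham", "lunch meeting tomorrow")]

class_counts = Counter(label for label, _ in docs)
word_counts = {c: Counter() for c in class_counts}
for label, text in docs:
    word_counts[label].update(text.split())
vocab = {w for c in word_counts for w in word_counts[c]}

def predict(text):
    best, best_lp = None, -math.inf
    for c in class_counts:
        lp = math.log(class_counts[c] / len(docs))   # log prior P(c)
        total = sum(word_counts[c].values())
        for w in text.split():
            # add-one smoothed log likelihood log P(w | c)
            lp += math.log((word_counts[c][w] + 1) / (total + len(vocab)))
        if lp > best_lp:
            best, best_lp = c, lp
    return best

print(predict("cheap pills offer"))      # -> spam
print(predict("notes for the meeting"))  # -> ham
```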
#### Reading
* Recurrent Convolutional Neural Networks for Text Classification. Lai et al., AAAI 2015.
* A Convolutional Neural Network for Modelling Sentences. Kalchbrenner et al., ACL 2014.
* Semantic compositionality through recursive matrix-vector spaces. Socher et al., EMNLP 2012.
* Blog: Understanding Convolutional Neural Networks for NLP, Denny Britz.
* Thesis: Distributional Representations for Compositional Semantics. Hermann (2014).

### 8. Lecture 6 - Deep NLP on Nvidia GPUs [Jeremy Appleyard]
This lecture introduces Graphics Processing Units (GPUs) as an alternative to CPUs for executing Deep Learning algorithms. The strengths and weaknesses of GPUs are discussed, as well as the importance of understanding how memory bandwidth and computation impact throughput for RNNs.
[slides] [video]
#### Reading
* Optimizing Performance of Recurrent Neural Networks on GPUs. Appleyard et al., arXiv 2016.
* Persistent RNNs: Stashing Recurrent Weights On-Chip. Diamos et al., ICML 2016.
* Efficient softmax approximation for GPUs. Grave et al., arXiv 2016.

### 9. Lecture 7 - Conditional Language Models [Chris Dyer]
In this lecture we extend the concept of language modelling to incorporate prior information. By conditioning an RNN language model on an input representation we can generate contextually relevant language. This very general idea can be applied to transduce sequences into new sequences for tasks such as translation and summarisation, or images into captions describing their content.
[slides] [video]
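The simplest realisation of this idea, used by the encoder-decoder models in the reading below, is to summarise the input with one RNN and use the resulting vector to initialise the state of the decoder RNN. A minimal sketch under assumed dimensions (real systems add embeddings, gating, and a softmax output layer):

```python
import numpy as np

# Encoder-decoder sketch: the encoder's final state conditions the decoder,
# which is otherwise an ordinary RNN language model. Sizes are assumptions.
rng = np.random.default_rng(0)
D, H = 8, 8                             # input and hidden dimensions (assumed)

def rnn(inputs, h, W_x, W_h):
    states = []
    for x in inputs:
        h = np.tanh(W_x @ x + W_h @ h)  # vanilla recurrent update
        states.append(h)
    return states

W_ex, W_eh = rng.normal(0, 0.1, (H, D)), rng.normal(0, 0.1, (H, H))  # encoder
W_dx, W_dh = rng.normal(0, 0.1, (H, D)), rng.normal(0, 0.1, (H, H))  # decoder

source = [rng.normal(size=D) for _ in range(5)]     # embedded source tokens
context = rnn(source, np.zeros(H), W_ex, W_eh)[-1]  # fixed-size input summary

# The decoder starts from the context vector rather than a zero state,
# so everything it generates is conditioned on the input sequence.
target_prefix = [rng.normal(size=D) for _ in range(4)]
decoder_states = rnn(target_prefix, context, W_dx, W_dh)
print(decoder_states[-1].shape)  # (8,)
```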
#### Reading
* Recurrent Continuous Translation Models. Kalchbrenner and Blunsom, EMNLP 2013.
* Sequence to Sequence Learning with Neural Networks. Sutskever et al., NIPS 2014.
* Multimodal Neural Language Models. Kiros et al., ICML 2014.
* Show and Tell: A Neural Image Caption Generator. Vinyals et al., CVPR 2015.

### 10. Lecture 8 - Generating Language with Attention [Chris Dyer]
This lecture introduces one of the most important and influential mechanisms employed in Deep Neural Networks: Attention. Attention augments recurrent networks with the ability to condition on specific parts of the input and is key to achieving high performance in tasks such as Machine Translation and Image Captioning.
[slides] [video]
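The mechanism itself is compact: at each decoder step, score every encoder state against the current decoder state, normalise the scores with a softmax, and take the weighted average of the encoder states as the context vector. A minimal sketch using dot-product scoring (Bahdanau et al., in the reading below, use an additive scoring function instead); the sizes in the example are assumptions:

```python
import numpy as np

def attend(encoder_states, decoder_state):
    """Dot-product attention.
    encoder_states: (T, H) array of source-side states.
    decoder_state: (H,) current target-side state."""
    scores = encoder_states @ decoder_state      # (T,) alignment scores
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                     # softmax over source positions
    context = weights @ encoder_states           # (H,) weighted average
    return context, weights

# Toy example: 5 source positions, hidden size 8 (assumed sizes).
rng = np.random.default_rng(0)
enc = rng.normal(size=(5, 8))
context, weights = attend(enc, rng.normal(size=8))
print(weights.round(2), context.shape)
```

The weights form a soft alignment between each output position and the input, which is what makes attention effective for translation and captioning.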
#### Reading
* Neural Machine Translation by Jointly Learning to Align and Translate. Bahdanau et al., ICLR 2015.
* Show, Attend, and Tell: Neural Image Caption Generation with Visual Attention. Xu et al., ICML 2015.
* Incorporating structural alignment biases into an attentional neural translation model. Cohn et al., NAACL 2016.
* BLEU: a Method for Automatic Evaluation of Machine Translation. Papineni et al., ACL 2002.

### 11. Lecture 9 - Speech Recognition (ASR) [Andrew Senior]
Automatic Speech Recognition (ASR) is the task of transducing raw audio signals of spoken language into text transcriptions. This talk covers the history of ASR models, from Gaussian Mixtures to attention-augmented RNNs, the basic linguistics of speech, and the various input and output representations frequently employed.
[slides] [video]
### 12. Lecture 10 - Text to Speech (TTS) [Andrew Senior]
This lecture introduces algorithms for converting written language into spoken language (Text to Speech). TTS is the inverse process to ASR, but there are some important differences in the models applied. Here we review traditional TTS models, and then cover more recent neural approaches such as DeepMind's WaveNet model.
[slides] [video]
### 13. Lecture 11 - Question Answering [Karl Moritz Hermann]
[slides] [video]
#### Reading
* Teaching machines to read and comprehend. Hermann et al., NIPS 2015.
* Deep Learning for Answer Sentence Selection. Yu et al., NIPS Deep Learning Workshop 2014.

### 14. Lecture 12 - Memory [Ed Grefenstette]
[slides] [video]
#### Reading
* Hybrid computing using a neural network with dynamic external memory. Graves et al., Nature 2016.
* Reasoning about Entailment with Neural Attention. Rocktäschel et al., ICLR 2016.
* Learning to transduce with unbounded memory. Grefenstette et al., NIPS 2015.
* End-to-End Memory Networks. Sukhbaatar et al., NIPS 2015.

### 15. Lecture 13 - Linguistic Knowledge in Neural Networks
[slides] [video]
## Piazza
We will be using Piazza to facilitate class discussion during the course. Rather than emailing questions directly, I encourage you to post your questions on Piazza to be answered by your fellow students, instructors, and lecturers. However, do please note that all the lecturers for this course are volunteering their time and may not always be available to give a response.
Find our class page at: https://piazza.com/ox.ac.uk/winter2017/dnlpht2017/home
## Assessment
The primary assessment for this course will be a take-home assignment issued at the end of the term. This assignment will ask questions drawing on the concepts and models discussed in the course, as well as from selected research publications. The nature of the questions will include analysing mathematical descriptions of models and proposing extensions, improvements, or evaluations to such models. The assignment may also ask students to read specific research publications and discuss their proposed algorithms in the context of the course. In answering questions students will be expected to both present coherent written arguments and use appropriate mathematical formulae, and possibly pseudo-code, to illustrate answers.
The practical component of the course will be assessed in the usual way.
## Acknowledgements
This course would not have been possible without the support of DeepMind, The University of Oxford Department of Computer Science, Nvidia, and the generous donation of GPU resources from Microsoft Azure.