attention_is_all_you_need

Development technology: Python
Category: Artificial Intelligence, Machine Learning / Deep Learning
License: BSD-3-Clause License

Project Details

Transformer - Attention Is All You Need

Chainer-based Python implementation of Transformer, an attention-based seq2seq model without convolution and recurrence. If you want to see the architecture, please see net.py.
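For reference, the heart of the model is scaled dot-product attention. The following is a minimal NumPy sketch of the single-head case, not the actual code in net.py; the function and variable names are illustrative.

import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Q: (n_queries, d_k), K: (n_keys, d_k), V: (n_keys, d_v)
    d_k = Q.shape[-1]
    # Similarity scores between queries and keys, scaled by sqrt(d_k).
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax over the key axis gives the attention weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # The output is a weighted sum of the value vectors.
    return weights @ V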

See"AttentionIsAllYouNeed",AshishVaswani,NoamShazeer,NikiParmar,JakobUszkoreit,LlionJones,AidanN.Gomez,LukaszKaiser,IlliaPolosukhin,arxiv,2017.

This repository is partly derived from my convolutional seq2seq repo, which is also derived from Chainer's official seq2seq example.

Requirement

Python 3.6.0+
Chainer 2.0.0+
numpy 1.12.1+
cupy 1.0.0+ (if using gpu)
nltk
progressbar

(You can install all through pip) and their dependencies.
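For example, the Python dependencies above can be installed with pip in one step (assuming the standard PyPI package names; skip cupy if you do not use a GPU):

pip install chainer numpy cupy nltk progressbar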

Prepare Dataset

You can use any parallel corpus. For example, run

sh download_wmt.sh

which downloads and decompresses the training and development datasets from WMT/Europarl into your current directory. These files and their paths are set as the defaults in the training script train.py.
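Before training, you may want to confirm that a downloaded pair is really aligned line by line. The sketch below uses hypothetical file names (dev.en / dev.de), not the actual defaults wired into train.py:

# Hypothetical file names; substitute whatever download_wmt.sh actually produced.
src_path, tgt_path = "dev.en", "dev.de"

with open(src_path, encoding="utf-8") as fs, open(tgt_path, encoding="utf-8") as ft:
    src_lines = fs.read().splitlines()
    tgt_lines = ft.read().splitlines()

# In a parallel corpus, line i of the source must correspond to line i of the target.
assert len(src_lines) == len(tgt_lines), "source/target line counts differ"
for src, tgt in zip(src_lines[:3], tgt_lines[:3]):
    print(src, "|||", tgt)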

How to Run

PYTHONIOENCODING=utf-8 python -u train.py -g=0 -i DATA_DIR -o SAVE_DIR

During training, logs for loss, perplexity, word accuracy and time are printed at a certain interval, in addition to validation tests (perplexity and BLEU for generation) every half epoch. A generation test is also performed and printed so you can check training progress.

Arguments

Some of them are as follows:

-g: your gpu id. If cpu, set -1.
-i DATA_DIR, -s SOURCE, -t TARGET, -svalid SVALID, -tvalid TVALID: the DATA_DIR directory needs to include a pair of training datasets, SOURCE and TARGET, and a pair of validation datasets, SVALID and TVALID. Each pair should be a parallel corpus with line-by-line sentence alignment.
-o SAVE_DIR: a JSON log report file and a model snapshot will be saved in the SAVE_DIR directory (if it does not exist, it will be made automatically).
-e: max epochs of training corpus.
-b: minibatch size.
-u: size of units and word embeddings.
-l: number of layers in both the encoder and the decoder.
--source-vocab: max size of vocabulary set of source language.
--target-vocab: max size of vocabulary set of target language.
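For example, a full invocation with all of these flags spelled out might look like the line below; the corpus file names and the values chosen for -e, -b, -u, -l and the vocabulary sizes are illustrative, not the script's defaults.

PYTHONIOENCODING=utf-8 python -u train.py -g=0 -i DATA_DIR -s train.en -t train.de -svalid dev.en -tvalid dev.de -o SAVE_DIR -e 40 -b 64 -u 512 -l 6 --source-vocab 40000 --target-vocab 40000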

Please see the others by python train.py -h.

Note

This repository does not aim for complete validation of the results in the paper, so I have not thoroughly confirmed the validity of its performance. However, I expect my implementation is almost compatible with the model described in the paper. Some differences of which I am aware are as follows:

Optimization/training strategy. Detailed information about batch size, parameter initialization, etc. is unclear in the paper. Additionally, the learning rate proposed in the paper may work only with a large batch size (e.g. 4000) for deep-layer nets. I changed warmup_step to 32000 from 4000, though there is room for improvement. I also changed relu into leaky relu in the feedforward net layers for easier gradient propagation.
Vocabulary set, dataset, preprocessing and evaluation. This repo uses a common word-based tokenization, although the paper uses byte-pair encoding. The size of the token set also differs. Evaluation (validation) is a little unfair and incompatible with the one in the paper: for example, even the validation set replaces unknown words with a single "unk" token, and beam search is unused in BLEU calculation.
Model size. The model setting in this repo is the "base model" in the paper, although you can modify some lines to use the "big model".
This code follows some settings used in the tensor2tensor repository, which includes a Transformer model. For example, the positional encoding used in that repository seems to differ from the one in the paper; this code follows the former.
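For reference, the learning-rate schedule from the paper is lrate = d_model^-0.5 * min(step^-0.5, step * warmup_steps^-1.5). The sketch below plugs in the warmup of 32000 mentioned above; the actual variable names and implementation in train.py may differ.

def learning_rate(step, d_model=512, warmup_steps=32000):
    # Schedule from "Attention Is All You Need": linear warmup for the first
    # warmup_steps updates, then decay with the inverse square root of the step.
    step = max(step, 1)
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)

# The peak learning rate is reached at step == warmup_steps (about 2.5e-4 here).
print(learning_rate(32000))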