Anonymous user, November 11, 2021
Development technology: Python
Category: Artificial intelligence, machine learning/deep learning
License: Readme

Project details

MITIE: MIT Information Extraction

This project provides free (even for commercial use) state-of-the-art information extraction tools. The current release includes tools for performing named entity extraction and binary relation detection as well as tools for training custom extractors and relation detectors.

MITIE is built on top of dlib, a high-performance machine-learning library [1]. It makes use of several state-of-the-art techniques, including distributional word embeddings [2] and Structural Support Vector Machines [3]. MITIE offers several pre-trained models providing varying levels of support for English, Spanish, and German, trained using a variety of linguistic resources (e.g., CoNLL 2003, ACE, Wikipedia, Freebase, and Gigaword). The core MITIE software is written in C++, but bindings for several other programming languages, including Python, R, Java, C, and MATLAB, allow a user to quickly integrate MITIE into their own applications.

Outside projects have created API bindings for OCaml, .NET, .NET Core, and Ruby. There is also an interactive tool for labeling data and training MITIE.

Using MITIE

MITIE's primary API is a C API which is documented in the mitie.h header file. Beyond this, there are many example programs showing how to use MITIE from C, C++, Java, R, or Python 2.7.

Initial Setup

Before you can run the provided examples you will need to download the trained model files, which you can do by running:

make MITIE-models

or by simply downloading the MITIE-models-v0.2.tar.bz2 file and extracting it in your MITIE folder. Note that the Spanish and German models are supplied in separate downloads. So if you want to use the Spanish NER model then download MITIE-models-v0.2-Spanish.zip and extract it into your MITIE folder. Similarly for the German model: MITIE-models-v0.2-German.tar.bz2
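
The manual route (download an archive, then extract it into the MITIE folder) can also be scripted. The following is a minimal sketch, assuming the archive has already been downloaded to disk and comes from a trusted source. The function name extract_models and the example paths are hypothetical and not part of MITIE; only the Python standard library is used.

```python
import tarfile
import zipfile
from pathlib import Path


def extract_models(archive_path, mitie_dir):
    """Extract a downloaded model archive into the MITIE folder.

    Hypothetical helper (not part of MITIE). Handles the two archive
    types mentioned above (.tar.bz2 and .zip) and returns the sorted
    relative paths of the files found in the archive.
    Only use this on archives you trust.
    """
    archive = Path(archive_path)
    target = Path(mitie_dir)
    target.mkdir(parents=True, exist_ok=True)

    if archive.name.endswith(".zip"):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target)
            # Skip directory entries, which end with a slash.
            names = [n for n in zf.namelist() if not n.endswith("/")]
    elif archive.name.endswith(".tar.bz2"):
        with tarfile.open(archive, "r:bz2") as tf:
            names = [m.name for m in tf.getmembers() if m.isfile()]
            tf.extractall(target)
    else:
        raise ValueError("unsupported archive type: " + archive.name)
    return sorted(names)
```

A call such as extract_models("MITIE-models-v0.2.tar.bz2", ".") run from the MITIE folder would unpack the English models in place; the Spanish .zip and German .tar.bz2 archives go through the same function.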

Using MITIE from the command line

MITIE comes with a basic streaming NER tool. So you can tell MITIE to process each line of a text file independently and output marked up text with the command:

cat sample_text.txt | ./ner_stream MITIE-models/english/ner_model.dat
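
The same pipeline can be driven from a script instead of the shell. This is a minimal sketch, assuming the ner_stream executable has already been built and that it reads text on standard input and writes marked up text to standard output, as in the command above. The helper names ner_stream_command and tag_lines are hypothetical.

```python
import subprocess


def ner_stream_command(model_path, executable="./ner_stream"):
    """Build the argument list for the streaming NER tool.

    Mirrors the shell command above: the executable followed by the
    path to the trained model file.
    """
    return [executable, model_path]


def tag_lines(lines, model_path, executable="./ner_stream"):
    """Feed lines to ner_stream on stdin and return its output lines.

    Assumes ner_stream is built and the model file exists; raises
    subprocess.CalledProcessError if the tool exits with an error.
    """
    result = subprocess.run(
        ner_stream_command(model_path, executable),
        input="\n".join(lines) + "\n",
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()
```

Because each input line is processed independently, the caller can pass one sentence or document per list element and match the returned lines back to the inputs by position.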

The ner_stream executable can be compiled by running make in the top level MITIE folder or by navigating to the tools/ner_stream folder and running make or using CMake to build it, which can be done with the following commands:

cd tools/ner_stream
mkdir build
cd build
cmake ..
cmake --build . --config Release

Compiling MITIE as a shared library

On a UNIX like system, this can be accomplished by running make in the top level MITIE folder or by running:

cd mitielib
make

This produces shared and static library files in the mitielib folder. Or you can use CMake to compile a shared library by typing:

cd mitielib
mkdir build
cd build
cmake ..
cmake --build . --config Release --target install

Either of these methods will create a MITIE shared library in the mitielib folder.

Compiling MITIE using OpenBLAS

If you compile MITIE using cmake then it will automatically find and use any optimized BLAS libraries on your machine. However, if you compile using regular make then you have to manually locate your BLAS libraries or dlib will default to its built in, but slower, BLAS implementation. Therefore, to use OpenBLAS when compiling without cmake, locate libopenblas.a and libgfortran.a, then run make as follows:

cd mitielib
make BLAS_PATH=/path/to/openblas.a LIBGFORTRAN_PATH=/path/to/libfortran.a

Note that if your BLAS libraries are not in standard locations cmake will fail to find them. However, you can tell it what folder to look in by replacing cmake .. with a statement such as:

cmake -DCMAKE_LIBRARY_PATH=/home/me/place/i/put/blas/lib ..

Using MITIE from a Python 2.7 program

Once you have built the MITIE shared library, you can go to the examples/python folder and simply run any of the Python scripts. Each script is a tutorial explaining some aspect of MITIE: named entity recognition and relation extraction, training a custom NER tool, or training a custom relation extractor.
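
For orientation, the named entity recognition flow looks roughly like the sketch below. It assumes the mitie module is importable (for example because mitielib is on sys.path), that the English model has been downloaded to the path shown, and that the binding exposes named_entity_extractor, tokenize, and extract_entities with entities returned as tuples whose first two items are a token index range and a tag. The helper names entity_strings and extract are hypothetical; the tutorial scripts in examples/python remain the authoritative reference.

```python
def entity_strings(tokens, entities):
    """Turn (token_range, tag, ...) tuples into (tag, text) pairs.

    Tokens may be bytes or str depending on the Python version and
    binding, so bytes are decoded before joining.
    """
    def as_text(tok):
        return tok.decode("utf-8") if isinstance(tok, bytes) else tok

    return [
        (ent[1], " ".join(as_text(tokens[i]) for i in ent[0]))
        for ent in entities
    ]


def extract(text, model_path="MITIE-models/english/ner_model.dat"):
    """Run NER on a string and return (tag, text) pairs.

    The import is done lazily so that entity_strings can be used and
    tested on machines where MITIE is not installed.
    """
    from mitie import named_entity_extractor, tokenize

    ner = named_entity_extractor(model_path)  # loading the model can be slow
    tokens = tokenize(text)
    return entity_strings(tokens, ner.extract_entities(tokens))
```

Loading the model is the expensive step, so a longer-running program would normally create the extractor once and reuse it across many inputs rather than calling extract repeatedly.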

You can also install mitie directly from GitHub with this command: pip install git+https://github.com/mit-nlp/MITIE.git

Using MITIE from R

MITIE can be installed as an R package. See the README for more details.

Using MITIE from a C program

There are example C programs in the examples/C folder. To compile any of them you simply go into those folders and run make. Or use CMake like so:

cd examples/C/ner
mkdir build
cd build
cmake ..
cmake --build . --config Release

Using MITIE from a C++ program

There are example C++ programs in the examples/cpp folder. To compile any of them you simply go into those folders and run make. Or use CMake like so:

cd examples/cpp/ner
mkdir build
cd build
cmake ..
cmake --build . --config Release

Using MITIE from a Java program

There is an example Java program in the examples/java folder. Before you can run it you must compile MITIE's java interface, which you can do like so:

cd mitielib/java
mkdir build
cd build
cmake ..
cmake --build . --config Release --target install

That will place a javamitie shared library and jar file into the mitielib folder. Once you have those two files you can run the example program in examples/java by running run_ner.bat if you are on Windows or run_ner.sh if you are on a POSIX system like Linux or OS X.

Also note that you must have Swig 1.3.40 or newer, CMake 2.8.4 or newer, and the Java JDK installed to compile the MITIE interface. Finally, note that if you are using 64bit Java on Windows then you will need to use a command like:

cmake -G "Visual Studio 10 Win64" ..

instead of cmake .. so that Visual Studio knows to make a 64bit library.

Running MITIE's unit tests

You can run a simple regression test to validate your build. Do this by running the following command from the top level MITIE folder:

make test

make test both builds the example programs and downloads the required example models. If you require a non-standard C++ compiler, change CC in examples/C/makefile and in tools/ner_stream/makefile.

Precompiled Python 2.7 binaries

We have built Python 2.7 binaries packaged with sample models for 64bit Linux and Windows (both 32 and 64 bit versions of Python). You can download the precompiled package here: Precompiled MITIE 0.2

Precompiled Java 64bit binaries

We have built Java binaries for the 64bit JVM which work on Linux and Windows. You can download the precompiled package here: Precompiled Java MITIE 0.3. The package contains an examples/java folder. You can run the example by executing the provided .bat or .sh file.

Citing MITIE

There isn't any paper specifically about MITIE. However, since MITIE is basically just a thin wrapper around dlib please cite dlib's JMLR paper if you use MITIE in your research:

Davis E. King. Dlib-ml: A Machine Learning Toolkit. Journal of Machine Learning Research 10, pp. 1755-1758, 2009

@Article{dlib09,
  author = {Davis E. King},
  title = {Dlib-ml: A Machine Learning Toolkit},
  journal = {Journal of Machine Learning Research},
  year = {2009},
  volume = {10},
  pages = {1755-1758},
}

License

MITIE is licensed under the Boost Software License - Version 1.0 - August 17th, 2003.

Permission is hereby granted, free of charge, to any person or organization obtaining a copy of the software and accompanying documentation covered by this license (the "Software") to use, reproduce, display, distribute, execute, and transmit the Software, and to prepare derivative works of the Software, and to permit third-parties to whom the Software is furnished to do so, all subject to the following:

The copyright notices in the Software and this entire statement, including the above license grant, this restriction and the following disclaimer, must be included in all copies of the Software, in whole or in part, and all derivative works of the Software, unless such copies or derivative works are solely in the form of machine-executable object code generated by a source language processor.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE AND NON-INFRINGEMENT. IN NO EVENT SHALL THE COPYRIGHT HOLDERS OR ANYONE DISTRIBUTING THE SOFTWARE BE LIABLE FOR ANY DAMAGES OR OTHER LIABILITY, WHETHER IN CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

References

[1] Davis E. King. Dlib-ml: A Machine Learning Toolkit. Journal of Machine Learning Research 10, pp. 1755-1758, 2009.

[2] Paramveer Dhillon, Dean Foster and Lyle Ungar. Eigenwords: Spectral Word Embeddings. Journal of Machine Learning Research (JMLR), 16, 2015.

[3] T. Joachims, T. Finley, Chun-Nam Yu. Cutting-Plane Training of Structural SVMs. Machine Learning, 77(1):27-59, 2009.
