This project provides free (even for commercial use) state-of-the-art information extraction tools. The current release includes tools for performing named entity extraction and binary relation detection as well as tools for training custom extractors and relation detectors.
MITIE is built on top of dlib, a high-performance machine-learning library [1], and makes use of several state-of-the-art techniques including distributional word embeddings [2] and Structural Support Vector Machines [3]. MITIE offers several pre-trained models providing varying levels of support for English, Spanish, and German, trained using a variety of linguistic resources (e.g., CoNLL 2003, ACE, Wikipedia, Freebase, and Gigaword). The core MITIE software is written in C++, but bindings for several other languages including Python, R, Java, C, and MATLAB allow users to quickly integrate MITIE into their own applications.
Outside projects have created API bindings for OCaml, .NET, .NET Core, and Ruby. There is also an interactive tool for labeling data and training MITIE.

Using MITIE

MITIE's primary API is a C API which is documented in the mitie.h header file. Beyond this, there are many example programs showing how to use MITIE from C, C++, Java, R, or Python 2.7.

Initial Setup

Before you can run the provided examples you will need to download the trained model files, which you can do by running:

    make MITIE-models

or by simply downloading the MITIE-models-v0.2.tar.bz2 file and extracting it in your MITIE folder. Note that the Spanish and German models are supplied in separate downloads. So if you want to use the Spanish NER model, download MITIE-models-v0.2-Spanish.zip and extract it into your MITIE folder. Similarly for the German model: MITIE-models-v0.2-German.tar.bz2.
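
Where make is not available, the manual extraction step can be scripted. The following is a minimal sketch, assuming the archive has already been downloaded and that it unpacks to a MITIE-models folder (as the model path used by ner_stream suggests). The function names unpack_models and find_model_files are hypothetical helpers, not part of MITIE.

```python
import shutil
from pathlib import Path


def unpack_models(archive, mitie_dir):
    """Unpack a downloaded model archive (.tar.bz2 or .zip) into mitie_dir."""
    mitie_dir = Path(mitie_dir)
    mitie_dir.mkdir(parents=True, exist_ok=True)
    # shutil.unpack_archive infers the archive format from the file extension,
    # so the same call covers the .tar.bz2 and .zip downloads.
    shutil.unpack_archive(str(archive), str(mitie_dir))
    return mitie_dir


def find_model_files(mitie_dir):
    """Return the .dat files found under mitie_dir/MITIE-models, sorted.

    Assumption: the archive contains a top-level MITIE-models folder.
    """
    return sorted(Path(mitie_dir, "MITIE-models").rglob("*.dat"))
```

For example, calling unpack_models("MITIE-models-v0.2.tar.bz2", ".") from the MITIE folder and then find_model_files(".") should list the extracted model files, which is a quick way to confirm the models landed where the examples expect them.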

Using MITIE from the command line

MITIE comes with a basic streaming NER tool. So you can tell MITIE to process each line of a text file independently and output marked up text with the command:

    cat sample_text.txt | ./ner_stream MITIE-models/english/ner_model.dat

The ner_stream executable can be compiled by running make in the top level MITIE folder, by navigating to the tools/ner_stream folder and running make, or by using CMake to build it, which can be done with the following commands:

    cd tools/ner_stream
    mkdir build
    cd build
    cmake ..
    cmake --build . --config Release

Compiling MITIE as a shared library

On a UNIX-like system, this can be accomplished by running make in the top level MITIE folder or by running:

    cd mitielib
    make

This produces shared and static library files in the mitielib folder. Or you can use CMake to compile a shared library by typing:

    cd mitielib
    mkdir build
    cd build
    cmake ..
    cmake --build . --config Release --target install

Either of these methods will create a MITIE shared library in the mitielib folder.

Compiling MITIE using OpenBLAS

If you compile MITIE using cmake then it will automatically find and use any optimized BLAS libraries on your machine. However, if you compile using regular make then you have to manually locate your BLAS libraries or dlib will default to its built-in, but slower, BLAS implementation. Therefore, to use OpenBLAS when compiling without cmake, locate libopenblas.a and libgfortran.a, then run make as follows:

    cd mitielib
    make BLAS_PATH=/path/to/libopenblas.a LIBGFORTRAN_PATH=/path/to/libgfortran.a

Note that if your BLAS libraries are not in standard locations cmake will fail to find them. However, you can tell it what folder to look in by replacing cmake .. with a statement such as:

    cmake -DCMAKE_LIBRARY_PATH=/home/me/place/i/put/blas/lib ..

Using MITIE from a Python 2.7 program

Once you have built the MITIE shared library, you can go to the examples/python folder and simply run any of the Python scripts. Each script is a tutorial explaining some aspect of MITIE: named entity recognition and relation extraction, training a custom NER tool, or training a custom relation extractor.

You can also install mitie directly from GitHub with this command:

    pip install git+https://github.com/mit-nlp/MITIE.git
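
As a rough illustration of what the NER tutorial script does, here is a minimal sketch. It assumes the mitie package exposes named_entity_extractor, tokenize, and extract_entities as used in the bundled examples/python scripts, and that each detected entity is a tuple of (token range, tag, optional score); check those scripts for the authoritative usage. The helper names entity_text, summarize_entities, and run_ner are hypothetical.

```python
def entity_text(tokens, token_range):
    """Join the tokens covered by one entity; tokens may be bytes or str."""
    words = [tokens[i] for i in token_range]
    return " ".join(
        w.decode("utf-8") if isinstance(w, bytes) else w for w in words
    )


def summarize_entities(tokens, entities):
    """Convert (range, tag[, score]) tuples into (tag, text, score) triples.

    The score is None when an entity tuple carries no score (assumption:
    some builds may return only the range and the tag).
    """
    return [
        (e[1], entity_text(tokens, e[0]), e[2] if len(e) > 2 else None)
        for e in entities
    ]


def run_ner(model_path, text):
    """Load a model and tag text. Needs the mitie package and a model file."""
    # Imported lazily so the helpers above can be used without MITIE installed.
    from mitie import named_entity_extractor, tokenize

    ner = named_entity_extractor(model_path)
    tokens = tokenize(text)
    return summarize_entities(tokens, ner.extract_entities(tokens))
```

With the English models extracted, run_ner("MITIE-models/english/ner_model.dat", some_text) would return one (tag, text, score) triple per detected entity. Keeping the formatting in summarize_entities separate from model loading makes it easy to check the post-processing on hand-written data.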

Using MITIE from R

MITIE can be installed as an R package. See the README for more details.

Using MITIE from a C program

There are example C programs in the examples/C folder. To compile any of them you simply go into those folders and run make. Or use CMake like so:

    cd examples/C/ner
    mkdir build
    cd build
    cmake ..
    cmake --build . --config Release

Using MITIE from a C++ program

There are example C++ programs in the examples/cpp folder. To compile any of them you simply go into those folders and run make. Or use CMake like so:

    cd examples/cpp/ner
    mkdir build
    cd build
    cmake ..
    cmake --build . --config Release

Using MITIE from a Java program

There is an example Java program in the examples/java folder. Before you can run it you must compile MITIE's Java interface, which you can do like so:

    cd mitielib/java
    mkdir build
    cd build
    cmake ..
    cmake --build . --config Release --target install

That will place a javamitie shared library and jar file into the mitielib folder. Once you have those two files you can run the example program in examples/java by running run_ner.bat if you are on Windows or run_ner.sh if you are on a POSIX system like Linux or OS X.

Also note that you must have Swig 1.3.40 or newer, CMake 2.8.4 or newer, and the Java JDK installed to compile the MITIE interface. Finally, note that if you are using 64 bit Java on Windows then you will need to use a command like:

    cmake -G "Visual Studio 10 Win64" ..

instead of cmake .. so that Visual Studio knows to make a 64 bit library.

Running MITIE's unit tests

You can run a simple regression test to validate your build. Do this by running the following command from the top level MITIE folder:

    make test

make test both builds the example programs and downloads the required example models. If you require a non-standard C++ compiler, change CC in examples/C/makefile and in tools/ner_stream/makefile.

Precompiled Python 2.7 binaries

We have built Python 2.7 binaries packaged with sample models for 64 bit Linux and Windows (both 32 and 64 bit versions of Python). You can download the precompiled package here: Precompiled MITIE 0.2

Precompiled Java 64 bit binaries

We have built Java binaries for the 64 bit JVM which work on Linux and Windows. You can download the precompiled package here: Precompiled Java MITIE 0.3. In the file is an examples/java folder. You can run the example by executing the provided .bat or .sh file.

Citing MITIE

There isn't any paper specifically about MITIE. However, since MITIE is basically just a thin wrapper around dlib, please cite dlib's JMLR paper if you use MITIE in your research:

Davis E. King. Dlib-ml: A Machine Learning Toolkit. Journal of Machine Learning Research 10, pp. 1755-1758, 2009

    @Article{dlib09,
      author = {Davis E. King},
      title = {Dlib-ml: A Machine Learning Toolkit},
      journal = {Journal of Machine Learning Research},
      year = {2009},
      volume = {10},
      pages = {1755-1758},
    }

License

MITIE is licensed under the Boost Software License - Version 1.0 - August 17th, 2003.

Permission is hereby granted, free of charge, to any person or organization obtaining a copy of the software and accompanying documentation covered by this license (the "Software") to use, reproduce, display, distribute, execute, and transmit the Software, and to prepare derivative works of the Software, and to permit third-parties to whom the Software is furnished to do so, all subject to the following:

The copyright notices in the Software and this entire statement, including the above license grant, this restriction and the following disclaimer, must be included in all copies of the Software, in whole or in part, and all derivative works of the Software, unless such copies or derivative works are solely in the form of machine-executable object code generated by a source language processor.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE AND NON-INFRINGEMENT. IN NO EVENT SHALL THE COPYRIGHT HOLDERS OR ANYONE DISTRIBUTING THE SOFTWARE BE LIABLE FOR ANY DAMAGES OR OTHER LIABILITY, WHETHER IN CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

References

[1] Davis E. King. Dlib-ml: A Machine Learning Toolkit. Journal of Machine Learning Research 10, pp. 1755-1758, 2009.

[2] Paramveer Dhillon, Dean Foster and Lyle Ungar. Eigenwords: Spectral Word Embeddings. Journal of Machine Learning Research (JMLR) 16, 2015.

[3] T. Joachims, T. Finley, Chun-Nam Yu. Cutting-Plane Training of Structural SVMs. Machine Learning 77(1):27-59, 2009.