Posted by an anonymous user, November 11, 2021

clj-ml

A machine learning library for Clojure built on top of Weka and friends.

This library (specifically, some dependencies) requires Java 1.7+.

Installation

Installing from Clojars

    [cc.artifice/clj-ml "0.8.5"]

Installing from Maven

(add Clojars repository)

    <dependency>
      <groupId>cc.artifice</groupId>
      <artifactId>clj-ml</artifactId>
      <version>0.8.5</version>
    </dependency>

Supported algorithms

Filters

- Discretization (supervised, unsupervised, PKI)
- Nominal to binary (supervised, unsupervised)
- Numeric to nominal
- String to word vector
- Attribute manipulation (reorder, add, remove range, remove percentage, etc.)
- Resample (supervised, unsupervised)
- Replace missing values with mean (numeric attributes) or mode (nominal attributes)

Classifiers

- k-Nearest neighbor
- Decision trees: C4.5/J4.8, Boosted stump, Random forest, Rotation forest, M5P
- Naive Bayes
- Multilayer perceptrons
- Support vector machines (grid-based training), SMO, Spegasos
- Raced Incremental Logit Boost

Regression

- Linear
- Logistic
- Pace
- Additive gradient boosting

Clusterers

- k-Means
- Cobweb
- Expectation-maximization

Usage

API documentation can be found here.

I/O of data

    user> (use 'clj-ml.io)
    nil
    user> (def ds (load-instances :arff "file:///home/josh/git/clj-ml/iris.arff"))
    #'user/ds
    user> ds
    #<Instances @relation iris
    @attribute sepallength numeric
    @attribute sepalwidth numeric
    @attribute petallength numeric
    @attribute petalwidth numeric
    @attribute class {Iris-setosa,Iris-versicolor,Iris-virginica}
    @data
    5.1,3.5,1.4,0.2,Iris-setosa
    4.9,3,1.4,0.2,Iris-setosa
    4.7,3.2,1.3,0.2,Iris-setosa
    4.6,3.1,1.5,0.2,Iris-setosa
    5,3.6,1.4,0.2,Iris-setosa
    5.4,3.9,1.7,0.4,Iris-setosa
    4.6,3.4,1.4,0.3,Iris-setosa
    ...
    user> (def ds (load-instances :arff "https://repository.seasr.org/Datasets/UCI/arff/iris.arff"))
    #'user/ds
    user> (save-instances :csv "iris.csv" ds)
    nil
    user> (println (slurp "iris.csv"))
    sepallength,sepalwidth,petallength,petalwidth,class
    5.1,3.5,1.4,0.2,Iris-setosa
    4.9,3,1.4,0.2,Iris-setosa
    4.7,3.2,1.3,0.2,Iris-setosa
    4.6,3.1,1.5,0.2,Iris-setosa
    5,3.6,1.4,0.2,Iris-setosa
    5.4,3.9,1.7,0.4,Iris-setosa
    4.6,3.4,1.4,0.3,Iris-setosa
    5,3.4,1.5,0.2,Iris-setosa
    4.4,2.9,1.4,0.2,Iris-setosa
    4.9,3.1,1.5,0.1,Iris-setosa
    5.4,3.7,1.5,0.2,Iris-setosa
    ...
    user> (def ds (load-instances :csv "file:///home/josh/git/clj-ml/iris.csv"))
    #'user/ds
    user> ds
    #<Instances @relation stream
    @attribute sepallength numeric
    @attribute sepalwidth numeric
    @attribute petallength numeric
    @attribute petalwidth numeric
    @attribute class {Iris-setosa,Iris-versicolor,Iris-virginica}
    @data
    5.1,3.5,1.4,0.2,Iris-setosa
    4.9,3,1.4,0.2,Iris-setosa
    4.7,3.2,1.3,0.2,Iris-setosa
    4.6,3.1,1.5,0.2,Iris-setosa
    5,3.6,1.4,0.2,Iris-setosa
    5.4,3.9,1.7,0.4,Iris-setosa
    4.6,3.4,1.4,0.3,Iris-setosa
    5,3.4,1.5,0.2,Iris-setosa

Working with datasets

    user> (use 'clj-ml.data)
    nil
    user> (def ds (make-dataset "my-name"
                                [:length :width {:style nil} {:kind [:good :bad]}]
                                [[12 24 "longish" :good]
                                 [8 5 "shortish" :bad]]))
    #'user/ds
    user> ds
    #<ClojureInstances @relation my-name
    @attribute length numeric
    @attribute width numeric
    @attribute style string
    @attribute kind {good,bad}
    @data
    12,24,longish,good
    8,5,shortish,bad>
    user> (dataset-seq ds)
    (#<Instance 12,24,longish,good> #<Instance 8,5,shortish,bad>)
    user> (map instance-to-map (dataset-seq ds))
    ({:kind :good, :style "longish", :width 24.0, :length 12.0}
     {:kind :bad, :style "shortish", :width 5.0, :length 8.0})
    user> (map instance-to-vector (dataset-seq ds))
    ([12.0 24.0 "longish" :good] [8.0 5.0 "shortish" :bad])

Filtering datasets

    user> (use 'clj-ml.filters 'clj-ml.io)
    nil
    user> (def ds (load-instances :csv "file:///home/josh/git/clj-ml/iris.csv"))
    #'user/ds
    user> (def discretize (make-filter :unsupervised-discretize
                                       {:dataset-format ds
                                        :attributes [:sepallength :petallength]}))
    #'user/discretize
    user> (def filtered-ds (filter-apply discretize ds))
    #'user/filtered-ds
    user> (map instance-to-map (dataset-seq filtered-ds))
    ({:class :Iris-setosa, :petalwidth 0.2, :petallength :'(-inf-1.59]', :sepalwidth 3.5, :sepallength :'(5.02-5.38]'}
     {:class :Iris-setosa, :petalwidth 0.2, :petallength :'(-inf-1.59]', :sepalwidth 3.0, :sepallength :'(4.66-5.02]'}
     {:class :Iris-setosa, :petalwidth 0.2, :petallength :'(-inf-1.59]', :sepalwidth 3.2, :sepallength :'(4.66-5.02]'}
     {:class :Iris-setosa, :petalwidth 0.2, :petallength :'(-inf-1.59]', :sepalwidth 3.1, :sepallength :'(-inf-4.66]'}
     {:class :Iris-setosa, :petalwidth 0.2, :petallength :'(-inf-1.59]', :sepalwidth 3.6, :sepallength :'(4.66-5.02]'}
     ...)
    ;; the petallength and sepallength attributes are now nominal

Equivalently,

    user> (def filtered-ds
            (->> "file:///home/josh/git/clj-ml/iris.csv"
                 (load-instances :csv)
                 (make-apply-filter :unsupervised-discretize
                                    {:attributes [:sepallength :petallength]})))

Using classifiers

    user> (use 'clj-ml.classifiers 'clj-ml.data 'clj-ml.utils)
    nil
    user> (def ds (-> (load-instances :arff "file:///home/josh/git/clj-ml/iris.arff")
                      (dataset-set-class :class)))
    #'user/ds
    user> (def classifier (-> (make-classifier :decision-tree :c45)
                              (classifier-train ds)))
    #'user/classifier
    user> (def instance (-> (first (dataset-seq ds))
                            (instance-set-class-missing)))
    user> (classifier-classify classifier instance)
    :Iris-setosa

Evaluation:

    user> (def evaluation (classifier-evaluate classifier :cross-validation ds 10))
    #'user/evaluation
    user> (clojure.pprint/pprint (dissoc evaluation :summary :confusion-matrix))
    {:incorrect 7.0,
     :root-relative-squared-error 36.693518966642074,
     :sf-entropy-gain -4076.3670930399717,
     :recall
     {:Iris-setosa 0.9795918367346939,
      :Iris-versicolor 0.94,
      :Iris-virginica 0.94},
     :kb-information 217.7935138195151,
     :kb-relative-information 13741.240800360849,
     :false-positive-rate
     {:Iris-setosa 0.0,
      :Iris-versicolor 0.04040404040404041,
      :Iris-virginica 0.030303030303030304},
     :percentage-correct 95.30201342281879,
     :roc-area
     {:Iris-setosa 0.984845423317842,
      :Iris-versicolor 0.9456,
      :Iris-virginica 0.9496},
     :kb-mean-information 1.4617014350303028,
     :percentage-unclassified 0.0,
     :percentage-incorrect 4.697986577181208,
     :root-mean-squared-error 0.17297908222448935,
     :unclassified 0.0,
     :correlation-coefficient
     {:nan "Can't compute correlation coefficient: class is nominal!"},
     :correct 142.0,
     :sf-mean-entropy-gain -27.358168409664238,
     :mean-absolute-error 0.04083212821368881,
     :relative-absolute-error 9.187228848079984,
     :error-rate 0.04697986577181208,
     :kappa 0.9295222650179066,
     :f-measure
     {:Iris-setosa 0.9896907216494846,
      :Iris-versicolor 0.9306930693069307,
      :Iris-virginica 0.94},
     :false-negative-rate
     {:Iris-setosa 0.02040816326530612,
      :Iris-versicolor 0.06,
      :Iris-virginica 0.06},
     :evaluation-object #<Evaluation weka.classifiers.Evaluation@6a7272ca>,
     :average-cost 0.0,
     :precision
     {:Iris-setosa 1.0,
      :Iris-versicolor 0.9215686274509803,
      :Iris-virginica 0.94}}
    user> (println (:summary evaluation))
    Correctly Classified Instances         142               95.302  %
    Incorrectly Classified Instances         7                4.698  %
    Kappa statistic                          0.9295
    Mean absolute error                      0.0408
    Root mean squared error                  0.173
    Relative absolute error                  9.1872 %
    Root relative squared error             36.6935 %
    Total Number of Instances              149
    Ignored Class Unknown Instances                  1
    nil
    user> (println (:confusion-matrix evaluation))
    === Confusion Matrix ===
      a  b  c   <-- classified as
     48  1  0 |  a = Iris-setosa
      0 47  3 |  b = Iris-versicolor
      0  3 47 |  c = Iris-virginica
    nil
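
The per-class figures in the evaluation map can be cross-checked by hand against the confusion matrix. The sketch below is plain Clojure rather than part of clj-ml; `confusion`, `recall`, `precision`, and `accuracy` are hypothetical names introduced for illustration, and the matrix is copied from the output above, assuming rows are actual classes and columns are predicted classes.

```clojure
;; Confusion matrix from the evaluation above.
;; Assumption: rows = actual class, columns = predicted class,
;; in the order Iris-setosa, Iris-versicolor, Iris-virginica.
(def confusion [[48 1 0]
                [0 47 3]
                [0 3 47]])

(defn recall
  "Correct predictions for class i divided by all instances whose actual class is i."
  [m i]
  (/ (double (get-in m [i i]))
     (reduce + (nth m i))))

(defn precision
  "Correct predictions for class i divided by all instances predicted as class i."
  [m i]
  (/ (double (get-in m [i i]))
     (reduce + (map #(nth % i) m))))

(defn accuracy
  "Sum of the diagonal divided by the total number of instances."
  [m]
  (/ (double (reduce + (map-indexed (fn [i row] (nth row i)) m)))
     (reduce + (flatten m))))

(recall confusion 0)     ; 48/49, compare with :recall for :Iris-setosa
(precision confusion 1)  ; 47/51, compare with :precision for :Iris-versicolor
(accuracy confusion)     ; 142/149, compare with :percentage-correct (as a fraction)
```

Under that row/column assumption the three values agree with the `:recall`, `:precision`, and `:percentage-correct` entries reported by `classifier-evaluate`.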

Saving and restoring (trained) classifiers:

    user> (serialize-to-file classifier "my-classifier.bin")
    "my-classifier.bin"
    user> (def classifier2 (deserialize-from-file "my-classifier.bin"))
    #'user/classifier2
    user> (classifier-classify classifier2 instance)
    :Iris-setosa

Text document handling:

    user> (def docs [{:id 10
                      :title "Document title 1"
                      :fulltext "This is the fulltext..."
                      :has-class? false}
                     {:id 11
                      :title "Another document title"
                      :fulltext "Some more \"fulltext\"; rabbit artificial machine bananas"
                      :has-class? true}])
    #'user/docs
    user> (docs-to-dataset docs "bananas-model" "my-models" :stemmer true :lowercase false)
    #<Instances @relation 'docs-weka.filters.unsupervised.attribute.StringToWordVector...'
    @attribute class {no,yes}
    @attribute title-1 numeric
    @attribute title-Another numeric
    @attribute title-Document numeric
    @attribute title-document numeric
    @attribute title-titl numeric
    @attribute fulltext-Some numeric
    @attribute fulltext-This numeric
    @attribute fulltext-artifici numeric
    @attribute fulltext-banana numeric
    @attribute fulltext-fulltext numeric
    @attribute fulltext-is numeric
    @attribute fulltext-machin numeric
    @attribute fulltext-more numeric
    @attribute fulltext-rabbit numeric
    @attribute fulltext-the numeric
    @data
    {0 yes,1 0.480453,3 0.480453,7 0.480453,11 0.480453,15 0.480453}
    {2 0.480453,4 0.480453,6 0.480453,8 0.480453,9 0.480453,12 0.480453,13 0.480453,14 0.480453}>
    user>

Words appearing in the dataset will only be those appearing in the documents (or a subset; by default, the most common 1000 words). This presents a problem when new documents are loaded and used in a classifier trained on other documents. The classifier will not know how to handle word attributes that were not present in the training set.
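
To make the problem concrete, here is a minimal, library-independent sketch in plain Clojure. It is not clj-ml's API: `tokenize`, `vocabulary`, and `vectorize` are hypothetical helpers, and the sketch uses raw word counts with no stemming or weighting. The point is only that a model trained on a fixed vocabulary has one column per training word, so words seen for the first time at test time have no column to land in.

```clojure
(require '[clojure.string :as str])

(defn tokenize
  "Split text on whitespace (hypothetical helper; no stemming or lowercasing)."
  [text]
  (remove str/blank? (str/split text #"\s+")))

(defn vocabulary
  "Sorted vector of the distinct words found in the training texts."
  [texts]
  (vec (sort (distinct (mapcat tokenize texts)))))

(defn vectorize
  "Count each vocabulary word in text. Words outside the vocabulary are ignored."
  [vocab text]
  (let [counts (frequencies (tokenize text))]
    (mapv #(get counts % 0) vocab)))

;; Vocabulary is fixed by the training documents only.
(def train-vocab (vocabulary ["rabbit machine" "machine bananas"]))

;; A new document: "baz" and "quux" were never seen in training,
;; so only "rabbit" contributes to the resulting vector.
(vectorize train-vocab "baz rabbit quux")
```

Vectorizing test documents against the training vocabulary keeps the attribute layout identical for both sets, which is what a trained classifier needs.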

The docs-to-dataset function provides the ability to save the training documents dataset and "filter" the testing documents through this dataset to ensure the same word attributes are extracted for both sets. The following example shows that the words "foo, bar, baz, quux" are ignored in the new (testing) documents, and all the original attributes in the training dataset are retained.

    user> (docs-to-dataset docs "Topic" "Sports" 1 "/tmp" :stemmer true :lowercase false :training true)
    #<Instances @relation 'docs-weka.filters.unsupervised.attribute.StringToWordVector...'
    @attribute class {no,yes}
    @attribute title-1 numeric
    @attribute title-Another numeric
    @attribute title-Document numeric
    @attribute title-document numeric
    @attribute title-titl numeric
    @attribute fulltext-Some numeric
    @attribute fulltext-This numeric
    @attribute fulltext-artifici numeric
    @attribute fulltext-banana numeric
    @attribute fulltext-fulltext numeric
    @attribute fulltext-is numeric
    @attribute fulltext-machin numeric
    @attribute fulltext-more numeric
    @attribute fulltext-rabbit numeric
    @attribute fulltext-the numeric
    @data
    {2 0.480453,4 0.480453,6 0.480453,8 0.480453,9 0.480453,12 0.480453,13 0.480453,14 0.480453}
    {0 yes,1 0.480453,3 0.480453,7 0.480453,11 0.480453,15 0.480453}>
    user> (def docs2 [{:title "Document title 1 foo bar"
                       :fulltext "baz rabbit quux"
                       :terms {"Topic" ["Sports"]}}])
    #'user/docs2
    user> (docs-to-dataset docs2 "Topic" "Sports" 1 "/tmp" :stemmer true :lowercase false :testing true)
    #<Instances @relation 'docs-weka.filters.unsupervised.attribute.StringToWordVector...'
    @attribute class {no,yes}
    @attribute title-1 numeric
    @attribute title-Another numeric
    @attribute title-Document numeric
    @attribute title-document numeric
    @attribute title-titl numeric
    @attribute fulltext-Some numeric
    @attribute fulltext-This numeric
    @attribute fulltext-artifici numeric
    @attribute fulltext-banana numeric
    @attribute fulltext-fulltext numeric
    @attribute fulltext-is numeric
    @attribute fulltext-machin numeric
    @attribute fulltext-more numeric
    @attribute fulltext-rabbit numeric
    @attribute fulltext-the numeric
    @data
    {0 yes,1 0.480453,3 0.480453,14 0.480453}>
    user>

Using clusterers

    user> (use 'clj-ml.clusterers)
    nil
    user> (def ds (-> (load-instances :arff "file:///home/josh/git/clj-ml/iris.arff")
                      (dataset-remove-attribute-at 4)))
    #'user/ds
    user> ds
    #<Instances @relation iris
    @attribute sepallength numeric
    @attribute sepalwidth numeric
    @attribute petallength numeric
    @attribute petalwidth numeric
    @data
    5.1,3.5,1.4,0.2
    4.9,3,1.4,0.2
    4.7,3.2,1.3,0.2
    4.6,3.1,1.5,0.2
    5,3.6,1.4,0.2
    5.4,3.9,1.7,0.4
    4.6,3.4,1.4,0.3
    ...
    user> (def clusterer (make-clusterer :k-means {:number-clusters 3}))
    #'user/clusterer
    user> (clusterer-build clusterer ds)
    nil
    user> clusterer
    #<SimpleKMeans
    kMeans
    ======
    Number of iterations: 6
    Within cluster sum of squared errors: 6.998114004826762
    Missing values globally replaced with mean/mode
    Cluster centroids:
                               Cluster#
    Attribute      Full Data          0          1          2
                       (150)       (61)       (50)       (39)
    =========================================================
    sepallength       5.8433     5.8885      5.006     6.8462
    sepalwidth         3.054     2.7377      3.418     3.0821
    petallength       3.7587     4.3967      1.464     5.7026
    petalwidth        1.1987      1.418      0.244     2.0795
    >
    user> (def clustered-ds (clusterer-cluster clusterer ds))
    #'user/clustered-ds
    user> clustered-ds
    #<ClojureInstances @relation 'clustered iris'
    @attribute sepallength numeric
    @attribute sepalwidth numeric
    @attribute petallength numeric
    @attribute petalwidth numeric
    @attribute class {0,1,2}
    @data
    5.1,3.5,1.4,0.2,1
    4.9,3,1.4,0.2,1
    4.7,3.2,1.3,0.2,1
    4.6,3.1,1.5,0.2,1
    5,3.6,1.4,0.2,1
    5.4,3.9,1.7,0.4,1
    4.6,3.4,1.4,0.3,1
    5,3.4,1.5,0.2,1
    4.4,2.9,1.4,0.2,1
    ...

Example: Home price prediction

https://www.ibm.com/developerworks/library/os-weka1/

    user> (def homes (make-dataset "homes"
                                   [:house-size :lot-size :bedrooms :granite :bathroom :sellingPrice]
                                   [[3529, 9191, 6, 0, 0, 205000]
                                    [3247, 10061, 5, 1, 1, 224900]
                                    [4032, 10150, 5, 0, 1, 197900]
                                    [2397, 14156, 4, 1, 0, 189900]
                                    [2200, 9600, 4, 0, 1, 195000]
                                    [3536, 19994, 6, 1, 1, 325000]
                                    [2983, 9365, 5, 0, 1, 230000]]))
    #'user/homes
    user> (def homes (dataset-set-class homes :sellingPrice))
    #'user/homes
    user> homes
    #<ClojureInstances @relation homes
    @attribute house-size numeric
    @attribute lot-size numeric
    @attribute bedrooms numeric
    @attribute granite numeric
    @attribute bathroom numeric
    @attribute sellingPrice numeric
    @data
    3529,9191,6,0,0,205000
    3247,10061,5,1,1,224900
    4032,10150,5,0,1,197900
    2397,14156,4,1,0,189900
    2200,9600,4,0,1,195000
    3536,19994,6,1,1,325000
    2983,9365,5,0,1,230000>
    user> (def reg (classifier-train (make-classifier :regression :linear) homes))
    #'user/reg
    user> reg
    #<LinearRegression
    Linear Regression Model
    sellingPrice =
        -26.6882 * house-size +
          7.0551 * lot-size +
      43166.0767 * bedrooms +
      42292.0901 * bathroom +
     -21661.1208>
    user>
    user> (classifier-predict-numeric reg (make-instance homes [3198, 9669, 5, 1, 1, nil]))
    219328.35717359098

Example: Predicting survival on the Titanic

https://www.kaggle.com/c/titanic-gettingStarted

First globally replace all double quoted strings ""foo"" with backslash quoted strings: \"foo\". Weka does not handle the former.
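
One possible way to do that replacement from the REPL is sketched below in plain Clojure. The function name `escape-doubled-quotes` and the file names in the usage comment are hypothetical, not part of clj-ml or the Kaggle data. Note the assumption that every `""` in the file is an escaped quote inside a quoted field: a genuinely empty quoted field written as `""` would be rewritten too.

```clojure
(require '[clojure.string :as str])

;; Replace every doubled double quote with a backslash-escaped quote.
;; With a string match and a string replacement, str/replace is literal
;; (no regex interpretation of either argument).
(defn escape-doubled-quotes
  [s]
  (str/replace s "\"\"" "\\\""))

;; Usage sketch (hypothetical file names):
;; (spit "train-fixed.csv" (escape-doubled-quotes (slurp "train.csv")))
```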
