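KeystoneML is a software framework written in Scala, developed at UC Berkeley's AMPLab. Built on Apache Spark, it aims to simplify the construction of large-scale, end-to-end machine learning pipelines.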
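To make the pipeline idea concrete, here is a minimal Spark-free sketch of how transformers and estimators can be chained. Everything in it (`Transformer`, `Estimator`, `andThen`, `andThenFit`, `TopTokens`) is a hypothetical stand-in written for illustration and is not KeystoneML's actual API; it only assumes that a pipeline stage is either a plain function or something that must first be fitted on training data.

```scala
// Hypothetical sketch: these types are NOT KeystoneML's classes.
trait Transformer[A, B] { self =>
  def apply(in: A): B

  // Chain another transformer after this one.
  def andThen[C](next: Transformer[B, C]): Transformer[A, C] =
    new Transformer[A, C] {
      def apply(in: A): C = next(self(in))
    }

  // Chain an estimator: fit it on this stage's output for the training
  // data, then append the fitted transformer to the chain.
  def andThenFit[C](est: Estimator[B, C], training: Seq[A]): Transformer[A, C] =
    andThen(est.fit(training.map(self.apply)))
}

object Transformer {
  // Lift an ordinary function into a pipeline stage.
  def apply[A, B](f: A => B): Transformer[A, B] = new Transformer[A, B] {
    def apply(in: A): B = f(in)
  }
}

trait Estimator[A, B] {
  def fit(data: Seq[A]): Transformer[A, B]
}

// Toy estimator: learns the k most frequent tokens in the training data
// and returns a transformer that keeps only those tokens.
class TopTokens(k: Int) extends Estimator[Seq[String], Seq[String]] {
  def fit(data: Seq[Seq[String]]): Transformer[Seq[String], Seq[String]] = {
    val keep = data.flatten.groupBy(identity).toSeq
      .sortBy { case (token, occurrences) => (-occurrences.size, token) }
      .take(k).map(_._1).toSet
    Transformer[Seq[String], Seq[String]](tokens => tokens.filter(keep))
  }
}

object PipelineSketch {
  val training: Seq[String] =
    Seq("  Spark is fast ", "Spark pipelines", "fast pipelines on Spark")

  // Stateless stages: trim, lower-case, split on whitespace.
  val tokenize: Transformer[String, Seq[String]] =
    Transformer[String, String](_.trim)
      .andThen(Transformer[String, String](_.toLowerCase))
      .andThen(Transformer[String, Seq[String]](_.split("\\s+").toList))

  // Fitted stage: keep the 2 most frequent training tokens.
  val pipeline: Transformer[String, Seq[String]] =
    tokenize.andThenFit(new TopTokens(2), training)

  def main(args: Array[String]): Unit =
    println(pipeline("Spark IS fast")) // List(spark, fast)
}
```

The design point is that stateless steps and fitted steps share one interface once fitting is done, so the resulting pipeline can be applied to new data as a single function.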
Example code:
val trainData = NewsGroupsDataLoader(sc, trainingDir)
val predictor = Trim.then(LowerCase())
    .then(Tokenizer())
    .then(new NGramsFeaturizer(1 to conf.nGrams))
    .then(TermFrequency(x => 1))
    .thenEstimator(CommonSparseFeatures(conf.commonFeatures))
    .fit(trainData.data)
    .thenLabelEstimator(NaiveBayesEstimator(numClasses))
    .fit(trainData.data, trainData.labels)
    .then(MaxClassifier)

Testing:
val test = NewsGroupsDataLoader(sc, testingDir)
val predictions = predictor(test.data)
val eval = MulticlassClassifierEvaluator(predictions, test.labels, numClasses)
println(eval.summary(newsgroupsData.classes))

Output:
Avg Accuracy: 0.980
Macro Precision:0.816
Macro Recall: 0.797
Macro F1: 0.797
Total Accuracy: 0.804
Micro Precision:0.804
Micro Recall: 0.804
Micro F1: 0.804
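The gap between "Avg Accuracy" and "Total Accuracy" in this output can be surprising. The sketch below computes these summary numbers from a confusion matrix, assuming the usual definitions: average accuracy is the mean one-vs-rest accuracy per class, total accuracy is the fraction of correctly classified examples, macro metrics average per-class values, and micro metrics pool the counts first. The names `MulticlassMetricsSketch`, `Summary` and `summarize` are hypothetical, and this is not KeystoneML's evaluator code.

```scala
// Hypothetical sketch of common multiclass metrics; not KeystoneML's code.
object MulticlassMetricsSketch {
  final case class Summary(
      avgAccuracy: Double, totalAccuracy: Double,
      macroPrecision: Double, macroRecall: Double,
      microPrecision: Double, microRecall: Double)

  // confusion(i)(j) = number of examples of true class i predicted as class j.
  def summarize(confusion: Array[Array[Long]]): Summary = {
    val n = confusion.length
    val total = confusion.map(_.sum).sum.toDouble
    val tp = (0 until n).map(i => confusion(i)(i).toDouble)
    val fn = (0 until n).map(i => confusion(i).sum - tp(i))         // row sum minus diagonal
    val fp = (0 until n).map(i => confusion.map(_(i)).sum - tp(i))  // column sum minus diagonal
    val tn = (0 until n).map(i => total - tp(i) - fn(i) - fp(i))

    def mean(xs: Seq[Double]): Double = xs.sum / xs.size
    // Guard against classes that never occur or are never predicted.
    def ratio(num: Double, den: Double): Double = if (den == 0) 0.0 else num / den

    Summary(
      avgAccuracy = mean((0 until n).map(i => (tp(i) + tn(i)) / total)),
      totalAccuracy = tp.sum / total,
      macroPrecision = mean((0 until n).map(i => ratio(tp(i), tp(i) + fp(i)))),
      macroRecall = mean((0 until n).map(i => ratio(tp(i), tp(i) + fn(i)))),
      microPrecision = ratio(tp.sum, tp.sum + fp.sum),
      microRecall = ratio(tp.sum, tp.sum + fn.sum))
  }

  def main(args: Array[String]): Unit = {
    // Made-up 3-class confusion matrix, for illustration only.
    val s = summarize(Array(Array(8L, 1L, 1L), Array(2L, 6L, 2L), Array(0L, 1L, 9L)))
    println(s)
  }
}
```

Under these definitions, every misclassified example counts as one false positive and one false negative, so for single-label data micro precision, micro recall and total accuracy coincide, as they do in the output above (0.804). Average accuracy then equals 1 - 2 × (1 - total accuracy) / number of classes; with 20 newsgroup classes that gives 1 - 2 × 0.196 / 20 ≈ 0.980, which is consistent with the reported value.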