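stringmetric is a Scala library of string similarity metrics and algorithms (for example: Dice/Sorensen, Hamming, Jaccard, Jaro, Jaro-Winkler, Levenshtein, Metaphone, N-Gram, NYSIIS, Overlap, Ratcliff/Obershelp, RefinedNYSIIS, RefinedSoundex, Soundex, WeightedLevenshtein).
The library provides tools for approximate string matching: metrics that measure the similarity of, or distance between, two strings, and phonetic algorithms that index and compare words by how they sound. In addition to the core library, each metric and algorithm has a command line interface.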
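To make "distance between strings" concrete, here is a minimal, self-contained sketch of two of the listed metrics, Hamming and Levenshtein, in plain Scala. It does not use stringmetric and is not the library's implementation; the object name `MetricSketch` and both method names are hypothetical and chosen only for illustration.

```scala
object MetricSketch {
  // Hamming distance: number of positions at which two equal-length strings differ.
  // Returns None when the lengths differ, because the distance is undefined there.
  def hamming(a: String, b: String): Option[Int] =
    if (a.length != b.length) None
    else Some(a.zip(b).count { case (x, y) => x != y })

  // Levenshtein distance: minimum number of single-character insertions,
  // deletions, and substitutions needed to turn `a` into `b`.
  def levenshtein(a: String, b: String): Int = {
    // prev(j) is the distance between the already processed prefix of `a` and b.take(j).
    var prev = Array.range(0, b.length + 1)
    for (i <- 1 to a.length) {
      val curr = new Array[Int](b.length + 1)
      curr(0) = i // deleting all i characters of the prefix of `a`
      for (j <- 1 to b.length) {
        val cost = if (a(i - 1) == b(j - 1)) 0 else 1
        curr(j) = math.min(
          math.min(curr(j - 1) + 1, prev(j) + 1), // insertion, deletion
          prev(j - 1) + cost                      // substitution or match
        )
      }
      prev = curr
    }
    prev(b.length)
  }
}
```

For instance, `MetricSketch.levenshtein("kitten", "sitting")` evaluates to 3, and `MetricSketch.hamming("toned", "roses")` to `Some(3)`.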
Requirements: Scala 2.10+
Documentation: Scaladoc
Issues: Enhancements, Questions, Bugs
Versioning: Semantic Versioning v2.0
Dependency

SBT:
libraryDependencies += "com.rockymadden.stringmetric" %% "stringmetric-core" % "0.27.4"

Gradle:
compile 'com.rockymadden.stringmetric:stringmetric-core_2.10:0.27.4'

Maven:
<dependency>
  <groupId>com.rockymadden.stringmetric</groupId>
  <artifactId>stringmetric-core_2.10</artifactId>
  <version>0.27.4</version>
</dependency>

Building the CLIs

$ git clone https://github.com/rockymadden/stringmetric.git
$ cd stringmetric
$ sbt clean package
$ ./project/build.sh
$ ./target/cli/jarometric abc xyz

Using the CLIs

Get help:
$ metaphonemetric --help
Compares two strings to determine if they are phonetically similarly, per the Metaphone algorithm.
Syntax: metaphonemetric [Options] string1 string2...
Options:
  -h, --help  Outputs description, syntax, and options.

Get the comparison value of a metric:
$ jarowinklermetric dog dawg
0.75

Get the representation value of a phonetic algorithm:
$ metaphonealgorithm dog
tk
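
The jarowinklermetric call above prints 0.75 for dog and dawg. As a rough illustration of where that number can come from, the following self-contained sketch implements the textbook Jaro-Winkler similarity in plain Scala. It is not stringmetric's own code: the object `JaroWinklerSketch`, its method names, the prefix scale `p = 0.1`, the prefix cap of 4, and the choice to return 0.0 for empty input are all assumptions taken from the standard description of the metric.

```scala
object JaroWinklerSketch {
  // Jaro similarity: based on the number of characters that match within a
  // sliding window, and the number of transpositions among those matches.
  // Assumption: returns 0.0 when either string is empty.
  def jaro(a: String, b: String): Double =
    if (a.isEmpty || b.isEmpty) 0.0
    else {
      val window = math.max(0, math.max(a.length, b.length) / 2 - 1)
      val aMatched = new Array[Boolean](a.length)
      val bMatched = new Array[Boolean](b.length)
      var matches = 0
      for (i <- a.indices) {
        val lo = math.max(0, i - window)
        val hi = math.min(b.length - 1, i + window)
        // Pair a(i) with the first unmatched, equal character of b in the window.
        (lo to hi).find(j => !bMatched(j) && a(i) == b(j)).foreach { j =>
          aMatched(i) = true
          bMatched(j) = true
          matches += 1
        }
      }
      if (matches == 0) 0.0
      else {
        // Matched characters, in order of appearance in each string.
        val as = a.indices.filter(i => aMatched(i)).map(i => a(i))
        val bs = b.indices.filter(j => bMatched(j)).map(j => b(j))
        val transpositions = as.zip(bs).count { case (x, y) => x != y } / 2
        val m = matches.toDouble
        (m / a.length + m / b.length + (m - transpositions) / m) / 3.0
      }
    }

  // Jaro-Winkler: boosts the Jaro score for strings sharing a common prefix
  // (at most 4 characters), scaled by p.
  def jaroWinkler(a: String, b: String, p: Double = 0.1): Double = {
    val j = jaro(a, b)
    val prefix = a.zip(b).takeWhile { case (x, y) => x == y }.length.min(4)
    j + prefix * p * (1.0 - j)
  }
}
```

Working through dog and dawg by hand: two characters match (d and g) with no transpositions, so the Jaro score is (2/3 + 2/4 + 2/2) / 3 = 13/18; the shared prefix is one character, so the Winkler adjustment adds 0.1 × 5/18, giving 0.75 up to floating-point rounding. That agrees with the CLI output shown, although the library may differ from this sketch in edge cases.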