Prose is a text-processing library for Go (primarily for English text). It supports tokenization, part-of-speech tagging, named-entity extraction, and more.
Installation
$ go get github.com/jdkato/prose/...
Usage
Tokenizing
Tagging
Transforming
Summarizing
Chunking
License
Tokenizing
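Word, sentence, and regexp tokenizers are available. Every tokenizer implements the same interface, which makes it easy to customize tokenization in other parts of the library.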
package main

import (
	"fmt"

	"github.com/jdkato/prose/tokenize"
)

func main() {
	text := "They'll save and invest more."
	tokenizer := tokenize.NewTreebankWordTokenizer()
	for _, word := range tokenizer.Tokenize(text) {
		// [They 'll save and invest more .]
		fmt.Println(word)
	}
}