Are there APIs for text analysis/mining in Java? [closed]

迷失自我 2021-01-31 18:18

5 answers

执笔经年 (original poster)

2021-01-31 18:57

For example, you might use some classes from the standard library package java.text, or use StreamTokenizer (which you can customize to your requirements). But as you know, text data from internet sources usually contains many spelling mistakes, and to get better results you need something like a fuzzy tokenizer. java.text and the other standard utilities have too limited capabilities in that context.
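A minimal sketch of what the standard library gives you out of the box, using java.text.BreakIterator (the class name BreakIteratorDemo and the sample sentence are made up for illustration). It splits text on word boundaries, but the misspellings pass through untouched, which is exactly the limitation described above:

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class BreakIteratorDemo {
    // Splits text into word tokens with the JDK's locale-aware word BreakIterator.
    static List<String> words(String text, Locale locale) {
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(text);
        List<String> out = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String seg = text.substring(start, end);
            // A word instance also returns whitespace and punctuation segments; skip them.
            if (seg.codePoints().anyMatch(Character::isLetterOrDigit)) {
                out.add(seg);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Typos like "Thsi" and "smple" come back as ordinary tokens; nothing is corrected.
        System.out.println(words("Thsi is a smple sentense, with typos.", Locale.ENGLISH));
    }
}
```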

So I'd advise you to use regular expressions (java.util.regex) and build your own tokenizer tailored to your needs.
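A rough sketch of such a regex-based tokenizer. The token rules (numbers with an optional decimal part, words with inner apostrophes or hyphens, single symbols) and the class and method names are my own assumptions for illustration; you would adjust the pattern to your data:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexTokenizer {
    // Hypothetical token rules: a number (optional decimal part), a word (letters,
    // optionally joined by ' or -), or any other single non-space character.
    // Branches are tried left to right; whitespace is skipped because find() scans past it.
    private static final Pattern TOKEN = Pattern.compile(
            "(?<num>\\d+(?:[.,]\\d+)?)"
            + "|(?<word>\\p{L}+(?:['-]\\p{L}+)*)"
            + "|(?<sym>\\S)");

    static final class Token {
        final String type;
        final String text;
        Token(String type, String text) { this.type = type; this.text = text; }
        @Override public String toString() { return type + ":" + text; }
    }

    static List<Token> tokenize(String input) {
        List<Token> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            if (m.group("num") != null) {
                tokens.add(new Token("NUM", m.group()));
            } else if (m.group("word") != null) {
                // Lower-case words so later steps (e.g. a lookup of known misspellings)
                // don't have to care about capitalization.
                tokens.add(new Token("WORD", m.group().toLowerCase(Locale.ROOT)));
            } else {
                tokens.add(new Token("SYM", m.group()));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("It's 3.5 km - well-known!"));
    }
}
```

The advantage over StreamTokenizer is that each token class is one readable pattern branch, so adding a rule (URLs, emoticons, hashtags) is just another named group.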

P.S. Depending on your needs, you might create a state-machine parser for recognizing templated parts of raw text. You can see a simple state-machine recognizer in the picture below (you can build a more advanced parser that recognizes much more complex templates in text).
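Since the picture isn't reproduced here, this is only a sketch of the idea under an assumed template: a hand-written state machine that finds clock times like 18:57 or 9:05 in raw text. The template, the states and all names are hypothetical; a real parser would define its own states and transitions:

```java
import java.util.ArrayList;
import java.util.List;

class TimeRecognizer {
    // States for an assumed template "D:DD" or "DD:DD" (e.g. 9:05, 18:57).
    enum State { START, HOUR1, HOUR2, COLON, MIN1, ACCEPT, FAIL }

    private static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // Transition function: the next state after reading character c in state s.
    static State step(State s, char c) {
        switch (s) {
            case START: return isDigit(c) ? State.HOUR1 : State.FAIL;
            case HOUR1: return isDigit(c) ? State.HOUR2 : (c == ':' ? State.COLON : State.FAIL);
            case HOUR2: return c == ':' ? State.COLON : State.FAIL;
            case COLON: return isDigit(c) ? State.MIN1 : State.FAIL;
            case MIN1:  return isDigit(c) ? State.ACCEPT : State.FAIL;
            default:    return State.FAIL; // ACCEPT and FAIL have no outgoing transitions
        }
    }

    // Runs the machine from each candidate position and collects accepted substrings,
    // skipping matches glued to neighbouring digits (e.g. inside "123:456").
    static List<String> find(String text) {
        List<String> found = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            boolean leftOk = i == 0 || !isDigit(text.charAt(i - 1));
            State s = State.START;
            int j = i;
            while (leftOk && j < text.length() && s != State.ACCEPT && s != State.FAIL) {
                s = step(s, text.charAt(j++));
            }
            boolean rightOk = j == text.length() || !isDigit(text.charAt(j));
            if (s == State.ACCEPT && rightOk) {
                found.add(text.substring(i, j));
                i = j; // continue scanning after the match
            } else {
                i++;
            }
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(find("Posted 2021-01-31 18:57, edited at 9:05; not 123:456"));
    }
}
```

Writing the transitions out explicitly, rather than as one regex, makes it easier to grow the machine later, for example by adding states for an optional "am"/"pm" suffix.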
