Extract words out of a text file

春和景丽 2020-12-28 20:37

Let's say you have a text file like this one: http://www.gutenberg.org/files/17921/17921-8.txt

Does anyone have a good algorithm, or open-source code, to extract wor

5 answers

野趣味 (original poster)

2020-12-28 21:28
This sounds like the right job for regular expressions. Here is some Java code to give you an idea, in case you don't know how to start:
```
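import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordExtractor {
    public static void main(String[] args) {
        String input = "Input text, with words, punctuation, etc. Well, it's rather short.";
        // One or more word characters or apostrophes in a row
        Pattern p = Pattern.compile("[\\w']+");
        Matcher m = p.matcher(input);

        // Each successful find() advances to the next word
        while (m.find()) {
            System.out.println(m.group());
        }
    }
}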
```
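The pattern [\w']+ matches one or more consecutive word characters (letters, digits, and underscores) or apostrophes, so each match is a single word and punctuation and whitespace are skipped. Running the example prints the string word by word. Have a look at the Java Pattern class documentation to read more.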
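To apply the same idea to a whole file, something along these lines might work. It is only a sketch under a few assumptions: the class name WordExtractorSketch and the helper extractWords are made up for illustration, the file is assumed to have been downloaded locally as 17921-8.txt, and the "-8" suffix is assumed to mean an ISO-8859-1 encoding. By default \w in Java only covers ASCII letters, digits, and the underscore, so the sketch adds the Pattern.UNICODE_CHARACTER_CLASS flag to keep accented words in one piece.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
final class WordExtractorSketch {
    // Same pattern as above; the flag makes \w match Unicode letters and digits too.
    private static final Pattern WORD =
            Pattern.compile("[\\w']+", Pattern.UNICODE_CHARACTER_CLASS);

    // Returns every match of the word pattern, in order of appearance.
    static List<String> extractWords(String text) {
        List<String> words = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            words.add(m.group());
        }
        return words;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: a local copy of the file, encoded as ISO-8859-1.
        Path file = Path.of(args.length > 0 ? args[0] : "17921-8.txt");
        String text = Files.readString(file, StandardCharsets.ISO_8859_1);
        for (String word : extractWords(text)) {
            System.out.println(word);
        }
    }
}
```

Files.readString and Path.of require Java 11 or later. Note that hyphenated words are split at the hyphen, and a leading or trailing apostrophe used as a quotation mark stays attached to the word, so the pattern may need tuning for a particular text.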