Extract words out of a text file

春和景丽 2020-12-28 20:37

Let's say you have a text file like this one: http://www.gutenberg.org/files/17921/17921-8.txt

Does anyone have a good algorithm, or open-source code, to extract wor

5 answers
  •  野趣味
    野趣味 (OP)
    2020-12-28 21:28

    This sounds like the right job for regular expressions. Here is some Java code to give you an idea, in case you don't know how to start:

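    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    String input = "Input text, with words, punctuation, etc. Well, it's rather short.";
    // Match runs of word characters and apostrophes.
    Pattern p = Pattern.compile("[\\w']+");
    Matcher m = p.matcher(input);

    // Print each matched word on its own line.
    while (m.find()) {
        System.out.println(m.group());
    }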

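    The pattern [\w']+ matches one or more consecutive word characters or apostrophes, so a contraction such as "it's" stays in one piece. The example string is printed one word per line. Note that by default \w in Java covers only ASCII letters, digits, and the underscore. Have a look at the Java Pattern class documentation to read more.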
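
To apply the same idea to a whole file, something along these lines might work. This is only a sketch: the class name `WordExtractor` and the method `extractWords` are made up for illustration, and the ISO-8859-1 encoding is an assumption about the linked file (check it before relying on it). The `Pattern.UNICODE_CHARACTER_CLASS` flag is added so that `\w` also matches accented letters, which the default ASCII-only `\w` would split on.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
public class WordExtractor {
    // Same pattern as above; the flag makes \w match non-ASCII letters too.
    private static final Pattern WORD =
            Pattern.compile("[\\w']+", Pattern.UNICODE_CHARACTER_CLASS);

    // Returns every run of word characters and apostrophes, in order.
    static List<String> extractWords(String text) {
        List<String> words = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            words.add(m.group());
        }
        return words;
    }

    public static void main(String[] args) throws IOException {
        String text;
        if (args.length > 0) {
            // Assumption: the file is ISO-8859-1 encoded; adjust if it is not.
            byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
            text = new String(bytes, StandardCharsets.ISO_8859_1);
        } else {
            // Fall back to the sample sentence from the answer.
            text = "Input text, with words, punctuation, etc. Well, it's rather short.";
        }
        for (String word : extractWords(text)) {
            System.out.println(word);
        }
    }
}
```

After downloading the text, you could run it as `java WordExtractor 17921-8.txt`. Keep in mind that the pattern treats every apostrophe as part of a word, so single quotation marks around a word stay attached to it; you may want to trim those afterwards.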
