Question
I am using Java 8 and OpenNLP, and I am trying to extract all nouns from a sentence.
I have tried this example, but it extracts whole noun phrases ("NP"). Does anyone know how I can extract just the individual nouns?
Thanks
Answer 1:
What have you tried so far? I haven't looked in much detail at the example you link to, but I'm pretty sure you could get where you want by modifying it. In any case, it's not very difficult: tokenize the sentence, run the POS tagger over the tokens, and keep the tokens that are tagged as nouns:
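import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import opennlp.tools.postag.POSModel;
import opennlp.tools.postag.POSTaggerME;
import opennlp.tools.tokenize.SimpleTokenizer;

// Load the POS model; try-with-resources closes the stream afterwards.
try (InputStream modelIn = new FileInputStream("<location to your tagger model here>")) {
    POSModel posModel = new POSModel(modelIn);
    POSTaggerME tagger = new POSTaggerME(posModel);
    // The SimpleTokenizer constructor is deprecated; use the shared INSTANCE.
    SimpleTokenizer tokenizer = SimpleTokenizer.INSTANCE;
    String[] tokens = tokenizer.tokenize("This is a sample sentence.");
    // tagged[i] is the part-of-speech tag assigned to tokens[i].
    String[] tagged = tagger.tag(tokens);
    for (int i = 0; i < tagged.length; i++) {
        // With Penn Treebank tags (English models), nouns are NN, NNS, NNP and NNPS.
        if (tagged[i].startsWith("NN")) {
            System.out.println(tokens[i]);
        }
    }
} catch (IOException e) {
    // BadRequestException comes from the surrounding application (e.g. JAX-RS);
    // substitute whatever exception type fits your code.
    throw new BadRequestException(e.getMessage());
}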
You can download the tagger models here: http://opennlp.sourceforge.net/models-1.5/
I should also say that creating a SimpleTokenizer with its constructor is deprecated; OpenNLP provides SimpleTokenizer.INSTANCE instead. You may want to look into a more sophisticated tokenizer, but in my experience the fancier ones from OpenNLP are also a lot slower (and in general unacceptably slow for plain tokenisation).
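If you want every kind of noun rather than only singular common nouns, the tag check can be factored into a small helper. The following is a minimal sketch, assuming Penn Treebank tags (NN, NNS, NNP, NNPS), which is what the English models use; other languages' models may use a different tag set. The class name NounFilter and the method extractNouns are hypothetical names chosen for illustration, and the tags in main are hand-written stand-ins for what tagger.tag(tokens) would return.

```java
import java.util.ArrayList;
import java.util.List;

final class NounFilter {

    // Hypothetical helper: returns the tokens whose tag starts with "NN".
    // Assumes tokens[i] and tags[i] refer to the same word, as with
    // the arrays returned by tokenizer.tokenize(...) and tagger.tag(tokens).
    static List<String> extractNouns(String[] tokens, String[] tags) {
        if (tokens.length != tags.length) {
            throw new IllegalArgumentException("tokens and tags must have the same length");
        }
        List<String> nouns = new ArrayList<>();
        for (int i = 0; i < tags.length; i++) {
            // Covers NN, NNS, NNP and NNPS in the Penn Treebank tag set.
            if (tags[i].startsWith("NN")) {
                nouns.add(tokens[i]);
            }
        }
        return nouns;
    }

    public static void main(String[] args) {
        // Hand-written tags for illustration only; a real tagger may differ.
        String[] tokens = {"This", "is", "a", "sample", "sentence", "."};
        String[] tags = {"DT", "VBZ", "DT", "NN", "NN", "."};
        System.out.println(extractNouns(tokens, tags)); // prints [sample, sentence]
    }
}
```

Keeping the filter separate from model loading also makes it easy to test without a model file on disk.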
Source: https://stackoverflow.com/questions/40603865/java-opennlp-extract-all-nouns-from-a-sentence