Question
I want to count the number of PP/NP/VP chunks in a text, but I don't know how to identify the PP, NP, and VP tags in the OpenNLP chunker. I have tried this code, but it's not working: the final count it prints is 0.
ChunkerModel cModel = new ChunkerModel(modelIn);
ChunkerME chunkerME = new ChunkerME(cModel);
String[] result = chunkerME.chunk(whitespaceTokenizerLine, tags);
HashMap<Integer, String> phraseLablesMap = new HashMap<Integer, String>();
Integer wordCount = 1;
Integer phLableCount = 0;
for (String phLable : result) {
    if (phLable.equals("O")) phLable += "-Punctuation"; // the last token (punctuation) is tagged "O"
    if (phLable.split("-")[0].equals("B")) phLableCount++;
    phLable = phLable.split("-")[1] + phLableCount;
    System.out.println(wordCount + ":" + phLable);
    phraseLablesMap.put(wordCount, phLable);
    wordCount++;
}
Integer noPP = 0;
Integer TotalPP = 0;
for (String PPattach : result) {
    if (PPattach.equals("PP")) {
        for (int i = 0; i < result.length; i++)
            TotalPP = noPP + 1;
    }
}
System.out.println(TotalPP);
Output:
1:NP1
2:VP2
3:NP3
4:NP3
5:VP4
6:PP5
7:NP6
8:NP6
9:NP6
10:NP6
11:PP7
12:NP8
13:NP8
14:NP8
15:PP9
16:NP10
17:NP10
18:PP11
19:NP12
20:NP12
21:VP13
22:VP13
23:NP14
24:NP14
25:PP15
26:NP16
27:NP16
28:Punctuation16
0
Answer 1:
The best way is to use Span objects: they have a getType() method that returns the chunk type.
See this post:
Grouping all Named entities in a Document
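For what it's worth, the question's second loop prints 0 because chunk() returns BIO-style tags such as "B-PP" and "I-PP", so a comparison against the bare string "PP" never matches. Below is a minimal, dependency-free sketch of the counting step that works directly on that tag array. The class name ChunkCounter and the method countChunks are hypothetical names chosen for illustration, and the sample tags are reconstructed from the numbered output in the question (the chunk number increases at each "B-" tag). With OpenNLP itself, the Span route would presumably go through ChunkerME.chunkAsSpans(tokens, tags) and count by Span.getType() instead.

```java
import java.util.Map;
import java.util.TreeMap;

public class ChunkCounter {

    /**
     * Counts chunks per type by counting chunk-initial "B-" tags.
     * "I-" tags (chunk continuations) and "O" (outside any chunk) are skipped.
     */
    static Map<String, Integer> countChunks(String[] chunkTags) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String tag : chunkTags) {
            if (tag.startsWith("B-")) {
                // "B-PP" -> "PP"; add 1 to that type's running total
                counts.merge(tag.substring(2), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Tag sequence reconstructed from the question's output (assumption, not real chunker output)
        String[] tags = {
            "B-NP", "B-VP", "B-NP", "I-NP", "B-VP", "B-PP", "B-NP", "I-NP", "I-NP", "I-NP",
            "B-PP", "B-NP", "I-NP", "I-NP", "B-PP", "B-NP", "I-NP", "B-PP", "B-NP", "I-NP",
            "B-VP", "I-VP", "B-NP", "I-NP", "B-PP", "B-NP", "I-NP", "O"
        };
        System.out.println(countChunks(tags)); // prints {NP=8, PP=5, VP=3}
    }
}
```

Counting only the "B-" tags means each multi-word chunk is counted once, which is what the running chunk number in the question's first loop already relies on.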
Source: https://stackoverflow.com/questions/17035913/how-to-identify-pp-tags-np-tags-vp-tags-in-opennlp-chunker