Question
I am trying to extract entities from a query using a custom OpenNLP NameFinder model.
The queries look like this:
result for roll number 1304510020.
result for roll-number 1304510020.
result for rollnumber 1304510020.
result of rollnumber 1304510020.
result of roll number 1304510020.
result of roll-number 1304510020.
roll number 1304510020 result.
rollnumber 1304510020 result.
roll-number 1304510020 result.
show result of roll number 1304510020.
show result of rollnumber 1304510020.
show result of roll-number 1304510020.
show my result for 1304510020.
result of 1304510020.
This is my training code:
package nlpParser;

import java.io.BufferedOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.NameSample;
import opennlp.tools.namefind.NameSampleDataStream;
import opennlp.tools.namefind.TokenNameFinderFactory;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.util.InputStreamFactory;
import opennlp.tools.util.ObjectStream;
import opennlp.tools.util.PlainTextByLineStream;
import opennlp.tools.util.TrainingParameters;

public class Trainer {
    // training data set
    static String trainingPath =
            "C:\\Users\\MujeebulHasan\\Desktop\\Project\\hbtu\\hbtuaiagent\\Source Code\\parser\\training\\";

    public static void main(String[] args) throws IOException {
        String[] entities = new String[]{"rollnumber","result"};
        String[] pathsOfTraingFile = new String[]{"rollnumber\\rollnumber.train","result\\result.train"};
        String[] pathsOfTrainedFile = new String[]{"rollnumber\\rollnumber.bin","result\\result.bin"};
        for(int i = 0; i < entities.length; i++){
            final int j = i;
            InputStreamFactory isf = new InputStreamFactory() {
                public InputStream createInputStream() throws IOException {
                    return new FileInputStream(trainingPath+pathsOfTraingFile[j]);
                }
            };
            Charset charset = Charset.forName("UTF-8");
            ObjectStream<String> lineStream = new PlainTextByLineStream(isf, charset);
            ObjectStream<NameSample> sampleStream = new NameSampleDataStream(lineStream);
            TokenNameFinderModel model;
            TokenNameFinderFactory nameFinderFactory = new TokenNameFinderFactory();
            try {
                model = NameFinderME.train("en", entities[i], sampleStream, TrainingParameters.defaultParams(),
                        nameFinderFactory);
            } finally {
                sampleStream.close();
            }
            BufferedOutputStream modelOut = null;
            try {
                modelOut = new BufferedOutputStream(new FileOutputStream(trainingPath+pathsOfTrainedFile[i]));
                model.serialize(modelOut);
            } finally {
                if (modelOut != null)
                    modelOut.close();
            }
        }
    }
}
rollnumber.train
result for roll number <START:rollnumber> 1304510020 <END> .
result for roll-number <START:rollnumber> 1304510020 <END> .
result for rollnumber <START:rollnumber> 1304510020 <END> .
result for roll <START:rollnumber> 1304510020 <END> .
result of rollnumber <START:rollnumber> 1304510020 <END> .
result of roll number <START:rollnumber> 1304510020 <END> .
result of roll-number <START:rollnumber> 1304510020 <END> .
result of roll <START:rollnumber> 1304510020 <END> .
roll number <START:rollnumber> 1304510020 <END> result.
rollnumber <START:rollnumber> 1304510020 <END> result.
roll-number <START:rollnumber> 1304510020 <END> result.
roll <START:rollnumber> 1304510020 <END> result.
show result of roll number <START:rollnumber> 1304510020 <END> .
show result of rollnumber <START:rollnumber> 1304510020 <END> .
show result of roll-number <START:rollnumber> 1304510020 <END> .
show result of roll <START:rollnumber> 1304510020 <END> .
show my result for <START:rollnumber> 1304510020 <END> .
result of <START:rollnumber> 1304510020 <END> .
result for <START:rollnumber> 1304510020 <END> .
what is my result for rollnumber <START:rollnumber> 1304510020 <END> .
what is my result of rollnumber <START:rollnumber> 1304510020 <END> .
what is my result for roll <START:rollnumber> 1304510020 <END> .
result.train
<START:result> result <END> for roll number 1304510020.
<START:result> result <END> for roll-number 1304510020.
<START:result> result <END> for rollnumber 1304510020.
<START:result> result <END> of rollnumber 1304510020.
<START:result> result <END> of roll number 1304510020.
<START:result> result <END> of roll-number 1304510020.
roll number 1304510020 <START:result> result <END> .
rollnumber 1304510020 <START:result> result <END> .
roll-number 1304510020 <START:result> result <END> .
show <START:result> result <END> of roll number 1304510020.
show <START:result> result <END> of rollnumber 1304510020.
show <START:result> result <END> of roll-number 1304510020.
show my <START:result> result <END> for 1304510020.
<START:result> result <END> of 1304510020.
I test it with this code:
package nlpParser;

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Scanner;
import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.util.Span;

public class GetEntities {
    public static void main(String[] args) throws IOException {
        Scanner sc = new Scanner(System.in);
        String query ="";
        GetEntities obj = new GetEntities();
        // Compare string contents, not references; stop on a blank line or end of input
        while(sc.hasNextLine() && !(query = sc.nextLine()).trim().isEmpty()){
            obj.parse(query);
        }
        sc.close();
    }

    public void parse(String query) throws IOException{
        String[] entities = new String[]{"rollnumber","result"};
        String[] pathsOfTrainedFile = new String[]{"rollnumber\\rollnumber.bin","result\\result.bin"};
        for(int i = 0 ; i < entities.length; i++){
            //Loading the NER model
            InputStream inputStream = new
                    FileInputStream("C:\\Users\\MujeebulHasan\\Desktop\\Project\\hbtu\\hbtuaiagent\\Source Code\\parser\\training\\"+pathsOfTrainedFile[i]);
            TokenNameFinderModel model = new TokenNameFinderModel(inputStream);
            //Instantiating the NameFinder class
            NameFinderME nameFinder = new NameFinderME(model);
            //Finding the names in the sentence
            System.out.println("Processing query... ");
            System.out.print("Query = "+query);
            query = query.replace(".", "");
            String[] sentence = query.split(" ");
            System.out.println();
            System.out.println("RESULT :");
            Span nameSpans[] = nameFinder.find(sentence);
            //Printing the spans of the names in the sentence
            for(Span s: nameSpans) {
                System.out.println(s.toString());
                System.out.println(sentence[s.getStart()]);
            }
        }
    }
}
It gives the following results, which are sometimes wrong:
result of roll number 1304510020
Processing query...
Query = result of roll number 1304510020
RESULT :
Processing query...
Query = result of roll number 1304510020
RESULT :
[0..1) result
result
show result for roll number 1304510020
Processing query...
Query = show result for roll number 1304510020
RESULT :
Processing query...
Query = show result for roll number 1304510020
RESULT :
[1..2) result
result
result for rollnumber 1304510020
Processing query...
Query = result for rollnumber 1304510020
RESULT :
[3..4) rollnumber
1304510020
Processing query...
Query = result for rollnumber 1304510020
RESULT :
[0..1) result
result
result 1304510020
Processing query...
Query = result 1304510020
RESULT :
Processing query...
Query = result 1304510020
RESULT :
[0..1) result
result
1304510020 result
Processing query...
Query = 1304510020 result
RESULT :
Processing query...
Query = 1304510020 result
RESULT :
[1..2) result
result
Answer 1:
This happens because of the size of your training data. According to the OpenNLP documentation, the training data should contain at least 15,000 sentences (one per line) to produce a model that performs well.
If you don't have enough data, you can simply use regular expressions in your case, which is a lot easier than all of this.
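As a rough illustration of the regex route, here is a minimal sketch in plain Java with no OpenNLP involved. The class name `RegexEntityExtractor` and its methods are made up for this example, and it assumes a roll number is always a standalone run of exactly 10 digits, like the 1304510020 in the question. Adjust the pattern if your roll numbers look different.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of OpenNLP): extracts entities from a query with regex.
class RegexEntityExtractor {
    // Assumption: a roll number is a standalone run of exactly 10 digits.
    private static final Pattern ROLL_NUMBER = Pattern.compile("\\b\\d{10}\\b");
    // The keyword "result" as a whole word, ignoring case.
    private static final Pattern RESULT =
            Pattern.compile("\\bresult\\b", Pattern.CASE_INSENSITIVE);

    // Returns the first roll number found in the query, if any.
    static Optional<String> findRollNumber(String query) {
        Matcher m = ROLL_NUMBER.matcher(query);
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }

    // Returns true when the query mentions the word "result".
    static boolean asksForResult(String query) {
        return RESULT.matcher(query).find();
    }

    public static void main(String[] args) {
        String query = "show result of roll-number 1304510020.";
        System.out.println("rollnumber = " + findRollNumber(query).orElse("(none)"));
        System.out.println("result requested = " + asksForResult(query));
    }
}
```

Because the pattern keys on the digits themselves, it does not matter whether the query says "roll number", "roll-number", "rollnumber", or nothing at all before the number.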
If you are willing to build a larger training data set, you can follow this, or again use regex to tag your very large corpus.
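To give an idea of what regex tagging could look like, the sketch below turns raw queries into lines in the same `<START:rollnumber> ... <END>` format as the rollnumber.train file in the question. `TrainingLineTagger` is a hypothetical name, and the 10-digit assumption for roll numbers is mine. It also splits sentence-final punctuation into its own token, since the name finder training format expects whitespace-separated tokens.

```java
import java.util.regex.Pattern;

// Hypothetical helper: converts raw queries into name finder training lines.
class TrainingLineTagger {
    // Assumption: roll numbers are standalone runs of exactly 10 digits.
    private static final Pattern ROLL_NUMBER = Pattern.compile("\\b(\\d{10})\\b");
    // Sentence-final punctuation that should become a separate token.
    private static final Pattern END_PUNCT = Pattern.compile("\\s*([.?!])\\s*$");

    static String tag(String query) {
        // Put a single space before the final punctuation mark.
        String spaced = END_PUNCT.matcher(query.trim()).replaceAll(" $1");
        // Wrap each roll number in whitespace-separated START/END tags.
        String tagged = ROLL_NUMBER.matcher(spaced)
                .replaceAll("<START:rollnumber> $1 <END>");
        // Collapse any repeated whitespace so tokens are separated by one space.
        return tagged.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        String[] rawQueries = {
            "result of roll-number 1304510020.",
            "roll number 1304510020 result",
            "show my result for 1304510020."
        };
        for (String q : rawQueries) {
            System.out.println(tag(q));
        }
    }
}
```

Running the tagger over a large file of real or generated queries and writing the output line by line would give you a training file without hand-annotating every sentence. The tags are only as good as the regex, so spot-check the result.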
Hope this helps!
Source: https://stackoverflow.com/questions/42039517/how-to-conduct-opennlp-training-for-custom-namefinder-model