Question
I tried to train a custom NER model using OpenNLP. When I pass a sentence to the model to predict entities, it only picks the first word of the sentence. I don't know where I am going wrong.
The training code is below:
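import java.io.BufferedOutputStream;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.NameSample;
import opennlp.tools.namefind.NameSampleDataStream;
import opennlp.tools.namefind.TokenNameFinderFactory;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.util.InputStreamFactory;
import opennlp.tools.util.ObjectStream;
import opennlp.tools.util.PlainTextByLineStream;
import opennlp.tools.util.TrainingParameters;

public class OpenNLPNER {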
public static void main(String[] args) {
train("en", "technology", "D:\\dl4j-examples-master\\dl4j-examples-master\\dl4j-examples\\src\\main\\java\\opennlpExamples\\src\\main\\resources\\technology.train", "D:\\dl4j-examples-master\\dl4j-examples-master\\dl4j-examples\\src\\main\\java\\opennlpExamples\\src\\main\\techno1.bin");
}
public static String train(String lang, String entity, InputStreamFactory inputStream, FileOutputStream modelStream) {
Charset charset = Charset.forName("UTF-8");
TokenNameFinderModel model = null;
ObjectStream<NameSample> sampleStream = null;
try {
ObjectStream<String> lineStream = new PlainTextByLineStream(inputStream, charset);
sampleStream = new NameSampleDataStream(lineStream);
TokenNameFinderFactory nameFinderFactory = new TokenNameFinderFactory();
// Use the method parameters instead of repeating the literals.
model = NameFinderME.train(lang, entity, sampleStream, TrainingParameters.defaultParams(),
nameFinderFactory);
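} catch (FileNotFoundException fio) {
fio.printStackTrace(); // report a missing training file instead of ignoring it
} catch (IOException io) {
io.printStackTrace();
} finally {
if (sampleStream != null) {
try {
sampleStream.close();
} catch (IOException io) {
io.printStackTrace();
}
}
}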
BufferedOutputStream modelOut = null;
try {
modelOut = new BufferedOutputStream(modelStream);
model.serialize(modelOut);
} catch (IOException io) {
} finally {
if (modelOut != null) {
try {
modelOut.close();
} catch (IOException io) {
}
}
}
return "Something goes wrong with training module.";
}
public static String train(String lang, String entity, String taggedCoprusFile,
String modelFile) {
try {
InputStreamFactory inputStream = new InputStreamFactory() {
public InputStream createInputStream() throws IOException {
// Open the file passed in as taggedCoprusFile; a fresh stream per call lets the data be re-read.
return new FileInputStream(taggedCoprusFile);
}
};
// InputStreamFactory temp= new InputStream("D:\\dl4j-examples-master\\dl4j-examples-master\\dl4j-examples\\src\\main\\java\\opennlpExamples\\src\\main\\resources\\en-ner-medical.train") ;
return train(lang, entity, inputStream,
new FileOutputStream(modelFile));
} catch (Exception e) {
e.printStackTrace();
}
return "Something goes wrong with training module.";
}
}
Next I load the saved model. When I pass a sentence to predict the output, it picks only the first word, and only if the first letter of that word is capitalized.
The code that loads the model and predicts is below:
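import java.io.FileInputStream;
import java.io.InputStream;
import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.tokenize.TokenizerME;
import opennlp.tools.tokenize.TokenizerModel;
import opennlp.tools.util.Span;

public class nameEntity {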
public static void main(String[] args) throws Exception {
InputStream modelIn = new FileInputStream( "D:/main/techno.bin");
InputStream tokenModelIn = new FileInputStream( "C:/openNLP/en-token.bin");
try {
TokenNameFinderModel model = new TokenNameFinderModel(modelIn);
NameFinderME nameFinder = new NameFinderME(model);
//Instantiating the NameFinder class
//nameFinder = new NameFinderME(model);
TokenizerModel tokenModel = new TokenizerModel(tokenModelIn);
//Instantiating the TokenizerME class
TokenizerME tokenizer = new TokenizerME(tokenModel);
//Getting the sentence in the form of String array
String sentence = "Camel is a Java software";
String tokens[] = tokenizer.tokenize(sentence);
//Finding the names in the sentence
nameFinder.clearAdaptiveData();
Span nameSpans[] = nameFinder.find(tokens);
System.out.println(sentence);
//Printing the spans of the names in the sentence
for(Span s: nameSpans) {
System.out.println(s.toString()+" "+tokens[s.getStart()]);
}
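} finally {
// A try block needs a catch or finally to compile; close both model streams here.
modelIn.close();
tokenModelIn.close();
}
}
}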
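A side note on the print loop above: it prints only tokens[s.getStart()], that is, the first token of each span, so a multi-token entity would show up as a single word. Below is a minimal sketch of joining every token that a span covers. It uses plain ints in place of opennlp.tools.util.Span so that it runs without the library, and the SpanText class and coveredText helper are hypothetical names used only for illustration. The end index is exclusive, matching the [0..1) notation in the output.

```java
import java.util.Arrays;

public class SpanText {

    // Joins tokens[start] .. tokens[end - 1] with single spaces.
    // start is inclusive and end is exclusive, mirroring Span.getStart() / Span.getEnd().
    static String coveredText(String[] tokens, int start, int end) {
        return String.join(" ", Arrays.copyOfRange(tokens, start, end));
    }

    public static void main(String[] args) {
        String[] tokens = {"Apache", "Camel", "is", "a", "Java", "software"};
        // A span of [0..2) covers the first two tokens.
        System.out.println(coveredText(tokens, 0, 2));
    }
}
```

With a real Span s, the call would be coveredText(tokens, s.getStart(), s.getEnd()).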
Training file:
Abdera implementation of the Atom Syndication Format and Atom Publishing Protocol, Accumulo secure implementation of BigTable, ActiveMQ message broker supporting different communication protocols and clients, including a full Java Message Service (JMS) 1.1 client. Allura Python-based an open source implementation of a software forge. Ant Java-based build tool, Apache Arrow "A high-performance cross-system data layer for columnar in-memory analytics". APR Apache Portable Runtime, a portability library written in C, Archiva Build Artifact Repository Manager, Apache Beam, an uber-API for big data Beehive Java visual object model. Bloodhound defect tracker based on Trac[3]. Calcite dynamic data management framework, Camel declarative routing and mediation rules engine which implements the Enterprise Integration Patterns using a Java-based domain specific language.
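One thing that may be worth checking in the training file: NameSampleDataStream reads one whitespace-tokenized sentence per line, with each entity wrapped in <START:type> ... <END> markers, and the text above contains no such markers. Below is a minimal sketch of producing lines in that format. It assumes a hypothetical set of known technology names, and the TrainingLineAnnotator class and annotate helper are illustrative only and not part of OpenNLP. It handles single-token names only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class TrainingLineAnnotator {

    // Wraps every token found in knownNames with <START:type> ... <END>.
    // The sentence is split on whitespace, so punctuation should already be
    // separated from words (for example "tool ." rather than "tool.").
    static String annotate(String sentence, Set<String> knownNames, String type) {
        List<String> out = new ArrayList<>();
        for (String token : sentence.trim().split("\\s+")) {
            if (knownNames.contains(token)) {
                out.add("<START:" + type + "> " + token + " <END>");
            } else {
                out.add(token);
            }
        }
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        // Hypothetical name list, taken from the first words of the training text.
        Set<String> names = new HashSet<>(Arrays.asList("Camel", "ActiveMQ", "Ant"));
        System.out.println(annotate("Ant is a Java-based build tool .", names, "technology"));
        System.out.println(annotate("Camel is a declarative routing and mediation rules engine .", names, "technology"));
    }
}
```

Each printed line would be one training sentence. The type in the marker (technology here) is the label that shows up in the spans returned by the name finder.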
Output when the first letter of the first word is capitalized:
Is Camel a Java software
[0..1) technology Is
Output when the first letter of the first word is not capitalized:
camel is a Java software
In other words, whether or not the first word appears in the training file, the output is the first word of the sentence if and only if its first letter is capitalized.
I tried training the model with OpenNLP versions 1.6.0 and 1.7.2.
Where could the issue be? Am I missing any rules?
Thanks in advance.
Source: https://stackoverflow.com/questions/44043876/open-nlp-ner-is-not-properly-trained