Edit config file in Stanford POS Tagger

Posted by 房东的猫 on 2019-12-11 02:21:52

Question


I have tagged a simple sentence, and this is my code:

package tagger;

import edu.stanford.nlp.tagger.maxent.MaxentTagger;

public class myTag {

    public static void main(String[] args) {

        // Load the tagger model from disk
        MaxentTagger tagger = new MaxentTagger("D:/tagger/english-bidirectional-distsim.tagger");

        String sample = "i go to school by bus";

        // Tag the sentence and print the result
        String tagged = tagger.tagString(sample);

        System.out.println(tagged);
    }

}

This is the output:

    Reading POS tagger model from D:/tagger/english-bidirectional-distsim.tagger    ... done [3.0 sec].
i_LS go_VB to_TO school_NN by_IN bus_NN 

Editing the properties file has no effect at all. For example, I changed the tag separator to ( * ), but the output still prints ( _ ).

How can I use the model's config file in Eclipse?


Answer 1:


You can load the properties file and pass it to the MaxentTagger constructor, something like this:

Properties props = new Properties();
props.load(new FileReader("path/to/properties"));
MaxentTagger tagger = new MaxentTagger("D:/tagger/english-bidirectional-distsim.tagger", props);

You can also set properties on the props object directly:

props.setProperty("tagSeparator", "*");

NB: if you use the original properties file and it fails with an exception like

java.io.FileNotFoundException: /u/nlp/data/pos_tags_are_useless/egw4-reut.512.clusters (No such file or directory)

then remove the arch and trainFile properties.
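Putting those steps together, here is a minimal sketch that uses only java.util.Properties, so it runs without the tagger jar. The class name TaggerProps, its load method, and the inline sample values are hypothetical and only for illustration; the property keys (arch, trainFile, tagSeparator) are the ones mentioned above.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical helper, not part of the Stanford tagger API.
final class TaggerProps {

    // Loads properties from the reader, drops the two training-only keys
    // named in the answer, and overrides the tag separator.
    static Properties load(Reader source, String separator) throws IOException {
        Properties props = new Properties();
        props.load(source);
        props.remove("arch");
        props.remove("trainFile");
        props.setProperty("tagSeparator", separator);
        return props;
    }

    public static void main(String[] args) throws IOException {
        // Inline stand-in for a properties file; the values are made up.
        String sample = "arch = placeholder\ntrainFile = placeholder\ntagSeparator = _\n";
        Properties props = load(new StringReader(sample), "*");
        System.out.println(props.getProperty("tagSeparator"));
        // Next step, as in the answer: new MaxentTagger(modelPath, props)
    }
}
```

In a real project you would pass a FileReader for your properties file instead of the StringReader, then hand the resulting object to the MaxentTagger constructor shown earlier.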




Answer 2:


Instead of writing Java code for this, you can use the shell script that ships in the downloaded ZIP file. After extracting the tagger's ZIP file, edit the following script:

stanford-postagger.sh

It should have the following line:

java -mx300m -cp 'stanford-postagger.jar:lib/*' edu.stanford.nlp.tagger.maxent.MaxentTagger -model $1 -textFile $2

Add a "-tagSeparator [YourTag]" parameter after "-model $1". Quote the separator so that the shell does not expand characters such as *:

java -mx300m -cp 'stanford-postagger.jar:lib/*' edu.stanford.nlp.tagger.maxent.MaxentTagger -model $1 -tagSeparator '*' -textFile $2

To run it (make sure the script has execute permission):

./stanford-postagger.sh models/model_name.tagger in_filename > out_filename

Voilà!



Source: https://stackoverflow.com/questions/29429137/edit-config-file-in-stanford-pos-tagger
