问题
I downloaded and installed SyntaxNet following Syntax official documentation on Github. following the documentation (annotating corpus) I tried to read a .conll
file named wj.conll
by SyntaxNet and write the results in wj-tagged.conll
but I could not. My questions are:
does SyntaxNet always reads
.conll
files? (not.txt
files?). I got a bit confused as I knew SyntaxNet reads.conll
file for training and testing process but I am a bit suspicious that it is necessary to convert a.txt
file to.conll
file in order to have their Part Of Speach and Dependancy Parsing.How can I make SyntaxNet reads from files (I tired all possible ways explain in GitHub documentation about SyntaxNet and It didn't work for me)
回答1:
Add these declaration lines to "context.pbtxt" at the end of the file. Here "inp" and "out" are the text files present in the root directory of syntexnet.
input {
name: 'inp_file'
record_format: 'english-text'
Part {
file_pattern: 'inp'
}
}
input {
name: 'out_file'
record_format: 'english-text'
Part {
file_pattern: 'out'
}
}
Add sentences to the "inp" file for which you want tagging to be done and specify them in shell the next time you run syntaxnet using --input and --output tags.
Just to help you a bit more I am pasting an example shell command.
bazel-bin/syntaxnet/parser_eval \
--input inp_file \
--output stdout-conll \
--model syntaxnet/models/parsey_mcparseface/tagger-params \
--task_context syntaxnet/models/parsey_mcparseface/context.pbtxt \
--hidden_layer_sizes 64 \
--arg_prefix brain_tagger \
--graph_builder structured \
--slim_model \
--batch_size 1024 | bazel-bin/syntaxnet/parser_eval \
--input stdout-conll \
--output out_file \
--hidden_layer_sizes 512,512 \
--arg_prefix brain_parser \
--graph_builder structured \
--task_context syntaxnet/models/parsey_mcparseface/context.pbtxt \
--model_path syntaxnet/models/parsey_mcparseface/parser-params \
--slim_model --batch_size 1024
In the above script the output(POS tagging) of the first shell command is used as an input for the second shell command, where the two shell commands are seperated by "|"
回答2:
just a quick help if you want to save the output of demo in a .txt file:
try echo "open file X with application Y" | ./demo.sh > output.txt
it gives you sentence tree to the current directory.
来源:https://stackoverflow.com/questions/37464591/annotating-a-corpus-syntaxnet