How to generate .conllu from a Doc object?

别说谁变了你拦得住时间么 submitted on 2020-03-05 04:04:02

Question


Where can I find an example .conllu file that spaCy will accept, or an example of how to generate one? With IOB tags?

I am trying to convert a .conllu file I generated to .json for model training. I build each row this way:

 head_ix = token.head.i - sent[0].i + 1
 conll.append( (str(i), token.orth_, token.lemma_, token.tag_, token.ent_type_, str(head_ix), token.dep_) )

(Do you have a correct example of doing this?)

Here is the error:

 $ python -m spacy convert spt3.conllu 

  .......
  File "/usr/local/lib/python2.7/dist-packages/spacy/cli/converters/conllu2json.py", line 25, in conllu2json
for i, (raw_text, tokens) in enumerate(conll_tuples):
  File "/usr/local/lib/python2.7/dist-packages/spacy/cli/converters/conllu2json.py", line 65, in read_conllx
id_, word, lemma, pos, tag, morph, head, dep, _1, iob = parts
ValueError: need more than 7 values to unpack

Then, with this:

        conll.append( (str(i), token.orth_, token.lemma_, token.tag_, '-', str(head_ix), token.dep_, str(head_ix), token.dep_, '-') )

the error becomes:

head = (int(head) - 1) if head != "0" else id_
ValueError: invalid literal for int() with base 10: 'amod'

Answer 1:


textacy can do this:

from textacy.export import doc_to_conll
doc_to_conll(doc)



Answer 2:


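This worked. Each row needs ten fields, in the order the converter unpacks them (id, word, lemma, pos, tag, morph, head, dep, an unused field, iob):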

 [ str(i), token.text, token.lemma_, token.pos_, token.tag_, '-', str(head_ix), token.dep_, '-', '-' ]
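
A fuller sketch of the row format from this answer, assuming `doc` is a parsed spaCy `Doc` with sentence boundaries set. The function names `sent_to_conll_rows` and `doc_to_conll_text` are made up for illustration. Writing `0` as the root's head is an assumption based on the `head != "0"` check in the traceback above, not something the answer states, and whether a given spaCy version's converter accepts the output is untested here.

```python
def sent_to_conll_rows(sent):
    """Return one 10-field row per token of a sentence (hypothetical helper).

    `sent` is expected to behave like a spaCy Span: indexable, iterable,
    and yielding tokens with .i, .text, .lemma_, .pos_, .tag_, .dep_, .head.
    """
    rows = []
    offset = sent[0].i  # document index of the sentence's first token
    for i, token in enumerate(sent, start=1):
        if token.head.i == token.i:
            # spaCy marks the root as its own head; write 0 instead
            # (assumption, see the lead-in).
            head_ix = 0
        else:
            # 1-based position of the head within this sentence
            head_ix = token.head.i - offset + 1
        rows.append([
            str(i), token.text, token.lemma_, token.pos_, token.tag_,
            "-", str(head_ix), token.dep_, "-", "-",
        ])
    return rows


def doc_to_conll_text(doc):
    """Join rows with tabs, one token per line, blank line between sentences."""
    blocks = []
    for sent in doc.sents:
        lines = ["\t".join(row) for row in sent_to_conll_rows(sent)]
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks) + "\n"
```

With spaCy installed and a pipeline loaded as `nlp`, writing `doc_to_conll_text(nlp(text))` to `spt3.conllu` should give a file to try with `python -m spacy convert`.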


Source: https://stackoverflow.com/questions/57465128/how-to-generate-conllu-from-a-doc-object
