Extracting relations from text

抹茶落季 2021-02-10 16:35

I want to extract relations from unstructured text in the form of (SUBJECT, OBJECT, ACTION) triples.

For instance:

"The boy is sitting on the table eating the c

1 Answer
  • 2021-02-10 16:45

    If your sentences do not get much more complicated than the example you have shown (for instance, with respect to anaphora), the Stanford parser will give good results. It is based on a probabilistic context-free grammar, and its output is easy to convert into the format you want. There is a demo available online. For your example, it will give something like

    nsubj(sitting, boy)

    prep_on(sitting, table)

    etc.
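
    As a rough illustration of the conversion step, here is a minimal Python sketch that turns dependency lines of the kind shown above into (SUBJECT, OBJECT, ACTION) triples. It is not part of the Stanford parser: the function names are hypothetical, the input is assumed to be plain strings without token indices, and treating `dobj` and `prep_*` dependents as objects is an assumption made for this example.

```python
import re

# Matches one typed dependency written as "relation(head, dependent)",
# e.g. "nsubj(sitting, boy)". Assumes no token indices in the words.
DEP_RE = re.compile(r"^\s*([\w:]+)\(\s*([^,\s]+)\s*,\s*([^)\s]+)\s*\)\s*$")


def parse_dependencies(lines):
    """Parse dependency strings into (relation, head, dependent) tuples.

    Lines that do not look like a dependency are skipped.
    """
    deps = []
    for line in lines:
        match = DEP_RE.match(line)
        if match:
            deps.append(match.groups())
    return deps


def to_triples(deps):
    """Pair each subject of a head word with that head's objects.

    Assumption for this sketch: "nsubj" marks the subject, while "dobj"
    and any "prep_*" relation mark an object of the same head word.
    """
    subjects = {}
    objects = {}
    for rel, head, dep in deps:
        if rel == "nsubj":
            subjects.setdefault(head, []).append(dep)
        elif rel == "dobj" or rel.startswith("prep_"):
            objects.setdefault(head, []).append(dep)

    triples = []
    for head, subs in subjects.items():
        for subj in subs:
            for obj in objects.get(head, []):
                triples.append((subj, obj, head))  # (SUBJECT, OBJECT, ACTION)
    return triples


if __name__ == "__main__":
    deps = parse_dependencies(["nsubj(sitting, boy)", "prep_on(sitting, table)"])
    print(to_triples(deps))  # [('boy', 'table', 'sitting')]
```

    Real parser output usually needs more care (token indices, lemmatisation, passive subjects), so this only covers the simple case from the example.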

    If your sentences do get more complicated, you might try Boxer, which builds discourse representation structures from C&C parses; the C&C parser is based on probabilistic combinatory categorial grammar. Those structures may prove more difficult to adapt to the format you want, but they allow much more flexibility. There is, again, a demo available online. For your example, the output will look something like

    sit(x)

    boy(y)

    table(z)

    agent(x,y)

    on(x,z)

    etc.
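
    To give an idea of how such conditions could be adapted, here is a minimal Python sketch that reads the simplified, flat notation used above and produces (SUBJECT, OBJECT, ACTION) triples. This is not Boxer's actual output format or API: the function names are hypothetical, and the rule "the agent of an event is the subject, and any other binary relation on that event supplies the object" is an assumption made for this example.

```python
import re

# Matches "pred(a)" or "pred(a,b)" in the simplified notation above.
COND_RE = re.compile(r"^\s*(\w+)\(\s*(\w+)\s*(?:,\s*(\w+)\s*)?\)\s*$")


def parse_conditions(lines):
    """Split conditions into variable names and binary relations.

    Unary conditions such as "boy(y)" name a variable; binary conditions
    such as "agent(x,y)" relate two variables. Other lines are skipped.
    """
    names = {}
    relations = []
    for line in lines:
        match = COND_RE.match(line)
        if not match:
            continue
        pred, first, second = match.groups()
        if second is None:
            names[first] = pred
        else:
            relations.append((pred, first, second))
    return names, relations


def drs_to_triples(names, relations):
    """Build (SUBJECT, OBJECT, ACTION) triples from the parsed conditions.

    Assumption for this sketch: agent(e, s) gives the subject s of event
    e, and every other binary relation r(e, o) gives an object o.
    """
    agents = {event: entity for pred, event, entity in relations if pred == "agent"}
    triples = []
    for pred, event, entity in relations:
        if pred != "agent" and event in agents:
            subject = agents[event]
            triples.append((
                names.get(subject, subject),
                names.get(entity, entity),
                names.get(event, event),
            ))
    return triples


if __name__ == "__main__":
    conditions = ["sit(x)", "boy(y)", "table(z)", "agent(x,y)", "on(x,z)"]
    names, relations = parse_conditions(conditions)
    print(drs_to_triples(names, relations))  # [('boy', 'table', 'sit')]
```

    The relation name itself (here `on`) is dropped by this sketch; keeping it would be a natural extension if you need to distinguish kinds of objects.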

    The Stanford parser is written in Java and is available under the GPL. C&C is written in C++ and Boxer in SWI-Prolog. C&C and Boxer are not released under a genuinely free licence, but you can obtain the source code, modify it, and use it for any non-commercial project.

    Neither tool will characterise the relation between "boy" and "table" in your example. For that you would need much more powerful semantic reasoning tools, and I am not sure whether anything like that exists.

    Edit

    It is once again possible to obtain the source code for C&C and Boxer, along with a collection of models.
