Question
I am working on a project where I need to extract corporate bond information from unstructured emails. After a lot of research, I found that machine learning can be used for information extraction. I tried OpenNLP NER (named entity recognition), but I am not sure whether I picked the right library for this problem: I am getting results, but they are not up to the mark.
Could someone please suggest a library or algorithm that I can use to parse these emails and extract the data? I am planning to explore Naïve Bayes, n-grams, or support vector machines, but I am not sure whether they will help. Please suggest.
Example 1:
[/] Trading 10mm ABC 2.5 19 05/06 mkt can use 50mm
---> here I want to extract "ABC 2.5 19"
Example 2:
XYZ 6.5 15 10-2B 106-107 B3 AAA- 1.646MM 2x2
---> here I want to extract "XYZ 6.5 15"
Answer 1:
In Perl, you can use Marpa::R2, a general BNF parser.
This gist extracts info from your examples.
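The gist is not reproduced here. As a rough illustration of the same rule-based idea (not the Marpa::R2 grammar itself), here is a minimal Python sketch using a regular expression. It assumes, based only on the two examples in the question, that a bond is written as an uppercase ticker of 2 to 5 letters, then a numeric coupon, then a two-digit maturity year. The names `BOND_PATTERN` and `extract_bond` are made up for this example, and real emails will probably need a richer pattern or a proper grammar.

```python
import re

# Assumed layout (inferred from the two examples only):
#   TICKER  COUPON  YY
# e.g. "ABC 2.5 19" or "XYZ 6.5 15"
BOND_PATTERN = re.compile(
    r"\b([A-Z]{2,5})"        # ticker: 2-5 uppercase letters (assumption)
    r"\s+(\d+(?:\.\d+)?)"    # coupon: integer or decimal number
    r"\s+(\d{2})\b"          # maturity: two-digit year (assumption)
)


def extract_bond(text):
    """Return the first 'TICKER COUPON YY' found in text, or None."""
    match = BOND_PATTERN.search(text)
    if match is None:
        return None
    return " ".join(match.groups())


if __name__ == "__main__":
    samples = [
        "[/] Trading 10mm ABC 2.5 19 05/06 mkt can use 50mm",
        "XYZ 6.5 15 10-2B 106-107 B3 AAA- 1.646MM 2x2",
    ]
    for line in samples:
        print(extract_bond(line))
```

If the messages turn out to follow only a handful of layouts like these, a few such rules (or a grammar, as with Marpa::R2) may be enough before reaching for a trained model.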
Hope this helps.
Source: https://stackoverflow.com/questions/25758919/how-to-extract-corporate-bonds-informations-using-machine-learning