Resume parser in Java [closed]

可紊 posted on 2020-02-21 06:54:25

Question


I want to parse a resume to extract its section titles and their content, which includes bullets, paragraphs, and URLs. The resume is in .doc/.docx format. My research so far has led to this plan:

1. build an XML file from the .doc file, and then
2. build an XML parser using JDOM.

Is there another approach or a better way to do this? Is there an algorithm that would help identify the structure of a resume?
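For the .docx case specifically, the intermediate XML already exists: a .docx file is a ZIP archive whose main body is stored in word/document.xml. The following is a minimal sketch, assuming only the JDK (no JDOM or Apache POI), that lists each paragraph with its style name and whether it carries list numbering. The class name DocxParagraphs and the Paragraph holder are hypothetical, style IDs such as "Heading1" depend on the template used, and the legacy binary .doc format is not covered.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: lists the paragraphs of a .docx using only the JDK.
class DocxParagraphs {
    // WordprocessingML main namespace (the "w:" prefix in document.xml).
    static final String W_NS =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    static final class Paragraph {
        final String style;     // value of w:pStyle (e.g. "Heading1"), "" if absent
        final boolean listItem; // true if the paragraph has w:numPr (bullet or number)
        final String text;      // concatenated text of all w:t runs

        Paragraph(String style, boolean listItem, String text) {
            this.style = style;
            this.listItem = listItem;
            this.text = text;
        }
    }

    // Finds word/document.xml inside the ZIP container and parses it.
    static List<Paragraph> read(InputStream docx) throws Exception {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.getName().equals("word/document.xml")) {
                    return parse(zip.readAllBytes()); // reads the current entry only
                }
            }
        }
        throw new IOException("word/document.xml not found; is this a .docx file?");
    }

    static List<Paragraph> parse(byte[] xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        Document doc = factory.newDocumentBuilder().parse(new ByteArrayInputStream(xml));

        List<Paragraph> result = new ArrayList<>();
        NodeList paragraphs = doc.getElementsByTagNameNS(W_NS, "p");
        for (int i = 0; i < paragraphs.getLength(); i++) {
            Element p = (Element) paragraphs.item(i);

            String style = "";
            NodeList styles = p.getElementsByTagNameNS(W_NS, "pStyle");
            if (styles.getLength() > 0) {
                style = ((Element) styles.item(0)).getAttributeNS(W_NS, "val");
            }
            boolean listItem = p.getElementsByTagNameNS(W_NS, "numPr").getLength() > 0;

            // Simplification: paragraphs nested in text boxes are also counted
            // as part of their enclosing paragraph.
            StringBuilder text = new StringBuilder();
            NodeList runs = p.getElementsByTagNameNS(W_NS, "t");
            for (int j = 0; j < runs.getLength(); j++) {
                text.append(runs.item(j).getTextContent());
            }
            if (text.length() > 0) {
                result.add(new Paragraph(style, listItem, text.toString()));
            }
        }
        return result;
    }
}
```

Calling DocxParagraphs.read(new FileInputStream("resume.docx")) (the file name is a placeholder) would return the paragraphs in document order. Paragraphs with a heading style are candidates for section titles, and list items are candidates for bullets.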


Answer 1:


It looks like you are heading in the right direction. A simple approach: once you have identified a piece of information, move on from it by traversing the text in +/- steps based on the calculated spacing, and identify the results.
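One way to read this anchor-then-traverse idea, sketched on plain text lines: treat a line that matches a known heading as an anchor, then walk forward and attach each following line to that heading until the next heading appears. This is only a sketch under assumptions: the ResumeSections class, the HEADINGS vocabulary, and the recognized bullet characters are hypothetical choices for illustration, not something specified in this answer.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: groups resume lines under the heading that precedes them.
class ResumeSections {
    // Assumed heading vocabulary; a real parser would need a much larger list.
    static final Set<String> HEADINGS = Set.of(
        "summary", "education", "experience", "work experience", "skills", "projects");

    // Normalizes a line to a lookup key: trimmed, trailing colon removed, lower case.
    static String key(String line) {
        return line.trim().replaceAll(":\\s*$", "").toLowerCase(Locale.ROOT);
    }

    static boolean isHeading(String line) {
        return HEADINGS.contains(key(line));
    }

    // Returns sections in document order; lines before the first heading
    // (typically name and contact details) are collected under "header".
    static Map<String, List<String>> split(List<String> lines) {
        Map<String, List<String>> sections = new LinkedHashMap<>();
        String current = "header";
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue; // blank lines carry no content in this sketch
            }
            if (isHeading(line)) {
                current = key(line); // new anchor: following lines belong to it
                sections.computeIfAbsent(current, k -> new ArrayList<>());
            } else {
                // Strip a leading bullet marker ("-", "*" or "\u2022") if present.
                String content = line.replaceFirst("^[-*\u2022]\\s+", "");
                sections.computeIfAbsent(current, k -> new ArrayList<>()).add(content);
            }
        }
        return sections;
    }
}
```

Exact keyword matching is the weakest part of this design: headings such as "Professional Background" are missed, which is where proximity-based or NLP-based matching would come in.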

I assume you are using an NLP methodology, which can help you extract data by proximity; you can then remove the noise based on your experience.

Or simply use something that is already built. I recommend RChilli CV Parsing, or others such as HireAbility or Sovren; discuss your needs with them. I am sure you will get some useful information.

Thanks -K




Answer 2:


Interesting. I worked on a solution where we used Solr to identify entities.

So another approach is to index the documents into Apache Solr and then retrieve the values with faceted search.

The only challenge is how to build the library. This will be much shorter and simpler than using Apache POI.

Let me know if you need some help.



Source: https://stackoverflow.com/questions/21994957/resume-parser-in-java
