Solr Index PDF documents and post them to a remote server

笑着哭i submitted on 2019-12-11 05:06:50

Question


Hi, I am new to Solr. Please help me with the following hurdles.

1) Indexing PDF documents in Solr

What I have tried

I used the tika-app 0.9 jar to extract the content of the input PDF files into text files. Now I am trying to write Java code to index those documents in Solr.
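Since the text has already been extracted with Tika, one way to index it is to post each file to Solr's update handler as a JSON document over plain HTTP. The following is a minimal sketch, not a definitive implementation, assuming Java 11+ (for `java.net.http.HttpClient`), a core URL such as `http://localhost:8983/solr/mycore` (a placeholder), and a schema with an `id` unique key and a `content` text field (both field names are hypothetical; use your own). Solr 4 and later accept JSON at `/update` when the request is sent as `application/json`; older releases used `/update/json`.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class IndexExtractedText {

    // Escape a Java string so it can sit inside a JSON string literal.
    static String jsonEscape(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    // Other control characters must be written as 4-digit hex escapes.
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    // A one-element JSON array of documents, the shape the JSON update handler accepts.
    // "id" and "content" are hypothetical field names; they must exist in your schema.
    static String buildUpdateBody(String id, String content) {
        return "[{\"id\":\"" + jsonEscape(id) + "\",\"content\":\"" + jsonEscape(content) + "\"}]";
    }

    // POST the body to <coreUrl>/update and commit so the document becomes searchable.
    static HttpRequest buildRequest(String coreUrl, String body) {
        return HttpRequest.newBuilder(URI.create(coreUrl + "/update?commit=true"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body, StandardCharsets.UTF_8))
                .build();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length < 2) {
            System.err.println("usage: IndexExtractedText <coreUrl> <file.txt>...");
            return;
        }
        HttpClient client = HttpClient.newHttpClient();
        for (int i = 1; i < args.length; i++) {
            Path txt = Path.of(args[i]);
            String text = Files.readString(txt, StandardCharsets.UTF_8);
            // Use the file name as the document id (an arbitrary choice for this sketch).
            String body = buildUpdateBody(txt.getFileName().toString(), text);
            HttpResponse<String> resp = client.send(buildRequest(args[0], body),
                    HttpResponse.BodyHandlers.ofString());
            System.out.println(txt + " -> HTTP " + resp.statusCode());
        }
    }
}
```

Usage, with Java's single-file launcher: `java IndexExtractedText.java http://localhost:8983/solr/mycore a.txt b.txt`. Committing on every request keeps the sketch short; for many files it is cheaper to leave `commit=true` off and send one commit at the end.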

2) Post them to a remote server

I need to post either the documents or the index to a central remote server. Can the curl command be used for this?
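Posting documents to a remote Solr works the same way as posting to a local one, since the update handlers are plain HTTP; only the base URL changes, so curl or any other HTTP client can do it. Moving an already-built index is different: Solr's ReplicationHandler works in the pull direction, so the central server fetches the index from the machine that built it. The following is a minimal sketch, assuming Java 11+ and that both cores have the ReplicationHandler configured at `/replication`. `command=fetchindex` is a real ReplicationHandler command, but the parameter that names the source is `masterUrl` on older releases and `leaderUrl` on Solr 9, and the expected form of its value has varied between versions, so check the reference guide for yours. The host names and core name are placeholders.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class TriggerFetchIndex {

    // Build the ReplicationHandler "fetchindex" URL on the central (receiving) core.
    // sourceParam names the source: "masterUrl" on older Solr, "leaderUrl" on Solr 9+.
    static URI fetchIndexUri(String centralCoreUrl, String sourceParam, String sourceUrl) {
        String query = "command=fetchindex&" + sourceParam + "="
                + URLEncoder.encode(sourceUrl, StandardCharsets.UTF_8);
        return URI.create(centralCoreUrl + "/replication?" + query);
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length != 3) {
            System.err.println("usage: TriggerFetchIndex <centralCoreUrl> <masterUrl|leaderUrl> <sourceUrl>");
            return;
        }
        HttpRequest request = HttpRequest.newBuilder(fetchIndexUri(args[0], args[1], args[2]))
                .GET()
                .build();
        HttpResponse<String> resp = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        // The response only confirms the command was received; the copy itself may
        // still be running, so check progress with command=details.
        System.out.println("HTTP " + resp.statusCode());
        System.out.println(resp.body());
    }
}
```

Replication makes the central core a copy of the source index rather than merging the two, so it suits one machine building the index for one central server. To combine documents from several machines, post the documents themselves instead.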

Regards,
Balaji


Answer 1:


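1) Indexing PDF documents: I believe Solr can do this for you through Solr Cell (the ExtractingRequestHandler, which runs Tika on the server side). You can send the documents through Solr's HTTP interface or through SolrJ, its Java client.
2) Posting the index to a remote server: Solr replication may fit the bill.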
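With Solr Cell the Tika step moves to the server: you post the raw PDF to `/update/extract`, and Solr extracts and indexes the text itself. The following is a minimal sketch, assuming Java 11+, a core whose `solrconfig.xml` enables the ExtractingRequestHandler at `/update/extract` (it needs the extraction contrib jars on the classpath), and a placeholder core URL. `literal.id` sets the document's unique key; taking it from the file name is an arbitrary choice here. Because this is plain HTTP, the base URL can just as well point at the remote server, and curl works the same way: `curl "http://host:8983/solr/mycore/update/extract?literal.id=doc1&commit=true" -H "Content-Type: application/pdf" --data-binary @file.pdf`.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;

public class PostPdfToSolrCell {

    // <coreUrl>/update/extract with the document id passed as a literal field.
    static URI extractUri(String coreUrl, String docId) {
        return URI.create(coreUrl + "/update/extract?literal.id="
                + URLEncoder.encode(docId, StandardCharsets.UTF_8) + "&commit=true");
    }

    // Send the PDF bytes as the raw request body, labelled with their content type.
    static HttpRequest buildRequest(String coreUrl, String docId, Path pdf) throws FileNotFoundException {
        return HttpRequest.newBuilder(extractUri(coreUrl, docId))
                .header("Content-Type", "application/pdf")
                .POST(HttpRequest.BodyPublishers.ofFile(pdf))
                .build();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length < 2) {
            System.err.println("usage: PostPdfToSolrCell <coreUrl> <file.pdf>...");
            return;
        }
        HttpClient client = HttpClient.newHttpClient();
        for (int i = 1; i < args.length; i++) {
            Path pdf = Path.of(args[i]);
            String id = pdf.getFileName().toString();
            HttpResponse<String> resp = client.send(buildRequest(args[0], id, pdf),
                    HttpResponse.BodyHandlers.ofString());
            System.out.println(pdf + " -> HTTP " + resp.statusCode());
        }
    }
}
```

Usage: `java PostPdfToSolrCell.java http://solr.example.com:8983/solr/mycore a.pdf b.pdf`. This removes the separate tika-app step entirely; SolrJ offers the same request through its own client classes if you prefer not to build URLs by hand.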




Answer 2:


Assuming the PDFs are on a web server, you can use Nutch to fetch and parse them, and then push the parsed documents to Solr for indexing via its HTTP interface.



Source: https://stackoverflow.com/questions/6482820/solr-index-pdf-documents-and-post-them-to-a-remote-server
