Question
Hi, I am new to Solr. Please guide me on the following hurdles.
1) Index PDF documents in Solr
Solution tried
I used tika-app-0.9.jar to extract the content of the input PDF files into text files. Now I am trying to write Java code to index the documents into Solr.
2) Post them to a remote server
I need to post either the documents or the index to a central remote server. Can the curl command be used for this?
Regards, Balaji.
Answer 1:
1) Solr Index PDF documents - I believe Solr does this for you. You can use Solr's HTTP interface or SolrJ.
2) Post the index to a remote server - Solr replication may fit the bill.
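As a rough illustration of the HTTP route, here is a minimal Java sketch that posts a PDF straight to Solr's extracting request handler (Solr Cell, which runs Tika on the server side). It uses only the JDK's java.net.http client (Java 11+) rather than SolrJ. Several things here are assumptions, not details from the question: the class name SolrPdfPoster, the core name "docs", the /solr/<core>/update/extract path layout (older Solr versions omit the core segment), and that the extracting handler is enabled in solrconfig.xml.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;

// Hypothetical helper class; its name and the defaults below are illustrative.
class SolrPdfPoster {

    // Builds the URL of Solr's extracting handler for one document.
    // literal.id sets the document's unique key; commit=true asks Solr to commit after the upload.
    static URI extractUri(String solrBase, String core, String docId) {
        String id = URLEncoder.encode(docId, StandardCharsets.UTF_8);
        return URI.create(solrBase + "/" + core + "/update/extract?literal.id=" + id + "&commit=true");
    }

    // Sends the raw PDF bytes as the request body and returns the HTTP status code.
    static int postPdf(HttpClient client, URI uri, Path pdf) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(uri)
                .header("Content-Type", "application/pdf")
                .POST(HttpRequest.BodyPublishers.ofFile(pdf))
                .build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.statusCode();
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("usage: java SolrPdfPoster.java <solr-base-url> <file.pdf>");
            return;
        }
        Path pdf = Path.of(args[1]);
        // "docs" is an assumed core name; replace it with the core on your server.
        URI uri = extractUri(args[0], "docs", pdf.getFileName().toString());
        int status = postPdf(HttpClient.newHttpClient(), uri, pdf);
        System.out.println("Solr responded with HTTP " + status);
    }
}
```

Because this is an ordinary HTTP POST, the base URL can point at a remote Solr server just as well as a local one, and curl can send the same request to the same URL. With this approach the separate Tika extraction step should not be needed, since Solr extracts the text itself.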
Answer 2:
Assuming the PDFs are on a web server, you can use Nutch to fetch and parse them, and then push the index to Solr via its HTTP interface.
Source: https://stackoverflow.com/questions/6482820/solr-index-pdf-documents-and-post-them-to-a-remote-server