Indexing PDF with Solr

一向 2020-12-31 05:46

Can anyone point me to a tutorial?

My main experience with Solr is indexing CSV files. But I cannot find any simple instructions/tutorial to tell me what I need to d

6 answers

小蘑菇 (OP)

2020-12-31 06:11
With solr-4.9 (the latest version as of now), extracting data from rich documents such as PDFs, spreadsheets (xls, xlsx family), presentations (ppt, pptx), and text documents (doc, txt, etc.) has become fairly simple. The sample code provided in the downloaded archive from here contains a basic Solr template project to get you started quickly.

The necessary configuration changes are as follows:
1. Change solrconfig.xml to add a request handler for the /update/extract path.

2. Add the necessary jars from the Solr example to your project.

3. Define the schema as per your needs and fire a request like:

curl "http://localhost:8983/solr/collection1/update/extract?literal.id=1&literal.filename=testDocToExtractFrom.txt&literal.created_at=2014-07-22+09:50:12.234&commit=true" -F "myfile=@testDocToExtractFrom.txt"

Then go to the Solr admin GUI and run a query to see the indexed contents.
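If you would rather script step 3 than type curl by hand, a rough Python equivalent could look like the sketch below. It assumes the same host, core name (collection1) and /update/extract handler as the curl command above. The helper names build_extract_url and index_file are made up for illustration, and unlike the curl example it sends the file as the raw request body with a Content-Type header instead of a multipart form upload.

```python
import os
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumed endpoint: same host, core and handler path as the curl command above.
SOLR_EXTRACT_URL = "http://localhost:8983/solr/collection1/update/extract"


def build_extract_url(doc_id, filename, base_url=SOLR_EXTRACT_URL, commit=True):
    """Build the extract URL with literal.* fields, URL-encoding the values."""
    params = {
        "literal.id": doc_id,
        "literal.filename": filename,
        "commit": "true" if commit else "false",
    }
    return base_url + "?" + urlencode(params)


def index_file(path, doc_id, content_type="application/pdf"):
    """POST the file's bytes to the extract handler and return the HTTP status.

    Hypothetical helper: needs a running Solr with the handler configured.
    """
    with open(path, "rb") as fh:
        body = fh.read()
    request = Request(
        build_extract_url(doc_id, os.path.basename(path)),
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urlopen(request) as response:
        return response.status
```

Calling index_file("report.pdf", "1") should return 200 when Solr accepts the document; report.pdf here is only a placeholder file name. Any extra literal.* fields your schema defines can be added to the params dictionary in the same way.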

Let me know if you face any problems.