Using ElasticSearch and/or Solr as a datastore for MS Office and PDF documents

一生所求 2020-12-23 10:31

I'm currently designing a full-text search system where users perform text queries against MS Office and PDF documents, and the result will return a list of documents that

5 answers
  •  一生所求
    2020-12-23 10:58

    A bit late to the party but this may help someone :)

    I had a similar problem and some research led me to fscrawler. Description:

    This crawler helps to index binary documents such as PDF, Open Office, MS Office.

    Main features:

    • Local file system (or a mounted drive) crawling and index new files, update existing ones and removes old ones.
    • Remote file system over SSH crawling.
    • REST interface to let you "upload" your binary documents to elasticsearch.
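
To make the last feature concrete, here is a minimal sketch of what "uploading" a binary document over a REST interface could look like from Python, using only the standard library. It assumes the service accepts a multipart/form-data POST with the document in a form field named "file"; the function names `build_multipart` and `upload_document` are made up for this illustration, and the endpoint URL is something you would have to take from the documentation of the fscrawler version you run.

```python
import os
import uuid
import urllib.request
from typing import Tuple


def build_multipart(field_name: str, filename: str, payload: bytes) -> Tuple[bytes, str]:
    """Encode one file as a multipart/form-data body.

    Returns the request body and the matching Content-Type header value.
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def upload_document(path: str, endpoint: str) -> bytes:
    """POST the file at `path` to `endpoint` and return the raw response body.

    Assumption: the server expects the document in a form field called "file".
    """
    with open(path, "rb") as fh:
        body, content_type = build_multipart("file", os.path.basename(path), fh.read())
    request = urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read()
```

A call would then look like `upload_document("report.pdf", endpoint)`, where `endpoint` is the upload URL of your running REST service. Building the body by hand keeps the sketch dependency-free; with the `requests` package installed, `requests.post(endpoint, files={"file": fh})` does the same job in one line.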
