Sitecore text search in PDF or Word documents

Submitted by 与世无争的帅哥 on 2019-12-03 20:13:59
Arnold Zokas

I recently had to do something similar on one of my projects. Have a look at "How to index Word 2003, 2007 and 2010 documents using Lucene.NET".

I ended up creating a custom indexer that handles MS Office documents (XP, 2003, 2007 and 2010 formats) and PDF documents:

  • For indexing XP-2003 MS Office documents, you can use the IFilters built into the OS (assuming you are using Windows Server 2003 or newer).
  • For indexing 2007-2010 MS Office documents, you will need to install the Microsoft Office 2010 Filter Packs.
  • For indexing PDF documents, I strongly recommend the Foxit PDF IFilter. It is not free, but it does a much better job than the Adobe PDF IFilter.

Note: Don't waste your time with the Adobe PDF IFilter: it fails to read valid PDF files and is a lot slower. The Foxit IFilter is designed to take advantage of multi-core CPUs and performs much better on large documents.
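
The format-to-filter recommendations above can be summarised as a small lookup, which is roughly the decision a custom indexer has to make before extracting text. This is only a sketch in Python rather than the indexer I actually wrote: the table name `IFILTER_SOURCE`, the function `ifilter_for`, and the exact set of file extensions are assumptions made for illustration, and the code does not call any IFilter itself.

```python
from pathlib import Path
from typing import Optional

# Hypothetical lookup table: file extension -> IFilter recommended above.
# The extension list is an assumption for illustration and is not exhaustive.
IFILTER_SOURCE = {
    ".doc": "IFilter built into the OS",
    ".xls": "IFilter built into the OS",
    ".ppt": "IFilter built into the OS",
    ".docx": "Microsoft Office 2010 Filter Packs",
    ".xlsx": "Microsoft Office 2010 Filter Packs",
    ".pptx": "Microsoft Office 2010 Filter Packs",
    ".pdf": "Foxit PDF IFilter",
}


def ifilter_for(filename: str) -> Optional[str]:
    """Return the recommended IFilter for a file, or None if it is not covered."""
    # Compare on the lower-cased extension so "Report.DOCX" matches ".docx".
    return IFILTER_SOURCE.get(Path(filename).suffix.lower())


if __name__ == "__main__":
    for name in ("manual.pdf", "Report.DOCX", "legacy.doc", "notes.txt"):
        print(name, "->", ifilter_for(name))
```

Files that return `None` here would simply be skipped by the indexer, or handed to whatever plain-text handling it already has.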
