How to implement an IFilter for indexing heavyweight formats?

﹥>﹥吖頭↗ submitted on 2019-12-24 05:31:05

Question


I need to develop an IFilter for Microsoft Search Server 2008 that performs prolonged computations to extract text. Extracting text from a single file can take anywhere from 5 seconds to 12 hours. How can I design such an IFilter so that the daemon doesn't reset it on timeout, while other IFilters can still be reset on timeout if they hang?


Answer 1:


12 hours, wow!

If it takes that long and there are many files, your best option would be to create a pre-processing application that extracts the text ahead of time and makes it available for the IFilter to read.
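A minimal sketch of that split, assuming a separate pre-processing job writes each document's extracted text to a cache directory and the IFilter only reads from that cache. The names `preprocess`, `filter_lookup`, `slow_extract` and the cache-key scheme (path, size, modification time) are hypothetical; a real filter is a COM object implementing `IFilter`, and this Python only illustrates the division of work.

```python
import hashlib
from pathlib import Path
from typing import Callable, Optional


def cache_key(doc: Path) -> str:
    """Hypothetical key: path + size + mtime, so an edited document gets re-extracted."""
    st = doc.stat()
    raw = f"{doc.resolve()}|{st.st_size}|{st.st_mtime_ns}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def preprocess(doc: Path, cache_dir: Path, slow_extract: Callable[[Path], str]) -> Path:
    """Offline step: run the expensive extraction once and store the resulting text."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    out = cache_dir / (cache_key(doc) + ".txt")
    if not out.exists():
        text = slow_extract(doc)  # may take seconds or hours; no indexer timeout applies here
        tmp = out.with_suffix(".tmp")
        tmp.write_text(text, encoding="utf-8")
        tmp.replace(out)  # rename into place so the filter never reads a half-written file
    return out


def filter_lookup(doc: Path, cache_dir: Path) -> Optional[str]:
    """What the IFilter side would do: return cached text quickly, or None if not ready."""
    cached = cache_dir / (cache_key(doc) + ".txt")
    if cached.exists():
        return cached.read_text(encoding="utf-8")
    return None
```

With this split the filter's own work is a fast file read, so it stays well inside whatever timeout the daemon enforces; documents that are not yet extracted simply yield no text until the next crawl.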

Another option would be to create HTML summaries of the documents and instruct the crawler to index those instead. The summary page could easily link to the document itself if necessary.
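A rough sketch of generating such a summary page, assuming the summary text has already been produced by some offline process. The function name `write_summary_page` and the output layout are hypothetical, and pointing the crawler at the output directory is a separate configuration step not shown here.

```python
import html
from pathlib import Path


def write_summary_page(doc: Path, summary: str, out_dir: Path) -> Path:
    """Write an HTML stand-in page that the crawler can index instead of the heavy file."""
    out_dir.mkdir(parents=True, exist_ok=True)
    title = html.escape(doc.name)
    body = html.escape(summary)
    link = html.escape(doc.resolve().as_uri())  # file:// URI back to the original document
    page = (
        "<!DOCTYPE html>\n"
        f"<html><head><meta charset=\"utf-8\"><title>{title}</title></head>\n"
        f"<body><h1>{title}</h1>\n<p>{body}</p>\n"
        f"<p><a href=\"{link}\">Open the original document</a></p>\n"
        "</body></html>\n"
    )
    out = out_dir / (doc.name + ".html")
    out.write_text(page, encoding="utf-8")
    return out
```

Because the summary is ordinary HTML, the built-in HTML filter indexes it and no custom long-running IFilter is involved at crawl time.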




Answer 2:


I have not actually developed any filters yet, so I'm basically just guessing, but the way I always understood things is that the IFilter is chunk-based for exactly this reason. It's up to the filter implementation to make sure the returned chunks are "small enough", so the calling search daemon can simply quit in between two chunks if things are taking too long.
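The chunk model described here can be illustrated with a toy loop: the filter hands back text in small pieces, and the caller checks a time budget between pieces, so it can stop without the filter's cooperation. In the real COM interface this corresponds roughly to repeated `IFilter::GetChunk` / `IFilter::GetText` calls; the Python below only mimics that shape, and `chunked_filter`, `consume_with_budget`, `budget_seconds` and the injectable clock are hypothetical.

```python
import time
from typing import Callable, Iterator, List, Tuple


def chunked_filter(pages: List[str], work: Callable[[str], str]) -> Iterator[str]:
    """Yield extracted text one small unit (here: one page) at a time."""
    for page in pages:
        yield work(page)


def consume_with_budget(
    chunks: Iterator[str],
    budget_seconds: float,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[List[str], bool]:
    """Pull chunks until exhausted or the budget is spent; return (text, completed)."""
    start = clock()
    collected: List[str] = []
    for chunk in chunks:
        collected.append(chunk)
        if clock() - start > budget_seconds:
            return collected, False  # caller gives up between two chunks
    return collected, True
```

In this sketch the budget is only checked between chunks, so it helps only if the expensive computation itself can be split into small steps; a single multi-hour step before the first chunk could not be interrupted this way.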

Apparently, my assumption is wrong, or you would not be asking this very question.



Source: https://stackoverflow.com/questions/464443/how-to-implement-an-ifilter-for-indexing-heavyweight-formats
