Question
Does anybody know of an API/SDK or IFilter in .NET that can read the subject ('title' metadata) and text from the following files:
.PDF .DOC .XLS .PPT .CSV .TXT .DOCX .XLSX .PPTX + the OpenOffice and OpenDocument standards.
Open source would be awesome... but commercial is OK too.
I can't find anything anywhere!
Answer 1:
I don't think you will find a single IFilter that can access the contents of all of those types. Typically, each IFilter handles one specific file format.
For example, Adobe has one for PDFs, and Microsoft provides one for Office that handles Word, Excel, PowerPoint and CSV (I believe it comes pre-installed with Windows).
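Because each filter covers one format, a program that needs many types usually ends up routing each file to a format-specific reader by extension. The sketch below illustrates that dispatch pattern only. It is not an IFilter and does not use the Windows IFilter COM interface, and it is written in Python for brevity. It assumes just two facts about the package formats: Office Open XML files (.docx, .xlsx, .pptx) store the title as `dc:title` in `docProps/core.xml`, and OpenDocument files store it as `dc:title` in `meta.xml`. The function names are hypothetical. Binary formats (.doc, .xls, .ppt) and PDF are deliberately left unhandled, since they would still need format-specific libraries or filters like the ones mentioned above.

```python
import os
import zipfile
import xml.etree.ElementTree as ET

# Dublin Core namespace; both OOXML core.xml and ODF meta.xml use it for the title.
DC_NS = "{http://purl.org/dc/elements/1.1/}"


def _title_from_zip_xml(path, member):
    """Return the dc:title text stored in an XML part of a zip package, or None."""
    with zipfile.ZipFile(path) as zf:
        try:
            data = zf.read(member)
        except KeyError:
            return None  # the package has no metadata part
    root = ET.fromstring(data)
    elem = root.find(".//" + DC_NS + "title")
    return elem.text if elem is not None else None


def read_title(path):
    """Hypothetical dispatcher: pick a format-specific reader by file extension."""
    ext = os.path.splitext(path)[1].lower()
    if ext in (".docx", ".xlsx", ".pptx"):
        return _title_from_zip_xml(path, "docProps/core.xml")
    if ext in (".odt", ".ods", ".odp"):
        return _title_from_zip_xml(path, "meta.xml")
    if ext in (".txt", ".csv"):
        return None  # plain text carries no title metadata
    # .pdf, .doc, .xls, .ppt, ...: would need a dedicated library or filter
    raise NotImplementedError(f"no reader registered for {ext!r}")
```

Adding a format means registering one more branch (or one more entry in a lookup table), which mirrors how separate IFilters are registered per extension on Windows.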
Source: https://stackoverflow.com/questions/1535992/ifilter-or-sdk-for-many-file-types