Question
I am using DIH (DataImportHandler) to index the local file system, but the file path, size, and last-modified fields are not stored. In schema.xml I defined:
<fields>
<field name="title" type="string" indexed="true" stored="true"/>
<field name="author" type="string" indexed="true" stored="true" />
<!--<field name="text" type="text" indexed="true" stored="true" />
liang added-->
<field name="path" type="string" indexed="true" stored="true" />
<field name="size" type="long" indexed="true" stored="true" />
<field name="lastmodified" type="date" indexed="true" stored="true" />
</fields>
I also defined tika-data-config.xml:
<dataConfig>
<dataSource name="bin" type="BinFileDataSource" />
<document>
<entity name="f" dataSource="null" rootEntity="false"
processor="FileListEntityProcessor"
baseDir="E:/my_project/ecmkit/infotouch"
fileName=".*\.(DOC)|(PDF)|(pdf)|(doc)|(docx)|(ppt)" onError="skip"
recursive="true">
<entity name="tika-test" dataSource="bin" processor="TikaEntityProcessor"
url="${f.fileAbsolutePath}" format="text" onError="skip">
<field column="Author" name="author" meta="true"/>
<field column="title" name="title" meta="true"/>
<!--
<field column="text" name="text"/> -->
<field column="fileAbsolutePath" name="path" />
<field column="fileSize" name="size" />
<field column="fileLastModified" name="lastmodified" />
</entity>
</entity>
</document>
</dataConfig>
The Solr version is 3.5. Any idea?
Thanks in advance.
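Answer 1:
These values don't come from the Tika metadata; they are produced by FileListEntityProcessor. Move the three field mappings out of the TikaEntityProcessor entity and into the FileListEntityProcessor entity, like this: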
<dataConfig>
<dataSource name="bin" type="BinFileDataSource" />
<document>
<entity name="f" dataSource="null" rootEntity="false"
processor="FileListEntityProcessor"
baseDir="/home/luca/Documents"
fileName=".*\.(DOC|PDF|pdf|doc|docx|ppt)" onError="skip"
recursive="true">
<field column="fileAbsolutePath" name="path" />
<field column="fileSize" name="size" />
<field column="fileLastModified" name="lastmodified" />
<entity name="tika-test" dataSource="bin" processor="TikaEntityProcessor"
url="${f.fileAbsolutePath}" format="text" onError="skip">
<field column="Author" name="author" meta="true"/>
<field column="title" name="title" meta="true"/>
<!--<field column="text" />-->
</entity>
</entity>
</document>
</dataConfig>
Answer 2:
You don't need to declare these fields in the DIH config; just define them in schema.xml:
<field name="fileAbsolutePath" type="string" indexed="true" stored="true" multiValued="false" />
<field name="file" type="string" indexed="true" stored="true" multiValued="false" />
<field name="fileLastModified" type="string" indexed="true" stored="true" multiValued="false" />
They are filled in automatically by FileListEntityProcessor (tested in Solr 4.6).
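As a sketch of how this approach might extend to the rest of what the question asked for: FileListEntityProcessor also exposes fileSize and fileDir as implicit columns, so schema fields with those names could presumably be declared the same way. This goes beyond what the answer reports as tested. It assumes the long and date field types exist as in the question's schema, and that the automatic mapping behaves the same for these columns.

```xml
<!-- Sketch only: schema.xml fields named after FileListEntityProcessor's implicit columns. -->
<!-- Assumption: fieldTypes "string", "long" and "date" are defined, as in the question's schema. -->
<field name="fileDir" type="string" indexed="true" stored="true" multiValued="false" />
<field name="file" type="string" indexed="true" stored="true" multiValued="false" />
<field name="fileAbsolutePath" type="string" indexed="true" stored="true" multiValued="false" />
<!-- Not covered by the answer's test: file size in bytes -->
<field name="fileSize" type="long" indexed="true" stored="true" multiValued="false" />
<!-- The answer used type="string" here; "date" is an untested assumption -->
<field name="fileLastModified" type="date" indexed="true" stored="true" multiValued="false" />
```

If the names path, size, and lastmodified from the question need to be kept, the explicit column-to-name mappings shown in Answer 1 are still required.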
Source: https://stackoverflow.com/questions/9883699/how-to-store-file-path-in-solr-when-using-tikaentityprocessor