Why does the TikaEntityProcessor not index the Text field in the following data-config file?

终归单人心 2020-12-22 04:55



        
1 Answer
  • 2020-12-22 05:34

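    I solved the problem by declaring the TikaEntityProcessor entity inside the main "files" entity. Nested there, it runs once for each file that the FileListEntityProcessor lists, and ${files.fileAbsolutePath} resolves to that file's path; the extracted "text" column is then mapped to the Text field. Here is the working configuration for reference: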

        <dataConfig>
          <!-- The JDBC dataSource "test1" used by the "item" entity must also be declared; its settings were not included in the post. -->
          <dataSource name="test2" type="BinFileDataSource" />
          <document>
            <!-- Outer entity lists the files; rootEntity="false" makes the entity below it the document root -->
            <entity name="files" dataSource="null" rootEntity="false"
                    processor="FileListEntityProcessor" transformer="RegexTransformer"
                    baseDir="/home/shah/solr/IndexTest" fileName=".*\.(txt|pdf|docx)$"
                    onError="skip"
                    recursive="true">

              <field column="fileSize" name="size" />
              <field column="fileLastModified" name="lastModified" />
              <!-- Strip a trailing ".txt" from the file name before it is used as the id -->
              <field column="file" name="id" regex="\.txt$" replaceWith="" />

              <!-- Nested inside "files", so ${files.fileAbsolutePath} resolves for each listed file -->
              <entity name="documentImport" dataSource="test2"
                      processor="TikaEntityProcessor"
                      url="${files.fileAbsolutePath}"
                      format="text">
                <field column="text" name="Text" />

                <entity name="item" dataSource="test1"
                        query="select PaperID, ID, VName from ACL.Score where PaperID='${files.file}'">
                  <field column="PaperID" name="PaperID" />
                  <field column="ID" name="ID" />
                  <field column="VName" name="Venue" />
                </entity>
              </entity>

            </entity>
          </document>
        </dataConfig>
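
    If the Text field still looks empty after an import, the schema side is worth checking too: the target name has to be declared in the schema (or matched by a dynamicField), and it only appears in query results when it is stored. A minimal sketch of such a declaration for managed-schema or schema.xml, assuming the text_general field type from Solr's default configset (adjust the type to your own schema):

        <!-- Assumption: text_general exists, as in the _default configset -->
        <field name="Text" type="text_general" indexed="true" stored="true" />

    With stored="false" the content would still be searchable, but the field would not be returned with the matching documents.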