Question
Hi, I have the following CSV file:
132 1536130302256087040
133 1536130302256087041
134 1536130302256087042
The fields are separated by a tab. I am using the DataImportHandler (DIH) to import the CSV into Solr, but only the first line gets indexed. This is the result; the other lines from the CSV are missing:
"response": {
  "numFound": 1,
  "start": 0,
  "maxScore": 1,
  "docs": [ {
    "string": "1536130302256087040",
    "id": "132",
    "_version_": 1536202153221161000
  } ] }
Here is my data-config.xml
<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8" name="fds"/>
  <document>
    <entity name="f"
            processor="FileListEntityProcessor"
            fileName="myfile.csv"
            baseDir="/var/www/solr-5.4.0/server/csv/files"
            recursive="false"
            rootEntity="true"
            dataSource="null" >
      <entity
              onError="continue"
              name="jc"
              processor="LineEntityProcessor"
              url="${f.fileAbsolutePath}"
              dataSource="fds"
              rootEntity="true"
              header="false"
              separator="\t"
              transformer="RegexTransformer" >
        <field column="id" name="id" sourceColName="rawLine" regex="^(.*)\t"/>
        <field column="string" name="string" sourceColName="rawLine" regex="\t(.*)$"/>
      </entity>
    </entity>
  </document>
</dataConfig>
Here is my schema.xml
<field name="id" type="text_general" indexed="true" stored="true" multiValued="false" required="true"/>
<field name="string" type="text_general" indexed="true" stored="true" multiValued="false"/>
<field name="_version_" type="long" indexed="true" stored="true"/>
<uniqueKey>id</uniqueKey>
What am I doing wrong?
Answer 1:
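You have rootEntity="true" on both entity levels, so you only get one document, created for the outer entity (the single file matched by FileListEntityProcessor). Try setting rootEntity="false" on the outer entity, so that each line read by the inner entity becomes its own document.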
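As a minimal sketch of that change, assuming everything else in the data-config.xml from the question stays exactly as posted, only the rootEntity value on the outer entity "f" differs. This has not been tested against the asker's setup:

```xml
<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8" name="fds"/>
  <document>
    <!-- Outer entity only lists the file; rootEntity="false" means it no longer produces the document -->
    <entity name="f"
            processor="FileListEntityProcessor"
            fileName="myfile.csv"
            baseDir="/var/www/solr-5.4.0/server/csv/files"
            recursive="false"
            rootEntity="false"
            dataSource="null" >
      <!-- Inner entity is now the root entity: one Solr document per line of the file -->
      <entity
              onError="continue"
              name="jc"
              processor="LineEntityProcessor"
              url="${f.fileAbsolutePath}"
              dataSource="fds"
              rootEntity="true"
              header="false"
              separator="\t"
              transformer="RegexTransformer" >
        <field column="id" name="id" sourceColName="rawLine" regex="^(.*)\t"/>
        <field column="string" name="string" sourceColName="rawLine" regex="\t(.*)$"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

After a full import with this configuration, numFound should match the number of lines in the file, provided every line yields a distinct id.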
Also, you can send tab-separated files directly to Solr's CSV update handler; no DIH is required.
Source: https://stackoverflow.com/questions/37629261/solr-dih-regextransformer-processes-only-one-csv-line