Is it possible to get Solr's DataImportHandler to ignore fields with empty strings?

China☆狼群 submitted on 2019-12-24 05:08:26

Question


I am using Solr's DataImportHandler to import data from a database. Some records contain an empty string in a column when there is no value for it.

Currently the configuration I have produces Solr documents like this:

{
    "x": "value",
    "y": "",
    "z": 2
}

However, I would like to omit all fields that have no value, so that documents like this are created:

{
    "x": "value",
    "z": 2
}

Is there something I can define in the configuration file for the DataImportHandler that will give me my desired results?


Answer 1:


One little-known aspect of Solr is that you can plug in an UpdateRequestProcessor (URP) chain to run after the DIH. And there are specialized URPs for exactly this problem.

So you could do something like this:

<updateRequestProcessorChain name="skip-empty">
    <!--  The next two processors apply to all fields (default configuration) -->
    <processor class="solr.TrimFieldUpdateProcessorFactory" /> <!--  Strips leading/trailing whitespace; this also turns all-whitespace values into empty strings for the next processor -->
    <processor class="solr.RemoveBlankFieldUpdateProcessorFactory" /> <!--  Removes fields with no content; saves space and lets you query for the presence/absence of a field -->

    <processor class="solr.LogUpdateProcessorFactory" />
    <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

Remember to also reference this chain in the DIH request handler's definition:

<requestHandler name="/dataimport" class="solr.DataImportHandler">
  <lst name="defaults">
    ....
    <str name="update.chain">skip-empty</str>
  </lst>
</requestHandler>

You can find the full list of UpdateRequestProcessors at http://solr-start.com




Answer 2:


You can either do this in SQL, as I suggested in a comment on the question, or, if you want a solution in the DIH processing chain, use the ScriptTransformer. The ScriptTransformer lets you write a small JavaScript function that checks whether each column is an empty string and calls row.remove(fieldname) to drop that field completely.
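A minimal data-config.xml sketch of the ScriptTransformer approach. The entity name `item`, the table `items`, and the function name `skipEmptyFields` are hypothetical placeholders; only the columns x, y, z come from the question, and your existing dataSource definition is assumed. The script copies the row's keys into an array before removing anything, so the map is not modified while it is being iterated:

```xml
<dataConfig>
  <!-- dataSource definition omitted: keep your existing one -->
  <script><![CDATA[
    // Called once per row; 'row' is the java.util.Map of column name to value.
    function skipEmptyFields(row) {
      // Copy the keys first so removing entries does not disturb iteration.
      var keys = row.keySet().toArray();
      for (var i = 0; i < keys.length; i++) {
        var value = row.get(keys[i]);
        // Drop columns whose value is exactly the empty string.
        if (value != null && String(value) === '') {
          row.remove(keys[i]);
        }
      }
      return row;
    }
  ]]></script>
  <document>
    <!-- Hypothetical entity: "item", table "items" and the query are placeholders. -->
    <entity name="item"
            query="SELECT x, y, z FROM items"
            transformer="script:skipEmptyFields"/>
  </document>
</dataConfig>
```

If you take the SQL route instead, wrapping a column as NULLIF(y, '') AS y makes the database return NULL for empty strings; as far as I know, DIH does not add a field for a NULL column value, so the field is left out of the document.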

If you want to write it in pure Java instead, you can also create a reusable custom transformer for DIH.
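On the Java route, the custom transformer is referenced by its fully qualified class name in the same `transformer` attribute. A minimal wiring sketch, assuming a hypothetical class `com.example.SkipEmptyFieldsTransformer` on Solr's classpath that extends `org.apache.solr.handler.dataimport.Transformer` and implements `transformRow(Map<String, Object> row, Context context)` by removing empty-string entries and returning the row (entity, table and query are placeholders as well):

```xml
<document>
  <!-- com.example.SkipEmptyFieldsTransformer is a hypothetical class name;
       entity "item", table "items" and the query are placeholders. -->
  <entity name="item"
          query="SELECT x, y, z FROM items"
          transformer="com.example.SkipEmptyFieldsTransformer"/>
</document>
```

Because the class is referenced by name, the same transformer can be reused across entities, and it can be listed together with other transformers in a comma-separated `transformer` attribute.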



Source: https://stackoverflow.com/questions/24570545/is-it-possible-to-get-solrs-dataimporthadler-to-ignore-fields-with-empty-string
