Solr: DIH for multilingual index & multiValued field?

Submitted by 不羁的心 on 2019-12-03 20:57:44

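First, your schema needs to allow it, with something like this (a "string" field is stored verbatim and not tokenized, so use an analyzed text type instead if the field needs full-text search):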

<dynamicField name="text_*" type="string" indexed="true" stored="true" />

Then, in your DIH config, add something like this:

<entity name="document" dataSource="ds1" transformer="script:ftextLang" query="SELECT * FROM documents" />

With the script being defined just below the datasource:

<script><![CDATA[
  function ftextLang(row){
     var name = row.get('language_code');
     var value = row.get('text');
     row.put('text_' + name, value);
     return row;
  }
]]></script>
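
To check the transformer logic outside Solr, it can be run as plain JavaScript. The sketch below is only an illustration: makeRow is a hypothetical stand-in for the Map-like row that DIH hands to the script, and the null guard is an addition that is not in the snippet above.

```javascript
// Hypothetical stand-in for the java.util.Map-like row that DIH passes
// to a script transformer. For local testing only; not part of Solr.
function makeRow(initial) {
  const data = new Map(Object.entries(initial));
  return {
    get: (key) => (data.has(key) ? data.get(key) : null),
    put: (key, value) => { data.set(key, value); },
  };
}

// Same logic as ftextLang above, plus a guard (an assumption, not in the
// original) so rows without a language code or text are passed through
// unchanged instead of producing a field such as "text_null".
function ftextLang(row) {
  const name = row.get('language_code');
  const value = row.get('text');
  if (name !== null && value !== null) {
    row.put('text_' + name, value);
  }
  return row;
}

const row = ftextLang(makeRow({ language_code: 'en', text: 'hello' }));
console.log(row.get('text_en')); // prints "hello"
```

Each row ends up with an extra field named after its language, which the text_* dynamic field in the schema then accepts.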

I'm sorry I don't have a direct answer about your DIH question, though it'd be interesting to know.

I did notice your two-letter language code and would suggest a five-character slot instead (for example zh-cn rather than zh). Some languages have dialect differences that are nontrivial, such as Simplified Chinese vs. Traditional Chinese. For morphological analysis, the SmartCN filter can handle zh-cn, but not zh-tw, etc.

Portuguese and Spanish are also languages where we've been warned against mixing all dialects together, although the differences are less drastic, and both would still be searchable.
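
If you do move to longer codes, the script transformer could normalize them before building the field name. This is a rough sketch under assumptions: langSuffix is a hypothetical helper, and replacing the hyphen with an underscore is my own choice, on the assumption that field names made only of letters, digits and underscores are the safest in Solr.

```javascript
// Hypothetical helper: turn a language code such as "zh-CN" into a
// suffix that is safe to append to "text_" as a dynamic field name.
function langSuffix(code) {
  return String(code).trim().toLowerCase().replace(/-/g, '_');
}

console.log('text_' + langSuffix('zh-CN')); // prints "text_zh_cn"
console.log('text_' + langSuffix('pt-BR')); // prints "text_pt_br"
```

With this, zh-cn and zh-tw rows land in separate fields (text_zh_cn and text_zh_tw), so each can be given its own analyzer.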

Of course you may have already known this, and just didn't add it to the question to keep it simple. It's just a subject very fresh on my mind.
