How to create Solr schema for hierarchical facet by splitting data into multiple fields at index time

放肆的年华 submitted on 2019-12-10 19:01:59

Question


I want to implement Solr hierarchical faceting for my application, which has a two-level hierarchy: Category and SubCategory. I want to use the Pivot Facets solution described at http://wiki.apache.org/solr/HierarchicalFaceting#Pivot_Facets.

The flattened data looks like this:

Doc#1: NonFic > Law
Doc#2: NonFic > Sci
Doc#3: NonFic > Sci > Phys

At index time, this data should be split into a separate field for each level of the hierarchy, like this:

Indexed Terms

Doc#1: category_level0: NonFic; category_level1: Law
Doc#2: category_level0: NonFic; category_level1: Sci
Doc#3: category_level0: NonFic; category_level1: Sci; category_level2: Phys
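
The path-to-fields mapping above can be sketched as a small JavaScript function. This only illustrates the transformation itself, assuming `>` is the separator and that the spaces around it should be dropped; `splitCategoryPath` is a hypothetical helper, not part of Solr.

```javascript
// Hypothetical helper, not part of Solr: split a flattened path such as
// "NonFic > Sci > Phys" into one field per hierarchy level.
function splitCategoryPath(path) {
  const fields = {};
  path
    .split('>')                       // one piece per level
    .map((piece) => piece.trim())     // drop the spaces around '>'
    .filter((piece) => piece !== '')  // skip empty pieces, e.g. from a trailing '>'
    .forEach((piece, level) => {
      fields['category_level' + level] = piece;
    });
  return fields;
}

console.log(JSON.stringify(splitCategoryPath('NonFic > Law')));
// → {"category_level0":"NonFic","category_level1":"Law"}
console.log(JSON.stringify(splitCategoryPath('NonFic > Sci > Phys')));
// → {"category_level0":"NonFic","category_level1":"Sci","category_level2":"Phys"}
```

Whichever Solr mechanism does the splitting, it has to produce exactly this shape of document.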

Can anyone suggest a way to implement this? How do I define the Solr schema to achieve it? I could not find any reference for splitting data like this at index time.

Thanks,

Priyanka


Answer 1:


Do you need to display those individual fields as part of the returned documents? If so, the split values must be in the 'stored' version of the field. If you only need them for searching or faceting, you can ignore the 'stored' form and concentrate on the 'indexed' form.

In either case, if you need to split one field into several, you can do it with copyField or with an UpdateRequestProcessor.

With copyField, the 'stored' form will be identical in every target field, but each field can have its own analyzer chain, so each one can pick a different part of the hierarchy for its 'indexed' form.

With an UpdateRequestProcessor, you can write a custom one that takes one field and emits several fields, each holding only its part of the path. Instead of writing a custom one, you can also clone the field a couple of times (for example with CloneFieldUpdateProcessorFactory) and then apply a different regex processor (RegexReplaceProcessorFactory) to each copy.
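
As one possible shape for the update-processor route without writing Java: Solr ships StatelessScriptUpdateProcessorFactory (moved into a separate scripting module in later versions), which runs a JavaScript file, named by its `script` parameter in solrconfig.xml, against every incoming document. The sketch below assumes the full path arrives in a hypothetical `category_path` field and that the schema accepts `category_level0`, `category_level1`, and so on (for example through a dynamic field). Which script engine is available depends on your JVM and Solr version, so treat this as a starting point rather than a drop-in file.

```javascript
// update-script.js: sketch for Solr's script update processor.
// ASSUMPTION: the full path arrives in a hypothetical "category_path"
// field, e.g. "NonFic > Sci > Phys".
function processAdd(cmd) {
  var doc = cmd.solrDoc;  // the SolrInputDocument being indexed
  var path = doc.getFieldValue('category_path');
  if (path == null) {
    return;               // nothing to split; let the document through unchanged
  }
  var pieces = String(path).split('>');
  for (var i = 0; i < pieces.length; i++) {
    // category_level0, category_level1, ... each hold one trimmed level
    doc.setField('category_level' + i, pieces[i].trim());
  }
}

// The script processor calls these as well, so they must exist even when empty.
function processDelete(cmd) {}
function processMergeIndexes(cmd) {}
function processCommit(cmd) {}
function processRollback(cmd) {}
function finish() {}
```

Returning nothing from processAdd lets the document continue down the update chain, so the rest of the chain (including the actual indexing step) still runs.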




Answer 2:


To split the data, use a ScriptTransformer, which lets you transform the data with JavaScript inside your DataImportHandler config file.

Add the following to your db-data-config, at the same level as dataSource and document. It defines a function that splits the string in a column on the > delimiter and adds a field for each piece, named category_level0, category_level1, and so on. Replace ColumnToSplit with the name of the column that holds the full path.

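<script><![CDATA[
    function CategoryPieces(row) {
        var value = row.get('ColumnToSplit');
        if (value == null) {
            return row; // no path in this row, nothing to split
        }
        // Split "NonFic > Sci > Phys" on '>' and trim the spaces around each piece
        var pieces = String(value).split('>');
        for (var i = 0; i < pieces.length; i++) {
            row.put('category_level' + i, pieces[i].trim());
        }
        return row;
    }
]]></script>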

Then, in your main <entity> tag, add transformer="script:CategoryPieces" and list the new columns as fields:

<field column="category_level0" name="Category_Level0" />
<field column="category_level1" name="Category_Level1" />

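Finally, add the new fields to your schema.xml. If your paths go three levels deep, as in Doc#3, add a Category_Level2 column and field in the same way.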

<field name="Category_Level0" type="string" indexed="true" stored="true" multiValued="false" />
<field name="Category_Level1" type="string" indexed="true" stored="true" multiValued="false" />
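
With the level fields indexed, the pivot-facet approach from the wiki page amounts to asking Solr to facet on Category_Level0 and, within each of its values, on Category_Level1. A minimal sketch that only builds the request URL, assuming a hypothetical core named `books` on the default port; `buildPivotFacetUrl` is an illustrative helper, while `facet`, `facet.pivot`, `rows`, and `wt` are standard Solr request parameters.

```javascript
// Build a pivot-facet request URL. The base URL and core name are placeholders.
function buildPivotFacetUrl(baseUrl, levelFields) {
  const params = new URLSearchParams({
    q: '*:*',
    rows: '0',                             // only facet counts are needed, not documents
    facet: 'true',
    'facet.pivot': levelFields.join(','),  // parent level first, then child level
    wt: 'json',
  });
  return baseUrl + '/select?' + params.toString();
}

const url = buildPivotFacetUrl('http://localhost:8983/solr/books',
  ['Category_Level0', 'Category_Level1']);
console.log(url);
```

In the response, facet_counts.facet_pivot then lists each level-0 value with its count and a nested pivot list holding the level-1 values beneath it.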


Source: https://stackoverflow.com/questions/15089549/how-to-create-solr-schema-for-hierarchical-facet-by-splitting-data-into-multiple
