Query multiple collections with different fields in Solr

广开言路 2020-12-13 16:12

Given the following (single-core) queries:

http://localhost/solr/a/select?indent=true&q=*:*&rows=100&start=0&wt=json
http://localhost/solr/b         


        
2 answers
  • 2020-12-13 16:17

    What you need is what I call a unification core. That core holds no content of its own; it only acts as a wrapper that unifies the fields you want to display from both cores. For it you will need

    • a schema.xml that wraps up all the fields that you want to have in your unified result
    • a query handler that combines the two different cores for you

    One important restriction up front, taken from the Solr Wiki page on DistributedSearch:

    Documents must have a unique key and the unique key must be stored (stored="true" in schema.xml). The unique key field must be unique across all shards. If docs with duplicate unique keys are encountered, Solr will make an attempt to return valid results, but the behavior may be non-deterministic.
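
    Because duplicate keys across shards lead to non-deterministic results, it can help to check the ids before indexing. The following is only a sketch: the function name find_duplicate_keys and the sample data are hypothetical, and it assumes the ids of each core have already been exported into plain Python lists.

```python
from collections import defaultdict


def find_duplicate_keys(docs_by_shard, key_field="id"):
    """Return {key: sorted shard names} for keys that occur in more than one shard."""
    shards_by_key = defaultdict(set)
    for shard_name, docs in docs_by_shard.items():
        for doc in docs:
            shards_by_key[doc[key_field]].add(shard_name)
    return {
        key: sorted(shards)
        for key, shards in shards_by_key.items()
        if len(shards) > 1
    }


# Hypothetical sample data, not taken from a real index.
sample = {
    "shard-1": [{"id": 1}, {"id": 3}],
    "shard-2": [{"id": 2}, {"id": 3}],
}
duplicates = find_duplicate_keys(sample)
print(duplicates)  # {3: ['shard-1', 'shard-2']}
```

    An empty result means the unique keys do not collide between the shards that were checked.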

    As an example, I have shard-1 with the fields id, title and description, and shard-2 with the fields id, title and abstractText. So I have these schemas:

    schema of shard-1

    <schema name="shard-1" version="1.5">
    
      <fields>
        <field name="id"
              type="int" indexed="true" stored="true" multiValued="false" />
        <field name="title" 
              type="text" indexed="true" stored="true" multiValued="false" />
        <field name="description"
              type="text" indexed="true" stored="true" multiValued="false" />
      </fields>
      <!-- type definition left out, have a look in github -->
    </schema>
    

    schema of shard-2

    <schema name="shard-2" version="1.5">
    
      <fields>
        <field name="id" 
          type="int" indexed="true" stored="true" multiValued="false" />
        <field name="title" 
          type="text" indexed="true" stored="true" multiValued="false" />
        <field name="abstractText" 
          type="text" indexed="true" stored="true" multiValued="false" />
      </fields>
      <!-- type definition left out, have a look in github -->
    </schema>
    

    To unify these schemas I create a third schema that I call shard-unification, which contains all four fields.

    <schema name="shard-unification" version="1.5">
    
      <fields>
        <field name="id" 
          type="int" indexed="true" stored="true" multiValued="false" />
        <field name="title" 
          type="text" indexed="true" stored="true" multiValued="false" />
        <field name="abstractText" 
          type="text" indexed="true" stored="true" multiValued="false" />
        <field name="description" 
          type="text" indexed="true" stored="true" multiValued="false" />
      </fields>
      <!-- type definition left out, have a look in github -->
    </schema>
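
    The unified schema is simply the union of the field definitions of both shards. As a rough sketch of that idea (the helper names field_defs and unify_fields are made up for illustration, and the schema strings are shortened copies of the ones above), the union can be computed and checked for conflicting definitions like this:

```python
import xml.etree.ElementTree as ET

# Shortened copies of the two shard schemas shown above (type definitions omitted).
SHARD_1 = """<schema name="shard-1" version="1.5">
  <fields>
    <field name="id" type="int" indexed="true" stored="true" multiValued="false" />
    <field name="title" type="text" indexed="true" stored="true" multiValued="false" />
    <field name="description" type="text" indexed="true" stored="true" multiValued="false" />
  </fields>
</schema>"""

SHARD_2 = """<schema name="shard-2" version="1.5">
  <fields>
    <field name="id" type="int" indexed="true" stored="true" multiValued="false" />
    <field name="title" type="text" indexed="true" stored="true" multiValued="false" />
    <field name="abstractText" type="text" indexed="true" stored="true" multiValued="false" />
  </fields>
</schema>"""


def field_defs(schema_xml):
    """Map each field name to the attributes of its <field> element."""
    root = ET.fromstring(schema_xml)
    return {field.get("name"): dict(field.attrib) for field in root.iter("field")}


def unify_fields(*schemas):
    """Union the fields of all schemas; fail if one name is defined differently."""
    merged = {}
    for schema_xml in schemas:
        for name, attrs in field_defs(schema_xml).items():
            existing = merged.setdefault(name, attrs)
            if existing != attrs:
                raise ValueError(f"conflicting definitions for field {name!r}")
    return merged


unified = unify_fields(SHARD_1, SHARD_2)
print(sorted(unified))  # ['abstractText', 'description', 'id', 'title']
```

    The conflict check is a design choice of this sketch: a field that has the same name but a different type in two shards would need to be resolved by hand before it goes into the unified schema.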
    

    Now I need to make use of this combined schema, so I create a query handler in the solrconfig.xml of the shard-unification core:

    <requestHandler name="standard" class="solr.StandardRequestHandler" default="true">
      <lst name="defaults">
        <str name="defType">edismax</str>
        <str name="q.alt">*:*</str>
        <str name="qf">id title description abstractText</str>
        <str name="fl">*,score</str>
        <str name="mm">100%</str>
      </lst>
    </requestHandler>
    <queryParser name="edismax" class="org.apache.solr.search.ExtendedDismaxQParserPlugin" />
    

    That's it. Now some index data is required in shard-1 and shard-2. To get a unified result, query shard-unification with the appropriate shards parameter:

    http://localhost/solr/shard-unification/select?q=*:*&rows=100&start=0&wt=json&shards=localhost/solr/shard-1,localhost/solr/shard-2
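
    If the URL is assembled in code, the shards parameter is just a comma-separated list of shard addresses. A minimal sketch using only the Python standard library follows; the function name build_unified_query_url is hypothetical, and the hosts and core names are the ones from the example above.

```python
from urllib.parse import urlencode


def build_unified_query_url(base_url, shards, query="*:*", rows=100, start=0):
    """Build a select URL for the unification core with a shards parameter."""
    params = {
        "q": query,
        "rows": rows,
        "start": start,
        "wt": "json",
        "shards": ",".join(shards),
    }
    # Keep '*', ':', '/' and ',' unescaped so the URL stays readable;
    # other reserved characters in the query are still percent-encoded.
    return base_url + "?" + urlencode(params, safe="*:/,")


url = build_unified_query_url(
    "http://localhost/solr/shard-unification/select",
    ["localhost/solr/shard-1", "localhost/solr/shard-2"],
)
print(url)
```

    With these arguments the function reproduces the URL shown above.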
    

    The query returns a result like this:

    {
      "responseHeader":{
        "status":0,
        "QTime":10},
      "response":{"numFound":2,"start":0,"maxScore":1.0,"docs":[
          {
            "id":1,
            "title":"title 1",
            "description":"description 1",
            "score":1.0},
          {
            "id":2,
            "title":"title 2",
            "abstractText":"abstract 2",
            "score":1.0}]
      }}
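
    Note that each document only carries the fields of the shard it came from, so a client has to cope with a missing description or abstractText. One possible way to handle that, sketched here with the example response above as input (the helper summary_of is made up for illustration), is to fall back from one field to the other:

```python
import json

# The example response from above, used as static input for this sketch.
RESPONSE = """
{
  "responseHeader": {"status": 0, "QTime": 10},
  "response": {"numFound": 2, "start": 0, "maxScore": 1.0, "docs": [
    {"id": 1, "title": "title 1", "description": "description 1", "score": 1.0},
    {"id": 2, "title": "title 2", "abstractText": "abstract 2", "score": 1.0}
  ]}
}
"""


def summary_of(doc):
    """Return description if present, otherwise abstractText, otherwise ''."""
    return doc.get("description") or doc.get("abstractText") or ""


docs = json.loads(RESPONSE)["response"]["docs"]
rows = [(doc["id"], doc["title"], summary_of(doc)) for doc in docs]
for row in rows:
    print(row)
```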
    

    Fetch the origin shard of a document

    If you want each document to include the shard it came from, specify [shard] in fl, either as a parameter of the query or in the request handler's defaults, as shown below. The brackets are mandatory, and they also appear in the field name in the response.

    <requestHandler name="standard" class="solr.StandardRequestHandler" default="true">
      <lst name="defaults">
        <str name="defType">edismax</str>
        <str name="q.alt">*:*</str>
        <str name="qf">id title description abstractText</str>
        <str name="fl">*,score,[shard]</str>
        <str name="mm">100%</str>
      </lst>
    </requestHandler>
    <queryParser name="edismax" class="org.apache.solr.search.ExtendedDismaxQParserPlugin" />
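
    On the client side, the [shard] field can then be used to tell the documents apart. The sketch below groups document ids by origin shard; the function name group_ids_by_shard is hypothetical, and the sample documents assume that Solr reports the shard addresses in the same form as they were passed in the shards parameter.

```python
from collections import defaultdict


def group_ids_by_shard(docs):
    """Group document ids by the value of their "[shard]" field."""
    grouped = defaultdict(list)
    for doc in docs:
        # Documents without the field end up under "unknown".
        grouped[doc.get("[shard]", "unknown")].append(doc["id"])
    return dict(grouped)


# Hypothetical documents; the "[shard]" values are assumed, not captured output.
sample_docs = [
    {"id": 1, "title": "title 1", "[shard]": "localhost/solr/shard-1"},
    {"id": 2, "title": "title 2", "[shard]": "localhost/solr/shard-2"},
]
by_shard = group_ids_by_shard(sample_docs)
print(by_shard)
```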
    

    Working Sample

    If you want to see a running example, check out my solrsample project on GitHub and run the ShardUnificationTest. It now also includes fetching the origin shard.

  • 2020-12-13 16:28

    According to the Solr Wiki, shards should be used

    When an index becomes too large to fit on a single system, or when a single query takes too long to execute

    so the number and names of the fields should always be the same in every shard. This is specified on the wiki page the quote above comes from: http://wiki.apache.org/solr/DistributedSearch

    If you leave your query as it is and give the two shards the same fields, this should just work as expected.

    If you want more information about how shards work in SolrCloud, have a look at this document as well: http://wiki.apache.org/solr/SolrCloud
