Elastic search wildcard query to get sorted results

笑着哭i submitted on 2020-03-03 13:05:13

Question


I have an Elasticsearch server set up where I am storing company names to be used for company search. The way it works is:

Spaces and dots are removed from the company name, and the result is stored in ES in a field called trimmedCompanyName:

{
          "companyName" : "RECKON INFOSYSTEM PRIVATE LIMITED",
          "trimmedCompanyName" : "reckoninfosystemprivatelimited",
          "id" : "1079"
}        
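The normalization step described above can be sketched as follows. This is a minimal illustration, not the asker's actual code: the function name `trim_company_name` is hypothetical, and the lowercasing is an assumption inferred from the sample document, where the stored value is lowercase.

```python
def trim_company_name(name: str) -> str:
    """Lowercase the name and drop spaces and dots.

    Lowercasing is an assumption based on the sample document; the
    question itself only mentions removing spaces and dots.
    """
    return name.lower().replace(" ", "").replace(".", "")


# Build a document shaped like the sample above.
company_name = "RECKON INFOSYSTEM PRIVATE LIMITED"
doc = {
    "companyName": company_name,
    "trimmedCompanyName": trim_company_name(company_name),
    "id": "1079",
}
```

The same function would be applied to the incoming search term before it is placed between the `*` wildcards.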

Now, when a search request comes to my server, I remove the spaces and dots from the search term and then make a request to the ES server. The ES request in query format is:

GET /_search
{
    "from": 0,
    "size": 100,
    "query": {
        "wildcard": {
            "trimmedCompanyName.keyword": {
                "value": "*infosys*"
            }
        }
    }
}

But I have around 600 companies with infosys in their names, all stored with spaces removed. ES returns 100 companies, but in these 100 companies infosys appears at the start of the second or third word. I want the results to list companies that have infosys in the first word first, then those that have it in the second word, and so on.

One solution I could think of was to fire two ES requests, one with the wildcard query infosys* and a second with *infosys*, combine both results, remove the duplicates, and return the response. But since this has to work along with pagination, firing two requests can get things wrong. Can someone please help me with this?
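For illustration, the client-side merge described above might look like the sketch below. The function name `merge_hits` and the sample hits are hypothetical, and the sketch assumes each hit is a dict carrying the document's `id` field.

```python
def merge_hits(prefix_hits, contains_hits):
    """Return prefix hits first, then contains hits not already seen.

    Duplicates are detected by the document "id" field.
    """
    seen = set()
    merged = []
    for hit in list(prefix_hits) + list(contains_hits):
        if hit["id"] not in seen:
            seen.add(hit["id"])
            merged.append(hit)
    return merged


# Hypothetical results of the two wildcard queries.
prefix_hits = [{"id": "2", "companyName": "INFOSYS"}]
contains_hits = [
    {"id": "1079", "companyName": "RECKON INFOSYSTEM PRIVATE LIMITED"},
    {"id": "2", "companyName": "INFOSYS"},
]
merged = merge_hits(prefix_hits, contains_hits)
# merged holds id "2" first, then id "1079"
```

The ordering is only correct when both result sets are complete. If each request is paged independently with from and size, a page of the merged list no longer corresponds to a single page of either query, which is the pagination problem mentioned above.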


Answer 1:


First of all, the traditional similarity algorithms and queries that we use in ES do not take the position of terms into account when calculating relevance.

For position-based queries, you need to make use of Span Queries.

I've come up with the solution below, which should work in your case. Note that I've run the query against the field companyName, and I assume that it uses the Standard Analyzer.

Below are the mapping, sample documents, query, and response:

Mapping:

PUT my_company
{
  "mappings": {
    "properties": {
      "companyName":{
        "type":"text"
      }
    }
  }
}

Sample Documents:

POST my_company/_doc/1
{
  "companyName": "reckon infosystem private limited"
}

POST my_company/_doc/2
{
  "companyName": "infosys"
}

POST my_company/_doc/3
{
  "companyName": "telecom services infosystem private limited"
}

POST my_company/_doc/4
{
  "companyName":"infosystems technological solution"
}

Query:

POST my_company/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "span_multi": {
            "match": {
              "wildcard": {
                "companyName": "infosys*"
              }
            }
          }
        }
      ]
    }
  }
}

Note that I've used a wildcard query inside a Span multi-term query (span_multi).
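If the match position needs to influence the ranking more explicitly, one possible variation (not part of the query above) is to add a `should` clause that wraps the same `span_multi` in a `span_first` query, so that names whose first token matches pick up extra score. The sketch below only builds the request body as a Python dict. The function names are hypothetical, and `end: 1` is assumed to restrict the match to the first token position.

```python
import json


def span_wildcard(field: str, prefix: str) -> dict:
    """Wrap a prefix wildcard in span_multi so it can be used as a span query."""
    return {"span_multi": {"match": {"wildcard": {field: prefix + "*"}}}}


def build_positional_query(field: str, prefix: str) -> dict:
    """Match any token starting with the prefix; favour matches in the first token."""
    return {
        "query": {
            "bool": {
                # Required: some token in the field starts with the prefix.
                "must": [span_wildcard(field, prefix)],
                # Optional: extra score when that token is the first one.
                "should": [
                    {"span_first": {"match": span_wildcard(field, prefix), "end": 1}}
                ],
            }
        }
    }


body = build_positional_query("companyName", "infosys")
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted as the body of a `_search` request. Whether the resulting order is good enough for a given data set would need to be checked against real documents.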

You might be wondering why I've not used the field trimmedCompanyName. That is because, looking at its mapping (even if it is a text type with the standard analyzer), its entire content is treated as a single term and stored that way in the inverted index.
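The difference can be seen with a rough stand-in for the analyzer. The sketch below is an approximation only: `simple_tokens` lowercases and splits on non-alphanumeric characters, which is much simpler than the real standard analyzer, and both function names are hypothetical.

```python
import re


def simple_tokens(text: str) -> list:
    """Very rough approximation of analysis: lowercase, split on non-alphanumerics."""
    return [tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok]


def first_prefix_position(text: str, prefix: str):
    """Return the position of the first token starting with prefix, or None."""
    for position, token in enumerate(simple_tokens(text)):
        if token.startswith(prefix):
            return position
    return None


# Four separate terms, so the prefix matches the term at position 1.
print(first_prefix_position("reckon infosystem private limited", "infosys"))  # 1

# One single term, so no term starts with the prefix.
print(first_prefix_position("reckoninfosystemprivatelimited", "infosys"))  # None
```

With the trimmed value there is only one term and therefore no position information to work with, which is why the query targets companyName instead.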

Response:

{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 4,
      "relation" : "eq"
    },
    "max_score" : 4.3264027,
    "hits" : [
      {
        "_index" : "my_company",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 4.3264027,
        "_source" : {
          "companyName" : "infosys"
        }
      },
      {
        "_index" : "my_company",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : 3.2018504,
        "_source" : {
          "companyName" : "infosystems technological solution"
        }
      },
      {
        "_index" : "my_company",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 2.8335867,
        "_source" : {
          "companyName" : "reckon infosystem private limited"
        }
      },
      {
        "_index" : "my_company",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 2.5412967,
        "_source" : {
          "companyName" : "telecom services infosystem private limited"
        }
      }
    ]
  }
}

Let me know if this helps!



Source: https://stackoverflow.com/questions/60462805/elastic-search-wildcard-query-to-get-sorted-results
