Elasticsearch - use a “tags” index to discover all tags in a given string

Submitted by 安稳与你 on 2019-12-20 06:48:08

Question


I have an Elasticsearch v2.x cluster with a "tags" index that contains about 5000 tags: {tagName, tagID}. Given a string, is it possible to query the tags index to get all tags that are found in that string? I want exact matches, but I also want to allow fuzzy matches without being too generous. By "not too generous" I mean that a tag should only match if all tokens in the tag are found within a certain proximity of each other in the string (say 5 words).

For example, given the string:

Model 22340 Sound Spectrum Analyzer

The following tags should match:

sound analyzer
sound spectrum analyzer

BUT NOT

sound meter
light spectrum
chemical analyzer
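The requirement above can be pinned down with a small reference check, written here in plain Python rather than as an Elasticsearch query. This is only a sketch: `tokenize`, `tag_matches`, and the `window` parameter are hypothetical names, the tokenizer is a rough stand-in for an analyzer, and it handles exact tokens only (no fuzzy matching).

```python
import re


def tokenize(text):
    """Lowercase and split on non-alphanumerics (rough stand-in for an analyzer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tag_matches(tag, text, window=5):
    """Return True if every token of `tag` occurs in `text` within some
    run of at most `window` consecutive words."""
    tag_tokens = set(tokenize(tag))
    words = tokenize(text)
    if not tag_tokens:
        return False
    # Slide a window over the string and test whether it covers all tag tokens.
    for start in range(len(words)):
        if tag_tokens <= set(words[start:start + window]):
            return True
    return False


text = "Model 22340 Sound Spectrum Analyzer"
tags = [
    "sound analyzer",
    "sound spectrum analyzer",
    "sound meter",
    "light spectrum",
    "chemical analyzer",
]
matched = [t for t in tags if tag_matches(t, text)]
print(matched)
```

With the example string, only the first two tags pass the check, because each of the other three contains a token that is missing from the string.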


Answer 1:


{
  "query": {
    "match": {
      "tagName": {
        "query":     "Model 22340 Sound Spectrum Analyzer",
        "fuzziness": "AUTO",
        "operator":  "or"
      }
    }
  }
}

If you want a full match, so that "sound meter" does not match, you will have to add another field to each tag that holds the number of terms in the tag name, add a script that counts the terms in the query, and compare the two counts in the match query; see: Finding Multiple Exact Values.
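As a rough illustration of the count comparison, the sketch below does it on the client side instead of in a script, which is a swapped-in simplification and not what the linked guide describes. The `termCount` field, the function names, and the sample hits are all hypothetical; the hits are only shaped like the `hits.hits` array of a search response. Only exact tokens are counted, so a tag that matched through fuzziness would be dropped.

```python
import re


def tokenize(text):
    """Lowercase and split on non-alphanumerics (rough stand-in for an analyzer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def keep_fully_matched(hits, text):
    """Keep the IDs of tags whose stored term count equals the number of
    tag terms actually present in `text`."""
    text_terms = set(tokenize(text))
    kept = []
    for hit in hits:
        source = hit["_source"]
        matched = len(set(tokenize(source["tagName"])) & text_terms)
        if matched == source["termCount"]:  # hypothetical precomputed field
            kept.append(source["tagID"])
    return kept


# Hypothetical candidates returned by the "or" match query above.
hits = [
    {"_source": {"tagID": 1, "tagName": "sound analyzer", "termCount": 2}},
    {"_source": {"tagID": 2, "tagName": "sound meter", "termCount": 2}},
    {"_source": {"tagID": 3, "tagName": "sound spectrum analyzer", "termCount": 3}},
]
fully_matched = keep_fully_matched(hits, "Model 22340 Sound Spectrum Analyzer")
print(fully_matched)
```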

Regarding the proximity issue: because you require fuzziness, you cannot control proximity, since the "match_phrase" query does not support fuzziness, as stated in the Elastic docs on the fuzzy match query:

Fuzziness works only with the basic match and multi_match queries. It doesn’t work with phrase matching, common terms, or cross_fields matches.

So you need to decide: fuzziness vs. proximity.




Answer 2:


I don't think it's possible to create an accurate Elasticsearch query that will auto-tag an arbitrary string. That's basically a reverse query. The most accurate way to match a tag to a document is to construct a query for the tag, and then search the document. Obviously this would be terribly inefficient if you had to run one search per tag to auto-tag a document.

To do a reverse query, you want to use the Elasticsearch Percolator API:

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-percolate.html

The API is very flexible and allows you to create fairly complex queries into documents with multiple fields.

The basic concept is this (assuming your tags have an app specific ID field):

  1. For each tag, create a query for it, and register the query with the percolator (using the tag's ID field).

  2. To auto-tag a string, pass your string (as a document) to the Percolator, which will match it against all registered queries.

  3. Iterate over the matches. Each match includes the _id of the query. Use the _id to reference the tag.
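The three steps above can be sketched as follows, assuming the Elasticsearch 2.x percolator endpoints (`/{index}/.percolator/{id}` to register a query and `/{index}/{type}/_percolate` to percolate a document); later major versions replaced these with a percolate query. The index name `tag-queries`, the type `doc`, the field `text`, and the helper names are hypothetical. Each tag is registered as a `match_phrase` query with a `slop`, which is one possible way to approximate the proximity requirement and does not add fuzziness. The functions only build requests and parse a response, so sending them over HTTP is left to whatever client you use.

```python
import json

INDEX = "tag-queries"  # hypothetical index holding the registered queries
DOC_TYPE = "doc"       # hypothetical type of the percolated document
FIELD = "text"         # hypothetical field containing the string to tag


def registration_request(tag, slop=5):
    """Step 1: build (method, path, body) registering one tag's query
    under the tag's ID."""
    body = {
        "query": {
            "match_phrase": {FIELD: {"query": tag["tagName"], "slop": slop}}
        }
    }
    path = "/%s/.percolator/%s" % (INDEX, tag["tagID"])
    return "PUT", path, json.dumps(body)


def percolate_request(text):
    """Step 2: build (method, path, body) that percolates the string as a document."""
    path = "/%s/%s/_percolate" % (INDEX, DOC_TYPE)
    return "GET", path, json.dumps({"doc": {FIELD: text}})


def matched_tag_ids(response):
    """Step 3: collect the _id of every registered query that matched."""
    return [match["_id"] for match in response.get("matches", [])]


method, path, body = registration_request({"tagID": 42, "tagName": "sound analyzer"})
print(method, path, body)
print(percolate_request("Model 22340 Sound Spectrum Analyzer"))
```

Note that `slop` counts position moves rather than a word window, so tag terms that appear in a different order in the string use up more of the allowance.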

This is also a good article to read: https://www.elastic.co/blog/percolator-redesign-blog-post




Answer 3:


Of course you can. You can get what you want using just a match query with the standard analyzer.

curl -XGET "http://localhost:9200/tags/_search?pretty" -d '{
  "query": {
    "match" : {
      "tagName" : "Model 22340 Sound Spectrum Analyzer"
    }
  }
}'


Source: https://stackoverflow.com/questions/37889680/elasticsearch-use-a-tags-index-to-discover-all-tags-in-a-given-string