Elasticsearch 1.6
I want to index text that contains hyphens, for example U-12, U-17, WU-12, t-shirt... and to be able to use a "Simple Query String" query to sea
The quote from Igor Motov is correct: you have to add "analyze_wildcard":true to make wildcard queries work. It is important to note, however, that the hyphen causes "u-12" to be tokenized into "u" and "12", two separate tokens.
If preserving the original text is important, do not use a mapping char filter. Otherwise it can be useful.
Imagine that you have "m0-77", "m1-77" and "m2-77". If you search for m*-77 you will get zero hits. However, you can replace the "-" (hyphen) with AND to connect the two separated tokens and then search for m* AND 77, which gives you the correct hits.
You can do this on the client side.
In your case, for u-*:
{
"query":{
"simple_query_string":{
"query":"u AND 1*",
"analyze_wildcard":true
}
}
}
And for t-sh*:
{
"query":{
"simple_query_string":{
"query":"t AND sh*",
"analyze_wildcard":true
}
}
}
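The client-side rewrite described above can be sketched in Python. This is only an illustration under assumptions: the function names `join_hyphenated` and `build_body` are made up here, and the operator is a parameter because the answer writes AND while the simple_query_string syntax documents `+` as its AND operator. A leading hyphen is left alone, since simple_query_string treats it as negation.

```python
import json
import re

def join_hyphenated(query, operator="AND"):
    """Replace hyphens that sit between two word/wildcard characters
    with the given operator, e.g. 'm*-77' -> 'm* AND 77'.
    A leading hyphen (negation) is not touched."""
    return re.sub(r"(?<=[\w*])-(?=[\w*])", lambda _m: f" {operator} ", query)

def build_body(user_query, operator="AND"):
    """Build a request body shaped like the examples in this answer."""
    return {
        "query": {
            "simple_query_string": {
                "query": join_hyphenated(user_query, operator),
                "analyze_wildcard": True,
            }
        }
    }

if __name__ == "__main__":
    print(json.dumps(build_body("t-sh*"), indent=2))
```

Whether AND or `+` gives the hits you expect is worth checking against your own index before relying on either.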
If anyone is still looking for a simple workaround to this issue, replace the hyphen with an underscore (_) when indexing data.
For example, O-000022334 should be indexed as O_000022334.
When displaying results, replace the underscore with a hyphen again. This way you can search for "O-000022334" and it will find the correct match.
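A minimal Python sketch of this workaround follows. The helper names are hypothetical, and it assumes the same hyphen-to-underscore replacement is also applied to the search text before it is sent, and that the original values never contain underscores of their own (otherwise the reverse step would turn those into hyphens too).

```python
def to_index_form(value):
    """Form stored in the index (and used for the query text):
    hyphens become underscores."""
    return value.replace("-", "_")

def to_display_form(value):
    """Form shown to the user: underscores become hyphens again."""
    return value.replace("_", "-")

if __name__ == "__main__":
    stored = to_index_form("O-000022334")
    print(stored, to_display_form(stored))
```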
The answer is really simple:
Quote from Igor Motov: Configuring the standard tokenizer
By default the simple_query_string query doesn't analyze the words with wildcards. As a result it searches for all tokens that start with i-ma. The word i-mac doesn't match this request because during analysis it's split into two tokens i and mac and neither of these tokens starts with i-ma. In order to make this query find i-mac you need to make it analyze wildcards:
{
"_source":true,
"query":{
"simple_query_string":{
"query":"u-1*",
"analyze_wildcard":true,
"default_operator":"AND"
}
}
}
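To see why the unanalyzed wildcard finds nothing, here is a toy model in Python. It is not Elasticsearch code: `tokenize` is a rough stand-in that lowercases and splits on anything that is not a letter or digit, and the two match functions are assumptions that only imitate the behaviour described in the quote.

```python
import re

def tokenize(text):
    """Toy stand-in for analysis: lowercase, split on non-alphanumerics."""
    return [tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok]

def raw_prefix_match(document, pattern):
    """Wildcard not analyzed: the whole prefix, hyphen included,
    is compared against each indexed token."""
    prefix = pattern.rstrip("*").lower()
    return any(tok.startswith(prefix) for tok in tokenize(document))

def analyzed_prefix_match(document, pattern):
    """Wildcard analyzed: the pattern is tokenized as well. The last
    piece is treated as a prefix, earlier pieces must match a token
    exactly, and all pieces are required (like default_operator AND)."""
    doc_tokens = tokenize(document)
    pieces = tokenize(pattern.rstrip("*"))
    if not pieces:
        return False
    *exact, prefix = pieces
    return (all(p in doc_tokens for p in exact)
            and any(tok.startswith(prefix) for tok in doc_tokens))

if __name__ == "__main__":
    print(tokenize("i-mac"))
    print(raw_prefix_match("i-mac", "i-ma*"))
    print(analyzed_prefix_match("i-mac", "i-ma*"))
```

In this model "i-mac" is indexed as the tokens i and mac, so the raw prefix i-ma matches neither, while the analyzed pattern (i plus the prefix ma) matches both.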