I am using a PHP library for Elasticsearch to index and search documents on my website. This is the code for creating the index:
curl -XPUT \'http://localhost:9200/
The 'porterStem' filter is quite aggressive, so the 'minimal_english' stemmer may suit you better. 'porterStem' reduces many related words to the same token, so
searching for 'Test' will return 'Test', 'Tests', 'Testing', 'Tester', etc.
With 'minimal_english', the same search will only return 'Test' and 'Tests'.
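For reference, here is a minimal sketch of how such an analyzer might be defined. The index name `index_name`, the analyzer name `light_stem`, and the filter name `minimal_english_stemmer` are placeholders made up for illustration; the part that matters is the `stemmer` token filter set to `minimal_english`. As far as I know, older docs also list `name` as an equivalent spelling of the `language` parameter, so check the reference for your version.

```shell
# Sketch only: custom analyzer that stems with minimal_english.
# index_name, light_stem and minimal_english_stemmer are placeholder names.
curl -XPUT 'http://localhost:9200/index_name' -d '{
  "settings": {
    "analysis": {
      "filter": {
        "minimal_english_stemmer": {
          "type": "stemmer",
          "language": "minimal_english"
        }
      },
      "analyzer": {
        "light_stem": {
          "tokenizer": "standard",
          "filter": ["lowercase", "minimal_english_stemmer"]
        }
      }
    }
  }
}'
```

You would then set `"analyzer": "light_stem"` on the text fields in your mapping.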
The default Elasticsearch analyzer doesn't do stemming, which is what you need to handle singular/plural forms. You can try the Snowball analyzer for your text fields to see whether it works better for your use case:
curl -XPUT 'http://localhost:9200/test' -d '{
  "settings" : {
    "index" : {
      "number_of_shards" : 1,
      "number_of_replicas" : 1
    }
  },
  "mappings" : {
    "page" : {
      "properties" : {
        "mytextfield": { "type": "string", "analyzer": "snowball", "store": "yes"}
      }
    }
  }
}'
Somehow Snowball is not working for me; I am getting errors like the ones I mentioned in my comment on @imotov's answer. I used the Porter stemmer instead and it worked perfectly for me. This is the config I used:
curl -XPUT localhost:9200/index_name -d '
{
  "settings" : {
    "analysis" : {
      "analyzer" : {
        "stem" : {
          "tokenizer" : "standard",
          "filter" : ["standard", "lowercase", "stop", "porter_stem"]
        }
      }
    }
  },
  "mappings" : {
    "index_type_1" : {
      "dynamic" : true,
      "properties" : {
        "field1" : {
          "type" : "string",
          "analyzer" : "stem"
        },
        "field2" : {
          "type" : "string",
          "analyzer" : "stem"
        }
      }
    }
  }
}'
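To see what the analyzer actually produces, you can run the `_analyze` API against the index. This is a sketch that assumes the `index_name` index and `stem` analyzer from the config above, and an older release (the `string` field type suggests pre-5.x); newer releases expect a JSON body with `analyzer` and `text` fields instead of the query-string form.

```shell
# Print the tokens the "stem" analyzer emits for a few sample words.
# Query-string form as used by older releases; adjust for your version.
curl -XGET 'http://localhost:9200/index_name/_analyze?analyzer=stem&pretty=true' -d 'Test Tests Testing Tester'
```

Words that come back as the same token will match each other at search time, so this is a quick way to compare stemmers before reindexing.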