Is it possible to specify a script to be executed when inserting a document into Elasticsearch using its Index API? This functionality exists for the Update API.
Elasticsearch 1.3
If you only need to search/filter on the fields that you'd like to add, the mapping transform capability added in 1.3.0 may work for you:
The document can be transformed before it is indexed by registering a script in the transform element of the mapping. The result of the transform is indexed but the original source is stored in the _source field.
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-transform.html
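For illustration, here is a sketch of a create-index body that registers a transform, built as a Python dict so it can be checked locally before sending. The index name test, the type example and the fields title, content and suggest are hypothetical. The shape (a transform object with script and lang next to properties) follows the 1.x mapping-transform docs linked above, and the Groovy script is one plausible example: it fills an indexed-only suggest field while _source keeps the original document.

```python
import json


def transform_mapping() -> dict:
    # Hypothetical names: index "test", type "example", fields title/content/suggest.
    # The Groovy script copies "content" into "suggest" when "title" starts with "t";
    # "suggest" is then searchable, but _source still holds the untransformed document.
    return {
        "mappings": {
            "example": {
                "transform": {
                    "script": "if (ctx._source['title']?.startsWith('t')) "
                              "ctx._source['suggest'] = ctx._source['content']",
                    "lang": "groovy",
                },
                "properties": {
                    "title": {"type": "string"},
                    "content": {"type": "string"},
                    "suggest": {"type": "string"},
                },
            }
        }
    }


if __name__ == "__main__":
    # Print the JSON body; redirect it to a file to send it with curl.
    print(json.dumps(transform_mapping(), indent=2))
```

Assuming a local node, you could save the output as mapping.json and send it with curl -XPUT 'localhost:9200/test' -d @mapping.json.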
You can also have the same transformation run when you get a document by adding the _source_transform URL parameter to the request:
The get endpoint will retransform the source if the _source_transform parameter is set. The transform is performed before any source filtering but it is mostly designed to make it easy to see what was passed to the index for debugging.
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/_get_transformed.html
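To make the split between what is indexed and what is stored concrete, the following is a toy in-memory model in Python, not an Elasticsearch client. TransformedIndex, copy_content_to_suggest and the exact-match search are hypothetical simplifications: indexing keeps the original as _source and indexes a transformed copy, get can optionally re-run the transform the way _source_transform does, and search can match transformed fields but returns the untransformed _source, assuming (as suspected below) that _search does not honour _source_transform.

```python
import copy
from typing import Any, Callable, Dict, List

Doc = Dict[str, Any]


class TransformedIndex:
    """Toy in-memory model of 1.3 mapping transforms; not an Elasticsearch client."""

    def __init__(self, transform: Callable[[Doc], Doc]) -> None:
        self.transform = transform
        self.source: Dict[str, Doc] = {}   # plays the role of the stored _source
        self.indexed: Dict[str, Doc] = {}  # plays the role of the searchable index

    def index(self, doc_id: str, doc: Doc) -> None:
        # Keep the original untouched; only the transformed copy is "indexed".
        self.source[doc_id] = copy.deepcopy(doc)
        self.indexed[doc_id] = self.transform(copy.deepcopy(doc))

    def get(self, doc_id: str, source_transform: bool = False) -> Doc:
        # Like GET with ?_source_transform: re-run the transform on the stored source.
        doc = copy.deepcopy(self.source[doc_id])
        return self.transform(doc) if source_transform else doc

    def search(self, field: str, value: Any) -> List[Doc]:
        # Exact-match stand-in for a query: it can hit transformed fields, but it
        # returns the original _source (assumes _search ignores _source_transform).
        return [copy.deepcopy(self.source[i])
                for i, d in self.indexed.items() if d.get(field) == value]


def copy_content_to_suggest(doc: Doc) -> Doc:
    # Hypothetical Python stand-in for a Groovy transform script.
    if str(doc.get("title", "")).startswith("t"):
        doc["suggest"] = doc.get("content")
    return doc


if __name__ == "__main__":
    idx = TransformedIndex(copy_content_to_suggest)
    idx.index("1", {"title": "test", "content": "hello"})
    print(idx.search("suggest", "hello"))       # found via the indexed-only field
    print(idx.get("1"))                         # original source
    print(idx.get("1", source_transform=True))  # retransformed source
```

The search result and the plain get both show the document without suggest, which is exactly why the transform only helps for searching and filtering, not for changing what comes back.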
However, I don't believe the _search endpoint accepts the _source_transform URL parameter, so I don't think you can apply the transformation to search results. That would make a nice feature request.
Elasticsearch 1.4
Elasticsearch 1.4 added a couple of features that make all of this much nicer. As you mentioned, the update API allows you to specify a script to be executed. The update API in 1.4 can also accept a default document to be used in the case of an upsert. From the 1.4 docs:
curl -XPOST 'localhost:9200/test/type1/1/_update' -d '{
    "script" : "ctx._source.counter += count",
    "params" : {
        "count" : 4
    },
    "upsert" : {
        "counter" : 1
    }
}'
In the example above, if the document doesn't exist, the contents of the upsert key are used to initialize it and the script is not run, so the counter key in the newly created document will have a value of 1.
Now, if we set scripted_upsert to true (scripted_upsert is another new option in 1.4), our script will run against the newly initialized document:
curl -XPOST 'localhost:9200/test/type1/2/_update' -d '{
    "script": "ctx._source.counter += count",
    "params": {
        "count": 4
    },
    "upsert": {
        "counter": 1
    },
    "scripted_upsert": true
}'
In this example, if the document didn't exist, the counter key would have a value of 5: the upsert value of 1 plus the count of 4 added by the script.
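The rules behind the two update examples can be summarised with a small Python model. It is not an Elasticsearch client; update, add_count and the dict-based store are hypothetical. An existing document always has the script applied; a missing one is created from the upsert body, and the script then runs on it only when scripted_upsert is true.

```python
import copy
from typing import Any, Callable, Dict

Doc = Dict[str, Any]
Script = Callable[[Doc, Dict[str, Any]], None]


def update(store: Dict[str, Doc], doc_id: str, script: Script,
           params: Dict[str, Any], upsert: Doc,
           scripted_upsert: bool = False) -> Doc:
    """Toy model of the 1.4 _update upsert rules; not an Elasticsearch client."""
    if doc_id in store:
        # Existing document: the script always runs against its source.
        script(store[doc_id], params)
    else:
        # Missing document: it is created from a copy of the upsert body...
        store[doc_id] = copy.deepcopy(upsert)
        # ...and the script runs on that new document only with scripted_upsert.
        if scripted_upsert:
            script(store[doc_id], params)
    return store[doc_id]


def add_count(source: Doc, params: Dict[str, Any]) -> None:
    # Python stand-in for the script "ctx._source.counter += count".
    source["counter"] += params["count"]


if __name__ == "__main__":
    store: Dict[str, Doc] = {}
    print(update(store, "1", add_count, {"count": 4}, {"counter": 1}))
    print(update(store, "2", add_count, {"count": 4}, {"counter": 1}, scripted_upsert=True))
    print(update(store, "1", add_count, {"count": 4}, {"counter": 1}))
```

The first two calls reproduce the 1 and 5 from the curl examples; the third shows that once document 1 exists, the same request increments it to 5 regardless of scripted_upsert.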
Full documentation is available on the Elasticsearch site.