ElasticSearch

Elasticsearch : Strip HTML tags before indexing docs with html_strip filter not working

瘦欲@ submitted on 2021-02-07 11:19:19
Question: Given I have specified my html_strip char filter in my custom analyzer, when I index a document with HTML content, then I expect the HTML to be stripped out of the indexed content, and on retrieval the doc returned from the index should not contain HTML. Actual: the indexed doc contained HTML, and the retrieved doc contained HTML. I have tried specifying the analyzer as index_analyzer, as one would expect, and, out of desperation, a few others: search_analyzer and analyzer. None seem to have any effect on the
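
A note on the behavior described above, offered as a likely explanation rather than a confirmed diagnosis, since the question is cut off: a char filter only changes the tokens written to the inverted index, while the stored _source comes back exactly as it was submitted. If the retrieved document itself must be free of markup, one option is to strip the tags on the client before indexing. The sketch below uses only the Python standard library; `strip_html` is a hypothetical helper name, not part of any Elasticsearch client.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        # Called for text between tags. Note that the contents of <script>
        # and <style> elements also arrive here and are NOT filtered out.
        self.parts.append(data)


def strip_html(markup: str) -> str:
    """Return the text content of `markup` with tags removed.

    Hypothetical helper: runs of whitespace are collapsed to single spaces.
    """
    parser = _TextExtractor()
    parser.feed(markup)
    parser.close()
    return " ".join("".join(parser.parts).split())


print(strip_html("<p>Hello <b>world</b></p>"))
```

The cleaned string would then be sent as the field value, so both the indexed terms and the stored _source are tag-free.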

Partial query results due to Elasticsearch cluster health yellow

北战南征 submitted on 2021-02-07 10:24:51
Question: My Elasticsearch cluster health status is yellow. Does that mean I may get partial results? I am using an Amazon elastic instance, and the summary on the cluster health tab looks like this: Number of nodes: 1; Number of data nodes: 1; Active primary shards: 206; Active shards: 206; Unassigned shards: 205. I would like to know how to change the status from yellow to green. Answer 1: Cluster health yellow means one or more replica shards are not allocated, and an Elasticsearch query goes either to a primary or
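
A minimal sketch of one way to reach green on a single-node cluster, under stated assumptions: a replica is never allocated on the same node as its primary, so with one data node the replicas stay unassigned. Setting the replica count to 0 through the index settings API removes them, at the cost of having no redundancy. Adding a second data node is the alternative. The helper name `replica_settings` is hypothetical, and whether a managed service permits this change is not confirmed by the source.

```python
import json


def replica_settings(number_of_replicas: int) -> dict:
    """Build the body for an update-index-settings request.

    Hypothetical helper; the body shape follows the index settings API.
    """
    if number_of_replicas < 0:
        raise ValueError("number_of_replicas must be >= 0")
    return {"index": {"number_of_replicas": number_of_replicas}}


body = replica_settings(0)

# The request that would be sent (applies to all existing indices):
print("PUT /_settings")
print(json.dumps(body))
```

Indices created later would still get the default replica count unless an index template or explicit setting says otherwise.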

pandas dataframe from a nested dictionary (elasticsearch result)

谁说我不能喝 submitted on 2021-02-07 04:20:19
Question: I am having a hard time translating results from Elasticsearch aggregations to pandas. I am trying to write an abstract function that takes a nested dictionary (arbitrary number of levels) and flattens it into a pandas DataFrame. Here is what a typical result looks like (edit: I added the parent key as well): x1 = {u'xColor': {u'buckets': [{u'doc_count': 4, u'key': u'red', u'xMake': {u'buckets': [{u'doc_count': 3, u'key': u'honda', u'xCity': {u'buckets': [{u'doc_count': 2, u'key': u'ROME'},
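
One possible approach, sketched under assumptions: walk the nested buckets recursively and emit one flat row per leaf bucket, carrying the parent keys along. The function name `flatten_buckets` and the sample data are hypothetical (the original dictionary is truncated, so a smaller stand-in with the same shape is used). It assumes every sub-aggregation is a dict with a `buckets` list.

```python
def flatten_buckets(agg_result, parent=None):
    """Return one flat dict per leaf bucket of a nested bucket aggregation."""
    parent = parent or {}
    rows = []
    for agg_name, agg_body in agg_result.items():
        # Skip plain values such as 'key' and 'doc_count'.
        if not (isinstance(agg_body, dict) and "buckets" in agg_body):
            continue
        for bucket in agg_body["buckets"]:
            row = dict(parent)
            row[agg_name] = bucket["key"]
            children = flatten_buckets(bucket, row)
            if children:
                rows.extend(children)
            else:
                # Leaf bucket: record its own document count.
                row["doc_count"] = bucket["doc_count"]
                rows.append(row)
    return rows


# Hypothetical sample with the same shape as the result in the question.
sample = {
    "xColor": {"buckets": [
        {"key": "red", "doc_count": 4, "xMake": {"buckets": [
            {"key": "honda", "doc_count": 3},
            {"key": "bmw", "doc_count": 1},
        ]}},
        {"key": "blue", "doc_count": 2, "xMake": {"buckets": [
            {"key": "ford", "doc_count": 2},
        ]}},
    ]}
}

rows = flatten_buckets(sample)
for r in rows:
    print(r)
```

The resulting list of dicts can be passed directly to `pandas.DataFrame(rows)` to get one column per aggregation level plus `doc_count`.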

FeatureCollection to geo_shape in Elasticsearch

微笑、不失礼 submitted on 2021-02-07 04:00:49
Question: What's the right way to translate a GeoJSON FeatureCollection to an ES geo_shape? I have a FeatureCollection looking like this: { "type": "FeatureCollection", "features": [ { "type": "Feature", "geometry": { "type": "Polygon", "coordinates": [[[1.96, 42.455],[1.985,42.445]]] } }, { "type": "Feature", "geometry": { "type": "Polygon", "coordinates": [...] } } ] } How can I translate this into the ES geo_shape? Currently I just index it like that (dropping type: Feature and type:
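
A hedged sketch of one option: geo_shape fields accept GeoJSON geometry objects, including GeometryCollection, but not Feature or FeatureCollection wrappers. The geometries can therefore be collected into a single GeometryCollection. The function name and the third polygon vertex in the sample are hypothetical, and feature `properties` are dropped in this version. Indexing one document per feature is an alternative when each feature must stay individually searchable.

```python
def feature_collection_to_geo_shape(fc: dict) -> dict:
    """Convert a GeoJSON FeatureCollection into a GeometryCollection value.

    Hypothetical helper: keeps only each feature's geometry.
    """
    if fc.get("type") != "FeatureCollection":
        raise ValueError("expected a GeoJSON FeatureCollection")
    geometries = [
        feature["geometry"]
        for feature in fc.get("features", [])
        if feature.get("geometry")  # skip features without a geometry
    ]
    return {"type": "GeometryCollection", "geometries": geometries}


# Hypothetical sample; the ring is closed (first point == last point),
# which a valid polygon requires.
sample_fc = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "geometry": {
                "type": "Polygon",
                "coordinates": [[
                    [1.96, 42.455], [1.985, 42.445],
                    [1.97, 42.43], [1.96, 42.455],
                ]],
            },
        }
    ],
}

shape = feature_collection_to_geo_shape(sample_fc)
print(shape["type"], len(shape["geometries"]))
```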

elasticsearch dynamic query - Add another field to each document returned

谁都会走 submitted on 2021-02-06 09:26:07
Question: What I need is very simple, but I am unable to find how to do it in Elasticsearch, possibly because of the complexity of what is required. Input (two sample JSON documents): { "car" : 150, "bike" : 300 } { "car" : 100, "bike" : 200 } What I want is that when I fire a search query, it returns the documents with an extra field, inventory, defined as the sum of the number of cars and bikes, in sorted order. Sample output: hits: [ { "car" : 150, "bike" : 300,
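
A possible shape for such a request, sketched under assumptions: `script_fields` computes a per-hit value and a `_script` sort orders by the same expression. This assumes `car` and `bike` are mapped as numeric fields, that Painless scripting is available, and that descending order is wanted (the question does not say which direction). The computed value is returned under each hit's `fields` key rather than merged into `_source`.

```python
import json

# Painless script shared by the computed field and the sort.
INVENTORY_SCRIPT = {
    "lang": "painless",
    "source": "doc['car'].value + doc['bike'].value",
}

search_body = {
    "_source": True,  # request the original document alongside the script field
    "query": {"match_all": {}},
    "script_fields": {
        "inventory": {"script": INVENTORY_SCRIPT},
    },
    "sort": [
        {
            "_script": {
                "type": "number",
                "script": INVENTORY_SCRIPT,
                "order": "desc",  # assumption: highest inventory first
            }
        }
    ],
}

print(json.dumps(search_body, indent=2))
```

If the sum is needed often, storing `inventory` as a real field at index time avoids running a script on every search.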

How to store date range data in elastic search (aws) and search for a range?

戏子无情 submitted on 2021-02-05 20:38:31
Question: I am trying to store hotel room availability in Elasticsearch, and then I need to search for rooms that are available from one date until another. I have come up with two ways to store the availability data, and they are as follows: Here, the availability dictionary stores all dates, and the value of each date key is true or false, representing whether the room is available on that day or not. { "_id": "khg2uo47tyhgjwebu7624787", "room_type": "garden view", "hotel_name": "Cool hotel", "hotel_id": "jytu64r982u0299023",
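
To make the semantics of the first storage model concrete, here is a small client-side sketch: a room matches a stay only if every night in the requested range maps to true. The key format (ISO `YYYY-MM-DD` strings), the exclusive check-out date, and the name `is_available` are all assumptions, since the sample document is truncated. This illustrates the condition a search would need to express; it is not an Elasticsearch query.

```python
from datetime import date, timedelta


def is_available(availability: dict, check_in: date, check_out: date) -> bool:
    """True if every night from check_in up to (not including) check_out is free.

    Assumes keys are ISO date strings; a missing date counts as unavailable.
    """
    day = check_in
    while day < check_out:
        if not availability.get(day.isoformat(), False):
            return False
        day += timedelta(days=1)
    return True


# Hypothetical availability map in the shape the question describes.
sample_availability = {
    "2021-03-01": True,
    "2021-03-02": True,
    "2021-03-03": False,
}

print(is_available(sample_availability, date(2021, 3, 1), date(2021, 3, 3)))
```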

How to increase vm.max_map_count?

两盒软妹~` submitted on 2021-02-05 20:21:21
Question: I'm trying to run Elasticsearch on an Ubuntu EC2 machine (t2.medium), but I'm getting the message: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144] How can I increase vm.max_map_count? Answer 1: To make it persistent, you can add this line: vm.max_map_count=262144 to your /etc/sysctl.conf and run $ sudo sysctl -p to reload the configuration with the new value. Answer 2: I use # sysctl -w vm.max_map_count=262144 And for the persistent configuration # echo "vm.max
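
As a complement to the commands in the answers, a small sketch for checking the current value before and after the change. On Linux the live value of `vm.max_map_count` is exposed at `/proc/sys/vm/max_map_count`; the function names here are hypothetical, and the threshold is the one quoted in the error message.

```python
from pathlib import Path

# Minimum value quoted in the Elasticsearch error message.
REQUIRED_MAX_MAP_COUNT = 262144


def read_max_map_count(path: str = "/proc/sys/vm/max_map_count") -> int:
    """Read the current kernel setting (Linux only)."""
    return int(Path(path).read_text().strip())


def needs_increase(current: int, required: int = REQUIRED_MAX_MAP_COUNT) -> bool:
    """True if the current value is below the required minimum."""
    return current < required
```

Calling `needs_increase(read_max_map_count())` on the EC2 host after running `sysctl -w` or `sysctl -p` should return False once the new value is in effect.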
