Parse JSON in a list in Logstash

Submitted by 陌路散爱 on 2019-12-19 03:35:09

Question


I have JSON in the form of:

[
    {
        "foo":"bar"
    }
]

I am trying to parse it using the json filter in Logstash, but it doesn't seem to work. It looks like the json filter cannot parse a JSON list (a top-level array) directly. Can someone tell me about a workaround for this?
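For reference, this is roughly the minimal filter I am testing with (just a sketch; assume the raw array sits in the message field):

filter {
    json {
        # message is assumed to hold the raw string: [ { "foo": "bar" } ]
        source => "message"
    }
}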

UPDATE

My logs

IP - - 0.000 0.000 [24/May/2015:06:51:13 +0000] *"POST /c.gif HTTP/1.1"* 200 4 * user_id=UserID&package_name=SomePackageName&model=Titanium+S202&country_code=in&android_id=AndroidID&eT=1432450271859&eTz=GMT%2B05%3A30&events=%5B%7B%22eV%22%3A%22com.olx.southasia%22%2C%22eC%22%3A%22appUpdate%22%2C%22eA%22%3A%22app_activated%22%2C%22eTz%22%3A%22GMT%2B05%3A30%22%2C%22eT%22%3A%221432386324909%22%2C%22eL%22%3A%22packageName%22%7D%5D * "-" "-" "-"

URL decoded version of the above log is

IP - - 0.000 0.000 [24/May/2015:06:51:13  0000] *"POST /c.gif HTTP/1.1"* 200 4 * user_id=UserID&package_name=SomePackageName&model=Titanium S202&country_code=in&android_id=AndroidID&eT=1432450271859&eTz=GMT+05:30&events=[{"eV":"com.olx.southasia","eC":"appUpdate","eA":"app_activated","eTz":"GMT+05:30","eT":"1432386324909","eL":"packageName"}] * "-" "-" "-"

Please find my config file for the above logs below:

filter {

    urldecode {
        field => "message"
    }

    grok {
        match => ["message", '%{IP:clientip}%{GREEDYDATA} \[%{GREEDYDATA:timestamp}\] \*"%{WORD:method}%{GREEDYDATA}']
    }

    kv {
        field_split => "&? "
    }

    json {
        source => "events"
    }

    geoip {
        source => "clientip"
    }

}

I need to parse the events field, i.e. events=[{"eV":"com.olx.southasia","eC":"appUpdate","eA":"app_activated","eTz":"GMT+05:30","eT":"1432386324909","eL":"packageName"}]


Answer 1:


I assume that you have your JSON in a file. You are right, you cannot use the json filter directly: you'll have to use the multiline codec first and apply the json filter afterwards.

The following config works for your given input. However, you might have to change it in order to separate your events properly; that depends on your needs and on the JSON format of your file.

Logstash config:

input {
    file {
        codec => multiline {
            pattern => "^\]" # Change to separate events
            negate => true
            what => previous
        }
        path => ["/absolute/path/to/your/json/file"]
        start_position => "beginning"
        sincedb_path => "/dev/null" # This is just for testing
    }
}

filter {
    mutate {
        gsub => [ "message", "\[", "" ]
        gsub => [ "message", "\n", "" ]
    }
    json { source => "message" }
}
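For the sample array at the top of the question, the resulting event should then contain something roughly like this (a sketch of the relevant fields; metadata such as @timestamp and @version is omitted, and the exact whitespace in message will vary):

{
    "message" => "    {        \"foo\":\"bar\"    }",
        "foo" => "bar"
}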

UPDATE

After your update I think I've found the problem. Apparently you get a _jsonparsefailure because of the square brackets. As a workaround you can remove them manually. Add the following mutate filter after your kv filter and before your json filter:

mutate {
    gsub => [ "events", "\]", "" ]
    gsub => [ "events", "\[", "" ]
}
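In context, the relevant part of your filter block would then look roughly like this (ordering only; the surrounding urldecode, grok and geoip filters stay as they are):

kv {
    field_split => "&? "
}
mutate {
    gsub => [ "events", "\]", "" ]
    gsub => [ "events", "\[", "" ]
}
json {
    source => "events"
}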

UPDATE 2

Alright, assuming your input looks like this:

[{"foo":"bar"},{"foo":"bar1"}]

Here are five options:

Option a) ugly gsub

An ugly workaround would be another gsub:

gsub => [ "event","\},\{",","]

But this would merge the objects into one (the sample input collapses to [{"foo":"bar","foo":"bar1"}]), losing the boundaries between them, so you probably don't want to do that.

Option b) split

A better approach might be to use the split filter:

split {
    field => "event"
    terminator => ","
}
mutate {
    gsub => [ "event", "\]", "" ]
    gsub => [ "event", "\[", "" ]
}
json {
    source => "event"
}

This would generate multiple events (the first with foo = bar and the second with foo = bar1).
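For the sample input, this should leave you with two events roughly like this (only the relevant fields shown; a sketch, not exact rubydebug output):

{ "event" => "{\"foo\":\"bar\"}",  "foo" => "bar"  }
{ "event" => "{\"foo\":\"bar1\"}", "foo" => "bar1" }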

Option c) mutate split

You might want to have all the values in one Logstash event. You could use the mutate filter's split option to generate an array and then parse the JSON for each entry that exists. Unfortunately you will have to write a conditional for each entry, because Logstash doesn't support loops in its config.

mutate {
    gsub => [ "event", "\]", "" ]
    gsub => [ "event", "\[", "" ]
    split => [ "event", "," ]
}

json {
    source => "[event][0]"
    target => "[result][0]"
}

if [event][1] {
    json {
        source => "[event][1]"
        target => "[result][1]"
    }
    if [event][2] {
        json {
            source => "[event][2]"
            target => "[result][2]"
        }
    }
    # You would have to add more conditionals if you expect even more dictionaries
}
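With the sample input, result should then end up holding the parsed objects, roughly:

"result" => [
    [0] {
        "foo" => "bar"
    },
    [1] {
        "foo" => "bar1"
    }
]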

Option d) Ruby

Based on your comment, I tried to find a Ruby way. The following works (place it after your kv filter):

mutate {
    gsub => [ "event", "\]", "" ]
    gsub => [ "event", "\[", "" ]
}

ruby {
    init => "require 'json'"
    # Note: this uses the legacy event API; on Logstash 5+ use event.get('event') and event.set('result', ary)
    code => "
        e = event['event'].split(',')
        ary = Array.new
        e.each do |x|
            hash = JSON.parse(x)
            hash.each do |key, value|
                ary.push( { key => value } )
            end
        end
        event['result'] = ary
    "
}

Option e) Ruby

Use this approach after your kv filter (without setting a mutate filter):

ruby  {
    init => "require 'json'"
    code => "
            event['result'] = JSON.parse(event['event'])
    "
}

It will parse events like event=[{"name":"Alex","address":"NewYork"},{"name":"David","address":"NewJersey"}]

into:

"result" => [
    [0] {
           "name" => "Alex",
        "address" => "NewYork"
    },
    [1] {
           "name" => "David",
        "address" => "NewJersey"
    }

Due to the behaviour of the kv filter, this does not support whitespace in the values. I hope you don't have any in your real inputs, do you?



Source: https://stackoverflow.com/questions/31782160/parse-json-in-a-list-in-logstash
