How to split a Logstash event that contains the same pattern multiple times

Submitted by 余生长醉 on 2019-12-24 04:05:12

Question


I'm reading XML-formatted input and trying to extract each row of an HTML table as a separate event.

For example, if my input is:

<xml> <table> <tr> <td> 1 </td> <td> 2 </td> </tr> <tr> <td> 3 </td> <td> 4 </td> </tr> </table> </xml>

I want the output to be:

{
       "message" => "<tr> <td> 1 </td> <td> 2 </td> </tr>",
      "@version" => "1",
    "@timestamp" => "2015-03-20T10:30:38.234Z",
          "host" => "VirtualBox"
}
{
       "message" => "<tr> <td> 3 </td> <td> 4 </td> </tr>",
      "@version" => "1",
    "@timestamp" => "2015-03-20T10:30:38.234Z",
          "host" => "VirtualBox"
}

The problem is that I need to split one event into multiple events. Using the split filter on its own didn't work because it removes the string used as the "terminator".
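To illustrate why splitting on a delimiter loses it, here is a minimal Python sketch. Python is used only for illustration; this is ordinary string splitting, not the Logstash split filter, and the variable names are hypothetical.

```python
# Illustration only: plain Python string splitting, not the Logstash split filter.
xml = (
    "<xml> <table> <tr> <td> 1 </td> <td> 2 </td> </tr> "
    "<tr> <td> 3 </td> <td> 4 </td> </tr> </table> </xml>"
)

# Splitting on the closing tag consumes the tag itself.
pieces = xml.split("</tr>")

for piece in pieces:
    print(repr(piece))

# Check whether any piece still contains the delimiter that was split on.
delimiter_survives = any("</tr>" in piece for piece in pieces)
print(delimiter_survives)
```

None of the resulting pieces is a complete row, which is why a marker has to be added next to the tags before splitting.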

I designed a custom grok pattern to extract the content of an HTML row: (?<data><tr>(.)*?</tr>)

Unfortunately, this pattern only matches the first occurrence. Although a single XML document contains a finite number of rows, that number is not known in advance.
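The difference between matching once and matching every occurrence can be sketched in Python. This is an illustration of the regular expression itself, under the assumption that each row sits on a single line; it says nothing about what grok does internally, and the names `ROW`, `first` and `rows` are hypothetical.

```python
import re

xml = (
    "<xml> <table> <tr> <td> 1 </td> <td> 2 </td> </tr> "
    "<tr> <td> 3 </td> <td> 4 </td> </tr> </table> </xml>"
)

# Same non-greedy idea as the grok pattern above, without the named group.
# Note: "." does not match newlines unless re.DOTALL is passed.
ROW = re.compile(r"<tr>.*?</tr>")

# A single search stops at the first row, like the behaviour described above.
first = ROW.search(xml).group(0)

# findall keeps scanning and returns every non-overlapping match.
rows = ROW.findall(xml)

print(first)
for row in rows:
    print(row)
```

The pattern is able to find every row; what is missing is a way to ask for all matches and turn each one into its own event.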

Judging from LOGSTASH-703 on the Logstash JIRA, I'm afraid grok cannot match a single pattern multiple times (as of March 2015).

Am I forced to write my own custom filter? Is it possible to store each match of a grok filter as a new event?

Here is my current configuration:

    input {
        stdin { }
    }

    filter {
        mutate {
            gsub => ["message", "<tr>", "[split]<tr>"]
        }
        mutate {
            gsub => ["message", "</tr>", "</tr>[split]"]
        }
        split {
            terminator => "[split]"
        }
        grok {
            patterns_dir => "../patterns"
            # see why matching the same pattern multiple times does not work
            #https://logstash.jira.com/browse/LOGSTASH-703
            match => ["message", "%{HTML_ROW_LINE:data}" ]
        }
    }

    output {
        stdout {
            codec => rubydebug
        }
    }
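The marker approach in this configuration can be approximated outside Logstash. The Python sketch below is an assumption-laden imitation of the two gsub steps and the split step, not a reproduction of Logstash behaviour; plain `str.replace` is used because the patterns contain no regex metacharacters, and all names are hypothetical.

```python
import re

xml = (
    "<xml> <table> <tr> <td> 1 </td> <td> 2 </td> </tr> "
    "<tr> <td> 3 </td> <td> 4 </td> </tr> </table> </xml>"
)

MARKER = "[split]"

# Imitate the two mutate/gsub steps: put a marker before <tr> and after </tr>.
marked = xml.replace("<tr>", MARKER + "<tr>").replace("</tr>", "</tr>" + MARKER)

# Imitate the split step: the marker is consumed, the tags are preserved.
pieces = marked.split(MARKER)

# Keep only the pieces that are exactly one table row.
ROW = re.compile(r"<tr>.*?</tr>")
rows = [piece for piece in pieces if ROW.fullmatch(piece)]

for piece in pieces:
    print(repr(piece))
```

In this approximation the text surrounding the rows (the opening and closing wrapper tags and the whitespace between rows) also comes out as separate pieces, so a real pipeline would presumably have to drop those events as well.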

I find that when I insert split markers before and after each row, the grok filter no longer seems to work. I do get the content I want in the "message" field, but it no longer appears in the "data" field.

Strangely, I don't get a "_grokparsefailure" tag even though there is no "data" field. This seems to indicate that a match does occur but is not stored in a field.

Source: https://stackoverflow.com/questions/29164972/how-to-split-logstash-event-containing-multiple-times-the-same-pattern
