Question
I'm reading XML-formatted input and trying to extract each row of an HTML table as a separate event.
For example, if my input is:
<xml> <table> <tr> <td> 1 </td> <td> 2 </td> </tr> <tr> <td> 3 </td> <td> 4 </td> </tr> </table> </xml>
I want the output to be:
{
    "message" => "<tr> <td> 1 </td> <td> 2 </td> </tr>",
    "@version" => "1",
    "@timestamp" => "2015-03-20T10:30:38.234Z",
    "host" => "VirtualBox"
}
{
    "message" => "<tr> <td> 3 </td> <td> 4 </td> </tr>",
    "@version" => "1",
    "@timestamp" => "2015-03-20T10:30:38.234Z",
    "host" => "VirtualBox"
}
The problem is that I need to split one event into multiple events. Using the split filter didn't work because it removes the string used as the "terminator".
I wrote a custom grok pattern to extract the content of an HTML row:
(?<data><tr>(.)*?</tr>)
Unfortunately, this pattern only matches the first occurrence. Each XML document contains a finite number of rows, but that number is not known in advance.
Judging by issue LOGSTASH-703 in the Logstash JIRA, I'm afraid grok cannot match a single pattern multiple times (as of March 2015).
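To make the first-match versus all-matches distinction concrete, here is a minimal Python sketch run outside Logstash. It is only an illustration of the regex behaviour I am after, not of how grok works internally; the variable names are mine, and the pattern is a simplified form of the one above without the named capture.

```python
import re

# Sample input from this question.
xml = ("<xml> <table> <tr> <td> 1 </td> <td> 2 </td> </tr> "
       "<tr> <td> 3 </td> <td> 4 </td> </tr> </table> </xml>")

# Non-greedy match from <tr> to the nearest </tr>, as in the grok pattern.
ROW = re.compile(r"<tr>.*?</tr>", re.DOTALL)

# A single search returns only the first row (the behaviour I currently get).
first_only = ROW.search(xml).group(0)

# A global search returns every row (the behaviour I want, one event per row).
all_rows = ROW.findall(xml)

print(first_only)
for row in all_rows:
    print(row)
```

So the pattern itself is able to match every row; what is missing is a way to apply it repeatedly and emit one event per match.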
Am I forced to write my own custom filter? Is it possible to store each match of a grok filter as a new event?
Here is my filter configuration:
input {
    stdin { }
}
filter {
    # Insert a marker before every opening <tr> ...
    mutate {
        gsub => ["message", "<tr>", "[split]<tr>"]
    }
    # ... and after every closing </tr>
    mutate {
        gsub => ["message", "</tr>", "</tr>[split]"]
    }
    # Split the event on the marker so that each row becomes its own event
    split {
        terminator => "[split]"
    }
    grok {
        patterns_dir => "../patterns"
        # see why the same pattern does not match multiple times
        # https://logstash.jira.com/browse/LOGSTASH-703
        match => ["message", "%{HTML_ROW_LINE:data}" ]
    }
}
output {
    stdout {
        codec => rubydebug
    }
}
I find that when I split the event before and after each row, the grok filter no longer seems to work. I do get what I want in the "message" field, but the "data" field is not populated.
The strange thing is that I get no "_grokparsefailure" tag, even though no "data" field is created. This seems to indicate that there is a match, but that it is not stored in a field.
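For reference, this is roughly what I expect the two gsub steps and the split to do to the sample input, written as a small Python sketch. It only approximates the string handling and makes no claim about Logstash internals; in particular, how the split filter treats empty or whitespace-only pieces is not modelled here.

```python
# Marker string used by the gsub/split steps in my configuration.
MARK = "[split]"

message = ("<xml> <table> <tr> <td> 1 </td> <td> 2 </td> </tr> "
           "<tr> <td> 3 </td> <td> 4 </td> </tr> </table> </xml>")

# Equivalent of the two mutate/gsub steps: wrap every row in markers.
marked = message.replace("<tr>", MARK + "<tr>").replace("</tr>", "</tr>" + MARK)

# Equivalent of the split step: one piece per would-be event.
pieces = marked.split(MARK)

# Only some pieces are complete rows; the rest are surrounding fragments.
rows = [p for p in pieces if p.startswith("<tr>") and p.endswith("</tr>")]

for piece in pieces:
    print(repr(piece))
```

In this sketch the split produces five pieces: the two rows plus three fragments (the leading "<xml> <table> ", a single space between the rows, and the trailing " </table> </xml>"). Only the two row pieces would be expected to match the row pattern.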
Source: https://stackoverflow.com/questions/29164972/how-to-split-logstash-event-containing-multiple-times-the-same-pattern