I am trying to use Logs Insights with data containing JSON in one of the fields, and to parse the JSON fields
My data looks like the following when I put it in insights with the starter code
fields @timestamp, @message
| sort @timestamp desc
| limit 25
How can I easily extract the path
variable in my nested JSON to perform aggregations on it ? By looking at some documentation, I thought @message.path
would work but it does not seem so. Has anyone successfully interpreted JSON logs in Insights
EDIT : Sample of what my data looks like
#
@timestamp
@message
1
2018-12-19 23:42:52.000
I, [2018-12-19T23:42:52.629855 #23447] INFO -- : [2ce588f1-c27d-4a55-ac05-62a75b39e762] {"method":"GET","path":"/api/v1/professionals/ID","format":"json","controller":"API::V1::Public::ProfessionalsController","action":"show","status":200,"duration":285.27,"view":222.36,"time":"2018-12-19T23:42:52.344+00:00","params":{"include":"user,tags,promotions,company_sector,similar_professionals.tags,similar_professionals.user","format":"json","compress":false,"id":"ID"},"@timestamp":"2018-12-19T23:42:52.629Z","@version":"1","message":"[200] GET /api/v1/professionals/ID (API::V1::Public::ProfessionalsController#show)"}
@logStream i-05d1d61ab853517a0
@message I, [2018-12-19T23:42:52.629855 #23447] INFO -- : [2ce588f1-c27d-4a55-ac05-62a75b39e762] {"method":"GET","path":"/api/v1/professionals/ID","format":"json","controller":"API::V1::Public::ProfessionalsController","action":"show","status":200,"duration":285.27,"view":222.36,"time":"2018-12-19T23:42:52.344+00:00","params":{"include":"xxx","format":"json","compress":false,"id":"ID"},"@timestamp":"2018-12-19T23:42:52.629Z","@version":"1","message":"[200] GET /api/v1/professionals/ID (API::V1::Public::ProfessionalsController#show)"}
@timestamp 1545262972000
2
2018-12-19 23:42:16.000
I, [2018-12-19T23:42:16.723472 #851] INFO -- : [ea712503-eb86-4a6e-ab38-ddbcd6c2b4d0] {"method":"GET","path":"/api/v1/heartbeats/new","format":"json","controller":"API::V1::Public::HeartbeatsController","action":"new","status":201,"duration":9.97,"view":3.2,"time":"2018-12-19T23:42:16.712+00:00","params":{"format":"json","compress":false},"@timestamp":"2018-12-19T23:42:16.722Z","@version":"1","message":"[201] GET /api/v1/heartbeats/new (API::V1::Public::HeartbeatsController#new)"}
You can use the parse
command to extract fields.
If @message
is
I, [2018-12-11T13:20:27] INFO -- : {"method":"GET"}
Then you extract the fields like so:
fields @timestamp, @message
| parse "I, [*T*] INFO -- : {"method":"*"}" as @date, @time, @method
| filter method=GET
| sort @timestamp desc
| limit 20
The documentation is rather light for now. I am able to get results by replacing the wildcard *
with a regular expression, but then the parsing fails.
Building on @pyb insights, I was able to use parse @message '"path":"*"' as path
to extract the path from any place in the @message
.
You can go on to get your method by piping another parse @message '"method":"*"' as method
without concern for ordering as it is a second global plain text search on @message
In the case where your @message
is:
I, [2018-12-19T23:42:52.629855 #23447] INFO -- : [2ce588f1-c27d-4a55-ac05-62a75b39e762] {"method":"GET","path":"/api/v1/professionals/ID","format":"json","controller":"API::V1::Public::ProfessionalsController","action":"show","status":200,"duration":285.27,"view":222.36,"time":"2018-12-19T23:42:52.344+00:00","params":{"include":"xxx","format":"json","compress":false,"id":"ID"},"@timestamp":"2018-12-19T23:42:52.629Z","@version":"1","message":"[200] GET /api/v1/professionals/ID (API::V1::Public::ProfessionalsController#show)"}
Using:
parse @message '"path":"*"' as path | parse @message '"method":"*"' as method
will result in the fields: path = '/api/v1/professionals/ID'
and method = 'GET'
Note that this is still simply string parsing and as such, it has no concept of nested keys like params.format
would not find json
, however using just format
would, so long as there wasn't another format
string anywhere in your @message
.
Also note that this is for the case where Insights is not discovering your JSON in the message. I belive this is the case that @pyb was referring to in this answer. My logs aren't being discovered either using the following format
info - Request: {"method":"POST","path":"/auth/login/","body":{"login":{"email":"email@example.com","password":"********"}},"uuid":"36d76df2-aec4-4549-8b73-f237e8f14e23","ip":"*.*.*.*"}
CloudWatch Insights Logs automatically discovers fields for the following log types:
Lambda logs
CloudWatch Logs Insights automatically discovers log fields in Lambda logs, but only for the first embedded JSON fragment in each log event (Note: emphasis mine). If a Lambda log event contains multiple JSON fragments, you can parse and extract the log fields by using the parse command. For more information, see Fields in JSON Logs.
CloudTrail logs
See fields in JSON logs.
Source: Supported Logs and Discovered Fields
If @message
is I, [2018-12-11T13:20:27] INFO -- : {"method":"GET"}
Then you can select and filter the fields like so:
fields @timestamp, @message, method
| filter method = "GET"
| sort @timestamp desc
It works with nested fields too, i.e. params.format = "json"
or results.0.firstName = "Paul"
.
Would this regex query within the parse command help you?
filter @message like / \"path\":\"/
| parse @message /(?<@endpt>((\/[a-zA-Z0-9_{}()-?]+){1,}))/
Good luck!
来源:https://stackoverflow.com/questions/53725133/amazon-cloudwatch-logs-insights-with-json-fields