CloudWatch logs acting weird

I have two log files with multi-line log statements. Both of them have same datetime format at the begining of each log statement. The configuration looks like this:

state_file = /var/lib/awslogs/agent-state

[/opt/logdir/log1.0]
datetime_format = %Y-%m-%d %H:%M:%S
file = /opt/logdir/log1.0
log_stream_name = /opt/logdir/logs/log1.0
initial_position = start_of_file
multi_line_start_pattern = {datetime_format}
log_group_name = my.log.group


[/opt/logdir/log2-console.log]
datetime_format = %Y-%m-%d %H:%M:%S
file = /opt/logdir/log2-console.log
log_stream_name = /opt/logdir/log2-console.log
initial_position = start_of_file
multi_line_start_pattern = {datetime_format}
log_group_name = my.log.group

The cloudwatch logs agent is sending log1.0 logs correctly to my log group on cloudwatch, however, its not sending log files for log2-console.log.

awslogs.log says:

2016-11-15 08:11:41,308 - cwlogs.push.batch - WARNING - 3593 - Thread-4 - Skip event: {'timestamp': 1479196444000, 'start_position': 42330916L, 'end_position': 42331504L}, reason: timestamp is more than 2 hours in future.
2016-11-15 08:11:41,308 - cwlogs.push.batch - WARNING - 3593 - Thread-4 - Skip event: {'timestamp': 1479196451000, 'start_position': 42331504L, 'end_position': 42332092L}, reason: timestamp is more than 2 hours in future.

Though server time is correct. Also weird thing is Line numbers mentioned in start_position and end_position does not exist in actual log file being pushed.

Anyone else experiencing this issue?

I was able to fix this.

The state of awslogs was broken. The state is stored in a sqlite database in /var/awslogs/state/agent-state. You can access it via

sudo sqlite3 /var/awslogs/state/agent-state

sudo is needed to have write access.

List all streams with

select * from stream_state;

Look up your log stream and note the source_id which is part of a json data structure in the v column.

Then, list all records with this source_id (in my case it was 7675f84405fcb8fe5b6bb14eaa0c4bfd) in the push_state table

select * from push_state where k="7675f84405fcb8fe5b6bb14eaa0c4bfd";

The resulting record has a json data structure in the v column which contains a batch_timestamp. And this batch_timestamp seams to be wrong. It was in the past and any newer (more than 2 hours) log entries were not processed anymore.

The solution is to update this record. Copy the v column, replace the batch_timestamp with the current timestamp and update with something like

update push_state set v='... insert new value here ...' where k='7675f84405fcb8fe5b6bb14eaa0c4bfd';

Restart the service with

sudo /etc/init.d/awslogs restart

I hope it works for you!

Rajasekhar Vesangi

We had the same issue and the following steps fixed the issue.

If log groups are not updating with latest events: Run These steps:

Stopped the awslogs service
Deleted file /var/awslogs/state/agent-state
Updated /var/awslogs/etc/awslogs.conf configuration from hostaname to instance ID Ex:
```
log_stream_name = {hostname} to log_stream_name = {instance_id}   
```
Started awslogs service.

I was able to resolve this issue on Amazon Linux by:

sudo yum reinstall awslogs
sudo service awslogs restart

This method retained my config files in /var/awslogs/, though you may wish to back them up before a reinstall.

Note: In my troubleshooting, I had also deleted my Log Group via the AWS Console. The restart fully reloaded all historical logs, but at the present timestamp, which is of less value. I'm unsure if deleting the Log Group was this was necessary for this method to work. You might want to look at setting the initial_position config to end_of_file before you restart.

来源：https://stackoverflow.com/questions/40604940/cloudwatch-logs-acting-weird

标签

amazon-web-services

amazon-cloudwatch

cloudwatch

amazon-cloudwatchlogs