Question
I'm trying to use TensorBoard on my local machine to read TensorFlow logs stored on S3. Everything works, but TensorBoard continuously prints the errors below to the console. According to this issue, the cause is that S3 has no way to check whether a directory exists, so when the TensorFlow S3 client checks for a directory it first runs Stat on the path. That request looks for a key with that name, does not find one, and fails with these error messages.
While this could be wanted behavior for model serving, where it is used to look for updated models and can be stopped using file_system_poll_wait_seconds, I don't know how to stop it for training. In fact, the same thing happens during training if you save checkpoints and logs to S3.
Suppressing these errors by raising the log level is not an option, because TensorFlow still continuously polls S3 and you pay for these useless requests.
I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2020-11-23 11:41:02.502274: E tensorflow/core/platform/s3/aws_logging.cc:60] HTTP response code: 404
Exception name:
Error message: No response body.
6 response headers:
connection : close
content-type : application/xml
date : Mon, 23 Nov 2020 10:41:01 GMT
server : AmazonS3
x-amz-id-2 : ...
x-amz-request-id : ...
2020-11-23 11:41:02.502364: W tensorflow/core/platform/s3/aws_logging.cc:57] If the signature check failed. This could be because of a time skew. Attempting to adjust the signer.
2020-11-23 11:41:02.502699: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2020-11-23 11:41:03.327409: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2020-11-23 11:41:03.491773: E tensorflow/core/platform/s3/aws_logging.cc:60] HTTP response code: 404
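The 404 entries in the log above are consistent with the check sequence described in the question. Here is a minimal sketch of that sequence against an in-memory stand-in for a bucket. `FakeBucket`, `head_object`, `list_prefix` and `is_directory` are hypothetical names made up for illustration; this is not the TensorFlow S3 client or any AWS API, only a model of "stat the key first, then look for keys under the prefix".

```python
class FakeBucket:
    """In-memory stand-in for a bucket: only object keys exist, never directories."""

    def __init__(self, keys):
        self.keys = set(keys)
        self.log = []  # (request, HTTP status) pairs, kept for inspection

    def head_object(self, key):
        # Models a stat on an exact key: 200 if the key exists, 404 otherwise.
        status = 200 if key in self.keys else 404
        self.log.append(("HEAD " + key, status))
        return status

    def list_prefix(self, prefix):
        # Models a listing of all keys that start with the given prefix.
        matches = sorted(k for k in self.keys if k.startswith(prefix))
        self.log.append(("LIST " + prefix, 200))
        return matches


def is_directory(bucket, path):
    """Hypothetical directory check: stat the key, then fall back to a prefix listing."""
    # Step 1: a directory name is usually not an object key, so this yields a 404.
    if bucket.head_object(path) == 200:
        return False  # an object with exactly this name exists, so treat it as a file
    # Step 2: the path counts as a directory if any key lives under "path/".
    return len(bucket.list_prefix(path.rstrip("/") + "/")) > 0


bucket = FakeBucket(["logs/run1/events.out.tfevents.1"])
print(is_directory(bucket, "logs/run1"))
print(bucket.log)
```

In this model the 404 on the first request is expected whenever the path is a directory, which is why it shows up on every poll even though nothing is actually broken.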
Any idea?
Answer 1:
I was wrong. TF just writes logs to S3, and while the errors are related to the linked issue, this is normal behavior. The extra costs are minimal because AWS doesn't charge for data transfer between services in the same region, only for the operations. The same applies when using TensorBoard with S3. For anyone interested in these topics, I made a repository here
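To get a feel for how small the per-operation charge is, a back-of-the-envelope estimate can be sketched as below. `monthly_request_cost` is a hypothetical helper, and the polling interval, requests per poll, and price used in the example call are placeholders rather than measured or published figures; substitute the values from your own setup and the current S3 price list.

```python
def monthly_request_cost(poll_interval_s, requests_per_poll, price_per_1000_requests):
    """Estimate the monthly cost of polling, counting request charges only."""
    seconds_per_month = 30 * 24 * 60 * 60  # assumes a 30-day month
    polls = seconds_per_month / poll_interval_s
    total_requests = polls * requests_per_poll
    return total_requests / 1000 * price_per_1000_requests


# All three arguments are placeholders, not AWS prices or TensorFlow defaults.
cost = monthly_request_cost(
    poll_interval_s=5,
    requests_per_poll=4,
    price_per_1000_requests=0.001,
)
print(round(cost, 2))
```

The estimate ignores data transfer on purpose, since the point above is that only the operations are billed within a region.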
Source: https://stackoverflow.com/questions/64969198/is-tensorflow-continuously-polling-a-s3-filesystem-during-training-or-using-tens