Removing Airflow task logs

走了就别回头了 2021-02-03 19:52

I'm running 5 DAGs which have generated a total of about 6 GB of log data in the base_log_folder over a month's period. I just added a remote_base_log_folder

6 Answers
  •  鱼传尺愫
    2021-02-03 20:40

    I have some suggestions for your concrete problems. For all of them you will need a specialized logging config, as described in this answer: https://stackoverflow.com/a/54195537/2668430

    • automatically remove old log files and rotate them

    I don't have any practical experience with the TimedRotatingFileHandler from the Python standard library yet, but you might give it a try: https://docs.python.org/3/library/logging.handlers.html#timedrotatingfilehandler

    It not only rotates your files on a time interval; if you set the backupCount parameter, it also deletes your old log files. From the documentation:

    If backupCount is nonzero, at most backupCount files will be kept, and if more would be created when rollover occurs, the oldest one is deleted. The deletion logic uses the interval to determine which files to delete, so changing the interval may leave old files lying around.

    That sounds like the best solution for your first problem.
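A minimal sketch of the rotation idea, using only the Python standard library. The function name, the file name task.log, the midnight rollover, and the 7-file retention are assumptions chosen for illustration; none of this is Airflow's own configuration, which would still go through the specialized logging config mentioned earlier.

```python
import logging
import os
import tempfile
from logging.handlers import TimedRotatingFileHandler


def build_rotating_logger(log_dir, name="rotation_demo", days_to_keep=7):
    """Return a logger whose file rolls over at midnight and which keeps
    at most `days_to_keep` rotated files (names and values are illustrative)."""
    os.makedirs(log_dir, exist_ok=True)
    handler = TimedRotatingFileHandler(
        filename=os.path.join(log_dir, "task.log"),
        when="midnight",           # roll over once per day
        interval=1,
        backupCount=days_to_keep,  # surplus rotated files are deleted at rollover
        encoding="utf-8",
    )
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    )
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False       # keep the demo output out of the root logger
    logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    demo_dir = tempfile.mkdtemp(prefix="rotation_demo_")
    log = build_rotating_logger(demo_dir)
    log.info("hello from the rotating handler")
    print(os.listdir(demo_dir))
```

Note that deletion only happens when a rollover occurs in a running process, so files written before the handler was installed, or under a different interval, may still need a one-off cleanup.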


    • force airflow to not log on disk (base_log_folder), but only in remote storage?

    In this case you should specify the logging config in such a way that you do not have any logging handlers that write to a file, i.e. remove all FileHandlers.
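As a rough illustration of "remove all FileHandlers" in plain Python logging, the hypothetical helper below detaches every handler that is an instance of logging.FileHandler from a given logger. It assumes the handlers subclass the standard library's FileHandler; Airflow's own task handlers may be built differently, in which case the change belongs in the logging config itself rather than in a helper like this.

```python
import logging


def strip_file_handlers(logger):
    """Detach and close every handler on `logger` that writes to a local file.

    Returns the list of removed handlers. Only handlers that are instances of
    logging.FileHandler (including the rotating variants) are affected.
    """
    removed = []
    for handler in list(logger.handlers):  # copy, since we mutate while looping
        if isinstance(handler, logging.FileHandler):
            logger.removeHandler(handler)
            handler.close()
            removed.append(handler)
    return removed
```

Handlers that are not file based, such as a StreamHandler or a remote handler, are left in place.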

    Instead, look for logging handlers that send their output directly to a remote address. One example is CMRESHandler, which logs directly to Elasticsearch but needs some extra fields in the log calls. Alternatively, write your own handler class that inherits from the Python standard library's HTTPHandler.
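A hedged sketch of the "write your own handler" option: a subclass of the standard library's HTTPHandler that overrides mapLogRecord to choose which fields are sent. The class name, the host logs.example.com:8080, the /ingest path, and the field names are all hypothetical; a real collector will define its own endpoint and payload.

```python
import logging
from logging.handlers import HTTPHandler


class RemoteLogHandler(HTTPHandler):
    """Hypothetical handler that sends a trimmed-down record to a log collector."""

    def mapLogRecord(self, record):
        # HTTPHandler URL-encodes whatever dict this method returns and
        # sends it with the request, so only these fields leave the host.
        return {
            "logger": record.name,
            "level": record.levelname,
            "message": record.getMessage(),
            "created": record.created,
        }


def attach_remote_handler(logger, host="logs.example.com:8080", url="/ingest"):
    """Attach a RemoteLogHandler to `logger`; host and url are placeholders."""
    handler = RemoteLogHandler(host, url, method="POST")
    logger.addHandler(handler)
    return handler
```

HTTPHandler issues one blocking request per log record, so for chatty tasks it may be worth placing it behind a QueueHandler and QueueListener from the same module.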


    A final suggestion would be to combine the TimedRotatingFileHandler with Elasticsearch and Filebeat. Filebeat ships your logs to Elasticsearch (i.e. remote storage), while the backupCount retention policy of the TimedRotatingFileHandler removes old files, so a huge amount of logs never accumulates on your Airflow disk.
