Python: running coverage on a never-ending process

夕颜 2021-02-05 09:01

I have a multi-process web server whose processes never end. I would like to check code coverage for the whole project in a live environment (not only from tests).

3 answers
  •  轻奢々 (original poster)
    2021-02-05 09:50

    Apparently, it is not possible to control coverage very well with multiple threads. Once different threads are started, stopping the Coverage object stops coverage in all of them, and start only restarts it in the thread that calls it. So your code basically stops coverage after 2 seconds for every thread other than the CoverageThread.

    I played a bit with the API, and it is possible to access the measurements without stopping the Coverage object. So you could launch a thread that saves the coverage data periodically through the API. A first implementation would look something like this:

    # Relies on coverage.py internals from the 4.x series (e.g. CoverageDataFiles).
    import os
    import threading
    from time import sleep
    from coverage import Coverage
    from coverage.data import CoverageData, CoverageDataFiles
    from coverage.files import abs_file
    
    cov = Coverage(config_file=True)
    cov.start()
    
    
    def get_data_dict(d):
        """Return a dict like d, but with keys modified by `abs_file` and
        remove the copied elements from d.
        """
        res = {}
        keys = list(d.keys())
        for k in keys:
            a = {}
            lines = list(d[k].keys())
            for l in lines:
                v = d[k].pop(l)
                a[l] = v
            res[abs_file(k)] = a
        return res
    
    
    class CoverageLoggerThread(threading.Thread):
        _kill_now = False
        _delay = 2
    
        def __init__(self, main=True):
            self.main = main
            self._data = CoverageData()
            self._fname = cov.config.data_file
            self._suffix = None
            self._data_files = CoverageDataFiles(basename=self._fname,
                                                 warn=cov._warn)
            self._pid = os.getpid()
            super(CoverageLoggerThread, self).__init__()
    
        def shutdown(self):
            self._kill_now = True
    
        def combine(self):
            aliases = None
            if cov.config.paths:
                from coverage.aliases import PathAliases
                aliases = PathAliases()
                for paths in cov.config.paths.values():
                    result = paths[0]
                    for pattern in paths[1:]:
                        aliases.add(pattern, result)
    
            self._data_files.combine_parallel_data(self._data, aliases=aliases)
    
        def export(self, new=True):
            cov_report = cov
            if new:
                cov_report = Coverage(config_file=True)
                cov_report.load()
            self.combine()
            self._data_files.write(self._data)
            cov_report.data.update(self._data)
            cov_report.html_report(directory="coverage_report_data.html")
            cov_report.report(show_missing=True)
    
        def _collect_and_export(self):
            new_data = get_data_dict(cov.collector.data)
            if cov.collector.branch:
                self._data.add_arcs(new_data)
            else:
                self._data.add_lines(new_data)
            self._data.add_file_tracers(get_data_dict(cov.collector.file_tracers))
            self._data_files.write(self._data, self._suffix)
    
            if self.main:
                self.export()
    
        def run(self):
            while True:
                sleep(CoverageLoggerThread._delay)
                if self._kill_now:
                    break
    
                self._collect_and_export()
    
            cov.stop()
    
            if not self.main:
                self._collect_and_export()
                return
    
            self.export(new=False)
            print("End of the program. I was killed gracefully :)")
    

    A more stable version can be found in this GIST. This code basically grabs the information gathered by the collector without stopping it. The get_data_dict function takes the dictionary in Coverage.collector and pops the available data. This should be safe enough that you don't lose any measurement.
    The report files are updated every _delay seconds.
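
To illustrate the pop-based draining described above, here is a minimal sketch on a plain dictionary. The `drain` helper and the sample `live` dictionary are hypothetical stand-ins, and the layout {filename: {line_number: None}} is an assumption about how the collector stores line data; this is not the gist's code.

```python
def drain(measurements):
    """Move every recorded entry out of `measurements` and return it.

    Entries are removed one at a time with pop(), so anything recorded
    before the pop is kept and nothing is returned twice on the next call.
    """
    drained = {}
    for filename in list(measurements.keys()):
        lines = measurements[filename]
        moved = {}
        for line in list(lines.keys()):
            moved[line] = lines.pop(line)
        drained[filename] = moved
    return drained


# Assumed stand-in for the collector's data: {filename: {line_number: None}}.
live = {"app.py": {1: None, 2: None}}

first = drain(live)
live["app.py"][7] = None  # a line executed after the first drain
second = drain(live)

print(first)   # entries recorded before the first drain
print(second)  # only the entry recorded in between
```

Each call returns only what was recorded since the previous call, which is why the logger thread can accumulate the results in its own data object without stopping the collector.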

    But if you have multiple processes running, you need some extra effort to make sure every process runs the CoverageLoggerThread. This is the job of the patch_multiprocessing function, monkey patched from the coverage monkey patch...
    The code is in the GIST. It basically replaces the original Process with a custom process, which starts the CoverageLoggerThread just before running the run method and joins the thread at the end of the process. The script main.py lets you launch different tests with threads and processes.
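
The idea of a custom process can be sketched as follows. This is only an illustration under assumptions: it subclasses `multiprocessing.Process` instead of monkey patching it as the gist does, and `StubLoggerThread` and `LoggedProcess` are hypothetical names, with the stub standing in for CoverageLoggerThread.

```python
import multiprocessing
import threading


class StubLoggerThread(threading.Thread):
    """Hypothetical stand-in for CoverageLoggerThread."""

    def __init__(self, delay=0.05):
        super().__init__(daemon=True)
        self._delay = delay
        self._stop_event = threading.Event()
        self.ticks = 0

    def shutdown(self):
        self._stop_event.set()

    def run(self):
        # wait() returns True as soon as shutdown() has been called.
        while not self._stop_event.wait(self._delay):
            self.ticks += 1  # the real thread would save coverage data here


class LoggedProcess(multiprocessing.Process):
    """Process that starts a logger thread before run() and joins it after."""

    logger_factory = StubLoggerThread

    def run(self):
        logger = self.logger_factory()
        logger.start()
        try:
            super().run()  # calls the target passed to the constructor
        finally:
            logger.shutdown()
            logger.join()
```

Code that creates `LoggedProcess(target=...)` and calls `start()` would then get a logger thread in each child process. Existing code that instantiates `multiprocessing.Process` directly is not affected by a subclass, which is presumably why the gist patches the module instead.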

    There are three drawbacks to this code that you need to be careful about:

    • It is a bad idea to use the combine function concurrently, as it performs concurrent read/write/delete access to the .coverage.* files. This means that the export function is not completely safe. It should be all right, as the data is replicated multiple times, but I would do some testing before using it in production.

    • Once the data has been exported, it stays in memory. So if the code base is huge, it could eat some resources. It is possible to dump all the data and reload it, but I assumed that if you want to log every 2 seconds, you do not want to reload all the data every time. If you go with a delay in minutes, I would create a new _data every time, using CoverageData.read_file to reload the previous state of the coverage for this process.

    • The custom process waits up to _delay before finishing, because we join the CoverageLoggerThread at the end of the process. So if you have a lot of quick processes, you want a finer sleep granularity to detect the end of the process more quickly. It just needs a custom sleep loop that breaks on _kill_now.
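
The custom sleep loop mentioned in the last point could look like the sketch below. `PollingThread`, `_granularity` and the `exports` counter are hypothetical names introduced for illustration; only `_delay` and `_kill_now` come from the code above.

```python
import threading
import time


class PollingThread(threading.Thread):
    """Hypothetical logger thread whose sleep is cut into short slices."""

    _delay = 2.0         # seconds between two exports, as in the answer
    _granularity = 0.05  # assumed slice length, in seconds

    def __init__(self):
        super().__init__()
        self._kill_now = False
        self.exports = 0

    def shutdown(self):
        self._kill_now = True

    def _interruptible_sleep(self):
        """Sleep for up to _delay seconds, returning early on _kill_now."""
        deadline = time.monotonic() + self._delay
        while not self._kill_now:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            time.sleep(min(self._granularity, remaining))

    def run(self):
        while True:
            self._interruptible_sleep()
            if self._kill_now:
                break
            self.exports += 1  # the real thread would export coverage here
```

With this loop, join() returns roughly one slice after shutdown() instead of after a full _delay. The standard library's `threading.Event.wait(timeout)` gives the same effect without polling, if replacing the `_kill_now` flag with an event is acceptable.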

    Let me know if this helps you in some way or if the gist can be improved.


    EDIT: It seems you do not need to monkey patch the multiprocessing module to start a logger automatically. Using a .pth file in your Python install, you can use an environment variable to start your logger automatically in new processes:

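    # Startup logic for coverage.pth in your site-packages folder.
    # Note: the site module only executes .pth lines that begin with
    # `import`, so this block may need to live in a module that the
    # .pth file imports on a single line.
    import os
    if "COVERAGE_LOGGER_START" in os.environ:
        import atexit
        from coverage_logger import CoverageLoggerThread
        thread_cov = CoverageLoggerThread(main=False)
        thread_cov.start()
        def close_cov():
            thread_cov.shutdown()
            thread_cov.join()
        atexit.register(close_cov)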
    

    You can then start your coverage logger with COVERAGE_LOGGER_START=1 python main.py
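
The .pth mechanism can be tried out in a throwaway directory, as in this hedged sketch. The module name `coverage_logger_startup` is hypothetical, the startup module only sets a flag where the real one would start CoverageLoggerThread, and `site.addsitedir` is used here to process the .pth file the way the interpreter does for site-packages at startup.

```python
import os
import site
import sys
import tempfile
import textwrap

# Hypothetical startup module holding the multi-line logic.
STARTUP_MODULE = textwrap.dedent('''
    import os

    started = False
    if "COVERAGE_LOGGER_START" in os.environ:
        # The real module would create and start CoverageLoggerThread
        # here and register its shutdown with atexit.
        started = True
''')

site_dir = tempfile.mkdtemp()
with open(os.path.join(site_dir, "coverage_logger_startup.py"), "w") as f:
    f.write(STARTUP_MODULE)

# Only single lines that begin with "import" are executed from a .pth file.
with open(os.path.join(site_dir, "coverage.pth"), "w") as f:
    f.write("import coverage_logger_startup\n")

os.environ["COVERAGE_LOGGER_START"] = "1"
site.addsitedir(site_dir)  # adds the directory to sys.path and runs the .pth

print(sys.modules["coverage_logger_startup"].started)
```

Keeping the .pth file to a single import line and putting the logic in a regular module also makes the startup code easier to test on its own.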
