Time complexity of os.walk in Python

Asked 2021-01-14 12:21

I have to calculate the time complexity of an algorithm, but it calls os.walk, which I can't treat as a single operation.

The source of os.walk:

3 Answers
  • 2021-01-14 12:45

    Well... let's walk through the source :)

    Docs: http://docs.python.org/2/library/os.html#os.walk

    def walk(top, topdown=True, onerror=None, followlinks=False):
        islink, join, isdir = path.islink, path.join, path.isdir
    
        try:
            # Note that listdir and error are globals in this module due
            # to earlier import-*.
    
    
            # Should be O(1) since it's probably just reading your filesystem journal
            names = listdir(top)
        except error, err:
            if onerror is not None:
                onerror(err)
            return
    
        dirs, nondirs = [], []
    
    
        # O(n) where n = number of files in the directory
        for name in names:
            if isdir(join(top, name)):
                dirs.append(name)
            else:
                nondirs.append(name)
    
        if topdown:
            yield top, dirs, nondirs
    
        # Again O(n), where n = number of directories in the directory
        for name in dirs:
            new_path = join(top, name)
            if followlinks or not islink(new_path):
    
                # Generator so besides the recursive `walk()` call, no additional cost here.
                for x in walk(new_path, topdown, onerror, followlinks):
                    yield x
        if not topdown:
            yield top, dirs, nondirs
    

    Since it's a generator it all depends on how far you walk the tree, but it looks like O(n) where n is the total number of files/directories in the given path.
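To make the "O(n) in the total number of entries" claim concrete, here is a minimal sketch (using a throwaway temp tree, so the names are arbitrary): one full pass of os.walk touches every directory entry in the subtree exactly once.

```python
import os
import tempfile

# Build a small throwaway tree: 2 subdirectories, each with 2 files.
root = tempfile.mkdtemp()
for d in ("a", "b"):
    os.mkdir(os.path.join(root, d))
    for f in ("x.txt", "y.txt"):
        open(os.path.join(root, d, f), "w").close()

# Count every entry os.walk reports; each appears exactly once,
# so the total work is proportional to the subtree size.
entries_seen = 0
for top, dirs, files in os.walk(root):
    entries_seen += len(dirs) + len(files)

print(entries_seen)  # 6 entries: 2 directories + 4 files
```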

  • 2021-01-14 12:47

    os.walk (unless you prune it, or have symlink issues) guarantees to list each directory in the subtree exactly once.

    So, if you assume that listing a directory is linear in the number of entries in the directory,* then if there are N total directory entries in your subtree, os.walk will take O(N) time.

    Or, if you want the time for walk to produce each value (the root, dirnames, filenames tuple): if those N directory entries are split among M subdirectories, then each of the M iterations takes amortized O(N/M) time.


    * Really, that's up to your OS, C library, and filesystem not Python, and it can be much worse than O(N) for older filesystems… but let's ignore that.
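A quick sketch of the "exactly once per directory" guarantee (again on a throwaway temp tree with made-up names): the number of tuples os.walk yields equals the number M of directories in the subtree, regardless of how the N file entries are spread among them.

```python
import os
import tempfile

# Tree: root, sub1, sub2, and sub1/nested -> 4 directories total,
# with 5 files dumped into sub1.
root = tempfile.mkdtemp()
for d in ("sub1", "sub2", os.path.join("sub1", "nested")):
    os.makedirs(os.path.join(root, d))
for i in range(5):
    open(os.path.join(root, "sub1", "f%d" % i), "w").close()

# One (dirpath, dirnames, filenames) tuple per directory visited.
tuples = list(os.walk(root))
print(len(tuples))  # 4: root, sub1, sub1/nested, sub2
```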

  • 2021-01-14 12:49

    This is too long for a comment: in CPython, a yield passes its result to the immediate caller, not directly to the ultimate consumer of the result. So, if you have recursion going R levels deep, a chain of yields at each level delivering a result back up the call stack to the ultimate consumer takes O(R) time. It also takes O(R) time to resume the R levels of recursive call to get back to the lowest level where the first yield occurred.

    So each result yield'ed by walk() takes time proportional to the level in the directory tree at which the result is first yield'ed.

    That's the theoretical ;-) truth. In practice, however, this makes approximately no difference unless the recursion is very deep. That's because the chain of yields, and the chain of generator resumptions, occurs "at C speed". In other words, it does take O(R) time, but the constant factor is so small most programs never notice this.

    This is especially true of recursive generators like walk(), which almost never recurse deeply. Who has a directory tree nested 100 levels? Nope, me neither ;-)
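The O(R) yield chain described above can be seen in a toy recursive generator (a made-up `descend` function, not part of walk()) that mirrors walk()'s `for x in walk(...): yield x` pattern: a value first yielded at depth R is re-yielded once per enclosing frame on its way to the consumer.

```python
# Each value produced at depth d passes back up through d generator
# frames before reaching the caller of next() -- the O(R) chain.
def descend(depth):
    yield depth
    if depth < 3:
        for x in descend(depth + 1):  # re-yield: one frame per level
            yield x

print(list(descend(0)))  # [0, 1, 2, 3]
```

In Python 3, `yield from descend(depth + 1)` expresses the same delegation more directly, though the frame chain still exists; as the answer notes, it all runs at C speed, so the constant factor is tiny.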
