Time complexity of os.walk in Python

Asked 2021-01-14 12:21

I have to calculate the time complexity of an algorithm, but it calls os.walk, which I can't treat as a single operation.

The source of os.walk:

3 Answers
  • 2021-01-14 12:45

    Well... let's walk through the source :)

    Docs: http://docs.python.org/2/library/os.html#os.walk

    def walk(top, topdown=True, onerror=None, followlinks=False):
        islink, join, isdir = path.islink, path.join, path.isdir
    
        try:
            # Note that listdir and error are globals in this module due
            # to earlier import-*.
    
    
            # Should be O(1) since it's probably just reading your filesystem journal
            names = listdir(top)
        except error, err:
            if onerror is not None:
                onerror(err)
            return
    
        dirs, nondirs = [], []
    
    
        # O(n) where n = number of files in the directory
        for name in names:
            if isdir(join(top, name)):
                dirs.append(name)
            else:
                nondirs.append(name)
    
        if topdown:
            yield top, dirs, nondirs
    
        # Again O(n), where n = number of directories in the directory
        for name in dirs:
            new_path = join(top, name)
            if followlinks or not islink(new_path):
    
                # Generator so besides the recursive `walk()` call, no additional cost here.
                for x in walk(new_path, topdown, onerror, followlinks):
                    yield x
        if not topdown:
            yield top, dirs, nondirs
    

    Since it's a generator it all depends on how far you walk the tree, but it looks like O(n) where n is the total number of files/directories in the given path.
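To make the "O(n) in the total number of entries" claim concrete, here is a minimal sketch (using a throwaway temp tree, so the names are arbitrary): one full pass of os.walk touches every directory entry in the subtree exactly once.

```python
import os
import tempfile

# Build a small throwaway tree: 2 subdirectories, each with 2 files.
root = tempfile.mkdtemp()
for d in ("a", "b"):
    os.mkdir(os.path.join(root, d))
    for f in ("x.txt", "y.txt"):
        open(os.path.join(root, d, f), "w").close()

# Count every entry os.walk reports; each appears exactly once,
# so the total work is proportional to the subtree size.
entries_seen = 0
for top, dirs, files in os.walk(root):
    entries_seen += len(dirs) + len(files)

print(entries_seen)  # 6 entries: 2 directories + 4 files
```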

  • 2021-01-14 12:47

    os.walk (unless you prune it, or have symlink issues) guarantees to list each directory in the subtree exactly once.

    So, if you assume that listing a directory is linear in the number of entries in the directory,* then if there are N total directory entries in your subtree, os.walk will take O(N) time.

    Or, if you want the time for walk to produce each value (the root, dirnames, filenames tuple): if those N directory entries are split among M subdirectories, then each of the M iterations takes amortized O(N/M) time.


    * Really, that's up to your OS, C library, and filesystem not Python, and it can be much worse than O(N) for older filesystems… but let's ignore that.
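A quick sketch of the "exactly once per directory" guarantee (again on a throwaway temp tree with made-up names): the number of tuples os.walk yields equals the number M of directories in the subtree, regardless of how the N file entries are spread among them.

```python
import os
import tempfile

# Tree: root, sub1, sub2, and sub1/nested -> 4 directories total,
# with 5 files dumped into sub1.
root = tempfile.mkdtemp()
for d in ("sub1", "sub2", os.path.join("sub1", "nested")):
    os.makedirs(os.path.join(root, d))
for i in range(5):
    open(os.path.join(root, "sub1", "f%d" % i), "w").close()

# One (dirpath, dirnames, filenames) tuple per directory visited.
tuples = list(os.walk(root))
print(len(tuples))  # 4: root, sub1, sub1/nested, sub2
```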

  • 2021-01-14 12:49

    This is too long for a comment: in CPython, a yield passes its result to the immediate caller, not directly to the ultimate consumer of the result. So, if you have recursion going R levels deep, a chain of yields at each level delivering a result back up the call stack to the ultimate consumer takes O(R) time. It also takes O(R) time to resume the R levels of recursive call to get back to the lowest level where the first yield occurred.

    So each result yield'ed by walk() takes time proportional to the level in the directory tree at which the result is first yield'ed.

    That's the theoretical ;-) truth. In practice, however, this makes approximately no difference unless the recursion is very deep. That's because the chain of yields, and the chain of generator resumptions, occurs "at C speed". In other words, it does take O(R) time, but the constant factor is so small most programs never notice this.

    This is especially true of recursive generators like walk(), which almost never recurse deeply. Who has a directory tree nested 100 levels? Nope, me neither ;-)
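The O(R) yield chain described above can be seen in a toy recursive generator (a made-up `descend` function, not part of walk()) that mirrors walk()'s `for x in walk(...): yield x` pattern: a value first yielded at depth R is re-yielded once per enclosing frame on its way to the consumer.

```python
# Each value produced at depth d passes back up through d generator
# frames before reaching the caller of next() -- the O(R) chain.
def descend(depth):
    yield depth
    if depth < 3:
        for x in descend(depth + 1):  # re-yield: one frame per level
            yield x

print(list(descend(0)))  # [0, 1, 2, 3]
```

In Python 3, `yield from descend(depth + 1)` expresses the same delegation more directly, though the frame chain still exists; as the answer notes, it all runs at C speed, so the constant factor is tiny.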
