How to circumvent the fallacy of Python's os.path.commonprefix?

耶瑟儿~ 2020-12-18 19:00

My problem is to find the common path prefix of a given set of files.

I was expecting "os.path.commonprefix" to do exactly that. Unfortunat

5 answers
  • 2020-12-18 19:15

    I've made a small Python package, commonpath, that finds the common path of a list of paths. It comes with a few useful options.

    https://github.com/faph/Common-Path

  • 2020-12-18 19:19

    Assuming you want the common directory path, one way is to:

    1. Use only directory paths as input. If your input value is a file name, call os.path.dirname(filename) to get its directory path.
    2. "Normalize" all the paths so that they are relative to the same thing and don't include double separators. The easiest way to do this is to call os.path.abspath() to get an absolute path. (You might also want to use os.path.realpath() to resolve symbolic links.)
    3. Add a final separator (found portably with os.path.sep or os.sep) to the end of every normalized directory path.
    4. Call os.path.dirname() on the result of os.path.commonprefix().

    In code (without removing symbolic links):

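    import os.path

    def common_path(directories):
        # Make every path absolute, then end it with a separator so that
        # dirname() only drops a partial last component.
        norm_paths = [os.path.abspath(p) + os.path.sep for p in directories]
        return os.path.dirname(os.path.commonprefix(norm_paths))

    def common_path_of_filenames(filenames):
        # Step 1: reduce each file name to its directory first.
        return common_path([os.path.dirname(f) for f in filenames])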
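    A small POSIX-only illustration of why steps 3 and 4 matter. It calls posixpath directly (what os.path is on Linux and macOS) so the results do not depend on the host OS, and common_dir is a hypothetical helper that assumes its inputs are already absolute and normalized:

```python
import posixpath

def common_dir(paths):
    # Hypothetical helper covering steps 3 and 4 only: append a separator,
    # take the character-wise prefix, then let dirname() drop whatever
    # partial component is left at the end.
    with_sep = [p + posixpath.sep for p in paths]
    return posixpath.dirname(posixpath.commonprefix(with_sep))

siblings = ["/usr/var1/log", "/usr/var2/log"]
print(posixpath.commonprefix(siblings))  # /usr/var  (cuts "var1"/"var2" in half)
print(common_dir(siblings))              # /usr

same = ["/usr/var/log", "/usr/var/log"]
# Without the trailing separator, dirname() would also strip the shared "log":
print(posixpath.dirname(posixpath.commonprefix(same)))  # /usr/var
print(common_dir(same))                                  # /usr/var/log
```

    The trailing separator is what lets dirname() tell "the prefix ended inside a component" apart from "the prefix ended exactly on a component boundary".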
    
  • 2020-12-18 19:22

    This has been addressed in newer versions of Python. os.path.commonpath(), new in version 3.5, returns the longest common sub-path instead of the longest common string prefix.
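    A short comparison of the two functions. It uses posixpath explicitly so the output is the same on any host; on Windows, os.path is ntpath and commonpath() additionally rejects paths on different drives. commonpath() raises ValueError for an empty sequence or a mix of absolute and relative paths:

```python
import posixpath

paths = ["/usr/var1/log/", "/usr/var2/log/"]
print(posixpath.commonprefix(paths))  # /usr/var  (string prefix)
print(posixpath.commonpath(paths))    # /usr      (path prefix)

# Inputs that commonpath() rejects with ValueError.
for bad in ([], ["/usr/lib", "usr/lib"]):
    try:
        posixpath.commonpath(bad)
    except ValueError as exc:
        print("ValueError:", exc)
```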

  • 2020-12-18 19:29

    A robust approach is to split the path into individual components and then find the longest common prefix of the component lists.

    Here is an implementation which is cross-platform and can be generalized easily to more than two paths:

    import os.path
    
    def components(path):
        '''
        Returns the individual components of the given file path
        string (for the local operating system).
    
        The returned components, when joined with os.path.join(), point to
        the same location as the original path.
        '''
        components = []
        # The loop guarantees that the returned components can be
        # joined with os.path.join() and point to the same location:
        while True:
            (new_path, tail) = os.path.split(path)  # Works on any platform
            components.append(tail)
            if new_path == path:  # Root (including drive, on Windows) reached
                break
            path = new_path
        components.append(new_path)
    
        components.reverse()  # First component first
        return components
    
    def longest_prefix(iter0, iter1):
        '''
        Returns the longest common prefix of the given two iterables.
        '''
        prefix = []
        # zip() (itertools.izip() in Python 2) stops at the shorter iterable.
        for (elmt0, elmt1) in zip(iter0, iter1):
            if elmt0 != elmt1:
                break
            prefix.append(elmt0)
        return prefix
    
    def common_prefix_path(path0, path1):
        prefix = longest_prefix(components(path0), components(path1))
        # os.path.join() needs at least one argument; no shared component
        # (e.g. an absolute and a relative path) means no common prefix.
        return os.path.join(*prefix) if prefix else ''
    
    # For Unix:
    assert common_prefix_path('/', '/usr') == '/'
    assert common_prefix_path('/usr/var1/log/', '/usr/var2/log/') == '/usr'
    assert common_prefix_path('/usr/var/log1/', '/usr/var/log2/') == '/usr/var'
    assert common_prefix_path('/usr/var/log', '/usr/var/log2') == '/usr/var'
    assert common_prefix_path('/usr/var/log', '/usr/var/log') == '/usr/var/log'
    # Only for Windows:
    # assert common_prefix_path(r'C:\Programs\Me', r'C:\Programs') == r'C:\Programs'
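    One way to generalize this to any number of paths, sketched under the assumption that the components() logic above is reused unchanged (it is repeated here so the block runs on its own); common_prefix_path_n is a hypothetical name, and zip(*...) replaces the pairwise longest_prefix():

```python
import os.path

def components(path):
    # Same idea as components() above: peel off the tail with
    # os.path.split() until the root (or drive) stops changing.
    parts = []
    while True:
        head, tail = os.path.split(path)
        parts.append(tail)
        if head == path:
            break
        path = head
    parts.append(head)
    parts.reverse()  # First component first
    return parts

def common_prefix_path_n(*paths):
    # Hypothetical N-path variant: zip(*lists) yields the i-th component
    # of every path at once and stops at the shortest path; keep going
    # while all of them agree.
    prefix = []
    for column in zip(*(components(p) for p in paths)):
        if any(c != column[0] for c in column):
            break
        prefix.append(column[0])
    return os.path.join(*prefix) if prefix else ''

print(common_prefix_path_n('/usr/var1/log/', '/usr/var2/log/', '/usr/lib'))  # /usr (on Unix)
```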
    
  • 2020-12-18 19:30

    A while ago I ran into this: os.path.commonprefix returns a string prefix, not the path prefix one would expect. So I wrote the following:

    def commonprefix(l):
        # Unlike os.path.commonprefix, this always returns a path prefix,
        # because it compares the paths component by component.
        cp = []
        ls = [p.split('/') for p in l]
        ml = min(len(p) for p in ls)  # raises ValueError if l is empty
    
        for i in range(ml):
            s = set(p[i] for p in ls)
            if len(s) != 1:
                break
            cp.append(s.pop())
    
        return '/'.join(cp)
    

    It could be made more portable by replacing '/' with os.path.sep.
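    A sketch of that more portable variant, assuming the separator is made a parameter that defaults to os.sep; commonprefix_by_components is a hypothetical name, and zip() replaces the explicit min(len(...)) bound:

```python
import os

def commonprefix_by_components(paths, sep=os.sep):
    # Hypothetical variant of commonprefix() above: split on `sep`
    # (default os.sep) instead of a hard-coded '/'.
    if not paths:
        return ''
    split_paths = [p.split(sep) for p in paths]
    common = []
    # zip() stops at the shortest path, like the min(len(...)) bound.
    for column in zip(*split_paths):
        if len(set(column)) != 1:
            break
        common.append(column[0])
    return sep.join(common)

print(commonprefix_by_components(['/usr/var1/log', '/usr/var2/log'], sep='/'))        # /usr
print(commonprefix_by_components([r'C:\Programs\Me', r'C:\Programs'], sep='\\'))   # C:\Programs
```

    Like the original, it does no normalization: a common prefix of just the Unix root comes back as an empty string, and on Windows (where os.sep is '\\') paths written with '/' are not split.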
