My problem is to find the common path prefix of a given set of files.
Literally I was expecting that "os.path.commonprefix" would do just that. Unfortunat
I've made a small Python package, commonpath, to find common paths from a list. It comes with a few nice options.
https://github.com/faph/Common-Path
Assuming you want the common directory path, one way is to:
1. Use only directory paths as input. If an input value is a file name, call os.path.dirname(filename) to get its directory path.
2. Normalize all the paths, for example with os.path.abspath(), to get the path relative to the root. (You might also want to use os.path.realpath() to remove symbolic links.)
3. Append a path separator (os.path.sep or os.sep) to the end of all the normalized directory paths.
4. Call os.path.dirname() on the result of os.path.commonprefix().
In code (without removing symbolic links):
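    import os

    def common_path(directories):
        # The trailing separator keeps commonprefix() from matching only
        # part of a directory name without dirname() trimming it off.
        norm_paths = [os.path.abspath(p) + os.path.sep for p in directories]
        return os.path.dirname(os.path.commonprefix(norm_paths))

    def common_path_of_filenames(filenames):
        return common_path([os.path.dirname(f) for f in filenames])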
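The code above leaves symbolic links alone. Here is a minimal sketch of the variant hinted at in the steps, which swaps in os.path.realpath() when links should be resolved. The name common_dir and the resolve_symlinks flag are made up for illustration and are not part of the original answer.

```python
import os


def common_dir(filenames, resolve_symlinks=True):
    """Hypothetical helper: common directory of the given file names."""
    # realpath() returns an absolute path like abspath() does, and
    # additionally resolves symbolic links along the way.
    normalize = os.path.realpath if resolve_symlinks else os.path.abspath
    # Append a separator so that a character-wise prefix ending in the
    # middle of a directory name gets trimmed off by dirname().
    dirs = [normalize(os.path.dirname(f)) + os.path.sep for f in filenames]
    return os.path.dirname(os.path.commonprefix(dirs))
```

Resolving links can change the answer when two inputs reach the same directory through different links, so which variant is appropriate depends on whether you care about the paths as written or the locations they point to.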
It seems that this issue has been corrected in recent versions of Python. New in version 3.5 is the function os.path.commonpath(), which returns the common path instead of the common string prefix.
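To illustrate the difference, here is a short comparison of the two functions. It uses posixpath (which is what os.path refers to on POSIX systems) only so that the result does not depend on the machine it runs on; the example paths are arbitrary.

```python
import posixpath  # os.path on POSIX systems; explicit here for OS-independent results

paths = ["/usr/var1/log", "/usr/var2/log"]

# Character-wise comparison: may stop in the middle of a directory name.
string_prefix = posixpath.commonprefix(paths)

# Component-wise comparison (Python 3.5+): always a valid path.
path_prefix = posixpath.commonpath(paths)

print(string_prefix)  # /usr/var
print(path_prefix)    # /usr
```

Note that commonpath() raises ValueError for an empty sequence or when absolute and relative paths are mixed, so callers may want to handle that case.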
A robust approach is to split the path into individual components and then find the longest common prefix of the component lists.
Here is an implementation which is cross-platform and can be generalized easily to more than two paths:
    import os.path

    def components(path):
        '''
        Returns the individual components of the given file path
        string (for the local operating system).
        The returned components, when joined with os.path.join(), point to
        the same location as the original path.
        '''
        components = []
        # The loop guarantees that the returned components can be
        # os.path.joined with the path separator and point to the same
        # location:
        while True:
            (new_path, tail) = os.path.split(path)  # Works on any platform
            components.append(tail)
            if new_path == path:  # Root (including drive, on Windows) reached
                break
            path = new_path
        components.append(new_path)
        components.reverse()  # First component first
        return components

    def longest_prefix(iter0, iter1):
        '''
        Returns the longest common prefix of the given two iterables.
        '''
        longest_prefix = []
        # zip() is lazy in Python 3 (in Python 2, use itertools.izip):
        for (elmt0, elmt1) in zip(iter0, iter1):
            if elmt0 != elmt1:
                break
            longest_prefix.append(elmt0)
        return longest_prefix

    def common_prefix_path(path0, path1):
        return os.path.join(*longest_prefix(components(path0), components(path1)))

    # For Unix:
    assert common_prefix_path('/', '/usr') == '/'
    assert common_prefix_path('/usr/var1/log/', '/usr/var2/log/') == '/usr'
    assert common_prefix_path('/usr/var/log1/', '/usr/var/log2/') == '/usr/var'
    assert common_prefix_path('/usr/var/log', '/usr/var/log2') == '/usr/var'
    assert common_prefix_path('/usr/var/log', '/usr/var/log') == '/usr/var/log'
    # Only for Windows:
    # assert common_prefix_path(r'C:\Programs\Me', r'C:\Programs') == r'C:\Programs'
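As for the generalization to more than two paths, one possible sketch is to fold the pairwise prefix function over all the component lists with functools.reduce. To keep the example short and OS-independent it splits paths with pathlib.PurePosixPath.parts instead of the components() function above, which is a substitution on my part; the name common_prefix_path_many is also hypothetical.

```python
from functools import reduce
from pathlib import PurePosixPath


def longest_prefix(seq0, seq1):
    """Longest common prefix of two sequences, as a list."""
    prefix = []
    for elmt0, elmt1 in zip(seq0, seq1):
        if elmt0 != elmt1:
            break
        prefix.append(elmt0)
    return prefix


def common_prefix_path_many(paths):
    """Hypothetical helper: common path prefix of any number of POSIX paths."""
    part_lists = [PurePosixPath(p).parts for p in paths]
    if not part_lists:
        return ""
    # Fold the pairwise prefix over all paths, left to right.
    common = reduce(longest_prefix, part_lists)
    # No shared component at all (e.g. unrelated relative paths).
    return str(PurePosixPath(*common)) if common else ""


print(common_prefix_path_many(["/usr/var1/log/", "/usr/var2/log/", "/usr/lib"]))  # /usr
```

Because the prefix can only shrink as more paths are folded in, the order of the inputs does not affect the result.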
A while ago I ran into this: os.path.commonprefix returns a string prefix, not a path prefix as one would expect. So I wrote the following:
    def commonprefix(l):
        # this unlike the os.path.commonprefix version
        # always returns path prefixes as it compares
        # path component wise
        cp = []
        ls = [p.split('/') for p in l]
        ml = min(len(p) for p in ls)
        for i in range(ml):
            s = set(p[i] for p in ls)
            if len(s) != 1:
                break
            cp.append(s.pop())
        return '/'.join(cp)
It could be made more portable by replacing '/' with os.path.sep.
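A rough sketch of what that might look like, with the separator passed in as a parameter that defaults to os.sep. The function name and the sep parameter are my own choices for illustration, and the sketch also returns an empty string for an empty list instead of raising.

```python
import os


def common_prefix_by_component(paths, sep=os.sep):
    """Hypothetical portable variant: compare paths component by component."""
    split_paths = [p.split(sep) for p in paths]
    common = []
    # zip(*...) walks all paths level by level and stops at the shortest one.
    for level in zip(*split_paths):
        if len(set(level)) != 1:
            break
        common.append(level[0])
    return sep.join(common)


print(common_prefix_by_component(["/usr/var/log", "/usr/var/log2"], sep="/"))  # /usr/var
```

One caveat carried over from the original: when absolute paths share only the root, the result is an empty string rather than the separator itself, so the root case may need special handling.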