Determine prefix from a set of (similar) strings

我在风中等你 2020-11-27 14:25

I have a set of strings, e.g.

my_prefix_what_ever
my_prefix_what_so_ever
my_prefix_doesnt_matter

I simply want to find the longest common p

9 answers
  • 2020-11-27 14:53

    The line that builds lot applies reduce to each column of characters across the input strings (zip(*val) yields one column per position). The result is a list of N+1 elements, where N is the length of the shortest input string.

    Each element in lot is either (a) the input character, if all input strings match at that position, or (b) None. lot.index(None) is the position of the first None in lot, which is the length of the common prefix. out is that common prefix.

    from functools import reduce  # reduce is no longer a builtin in Python 3
    
    val = ["axc", "abc", "abc"]
    # The trailing None is a sentinel so that index(None) always finds a match.
    lot = [reduce(lambda a, b: a if a == b else None, x) for x in zip(*val)] + [None]
    out = val[0][:lot.index(None)]  # "a" for this input
    
  • 2020-11-27 14:57

    Here's my solution:

    a = ["my_prefix_what_ever", "my_prefix_what_so_ever", "my_prefix_doesnt_matter"]
    
    prefix_len = len(a[0])
    for x in a[1 : ]:
        prefix_len = min(prefix_len, len(x))
        while not x.startswith(a[0][ : prefix_len]):
            prefix_len -= 1
    
    prefix = a[0][ : prefix_len]
    
  • 2020-11-27 14:57

    Here is another way of doing this using OrderedDict with minimal code.

    import collections
    import itertools
    
    def commonprefix(instrings):
        """ Common prefix of a list of input strings using OrderedDict """
    
        d = collections.OrderedDict()
    
        for instring in instrings:
            for idx,char in enumerate(instring):
                # Make sure index is added into key
                d[(char, idx)] = d.get((char,idx), 0) + 1
    
        # Return prefix of keys while value == length(instrings)
        return ''.join([k[0] for k in itertools.takewhile(lambda x: d[x] == len(instrings), d)])
    
  • 2020-11-27 15:01

    Just out of curiosity I figured out yet another way to do this:

    def common_prefix(strings):
    
        if len(strings) == 1:#rule out trivial case
            return strings[0]
    
        prefix = strings[0]
    
        for string in strings[1:]:
            while string[:len(prefix)] != prefix and prefix:
                prefix = prefix[:len(prefix)-1]
            if not prefix:
                break
    
        return prefix
    
    strings = ["my_prefix_what_ever","my_prefix_what_so_ever","my_prefix_doesnt_matter"]
    
    print(common_prefix(strings))
    # Prints "my_prefix_"
    

    As Ned pointed out it's probably better to use os.path.commonprefix, which is a pretty elegant function.
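For reference, a minimal sketch of the os.path.commonprefix approach mentioned above, using the strings from the question. Despite living in os.path, the function compares character by character rather than by path component, so it works on arbitrary strings.

```python
import os.path

strings = ["my_prefix_what_ever", "my_prefix_what_so_ever", "my_prefix_doesnt_matter"]

# commonprefix compares the strings character by character,
# so the inputs do not have to be file paths.
prefix = os.path.commonprefix(strings)
print(prefix)  # my_prefix_

# An empty list yields an empty string instead of raising.
empty_prefix = os.path.commonprefix([])
print(repr(empty_prefix))  # ''
```

Because the comparison is character-based, the result is not guaranteed to be a valid path when the inputs are paths; for plain strings like these that does not matter.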

  • 2020-11-27 15:03

    I had a slight variation of the problem and Google sent me here, so I think it will be useful to document it:

    I have a list like:

    • my_prefix_what_ever
    • my_prefix_what_so_ever
    • my_prefix_doesnt_matter
    • some_noise
    • some_other_noise

    So I would expect my_prefix_ to be returned. That can be done with:

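    from collections import Counter
    
    def get_longest_common_prefix(values, min_length):
        # Collect every prefix of at least min_length characters from every string.
        substrings = [value[:i] for value in values for i in range(min_length, len(value) + 1)]
        counter = Counter(substrings)
        # Remove prefixes that occur only once (subtracting 1 drops them).
        counter -= Counter(set(substrings))
        if not counter:
            return ""
        # Prefer the prefix shared by the most strings; break ties by length.
        return max(counter, key=lambda p: (counter[p], len(p)))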
    
  • 2020-11-27 15:06

    Here's a simple, clean solution. The idea is to use zip() to line up the characters by position: a tuple of all 1st characters, a tuple of all 2nd characters, and so on up to the length of the shortest string. Then check each tuple to see whether it contains only one distinct value.

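    a = ["my_prefix_what_ever", "my_prefix_what_so_ever", "my_prefix_doesnt_matter"]
    
    # For each character position, True if every string has the same character there.
    matches = [all(x[i] == x[i+1] for i in range(len(x)-1)) for x in zip(*a)]
    
    # Cut at the first mismatch, or keep the whole zipped length if there is none.
    print(a[0][:matches.index(False) if False in matches else len(matches)])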
    

    output: my_prefix_
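The "contains only one value" check described above can also be written with set(). The following is a sketch of that variant, not the answer's original code; the function name common_prefix_zip is made up for illustration.

```python
from itertools import takewhile

def common_prefix_zip(strings):
    """Longest common prefix via zip(): keep leading columns whose characters all agree."""
    # zip(*strings) yields one tuple per character position and stops at the
    # shortest string; a column is uniform when its set has exactly one element.
    uniform = takewhile(lambda column: len(set(column)) == 1, zip(*strings))
    return "".join(column[0] for column in uniform)

a = ["my_prefix_what_ever", "my_prefix_what_so_ever", "my_prefix_doesnt_matter"]
print(common_prefix_zip(a))  # my_prefix_
```

takewhile stops at the first column that is not uniform, so later positions are never examined, and an empty input list simply produces an empty string.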
