Python - how to find all intersections of two strings?

花落未央 2021-02-04 18:21

How to find all intersections (also called the longest common substrings) of two strings and their positions in both strings?

For example, if S1="never"

6 answers
  • 2021-02-04 18:54

    The dynamic-programming approach below runs in O(nm) time, where n and m are the lengths of the input strings; a generalized suffix tree can bring this down to O(n+m).

    The pseudocode is:

    function LCSubstr(S[1..m], T[1..n])
        L := array(1..m, 1..n)
        z := 0
        ret := {}
        for i := 1..m
            for j := 1..n
                if S[i] = T[j]
                    if i = 1 or j = 1
                        L[i,j] := 1
                    else
                        L[i,j] := L[i-1,j-1] + 1
                    if L[i,j] > z
                        z := L[i,j]
                        ret := {}
                    if L[i,j] = z
                        ret := ret ∪ {S[i-z+1..i]}
                else
                    L[i,j] := 0
        return ret
    

    See the Wikipedia article on the longest common substring problem for more details.
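
    The pseudocode above can be sketched in Python roughly as follows. This is a minimal sketch rather than a reference implementation: the function name `longest_common_substrings` and the `(substring, start_in_s, start_in_t)` result shape are my own choices, and it keeps only two rows of the table instead of the full m×n array.

```python
def longest_common_substrings(s, t):
    """Return (length, matches); each match is (substring, start_in_s, start_in_t)."""
    prev = [0] * (len(t) + 1)  # DP row for the previous character of s
    best = 0                   # length of the longest common substring so far
    matches = []               # every occurrence that has length `best`
    for i in range(1, len(s) + 1):
        curr = [0] * (len(t) + 1)
        for j in range(1, len(t) + 1):
            if s[i - 1] == t[j - 1]:
                # extend the common suffix ending at s[i-2], t[j-2]
                curr[j] = prev[j - 1] + 1
                if curr[j] > best:
                    best = curr[j]
                    matches = []
                if curr[j] == best:
                    matches.append((s[i - best:i], i - best, j - best))
        prev = curr
    return best, matches


print(longest_common_substrings("never", "forever"))  # (4, [('ever', 1, 3)])
```

    Note that this reports only the longest matches, so for "address" and "oddness" it returns "ess" but not the shorter "dd".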

  • 2021-02-04 19:01

    Well, you're saying that you can't include any library. However, Python's standard difflib contains a function which does exactly what you expect. Considering that it is a Python interview question, familiarity with difflib might be what the interviewer expected.

    In [31]: import difflib
    
    In [32]: difflib.SequenceMatcher(None, "never", "forever").get_matching_blocks()
    Out[32]: [Match(a=1, b=3, size=4), Match(a=5, b=7, size=0)]
    
    
    In [33]: difflib.SequenceMatcher(None, "address", "oddness").get_matching_blocks()
    Out[33]: [Match(a=1, b=1, size=2), Match(a=4, b=4, size=3), Match(a=7, b=7, size=0)]
    

    You can always ignore the last Match tuple: it is a dummy sentinel with size 0 (according to the documentation).
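
    To turn the matching blocks into substrings with their positions, something like the sketch below should work. The helper name `matching_blocks` is hypothetical; `autojunk=False` is passed so that the heuristic for very long sequences does not hide matches. One caveat: the blocks are non-overlapping and appear in the same order in both strings, so for inputs such as "abcd1234" and "1234abcd" only one of the two common substrings is reported.

```python
import difflib


def matching_blocks(a, b):
    """Return (substring, start_in_a, start_in_b) for each non-empty matching block."""
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return [
        (a[m.a:m.a + m.size], m.a, m.b)
        for m in matcher.get_matching_blocks()
        if m.size > 0  # drops the trailing dummy block
    ]


print(matching_blocks("never", "forever"))
print(matching_blocks("address", "oddness"))
```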

  • 2021-02-04 19:02

    I'm assuming you only want substrings to match if they have the same absolute position within their respective strings. For example, "abcd" and "bcde" won't have any matches, even though both contain "bcd".

    a = "address"
    b = "oddness"
    
    # matches[x] is True if a[x] == b[x]
    matches = [x == y for x, y in zip(a, b)]

    # indices at which the two strings agree
    positions = [i for i, same in enumerate(matches) if same]
    # mask mismatches with "_" (assumes "_" occurs in neither string), then split into runs
    substrings = [s for s in "".join(c if same else "_" for c, same in zip(a, matches)).split("_") if s]
    

    positions = [1, 2, 4, 5, 6]

    substrings = ['dd', 'ess']

    If you only want substrings, you can squish it into one line:

    [s for s in "".join(x if x == y else "_" for x, y in zip(a, b)).split("_") if s]
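
    If the strings may themselves contain "_", the same position-wise comparison can be done without a placeholder character. The sketch below groups consecutive equal positions with `itertools.groupby`; the function name `aligned_matches` and the `(start, substring)` result shape are assumptions of mine, not part of the answer above.

```python
from itertools import groupby


def aligned_matches(a, b):
    """Return (start, substring) for each run of positions where a[i] == b[i]."""
    flags = [x == y for x, y in zip(a, b)]
    runs = []
    # groupby yields maximal runs of consecutive positions sharing the same flag
    for equal, group in groupby(enumerate(flags), key=lambda pair: pair[1]):
        if equal:
            indices = [i for i, _ in group]
            runs.append((indices[0], a[indices[0]:indices[-1] + 1]))
    return runs


print(aligned_matches("address", "oddness"))  # [(1, 'dd'), (4, 'ess')]
```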
    
  • 2021-02-04 19:03
    def IntersectStrings(first, second):
        # Keep every character of `first` that also occurs somewhere in `second`.
        common = [ch for ch in first if ch in second]
        # Sorting is optional: use it for alphabetical order, or drop sorted()
        # to keep the characters in their original order.
        return ''.join(sorted(common))

    print(IntersectStrings('hello', 'mello'))
    
  • 2021-02-04 19:07

    Here's what I could come up with:

    import itertools
    
    def longest_common_substring(s1, s2):
       set1 = set(s1[begin:end] for (begin, end) in
                  itertools.combinations(range(len(s1)+1), 2))
       set2 = set(s2[begin:end] for (begin, end) in
                  itertools.combinations(range(len(s2)+1), 2))
       common = set1.intersection(set2)
       maximal = [com for com in common
                  if sum((s.find(com) for s in common)) == -1 * (len(common)-1)]
       return [(s, s1.index(s), s2.index(s)) for s in maximal]
    

    Checking some values (the order of the results may vary, because they are collected from a set):

    >>> longest_common_substring('address', 'oddness')
    [('dd', 1, 1), ('ess', 4, 4)]
    >>> longest_common_substring('never', 'forever')
    [('ever', 1, 3)]
    >>> longest_common_substring('call', 'wall')
    [('all', 1, 1)]
    >>> longest_common_substring('abcd1234', '1234abcd')
    [('abcd', 0, 4), ('1234', 4, 0)]
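
    Because `str.index` returns only the first occurrence, a substring that appears more than once in either string is reported at a single position. If every position is wanted, a small helper along these lines could be used in place of `index`; the name `all_positions` is hypothetical.

```python
def all_positions(text, sub):
    """Return every index at which `sub` starts in `text`, overlaps included."""
    positions = []
    start = text.find(sub)
    while start != -1:
        positions.append(start)
        # resume one character later so overlapping occurrences are found too
        start = text.find(sub, start + 1)
    return positions


print(all_positions("banana", "ana"))  # [1, 3]
```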
    
  • 2021-02-04 19:09

    Batteries included!

    The difflib module might have some help for you - here is a quick and dirty side-by-side diff:

    >>> import difflib
    >>> list(difflib.ndiff("never","forever"))
    ['- n', '+ f', '+ o', '+ r', '  e', '  v', '  e', '  r']
    >>> diffs = list(difflib.ndiff("never","forever"))
    >>> for d in diffs:
    ...   print({' ': '  ', '-': '', '+': '    '}[d[0]] + d[1:])
    ...
     n
         f
         o
         r
       e
       v
       e
       r
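
    The common runs can also be pulled out of the `ndiff` output directly. The sketch below is only an illustration under that assumption: it joins consecutive entries whose tag is a space (meaning "present in both"), and the function name `common_runs` is my own.

```python
import difflib
from itertools import groupby


def common_runs(a, b):
    """Join consecutive ndiff entries marked as common to both sequences."""
    diffs = difflib.ndiff(a, b)
    return [
        "".join(d[2:] for d in group)  # d looks like '  e': tag, space, character
        for tag, group in groupby(diffs, key=lambda d: d[0])
        if tag == " "
    ]


print(common_runs("never", "forever"))
```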
    