Question
What's the best way of getting just the difference from two multiline strings?
import difflib
a = 'testing this is working \n testing this is working 1 \n'
b = 'testing this is working \n testing this is working 1 \n testing this is working 2'
diff = difflib.ndiff(a, b)
print(''.join(diff))
This produces:
t e s t i n g t h i s i s w o r k i n g
t e s t i n g t h i s i s w o r k i n g 1
+ + t+ e+ s+ t+ i+ n+ g+ + t+ h+ i+ s+ + i+ s+ + w+ o+ r+ k+ i+ n+ g+ + 2
What's the best way of getting exactly:
testing this is working 2
?
Would regex be the solution here?
Answer 1:
a = 'testing this is working \n testing this is working 1 \n'
b = 'testing this is working \n testing this is working 1 \n testing this is working 2'
splitA = set(a.split("\n"))
splitB = set(b.split("\n"))
diff = splitB.difference(splitA)
diff = ", ".join(diff) # ' testing this is working 2, more things if there were...'
This turns each string into a set of lines and takes the set difference, i.e. everything in B that is not in A, then joins the result into one string. Keep in mind that sets are unordered and drop duplicate lines.
Edit: This is a convoluted way of saying what @ShreyasG said: [x for x in b if x not in a]...
Answer 2:
The easiest hack, credit to @Chris, is to use split().
Note: you need to determine which string is longer and call split on that one.
if len(a) > len(b):
    res = ''.join(a.split(b))  # get diff
else:
    res = ''.join(b.split(a))  # get diff
print(res.strip())  # remove whitespace on either side
# driver values
IN : a = 'testing this is working \n testing this is working 1 \n'
IN : b = 'testing this is working \n testing this is working 1 \n testing this is working 2'
OUT : testing this is working 2
EDIT: thanks to @ekhumoro for another hack using replace, which needs no join at all.
if len(a) > len(b):
    res = a.replace(b, '')  # get diff
else:
    res = b.replace(a, '')  # get diff
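One caveat with both hacks: they assume the shorter string occurs verbatim inside the longer one. A minimal sketch of that behaviour follows; the helper name `substring_diff` and the modified input `a_changed` are made up for illustration and are not part of the answer.

```python
def substring_diff(a, b):
    """Remove the shorter string from the longer one (the replace hack)."""
    longer, shorter = (a, b) if len(a) > len(b) else (b, a)
    return longer.replace(shorter, '').strip()


a = 'testing this is working \n testing this is working 1 \n'
b = 'testing this is working \n testing this is working 1 \n testing this is working 2'

# a occurs verbatim at the start of b, so only the extra line is left.
print(substring_diff(a, b))  # testing this is working 2

# Hypothetical input: change one character in a. It no longer occurs
# inside b, so replace() removes nothing and all of b comes back.
a_changed = a.replace('1', 'X')
print(substring_diff(a_changed, b) == b.strip())  # True
```

If the two strings can differ anywhere other than at the end, a line-based comparison is likely the safer choice.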
Answer 3:
This is basically @Godron629's answer, but since I can't comment, I'm posting it here with a slight modification: swapping difference for symmetric_difference so that the order of the sets doesn't matter.
a = 'testing this is working \n testing this is working 1 \n'
b = 'testing this is working \n testing this is working 1 \n testing this is working 2'
splitA = set(a.split("\n"))
splitB = set(b.split("\n"))
diff = splitB.symmetric_difference(splitA)
diff = ", ".join(diff) # ' testing this is working 2, some more things...'
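A detail that is easy to miss with the question's inputs: a ends with a newline, so a.split("\n") yields a trailing empty string, and that empty string shows up in the symmetric difference. The sketch below compares split("\n") with splitlines() on the same two strings; the variable names `sym_split` and `sym_lines` are only illustrative.

```python
a = 'testing this is working \n testing this is working 1 \n'
b = 'testing this is working \n testing this is working 1 \n testing this is working 2'

# split("\n") leaves a trailing '' for a, because a ends with a newline.
sym_split = set(a.split("\n")).symmetric_difference(b.split("\n"))
print(sym_split == {'', ' testing this is working 2'})  # True

# splitlines() does not produce that trailing empty string.
sym_lines = set(a.splitlines()).symmetric_difference(b.splitlines())
print(sym_lines)  # {' testing this is working 2'}

# The result is the same whichever string comes first.
print(sym_lines == set(b.splitlines()).symmetric_difference(a.splitlines()))  # True
```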
Answer 4:
Building on @Chris_Rands' comment, you can also use splitlines() (if your strings are multi-line and you want the lines that are present in one but not the other):
b_s = b.splitlines()
a_s = a.splitlines()
[x for x in b_s if x not in a_s]
Expected output is:
[' testing this is working 2']
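Since the question started from difflib, here is a sketch of the same line-based idea with difflib.ndiff. ndiff compares two sequences item by item, so passing raw strings compares single characters (which explains the output in the question), while passing lists of lines compares whole lines. The variable names `delta` and `added` are just for illustration.

```python
import difflib

a = 'testing this is working \n testing this is working 1 \n'
b = 'testing this is working \n testing this is working 1 \n testing this is working 2'

# Pass lists of lines, not the raw strings, so each item is a whole line.
delta = difflib.ndiff(a.splitlines(), b.splitlines())

# Lines that exist only in b are prefixed with '+ ' in ndiff output.
added = [line[2:] for line in delta if line.startswith('+ ')]
print(added)  # [' testing this is working 2']
```

Unlike the list comprehension above, this also reports a line that was added at a different position even if an identical line already exists elsewhere in a.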
Answer 5:
import itertools as it
"".join(y for x, y in it.zip_longest(a, b) if x != y)
# ' testing this is working 2'
Alternatively
import collections as ct
ca = ct.Counter(a.split("\n"))
cb = ct.Counter(b.split("\n"))
diff = cb - ca
"".join(diff.keys())
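One reason to prefer the Counter version over a plain set difference is that it keeps track of how many times each line occurs. A small sketch with made-up inputs (the 'spam'/'eggs' strings are hypothetical, not from the question):

```python
import collections as ct

# Hypothetical inputs: b repeats a line that a already contains once.
a = 'spam\neggs\n'
b = 'spam\neggs\nspam\n'

# A set difference sees no new line, because 'spam' is already in a.
print(set(b.splitlines()) - set(a.splitlines()))  # set()

# Counter subtraction keeps the surplus occurrence.
extra = ct.Counter(b.splitlines()) - ct.Counter(a.splitlines())
print(list(extra.elements()))  # ['spam']
```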
Answer 6:
You could use the following function:
def __slave(a, b):
    # return the index of the first item in a equal to b, or -1
    for i, l_a in enumerate(a):
        if b == l_a:
            return i
    return -1

def diff(a, b):
    # remove from b the characters that also occur, in order, in a
    t_b = b
    c_i = 0
    for c in a:
        t_i = __slave(t_b, c)
        if t_i != -1 and (t_i > c_i or t_i == c_i):
            c_i = t_i
            t_b = t_b[:c_i] + t_b[c_i+1:]
    # same again in the other direction
    t_a = a
    c_i = 0
    for c in b:
        t_i = __slave(t_a, c)
        if t_i != -1 and (t_i > c_i or t_i == c_i):
            c_i = t_i
            t_a = t_a[:c_i] + t_a[c_i+1:]
    return t_b + t_a
Usage sample: print(diff(a, b))
Source: https://stackoverflow.com/questions/46453075/python-getting-just-the-difference-between-strings