Question:
I am reading the documentation for Python's difflib. According to the docs, each line of a Differ delta begins with a two-letter code:
Code   Meaning
'- '   line unique to sequence 1
'+ '   line unique to sequence 2
'  '   line common to both sequences
'? '   line not present in either input sequence
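To make the table concrete, here is a small sketch that runs difflib.Differ().compare() on two short lists of lines (the sample data is made up for illustration) and collects the two-character code at the start of each delta line. Note that an edited line shows up as a '- ' line, a '+ ' line and a '? ' guide line; none of the four codes means "changed" on its own.

```python
import difflib

# Hypothetical sample input: one line edited, one line appended.
old = ["apple\n", "banana\n", "cherry\n"]
new = ["apple\n", "bandana\n", "cherry\n", "date\n"]

delta = list(difflib.Differ().compare(old, new))
print("".join(delta), end="")

# Every delta line starts with one of the four two-character codes.
codes = [line[:2] for line in delta]
print(codes)  # ['  ', '- ', '+ ', '? ', '  ', '+ ']
```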
But what about the "Change" operation? How do I get a "c " instruction similar to the results in Perl's sdiff?
Answer 1:
Take a look at this script:
sdiff.py @ hungrysnake.net
http://hungrysnake.net/doc/software__sdiff_py.html
Perl's sdiff (Algorithm::Diff) doesn't take the "matching rate" into account, but Python's sdiff.py does. =)
I have two text files:
```
$ cat text1.txt
aaaaaa
bbbbbb
cccccc
dddddd
eeeeee
ffffff
$ cat text2.txt
aaaaaa
bbbbbb
xxxxxxx
ccccccy
zzzzzzz
eeeeee
ffffff
```
With the sdiff command or Perl's sdiff (Algorithm::Diff), I get this side-by-side diff:
```
$ sdiff text1.txt text2.txt
aaaaaa        aaaaaa
bbbbbb        bbbbbb
cccccc      | xxxxxxx
dddddd      | ccccccy
            > zzzzzzz
eeeeee        eeeeee
ffffff        ffffff
```
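For comparison, the standard library can produce the same grouping as the sdiff command without a third-party script: difflib.SequenceMatcher.get_opcodes() reports a 'replace' opcode for the changed block, which can play the role of Perl's 'c'. This is a swapped-in technique, not what sdiff.py or Differ does: it pairs replaced lines purely by position, with no matching rate. The name opcode_sdiff and the use of '' for a missing side are my own choices.

```python
import difflib
from itertools import zip_longest

def opcode_sdiff(old_lines, new_lines):
    """Build Perl-sdiff-like (op, old, new) triples from SequenceMatcher opcodes.

    'replace' blocks are paired purely by position: paired lines become 'c',
    leftovers of an uneven block become '-' (old only) or '+' (new only).
    """
    result = []
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == 'equal':
            result.extend(('u', a, b) for a, b in zip(old_lines[i1:i2], new_lines[j1:j2]))
        elif tag == 'delete':
            result.extend(('-', a, '') for a in old_lines[i1:i2])
        elif tag == 'insert':
            result.extend(('+', '', b) for b in new_lines[j1:j2])
        else:  # 'replace'
            for a, b in zip_longest(old_lines[i1:i2], new_lines[j1:j2]):
                if a is None:
                    result.append(('+', '', b))
                elif b is None:
                    result.append(('-', a, ''))
                else:
                    result.append(('c', a, b))
    return result

text1 = ["aaaaaa", "bbbbbb", "cccccc", "dddddd", "eeeeee", "ffffff"]
text2 = ["aaaaaa", "bbbbbb", "xxxxxxx", "ccccccy", "zzzzzzz", "eeeeee", "ffffff"]
for op, old, new in opcode_sdiff(text1, text2):
    print(f"{op}  {old:<8}{new}")
```

For these two files the ops come out as u, u, c, c, +, u, u, which lines up with the |, | and > markers in the sdiff output above.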
sdiff doesn't take the "matching rate" into account. =(
With sdiff.py, I get this instead:
```
$ sdiff.py text1.txt text2.txt
--- text1.txt (utf-8)
+++ text2.txt (utf-8)
1|aaaaaa      1|aaaaaa
2|bbbbbb      2|bbbbbb
 |          > 3|xxxxxxx
3|cccccc    | 4|ccccccy
4|dddddd    <  |
 |          > 5|zzzzzzz
5|eeeeee      6|eeeeee
6|ffffff      7|ffffff
[    ]   |      +
[ <- ]  3|cccccc
[ -> ]  4|ccccccy
```
sdiff.py takes the "matching rate" into account. =)
That's the result I want. Don't you?
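The answer doesn't show how sdiff.py computes its "matching rate". As a rough illustration only, assuming a similarity score like the one difflib itself uses, this sketch rates every old/new pair in the changed block of the example with difflib.SequenceMatcher.ratio() and picks the most similar one. best_pair is a hypothetical helper, not part of sdiff.py or difflib.

```python
import difflib

# The changed block from the example: two old lines vs three new lines.
old_block = ["cccccc", "dddddd"]
new_block = ["xxxxxxx", "ccccccy", "zzzzzzz"]

def best_pair(olds, news):
    """Return (ratio, old, new) for the most similar old/new line pair."""
    return max(
        (difflib.SequenceMatcher(None, o, n).ratio(), o, n)
        for o in olds
        for n in news
    )

ratio, old, new = best_pair(old_block, new_block)
print(f"{old!r} ~ {new!r}: {ratio:.3f}")  # 'cccccc' ~ 'ccccccy': 0.923
```

That pairing of cccccc with ccccccy is the one sdiff.py shows above; positional pairing (as in the sdiff command) would instead pair cccccc with xxxxxxx.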
Answer 2:
There is no direct 'c '-like code in difflib to mark changed lines the way Perl's sdiff does, but you can build one easily. In difflib's delta, a "changed" line is also tagged with '- ', but unlike a truly deleted line it is accompanied by a line tagged '? ', either right after it or right after the '+ ' line that replaces it. That '? ' line tells you the '- ' line was changed, not deleted. It also acts as a guide, showing where in the line the changes are.
So, if a line in the delta is tagged with '- ', there are four different cases, depending on the next few lines of the delta:
Case 1: The line is modified by inserting some characters
```
- The good bad
+ The good the bad
?          ++++
```
Case 2: The line is modified by deleting some characters
```
- The good the bad
?          ----
+ The good bad
```
Case 3: The line is modified by deleting and inserting and/or replacing some characters:
```
- The good the bad and ugly
?      ^^ ----
+ The g00d bad and the ugly
?      ^^          ++++
```
Case 4: The line is deleted
- The good the bad and the ugly
+ Our ratio is less than 0.75!
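The four patterns can be checked directly. This sketch feeds each example pair above to difflib.Differ().compare() as a one-line sequence and keeps only the two-character codes; delta_codes is a hypothetical helper name.

```python
import difflib

def delta_codes(old, new):
    """Return the two-character codes Differ emits for a one-line change."""
    return [line[:2] for line in difflib.Differ().compare([old], [new])]

cases = {
    "Case 1 (insert)": ("The good bad", "The good the bad"),
    "Case 2 (delete)": ("The good the bad", "The good bad"),
    "Case 3 (mixed)": ("The good the bad and ugly", "The g00d bad and the ugly"),
    "Case 4 (deleted)": ("The good the bad and the ugly", "Our ratio is less than 0.75!"),
}
for name, (old, new) in cases.items():
    # Each pattern tells you which case the '- ' line belongs to.
    print(name, delta_codes(old, new))
```

Only the position of the '? ' lines distinguishes cases 1 to 3 from each other, and their absence marks case 4.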
As you can see, the lines tagged with '? ' show exactly where, and what kind of, modification was made: '^' marks replaced characters, '-' deleted ones and '+' inserted ones.
Note that difflib treats a line as deleted, rather than changed, when the ratio() of a SequenceMatcher comparing the two lines is less than 0.75. I found this value through some tests.
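A quick way to see the threshold at work is to compute the ratio yourself. The sketch below scores the Case 1, 3 and 4 line pairs with difflib.SequenceMatcher.ratio() and labels them using the 0.75 value from the note above; classify and CUTOFF are illustrative names, not difflib API.

```python
import difflib

CUTOFF = 0.75  # threshold reported above, not a public difflib constant

def classify(old, new):
    """Label a '-'/'+' line pair as 'changed' or 'deleted' (illustrative helper)."""
    ratio = difflib.SequenceMatcher(None, old, new).ratio()
    return ("changed" if ratio >= CUTOFF else "deleted"), ratio

pairs = [
    ("The good bad", "The good the bad"),                               # Case 1
    ("The good the bad and ugly", "The g00d bad and the ugly"),         # Case 3
    ("The good the bad and the ugly", "Our ratio is less than 0.75!"),  # Case 4
]
for old, new in pairs:
    label, ratio = classify(old, new)
    print(f"{ratio:.3f}  {label}")
```

Case 3 scores only slightly above the cutoff, so a somewhat bigger edit to that line would turn it into the Case 4 pattern.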
So, to infer that a line has changed, you can do the following. It returns the diffs with changed lines tagged 'c' and unchanged lines tagged 'u', just like Perl's sdiff:
```python
import difflib

def sdiffer(s1, s2):
    differ = difflib.Differ()
    diffs = list(differ.compare(s1, s2))
    i = 0
    sdiffs = []
    length = len(diffs)
    while i < length:
        line = diffs[i][2:]
        if diffs[i].startswith('  '):
            sdiffs.append(('u', line))
        elif diffs[i].startswith('+ '):
            sdiffs.append(('+', line))
        elif diffs[i].startswith('- '):
            if i + 1 < length and diffs[i + 1].startswith('? '):
                # Cases 2 and 3: '- ', '? ', '+ ' [, '? ']; diffs[i+2] is always the '+ ' line
                sdiffs.append(('c', line))
                i += 3 if i + 3 < length and diffs[i + 3].startswith('? ') else 2
            elif (i + 2 < length and diffs[i + 1].startswith('+ ')
                  and diffs[i + 2].startswith('? ')):
                # Case 1: '- ', '+ ', '? '
                sdiffs.append(('c', line))
                i += 2
            else:
                # Case 4: the line was really deleted
                sdiffs.append(('-', line))
        # Move past the last delta line consumed in this iteration
        i += 1
    return sdiffs
```
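As far as I know, Perl's sdiff (Algorithm::Diff) reports each change with both the old and the new line, while the sdiffer above keeps only the old line for 'c'. A possible variant that returns (op, old, new) triples instead is sketched below; the name sdiff_triples and the use of '' for the missing side of '+'/'-' are my own assumptions.

```python
import difflib

def sdiff_triples(s1, s2):
    """Hypothetical variant of sdiffer returning (op, old, new) triples.

    op is 'u', '+', '-' or 'c'; the missing side of '+'/'-' is ''.
    """
    diffs = list(difflib.Differ().compare(s1, s2))
    out = []
    i, n = 0, len(diffs)
    while i < n:
        code, text = diffs[i][:2], diffs[i][2:]
        if code == '  ':
            out.append(('u', text, text))
        elif code == '+ ':
            out.append(('+', '', text))
        elif code == '- ':
            j = i + 1
            guided = j < n and diffs[j].startswith('? ')  # cases 2 and 3
            if guided:
                j += 1  # skip the guide line of the old text
            if j < n and diffs[j].startswith('+ ') and (
                    guided or (j + 1 < n and diffs[j + 1].startswith('? '))):
                out.append(('c', text, diffs[j][2:]))
                i = j + 1
                if i < n and diffs[i].startswith('? '):
                    i += 1  # skip the guide line of the new text
                continue
            out.append(('-', text, ''))  # case 4: a real deletion
        i += 1
    return out

print(sdiff_triples(["same\n", "The good bad\n"], ["same\n", "The good the bad\n"]))
```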
Hope it helps.
P.S.: It is an old question, so I am not sure how well my efforts will be rewarded. :-(
I just couldn't help answering it, as I have been working with difflib a little lately.
Answer 3:
I don't really know what Perl's "Change" operation is. If it is similar to PHP's diff output, I solved my problem with this code:
```python
import difflib

def sdiffer(s1, s2):
    differ = difflib.Differ()
    diffs = list(differ.compare(s1, s2))
    i = 0
    sdiffs = []
    length = len(diffs)
    sequence = 0
    while i < length:
        line = diffs[i][2:]
        if diffs[i].startswith('  '):
            sequence += 1
            sdiffs.append((sequence, 'u', line))
        elif diffs[i].startswith('+ '):
            sequence += 1
            sdiffs.append((sequence, '+', line))
        elif diffs[i].startswith('- '):
            sequence += 1
            sdiffs.append((sequence, '-', line))
            if i + 1 < length and diffs[i + 1].startswith('? '):
                # Cases 2 and 3: '- ', '? ', '+ ' [, '? ']; diffs[i+2] is the new line
                sequence += 1
                sdiffs.append((sequence, '+', diffs[i + 2][2:]))
                i += 3 if i + 3 < length and diffs[i + 3].startswith('? ') else 2
            elif (i + 2 < length and diffs[i + 1].startswith('+ ')
                  and diffs[i + 2].startswith('? ')):
                # Case 1: '- ', '+ ', '? '
                sequence += 1
                sdiffs.append((sequence, '+', diffs[i + 1][2:]))
                i += 2
            # Case 4 (plain deletion): nothing extra to do here; a following
            # '+ ' line is picked up by the '+ ' branch on the next iteration.
        i += 1
    return sdiffs
```
Thanks @Sнаđошƒаӽ for your code.
Source: https://stackoverflow.com/questions/15938605/python-difflib-how-to-get-sdiff-sequences-with-change-op