difflib | 易学教程

Getting more granular diffs from difflib (or a way to post-process a diff to achieve the same thing)

Question: Downloading this page and making a minor edit to it, changing the first 65 in this paragraph to 68: I then parse both sources with BeautifulSoup and diff them with difflib. url = 'https://secure.ssa.gov/apps10/reference.nsf/links/02092016062645AM' response = urllib2.urlopen(url) content = response.read() # get response as list of lines url2 = 'file:///Users/Pyderman/projects/temp/02092016062645AM-modified.html' response2 = urllib2.urlopen(url2) content2 = response2.read() # get response as
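One possible route to a finer-grained diff is to compare the old and new versions of a changed line character by character with difflib.SequenceMatcher. The following is a minimal sketch under assumptions: the helper name char_changes and the sample strings are hypothetical and do not come from the question.

```python
import difflib


def char_changes(old, new):
    """Return (tag, old_text, new_text) for every non-equal opcode."""
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != 'equal':
            changes.append((tag, old[i1:i2], new[j1:j2]))
    return changes


# Hypothetical sample lines that differ in a single digit.
changes = char_changes('age 65', 'age 68')
print(changes)
```

A line-level diff could first pick out the changed line pairs, and a helper like this could then be run on each pair to narrow the change down to the characters involved.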

difflib.get_close_matches() - Help getting desired result

Question: The basic gist of the program is to start with a list of employee names, then sort it. Wait for the user to input "end" to stop populating the list of names (I have 100 names; I cut it short for the example). Afterwards, the user can enter an employee name and the program will run difflib.get_close_matches(). Here's the question: I'm getting a syntax error for get_close_matches. How should I be calling difflib differently? Also, if you have any tips for making the code more efficient, please
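For reference, a call to difflib.get_close_matches generally takes the word to look up, a list of candidates, and optional n and cutoff arguments. The sketch below assumes a small hypothetical employee list (the question's real list is not shown) and a hypothetical helper name closest_name; it leaves out the interactive input loop.

```python
import difflib

# Hypothetical employee list, sorted as in the question's description.
employee_names = sorted(['charlie', 'alice', 'bob'])


def closest_name(query, names):
    """Return the best match for query, or None if nothing is close enough."""
    matches = difflib.get_close_matches(query, names, n=1, cutoff=0.6)
    return matches[0] if matches else None


best = closest_name('alicia', employee_names)
print(best)
```

Returning None when the list of matches is empty avoids an IndexError for names that have no close candidate.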

In Python, is it possible to write a generator's (context_diff) output to a text file?

Question: The difflib.context_diff function returns a generator that yields the lines that differ between two compared sequences of strings. How can I write the result (the comparison) to a text file? In this example code, I want everything from line 4 to the end in the text file. >>> s1 = ['bacon\n', 'eggs\n', 'ham\n', 'guido\n'] >>> s2 = ['python\n', 'eggy\n', 'hamster\n', 'guido\n'] >>> for line in context_diff(s1, s2, fromfile='before.py', tofile='after.py'): ... sys.stdout.write(line) # doctest: +NORMALIZE_WHITESPACE *
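Since a file object's writelines method accepts any iterable of strings, the generator can be handed to it directly. This is a minimal sketch reusing the question's s1 and s2; the helper name write_context_diff and the output path are assumptions made for illustration.

```python
import difflib
import os
import tempfile

s1 = ['bacon\n', 'eggs\n', 'ham\n', 'guido\n']
s2 = ['python\n', 'eggy\n', 'hamster\n', 'guido\n']


def write_context_diff(a, b, path):
    """Write the context diff of a and b to the file at path."""
    diff = difflib.context_diff(a, b, fromfile='before.py', tofile='after.py')
    with open(path, 'w') as handle:
        handle.writelines(diff)  # consumes the generator line by line


# Hypothetical output location in the system temp directory.
out_path = os.path.join(tempfile.gettempdir(), 'context_diff_example.txt')
write_context_diff(s1, s2, out_path)
```

If only part of the output is wanted, the generator could be trimmed with itertools.islice before it is written.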

Merging dataframes

Question: I have been struggling with this problem all day. I have two dataframes as follows: Dataframe 1 (Billboards) and Dataframe 2. I would like to merge Dataframe 2 with Dataframe 1 based on song to end up with a dataframe that has SongId, Song, Rank and Year. The problem is that there are some variations in how the songs are stored. For example, Song in Billboard can be macarena bayside boys mix while Song in Dataframe 2 might be macarena. I wanted to find similarities. Answer 1: I think you would need to calculate
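The answer excerpt is cut off, so the following is not the answer's method. It is one hedged way to line up differently spelled titles with difflib.get_close_matches before merging. The two lists are hypothetical stand-ins for the Song columns, and the helper name match_titles and the cutoff of 0.45 are assumptions; a suitable cutoff would need tuning on real data.

```python
import difflib

# Hypothetical titles standing in for the two dataframes' Song columns.
billboard_songs = ['macarena bayside boys mix', 'wannabe']
other_songs = ['macarena', 'wannabe']


def match_titles(short_titles, long_titles, cutoff=0.45):
    """Map each short title to its closest long title, or None."""
    mapping = {}
    for title in short_titles:
        found = difflib.get_close_matches(title, long_titles, n=1, cutoff=cutoff)
        mapping[title] = found[0] if found else None
    return mapping


title_map = match_titles(other_songs, billboard_songs)
print(title_map)
```

A mapping like this could be used to add a normalized key column to one dataframe (for instance with pandas Series.map) so that an ordinary DataFrame.merge on that column becomes possible.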

difflib.SequenceMatcher isjunk argument not considered?

Question: In the Python difflib library, is the SequenceMatcher class behaving unexpectedly, or am I misreading what the supposed behavior is? Why does the isjunk argument seem to make no difference in this case? difflib.SequenceMatcher(None, "AA", "A A").ratio() returns 0.8 difflib.SequenceMatcher(lambda x: x in ' ', "AA", "A A").ratio() returns 0.8 My understanding is that if space is omitted, the ratio should be 1. Answer 1: This is happening because the ratio function uses total sequences' length
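The behavior described can be reproduced in a few lines. The sketch below assumes that the goal is a ratio that ignores spaces entirely; in that case one option is to remove them before comparing, because ratio() is computed as 2*M/T, where T counts every element of both sequences, junk included. The variable names are hypothetical.

```python
import difflib

# isjunk influences how matches are found, but ratio() still divides by the
# full lengths of both sequences, so the space in "A A" is still counted.
plain = difflib.SequenceMatcher(None, 'AA', 'A A').ratio()
with_junk = difflib.SequenceMatcher(lambda ch: ch == ' ', 'AA', 'A A').ratio()

# Removing the unwanted characters first changes the lengths themselves.
stripped = difflib.SequenceMatcher(None, 'AA', 'A A'.replace(' ', '')).ratio()

print(plain, with_junk, stripped)
```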

difflib.get_close_matches throw out names in a list if first answer isn't correct

Question: Here's an updated version of my previous question here. I'm adding to the code so that if the get_close_matches name isn't the name of the person they wanted, the closest match is discarded and the function is re-run to grab the second-closest match (now first, since the function would throw out the first match). Do you have any comments on how this can be written better? And work. >.> Here's what I have so far: def throwout(pickedName): employeeNames.remove(pickedName) pickedName = difflib.get
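An alternative to removing names and re-running the lookup is to ask get_close_matches for every candidate at once, since it returns matches ordered from best to worst. This is a sketch under assumptions: the name list, the query, and the helper name ranked_matches are hypothetical.

```python
import difflib


def ranked_matches(query, names, cutoff=0.6):
    """Return all names scoring at least cutoff, best match first."""
    # n must be positive, so guard against an empty list of names.
    limit = max(len(names), 1)
    return difflib.get_close_matches(query, names, n=limit, cutoff=cutoff)


# Hypothetical employee list and misspelled query.
employee_names = ['alice', 'alicia', 'bob']
candidates = ranked_matches('alica', employee_names)
print(candidates)
```

The caller can then walk through candidates in order, offering the next one whenever the user rejects the current suggestion, without mutating the original list.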

Multiple Spelling Results in a Dataframe 1

Question: I have some data containing spelling errors. I'm correcting them and scoring how close the spelling is using the following code: import pandas as pd import difflib Li_A = ["potato", "tomato", "squash", "apple", "pear"] Q = {'one' : pd.Series(["potat0", "toma3o", "s5uash", "ap8le", "pea7"], index=['a', 'b', 'c', 'd', 'e']), 'two' : pd.Series(["po1ato", "2omato", "squ0sh", "2pple", "p3ar"], index=['a', 'b', 'c', 'd', 'e'])} df_Q = pd.DataFrame(Q) # Define the function that Corrects & Scores the
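The question's own function is cut off, so the following is only a guess at the general shape of a correct-and-score step, written without pandas so it stays self-contained. Li_A comes from the question; the helper name correct_and_score and the cutoff value are assumptions.

```python
import difflib

Li_A = ["potato", "tomato", "squash", "apple", "pear"]


def correct_and_score(word, vocabulary, cutoff=0.6):
    """Return (best_match, similarity), or (None, 0.0) if nothing is close."""
    matches = difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
    if not matches:
        return None, 0.0
    best = matches[0]
    score = difflib.SequenceMatcher(None, word, best).ratio()
    return best, score


corrected, score = correct_and_score("potat0", Li_A)
print(corrected, round(score, 3))
```

A function of this shape could be applied to each column of the dataframe, for example through pandas Series.apply, to produce corrected and score columns side by side.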

python3, difflib SequenceMatcher

Question: The following takes in two strings, compares them, and returns both their identical parts and their differences, separated by spaces (maintaining the length of the longest string). The commented area in the code contains the 4 strings that should be returned. from difflib import SequenceMatcher t1 = 'betty: backstreetvboysareback"give.jpg"LAlarrygarryhannyhref="ang"_self' t2 = 'bettyv: backstreetvboysareback"lifeislike"LAlarrygarryhannyhref="in.php"_self' #t1 = 'betty :

python 3, differences between two strings

Question: I'd like to record the location of differences from both strings in a list (to remove them) ... preferably recording the highest separation point for each section, as these areas will have dynamic content. Compare these: total chars 178, two unique sections t1 = 'WhereTisthetotalnumberofght5y5wsjhhhhjhkmhm Thethreemethodsthatreturntheratioofmatchingtototalcharacterscangivedifferentresultsduetodifferinglevelsofapxxxxxxxproximation,although' and total chars 211, two unique sections t2 =

python difflib character diff with unified contextual format

Question: I need to display the character difference per line in a Unix unified-diff-like style. Is there a way to do that using difflib? I can get "unified diff" and "character per line diff" separately using difflib.unified_diff and difflib.Differ() (ndiff) respectively, but how can I combine them? This is what I am looking for: # # This is difflib.unified # >>> print ''.join(difflib.unified_diff('one\ntwo\nthree\n'.splitlines(1), 'ore\ntree\nemu\n'.splitlines(1), 'old', 'new')) --- old +++ new @@ -1,3