difflib

Getting more granular diffs from difflib (or a way to post-process a diff to achieve the same thing)

一曲冷凌霜 submitted on 2019-12-19 11:26:55
Question: I download this page and make a minor edit to it, changing the first 65 in this paragraph to 68. I then parse both sources with BeautifulSoup and diff them with difflib. url = 'https://secure.ssa.gov/apps10/reference.nsf/links/02092016062645AM' response = urllib2.urlopen(url) content = response.read() # get response as list of lines url2 = 'file:///Users/Pyderman/projects/temp/02092016062645AM-modified.html' response2 = urllib2.urlopen(url2) content2 = response2.read() # get response as
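A line-based diff reports the whole paragraph line as changed when only 65 became 68. One way to post-process it is to take each pair of old/new lines that difflib marks as replaced and run `difflib.SequenceMatcher` again on their words. This is a minimal sketch, assuming the text has already been pulled out of the HTML (for example with BeautifulSoup's `get_text()`); `word_changes` and `granular_diff` are hypothetical helper names, and replaced lines are paired up by position.

```python
import difflib

def word_changes(old_line, new_line):
    """Return (tag, old_words, new_words) for each word-level change between two lines."""
    old_words = old_line.split()
    new_words = new_line.split()
    matcher = difflib.SequenceMatcher(None, old_words, new_words)
    return [
        (tag, old_words[i1:i2], new_words[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"  # keep only the runs of words that changed
    ]

def granular_diff(old_text, new_text):
    """Diff line by line, then refine each replaced line pair down to words.

    Returns (old_line_index, new_line_index, word_changes) tuples. Replaced
    lines are paired by position, so unpaired leftovers (when a replaced block
    has more lines on one side) are not reported here.
    """
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines)
    result = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "replace":
            continue
        for offset, (old_line, new_line) in enumerate(zip(old_lines[i1:i2], new_lines[j1:j2])):
            result.append((i1 + offset, j1 + offset, word_changes(old_line, new_line)))
    return result
```

For the edit described above, the changed paragraph would come back as a single `('replace', ['65'], ['68'])` entry instead of a whole-line replacement.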

difflib.get_close_matches() - Help getting desired result

牧云@^-^@ submitted on 2019-12-13 04:42:33
Question: The basic gist of the program is to start with a list of employee names and then sort it. The program waits for the user to input "end" to stop populating the list of names (I have 100 names; I cut it short for the example). Afterwards, the user can enter an employee name and the program will run difflib.get_close_matches(). Here's the question: I'm getting a syntax error for get_close_matches. How should I be calling difflib differently? Also, if you have any tips for making the code more efficient, please
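The asker's code is cut off, so the exact syntax error can't be recovered, but the call itself takes the word to look up first and the list of candidates second: `difflib.get_close_matches(word, possibilities, n=3, cutoff=0.6)`. A minimal sketch of the described flow, assuming names arrive one per entry and the sentinel is the literal word "end"; `collect_names` and `closest_employees` are hypothetical helper names.

```python
import difflib

def collect_names(entries, sentinel="end"):
    """Read names from any iterable until the sentinel appears, then return them sorted."""
    names = []
    for entry in entries:
        entry = entry.strip()
        if entry.lower() == sentinel:  # stop populating on "end" (case-insensitive)
            break
        names.append(entry)
    return sorted(names)

def closest_employees(query, names, n=3, cutoff=0.6):
    # argument order: the word to look up first, then the list to search
    return difflib.get_close_matches(query, names, n=n, cutoff=cutoff)
```

Taking an iterable instead of calling `input()` directly keeps the logic testable; interactively it could be fed with `collect_names(iter(input, "end"))`, which calls `input()` until it returns "end".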

In Python, is it possible to write a generator's (context_diff) output to a text file?

故事扮演 submitted on 2019-12-13 03:58:19
Question: The difflib.context_diff method returns a generator showing the differing lines of two compared strings. How can I write the result (the comparison) to a text file? In this example code, I want everything from line 4 to the end in the text file. >>> s1 = ['bacon\n', 'eggs\n', 'ham\n', 'guido\n'] >>> s2 = ['python\n', 'eggy\n', 'hamster\n', 'guido\n'] >>> for line in context_diff(s1, s2, fromfile='before.py', tofile='after.py'): ... sys.stdout.write(line) # doctest: +NORMALIZE_WHITESPACE *
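Because every line yielded by `context_diff` already ends with its own newline (the input lines carry `\n`, and the header lines use the default `lineterm='\n'`), the generator can be handed straight to a file's `writelines()`. A minimal sketch; `write_context_diff` is a hypothetical helper name and the default file labels mirror the example above.

```python
import difflib

def write_context_diff(before, after, path, fromfile="before.py", tofile="after.py"):
    """Consume the context_diff generator and write every line it yields to `path`."""
    diff_lines = difflib.context_diff(before, after, fromfile=fromfile, tofile=tofile)
    with open(path, "w", encoding="utf-8") as out:
        # each yielded line already ends with a newline, so writelines is enough
        out.writelines(diff_lines)
```

With `s1` and `s2` from the question, `write_context_diff(s1, s2, "diff.txt")` writes the same text the doctest prints to stdout.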

Merging dataframes

自作多情 submitted on 2019-12-13 03:55:58
Question: I have been struggling with this problem all day. I have two dataframes: Dataframe 1 (Billboards) and Dataframe 2. I would like to merge Dataframe 2 with Dataframe 1 based on song, to end up with a dataframe that has SongId, Song, Rank and Year. The problem is that there are some variations in how the songs are stored. For example, the Song in Billboards can be macarena bayside boys mix while the Song in Dataframe 2 might be macarena. I want to match on similar titles. Answer 1: I think you would need to calculate
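One way to find "similar" titles with difflib alone is to map each Billboards title to its closest title in the other table before merging. A minimal sketch, assuming case-insensitive matching and a lowered cutoff of 0.4: with the default 0.6, "macarena" would be rejected, because its ratio against "macarena bayside boys mix" is only 2·8/33 ≈ 0.48. `match_titles` is a hypothetical helper, and the cutoff would need tuning on real data.

```python
import difflib

def match_titles(left_titles, right_titles, cutoff=0.4):
    """Map each title in left_titles to its closest title in right_titles, or None."""
    # compare lowercased titles but return the original spelling from right_titles
    lookup = {title.lower(): title for title in right_titles}
    mapping = {}
    for title in left_titles:
        hits = difflib.get_close_matches(title.lower(), list(lookup), n=1, cutoff=cutoff)
        mapping[title] = lookup[hits[0]] if hits else None
    return mapping
```

In pandas, the resulting dict could be applied with `Series.map` to add a matched-title column to Dataframe 1, followed by an ordinary `DataFrame.merge` on that column (column names taken from the question).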

difflib.SequenceMatcher isjunk argument not considered?

喜你入骨 submitted on 2019-12-12 13:08:10
Question: In the Python difflib library, is the SequenceMatcher class behaving unexpectedly, or am I misreading what the intended behavior is? Why does the isjunk argument seem to make no difference in this case? difflib.SequenceMatcher(None, "AA", "A A").ratio() returns 0.8, and difflib.SequenceMatcher(lambda x: x in ' ', "AA", "A A").ratio() also returns 0.8. My understanding is that if the space is ignored, the ratio should be 1. Answer 1: This is happening because the ratio function uses the sequences' total length
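The difflib documentation defines `ratio()` as 2·M/T, where M is the number of matched elements and T is the total number of elements in both sequences. `isjunk` only changes which elements may anchor a match; the space is not removed from "A A", so T stays 2 + 3 = 5 while M is 2, giving 0.8 either way. A small check, with stripping the space beforehand shown as the swapped-in way to actually get 1.0:

```python
import difflib

a, b = "AA", "A A"

plain = difflib.SequenceMatcher(None, a, b)
with_junk = difflib.SequenceMatcher(lambda ch: ch == " ", a, b)

def ratio_by_hand(matcher, a, b):
    # ratio() is 2*M/T: M = matched elements, T = len(a) + len(b), junk still counted in T
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return 2.0 * matched / (len(a) + len(b))

# removing the space before comparing is what makes the sequences identical
stripped = difflib.SequenceMatcher(None, a, b.replace(" ", ""))
```

`plain.ratio()`, `with_junk.ratio()` and `ratio_by_hand(plain, a, b)` all come out as 0.8, while `stripped.ratio()` is 1.0.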

difflib.get_close_matches throw out names in a list if first answer isn't correct

百般思念 submitted on 2019-12-11 19:36:31
Question: Here's an updated version of my previous question here. I'm adding code so that if the get_close_matches name isn't the person the user wanted, the program discards the closest match, re-runs the function and grabs the second-closest match (now first, since the function would throw out the first match). Do you have any comments on how this can be written better? And how to make it work? >.> Here's what I have so far: def throwout(pickedName): employeeNames.remove(pickedName) pickedName = difflib.get
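Instead of removing the rejected name from `employeeNames` and calling `get_close_matches` again, one option is to request several matches at once (they come back best first) and walk down that list until the user accepts one. A minimal sketch, assuming the yes/no question is wrapped in a `confirm` callable; `pick_employee` and `confirm` are hypothetical names.

```python
import difflib

def pick_employee(query, names, confirm, n=5, cutoff=0.6):
    """Offer close matches best first; return the first one `confirm` accepts, else None."""
    for candidate in difflib.get_close_matches(query, names, n=n, cutoff=cutoff):
        if confirm(candidate):  # e.g. ask "Did you mean <candidate>?"
            return candidate
    return None
```

This leaves the original name list untouched, so a rejected match does not disappear for later lookups. Interactively, `confirm` could be something like `lambda c: input(f"Did you mean {c}? ").lower().startswith("y")`.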

Multiple Spelling Results in a Dataframe 1

孤人 submitted on 2019-12-11 17:37:23
Question: I have some data containing spelling errors. I'm correcting them and scoring how close the spelling is using the following code: import pandas as pd import difflib Li_A = ["potato", "tomato", "squash", "apple", "pear"] Q = {'one' : pd.Series(["potat0", "toma3o", "s5uash", "ap8le", "pea7"], index=['a', 'b', 'c', 'd', 'e']), 'two' : pd.Series(["po1ato", "2omato", "squ0sh", "2pple", "p3ar"], index=['a', 'b', 'c', 'd', 'e'])} df_Q = pd.DataFrame(Q) # Define the function that Corrects & Scores the
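To get several spelling candidates per cell with a score for each, `get_close_matches` can be asked for up to `n` matches and each score recomputed with `SequenceMatcher`. A minimal sketch; `spelling_suggestions` is a hypothetical helper, and it orders the matcher arguments the way `get_close_matches` does internally (candidate first, word second) so the reported scores are the ones used for ranking.

```python
import difflib

def spelling_suggestions(word, vocabulary, n=3, cutoff=0.6):
    """Return up to n (candidate, score) pairs, best first."""
    candidates = difflib.get_close_matches(word, vocabulary, n=n, cutoff=cutoff)
    # same orientation as get_close_matches: candidate is seq1, the misspelled word is seq2
    return [(c, difflib.SequenceMatcher(None, c, word).ratio()) for c in candidates]
```

For "potat0" the top entry is ("potato", 10/12 ≈ 0.83). The function could then be applied element-wise to `df_Q` to produce one list of suggestions per cell.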

python3, difflib SequenceMatcher

房东的猫 submitted on 2019-12-11 16:05:24
Question: The following takes in two strings, compares them, and returns both the identical parts and the differences, separated by spaces (maintaining the length of the longer string). The commented area in the code shows the 4 strings that should be returned. from difflib import SequenceMatcher t1 = 'betty: backstreetvboysareback"give.jpg"LAlarrygarryhannyhref="ang"_self' t2 = 'bettyv: backstreetvboysareback"lifeislike"LAlarrygarryhannyhref="in.php"_self' #t1 = 'betty :
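The expected strings are cut off above, so the exact target format is unknown. One reading of "identicals as well as their differences, separated by spaces" is four aligned strings: for each input, a version with the differing characters blanked out and a version with the common characters blanked out. This sketch builds them from `SequenceMatcher.get_opcodes()`, padding every segment to the wider of its two sides so all four outputs have the same length; the padding rule and the helper name `split_common_and_diff` are assumptions.

```python
from difflib import SequenceMatcher

def split_common_and_diff(t1, t2):
    """Return (same1, diff1, same2, diff2), all padded to the same length."""
    same1, diff1, same2, diff2 = [], [], [], []
    matcher = SequenceMatcher(None, t1, t2, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        seg1, seg2 = t1[i1:i2], t2[j1:j2]
        width = max(len(seg1), len(seg2))  # keep both sides aligned segment by segment
        if tag == "equal":
            same1.append(seg1)
            same2.append(seg2)
            diff1.append(" " * width)
            diff2.append(" " * width)
        else:  # replace, delete or insert
            same1.append(" " * width)
            same2.append(" " * width)
            diff1.append(seg1.ljust(width))
            diff2.append(seg2.ljust(width))
    return tuple("".join(parts) for parts in (same1, diff1, same2, diff2))
```

For example, `split_common_and_diff("abcXdef", "abcYYdef")` gives `("abc  def", "   X    ", "abc  def", "   YY   ")`.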

python 3, differences between two strings

余生颓废 submitted on 2019-12-11 12:28:02
Question: I'd like to record the locations of the differences between both strings in a list (to remove them), preferably recording the highest separation point for each section, as these areas will have dynamic content. Compare these. Total chars 178, two unique sections: t1 = 'WhereTisthetotalnumberofght5y5wsjhhhhjhkmhm Thethreemethodsthatreturntheratioofmatchingtototalcharacterscangivedifferentresultsduetodifferinglevelsofapxxxxxxxproximation,although' and total chars 211, two unique sections: t2 =
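The non-`equal` opcodes from `SequenceMatcher.get_opcodes()` already give the location of each differing section in both strings as index ranges, which can be stored in a list and later cut out. A minimal sketch; `diff_ranges` and `remove_ranges` are hypothetical helpers. It passes `autojunk=False` because for sequences of 200 or more elements (t2 is 211 characters) the default heuristic treats very frequent characters as junk, which tends to fragment matches in plain text.

```python
from difflib import SequenceMatcher

def diff_ranges(t1, t2):
    """List (i1, i2, j1, j2) for every differing section: t1[i1:i2] versus t2[j1:j2]."""
    matcher = SequenceMatcher(None, t1, t2, autojunk=False)
    return [(i1, i2, j1, j2)
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]

def remove_ranges(text, ranges):
    """Drop text[start:end] for each (start, end) pair; pairs must be sorted and non-overlapping."""
    kept, pos = [], 0
    for start, end in ranges:
        kept.append(text[pos:start])
        pos = end
    kept.append(text[pos:])
    return "".join(kept)
```

After `ranges = diff_ranges(t1, t2)`, the common text is `remove_ranges(t1, [(i1, i2) for i1, i2, _, _ in ranges])`, and the same with `(j1, j2)` for t2.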

python difflib character diff with unified contextual format

主宰稳场 submitted on 2019-12-11 09:13:33
Question: I need to display character differences per line in a Unix unified-diff-like style. Is there a way to do that using difflib? I can get a "unified diff" and a "character per line diff" separately using difflib.unified_diff and difflib.Differ() (ndiff) respectively, but how can I combine them? This is what I am looking for: # # This is difflib.unified # >>> print ''.join(difflib.unified_diff('one\ntwo\nthree\n'.splitlines(1), 'ore\ntree\nemu\n'.splitlines(1), 'old', 'new')) --- old +++ new @@ -1,3