Question
I am trying to compare two CSV files and save the difference to a third CSV file, using Python 2.7.
import csv
f1 = open("olddata/file1.csv")
oldFile1 = csv.reader(f1)
oldList1 = []
for row in oldFile1:
    oldList1.append(row)
f2 = open("newdata/file2.csv")
oldFile2 = csv.reader(f2)
oldList2 = []
for row in oldFile2:
    oldList2.append(row)
f1.close()
f2.close()
set1 = tuple(oldList1)
set2 = tuple(oldList2)
print oldList2.difference(oldList1)
I get the error message:
Traceback (most recent call last):
  File "compare.py", line 21, in <module>
    print oldList2.difference(oldList1)
AttributeError: 'list' object has no attribute 'difference'
I am new to Python, and to coding in general, and I am not done with this code yet: I still have to store the differences in a variable and write them to a new CSV file. I have been trying to solve this all day and I simply can't. Your help would be greatly appreciated.
Answer 1:
What do you mean by difference? Depending on the answer, there are two distinct possibilities.
If two rows count as the same only when all of their columns are equal, you can get your answer with the following code:
import csv
f1 = open("olddata/file1.csv")
oldFile1 = csv.reader(f1)
oldList1 = []
for row in oldFile1:
    oldList1.append(row)
f2 = open("newdata/file2.csv")
oldFile2 = csv.reader(f2)
oldList2 = []
for row in oldFile2:
    oldList2.append(row)
f1.close()
f2.close()
# Rows of file1 that do not appear anywhere in file2
print [row for row in oldList1 if row not in oldList2]
However, if two rows count as the same whenever a certain key field (i.e. column) is equal, then the following code will give you your answer:
import csv
f1 = open("olddata/file1.csv")
oldFile1 = csv.reader(f1)
oldList1 = []
for row in oldFile1:
    oldList1.append(row)
f2 = open("newdata/file2.csv")
oldFile2 = csv.reader(f2)
oldList2 = []
for row in oldFile2:
    oldList2.append(row)
f1.close()
f2.close()
keyfield = 0  # Change this to choose the column number (0-based)
oldList2keys = [row[keyfield] for row in oldList2]
# Rows of file1 whose key does not occur in file2
print [row for row in oldList1 if row[keyfield] not in oldList2keys]
Note: The above code might run slowly for extremely large files, because every "not in" test scans a whole list. If you wish to speed it up through hashing, you can use set after converting the oldLists with the following code:
set1 = set(tuple(row) for row in oldList1)
set2 = set(tuple(row) for row in oldList2)
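After this, you can use set1.difference(set2), which returns the rows that are in set1 but not in set2. Keep in mind that a set discards duplicate rows and does not preserve row order.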
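To reach the original goal of saving the difference to a third file, the set idea can be combined with csv.writer. The following is only a minimal sketch under stated assumptions: the helper names read_rows, rows_only_in_first and write_difference are hypothetical, and the code is written for Python 3 (on Python 2.7 the files would instead be opened in 'rb' and 'wb' mode, without the newline argument). It uses the set only for membership tests, so the row order of the first file is kept.

```python
import csv


def read_rows(path):
    """Read a CSV file into a list of rows (each row is a list of strings)."""
    with open(path, newline="") as f:
        return list(csv.reader(f))


def rows_only_in_first(rows_a, rows_b):
    """Return the rows of rows_a that do not appear anywhere in rows_b."""
    # Lists are not hashable, so each row is converted to a tuple first.
    seen = set(tuple(row) for row in rows_b)
    return [row for row in rows_a if tuple(row) not in seen]


def write_difference(path_a, path_b, out_path):
    """Write the rows found only in path_a to out_path and return them."""
    diff = rows_only_in_first(read_rows(path_a), read_rows(path_b))
    with open(out_path, "w", newline="") as f:
        csv.writer(f).writerows(diff)
    return diff
```

To list the rows that are new in the second file, pass the new file first, for example write_difference("newdata/file2.csv", "olddata/file1.csv", "difference.csv"), where "difference.csv" is a placeholder output name.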
Answer 2:
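import csv
def read_csv_file(filename):
    res = []
    with open(filename) as f:
        for line in csv.reader(f):
            res.append(line)
    return res  # hand the collected rows back to the caller
oldList1 = read_csv_file("olddata/file1.csv")
oldList2 = read_csv_file("olddata/file2.csv")
difference_list = []
# Compare the files position by position (row 1 with row 1, and so on)
for a, b in zip(oldList1, oldList2):
    if a != b:
        # Each row is a list, so join its fields before building the string
        difference_list.append(','.join(a) + '\t' + ','.join(b))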
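In the end you have a list of the differing rows, which you can simply write to a file.
EDIT: The code above compares rows position by position, so two files that contain the same rows in a different order, such as [a,b,c] vs [b,c,a], are reported as different. If you know that [a,b,c] vs [b,c,a] should return no difference, use the following code instead.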
import csv
def read_csv_file(filename):
    res = []
    with open(filename) as f:
        for line in csv.reader(f):
            res.append(line)
    return res  # hand the collected rows back to the caller
oldList1 = read_csv_file("olddata/file1.csv")
oldList2 = read_csv_file("olddata/file2.csv")
difference_list = []
for a in oldList1:
    # Keep a row only if it occurs nowhere in the second file
    if a not in oldList2:
        difference_list.append(a)
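The distinction between comparing by position and comparing by membership can be illustrated on in-memory rows. This is a small sketch, not the answerer's code: the function names positional_diff and membership_diff and the sample data are made up for illustration. Note also that zip stops at the end of the shorter list, so extra trailing rows in the longer file are silently ignored by the positional comparison.

```python
def positional_diff(rows_a, rows_b):
    """Pair rows by line number and keep the pairs that differ."""
    return [(a, b) for a, b in zip(rows_a, rows_b) if a != b]


def membership_diff(rows_a, rows_b):
    """Keep the rows of rows_a that occur nowhere in rows_b."""
    return [a for a in rows_a if a not in rows_b]


# Same rows, different order (illustrative data only)
old_rows = [["a"], ["b"], ["c"]]
new_rows = [["b"], ["c"], ["a"]]

print(positional_diff(old_rows, new_rows))  # all three pairs differ
print(membership_diff(old_rows, new_rows))  # []
```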
Answer 3:
The error is correct: neither list nor tuple has a "difference" method; only set does.
I guess you want to use set, with each row converted to a tuple so that the elements are immutable and therefore hashable?
set1 = set([tuple(item) for item in oldList1])
set2 = set([tuple(item) for item in oldList2])
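Why the conversion to tuple is needed can be seen in a short experiment. This is an illustrative sketch with made-up sample rows (old_rows and new_rows are not from the question): building a set directly from lists raises TypeError, while a set of tuples supports difference.

```python
old_rows = [["1", "a"], ["2", "b"]]
new_rows = [["1", "a"], ["2", "b"], ["3", "c"]]

try:
    set(old_rows)  # each row is a list, and lists are unhashable
except TypeError as exc:
    print("cannot build the set: %s" % exc)

# Tuples are hashable, so they can be stored in a set
set1 = set(tuple(item) for item in old_rows)
set2 = set(tuple(item) for item in new_rows)

# Rows present in the new data but not in the old data
added = set2.difference(set1)
print(added)
```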
Source: https://stackoverflow.com/questions/30852710/compare-2-seperate-csv-files-and-write-difference-to-a-new-csv-file-python-2-7