Question
I am new to Python and have been trying to merge two Excel files for days. I have researched merging endlessly and tried to adapt code to fit my needs, but it hasn't been working. I was wondering if I could get some help with why my code isn't working. I feel this could be a common problem for others using Python, so hopefully this will help others as well. I appreciate any comments!
I have two Excel files, 'Chinese Scores3.csv' and 'Chinese Scores4.csv', which I am trying to merge by an ID that is unique to each company. Other than the company ID, the two files have no columns in common. Also, not all companies are listed in both files: some appear in both, others in only one or the other. I would like to put all the information for a company ID together in one row of an Excel sheet. For example, the first file's columns are ID, JanSales, FebSales, etc., and the second file's columns are ID, CreditScore, EMMAScore, etc. The file I would like to create has the columns ID, JanSales, FebSales, CreditScore, EMMAScore, all matched by company ID.
Does this make sense? It's like using VLOOKUP in Excel, but I would like to do it in Python. Anyway, here is my code, which isn't working. I have tried modifying it, but it still doesn't work. I hope to get feedback!
import sys
import csv

def main(arg):
    headers= []

    for arg in 'Chinese Scores3.csv':
        with open(arg) as f:
            curr = 'Chinese Scores3.csv'.reader(f).next()
            headers.append(curr)
            try:
                keys=list( set(keys) & set (curr))
            except NameError:
                keys = curr

    header = list(keys)
    for h in headers:
        header += [ k for k in h if k not in keys ]

    data = {}
    for arg in 'Chinese Scores4.csv':
        with open(arg) as f:
            reader = 'Chinese Scores4.csv'.DictReader(f)
            for line in reader:
                data_key = tuple([ line[k] for k in keys ])
                if not data_key in data: data[data_key] = {}
                for k in header:
                    try:
                        data[data_key][k] = line[k]
                    except KeyError:
                        pass

    for key in data.keys():
        for col in header:
            if key in data and not col in data[key]:
                del( data[key] )

    print ','.join(header)
    for key in sorted(data):
        row = [ data[key][col] for col in header ]
        print ','.join(row)

if __name__ == '__main__':
    sys.exit( main( sys.argv[1:]) )
Answer 1:
While we could fix your code, I'd strongly recommend looking into the pandas library instead if you're going to be doing this sort of work. It makes life a lot easier, and tasks like this one often become borderline trivial.
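For completeness, here is a rough sketch of what a repaired standard-library version could look like. It is not the code from the question: it is written for Python 3 rather than 2.7, the helper names `read_rows` and `outer_merge` are made up for illustration, and the in-memory strings are stand-ins for the two files, which I'm assuming share an ID column as described.

```python
import csv
import io

def read_rows(f, key="ID"):
    """Return (non-key column names, {key value: row dict}) for one CSV file object."""
    reader = csv.DictReader(f)
    columns = [c for c in reader.fieldnames if c != key]
    rows = {row[key]: row for row in reader}
    return columns, rows

def outer_merge(left_file, right_file, key="ID", missing=""):
    """Outer-join two CSV file objects on `key`; absent cells become `missing`."""
    left_cols, left = read_rows(left_file, key)
    right_cols, right = read_rows(right_file, key)
    header = [key] + left_cols + right_cols
    merged = []
    # The csv module yields strings, so IDs are sorted as text here.
    for k in sorted(set(left) | set(right)):
        row = {key: k}
        row.update(left.get(k, {}))
        row.update(right.get(k, {}))
        merged.append([row.get(col, missing) for col in header])
    return header, merged

# Stand-ins for the two files; with real files, pass open(path, newline="") instead.
scores3 = io.StringIO("ID,JanSales,FebSales\n1,100,200\n2,200,500\n3,300,400\n")
scores4 = io.StringIO("ID,CreditScore,EMMAScore\n2,good,Watson\n3,okay,Thompson\n4,not-so-good,NA\n")

header, rows = outer_merge(scores3, scores4)

# Write the merged table as CSV text.
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(header)
writer.writerows(rows)
print(out.getvalue())
```

That works, but it is around thirty lines for something pandas does in a single call, which is why I'd go the pandas route.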
For example, if we had two csv files (although we could have started straight from Excel files if we wanted):
>>> !cat scores3.csv
ID,JanSales,FebSales
1,100,200
2,200,500
3,300,400
>>> !cat scores4.csv
ID,CreditScore,EMMAScore
2,good,Watson
3,okay,Thompson
4,not-so-good,NA
We could read these into objects called DataFrames (think of them sort of like Excel sheets):
>>> import pandas as pd
>>> s3 = pd.read_csv("scores3.csv")
>>> s4 = pd.read_csv("scores4.csv")
>>> s3
   ID  JanSales  FebSales
0   1       100       200
1   2       200       500
2   3       300       400
>>> s4
   ID  CreditScore EMMAScore
0   2         good    Watson
1   3         okay  Thompson
2   4  not-so-good       NaN
And then we can merge them on the ID column:
>>> merged = s3.merge(s4, on="ID", how="outer")
>>> merged
   ID  JanSales  FebSales  CreditScore EMMAScore
0   1       100       200          NaN       NaN
1   2       200       500         good    Watson
2   3       300       400         okay  Thompson
3   4       NaN       NaN  not-so-good       NaN
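The `how="outer"` argument is what keeps companies that appear in only one file. As a small sketch (rebuilding the same two frames from in-memory strings so it runs on its own), this compares it with `how="inner"` and `how="left"`, and uses the `indicator=True` option of `merge` to record which file each row came from:

```python
import io
import pandas as pd

# Same sample data as above, read from strings instead of files.
s3 = pd.read_csv(io.StringIO("ID,JanSales,FebSales\n1,100,200\n2,200,500\n3,300,400\n"))
s4 = pd.read_csv(io.StringIO("ID,CreditScore,EMMAScore\n2,good,Watson\n3,okay,Thompson\n4,not-so-good,NA\n"))

# outer: every ID from either file; the _merge column says where each row came from.
outer = s3.merge(s4, on="ID", how="outer", indicator=True)
# inner: only IDs present in both files.
inner = s3.merge(s4, on="ID", how="inner")
# left: every ID from s3, whether or not s4 has it.
left = s3.merge(s4, on="ID", how="left")

print(outer[["ID", "_merge"]])
print(inner["ID"].tolist())
print(left["ID"].tolist())
```

In the outer result, `_merge` is `left_only` for ID 1, `both` for IDs 2 and 3, and `right_only` for ID 4, which is a quick way to check which companies are missing from one of the files.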
After which we could save it to a csv file or to an Excel file:
>>> merged.to_csv("merged.csv")
>>> merged.to_excel("merged.xlsx")
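One thing to watch when saving: by default `to_csv` also writes the DataFrame's row index (the 0, 1, 2, ... labels) as an unnamed first column. If you don't want that, pass `index=False`. A minimal sketch, using a small stand-in frame rather than the real merged data:

```python
import pandas as pd

# Hypothetical stand-in for the merged frame, just to show the header line.
merged = pd.DataFrame(
    {"ID": [1, 2], "JanSales": [100, 200], "CreditScore": [None, "good"]}
)

# With no path argument, to_csv returns the CSV text instead of writing a file.
with_index = merged.to_csv()
without_index = merged.to_csv(index=False)

print(with_index.splitlines()[0])     # header line starts with an empty index column
print(without_index.splitlines()[0])  # header line starts with ID
```

`to_excel` accepts the same `index=False` argument. Note that writing .xlsx files relies on a separate Excel writer package (openpyxl, for example) being installed.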
Source: https://stackoverflow.com/questions/17661836/looking-to-merge-two-excel-files-by-id-into-one-excel-file-using-python-2-7