Looking to merge two Excel files by ID into one Excel file using Python 2.7

Question

I am new to Python and have been trying to merge two Excel files for days. I have researched merging extensively and tried to adapt code to fit my needs, but it hasn't been working. Could anyone help me understand why my code isn't working? I suspect this is a common problem for others using Python, so hopefully this will help them as well. I appreciate any comments!

I have two Excel files (saved as CSV), 'Chinese Scores3.csv' and 'Chinese Scores4.csv', which I am trying to merge by an ID that is unique to each company. Other than the company ID, the two files have no columns in common. Also, not all companies are listed in both files: some appear in both, while others appear in only one. I would like to put all the information for a company ID together in one row of an Excel sheet. For example, the first file has columns ID, JanSales, FebSales, etc., and the second has columns ID, CreditScore, EMMAScore, etc. The file I would like to create has columns ID, JanSales, FebSales, CreditScore, EMMAScore, with each row matched by company ID.

Does this make sense? It's like using VLOOKUP in Excel, but I would like to do it in Python. Anyway, here is my code, which isn't working. I have tried modifying it, without success. I hope to get feedback!

import sys
import csv

def main(arg):
    headers= []

    for arg in 'Chinese Scores3.csv':
        with open(arg) as f:
            curr = 'Chinese Scores3.csv'.reader(f).next()
            headers.append(curr)
            try:
                keys=list( set(keys) & set (curr))
            except NameError:
                keys = curr


    header = list(keys)
    for h in headers:
        header += [ k for k in h if k not in keys ]

    data = {}
    for arg in 'Chinese Scores4.csv':
        with open(arg) as f:
            reader = 'Chinese Scores4.csv'.DictReader(f)
            for line in reader:
                data_key = tuple([ line[k] for k in keys ])
                if not data_key in data: data[data_key] = {}
                for k in header:
                    try:
                        data[data_key][k] = line[k]
                    except KeyError:
                        pass

    for key in data.keys():
        for col in header:
            if key in data and not col in data[key]:
                del( data[key] )

    print ','.join(header)
    for key in sorted(data):
        row = [ data[key][col] for col in header ]
        print ','.join(row)

if __name__ == '__main__':
    sys.exit( main( sys.argv[1:]) )

Answer 1:

While we could fix your code, I'd strongly recommend looking into the pandas library if you're going to be doing this sort of work. It makes life a lot easier, and tasks like this one often become borderline trivial.

For example, if we had two CSV files (although we could have started straight from Excel files if we wanted):

>>> !cat scores3.csv
ID,JanSales,FebSales
1,100,200
2,200,500
3,300,400
>>> !cat scores4.csv
ID,CreditScore,EMMAScore
2,good,Watson
3,okay,Thompson
4,not-so-good,NA

We could read these into objects called DataFrames (think of them sort of like Excel sheets):

>>> import pandas as pd
>>> s3 = pd.read_csv("scores3.csv")
>>> s4 = pd.read_csv("scores4.csv")
>>> s3
   ID  JanSales  FebSales
0   1       100       200
1   2       200       500
2   3       300       400
>>> s4
   ID  CreditScore EMMAScore
0   2         good    Watson
1   3         okay  Thompson
2   4  not-so-good       NaN

And then we can merge them on the ID column:

>>> merged = s3.merge(s4, on="ID", how="outer")
>>> merged
   ID  JanSales  FebSales  CreditScore EMMAScore
0   1       100       200          NaN       NaN
1   2       200       500         good    Watson
2   3       300       400         okay  Thompson
3   4       NaN       NaN  not-so-good       NaN
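
The choice of how="outer" is what keeps companies that appear in only one file. As a rough sketch of how to check that, the snippet below rebuilds the same sample data inline and uses two optional merge arguments, indicator and validate. It assumes a reasonably recent pandas, since older releases may not accept them.

```python
import pandas as pd

# Same sample data as scores3.csv and scores4.csv above, built inline
# so the snippet does not depend on any files.
s3 = pd.DataFrame({"ID": [1, 2, 3],
                   "JanSales": [100, 200, 300],
                   "FebSales": [200, 500, 400]})
s4 = pd.DataFrame({"ID": [2, 3, 4],
                   "CreditScore": ["good", "okay", "not-so-good"],
                   "EMMAScore": ["Watson", "Thompson", None]})

# how="outer" keeps every ID from either frame.
# indicator=True adds a "_merge" column recording where each row came from.
# validate="one_to_one" raises an error if an ID repeats in either frame
# (assumption: each company ID really is unique, as the question states).
merged = s3.merge(s4, on="ID", how="outer", indicator=True,
                  validate="one_to_one")

# how="inner" keeps only the IDs present in both frames.
both = s3.merge(s4, on="ID", how="inner")

print(merged[["ID", "_merge"]])
print(both["ID"].tolist())
```

Rows marked left_only or right_only in the _merge column are the companies listed in just one of the two files.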

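Once merged, we could save the result to a CSV file or to an Excel file (passing index=False so the row numbers 0, 1, 2, ... are not written out as an extra column):

>>> merged.to_csv("merged.csv", index=False)
>>> merged.to_excel("merged.xlsx", index=False)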
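
For comparison, the same kind of merge can be sketched with only the standard csv module, which is closer in spirit to the code in the question. This is a minimal sketch rather than a fix of the original script: the helper names read_rows and merge_by_id are hypothetical, and it assumes both files have a column named exactly ID with one row per company.

```python
import csv


def read_rows(path):
    """Return (header, rows) for one CSV file, with rows keyed by ID."""
    with open(path) as f:
        reader = csv.DictReader(f)
        header = list(reader.fieldnames)
        rows = dict((row["ID"], row) for row in reader)
    return header, rows


def merge_by_id(path_a, path_b, out_path):
    """Write one row per ID found in either file (an outer join)."""
    header_a, rows_a = read_rows(path_a)
    header_b, rows_b = read_rows(path_b)

    # ID first, then the columns of file A, then the columns of file B.
    header = header_a + [col for col in header_b if col != "ID"]

    with open(out_path, "w") as out:
        # restval="" leaves a cell empty when a company is missing
        # from one of the two files.
        writer = csv.DictWriter(out, fieldnames=header, restval="",
                                lineterminator="\n")
        writer.writeheader()
        # IDs are read as strings, so this is a text sort ("10" < "2").
        for key in sorted(set(rows_a) | set(rows_b)):
            row = {"ID": key}
            row.update(rows_a.get(key, {}))
            row.update(rows_b.get(key, {}))
            writer.writerow(row)
```

With the files from the question, a call might look like merge_by_id('Chinese Scores3.csv', 'Chinese Scores4.csv', 'merged.csv'). The pandas version is shorter and handles column types and Excel output, which is the main reason to prefer it.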

Source: https://stackoverflow.com/questions/17661836/looking-to-merge-two-excel-files-by-id-into-one-excel-file-using-python-2-7

Tags

excel

python-2.7

spreadsheet

merging-data