How do I merge two CSV files based on field and keep same number of attributes on each record?

感情败类 2021-02-06 11:00

I am attempting to merge two CSV files based on a specific field in each file.

file1.csv

id,attr1,attr2,attr3
1,True,7,"Purple"
2,Fal         


        
3 answers
  •  傲寒 (original poster)
    2021-02-06 11:48

    If we're not using pandas, I'd refactor to something like this (Python 2, hence the binary file modes and `itervalues`):

    import csv
    from collections import OrderedDict
    
    filenames = "file1.csv", "file2.csv"
    data = OrderedDict()
    fieldnames = []
    for filename in filenames:
        with open(filename, "rb") as fp: # python 2
            reader = csv.DictReader(fp)
            fieldnames.extend(reader.fieldnames)
            for row in reader:
                data.setdefault(row["id"], {}).update(row)
    
    fieldnames = list(OrderedDict.fromkeys(fieldnames))
    with open("merged.csv", "wb") as fp:
        writer = csv.writer(fp)
        writer.writerow(fieldnames)
        for row in data.itervalues():
            writer.writerow([row.get(field, '') for field in fieldnames])
    

    which gives

    id,attr1,attr2,attr3,attr4,attr5,attr6
    1,True,7,Purple,,,
    2,False,19.8,Cucumber,python,500000.12,False
    3,False,-0.5,"A string with a comma, because it has one",Another string,-5,False
    4,True,2,Nope,,,
    5,True,4.0,Tuesday,program,3,True
    6,False,1,Failure,,,
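
    The snippet above targets Python 2. A rough Python 3 port might look like the sketch below. The function name `merge_csv_files` and its `key` parameter are illustrative and not part of the original answer, and using `csv.DictWriter` with `restval=""` to pad missing attributes is a swapped-in alternative to the `row.get(field, '')` list comprehension.

```python
import csv


def merge_csv_files(filenames, out_path, key="id"):
    """Merge CSV files on `key`, padding missing columns with ''.

    Hypothetical helper for illustration; returns the merged header.
    """
    merged = {}      # key value -> combined row (dicts keep insertion order in 3.7+)
    fieldnames = []  # union of all headers, in first-seen order
    for filename in filenames:
        # Text mode with newline="" replaces Python 2's "rb" for the csv module.
        with open(filename, newline="") as fp:
            reader = csv.DictReader(fp)
            for name in reader.fieldnames or []:
                if name not in fieldnames:
                    fieldnames.append(name)
            for row in reader:
                # Later files add to (or overwrite) fields of the same record.
                merged.setdefault(row[key], {}).update(row)

    with open(out_path, "w", newline="") as fp:
        # restval="" fills every column a record never received, so each
        # output row has the same number of fields; extrasaction="ignore"
        # skips stray values from rows longer than their header.
        writer = csv.DictWriter(
            fp, fieldnames=fieldnames, restval="", extrasaction="ignore"
        )
        writer.writeheader()
        writer.writerows(merged.values())
    return fieldnames
```

    Under these assumptions, calling `merge_csv_files(["file1.csv", "file2.csv"], "merged.csv")` should produce the same layout as the Python 2 version: one row per `id`, with empty strings where a file had no value for that record.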
    

    For comparison, the pandas equivalent would be something like

    import pandas as pd

    df1 = pd.read_csv("file1.csv")
    df2 = pd.read_csv("file2.csv")
    merged = df1.merge(df2, on="id", how="outer").fillna("")
    merged.to_csv("merged.csv", index=False)
    

    which is much simpler to my eyes, and means you can spend more time dealing with your data and less time reinventing wheels.
