Import multiple csv files into pandas and concatenate into one DataFrame

既然无缘 2020-11-21 07:47

I would like to read several csv files from a directory into pandas and concatenate them into one big DataFrame. I have not been able to figure it out though. Here is what I

16 answers
  •  陌清茗 (original poster)
    2020-11-21 08:02

    Based on @Sid's good answer.

    Before concatenating, you can load csv files into an intermediate dictionary which gives access to each data set based on the file name (in the form dict_of_df['filename.csv']). Such a dictionary can help you identify issues with heterogeneous data formats, when column names are not aligned for example.
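As a minimal sketch of that kind of check, the snippet below compares the column names of each DataFrame in the dictionary and reports the ones that are not shared by every file. The file names and contents are hypothetical in-memory stand-ins, and `extra_columns` is just an illustrative name, not part of pandas.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-ins for two csv files with misaligned headers.
raw_files = {
    "a.csv": "id,value\n1,10\n2,20\n",
    "b.csv": "id,Value\n3,30\n",  # note the capitalised column name
}

# Same idea as dict_of_df in this answer: file name -> DataFrame.
dict_of_df = {
    name: pd.read_csv(io.StringIO(text)) for name, text in raw_files.items()
}

# Columns that appear in every file.
common = set.intersection(*(set(df.columns) for df in dict_of_df.values()))

# For each file, the columns that are NOT shared by all files.
extra_columns = {
    name: sorted(set(df.columns) - common) for name, df in dict_of_df.items()
}
print(extra_columns)
```

Any file with a non-empty list here would produce partly empty (NaN) columns after concatenation, so it is worth inspecting before calling concat.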

    Import modules and locate file paths:

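    import os
    import glob
    import pandas
    from collections import OrderedDict

    # Directory that holds the csv files
    path = r'C:\DRO\DCL_rawdata_files'
    # glob returns files in arbitrary order, so sort for a reproducible order
    filenames = sorted(glob.glob(os.path.join(path, "*.csv")))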
    

    Note: OrderedDict is not necessary, but it keeps the files in the order they were loaded, which can be useful for analysis. On Python 3.7 and later, a plain dict preserves insertion order as well.

    Load csv files into a dictionary. Then concatenate:

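    # Map each file name to the DataFrame read from it
    dict_of_df = OrderedDict((f, pandas.read_csv(f)) for f in filenames)
    # The dictionary keys become the outer level of the resulting index
    df = pandas.concat(dict_of_df, sort=True)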
    

    The keys are the file names f and the values are the DataFrames read from the csv files. Instead of using f as the dictionary key, you can use os.path.basename(f) or other os.path functions to shorten the key to only the part of the path that is relevant.
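The following is a small, self-contained sketch of the basename variant, not a definitive implementation. It writes two hypothetical files (`jan.csv`, `feb.csv`) to a temporary directory so it can run anywhere. The index level names `"filename"` and `"row"` are illustrative choices passed to the `names` argument of `concat`.

```python
import glob
import os
import tempfile

import pandas as pd

with tempfile.TemporaryDirectory() as tmp_dir:
    # Write two small hypothetical csv files so there is something to read back.
    pd.DataFrame({"id": [1, 2], "value": [10, 20]}).to_csv(
        os.path.join(tmp_dir, "jan.csv"), index=False
    )
    pd.DataFrame({"id": [3], "value": [30]}).to_csv(
        os.path.join(tmp_dir, "feb.csv"), index=False
    )

    filenames = sorted(glob.glob(os.path.join(tmp_dir, "*.csv")))
    # Use only the file name (without the directory) as the dictionary key.
    dict_of_df = {os.path.basename(f): pd.read_csv(f) for f in filenames}

# The dictionary keys become the outer index level; names= labels both levels.
combined = pd.concat(dict_of_df, names=["filename", "row"])

# Select the rows that came from a single file by its key.
jan_rows = combined.loc["jan.csv"]

# Or turn the file name into an ordinary column.
flat = combined.reset_index(level="filename")
print(flat)
```

Keeping the file name in the index (or as a column) means each row can still be traced back to its source file after concatenation.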
