I would like to read several CSV files from a directory into pandas and concatenate them into one big DataFrame, but I have not been able to figure out how.
Based on @Sid's good answer.
Before concatenating, you can load the CSV files into an intermediate dictionary that gives access to each data set by file name (in the form dict_of_df['filename.csv']). Such a dictionary can help you identify issues with heterogeneous data formats, for example when column names are not aligned across files.
import os
import glob
import pandas
from collections import OrderedDict
path = r'C:\DRO\DCL_rawdata_files'
filenames = glob.glob(path + "/*.csv")
Note: OrderedDict is not necessary (on Python 3.7+ a plain dict also preserves insertion order), but it keeps the files in the order they were globbed, which can be useful for analysis.
dict_of_df = OrderedDict((f, pandas.read_csv(f)) for f in filenames)
pandas.concat(dict_of_df, sort=True)
The keys are the file names f and the values are the DataFrames read from the CSV files.
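For instance, the dictionary makes it easy to check whether the column names are aligned before concatenating. A minimal sketch, assuming the dict_of_df built above (the check itself is only an illustration, not part of the original answer):

# Collect the union of all column names seen across the files
all_columns = set()
for df in dict_of_df.values():
    all_columns.update(df.columns)

# Report any file whose columns do not cover the full set
for name, df in dict_of_df.items():
    missing = all_columns - set(df.columns)
    if missing:
        print(name, "is missing columns:", sorted(missing))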
Instead of using f as the dictionary key, you can also use os.path.basename(f) or other os.path functions so that the key contains only the part of the path that is relevant.
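A minimal sketch of that variant, assuming the same filenames list as above (big_df is just an illustrative name):

# Key each DataFrame by its base file name instead of the full path
dict_of_df = OrderedDict((os.path.basename(f), pandas.read_csv(f)) for f in filenames)

# The outer level of the concatenated index is now the short file name
big_df = pandas.concat(dict_of_df, sort=True)
print(big_df.index.get_level_values(0).unique())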