I would like to read several CSV files from a directory into pandas and concatenate them into one big DataFrame, but I have not been able to figure it out.
The Dask library can read a dataframe from multiple files:
>>> import dask.dataframe as dd
>>> df = dd.read_csv('data*.csv')
(Source: http://dask.pydata.org/en/latest/examples/dataframe-csv.html)
Dask dataframes implement a subset of the pandas DataFrame API. If all the data fits into memory, you can call df.compute() to convert the result into a pandas DataFrame.
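For instance, a minimal sketch (assuming the same data*.csv pattern as above) that reads everything and materializes a single pandas DataFrame:
import dask.dataframe as dd

# read all matching CSVs lazily, then convert the result to an in-memory pandas DataFrame
df = dd.read_csv('data*.csv').compute()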
If the CSV files are zipped, you may use zipfile to read them all and concatenate as below:
import zipfile
import numpy as np
import pandas as pd

ziptrain = zipfile.ZipFile('yourpath/yourfile.zip')
train = []
for f in range(len(ziptrain.namelist())):
    if f == 0:
        # first archive member: read it directly
        train = pd.read_csv(ziptrain.open(ziptrain.namelist()[f]))
    else:
        # remaining members: read each one and stack it onto what we have so far
        my_df = pd.read_csv(ziptrain.open(ziptrain.namelist()[f]))
        train = pd.DataFrame(np.concatenate((train, my_df), axis=0),
                             columns=list(my_df.columns.values))
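A shorter equivalent, if you prefer pd.concat over the manual NumPy concatenation (same placeholder zip path as above):
import zipfile
import pandas as pd

with zipfile.ZipFile('yourpath/yourfile.zip') as z:
    # read every member of the archive and stack the frames, resetting the index
    train = pd.concat((pd.read_csv(z.open(name)) for name in z.namelist()),
                      ignore_index=True)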
Import two or more CSVs without having to make a list of names:
import glob
import pandas as pd

df = pd.concat(map(pd.read_csv, glob.glob('data/*.csv')))
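Note that glob does not guarantee any particular file order; if the row order of the result matters, sorting the matched paths first is a small, safe addition:
import glob
import pandas as pd

# sort the matched paths so the concatenation order is deterministic
df = pd.concat(map(pd.read_csv, sorted(glob.glob('data/*.csv'))))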
Another one-liner, using a list comprehension, which lets you pass arguments to read_csv:
import os
import pandas as pd

df = pd.concat([pd.read_csv(f'dir/{f}') for f in os.listdir('dir') if f.endswith('.csv')])
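For example, to actually forward options to read_csv (the separator and encoding below are just placeholders), the same pattern becomes:
import os
import pandas as pd

# same comprehension, this time passing read_csv options (placeholder values)
df = pd.concat([pd.read_csv(f'dir/{f}', sep=';', encoding='latin-1')
                for f in os.listdir('dir') if f.endswith('.csv')],
               ignore_index=True)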
An alternative to darindaCoder's answer:
import glob
import os
import pandas as pd

path = r'C:\DRO\DCL_rawdata_files'                    # use your path
all_files = glob.glob(os.path.join(path, "*.csv"))    # advisable to use os.path.join as this makes concatenation OS independent

df_from_each_file = (pd.read_csv(f) for f in all_files)
concatenated_df = pd.concat(df_from_each_file, ignore_index=True)
# doesn't create a list, nor does it append to one
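If you're on Python 3.4+, a pathlib sketch of the same idea (same placeholder path) avoids the explicit os.path.join:
from pathlib import Path
import pandas as pd

path = Path(r'C:\DRO\DCL_rawdata_files')              # use your path
concatenated_df = pd.concat((pd.read_csv(f) for f in path.glob('*.csv')),
                            ignore_index=True)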
A one-liner using map, but if you'd like to specify additional arguments, you could do:
import pandas as pd
import glob
import functools
df = pd.concat(map(functools.partial(pd.read_csv, sep='|', compression=None),
                   glob.glob("data/*.csv")))
Note: map by itself does not let you supply additional arguments.
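The same effect without functools, using a lambda (keeping the '|' separator from the example above):
import glob
import pandas as pd

df = pd.concat(map(lambda f: pd.read_csv(f, sep='|', compression=None),
                   glob.glob("data/*.csv")))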