Import multiple csv files into pandas and concatenate into one DataFrame

既然无缘 2020-11-21 07:47

I would like to read several CSV files from a directory into pandas and concatenate them into one big DataFrame, but I have not been able to figure out how. Here is what I

16 answers
  • 2020-11-21 08:06

    The Dask library can read a dataframe from multiple files:

    >>> import dask.dataframe as dd
    >>> df = dd.read_csv('data*.csv')
    

    (Source: http://dask.pydata.org/en/latest/examples/dataframe-csv.html)

    Dask dataframes implement a subset of the pandas DataFrame API. If all the data fits into memory, you can call df.compute() to convert the result into a regular pandas DataFrame.
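    As a minimal self-contained sketch of the lazy-read-then-compute pattern described above (it writes two small sample files to a temp directory to stand in for data*.csv):

    ```python
    import os
    import tempfile

    import dask.dataframe as dd

    # Write two small sample CSVs (stand-ins for data*.csv).
    data_dir = tempfile.mkdtemp()
    for i in range(2):
        with open(os.path.join(data_dir, f"data{i}.csv"), "w") as fh:
            fh.write("a,b\n1,2\n")

    # Lazily read all matching files as one Dask dataframe...
    ddf = dd.read_csv(os.path.join(data_dir, "data*.csv"))

    # ...then materialize it as a regular pandas DataFrame
    # (only safe when the combined data fits in memory).
    pdf = ddf.compute()
    print(len(pdf))  # 2
    ```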

  • 2020-11-21 08:06

    If the multiple CSV files are zipped, you may use zipfile to read them all and concatenate as below:

    import zipfile
    
    import pandas as pd
    
    ziptrain = zipfile.ZipFile('yourpath/yourfile.zip')
    
    # Read every member of the archive and stack them into one
    # DataFrame; pd.concat preserves each column's dtype.
    train = pd.concat(
        (pd.read_csv(ziptrain.open(name)) for name in ziptrain.namelist()),
        ignore_index=True)
    
  • 2020-11-21 08:09

    Easy and Fast

    Import two or more CSVs without having to make a list of names.

    import glob
    
    import pandas as pd
    
    df = pd.concat(map(pd.read_csv, glob.glob('data/*.csv')))
    
  • 2020-11-21 08:09

    Another one-liner, using a list comprehension, which lets you pass arguments to read_csv.

    import os
    
    import pandas as pd
    
    df = pd.concat([pd.read_csv(f'dir/{f}') for f in os.listdir('dir') if f.endswith('.csv')])
    
  • 2020-11-21 08:10

    An alternative to darindaCoder's answer:

    import glob
    import os
    
    import pandas as pd
    
    path = r'C:\DRO\DCL_rawdata_files'                     # use your path
    all_files = glob.glob(os.path.join(path, "*.csv"))     # os.path.join keeps the concatenation OS-independent
    
    df_from_each_file = (pd.read_csv(f) for f in all_files)
    concatenated_df   = pd.concat(df_from_each_file, ignore_index=True)
    # doesn't create a list, nor does it append to one
    
  • 2020-11-21 08:10

    A one-liner using map, but if you'd like to specify additional arguments, you could do:

    import pandas as pd
    import glob
    import functools
    
    df = pd.concat(map(functools.partial(pd.read_csv, sep='|', compression=None),
                       glob.glob("data/*.csv")))
    

    Note: map by itself does not let you supply additional args.
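    For instance, a plain lambda binds the extra arguments just as well as functools.partial (a self-contained sketch that writes two sample pipe-delimited files to a temp directory, standing in for data/*.csv):

    ```python
    import glob
    import os
    import tempfile

    import pandas as pd

    # Create two sample pipe-delimited CSV files.
    data_dir = tempfile.mkdtemp()
    for i in range(2):
        with open(os.path.join(data_dir, f"part{i}.csv"), "w") as fh:
            fh.write("a|b\n1|2\n3|4\n")

    # The lambda binds sep='|' for each call, like functools.partial above.
    df = pd.concat(map(lambda f: pd.read_csv(f, sep='|'),
                       glob.glob(os.path.join(data_dir, "*.csv"))),
                   ignore_index=True)
    print(df.shape)  # (4, 2)
    ```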
