Import multiple csv files into pandas and concatenate into one DataFrame

既然无缘 2020-11-21 07:47

I would like to read several csv files from a directory into pandas and concatenate them into one big DataFrame. I have not been able to figure it out though. Here is what I

16 Answers
  • 2020-11-21 08:11
    import glob
    import os
    import pandas as pd
    df = pd.concat(map(pd.read_csv, glob.glob(os.path.join('', "my_files*.csv"))))
    
  • 2020-11-21 08:11

    Alternative using the pathlib library (often preferred over os.path).

    This method avoids iterative use of pandas concat()/append().

    From the pandas documentation:
    It is worth noting that concat() (and therefore append()) makes a full copy of the data, and that constantly reusing this function can create a significant performance hit. If you need to use the operation over several datasets, use a list comprehension.

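    import pandas as pd
    from pathlib import Path
    
    data_dir = Path("../relevant_directory")  # avoid shadowing the built-in dir()
    
    # Generator of DataFrames, consumed by a single concat() call
    frames = (pd.read_csv(f) for f in data_dir.glob("*.csv"))
    df = pd.concat(frames)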
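
To make the documentation's point concrete, here is a minimal sketch that contrasts the two patterns on small in-memory frames rather than files. The `chunks` list and its column names are made up for illustration. Both approaches should produce the same result, but the loop copies the accumulated data on every iteration, while the single call copies once.

```python
import pandas as pd

# Hypothetical stand-ins for the frames that pd.read_csv would return.
chunks = [pd.DataFrame({"id": [i], "value": [i * 10]}) for i in range(5)]

# Pattern to avoid: concat inside a loop re-copies everything gathered so far.
slow = chunks[0]
for chunk in chunks[1:]:
    slow = pd.concat([slow, chunk], ignore_index=True)

# Preferred: collect the frames first, then concatenate once.
fast = pd.concat(chunks, ignore_index=True)

# Both results hold the same rows in the same order.
print(fast.equals(slow))
```

With only a handful of small files the difference is negligible; it tends to matter once there are many files or the frames are large.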
    
  • 2020-11-21 08:12

    Almost all of the answers here are either unnecessarily complex (glob pattern matching) or rely on additional third-party libraries. You can do this in two lines using only what pandas and Python (all versions) already have built in.

    For a few files, a one-liner:

    df = pd.concat(map(pd.read_csv, ['data/d1.csv', 'data/d2.csv','data/d3.csv']))
    

    For many files:

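    import os
    import pandas as pd
    
    # listdir() returns bare file names, so join each one with its directory
    filepaths = [os.path.join("./data", f) for f in os.listdir("./data") if f.endswith('.csv')]
    df = pd.concat(map(pd.read_csv, filepaths))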
    

    The line that builds df relies on three things:

    1. Python's map(function, iterable) calls the function (pd.read_csv) on every element of the iterable (each CSV path in filepaths).
    2. pandas' read_csv() reads each CSV file as usual.
    3. pandas' concat() combines all of the resulting DataFrames into the single df variable.
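
The three steps above can be sketched end to end as follows. This is only an illustration: the temporary directory, file names, and CSV contents are invented so that the snippet runs on its own, and the list-comprehension form at the end is an equivalent spelling rather than part of the original answer.

```python
import os
import tempfile

import pandas as pd

# Write two tiny CSV files to a temporary directory (illustrative data only).
tmp_dir = tempfile.mkdtemp()
filepaths = []
for name, rows in [("d1.csv", "a,b\n1,2\n"), ("d2.csv", "a,b\n3,4\n")]:
    path = os.path.join(tmp_dir, name)
    with open(path, "w") as fh:
        fh.write(rows)
    filepaths.append(path)

# Steps 1 and 2: map() applies pd.read_csv to each path, lazily.
frames = map(pd.read_csv, filepaths)

# Step 3: concat() consumes the iterator and stacks the frames row-wise.
df = pd.concat(frames)

# Equivalent spelling with an explicit list comprehension.
df_alt = pd.concat([pd.read_csv(p) for p in filepaths])

print(df)
```

Note that each frame keeps its own row labels here, so the combined index repeats; passing ignore_index=True to concat() renumbers the rows if that is not wanted.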
  • 2020-11-21 08:14

    This is how you can do it in Colab with files stored on Google Drive:

    import pandas as pd
    import glob
    
    path = r'/content/drive/My Drive/data/actual/comments_only' # use your path
    all_files = glob.glob(path + "/*.csv")
    
    li = []
    
    for filename in all_files:
        df = pd.read_csv(filename, index_col=None, header=0)
        li.append(df)
    
    frame = pd.concat(li, axis=0, ignore_index=True, sort=True)
    frame.to_csv('/content/drive/onefile.csv')
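
The answer does not say what the extra concat() arguments do, so here is a small sketch of their effect as I understand the pandas documentation: ignore_index=True discards each frame's own row labels and numbers the result from 0, and sort=True sorts the columns when the frames do not already line up. The two frames below are hypothetical and exist only to show this.

```python
import pandas as pd

# Two hypothetical frames whose columns appear in a different order.
first = pd.DataFrame({"b": [1, 2], "a": [3, 4]})
second = pd.DataFrame({"a": [5], "b": [6]})

frame = pd.concat([first, second], axis=0, ignore_index=True, sort=True)

print(list(frame.columns))  # column labels after sorting
print(list(frame.index))    # row labels after renumbering
```

Values stay attached to their column names, so reordering the columns does not mix up the data.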
    