Import multiple csv files into pandas and concatenate into one DataFrame

既然无缘 2020-11-21 07:47

I would like to read several csv files from a directory into pandas and concatenate them into one big DataFrame. I have not been able to figure it out though. Here is what I

16 answers
  •  一整个雨季
    2020-11-21 07:55

    Edit: I googled my way to https://stackoverflow.com/a/21232849/186078. Lately, however, I have found it faster to do the manipulation in numpy and assign the result to a DataFrame once, rather than manipulating the DataFrame itself iteratively, and that seems to hold for this solution too.

    I sincerely want anyone who lands on this page to consider this approach, but I did not want to attach this large piece of code as a comment and make it less readable.

    You can leverage numpy to really speed up the dataframe concatenation.

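    import os
    import glob
    import pandas as pd
    import numpy as np

    path = "my_dir_full_path"
    allFiles = glob.glob(os.path.join(path, "*.csv"))

    # Read each file and keep only its raw values as a numpy array
    np_array_list = []
    for file_ in allFiles:
        df = pd.read_csv(file_, index_col=None, header=0)
        # DataFrame.as_matrix() was removed in pandas 1.0; to_numpy() replaces it
        np_array_list.append(df.to_numpy())

    # Stack all arrays once, then build a single DataFrame from the result
    comb_np_array = np.vstack(np_array_list)
    big_frame = pd.DataFrame(comb_np_array)

    # Column names are lost in the numpy round trip; set them to your own list:
    # big_frame.columns = ["col1", "col2", ...]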
    

    Timing stats:

    total files :192
    avg lines per file :8492
    --approach 1 without numpy -- 8.248656988143921 seconds ---
    total records old :1630571
    --approach 2 with numpy -- 2.289292573928833 seconds ---
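
    The timings above compare a "without numpy" approach that is not shown in this answer. As a minimal sketch, assuming that approach is a plain pd.concat over a list of frames, the two variants can be put side by side as below. The helper names, the temporary directory, and the tiny made-up CSV files are all hypothetical and exist only for illustration. The sketch also shows one thing worth checking before adopting the numpy route: when a file mixes numeric and text columns, to_numpy() returns an object array, so the numeric columns do not come back with their original dtype.

```python
import glob
import os
import tempfile

import numpy as np
import pandas as pd


def concat_with_pandas(paths):
    """Read each CSV and let pandas stack them in a single concat call."""
    frames = [pd.read_csv(p, index_col=None, header=0) for p in paths]
    return pd.concat(frames, axis=0, ignore_index=True)


def concat_with_numpy(paths):
    """Read each CSV, stack the raw arrays, then build one DataFrame."""
    frames = [pd.read_csv(p, index_col=None, header=0) for p in paths]
    stacked = np.vstack([f.to_numpy() for f in frames])
    # Column names are taken from the first file (assumes identical headers).
    return pd.DataFrame(stacked, columns=frames[0].columns)


# Build three tiny CSV files in a temporary directory (made-up data).
tmp_dir = tempfile.mkdtemp()
for i in range(3):
    pd.DataFrame({"id": [i, i + 10], "name": [f"a{i}", f"b{i}"]}).to_csv(
        os.path.join(tmp_dir, f"part_{i}.csv"), index=False
    )

paths = sorted(glob.glob(os.path.join(tmp_dir, "*.csv")))
via_pandas = concat_with_pandas(paths)
via_numpy = concat_with_numpy(paths)

print(via_pandas.dtypes)  # "id" is still an integer column here
print(via_numpy.dtypes)   # "id" is no longer an integer column here
```

    If the dtypes matter downstream, the numpy result may need an explicit astype() per column, and that cost should be counted when comparing timings on your own data.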
    
