Merge multiple CSV files with the same name in 10 different subdirectories

清酒与你 2020-12-22 06:18

I have 10 different subdirectories with the same file names in each directory (20 files per directory), and column 0 is the index column in each file.

e.g



        
2 answers
  • 2020-12-22 06:33

    This can be achieved in a much simpler way in the shell:

    find . -name "*.csv" | xargs cat > mergedCSV
    

    (Note: don't give the merged output file a .csv extension, or find will match it as well and produce inconsistent results. After the command has finished, the file can be renamed to .csv; a Python sketch of the same flat merge follows below.)
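
    For comparison, here is a minimal Python sketch of the same flat, cat-style merge (assuming Python 3.5+ for the recursive glob; the output name mergedCSV is only an example and deliberately has no .csv extension so the pattern cannot pick it up):

    import glob
    import shutil

    # Stream every CSV found under the current directory into one output file,
    # mirroring `find . -name "*.csv" | xargs cat`. The output name has no
    # .csv extension so the *.csv pattern cannot match it on a re-run.
    with open("mergedCSV", "wb") as out:
        for path in sorted(glob.glob("**/*.csv", recursive=True)):
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)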

  • 2020-12-22 06:46

    There are many ways to do this; staying in Pandas, I did the following.

    With the file structure

    root/
    ├── dir1/
    │   ├── data_20170101_k.csv
    │   ├── data_20170102_k.csv
    │   └── ...
    ├── dir2/
    │   ├── data_20170101_k.csv
    │   ├── data_20170102_k.csv
    │   └── ...
    └── ...
    

    The code below will work; it's a little verbose for the sake of explanation, but you can shorten it in your implementation.

    import glob
    import pandas as pd
    
    CONCAT_DIR = "/FILES_CONCAT/"
    
    # Use glob module to return all csv files under root directory. Create DF from this.
    files = pd.DataFrame([file for file in glob.glob("root/*/*")], columns=["fullpath"])
    
    #    fullpath
    # 0  root\dir1\data_20170101_k.csv
    # 1  root\dir1\data_20170102_k.csv
    # 2  root\dir2\data_20170101_k.csv
    # 3  root\dir2\data_20170102_k.csv
    
    # Split the full path into directory and filename
    # (glob returns backslash-separated paths on Windows, as in the comments above; on POSIX split on "/")
    files_split = files['fullpath'].str.rsplit("\\", n=1, expand=True).rename(columns={0: 'path', 1: 'filename'})
    
    #    path       filename
    # 0  root\dir1  data_20170101_k.csv
    # 1  root\dir1  data_20170102_k.csv
    # 2  root\dir2  data_20170101_k.csv
    # 3  root\dir2  data_20170102_k.csv
    
    # Join these into one DataFrame
    files = files.join(files_split)
    
    #    fullpath                       path        filename
    # 0  root\dir1\data_20170101_k.csv  root\dir1   data_20170101_k.csv
    # 1  root\dir1\data_20170102_k.csv  root\dir1   data_20170102_k.csv
    # 2  root\dir2\data_20170101_k.csv  root\dir2   data_20170101_k.csv
    # 3  root\dir2\data_20170102_k.csv  root\dir2   data_20170102_k.csv
    
    # Iterate over unique filenames; read CSVs, concat DFs, save file
    for f in files['filename'].unique():
        paths = files[files['filename'] == f]['fullpath'] # Get list of fullpaths from unique filenames
        dfs = [pd.read_csv(path, header=None) for path in paths] # Get list of dataframes from CSV file paths
        concat_df = pd.concat(dfs) # Concat dataframes into one
        concat_df.to_csv(CONCAT_DIR + f) # Save merged dataframe (CONCAT_DIR must already exist)
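
    As a design note, the same per-filename grouping can be done without the path-splitting DataFrame, using a plain dict keyed on the basename. This is only a sketch under the question's assumptions (files have headers and column 0 is the index); the FILES_CONCAT output directory name is just an example:

    import glob
    import os
    import pandas as pd

    CONCAT_DIR = "FILES_CONCAT"  # example output directory
    os.makedirs(CONCAT_DIR, exist_ok=True)

    # Group full paths by basename, so same-named files from different
    # subdirectories end up in the same group.
    groups = {}
    for path in glob.glob(os.path.join("root", "*", "*.csv")):
        groups.setdefault(os.path.basename(path), []).append(path)

    # Concatenate each group and write one merged CSV per shared filename.
    for filename, paths in groups.items():
        merged = pd.concat(pd.read_csv(p, index_col=0) for p in paths)
        merged.to_csv(os.path.join(CONCAT_DIR, filename))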
    