Merge multiple CSV files with the same name in 10 different subdirectories


I have 10 different subdirectories, each containing files with the same names (20 files per directory), and column 0 is the index column in each file.

e.g


2 Answers

春和景丽

2020-12-22 06:33
This can be done much more simply in the shell:
```
find . -name "*.csv" | xargs cat > mergedCSV
```
(Note: Don't give the output file a .csv extension while the command runs, because find could then match the output file itself and feed it back into cat. Once the command has finished, the file can be renamed to .csv.)
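
For comparison, here is a rough Python sketch of the same plain concatenation, but grouped by file name so that each name gets its own merged file. The function name `concat_by_name` and its `root` / `out_dir` parameters are made up for illustration. Like `cat`, it copies bytes as-is, so any header rows are repeated, and it skips files already inside the output directory to avoid the self-matching problem mentioned in the note.

```python
from collections import defaultdict
from pathlib import Path
import shutil


def concat_by_name(root, out_dir):
    """Append same-named CSV files found under root into one file per name in out_dir.

    Hypothetical helper: the name and parameters are illustrative only.
    Returns a dict mapping each file name to the number of files merged.
    """
    root, out_dir = Path(root), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Group matching paths by bare file name, e.g. "a.csv" -> [dir1/a.csv, dir2/a.csv].
    groups = defaultdict(list)
    for path in sorted(root.rglob("*.csv")):
        # Skip anything already inside out_dir so a merged file is never read back in.
        if out_dir.resolve() in path.resolve().parents:
            continue
        groups[path.name].append(path)

    # Byte-for-byte concatenation, like `cat`: header rows (if any) are repeated.
    for name, paths in groups.items():
        with open(out_dir / name, "wb") as out:
            for path in paths:
                with open(path, "rb") as src:
                    shutil.copyfileobj(src, out)
    return {name: len(paths) for name, paths in groups.items()}
```

Something like `concat_by_name(".", "merged")` would then write one merged file per distinct name into `merged/`.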

不思量自难忘°

2020-12-22 06:46

There are many ways to do this; staying in pandas, I did the following.

Given this file structure:

```
root/
├── dir1/
│   ├── data_20170101_k.csv
│   ├── data_20170102_k.csv
│   └── ...
├── dir2/
│   ├── data_20170101_k.csv
│   ├── data_20170102_k.csv
│   └── ...
└── ...
```

This code works; it is a little verbose for the sake of explanation, but you can shorten it in your own implementation.

```
import glob
import os

import pandas as pd

CONCAT_DIR = "/FILES_CONCAT/"  # Output directory; it must already exist

# Use the glob module to list every file one level below the root directory. Create a DF from this.
files = pd.DataFrame(glob.glob("root/*/*"), columns=["fullpath"])

#    fullpath  (example output on Windows, hence the backslashes)
# 0  root\dir1\data_20170101_k.csv
# 1  root\dir1\data_20170102_k.csv
# 2  root\dir2\data_20170101_k.csv
# 3  root\dir2\data_20170102_k.csv

# Split the full path into directory and filename.
# os.sep keeps this portable, and n must be passed by keyword in pandas 2.0+.
files_split = files['fullpath'].str.rsplit(os.sep, n=1, expand=True).rename(columns={0: 'path', 1: 'filename'})

#    path       filename
# 0  root\dir1  data_20170101_k.csv
# 1  root\dir1  data_20170102_k.csv
# 2  root\dir2  data_20170101_k.csv
# 3  root\dir2  data_20170102_k.csv

# Join these into one DataFrame
files = files.join(files_split)

#    fullpath                       path        filename
# 0  root\dir1\data_20170101_k.csv  root\dir1   data_20170101_k.csv
# 1  root\dir1\data_20170102_k.csv  root\dir1   data_20170102_k.csv
# 2  root\dir2\data_20170101_k.csv  root\dir2   data_20170101_k.csv
# 3  root\dir2\data_20170102_k.csv  root\dir2   data_20170102_k.csv

# Iterate over unique filenames; read CSVs, concat DFs, save file
for f in files['filename'].unique():
    paths = files[files['filename'] == f]['fullpath']  # Full paths of every file with this name
    # header=None: files are read as having no header row; index_col=0: column 0 is the index (per the question)
    dfs = [pd.read_csv(path, header=None, index_col=0) for path in paths]
    concat_df = pd.concat(dfs)  # Concat dataframes into one
    concat_df.to_csv(os.path.join(CONCAT_DIR, f), header=False)  # Save without the integer column labels
```
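
As a possible shortened version of the steps above, here is a minimal sketch using `pathlib` and `groupby`. The function name `merge_same_named_csvs` and its parameters are hypothetical. It assumes the files have no header row (matching `header=None` above) and that column 0 is the index, as stated in the question; adjust both if your files differ.

```python
from pathlib import Path

import pandas as pd


def merge_same_named_csvs(root, out_dir):
    """Concatenate same-named CSVs from root's immediate subdirectories.

    Hypothetical helper: writes one output file per distinct file name
    into out_dir and returns the list of paths written.
    """
    root, out_dir = Path(root), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # One row per input file; skip out_dir itself in case it sits directly under root.
    paths = sorted(
        p for p in root.glob("*/*.csv") if p.parent.resolve() != out_dir.resolve()
    )
    files = pd.DataFrame({"fullpath": paths, "filename": [p.name for p in paths]})

    written = []
    for name, group in files.groupby("filename"):
        # Assumption: no header row, and column 0 is the index column.
        dfs = [pd.read_csv(p, header=None, index_col=0) for p in group["fullpath"]]
        target = out_dir / name
        # header=False avoids writing the integer column labels as a header row.
        pd.concat(dfs).to_csv(target, header=False)
        written.append(target)
    return written
```

Calling `merge_same_named_csvs("root", "FILES_CONCAT")` should then produce one merged file per name, with rows in sorted subdirectory order.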
