I have a folder trip_data that contains many CSV files named by date, which looks like this:
trip_data/
├── df_trip_20140803_1.csv
├── df_trip_20140803_2.csv
└── ...
I would like to collect all those CSVs into a dictionary of DataFrames with the following structure:
df['20140803']
- a DataFrame containing the concatenated data from all df_trip_20140803_*.csv files
Solution:

import os
import re
import glob
import pandas as pd

fpattern = r'D:\temp\.data\41444939\df_trip_{}_{}.csv'

# find all matching files and extract the unique dates from their names
files = glob.glob(fpattern.format('*', '*'))
dates = sorted(set(re.split(r'_(\d{8})_(\d+)\.(\w+)', f)[1] for f in files))

# for each date, concatenate all CSVs belonging to that date
dfs = {}
for d in dates:
    dfs[d] = pd.concat((pd.read_csv(f) for f in glob.glob(fpattern.format(d, '*'))),
                       ignore_index=True)
Test:
In [95]: dfs.keys()
Out[95]: dict_keys(['20140804', '20140805', '20140803', '20140806'])
In [96]: dfs['20140803']
Out[96]:
a b c
0 0 0 7
1 3 7 1
2 9 7 3
3 7 4 7
4 5 2 4
5 0 0 4
6 7 2 2
7 8 4 1
8 0 8 3
9 3 9 0
10 7 3 9
11 1 9 8
12 6 7 2
13 3 8 1
14 3 4 5
15 0 9 2
16 5 8 7
17 8 5 4
18 2 0 2
19 9 6 6
20 6 6 6
21 2 6 9
22 1 0 8
23 3 1 1
24 7 4 2
25 7 4 2
26 8 3 7
27 7 3 2
28 1 7 7
29 3 6 5
Setup:

import os
import numpy as np
import pandas as pd

fn = r'D:\temp\.data\41444939\a.txt'
base_dir = r'D:\temp\.data\41444939'

# a.txt holds one CSV filename per line
files = open(fn).read().splitlines()

# write 5 rows of random data into each file
for f in files:
    pd.DataFrame(np.random.randint(0, 10, (5, 3)), columns=list('abc')) \
        .to_csv(os.path.join(base_dir, f), index=False)
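If you don't have an a.txt listing the filenames, a self-contained setup that generates the sample files directly might look like this (the local trip_data directory and the chosen dates are assumptions for illustration):

```python
import os

import numpy as np
import pandas as pd

base_dir = 'trip_data'  # assumed local directory instead of the D:\ path above
os.makedirs(base_dir, exist_ok=True)

# two files per date, five random rows each
for date in ('20140803', '20140804', '20140805', '20140806'):
    for i in (1, 2):
        fname = 'df_trip_{}_{}.csv'.format(date, i)
        pd.DataFrame(np.random.randint(0, 10, (5, 3)), columns=list('abc')) \
            .to_csv(os.path.join(base_dir, fname), index=False)
```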