Question
I have a folder trip_data that contains many CSV files with a date in each
file name. It looks like this:
trip_data/
├── df_trip_20140803_1.csv
├── df_trip_20140803_2.csv
├── df_trip_20140803_3.csv
├── df_trip_20140803_4.csv
├── df_trip_20140803_5.csv
├── df_trip_20140803_6.csv
├── df_trip_20140804_1.csv
├── df_trip_20140804_2.csv
├── df_trip_20140804_3.csv
├── df_trip_20140804_4.csv
├── df_trip_20140804_5.csv
├── df_trip_20140804_6.csv
├── df_trip_20140805_1.csv
├── df_trip_20140805_2.csv
├── df_trip_20140805_3.csv
├── df_trip_20140805_4.csv
├── df_trip_20140805_5.csv
├── df_trip_20140805_6.csv
├── df_trip_20140806_1.csv
├── df_trip_20140806_2.csv
├── df_trip_20140806_3.csv
├── df_trip_20140806_4.csv
Now I want to load all these files grouped by date with Python pandas, that is, into 4 DataFrames: df_trip_20140803, df_trip_20140804, df_trip_20140805, df_trip_20140806.
My code looks like this:
days = [20140803,20140804,20140805,20140806]
for day in days:
    ## Locate to the path
    path ='./trip_data/df_trip_%d*.csv' % day
    df = pd.read_csv(path, header=None, nrows=10,
                     names=['ID','lat','lon','status','timestamp'])
This does not give the correct result. How can I do this?
Answer 1:
I would collect all those CSV files into a dictionary of DataFrames with the following structure:
dfs['20140803']
- a DataFrame containing the concatenated data from all df_trip_20140803_*.csv
files.
Solution:
import os
import re
import glob
import pandas as pd
fpattern = r'D:\temp\.data\41444939\df_trip_{}_{}.csv'
files = glob.glob(fpattern.format('*','*'))
dates = sorted(set([re.split(r'_(\d{8})_(\d+)\.(\w+)', f)[1] for f in files]))
dfs = {}
for d in dates:
    dfs[d] = pd.concat((pd.read_csv(f) for f in glob.glob(fpattern.format(d, '*'))), ignore_index=True)
Test:
In [95]: dfs.keys()
Out[95]: dict_keys(['20140804', '20140805', '20140803', '20140806'])
In [96]: dfs['20140803']
Out[96]:
a b c
0 0 0 7
1 3 7 1
2 9 7 3
3 7 4 7
4 5 2 4
5 0 0 4
6 7 2 2
7 8 4 1
8 0 8 3
9 3 9 0
10 7 3 9
11 1 9 8
12 6 7 2
13 3 8 1
14 3 4 5
15 0 9 2
16 5 8 7
17 8 5 4
18 2 0 2
19 9 6 6
20 6 6 6
21 2 6 9
22 1 0 8
23 3 1 1
24 7 4 2
25 7 4 2
26 8 3 7
27 7 3 2
28 1 7 7
29 3 6 5
Setup:
import numpy as np
# a.txt holds one CSV file name per line
fn = r'D:\temp\.data\41444939\a.txt'
base_dir = r'D:\temp\.data\41444939'
files = open(fn).read().splitlines()
# write a small random 5x3 DataFrame to each listed file
for f in files:
    pd.DataFrame(np.random.randint(0, 10, (5, 3)), columns=list('abc')) \
      .to_csv(os.path.join(base_dir, f), index=False)
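The solution above calls glob once per date. As a minimal sketch of a variant, the files could also be grouped in a single pass over the folder. The function name load_trips_by_date, the regex NAME_RE and the demo rows are hypothetical and only for illustration. The sketch assumes the file layout from the question: no header row and the five column names ID, lat, lon, status, timestamp. It also sorts the parts by their numeric suffix, which the original solution does not do.

```python
import re
import tempfile
from collections import defaultdict
from pathlib import Path

import pandas as pd

# Column names from the question; files are assumed to have no header row.
COLUMNS = ['ID', 'lat', 'lon', 'status', 'timestamp']
# Hypothetical pattern for names such as df_trip_20140803_1.csv
NAME_RE = re.compile(r'df_trip_(\d{8})_(\d+)\.csv$')


def load_trips_by_date(folder):
    """Return {date string: DataFrame} for df_trip_<date>_<n>.csv files."""
    groups = defaultdict(list)
    for path in Path(folder).glob('df_trip_*_*.csv'):
        m = NAME_RE.match(path.name)
        if m:  # skip names that only loosely match the glob
            groups[m.group(1)].append((int(m.group(2)), path))

    dfs = {}
    for date, parts in groups.items():
        parts.sort(key=lambda item: item[0])  # numeric order: 2 before 10
        dfs[date] = pd.concat(
            (pd.read_csv(p, header=None, names=COLUMNS) for _, p in parts),
            ignore_index=True,
        )
    return dfs


# Demo on throwaway files with made-up rows.
with tempfile.TemporaryDirectory() as tmp:
    for name in ['df_trip_20140803_1.csv',
                 'df_trip_20140803_2.csv',
                 'df_trip_20140804_1.csv']:
        Path(tmp, name).write_text('1,31.2,121.5,0,1407024000\n')
    demo = load_trips_by_date(tmp)

print(sorted(demo))             # ['20140803', '20140804']
print(demo['20140803'].shape)   # (2, 5)
```

As in the original answer, the result is a dictionary keyed by the date string, so demo['20140803'] plays the role of the df_trip_20140803 DataFrame asked for in the question.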
Source: https://stackoverflow.com/questions/41444939/pandas-read-csv-with-regex