pandas read csv with regex

轻奢々 2021-01-06 18:05

I have a folder, trip_data, that contains many CSV files with a date in their names. It looks like this:

trip_data/
├── df_trip_20140803_1.csv
├── df_trip_20140803_2.         


        
1 Answer
  • 2021-01-06 18:42

    I would collect all those CSV files into a dictionary of DataFrames with the following structure:

    dfs['20140803'] - a DataFrame containing the concatenated data from all df_trip_20140803_*.csv files.

    Solution:

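    import os
    import re
    import glob
    import pandas as pd

    # Path template: the first {} is the date, the second {} is the part number
    fpattern = r'D:\temp\.data\41444939\df_trip_{}_{}.csv'
    files = glob.glob(fpattern.format('*', '*'))

    # Extract the unique 8-digit dates from the matched file names
    dates = sorted({re.search(r'_(\d{8})_\d+\.\w+', f).group(1) for f in files})

    # Concatenate all files that share a date into one DataFrame
    dfs = {}
    for d in dates:
        dfs[d] = pd.concat((pd.read_csv(f) for f in glob.glob(fpattern.format(d, '*'))), ignore_index=True)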
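
As a variation on the solution above (a sketch, not the answer's original code), the files can be grouped by date in a single pass over the folder, so glob runs only once. The names `load_trips_by_date` and `TRIP_FILE_RE` are hypothetical, and the sketch assumes every file follows the df_trip_<date>_<n>.csv naming shown in the question.

```python
import re
from collections import defaultdict
from pathlib import Path

import pandas as pd

# Assumed naming scheme: df_trip_<8-digit date>_<part number>.csv
TRIP_FILE_RE = re.compile(r"df_trip_(\d{8})_(\d+)\.csv$")


def load_trips_by_date(folder):
    """Return {date string: DataFrame} built from df_trip_<date>_<n>.csv files."""
    groups = defaultdict(list)
    for path in Path(folder).glob("df_trip_*_*.csv"):
        match = TRIP_FILE_RE.match(path.name)
        if match is None:
            continue  # glob matched, but the name is not <date>_<n>.csv
        date, part = match.group(1), int(match.group(2))
        groups[date].append((part, path))

    dfs = {}
    for date in sorted(groups):
        # Order by numeric part so that _10 is read after _2
        ordered = sorted(groups[date], key=lambda item: item[0])
        frames = [pd.read_csv(path) for _, path in ordered]
        dfs[date] = pd.concat(frames, ignore_index=True)
    return dfs
```

Calling `load_trips_by_date('trip_data')` should return a dictionary keyed by date string, with the same structure as `dfs` above. Files whose names do not match the regex are skipped instead of raising an error.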
    

    Test:

    In [95]: dfs.keys()
    Out[95]: dict_keys(['20140804', '20140805', '20140803', '20140806'])
    
    In [96]: dfs['20140803']
    Out[96]:
        a  b  c
    0   0  0  7
    1   3  7  1
    2   9  7  3
    3   7  4  7
    4   5  2  4
    5   0  0  4
    6   7  2  2
    7   8  4  1
    8   0  8  3
    9   3  9  0
    10  7  3  9
    11  1  9  8
    12  6  7  2
    13  3  8  1
    14  3  4  5
    15  0  9  2
    16  5  8  7
    17  8  5  4
    18  2  0  2
    19  9  6  6
    20  6  6  6
    21  2  6  9
    22  1  0  8
    23  3  1  1
    24  7  4  2
    25  7  4  2
    26  8  3  7
    27  7  3  2
    28  1  7  7
    29  3  6  5
    

    Setup:

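    import os
    import numpy as np
    import pandas as pd
    # a.txt holds the CSV file names to generate, one per line
    fn = r'D:\temp\.data\41444939\a.txt'
    base_dir = r'D:\temp\.data\41444939'
    with open(fn) as fh:
        files = fh.read().splitlines()
    # Write a 5x3 frame of random integers (0-9) to each file
    for f in files:
        pd.DataFrame(np.random.randint(0, 10, (5, 3)), columns=list('abc')) \
          .to_csv(os.path.join(base_dir, f), index=False)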
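
The setup above depends on a.txt, which is not shown in this thread. A minimal sketch that produces comparable sample files without it is given below. The function name `make_sample_files`, the default dates, and the number of parts per date are assumptions made for illustration only.

```python
import os

import numpy as np
import pandas as pd


def make_sample_files(base_dir, dates=("20140803", "20140804"), parts=2, seed=0):
    """Write df_trip_<date>_<n>.csv files of random integers into base_dir."""
    rng = np.random.default_rng(seed)  # seeded so the sample data is reproducible
    paths = []
    for date in dates:
        for part in range(1, parts + 1):
            path = os.path.join(base_dir, f"df_trip_{date}_{part}.csv")
            # 5 rows x 3 columns of integers in [0, 10), as in the setup above
            data = rng.integers(0, 10, size=(5, 3))
            pd.DataFrame(data, columns=list("abc")).to_csv(path, index=False)
            paths.append(path)
    return paths
```

Pointing `base_dir` at a temporary directory (for example one created with `tempfile.TemporaryDirectory()`) keeps the sample files out of the real data folder.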
    