I have a folder trip_data that contains many CSV files named by date, which looks like this:
trip_data/
├── df_trip_20140803_1.csv
├── df_trip_20140803_2.csv
└── ...
I would like to collect all those CSVs into a dictionary of DataFrames with the following structure:
df['20140803']
- a DataFrame containing the concatenated data from all df_trip_20140803_*.csv files
Solution:

import os
import re
import glob
import pandas as pd

fpattern = r'D:\temp\.data\41444939\df_trip_{}_{}.csv'

# find all matching files and extract the unique dates from their names
files = glob.glob(fpattern.format('*', '*'))
dates = sorted(set(re.split(r'_(\d{8})_(\d+)\.(\w+)', f)[1] for f in files))

# for each date, concatenate all CSVs belonging to that date
dfs = {}
for d in dates:
    dfs[d] = pd.concat((pd.read_csv(f) for f in glob.glob(fpattern.format(d, '*'))),
                       ignore_index=True)
Test:
In [95]: dfs.keys()
Out[95]: dict_keys(['20140804', '20140805', '20140803', '20140806'])
In [96]: dfs['20140803']
Out[96]:
a b c
0 0 0 7
1 3 7 1
2 9 7 3
3 7 4 7
4 5 2 4
5 0 0 4
6 7 2 2
7 8 4 1
8 0 8 3
9 3 9 0
10 7 3 9
11 1 9 8
12 6 7 2
13 3 8 1
14 3 4 5
15 0 9 2
16 5 8 7
17 8 5 4
18 2 0 2
19 9 6 6
20 6 6 6
21 2 6 9
22 1 0 8
23 3 1 1
24 7 4 2
25 7 4 2
26 8 3 7
27 7 3 2
28 1 7 7
29 3 6 5
Setup:

import os
import numpy as np
import pandas as pd

fn = r'D:\temp\.data\41444939\a.txt'
base_dir = r'D:\temp\.data\41444939'

# a.txt holds one CSV filename per line
files = open(fn).read().splitlines()

# write 5 rows of random data into each file
for f in files:
    pd.DataFrame(np.random.randint(0, 10, (5, 3)), columns=list('abc')) \
        .to_csv(os.path.join(base_dir, f), index=False)
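If you don't have an a.txt listing the filenames, a self-contained setup that generates the sample files directly might look like this (the local trip_data directory and the chosen dates are assumptions for illustration):

```python
import os

import numpy as np
import pandas as pd

base_dir = 'trip_data'  # assumed local directory instead of the D:\ path above
os.makedirs(base_dir, exist_ok=True)

# two files per date, five random rows each
for date in ('20140803', '20140804', '20140805', '20140806'):
    for i in (1, 2):
        fname = 'df_trip_{}_{}.csv'.format(date, i)
        pd.DataFrame(np.random.randint(0, 10, (5, 3)), columns=list('abc')) \
            .to_csv(os.path.join(base_dir, fname), index=False)
```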