How to read in multiple files as separate dataframes and perform calculations on a column?

I am calculating a single stock return as follows:

import pandas as pd

data = pd.read_csv(r'**file**.csv')
data.index = data.Date
data['Return %'] = data['AAPL'].pct_change(-1)*100
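Here is a small worked check of what pct_change(-1)*100 computes. It uses three hypothetical prices and assumes the rows are sorted newest date first, which the question does not state. With periods=-1, each row is compared with the row below it (the previous trading day in that ordering), so the last row has nothing to compare against and becomes NaN.

```python
import pandas as pd

# Hypothetical prices, newest date first (an assumption about the CSV's row order)
data = pd.DataFrame(
    {"AAPL": [110.0, 100.0, 125.0]},
    index=pd.to_datetime(["2021-01-27", "2021-01-26", "2021-01-25"]),
)
data.index.name = "Date"

# periods=-1 compares each row with the NEXT row: (x[i] - x[i+1]) / x[i+1]
data["Return %"] = data["AAPL"].pct_change(-1) * 100

# The same calculation written out with shift(), to show what pct_change(-1) does
manual = (data["AAPL"] / data["AAPL"].shift(-1) - 1) * 100

print(data)
```

The first row comes out as roughly 10.0 (110 vs. 100), the second as roughly -20.0 (100 vs. 125), and the last as NaN.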
1 answer

星月不相逢, 2021-01-27 19:25

I think the best option for your data is to read the files into a dictionary of dataframes.

- Use pathlib and .glob to find all the .csv files.
- Use a dict comprehension to create the dict of dataframes.
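A minimal sketch of those two steps, run against a throwaway folder. The file names AAPL.csv and MSFT.csv and their contents are hypothetical stand-ins for the real stock files. One detail worth knowing: Path.glob returns a lazy iterator rather than a list, so wrapping it in sorted() gives a reusable list in a predictable order.

```python
import tempfile
from pathlib import Path

import pandas as pd

with tempfile.TemporaryDirectory() as d:
    p = Path(d)  # throwaway folder standing in for the real stock_files folder

    # two hypothetical files: a Date column plus one column named after the ticker
    (p / "AAPL.csv").write_text("Date,AAPL\n2021-01-27,110\n2021-01-26,100\n")
    (p / "MSFT.csv").write_text("Date,MSFT\n2021-01-27,220\n2021-01-26,200\n")

    # Path.glob is lazy; sorted() consumes it and returns an ordered list of paths
    files = sorted(p.glob("*.csv"))

    # one dataframe per file, keyed by the file name without its extension
    df_dict = {f.stem: pd.read_csv(f, parse_dates=["Date"], index_col="Date") for f in files}

print(list(df_dict))  # ['AAPL', 'MSFT']
```

The dataframes are fully loaded into memory by read_csv, so they remain usable after the temporary folder is deleted.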


- The dictionary can be iterated over like any other dictionary, with dict.items().
- df_dict[k] addresses each dataframe, where k is the dict key: the file name without its extension (Path.stem).
- From your last question, I expect each .csv file to be read in with one Date column, not two.
- Once Date is set as the index, the numeric data for each file should be in the column at position 0.
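The "one Date column, not two" point can be checked on a tiny in-memory CSV. This sketch uses hypothetical contents (a Date column and one price column named AAPL), with io.StringIO standing in for a real file. Assigning data.Date to the index, as in the question, leaves Date in the columns as well; index_col='Date' moves it into the index, so the price column ends up at position 0.

```python
import io

import pandas as pd

csv_text = "Date,AAPL\n2021-01-27,110\n2021-01-26,100\n"  # hypothetical file contents

# Question's approach: Date is copied into the index but also stays as a column
a = pd.read_csv(io.StringIO(csv_text))
a.index = a.Date
print(list(a.columns))  # ['Date', 'AAPL'] -> the price column is at position 1

# Answer's approach: Date is parsed and moved into the index in one step
b = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"], index_col="Date")
print(list(b.columns))  # ['AAPL'] -> the price column is at position 0
```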

- Since the column name is different for each file, it's better to use .iloc to address the column by position.
- In .iloc[:, 0], : selects all rows and 0 is the position of the numeric column.
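A short illustration of why positional access helps here: the two hypothetical dataframes below each hold one price column under a different label, yet the same .iloc[:, 0] expression selects it in both.

```python
import pandas as pd

# Two hypothetical dataframes whose single price column has a different name
df_dict = {
    "AAPL": pd.DataFrame({"AAPL": [110.0, 100.0]}),
    "MSFT": pd.DataFrame({"MSFT": [220.0, 200.0]}),
}

for k, df in df_dict.items():
    # ':' = all rows, 0 = first column by position, whatever its label is
    first_col = df.iloc[:, 0]
    print(k, first_col.name, first_col.tolist())
# AAPL AAPL [110.0, 100.0]
# MSFT MSFT [220.0, 200.0]
```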


- df_dict.keys() returns a view of all the keys (use list(df_dict.keys()) if you need an actual list).
- Access an individual dataframe with df_dict[key].
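A quick sketch of the difference between the keys view and a list, using a hypothetical two-entry dict of dataframes keyed by ticker:

```python
import pandas as pd

df_dict = {  # hypothetical dict of dataframes keyed by file stem
    "AAPL": pd.DataFrame({"AAPL": [110.0, 100.0]}),
    "MSFT": pd.DataFrame({"MSFT": [220.0, 200.0]}),
}

keys = df_dict.keys()   # a dict_keys view, not a list
print(list(keys))       # ['AAPL', 'MSFT']

aapl = df_dict["AAPL"]  # one dataframe, accessed by its key
print(aapl.shape)       # (2, 1)
```

The view stays in sync with the dict, so keys added later show up in it; list() takes a snapshot.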

import pandas as pd
from pathlib import Path

# create the path to the files
p = Path('c:/Users/<<user_name>>/Documents/stock_files')

# get all the files
files = p.glob('*.csv')

# create the dict of dataframes, keyed by file name without extension
df_dict = {f.stem: pd.read_csv(f, parse_dates=['Date'], index_col='Date') for f in files}

# apply calculations to each dataframe and update the dataframe
# since the stock data is in column 0 of each dataframe, use .iloc
for k, df in df_dict.items():
    df_dict[k]['Return %'] = df.iloc[:, 0].pct_change(-1)*100
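For testing the whole approach end to end, the same steps can be wrapped in a small function. This is a sketch under assumptions: load_returns is a hypothetical helper name, and the demo files and prices are made up (newest date first, as in the single-stock calculation). Because df in the loop is the same object stored in the dict, assigning df['Return %'] updates the dict's dataframe directly.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_returns(folder: Path) -> dict[str, pd.DataFrame]:
    """Hypothetical helper: read every CSV in folder and add a 'Return %' column."""
    df_dict = {
        f.stem: pd.read_csv(f, parse_dates=["Date"], index_col="Date")
        for f in sorted(folder.glob("*.csv"))
    }
    for df in df_dict.values():
        # df is the same object stored in the dict, so this updates it in place
        df["Return %"] = df.iloc[:, 0].pct_change(-1) * 100
    return df_dict


# Demo on a throwaway folder with two hypothetical files (newest date first)
with tempfile.TemporaryDirectory() as d:
    folder = Path(d)
    (folder / "AAPL.csv").write_text("Date,AAPL\n2021-01-27,110\n2021-01-26,100\n")
    (folder / "MSFT.csv").write_text("Date,MSFT\n2021-01-27,180\n2021-01-26,200\n")
    result = load_returns(folder)

print(result["AAPL"])
print(result["MSFT"])
```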
