How to read in multiple files as separate dataframes and perform calculations on a column?

后悔当初 2021-01-27 18:47

I am calculating the return for a single stock as follows:

data = pd.read_csv(r'**file**.csv')
data.index = data.Date
data['Return %'] = data['AAPL'].pct_change(-1)*


        
1 answer
  • 2021-01-27 19:25
    • I think the best option for your data is to read the files into a dictionary of dataframes.
      • Use pathlib and .glob to find all the files (.glob returns a generator of paths, not a list).
      • Use a dict comprehension to create the dict of dataframes.
    • The dictionary can be iterated over in the standard way of dictionaries, with dict.items().
    • df_dict[k] addresses each dataframe, where k is the dict key, which is the file name.
    • From your last question, I expect the .csv file to be read in with one Date column, not two.
    • The numeric data for each file should be in the column at index 0, after Date is set as the index.
      • Since the column name is different for each file, it's better to use .iloc to address the column.
      • : means all rows and 0 is the column index for the numeric data.
    • df_dict.keys() returns a view of all the keys; wrap it in list() if you need an actual list.
    • Individually access a dataframe with df_dict[key].
    import pandas as pd
    from pathlib import Path
    
    # create the path to the files
    p = Path('c:/Users/<<user_name>>/Documents/stock_files')
    
    # get all the files
    files = p.glob('*.csv')
    
    # create the dict of dataframes
    df_dict = {f.stem: pd.read_csv(f, parse_dates=['Date'], index_col='Date') for f in files}
    
    # apply calculations to each dataframe and update the dataframe
    # since the stock data is in column 0 of each dataframe, use .iloc
    for k, df in df_dict.items():
        df_dict[k]['Return %'] = df.iloc[:, 0].pct_change(-1)*100
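
To try the pattern above without touching real files, here is a minimal, self-contained sketch. The two sample files, their tickers, and their prices are made up for illustration, and it assumes each file has a Date column followed by one price column, with the newest date first. Under that ordering, pct_change(-1) compares each row with the row below it, i.e. with the previous trading day.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical sample data (not from the question): newest date first.
samples = {
    "AAPL": "Date,AAPL\n2021-01-27,110.0\n2021-01-26,100.0\n2021-01-25,80.0\n",
    "MSFT": "Date,MSFT\n2021-01-27,210.0\n2021-01-26,200.0\n2021-01-25,250.0\n",
}

with tempfile.TemporaryDirectory() as tmp:
    p = Path(tmp)

    # write the sample files so there is something to glob
    for name, text in samples.items():
        (p / f"{name}.csv").write_text(text)

    # one dataframe per file, keyed by the file name without extension
    df_dict = {
        f.stem: pd.read_csv(f, parse_dates=["Date"], index_col="Date")
        for f in sorted(p.glob("*.csv"))
    }

# column 0 holds the prices in every file, whatever its name is
for name, df in df_dict.items():
    # periods=-1: (this row / next row - 1), then scaled to percent
    df["Return %"] = df.iloc[:, 0].pct_change(-1) * 100

print(list(df_dict))    # the dict keys are the file stems
print(df_dict["AAPL"])  # access one dataframe by key
```

With these made-up prices, the AAPL returns come out at roughly 10 and 25 for the first two rows, and NaN for the last row because there is no later row to compare against. If your files are sorted oldest date first, pct_change() with the default period is probably what you want instead.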
    