How to add a column in multilevel Dataframe using pandas and yfinance?

南楼画角 submitted on 2020-08-09 08:45:26

Question


I have the below code:

import yfinance as yf
import pandas as pd
import datetime as dt
end=dt.datetime.today()
start=end-dt.timedelta(59)
tickers=['WBA', 'HD']
ohlcv={}
df=pd.DataFrame
df = yf.download(tickers,group_by=tickers,start=start,end=end,interval='5m')

df['h-l']=abs(df.High-df.Low)
df['h-pc']=abs (df.High-df['Adj Close'].shift(1))
df['l-pc']=abs(df.Low-df['Adj Close'].shift(1))
df['tr']=df[['h-l','h-pc','l-pc']].max(axis=1)
df['atr']=df['tr'].rolling(window=n, min_periods=n).mean()

When I run it, I get the following error:

return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'High'

I tried using this code:

df = df.stack(level=0).rename_axis(['Date', 'Ticker']).reset_index(level=1)

but the resulting report contains calculation errors, because the rows of the different tickers are not kept separate.

What I actually need is, for each ticker in the tickers list, a column called "h-l" that subtracts the low of each row from the high of that row, and likewise for the other columns.


Answer 1:


Option 1: Multi-Level Column Names

  • Multi-level columns are accessed by passing a tuple
    • df[('WBA', 'High')]
  • Package versions used
    • print(pd.__version__) at least '1.0.5'
    • print(yf.__version__) is '0.1.54'
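
The tuple access described above can be tried offline. The following is a minimal sketch on a small synthetic frame (the name demo and all values are made up for illustration, not downloaded data), assuming the columns are laid out with the ticker on level 0 and the price field on level 1:

```python
import pandas as pd

# Synthetic stand-in for a ticker-grouped download:
# level 0 = ticker, level 1 = price field. Values are made up.
columns = pd.MultiIndex.from_product([['HD', 'WBA'], ['High', 'Low']])
demo = pd.DataFrame(
    [[253.96, 252.36, 46.07, 45.49],
     [254.34, 253.22, 46.33, 46.04]],
    columns=columns,
)

# 'High' is not a top-level column label, so df.High cannot resolve
has_high_attr = hasattr(demo, 'High')

# A (ticker, field) tuple selects exactly one column as a Series
wba_high = demo[('WBA', 'High')]

# A level-0 label on its own selects that ticker's sub-frame
wba = demo['WBA']
print(has_high_attr, list(wba.columns))
```

This is why the attribute access in the question raises AttributeError: only the ticker names exist at the top level of the columns.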
import yfinance as yf
import pandas as pd
from datetime import datetime, timedelta

end = datetime.today()
start = end - timedelta(59)
tickers = ['WBA', 'HD']

# group_by='ticker' puts the ticker names on column level 0
df = yf.download(tickers, group_by='ticker', start=start, end=end, interval='5m')

# iterate over level 0 ticker names
for ticker in tickers:
    df[(ticker, 'h-l')] = abs(df[(ticker, 'High')] - df[(ticker, 'Low')])
    df[(ticker, 'h-pc')] = abs(df[(ticker, 'High')] - df[(ticker, 'Adj Close')].shift(1))
    df[(ticker, 'l-pc')] = abs(df[(ticker, 'Low')] - df[(ticker, 'Adj Close')].shift(1))
    df[(ticker, 'tr')] = df[[(ticker, 'h-l'), (ticker, 'h-pc'), (ticker, 'l-pc')]].max(axis=1)
#     df[(ticker, 'atr')] = df[(ticker, 'tr')].rolling(window=n, min_periods=n).mean()  # not included because n is not defined

# sort the columns
df = df.reindex(sorted(df.columns), axis=1)

# display(df.head())
                                   HD                                                                                                          WBA                                                                                              
                            Adj Close       Close        High         Low        Open    Volume       h-l      h-pc      l-pc        tr  Adj Close      Close       High        Low       Open    Volume       h-l      h-pc      l-pc        tr
Datetime                                                                                                                                                                                                                                        
2020-06-08 09:30:00-04:00  253.937500  253.937500  253.960007  252.360001  252.490005  210260.0  1.600006       NaN       NaN  1.600006  46.049999  46.049999  46.070000  45.490002  45.490002  239860.0  0.579998       NaN       NaN  0.579998
2020-06-08 09:35:00-04:00  253.470001  253.470001  254.339996  253.220093  253.990005   95906.0  1.119904  0.402496  0.717407  1.119904  46.330002  46.330002  46.330002  46.040001  46.070000  104259.0  0.290001  0.280003  0.009998  0.290001
2020-06-08 09:40:00-04:00  253.580002  253.580002  253.829895  252.955002  253.429993   55868.0  0.874893  0.359894  0.514999  0.874893  46.610001  46.610001  46.660000  46.240002  46.330002  113174.0  0.419998  0.329998  0.090000  0.419998
2020-06-08 09:45:00-04:00  253.740005  253.740005  253.929993  253.289993  253.529999   61892.0  0.639999  0.349991  0.290009  0.639999  46.880001  46.880001  46.950001  46.624100  46.624100  121388.0  0.325901  0.340000  0.014099  0.340000
2020-06-08 09:50:00-04:00  253.703400  253.703400  253.910004  253.419998  253.740005   60809.0  0.490005  0.169998  0.320007  0.490005  46.919998  46.919998  46.990002  46.820000  46.880001  154239.0  0.170002  0.110001  0.060001  0.170002
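
The atr column is left out above because the question never defines n. As a rough sketch of how it could be filled in, the example below repeats the per-ticker loop on synthetic prices with an assumed window of n = 3. Both the value of n and the price data are hypothetical and chosen only so the example runs offline.

```python
import numpy as np
import pandas as pd

n = 3  # hypothetical window length; the question does not define n

tickers = ['HD', 'WBA']
idx = pd.date_range('2020-06-08 09:30', periods=6, freq='5min')
moves = np.array([0.0, 0.4, -0.2, 0.3, 0.9, 0.5])  # made-up price path

# Build a frame with ticker on column level 0 and price field on level 1
frames = {}
for base, ticker in zip([250.0, 46.0], tickers):
    close = base + moves
    frames[ticker] = pd.DataFrame(
        {'Adj Close': close, 'High': close + 0.5, 'Low': close - 0.5},
        index=idx,
    )
demo = pd.concat(frames, axis=1)

for ticker in tickers:
    prev_close = demo[(ticker, 'Adj Close')].shift(1)
    demo[(ticker, 'h-l')] = (demo[(ticker, 'High')] - demo[(ticker, 'Low')]).abs()
    demo[(ticker, 'h-pc')] = (demo[(ticker, 'High')] - prev_close).abs()
    demo[(ticker, 'l-pc')] = (demo[(ticker, 'Low')] - prev_close).abs()
    demo[(ticker, 'tr')] = demo[[(ticker, 'h-l'), (ticker, 'h-pc'), (ticker, 'l-pc')]].max(axis=1)
    # rolling mean of the true range; the first n - 1 rows stay NaN
    demo[(ticker, 'atr')] = demo[(ticker, 'tr')].rolling(window=n, min_periods=n).mean()

# sort the columns so each ticker's fields sit together
demo = demo.reindex(sorted(demo.columns), axis=1)
print(demo['WBA'][['tr', 'atr']])
```

Because every calculation is indexed by (ticker, field), the shift and the rolling mean never mix rows from different tickers.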

Option 2: Single-Level Column Names

  • As demonstrated in "How to deal with multi-level column names downloaded with yfinance?", it's easier to deal with single-level column names.
  • With the tickers in a column instead of in multi-level column headers, use pandas.DataFrame.groupby on the Ticker column.
import yfinance as yf
import pandas as pd
from datetime import datetime, timedelta

end = datetime.today()
start = end - timedelta(59)
tickers = ['WBA', 'HD']
df = yf.download(tickers, group_by='ticker', start=start, end=end, interval='5m')

# create single level column names
df = df.stack(level=0).rename_axis(['Date', 'Ticker']).reset_index(level=1)

# function with calculations
def my_calculations(df):
    df['h-l']=abs(df.High-df.Low)
    df['h-pc']=abs(df.High-df['Adj Close'].shift(1))
    df['l-pc']=abs(df.Low-df['Adj Close'].shift(1))
    df['tr']=df[['h-l','h-pc','l-pc']].max(axis=1)
#     df['atr']=df['tr'].rolling(window=n, min_periods=n).mean()  # n is not defined in the question
    return df

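# apply the function to each ticker separately
# group_keys=False keeps 'Ticker' out of the result index, so sorting by the 'Ticker' column is unambiguous
df_updated = df.reset_index().groupby('Ticker', group_keys=False).apply(my_calculations).sort_values(['Ticker', 'Date'])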
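
The "no separation between the tickers" problem from the question comes from shifting the previous close across the whole stacked table, so the first row of one ticker is compared with the last close of another. As an alternative sketch to the apply-based function (not the method used in the answer), the shift can be done per group directly. The frame long_df and its values are made up for illustration:

```python
import pandas as pd

# Long-format table: one row per (Date, Ticker). Values are made up.
long_df = pd.DataFrame({
    'Date': pd.to_datetime(['2020-06-08 09:30', '2020-06-08 09:35'] * 2),
    'Ticker': ['HD', 'HD', 'WBA', 'WBA'],
    'High': [253.96, 254.34, 46.07, 46.33],
    'Low': [252.36, 253.22, 45.49, 46.04],
    'Adj Close': [253.94, 253.47, 46.05, 46.33],
})
long_df = long_df.sort_values(['Ticker', 'Date']).reset_index(drop=True)

# Shift within each ticker, so HD's last close never reaches WBA's first row
prev_close = long_df.groupby('Ticker')['Adj Close'].shift(1)

long_df['h-l'] = (long_df['High'] - long_df['Low']).abs()
long_df['h-pc'] = (long_df['High'] - prev_close).abs()
long_df['l-pc'] = (long_df['Low'] - prev_close).abs()
long_df['tr'] = long_df[['h-l', 'h-pc', 'l-pc']].max(axis=1)
print(long_df[['Ticker', 'h-pc', 'tr']])
```

The first row of each ticker has NaN in h-pc and l-pc, which shows that no previous close was borrowed from a different ticker.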


Source: https://stackoverflow.com/questions/63262472/how-to-add-a-column-in-multilevel-dataframe-using-pandas-and-yfinance
