Question
I have a DataFrame that contains stock data with the following columns:
time ticker price
0 2020-04-02 09:30:35 EV 33.860
1 2020-04-02 09:00:00 AMG 60.430
2 2020-04-02 09:30:35 AMG 60.750
3 2020-04-02 09:00:00 BLK 455.350
4 2020-04-02 09:30:35 BLK 451.514
... ... ... ...
502596 2020-04-02 13:00:56 TLT 166.450
502597 2020-04-02 13:00:56 VXX 47.150
502598 2020-04-02 13:00:56 TSLA 529.800
502599 2020-04-02 13:00:56 BIDU 103.500
502600 2020-04-02 13:00:56 ON 12.700
It's stock price data sampled every 20 seconds for several hundred stocks. I have converted the time column to datetime format.
From here, I need to split the data into 5-minute intervals and then plot it, which I've done using:
out = df.groupby("ticker")\
        .apply(lambda x: x.set_index("time")\
                          .resample("5min")\
                          .first()\
                          .reset_index())\
        .reset_index(drop=True)
dffinal = out.dropna(axis=0)

def plot_tick(data, ticker):
    ts = data[data["ticker"] == ticker].reset_index(drop=True)
    ts.plot(x="time", y="price", title=ticker, figsize=(20, 20), kind="line")

plot_tick(dffinal, "A")
The chart comes out well, but the problem is that I need the open, high, low, and close prices for each 5-minute interval for each stock. I need these to make a candlestick chart, which I can easily do once I have open, high, low, and close columns.
The open and close are the prices at the beginning and end of the 5-minute period, respectively. The high and low are the highest and lowest prices during the interval, respectively.
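To make these definitions concrete, here is a minimal sketch for a single interval. The prices in `window_prices` are hypothetical values chosen for illustration, not taken from the real data, and are assumed to be listed in time order.

```python
# Hypothetical prices observed within one 5-minute window, in time order.
window_prices = [72.67, 72.80, 72.55, 72.61]

bar = {
    "open": window_prices[0],    # first price in the window
    "close": window_prices[-1],  # last price in the window
    "high": max(window_prices),  # highest price in the window
    "low": min(window_prices),   # lowest price in the window
}
print(bar)
```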
So I'm looking for an end result like this:
time ticker price open close high low
0 2020-04-02 09:00:00 A 72.6700
6 2020-04-02 09:30:00 A 72.1400
7 2020-04-02 09:35:00 A 72.5400
8 2020-04-02 09:40:00 A 72.4000
9 2020-04-02 09:45:00 A 72.3338
... ... ... ...
38895 2020-04-02 12:40:00 ZUMZ 17.6000
38896 2020-04-02 12:45:00 ZUMZ 17.6300
38897 2020-04-02 12:50:00 ZUMZ 17.6000
38898 2020-04-02 12:55:00 ZUMZ 17.7400
38899 2020-04-02 13:00:00 ZUMZ 17.560
with the open, close, high, and low columns filled in, obviously.
Answer 1:
IIUC, in the groupby you can group by 'ticker' and also use pd.Grouper on 'time' with a frequency of 5 minutes. In the agg method, you can use pd.NamedAgg (available since pandas 0.25) with first for open, last for close, max for high, and min for low.
import numpy as np
import pandas as pd

# dummy data: two tickers, one price every 20 seconds
np.random.seed(0)
df = pd.DataFrame({'time': pd.date_range('2020-04-01 9:30:00', freq='20s', periods=50).tolist()*2,
                   'ticker': ['ticker1']*50 + ['ticker2']*50,
                   'price': np.random.randint(30, 50, 100)})

# group by ticker and 5-minute bin, aggregate, then reset_index
df_f = df.groupby(['ticker', pd.Grouper(key='time', freq='5min')])\
         .agg(open=pd.NamedAgg(column='price', aggfunc='first'),
              close=pd.NamedAgg(column='price', aggfunc='last'),
              high=pd.NamedAgg(column='price', aggfunc='max'),
              low=pd.NamedAgg(column='price', aggfunc='min'))\
         .reset_index()
print(df_f)
    ticker                time  open  close  high  low
0  ticker1 2020-04-01 09:30:00    42     37    49   30
1  ticker1 2020-04-01 09:35:00    44     33    49   30
2  ticker1 2020-04-01 09:40:00    47     32    49   30
3  ticker1 2020-04-01 09:45:00    30     36    36   30
4  ticker2 2020-04-01 09:30:00    38     48    48   31
5  ticker2 2020-04-01 09:35:00    30     44    45   30
6  ticker2 2020-04-01 09:40:00    45     34    48   30
7  ticker2 2020-04-01 09:45:00    32     40    46   32
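As a possible shortcut (a different route from the NamedAgg call above, not part of the original answer), pandas groupby objects also have an ohlc() method that computes the four columns in one step. The sketch below assumes synthetic data of the same shape as the dummy frame. Its random values are made up and will not match the printed table, and the resulting column order is open, high, low, close.

```python
import numpy as np
import pandas as pd

# Synthetic data (made-up values): two tickers, one price every 20 seconds.
rng = np.random.default_rng(0)
times = pd.date_range("2020-04-01 09:30:00", freq="20s", periods=50)
df = pd.DataFrame({
    "time": times.tolist() * 2,
    "ticker": ["ticker1"] * 50 + ["ticker2"] * 50,
    "price": rng.integers(30, 50, 100),
})

# Group by ticker and 5-minute bin, then let ohlc() build all four columns.
ohlc = (
    df.groupby(["ticker", pd.Grouper(key="time", freq="5min")])["price"]
      .ohlc()
      .reset_index()
)
print(ohlc)
```

The explicit NamedAgg form may still be preferable if you want to control the column names or order, or to add other aggregates such as volume.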
To plot a candlestick chart with plotly, you can do:
import plotly.figure_factory

def plot_tick(data, ticker):
    ts = data[data["ticker"] == ticker].reset_index(drop=True)
    fig = plotly.figure_factory.create_candlestick(ts.open, ts.high, ts.low,
                                                   ts.close, dates=ts.time)
    fig.show()

plot_tick(df_f, 'ticker1')
Source: https://stackoverflow.com/questions/60999601/getting-open-high-low-close-for-5-min-stock-data-python