Question
I am trying to use pandas in my code to reorganize the messages received from the IB TWS server.
The code is:
from ibapi.client import EClient
from ibapi.wrapper import EWrapper
from ibapi.contract import Contract

class MyWrapper(EWrapper):
    def nextValidId(self, orderId:int):
        print("Setting nextValidOrderId: %d", orderId)
        self.nextValidOrderId = orderId
        self.start()
    def historicalData(self, reqId, bar):
        print("HistoricalData. ", reqId, "Date:", bar.date, "Open:", bar.open, "High:", bar.high, "Low:", bar.low, "Close:", bar.close, "Volume:", bar.volume, "Average:", bar.average, "Count:", bar.barCount)
    def historicalDataUpdate(self, reqId, bar):
        print("HistoricalDataUpdate. ", reqId, "Date:", bar.date, "Open:", bar.open, "High:", bar.high, "Low:", bar.low, "Close:", bar.close, "Volume:", bar.volume, "Average:", bar.average, "Count:", bar.barCount)
    def error(self, reqId, errorCode, errorString):
        print("Error. Id: " , reqId, " Code: " , errorCode , " Msg: " , errorString)
    def start(self):
        queryTime = ""
        contract = Contract()
        contract.secType = "STK"
        contract.symbol = "NIO"
        contract.currency = "USD"
        contract.exchange = "SMART"
        app.reqHistoricalData(1, contract, queryTime, "1 D", "5 secs", "TRADES", 0, 1, True, [])

app = EClient(MyWrapper())
app.connect("127.0.0.1", 7496, clientId=123)
app.run()
This code retrieves historical data for a given stock, then keeps returning the most recent updates.
The problem I am facing is that the returned messages are formatted like this:
HistoricalDataUpdate. 1 Date: 20200708 08:31:00 Open: 14.17 High: 14.17 Low: 14.17 Close: 14.17 Volume: -1 Average: 14.15 Count: -1
What I am trying to get instead is the data in a tabular layout, such as:
HistoricalDataUpdate. 1  Date:              Open:  High:  Low:   Close:  Volume:  Average:  Count:
                         20200708 08:31:00  14.17  14.17  14.17  14.17   -1       14.15     -1
Help would be appreciated.
Answer 1:
- This is really an ETL (extract, transform, load) task.
- Each data element has the form Name:, so a regular expression can collect all the name tokens.
- With that list, extract each value into a dict, using the position of each token and of the token after it.
- Take the data label from the text before the first token.
- Finally, turn the dict into a pandas DataFrame.
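import re
import pandas as pd

text = "HistoricalDataUpdate. 1 Date: 20200708 08:31:00 Open: 14.17 High: 14.17 Low: 14.17 Close: 14.17 Volume: -1 Average: 14.15 Count: -1"
# every field name is a capitalised word followed by a colon, e.g. "Date:"
tokens = re.findall("([A-Z][a-z]*:)", text)
# each value is the text between the end of one token and the start of the next
record = {t: text[re.search(tokens[i], text).span(0)[1]:re.search(tokens[i+1], text).span(0)[0]]
          if i+1 < len(tokens)
          else text[re.search(tokens[i], text).span(0)[1]:]
          for i, t in enumerate(tokens)}
# the label is everything before the first token
record = {"label": text[:re.search(tokens[0], text).span(0)[0]], **record}
df = pd.DataFrame([record])
df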
Output:
   label                    Date:              Open:  High:  Low:   Close:  Volume:  Average:  Count:
0  HistoricalDataUpdate. 1  20200708 08:31:00  14.17  14.17  14.17  14.17   -1       14.15     -1
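The same idea can be extended to several messages at once. The following is only a sketch under a few assumptions: the helper name parse_line and the FIELD pattern are made up for illustration, the label before the first token is dropped, and the second sample line contains invented values. It strips the colons from the column names and converts the values to numbers.

```python
import re
import pandas as pd

# "Name: value" pairs; a value runs until the next "Name:" or the end of the line.
FIELD = re.compile(r"([A-Z][a-z]*):\s*(.*?)\s*(?=[A-Z][a-z]*:|$)")

def parse_line(line):
    """Turn one printed message into a dict of field name -> string value."""
    return dict(FIELD.findall(line))

lines = [
    # message from the question
    "HistoricalDataUpdate. 1 Date: 20200708 08:31:00 Open: 14.17 High: 14.17 "
    "Low: 14.17 Close: 14.17 Volume: -1 Average: 14.15 Count: -1",
    # second message is made up for illustration only
    "HistoricalDataUpdate. 1 Date: 20200708 08:31:05 Open: 14.17 High: 14.18 "
    "Low: 14.16 Close: 14.18 Volume: -1 Average: 14.17 Count: -1",
]

df = pd.DataFrame([parse_line(line) for line in lines])
# The date format is given explicitly rather than left to inference.
df["Date"] = pd.to_datetime(df["Date"], format="%Y%m%d %H:%M:%S")
# Every remaining column holds numeric text, so convert column by column.
df = df.set_index("Date").apply(pd.to_numeric)
print(df)
```

With a datetime index and numeric columns, the frame can be sliced by time or plotted directly, which the string-only version does not allow.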
Answer 2:
The callback gives you an ibapi.common.BarData object; calling vars() on it returns a dict like {date: ..., open: 123, ...}.
pandas can build a DataFrame from a list of dicts, so store the dicts in a list.
You may want the date as the index. pandas can do that as well and, surprisingly, it can read the date format.
When you are done, you can save the data to a CSV file.
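The pandas part can be tried on its own, without a TWS connection. This is a minimal sketch: the dicts stand in for what vars(bar) would return (only a subset of the keys, with values taken from the sample output further down), and the CSV is written to an in-memory buffer instead of a file so nothing is left on disk.

```python
import io
import pandas as pd

# Stand-ins for vars(bar); a real BarData object has more fields than these.
bars = [
    {"date": "20200708 14:24:00", "open": 107.231, "high": 107.236, "low": 107.231, "close": 107.233},
    {"date": "20200708 14:25:00", "open": 107.233, "high": 107.234, "low": 107.230, "close": 107.232},
]

df = pd.DataFrame(bars)  # one row per dict, one column per key
df["date"] = pd.to_datetime(df["date"], format="%Y%m%d %H:%M:%S")
df = df.set_index("date")  # the timestamp becomes the row label

buffer = io.StringIO()
df.to_csv(buffer)  # pass a file name here instead to write to disk
csv_text = buffer.getvalue()
print(csv_text)
```

The full program below applies the same steps inside the EWrapper callbacks.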
from ibapi.client import EClient
from ibapi.wrapper import EWrapper
from ibapi.contract import Contract
import pandas as pd

class MyWrapper(EWrapper):
    def __init__(self):
        super().__init__()
        self.data = []   # bars collected by historicalData
        self.df = None   # built in historicalDataEnd
    def nextValidId(self, orderId:int):
        print("Setting nextValidOrderId: %d" % orderId)
        self.nextValidOrderId = orderId
        self.start()
    def historicalData(self, reqId, bar):
        self.data.append(vars(bar))
    def historicalDataUpdate(self, reqId, bar):
        line = vars(bar)
        # pop date and make it the index, add rest to df
        # will overwrite last bar at that same time
        self.df.loc[pd.to_datetime(line.pop('date'))] = line
    def historicalDataEnd(self, reqId: int, start: str, end: str):
        print("HistoricalDataEnd. ReqId:", reqId, "from", start, "to", end)
        self.df = pd.DataFrame(self.data)
        self.df['date'] = pd.to_datetime(self.df['date'])
        self.df.set_index('date', inplace=True)
    def error(self, reqId, errorCode, errorString):
        print("Error. Id: " , reqId, " Code: " , errorCode , " Msg: " , errorString)
    def start(self):
        queryTime = ""
        # so everyone can get data use fx
        fx = Contract()
        fx.secType = "CASH"
        fx.symbol = "USD"
        fx.currency = "JPY"
        fx.exchange = "IDEALPRO"
        # setting update to 1 minute still sends an update every tick? but timestamps are 1 min
        # I don't think keepUpToDate sends a realtimeBar every 5 secs, just updates the last bar.
        app.reqHistoricalData(1, fx, queryTime, "1 D", "1 min", "MIDPOINT", 0, 1, True, [])

wrap = MyWrapper()
app = EClient(wrap)
app.connect("127.0.0.1", 7497, clientId=123)
# I just use this in jupyter so I can interact with df
import threading
threading.Thread(target = app.run).start()
# this isn't needed in jupyter, just run another cell
import time
time.sleep(300) # in 5 minutes check the df and close
print(wrap.df)
wrap.df.to_csv("myfile.csv") # save in file
app.disconnect()
# in jupyter to show plot (%matplotlib is a notebook magic, not plain Python)
%matplotlib inline
wrap.df.close.plot()
I use a Jupyter notebook, so I added threading to keep the notebook interactive while the client runs.
Here is some output. The first thing printed comes from historicalDataEnd, once the initial data has been received. A DataFrame is built from the bars' variables with a datetime index, so later bars can be added by time.
HistoricalDataEnd. ReqId: 1 from 20200707 14:23:19 to 20200708 14:23:19
Then, after 300 seconds, I print the DataFrame. Check that the OHLC values are logical, and notice a new bar every minute. I assume the 14:28 bar covers only its first 19 seconds, since my five minutes (300 seconds) started at 14:23:19. This is exactly the behaviour you would want and expect for keeping a chart up to date.
2020-07-08 14:24:00 107.231 107.236 107.231 107.233 -1 -1
2020-07-08 14:25:00 107.233 107.234 107.23 107.232 -1 -1
2020-07-08 14:26:00 107.232 107.232 107.225 107.232 -1 -1
2020-07-08 14:27:00 107.232 107.239 107.231 107.239 -1 -1
2020-07-08 14:28:00 107.239 107.239 107.236 107.236 -1 -1
You can see that it gets all the bars (only the close is shown in the graph) and keeps them up to date.
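The way updates overwrite the last bar can be reproduced without TWS. This is only a sketch: apply_update is a hypothetical helper that mimics what historicalDataUpdate does with .loc, the bars are plain dicts rather than BarData objects, and the intermediate 14:28 update uses invented values.

```python
import pandas as pd

# Frame as it might look after historicalDataEnd, holding the 14:27 bar.
df = pd.DataFrame(
    {"open": [107.232], "high": [107.239], "low": [107.231], "close": [107.239]},
    index=pd.to_datetime(["2020-07-08 14:27:00"]),
)

def apply_update(frame, bar):
    """Write one bar (a dict with a 'date' key) into frame, keyed by its timestamp."""
    row = dict(bar)  # copy so the caller's dict is left untouched
    stamp = pd.to_datetime(row.pop("date"), format="%Y%m%d %H:%M:%S")
    # .loc adds a row when the timestamp is new and overwrites it otherwise
    frame.loc[stamp] = pd.Series(row)

# First update of the 14:28 bar (made-up intermediate values): adds a row.
apply_update(df, {"date": "20200708 14:28:00", "open": 107.239, "high": 107.239, "low": 107.237, "close": 107.237})
# Later update with the same timestamp: replaces that row instead of appending.
apply_update(df, {"date": "20200708 14:28:00", "open": 107.239, "high": 107.239, "low": 107.236, "close": 107.236})
print(df)
```

Because the timestamp is the index, repeated updates for the bar still in progress never create duplicate rows; a new row appears only when the next minute starts.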
Source: https://stackoverflow.com/questions/62794972/pandas-dataframe-and-series-ib-tws-historicaldata