问题
I am trying to pass time-series data from Python
to q/kdb+
.
One solution out there is qPython module, offering seamless conversion from q
table/dictionary to Pandas.
The problem is when trying to pass from Pandas to q
, the time index in DataFrame
(in the column Date
) doesn't quite make it to the q
side. Reproducible code:
import pandas.io.data as web
import datetime
import numpy
import qpython.qconnection as qconnection # requires installation of qPython module from https://github.com/exxeleron/qPython
start = datetime.datetime(2010, 1, 1)
end = datetime.datetime(2015, 2, 6)
f=web.DataReader("F", 'yahoo', start, end) # download Ford stock data (ticker "F") from Yahoo Finance web service
f.ix[:5] # explore first 5 rows of the DataFrame
# Out:
# Open High Low Close Volume Adj Close
# Date
# 2010-01-04 10.17 10.28 10.05 10.28 60855800 9.43
# 2010-01-05 10.45 11.24 10.40 10.96 215620200 10.05
# 2010-01-06 11.21 11.46 11.13 11.37 200070600 10.43
# 2010-01-07 11.46 11.69 11.32 11.66 130201700 10.69
# 2010-01-08 11.67 11.74 11.46 11.69 130463000 10.72
q = qconnection.QConnection(host = 'localhost', port = 5000, pandas = True) # define connection interface parameters. Assumes we have previously started q server on port 5000 with `q.exe -p 5000` command
q.open() # open connection
q('set', numpy.string_('yahoo'), f) # pass DataFrame to q table named `yahoo`
q('5#yahoo') # display top 5 rows from newly created table on q server
# Out:
# Open High Low Close Volume Adj Close
# 0 10.17 10.28 10.05 10.28 60855800 9.43
# 1 10.45 11.24 10.40 10.96 215620200 10.05
# 2 11.21 11.46 11.13 11.37 200070600 10.43
# 3 11.46 11.69 11.32 11.66 130201700 10.69
# 4 11.67 11.74 11.46 11.69 130463000 10.72
As you can see, the q table doesn't have the Date
column that was present in f
DataFrame as index.
How to efficiently (for large data) pass the datetime index to q?
回答1:
While serializing DataFrame
objects the qPython
checks for the presence of meta
attribute. If the attribute is not present, DataFrame
is serialized as q table and index columns are skipped in the process. If you want to preserve the index columns, you have to set the meta
attribute and provide type hinting to enforce representation a q keyed table.
Please take a look at the modified sample:
import pandas.io.data as web
import datetime
import numpy
import qpython.qconnection as qconnection # requires installation of qPython module from https://github.com/exxeleron/qPython
from qpython import MetaData
from qpython.qtype import QKEYED_TABLE
start = datetime.datetime(2010, 1, 1)
end = datetime.datetime(2015, 2, 6)
f=web.DataReader("F", 'yahoo', start, end) # download Ford stock data (ticker "F") from Yahoo Finance web service
f.ix[:5] # explore first 5 rows of the DataFrame
# Out:
# Open High Low Close Volume Adj Close
# Date
# 2010-01-04 10.17 10.28 10.05 10.28 60855800 9.43
# 2010-01-05 10.45 11.24 10.40 10.96 215620200 10.05
# 2010-01-06 11.21 11.46 11.13 11.37 200070600 10.43
# 2010-01-07 11.46 11.69 11.32 11.66 130201700 10.69
# 2010-01-08 11.67 11.74 11.46 11.69 130463000 10.72
q = qconnection.QConnection(host = 'localhost', port = 5000, pandas = True) # define connection interface parameters. Assumes we have previously started q server on port 5000 with `q.exe -p 5000` command
q.open() # open connection
f.meta = MetaData(**{'qtype': QKEYED_TABLE}) # enforce to serialize DataFrame as keyed table
q('set', numpy.string_('yahoo'), f) # pass DataFrame to q table named `yahoo`
q('5#yahoo') # display top 5 rows from newly created table on q server
# Out:
# Open High Low Close Volume Adj Close
# Date
# 2010-01-04 10.17 10.28 10.05 10.28 60855800 9.43
# 2010-01-05 10.45 11.24 10.40 10.96 215620200 10.05
# 2010-01-06 11.21 11.46 11.13 11.37 200070600 10.43
# 2010-01-07 11.46 11.69 11.32 11.66 130201700 10.69
# 2010-01-08 11.67 11.74 11.46 11.69 130463000 10.72
回答2:
Have you considered trying one of the other APIs? http://www.timestored.com/kdb-guides/python-api I would also report the issue on their github page.
来源:https://stackoverflow.com/questions/28385137/pandas-dataframe-drops-index-when-passing-to-kdb-using-qpython-api