Question
The highfrequency package is designed to transform .txt and .csv files from the NYSE TAQ and WRDS TAQ databases, respectively, into .RData files of xts objects, which can then be easily manipulated through the package.
The problem is that I have limited access to the WRDS database: I can only download tick data from the CRSP (Center for Research in Security Prices) database, not from the TAQ (Trades and Quotes) database. So my data look like this. The downloadable file contains tick data for the REIT index from 2014-01-01 to 2014-01-05. I manually renamed the ticker column header to PRICE, as suggested by Kris Boudt, one of the package's main authors.
The code that I use is the following:
from = "2014-03-01"
to = "2014-04-31"
datasource = "C:/Users/aris/Desktop/raw_data"
datadestination = "C:/Users/aris/Desktop/xts_data"
convert(from = from, to = to, datasource = datasource, datadestination = datadestination,
        trades = TRUE, quotes = FALSE, ticker = "REIT", dir = FALSE, extension = "csv", header = TRUE,
        tradecolnames = NULL, quotecolnames = NULL, format = "%Y%m%d %H:%M:%S", onefile = TRUE)
I suspect that the problem lies in the argument format = "%Y%m%d %H:%M:%S", because in the .csv file the date and the time are comma-separated. I tried putting a comma between %d and %H, like this: format = "%Y%m%d,%H:%M:%S", but it made no difference.
The error reads
Error in `$<-.data.frame`(`*tmp*`, "COND", value = numeric(0)) :
replacement has 0 rows, data has 1048575
All suggestions are welcome.
Answer 1:
Thanks to Joshua Ulrich, I was able to gain some additional intuition and solve the problem(s). There is actually no need to edit the .csv file itself and add extra columns. Instead of setting tradecolnames = NULL, you tell the function which columns your file contains by setting tradecolnames = c("DATE","TIME","PRICE"). The problem with the non-existent directories is fixed by setting dir=TRUE. The final code looks like this:
from = "2014-03-01"
to = "2014-04-31"
datasource = "C:/Users/aris/Desktop/raw_data"
datadestination = "C:/Users/aris/Desktop/xts_data"
convert(from, to, datasource, datadestination,
        trades = TRUE, quotes = FALSE, ticker = "REIT", dir = TRUE, extension = "csv", header = TRUE,
        tradecolnames = c("DATE", "TIME", "PRICE"), format = "%Y%m%d %H:%M:%S", onefile = TRUE)
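A note on why the format string keeps a space even though DATE and TIME sit in separate .csv columns: one plausible reading, assumed here rather than confirmed from the package source, is that the two columns are joined with a space before the timestamp is parsed. The base R sketch below illustrates that idea with one row from the sample data; the variable names are made up for the illustration.

```r
# One row of the sample file, already split into columns by the CSV reader.
date_col <- "20140102"
time_col <- "9:30:00"

# Assumption: DATE and TIME are pasted together with a space before parsing.
stamp <- paste(date_col, time_col)

# The space-separated format matches the pasted string.
parsed_space <- as.POSIXct(stamp, format = "%Y%m%d %H:%M:%S", tz = "GMT")

# A comma in the format does not match the space, so parsing yields NA.
parsed_comma <- as.POSIXct(stamp, format = "%Y%m%d,%H:%M:%S", tz = "GMT")

print(parsed_space)
print(is.na(parsed_comma))
```

If that assumption holds, the comma in the raw file never reaches the parser, which would explain why changing the format string did not help.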
Answer 2:
The highfrequency::convert function calls highfrequency:::makeXtsTrades, which expects the following columns in your text file: DATE, TIME, PRICE, SIZE, SYMBOL, EX, COND, CORR, G127.
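The error in the question can be reproduced in plain base R. The sketch below does not call the package; it only assumes that some step converts or overwrites a COND column that the input never had, which is one way to arrive at the same message. The toy data frame is made up from the sample rows.

```r
# Toy trades table with only the three columns present in the CRSP file.
trades <- data.frame(
  DATE  = c("20140102", "20140102"),
  TIME  = c("9:30:00", "9:30:01"),
  PRICE = c(1123.77, 1122.81),
  stringsAsFactors = FALSE
)

# trades$COND is NULL, so as.numeric() returns numeric(0); assigning a
# zero-length vector to a column of a two-row data frame raises an error.
msg <- tryCatch(
  {
    trades$COND <- as.numeric(trades$COND)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

The message has the same shape as the one in the question, with the row count of the toy table in place of 1048575.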
I added empty columns to your text file, and did not get the error in your question. The edited text file looks like:
DATE,TIME,PRICE,SIZE,SYMBOL,EX,COND,CORR,G127
20140102,9:30:00,1123.77,,,,,,
20140102,9:30:01,1122.81,,,,,,
20140102,9:30:02,1122.77,,,,,,
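Adding the empty columns by hand gets tedious for large files, so the same padding can be sketched in base R. The helper name pad_trade_columns is hypothetical and not part of highfrequency; the column list is the one given above, and the inline text stands in for the real .csv file.

```r
# Columns named in this answer as expected in the trades file.
expected_cols <- c("DATE", "TIME", "PRICE", "SIZE", "SYMBOL",
                   "EX", "COND", "CORR", "G127")

# Hypothetical helper: add any missing column as NA and reorder the columns.
pad_trade_columns <- function(trades, cols = expected_cols) {
  for (col in setdiff(cols, names(trades))) {
    trades[[col]] <- rep(NA, nrow(trades))
  }
  trades[, cols, drop = FALSE]
}

# Stand-in for read.csv("<your file>.csv"): the three sample rows.
raw <- read.csv(
  text = "DATE,TIME,PRICE
20140102,9:30:00,1123.77
20140102,9:30:01,1122.81
20140102,9:30:02,1122.77",
  colClasses = c("character", "character", "numeric")
)

padded <- pad_trade_columns(raw)

# Empty cells are written as nothing between the commas, as in the edited file.
write.csv(padded, stdout(), row.names = FALSE, na = "", quote = FALSE)
```

Reading DATE and TIME as character keeps values such as 9:30:00 from being reinterpreted on import.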
I got another error, though:
Error in gzfile(file, "wb") : cannot open the connection
In addition: Warning message:
In gzfile(file, "wb") :
cannot open compressed file '/home/josh/Desktop/z_xts/2014-01-02/REIT_trades.RData', probable reason 'No such file or directory'
So it looks like the convert function expects all the daily output directories to exist before you run it. The function runs and creates the output after I create those directories.
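Creating those directories can be scripted in base R as an alternative to setting dir=TRUE. The sketch below assumes one folder per calendar day named YYYY-MM-DD, as the path in the error message suggests; make_daily_dirs is a hypothetical helper, and a temporary folder stands in for the real datadestination.

```r
# Hypothetical helper: create one output folder per calendar day.
make_daily_dirs <- function(from, to, datadestination) {
  days <- seq(as.Date(from), as.Date(to), by = "day")
  dirs <- file.path(datadestination, format(days, "%Y-%m-%d"))
  for (d in dirs) {
    # recursive = TRUE also creates datadestination itself if it is missing.
    dir.create(d, recursive = TRUE, showWarnings = FALSE)
  }
  invisible(dirs)
}

# Stand-in for the real output root, so the example is safe to run anywhere.
out_root <- file.path(tempdir(), "xts_data")
created <- make_daily_dirs("2014-01-02", "2014-01-06", out_root)

print(basename(created))
print(all(dir.exists(created)))
```

This creates folders for weekends and holidays as well; whether convert needs or ignores them is not something this sketch checks.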
Source: https://stackoverflow.com/questions/38326286/convert-csv-file-for-further-manipulation-using-highfrequency-package-on-r