问题
I must be misunderstanding how read.csv works in R. I have read the help file, but still do not understand how a csv file containing:
40900,-,-,-,241.75,0
40905,244,245.79,241.25,244,22114
40906,244,246.79,243.6,245.5,18024
40907,246,248.5,246,247,60859
read into R using: euk<-data.matrix(read.csv("path\to\csv.csv"))
produces this as a result (using tail
):
Date Open High Low Close Volume
[2713,] 15329 490 404 369 240.75 62763
[2714,] 15330 495 409 378 242.50 127534
[2715,] 15331 1 1 1 241.75 0
[2716,] 15336 504 425 385 244.00 22114
[2717,] 15337 504 432 396 245.50 18024
[2718,] 15338 512 442 405 247.00 60859
It must be something obvious that I do not understand. Please be kind in your responses, I am trying to learn.
Thanks!
回答1:
The issue is not with read.csv
, but with data.matrix
. read.csv
imports any column with characters in it as a factor. The '-' in the first row for your dataset are character, so the column is converted to a factor. Now, you pass the result of the read.csv
into data.matrix
, and as the help states, it replaces the levels of the factor with it's internal codes.
Basically, you need to insure that the columns of your data are numeric before you pass the data.frame into data.matrix
.
This should work in your case (assuming the only characters are '-'):
euk <- data.matrix(read.csv("path/to/csv.csv", na.strings = "-", colClasses = 'numeric'))
回答2:
I'm no R expert, but you may consider using scan()
instead, eg:
> data = scan("foo.csv", what = list(x = numeric(), y = numeric()), sep = ",")
Where foo.csv has two columns, x and y, and is comma delimited. I hope that helps.
回答3:
I took a cut/paste of your data, put it in a file and I get this using 'R'
> c<-data.matrix(read.csv("c:/DOCUME~1/Philip/LOCALS~1/Temp/x.csv",header=F))
> c
V1 V2 V3 V4 V5 V6
[1,] 40900 1 1 1 241.75 0
[2,] 40905 2 2 2 244.00 22114
[3,] 40906 2 3 3 245.50 18024
[4,] 40907 3 4 4 247.00 60859
>
There must be more in your data file, for one thing, data for the header line. And the output you show seems to start with row 2713. I would check:
The format of the header line, or get rid of it and add it manually later.
That each row has exactly 6 values.
The the filename uses forward slashes and has no embedded spaces
(use the 8.3 representation as shown in my filename).
Also, if you generated your csv file from MS Excel, the internal representation for a date is a number.
来源:https://stackoverflow.com/questions/16242584/r-correct-use-of-read-csv