R correct use of read.csv

守給你的承諾、 submitted on 2019-12-11 05:39:43

Question


I must be misunderstanding how read.csv works in R. I have read the help file, but still do not understand how a csv file containing:

40900,-,-,-,241.75,0
40905,244,245.79,241.25,244,22114
40906,244,246.79,243.6,245.5,18024
40907,246,248.5,246,247,60859

read into R using: euk<-data.matrix(read.csv("path\to\csv.csv"))

produces this as a result (using tail):

         Date Open High Low  Close Volume
[2713,] 15329  490  404 369 240.75  62763
[2714,] 15330  495  409 378 242.50 127534
[2715,] 15331    1    1   1 241.75      0
[2716,] 15336  504  425 385 244.00  22114
[2717,] 15337  504  432 396 245.50  18024
[2718,] 15338  512  442 405 247.00  60859

It must be something obvious that I do not understand. Please be kind in your responses, I am trying to learn.

Thanks!


Answer 1:


The issue is not with read.csv, but with data.matrix. read.csv imports any column containing non-numeric characters as a factor (this is the default before R 4.0.0; later versions import it as a character vector, which data.matrix converts to a factor first, so the outcome is the same). The '-' entries in the first row of your dataset are characters, so those columns become factors. When you pass the result of read.csv to data.matrix, it replaces the levels of each factor with its internal integer codes, as the help page states.

Basically, you need to ensure that the columns of your data are numeric before you pass the data.frame to data.matrix.

This should work in your case (assuming the only characters are '-'):

euk <- data.matrix(read.csv("path/to/csv.csv", na.strings = "-", colClasses = 'numeric'))
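To see the difference side by side, here is a minimal sketch using the rows from the question. It assumes the file has a header line named Date, Open, High, Low, Close, Volume (as the question's output suggests) and inlines the data through the `text` argument so that no file is needed. `stringsAsFactors = TRUE` is set explicitly so the naive read behaves the same on any R version.

```r
# Rows from the question, with an assumed header line, inlined via `text=`.
csv_text <- "Date,Open,High,Low,Close,Volume
40900,-,-,-,241.75,0
40905,244,245.79,241.25,244,22114
40906,244,246.79,243.6,245.5,18024
40907,246,248.5,246,247,60859"

# Naive read: the "-" entries force Open/High/Low to be factors.
raw <- read.csv(text = csv_text, stringsAsFactors = TRUE)
str(raw)                  # Open, High, Low show up as Factor columns
bad <- data.matrix(raw)   # factors are replaced by their integer level codes

# Fixed read: treat "-" as NA and force every column to numeric.
fixed <- read.csv(text = csv_text, na.strings = "-", colClasses = "numeric")
euk <- data.matrix(fixed)
print(euk)                # real prices, with NA where the file had "-"
```

In `bad`, the Open/High/Low columns hold small integers (level codes) rather than prices; in `euk` they hold the original values.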



Answer 2:


I'm no R expert, but you may consider using scan() instead, e.g.:

> data = scan("foo.csv", what = list(x = numeric(), y = numeric()), sep = ",")

Where foo.csv has two columns, x and y, and is comma delimited. I hope that helps.
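Adapted to the six-column data in the question, the same idea might look like the sketch below. The column names are assumptions taken from the question's output, the data is inlined through scan's `text` argument instead of a file, and `na.strings = "-"` is added so the dashes become NA rather than causing a read error.

```r
# Two rows from the question, without a header line.
csv_text <- "40900,-,-,-,241.75,0
40905,244,245.79,241.25,244,22114"

# `what` declares one numeric field per column; names are assumed.
cols <- scan(text = csv_text, sep = ",", na.strings = "-", quiet = TRUE,
             what = list(Date = numeric(), Open = numeric(),
                         High = numeric(), Low = numeric(),
                         Close = numeric(), Volume = numeric()))

# scan() returns a list of vectors; bind them into a numeric matrix.
euk <- do.call(cbind, cols)
print(euk)
```

For a real file with a header line, `skip = 1` would be needed so that scan does not try to parse the header as numbers.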




Answer 3:


I took a cut/paste of your data, put it in a file and I get this using 'R'

> c<-data.matrix(read.csv("c:/DOCUME~1/Philip/LOCALS~1/Temp/x.csv",header=F))
> c
        V1 V2 V3 V4     V5    V6
[1,] 40900  1  1  1 241.75     0
[2,] 40905  2  2  2 244.00 22114
[3,] 40906  2  3  3 245.50 18024
[4,] 40907  3  4  4 247.00 60859
> 

There must be more in your data file than what you posted: for one thing, a header line, and the output you show starts at row 2713. I would check:

The format of the header line (or remove it and add the column names manually later).
That each row has exactly 6 values.
That the filename uses forward slashes and has no embedded spaces (use the 8.3 representation as shown in my filename).

Also, if you generated your csv file from MS Excel, the internal representation for a date is a number.
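If the first column does hold Excel date serials, they can be turned into R dates as sketched below. This assumes the file came from Excel using its default 1900 date system, which the question does not confirm; for that system the usual origin in R is "1899-12-30".

```r
# Date values from the question, assumed to be Excel serial numbers.
serials <- c(40900, 40905, 40906, 40907)

# Convert to Date; the origin is valid for Excel's 1900 date system only.
dates <- as.Date(serials, origin = "1899-12-30")
print(dates)

# as.numeric() on a Date gives days since 1970-01-01, which matches
# the Date column (15331, 15336, ...) shown in the question's output.
print(as.numeric(dates))
```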



Source: https://stackoverflow.com/questions/16242584/r-correct-use-of-read-csv
