Imported a csv-dataset to R but the values becomes factors

折月煮酒 提交于 2019-11-27 00:48:30

Both the data import function (here: read.csv()) as well as a global option offer you to say stringsAsFactors=FALSE which should fix this.

By default, read.csv checks the first few rows of your data to see whether to treat each variable as numeric. If it finds non-numeric values, it assumes the variable is character data, and character variables are converted to factors.

It looks like the PTS and MP variables in your dataset contain non-numerics, which is why you're getting unexpected results. You can force these variables to numeric with

point <- as.numeric(as.character(point))
time <- as.numeric(as.character(time))

But any values that can't be converted will become missing. (The R FAQ gives a slightly different method for factor -> numeric conversion but I can never remember what it is.)

Sam

You can set this globally for all read.csv/read.* commands with options(stringsAsFactors=F)

Then read the file as follows: my.tab <- read.table( "filename.csv", as.is=T )

When importing csv data files the import command should reflect both the data seperation between each column (;) and the float-number seperator for your numeric values (for numerical variable = 2,5 this would be ",").

The command for importing a csv, therefore, has to be a bit more comprehensive with more commands:

    stuckey <- read.csv2("C:/kalle/R/stuckey.csv", header=TRUE, sep=";", dec=",")

This should import all variables as either integers or numeric.

I'm new to R as well and faced the exact same problem. But then I looked at my data and noticed that it is being caused due to the fact that my csv file was using a comma separator (,) in all numeric columns (Ex: 1,233,444.56 instead of 1233444.56).

I removed the comma separator in my csv file and then reloaded into R. My data frame now recognises all columns as numbers.

I'm sure there's a way to handle this within the read.csv function itself.

This only worked right for me when including strip.white = TRUE in the read.csv command.

(I found the solution here.)

for me the solution was to include skip = 0 (number of rows to skip at the top of the file. Can be set >0)

mydata <- read.csv(file = "file.csv", header = TRUE, sep = ",", skip = 22)

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!