Reading Tab-Delimited Data into R

拜拜、爱过 submitted on 2019-12-18 18:52:11

Question


I am trying to read a large tab-delimited file into R.

First I tried this:

data <- read.table("data.csv", sep="\t")

But it reads some of the numeric variables in as factors.

So I tried specifying the type I want for each variable, like this:

data <- read.table("data.csv", sep="\t", colClasses=c("character","numeric","numeric","character","boolean","numeric"))

But when I try this it gives me an error:

Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, : scan() expected 'a real', got '"4"'

I think it might be that there are quotes around some of the numeric values in the original raw file, but I'm not sure.


Answer 1:


Without seeing your data, there are a few likely causes: the fields are not all separated by tabs, there are embedded tabs inside single observations, or any of a litany of other problems.

One way to sort this out is to set options(stringsAsFactors=FALSE) and then rerun your first read.table call. (From R 4.0.0 onward, FALSE is already the default.)

Check out str(data) and try to figure out which values are the culprits. Some of the numeric columns are read as factors because at least one value in the column cannot be parsed as a number, so R treats the whole column as character (and, with stringsAsFactors=TRUE, converts it to a factor). It usually takes some digging, but the problem is almost surely with your input file.
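A minimal sketch of that digging, assuming a made-up two-column file (the file contents and the column names id and value are hypothetical, not taken from the question). The idea is to read everything as character and then list the values that refuse to convert to numbers:

```r
# Hypothetical sample data: one value ("n/a") is not a valid number.
tmp <- tempfile(fileext = ".tsv")
writeLines(c("id\tvalue", "a\t4", "b\tn/a", "c\t7.5"), tmp)

# Read every column as character so nothing is coerced or turned into a factor.
raw <- read.table(tmp, sep = "\t", header = TRUE, colClasses = "character")

# as.numeric() returns NA (with a warning) for strings it cannot parse.
parsed <- suppressWarnings(as.numeric(raw$value))

# Rows whose value was present in the file but failed to convert.
bad <- which(is.na(parsed) & !is.na(raw$value))
raw[bad, ]
```

Once the offending rows are visible, it is usually easier to decide whether to fix the input file or clean the column in R.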

This is a common data munging issue. Good luck!




Answer 2:


# Simulate numbers wrapped in quote characters (random, so your values will differ)
x <- paste("'",floor(runif(10,0,10)),"'",sep="")
x

 [1] "'7'" "'3'" "'0'" "'3'" "'9'" "'1'" "'4'" "'8'" "'5'" "'8'"

# Strip the quote characters, then convert the strings to numeric
as.numeric(gsub("'", "",x))

 [1] 7 3 0 3 9 1 4 8 5 8
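The same idea can be applied to a column read from a file. The following is only a sketch under assumptions: the file contents and the column names id and value are invented for illustration. As far as I recall from ?read.table, quoting is only considered for columns read as character, which would explain why forcing a quoted column to "numeric" through colClasses fails inside scan().

```r
# Hypothetical sample file: numeric values wrapped in double quotes.
tmp <- tempfile(fileext = ".tsv")
writeLines(c('id\tvalue', 'a\t"4"', 'b\t"7.5"'), tmp)

# quote = "" turns off quote handling, so the quote characters stay in the
# strings, much like the simulated vector x.
raw <- read.table(tmp, sep = "\t", header = TRUE, quote = "",
                  colClasses = "character")
raw$value  # the quotes are still part of each string here

# Strip both single and double quote characters, then convert.
raw$value <- as.numeric(gsub("[\"']", "", raw$value))
str(raw)
```

If quote is left at its default, reading the column as character should strip the quotes on input, so a plain as.numeric() on that column would be enough. Note also that R's name for the true/false type is "logical"; "boolean", as used in the question's colClasses, is not one of R's atomic classes.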


Source: https://stackoverflow.com/questions/11675917/reading-tab-delimited-data-in-to-r