I have a character data frame in R which has NaNs in it. I need to remove any row with a NaN and then convert it to a numeric data frame.
As @thijs van den bergh points out, one way to do this is:
dat <- data.frame(x=c("NaN","2"),y=c("NaN","3"),stringsAsFactors=FALSE)
dat <- as.data.frame(sapply(dat, as.numeric)) #<- sapply is here
dat[complete.cases(dat), ]
#   x y
# 2 2 3
Your error comes from trying to make a whole data.frame numeric. The sapply option I show instead makes each column vector numeric.
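To see the difference, here is a minimal sketch reusing the dat example above. The tryCatch wrapper and the names whole and cols are only illustrative, added so the script keeps running after the failed call:

```r
dat <- data.frame(x = c("NaN", "2"), y = c("NaN", "3"), stringsAsFactors = FALSE)

# Calling as.numeric() on the whole data.frame fails: a data.frame is a list,
# and a list of length-2 columns cannot be coerced to a double vector in one step.
whole <- tryCatch(as.numeric(dat), error = function(e) conditionMessage(e))
is.character(whole)  # TRUE: we caught an error message instead of numbers

# Converting column by column works; sapply() returns a numeric matrix here.
cols <- sapply(dat, as.numeric)
is.numeric(cols)
```

Because sapply simplifies its result to a matrix, the answer wraps it in as.data.frame to get a data.frame back.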
Note that data.frames are not numeric or character, but rather are a list which can be all numeric columns, all character columns, or a mix of these or other types (e.g. Date/logical).
dat <- data.frame(x=c("NaN","2"),y=c("NaN","3"),stringsAsFactors=FALSE)
is.list(dat)
# [1] TRUE
The example data just has two character columns:
> str(dat)
'data.frame': 2 obs. of 2 variables:
$ x: chr "NaN" "2"
$ y: chr "NaN" "3"
...which you could add a numeric column to like so:
> dat$num.example <- c(6.2,3.8)
> dat
    x   y num.example
1 NaN NaN         6.2
2   2   3         3.8
> str(dat)
'data.frame': 2 obs. of 3 variables:
$ x : chr "NaN" "2"
$ y : chr "NaN" "3"
$ num.example: num 6.2 3.8
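A quick way to check the column types of your own data is to ask each column for its class. This is a small sketch that rebuilds the example above; col_classes is just an illustrative name:

```r
dat <- data.frame(x = c("NaN", "2"), y = c("NaN", "3"), stringsAsFactors = FALSE)
dat$num.example <- c(6.2, 3.8)

# A data.frame is a list of columns, so sapply() visits one column at a time
# and returns a named character vector with the class of each column.
col_classes <- sapply(dat, class)
col_classes
```

Here x and y report "character" while num.example reports "numeric", which is the mix of types described above.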
So, when you call as.numeric on the whole data.frame, R throws an error because it cannot coerce a list object, which may hold several types, directly to a numeric vector. user1317221_G's answer uses the ?sapply function, which applies a function to the individual items of an object. You could alternatively use ?lapply, which is a very similar function (read more on the *apply functions here - R Grouping functions: sapply vs. lapply vs. apply. vs. tapply vs. by vs. aggregate).
That is, in this case you can apply the as.numeric function to each column of your data.frame, like so:
data.frame(lapply(dat,as.numeric))
The lapply call is wrapped in a data.frame call to make sure the output is a data.frame and not a list. That is, running:
lapply(dat,as.numeric)
will give you:
> lapply(dat,as.numeric)
$x
[1] NaN 2
$y
[1] NaN 3
$num.example
[1] 6.2 3.8
While:
data.frame(lapply(dat,as.numeric))
will give you:
> data.frame(lapply(dat,as.numeric))
    x   y num.example
1 NaN NaN         6.2
2   2   3         3.8
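Putting the pieces together for the original question, one possible end-to-end sketch is below. The names dat_num and dat_clean are only illustrative, and the complete.cases step is borrowed from the other answer:

```r
dat <- data.frame(x = c("NaN", "2"), y = c("NaN", "3"), stringsAsFactors = FALSE)

# Convert every column to numeric; the string "NaN" becomes the numeric NaN.
dat_num <- data.frame(lapply(dat, as.numeric))

# complete.cases() treats NaN as missing, so this keeps only fully observed rows.
dat_clean <- dat_num[complete.cases(dat_num), ]

str(dat_clean)
```

If you would rather overwrite the columns of the existing object, dat[] <- lapply(dat, as.numeric) does the same conversion while keeping dat a data.frame.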