R ddply, applying if and ifelse functions

本秂侑毒 提交于 2020-01-01 09:42:53

问题


I'm trying to apply a function to a dataframe using ddply from the plyr package, but I'm getting some results that I don't understand. I have 3 questions about the results

Given:

mydf<- data.frame(c(12,34,9,3,22,55),c(1,2,1,1,2,2)
                  , c(0,1,2,1,1,2))
colnames(mydf)[1] <- 'n'
colnames(mydf)[2] <- 'x'
colnames(mydf)[3] <- 'x1'

mydf looks like this:

   n x x1
1 12 1  0
2 34 2  1
3  9 1  2
4  3 1  1
5 22 2  1
6 55 2  2

Question #1

If I do:

k <- function(x) {
  mydf$z <- ifelse(x == 1, 0, mydf$n)
  return (mydf)
}
mydf <- ddply(mydf, c("x") , .fun = k, .inform = TRUE)

I get the following error:

Error in `$<-.data.frame`(`*tmp*`, "z", value = structure(c(12, 34, 9,  : 
  replacement has 3 rows, data has 6
Error: with piece 1: 
   n x x1
1 12 1  0
2  9 1  2
3  3 1  1

I get this error regardless of whether I specify the variable to split by as c("x"), "x", or .(x). I don't understand why I'm getting this error message.

Question #2

But, what I really want to do is set up an if/else function because my dataset has variables x1, x2, x3, and x4 and I want to take those variables into account as well. But when I try something simple such as:

j <- function(x) {
  if(x == 1){
    mydf$z <- 0
  } else {
    mydf$z <- mydf$n
  }
  return(mydf)
  }

mydf <- ddply(mydf, x, .fun = j, .inform = TRUE)

I get:

Warning messages:
1: In if (x == 1) { :
  the condition has length > 1 and only the first element will be used
2: In if (x == 1) { :
  the condition has length > 1 and only the first element will be used

Question #3

I'm confused about to use function() and when to use function(x). Using function() for either j() or k() gives me a different error:

Error in .fun(piece, ...) : unused argument (piece)
Error: with piece 1: 
    n x x1  z
1  12 1  0 12
2   9 1  2  9
3   3 1  1  3
4  12 1  0 12
5   9 1  2  9
6   3 1  1  3
7  12 1  0 12
8   9 1  2  9
9   3 1  1  3
10 12 1  0 12
11  9 1  2  9
12  3 1  1  3

where column z is not correct. Yet I see a lot of functions written as function().

I sincerely appreciate any comments that can help me out with this


回答1:


There's a lot that needs explaining here. Let's start with the simplest case. In your first example, all you need is:

mydf$z <- with(mydf,ifelse(x == 1,0,n))

An equivalent ddply solution might look like this:

ddply(mydf,.(x),transform,z = ifelse(x == 1,0,n))

Probably your biggest source of confusion is that you seem to not understand what is being passed as arguments to functions within ddply.

Consider your first attempt:

k <- function(x) {
  mydf$z <- ifelse(x == 1, 0, mydf$n)
  return (mydf)
}

The way ddply works is that it splits mydf up into several, smaller data frame, based on the values in the column x. That means that each time ddply calls k, the argument passed to k is a data frame. Specifically, a subset of you primary data frame.

So within k, x is a subset of mydf, with all the columns. You should not be trying to modify mydf from within k. Modify x, and then return the modified version. (If you must, but the options I displayed above are better.) So we might re-write your k like this:

k <- function(x) {
  x$z <- ifelse(x$x == 1, 0, x$n)
  return (x)
}

Note that you've created some confusing stuff by using x as both an argument to k and as the name of one of our columns.



来源:https://stackoverflow.com/questions/18520674/r-ddply-applying-if-and-ifelse-functions

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!