How do I sub sample data by group using ddply?

丶灬走出姿态 提交于 2020-01-13 08:14:20

问题


I've got a data frame with far too many rows to be able to do a spatial correlogram. Instead, I want to grab 40 rows for each species and run my correlogram on that subset.

I wrote a function to subset a data frame as follows:

    samp <- function(dataf)
{
    dataf[sample(1:dim(dataf)[1], size=40, replace=FALSE),]
}

Now I want to apply this function to each species in a larger data frame.

When I try something like

culled_data = ddply (larger_data, .(species), subset, samp)

I get this error:

Error in subset.data.frame(piece, ...) : 
  'subset' must evaluate to logical

Anyone got ideas on how to do this?


回答1:


It looks like it should work once you remove , subset from your call.




回答2:


Dirk answer is of course correct, but to add additional explanation I post my own.

Why your call don't work?

First of all your syntax is a shorthand. It's equivalent of

ddply(larger_data, .(species), function(dfrm) subset(dfrm, samp))

so you can clearly see that you provide function (see class(samp)) as second argument of subset. You could use samp(dfrm), but it won't work too cause samp return data.frame and subset need logical vector. So you could use samp(dfrm) when it returns logical indexing.

How to use subset in this case?

Make subset work by feed him with logical vector:

ddply (larger_data, .(species), subset, sample(seq_along(species)<=40))

I create logical vector with 40 TRUE (btw it works when for some spieces is less then 40 cases, then it return all) and random it.



来源:https://stackoverflow.com/questions/2923092/how-do-i-sub-sample-data-by-group-using-ddply

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!