Creating a table with individual trials from a frequency table in R (inverse of table function)

╄→尐↘猪︶ㄣ submitted on 2019-12-19 06:31:36

Question


I have a frequency table of data in a data.frame in R listing factor levels and counts of successes and failures. I would like to turn it from a frequency table into a list of events, i.e. the opposite of the "table" command. Specifically, I would like to turn this:

factor.A factor.B success.count fail.count
-------- -------- ------------- ----------
 0        1        0             2
 1        1        2             1

into this:

factor.A factor.B result 
-------- -------- -------
 0        1        0
 0        1        0
 1        1        1
 1        1        1
 1        1        0

It seems to me that reshape ought to do this, or even some obscure base function that I have not heard of, but I've had no luck. Even repeating individual rows of a data.frame is tricky - how do you pass a variable number of arguments to rbind?

Tips?

Background: Why? Because it is easier to cross-validate logistic fits to such a data set than to the aggregated binomial data.

I'm analysing my data with a generalised linear model as binomial regression in R and would like to cross-validate to control regularisation, since my purpose is predictive.

However, as far as I can tell, the default cross validation routines in R are not great for binomial data, simply skipping entire rows of the frequency table, rather than trials individually. This means that lightly and heavily sampled factor combinations have the same weight in my cost function, which is inappropriate for my data.


Answer 1:


You may try this:

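# example data from the question
df <- data.frame(factor.A = c(0, 1), factor.B = c(1, 1),
                 success.count = c(0, 2), fail.count = c(2, 1))

# create 'result' vector: repeat 1s and 0s the number of times given in the
# respective 'count' column; t() puts the counts in row order
# (success 1, fail 1, success 2, fail 2, ...) to match the alternating 1, 0 pattern
result <- rep(rep(c(1, 0), nrow(df)), c(t(df[ , c("success.count", "fail.count")])))

# repeat each row in df the number of times given by the sum of 'count' columns
data.frame(df[rep(seq_len(nrow(df)), rowSums(df[ , c("success.count", "fail.count")])), c("factor.A", "factor.B")], result)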

#     factor.A factor.B result
# 1          0        1      0
# 1.1        0        1      0
# 2          1        1      1
# 2.1        1        1      1
# 2.2        1        1      0
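As a sanity check, the expansion can be wrapped in a small helper and aggregated back to confirm that the original counts are recovered. This is only a minimal sketch: the function name `expand_trials` and its argument names are hypothetical and not part of the answer, and it assumes the counts are non-negative whole numbers.

```r
# Hypothetical helper: expand a success/failure frequency table
# into one row per trial (result = 1 for success, 0 for failure).
expand_trials <- function(df, success = "success.count", fail = "fail.count") {
  counts <- as.matrix(df[, c(success, fail)])
  # t() makes the counts run row by row: success 1, fail 1, success 2, fail 2, ...
  result <- rep(rep(c(1, 0), nrow(df)), c(t(counts)))
  keep <- setdiff(names(df), c(success, fail))
  out <- df[rep(seq_len(nrow(df)), rowSums(counts)), keep, drop = FALSE]
  out$result <- result
  rownames(out) <- NULL
  out
}

df <- data.frame(factor.A = c(0, 1), factor.B = c(1, 1),
                 success.count = c(0, 2), fail.count = c(2, 1))
trials <- expand_trials(df)

# Aggregating the trials again should reproduce the original counts.
aggregate(result ~ factor.A + factor.B, data = trials,
          FUN = function(r) c(success = sum(r), fail = sum(r == 0)))
```

Resetting the row names is a design choice here; dropping that line keeps the `1`, `1.1`, `2`, ... labels, which record the frequency-table row each trial came from.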



Answer 2:


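Try this:

  # frequency table as a matrix: columns are factor.A, factor.B, success.count, fail.count
  x <- matrix(c(0, 1, 1, 1, 0, 2, 2, 1), 2, 4)
  r <- c()
  for (i in seq_len(nrow(x))) {
    # append (factor.A, factor.B, 1) once per success, then (factor.A, factor.B, 0) once per failure
    r <- c(r, rep(c(x[i, 1:2], 1), x[i, 3]))
    r <- c(r, rep(c(x[i, 1:2], 0), x[i, 4]))
  }
  # r holds the trials as consecutive triples; reshape to one trial per row
  t(matrix(r, nrow = 3))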
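On the question of passing a variable number of arguments to rbind: one common base R idiom is to build a list of pieces and hand it to `do.call()`. The following is a minimal sketch of that idea applied to the data from the question; the variable names `pieces`, `n_success` and `n_fail` are illustrative only.

```r
df <- data.frame(factor.A = c(0, 1), factor.B = c(1, 1),
                 success.count = c(0, 2), fail.count = c(2, 1))

# Build one small data frame per row of the frequency table.
pieces <- lapply(seq_len(nrow(df)), function(i) {
  n_success <- df$success.count[i]
  n_fail <- df$fail.count[i]
  data.frame(factor.A = rep(df$factor.A[i], n_success + n_fail),
             factor.B = rep(df$factor.B[i], n_success + n_fail),
             result = rep(c(1, 0), times = c(n_success, n_fail)))
})

# do.call() passes the list elements to rbind() as separate arguments.
trials <- do.call(rbind, pieces)
trials
```

This keeps the per-row logic of the loop but avoids growing a vector element by element, and it returns a data.frame with named columns rather than a bare matrix.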



Answer 3:


For a tidyverse-style solution, you could do:

library(tidyverse)

df %>% gather(key = result, value = incidence, success.count, fail.count) %>% 
     mutate(result = if_else(result %>% str_detect("success"), 1, 0)) %>%
     pmap_dfr(function(factor.A, factor.B, result, incidence) 
                   { tibble(factor.A = factor.A,
                            factor.B = factor.B,
                            result = rep(result, times = incidence)
                            )
                   }
               )
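A shorter variant may be possible with `tidyr::uncount()`, which repeats each row according to a weights column, together with `pivot_longer()`, which tidyr documents as the replacement for the superseded `gather()`. This is a sketch under the assumption that dplyr and a reasonably recent tidyr (1.0 or later) are installed; the intermediate column names `outcome` and `n` are arbitrary choices.

```r
library(dplyr)
library(tidyr)

df <- data.frame(factor.A = c(0, 1), factor.B = c(1, 1),
                 success.count = c(0, 2), fail.count = c(2, 1))

trials <- df %>%
  # one row per (factor combination, outcome) with its count in n
  pivot_longer(c(success.count, fail.count),
               names_to = "outcome", values_to = "n") %>%
  mutate(result = if_else(outcome == "success.count", 1, 0)) %>%
  # repeat each row n times; rows with n = 0 are dropped
  uncount(n) %>%
  select(factor.A, factor.B, result)

trials
```

Because `pivot_longer()` keeps the success and failure rows of each factor combination next to each other, the trials stay grouped by the original frequency-table row.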


Source: https://stackoverflow.com/questions/22822922/creating-a-table-with-individual-trials-from-a-frequency-table-in-r-inverse-of
