Why doesn't rle accept a factor as input?

有刺的猬 2021-01-18 12:03

I'm having trouble applying this rle function to a data.frame. The function works great on another data set:

fgroup <- aggregate(fevents2         


        
1 answer
  •  旧巷少年郎
    2021-01-18 12:15

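    The problem is that rle() needs a plain atomic vector and a factor does not qualify, as the error says: a factor is a set of integer codes carrying levels and class attributes, so the is.vector() check inside rle() fails. Either convert all the factors to characters first (and not by coercing them to numeric, which returns the integer codes rather than the labels!) or do the conversion inside the anonymous function you are applying.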
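
To make the point concrete, here is a minimal sketch on a made-up factor (the vector `f` is hypothetical toy data, not taken from fevents2). It shows rle() rejecting the factor, accepting its character version, and what numeric coercion would return instead:

```r
# Toy factor, invented for illustration only
f <- factor(c("a", "a", "b", "b", "b", "a"))

# A factor fails is.vector() because of its levels/class attributes
is_vec <- is.vector(f)

# rle() on the factor raises an error; capture the message instead of stopping
err <- tryCatch(rle(f), error = function(e) conditionMessage(e))

# Converting to character first gives the runs of labels
r <- rle(as.character(f))

# Coercing to numeric yields the internal integer codes, not the labels
codes <- as.numeric(f)

print(is_vec)
print(err)
print(r$values)
print(r$lengths)
print(codes)
```

The character route keeps the labels you see when printing the factor, whereas the numeric route only keeps their positions in levels(f).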

    So this, which implements the second idea, works:

    aggregate(fevents2[,3:14], list(weeks = fevents2[, 1]),
              function(x) rle(as.character(x))$values)
    

    after a fashion:

    > aggregate(fevents2[,3:14], list(weeks = fevents2[, 1]),
    +           function(x) rle(as.character(x))$values)
      weeks vv.1 vv.2 vv.3 vv.4 vv.5 vv.6 rv.1 rv.2 rv.3 rv.4 rv.5 rv.6 rv.7 ja.1
    1     1 C RR G RR  nil C AA G AA  nil  nil C VB G VB  nil C VB G VB  nil C VV
      ja.2 ja.3 ja.4 aa.1 aa.2 bv.1 bv.2 bv.3 aj.1 aj.2 aj.3 aj.4 aj.5 vb.1 vb.2
    1  nil C VV G VV C AJ  nil  nil C VR G VR C RJ  nil C RV G RV  nil C AA  nil
      vb.3 vb.4 vb.5 rj.1 rj.2   rr vr.1 vr.2 vr.3 vr.4 vr.5   bb jr.1 jr.2 jr.3
    1 C AJ  nil C AJ C JR G JR C BB C JA  nil C RJ  nil C RJ C BV  nil C VB G VB
      jr.4 jr.5
    1  nil C JA
    

    though I am not sure what you expected to get - there is only one week here, and aggregate() and rle() have stuck all the values together in a single row. Did you want separate $values for each of the variables in fevents2 that you are aggregating over?

    Another thing:

    as.numeric(as.character(fevents2)) can't possibly work, as the data are not numeric! Nor can you apply those functions to a whole data frame and get anything like what you intended, if they work at all.
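
A small sketch of both failure modes, using an invented factor and data frame (`f` and `df` are hypothetical stand-ins, not the real fevents2):

```r
# Toy data, invented for illustration only
f  <- factor(c("C RR", "nil", "C RR"))
df <- data.frame(a = f, b = f)

# Labels such as "C RR" are not numbers, so the conversion yields NA
# (R also emits a coercion warning, suppressed here)
nums <- suppressWarnings(as.numeric(as.character(f)))

# as.character() on a whole data frame returns one string per column,
# not a character version of each cell
chr <- as.character(df)

print(nums)
print(length(chr))
```

So the conversion has to be applied column by column, which is what the sapply()/lapply() approaches do.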

    The sapply() thing should work. Here is a version that checks whether each variable is a factor or not and coerces it if it is:

    fevents3 <- sapply(fevents2,
                       function(x) if(is.factor(x)) { as.character(x) } else { x })
    

    But note that sapply() simplifies its result to a matrix, which changes which aggregate() method gets dispatched:

    > class(fevents3)
    [1] "matrix"
    

    Instead, perhaps use lapply() and rebuild the data frame:

    fevents3 <- lapply(fevents2,
                       function(x) if(is.factor(x)) { as.character(x) } else { x })
    fevents3 <- data.frame(fevents3, stringsAsFactors = FALSE)
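
The difference between the two versions can be checked on a small made-up data frame (`toy` and the helper `to_chr` are hypothetical names introduced only for this sketch):

```r
# Toy data, invented for illustration: one numeric and one factor column
toy <- data.frame(weeks = c(1, 1, 2),
                  vv    = factor(c("C RR", "C RR", "nil")),
                  stringsAsFactors = FALSE)

# Convert factors to character, leave everything else alone
to_chr <- function(x) if (is.factor(x)) as.character(x) else x

# sapply() simplifies to a matrix, which can hold only one type,
# so the numeric weeks column is turned into character as well
m <- sapply(toy, to_chr)

# lapply() keeps a list of columns, so each column keeps its own type
d <- data.frame(lapply(toy, to_chr), stringsAsFactors = FALSE)

print(is.matrix(m))
print(typeof(m))
print(sapply(d, class))
```

Besides the method dispatch issue, the matrix route silently loses the numeric type of the grouping column, which is another reason to prefer the lapply() version here.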
    

    Now, if you wanted to apply rle() to each column of the split-up data and keep the results separate, how about:

    spl <- split(fevents3, list(weeks = fevents3[, 1]))
    res <- lapply(spl, function(x) lapply(x[, 3:14], function(y) rle(y)$values))
    

    which gives

    > res
    $`1`
    $`1`$vv
    [1] "C RR" "G RR" "nil"  "C AA" "G AA" "nil" 
    
    $`1`$rv
    [1] "nil"  "C VB" "G VB" "nil"  "C VB" "G VB" "nil" 
    
    $`1`$ja
    [1] "C VV" "nil"  "C VV" "G VV"
    
    $`1`$aa
    [1] "C AJ" "nil" 
    
    $`1`$bv
    [1] "nil"  "C VR" "G VR"
    
    $`1`$aj
    [1] "C RJ" "nil"  "C RV" "G RV" "nil" 
    
    $`1`$vb
    [1] "C AA" "nil"  "C AJ" "nil"  "C AJ"
    
    $`1`$rj
    [1] "C JR" "G JR"
    
    $`1`$rr
    [1] "C BB"
    
    $`1`$vr
    [1] "C JA" "nil"  "C RJ" "nil"  "C RJ"
    
    $`1`$bb
    [1] "C BV"
    
    $`1`$jr
    [1] "nil"  "C VB" "G VB" "nil"  "C JA"
    

    Which is the same answer as that for aggregate() above, but with each rle() output kept separate:

    > unlist(res)
     1.vv1  1.vv2  1.vv3  1.vv4  1.vv5  1.vv6  1.rv1  1.rv2  1.rv3  1.rv4  1.rv5 
    "C RR" "G RR"  "nil" "C AA" "G AA"  "nil"  "nil" "C VB" "G VB"  "nil" "C VB" 
     1.rv6  1.rv7  1.ja1  1.ja2  1.ja3  1.ja4  1.aa1  1.aa2  1.bv1  1.bv2  1.bv3 
    "G VB"  "nil" "C VV"  "nil" "C VV" "G VV" "C AJ"  "nil"  "nil" "C VR" "G VR" 
     1.aj1  1.aj2  1.aj3  1.aj4  1.aj5  1.vb1  1.vb2  1.vb3  1.vb4  1.vb5  1.rj1 
    "C RJ"  "nil" "C RV" "G RV"  "nil" "C AA"  "nil" "C AJ"  "nil" "C AJ" "C JR" 
     1.rj2   1.rr  1.vr1  1.vr2  1.vr3  1.vr4  1.vr5   1.bb  1.jr1  1.jr2  1.jr3 
    "G JR" "C BB" "C JA"  "nil" "C RJ"  "nil" "C RJ" "C BV"  "nil" "C VB" "G VB" 
     1.jr4  1.jr5 
     "nil" "C JA" 
    > aggregate(fevents2[,3:14], list(weeks = fevents2[, 1]),
    +           function(x) rle(as.character(x))$values)
      weeks vv.1 vv.2 vv.3 vv.4 vv.5 vv.6 rv.1 rv.2 rv.3 rv.4 rv.5 rv.6 rv.7 ja.1
    1     1 C RR G RR  nil C AA G AA  nil  nil C VB G VB  nil C VB G VB  nil C VV
      ja.2 ja.3 ja.4 aa.1 aa.2 bv.1 bv.2 bv.3 aj.1 aj.2 aj.3 aj.4 aj.5 vb.1 vb.2
    1  nil C VV G VV C AJ  nil  nil C VR G VR C RJ  nil C RV G RV  nil C AA  nil
      vb.3 vb.4 vb.5 rj.1 rj.2   rr vr.1 vr.2 vr.3 vr.4 vr.5   bb jr.1 jr.2 jr.3
    1 C AJ  nil C AJ C JR G JR C BB C JA  nil C RJ  nil C RJ C BV  nil C VB G VB
      jr.4 jr.5
    1  nil C JA
    

    [Note: This is only true here because the data snippet you show has just one week. I can't recall how unlist(res) will look if there is more than one week.]
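
For the multi-week case, here is a sketch on invented data with two weeks (the data frame `toy` and its values are hypothetical, chosen only to mimic the shape of fevents3). It runs the same split()/lapply() pattern and then flattens the result:

```r
# Toy data with two weeks, invented for illustration only
toy <- data.frame(weeks = c(1, 1, 1, 2, 2),
                  vv    = c("C RR", "C RR", "nil", "G RR", "G RR"),
                  rv    = c("nil", "C VB", "C VB", "nil", "nil"),
                  stringsAsFactors = FALSE)

# One data frame per week
spl <- split(toy, toy$weeks)

# Per week, per column: the run values from rle()
res <- lapply(spl, function(x) lapply(x[, c("vv", "rv")],
                                      function(y) rle(y)$values))

# Flattening prefixes every element name with its week
flat <- unlist(res)

print(res)
print(flat)
```

In this sketch each element of unlist(res) is named week.variable, with a running index appended when a variable has more than one run, so the weeks remain distinguishable after flattening.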
