Preserve names when coercing vector from binary to `as.numeric`?

后端 未结 2 1949
误落风尘
误落风尘 2021-02-18 22:08

In R, when you coerce a vector from binary to numeric, the names are stripped away.

There are a few possible solutions, which I\'ve outlined before. It seems dangerous t

2条回答
  •  小鲜肉
    小鲜肉 (楼主)
    2021-02-18 22:49

    Just to throw another option out there, since your input is a logical vector, you can use ifelse(). And one could argue this approach is more explicit and straightforward:

    ifelse(example_vec,1L,0L);
    ## Y N M P L J H O F D
    ## 0 1 0 0 1 1 1 1 1 1
    

    Benchmarking

    library(microbenchmark);
    
    ifelse. <- function(x) ifelse(x,1L,0L);
    sapply. <- function(x) sapply(x,as.integer);
    setstoragemode <- function(x) { storage.mode(x) <- 'integer'; x; };
    setmode <- function(x) { mode(x) <- 'integer'; x; };
    setclass <- function(x) { class(x) <- 'integer'; x; };
    as.and.setnames <- function(x) setNames(as.integer(x),names(x));
    plus <- function(x) +x;
    addzero <- function(x) x+0L;
    

    ## small scale (OP's example input)
    set.seed(1045L);
    x <- sample(c(T,F),10L,T);
    names(x) <- sample(LETTERS,10L);
    
    ex <- ifelse.(x);
    identical(ex,sapply.(x));
    ## [1] TRUE
    identical(ex,setstoragemode(x));
    ## [1] TRUE
    identical(ex,setmode(x));
    ## [1] TRUE
    identical(ex,setclass(x));
    ## [1] TRUE
    identical(ex,as.and.setnames(x));
    ## [1] TRUE
    identical(ex,plus(x));
    ## [1] TRUE
    identical(ex,addzero(x));
    ## [1] TRUE
    
    microbenchmark(ifelse.(x),sapply.(x),setstoragemode(x),setmode(x),setclass(x),as.and.setnames(x),plus(x),addzero(x));
    ## Unit: nanoseconds
    ##                expr   min      lq     mean median      uq   max neval
    ##          ifelse.(x)  6843  8126.0  9627.13   8981  9837.0 21810   100
    ##          sapply.(x) 18817 20100.5 23234.93  21383 22666.5 71418   100
    ##   setstoragemode(x)   856  1283.0  1745.54   1284  1711.0 15396   100
    ##          setmode(x)  7270  8126.0  9862.36   8982 10264.0 32074   100
    ##         setclass(x)   429  1283.0  2138.97   1284  1712.0 32075   100
    ##  as.and.setnames(x)  1283  1711.0  1997.78   1712  2139.0  7271   100
    ##             plus(x)     0   428.0   492.39    428   428.5  9837   100
    ##          addzero(x)     0   428.0   539.39    428   856.0  2566   100
    

    ## large scale
    set.seed(1L);
    N <- 1e5L;
    x <- sample(c(T,F),N,T);
    names(x) <- make.unique(rep_len(LETTERS,N));
    
    ex <- ifelse.(x);
    identical(ex,sapply.(x));
    ## [1] TRUE
    identical(ex,setstoragemode(x));
    ## [1] TRUE
    identical(ex,setmode(x));
    ## [1] TRUE
    identical(ex,setclass(x));
    ## [1] TRUE
    identical(ex,as.and.setnames(x));
    ## [1] TRUE
    identical(ex,plus(x));
    ## [1] TRUE
    identical(ex,addzero(x));
    ## [1] TRUE
    
    microbenchmark(ifelse.(x),sapply.(x),setstoragemode(x),setmode(x),setclass(x),as.and.setnames(x),plus(x),addzero(x));
    ## Unit: microseconds
    ##                expr       min          lq         mean      median         uq        max neval
    ##          ifelse.(x)  7633.598   7757.1900  16615.71251   7897.4600  29401.112  96503.642   100
    ##          sapply.(x) 86353.737 102576.0945 125547.32957 123909.1120 137900.406 264442.788   100
    ##   setstoragemode(x)    84.676     92.8015    343.46124     98.3605    113.543  23939.133   100
    ##          setmode(x)   124.020    155.0245    603.15744    167.2125    181.111  22395.736   100
    ##         setclass(x)    85.104     92.3740    328.25393    100.2850    118.460  21807.713   100
    ##  as.and.setnames(x)    70.991     78.2610    656.98177     82.3235     88.953  35710.697   100
    ##             plus(x)    40.200     42.9795     48.68026     44.9040     49.608     88.953   100
    ##          addzero(x)   181.326    186.4580    196.34882    189.6650    201.211    282.679   100
    

    ## very large scale
    set.seed(1L);
    N <- 1e7L;
    x <- sample(c(T,F),N,T);
    names(x) <- make.unique(rep_len(LETTERS,N));
    
    ex <- ifelse.(x);
    identical(ex,sapply.(x));
    ## [1] TRUE
    identical(ex,setstoragemode(x));
    ## [1] TRUE
    identical(ex,setmode(x));
    ## [1] TRUE
    identical(ex,setclass(x));
    ## [1] TRUE
    identical(ex,as.and.setnames(x));
    ## [1] TRUE
    identical(ex,plus(x));
    ## [1] TRUE
    identical(ex,addzero(x));
    ## [1] TRUE
    
    microbenchmark(ifelse.(x),sapply.(x),setstoragemode(x),setmode(x),setclass(x),as.and.setnames(x),plus(x),addzero(x),times=5L);
    ## Unit: milliseconds
    ##                expr          min           lq         mean       median           uq         max neval
    ##          ifelse.(x)  1082.220903  1308.106967  3452.639836  1473.723533  6306.320235  7092.82754     5
    ##          sapply.(x) 16766.199371 17431.458634 18401.672635 18398.345499 18843.890150 20568.46952     5
    ##   setstoragemode(x)    13.298283    13.648103   173.574496    19.661753    24.736278   796.52806     5
    ##          setmode(x)    19.043796    19.878573    75.669779    19.969235    39.683589   279.77370     5
    ##         setclass(x)    14.025292    14.119804   259.627934    14.414457    26.838618  1228.74150     5
    ##  as.and.setnames(x)    12.889875    24.241484   178.243948    24.962934    25.103631   804.02182     5
    ##             plus(x)     7.577576     7.676364     9.047674     8.245142     8.253266    13.48602     5
    ##          addzero(x)    18.861615    18.960403    71.284716    26.622226    26.950662   265.02867     5
    

    Looks like the unary plus takes the cake. (And my ifelse() idea kinda sucks.)

提交回复
热议问题