R filtering out a subset

误落风尘 2021-01-29 13:30

I have a data.frame A and a data.frame B that contains a subset of the rows of A.

How can I create a data.frame C which is data.frame A with data.frame B excluded? Thanks for your h

6 Answers
  •  -上瘾入骨i
    2021-01-29 14:19

    Here are two data.table solutions that are both memory- and time-efficient:

    library(data.table)
    # some biggish data
    set.seed(1234)
    ADT <- data.table(x = seq.int(1e+07), y = seq.int(1e+07))
    
    .rows <- sample(nrow(ADT), 30000)
    # Random subset of A in B
    BDT <- ADT[.rows, ]
    
    # set keys for fast merge
    setkey(ADT, x)
    setkey(BDT, x)
    ## drop the rows of ADT that match BDT on the key
    CDT <- ADT[-ADT[BDT, which = TRUE]]
    ## the same data as data.frames, for the base R alternative
    A <- copy(ADT)
    setattr(A, "class", "data.frame")
    B <- copy(BDT)
    setattr(B, "class", "data.frame")
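    # DT: join ADT to BDT on the key, get the matching row numbers, drop them
    f2 <- function() noBDT <- ADT[-ADT[BDT, which = TRUE]]
    # DT2: drop by position; valid here only because x equals the row number in ADT
    f3 <- function() noBDT2 <- ADT[-BDT[, x]]
    # base: drop by row name; assumes rownames(B) are B's row positions in A
    f1 <- function() noB <- A[-as.integer(rownames(B)), ]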
    
    library(rbenchmark)
    benchmark(base = f1(), DT = f2(), DT2 = f3(), replications = 3)
    
    ##   test replications elapsed relative user.self sys.self
    ## 2   DT            3    0.92    1.108      0.77     0.15
    ## 1 base            3    3.72    4.482      3.19     0.52
    ## 3  DT2            3    0.83    1.000      0.72     0.11
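
    For reference, the same exclusion can also be written as a data.table not-join, or with `%in%` on plain data.frames. The sketch below is a minimal illustration and not part of the benchmark above; the small tables, and the assumption that `x` uniquely identifies each row, are made up for the example.

    ```r
    library(data.table)

    # Small hypothetical tables; x is assumed to be a unique id column.
    ADT <- data.table(x = 1:10, y = letters[1:10])
    BDT <- ADT[c(2, 5, 9)]

    # Not-join: keep the rows of ADT whose x has no match in BDT.
    CDT <- ADT[!BDT, on = "x"]

    # Base R equivalent on plain data.frames.
    A <- as.data.frame(ADT)
    B <- as.data.frame(BDT)
    C <- A[!(A$x %in% B$x), ]

    print(CDT$x)
    print(C$x)
    ```

    Unlike dropping by row position, both forms match on the values of `x`, so they still work when the rows have been reordered.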
    
