Keep only the rows in matrices that have equal elements in a certain column

一向 2021-01-28 00:56

Let me show an example. Suppose we have 3 tables (focus on the N columns):

   Table 1         Table 2        Table 3
-------------   -------------   -------------
  N   Values      N   Values      N   Values
  5      1        5     -1        5      1
 10      2        6     -2        6     21
 15      3       10     -3       10      5
                 15     -4       12      6
                                 15      3

4 Answers
  • 2021-01-28 01:21

    Use a set intersection to find the values of N that are common to all the tables:

    > t1 <-data.frame(N=c(5,10,15),Values=c(1,2,3))
    > t2 <-data.frame(N=c(5,6,10,15),Values=c(-1,-2,-3,-4))
    > t3 <-data.frame(N=c(5,6,10,12,15),Values=c(1,21,5,6,3))
    > common<-intersect(intersect(t1$N,t2$N),t3$N)
    > common
    [1]  5 10 15
    

    Then subset each table to keep only the rows with those common values:

    > newt1<-t1[t1$N %in% common,]
    > newt2<-t2[t2$N %in% common,]
    > newt3<-t3[t3$N %in% common,]
    > newt3
       N Values
    1  5      1
    3 10      5
    5 15      3
    

    This approach should scale: you can write a function that takes a list of data frames and a column name and returns a list of filtered data frames. (A list is needed because an atomic vector cannot hold data frames.)
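
    A minimal sketch of such a function, in base R. The name keep_common_rows and its arguments are hypothetical and not part of the original answer; it assumes every data frame in the list has the key column.

    ```r
    # Hypothetical helper: keep only the rows whose key value
    # occurs in every data frame of the list.
    keep_common_rows <- function(dfs, col) {
      # Intersect the key columns of all data frames, two at a time
      common <- Reduce(intersect, lapply(dfs, function(d) d[[col]]))
      # Subset each data frame to the rows with a common key value
      lapply(dfs, function(d) d[d[[col]] %in% common, , drop = FALSE])
    }

    t1 <- data.frame(N = c(5, 10, 15), Values = c(1, 2, 3))
    t2 <- data.frame(N = c(5, 6, 10, 15), Values = c(-1, -2, -3, -4))
    t3 <- data.frame(N = c(5, 6, 10, 12, 15), Values = c(1, 21, 5, 6, 3))

    new_tables <- keep_common_rows(list(t1 = t1, t2 = t2, t3 = t3), "N")
    new_tables$t2
    ```

    Reduce applies intersect across the whole list, so the same function works for any number of tables.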

    I've used data frames. The same approach will work with matrices, except that a matrix column is selected with m[, "N"] rather than m$N.
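
    For matrices, the same idea might look like the sketch below. The matrices m1, m2 and m3 are made up here to mirror the data frames above, and the column is assumed to be named "N".

    ```r
    # Illustrative matrices with named columns (assumed, mirroring t1..t3)
    m1 <- cbind(N = c(5, 10, 15), Values = c(1, 2, 3))
    m2 <- cbind(N = c(5, 6, 10, 15), Values = c(-1, -2, -3, -4))
    m3 <- cbind(N = c(5, 6, 10, 12, 15), Values = c(1, 21, 5, 6, 3))

    # Values of N present in all three matrices
    common <- Reduce(intersect, list(m1[, "N"], m2[, "N"], m3[, "N"]))

    # drop = FALSE keeps the result a matrix even if only one row matches
    newm3 <- m3[m3[, "N"] %in% common, , drop = FALSE]
    newm3
    ```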

  • 2021-01-28 01:23

    I would like to propose a generic approach which works for an arbitrary number of dataframes as well as for multiple id columns.

    The dataframes may have a different structure, i.e., a different number and type of columns. The only requirement is that all dataframes share the id columns, with the same names and types. In addition, the code detects when the dataframes have no combination of id values in common. Note that the counting step assumes each combination of id values occurs at most once within a dataframe; repeated rows would otherwise inflate the count.

    Suppose we have a list of dataframes dfl and a vector of column names cn which should be checked for common value combinations across all dataframes in the list:

    dfl <- list(Table1, Table2, Table3)
    cn <- "N"
    
    library(data.table)
    # determine common combinations of id values
    common <- rbindlist(lapply(dfl, function(x) setDT(x)[, .SD, .SDcols = cn]))[
      , .(.cnt = .N), by = cn][.cnt == length(dfl)][, -".cnt"]
    # stop if there are no common id values
    stopifnot(nrow(common) > 0L)
    # join with all data tables in dfl, keeping only rows which have common id values
    result <- lapply(dfl, function(x) x[common, on = cn, nomatch = 0L])
    
    result
    
    $Table1
        N Values
    1:  5      1
    2: 10      2
    3: 15      3
    
    $Table2
        N Values
    1:  5     -1
    2: 10     -3
    3: 15     -4
    
    $Table3
        N Values
    1:  5      1
    2: 10      5
    3: 15      3
    

    Data

    dfl <- structure(list(Table1 = structure(list(N = c(5L, 10L, 15L), Values = 1:3), .Names = c("N", 
    "Values"), row.names = c(NA, 3L), class = "data.frame"), Table2 = structure(list(
        N = c(5L, 6L, 10L, 15L), Values = c(-1L, -2L, -3L, -4L)), .Names = c("N", 
    "Values"), row.names = c(NA, 4L), class = "data.frame"), Table3 = structure(list(
        N = c(5L, 6L, 10L, 12L, 15L), Values = c(1L, 21L, 5L, 6L, 
        3L)), .Names = c("N", "Values"), row.names = c(NA, 5L), class = "data.frame")), .Names = c("Table1", 
    "Table2", "Table3"))
    

    Example with multiple id columns

    # create sample data: 5 dataframes with 100 rows each and 3 id columns
    set.seed(123L)
    ndf <- 5L
    dfl <- lapply(seq_len(ndf), function(i) {
      nr <- 100L
      nseq <- 1:6
      data.frame(A = sample(LETTERS[nseq], nr, replace = TRUE),
                 b = sample(letters[nseq], nr, replace = TRUE),
                 i = sample(nseq, nr, replace = TRUE),
                 val = sample.int(nr, nr))
      })
    dfl <- setNames(dfl, paste0("df", seq_along(dfl)))
    str(dfl)
    
    List of 5
     $ df1:'data.frame':  100 obs. of  4 variables:
      ..$ A  : Factor w/ 6 levels "A","B","C","D",..: 2 5 3 6 6 1 4 6 4 3 ...
      ..$ b  : Factor w/ 6 levels "a","b","c","d",..: 4 2 3 6 3 6 6 4 3 1 ...
      ..$ i  : int [1:100] 2 6 4 4 3 6 3 2 2 2 ...
      ..$ val: int [1:100] 79 1 77 71 61 46 15 99 42 45 ...
     $ df2:'data.frame':  100 obs. of  4 variables:
      ..$ A  : Factor w/ 6 levels "A","B","C","D",..: 6 1 6 4 3 3 5 1 3 5 ...
      ..$ b  : Factor w/ 6 levels "a","b","c","d",..: 3 3 2 1 3 2 4 4 6 3 ...
      ..$ i  : int [1:100] 2 5 2 2 2 5 1 5 2 3 ...
      ..$ val: int [1:100] 85 26 3 84 33 61 52 36 18 40 ...
     $ df3:'data.frame':  100 obs. of  4 variables:
      ..$ A  : Factor w/ 6 levels "A","B","C","D",..: 3 3 1 1 2 6 3 3 5 5 ...
      ..$ b  : Factor w/ 6 levels "a","b","c","d",..: 6 4 6 4 5 4 5 6 5 1 ...
      ..$ i  : int [1:100] 2 4 1 6 6 3 5 2 1 3 ...
      ..$ val: int [1:100] 81 73 22 99 84 51 57 88 93 61 ...
     $ df4:'data.frame':  100 obs. of  4 variables:
      ..$ A  : Factor w/ 6 levels "A","B","C","D",..: 6 6 3 5 3 6 1 1 5 4 ...
      ..$ b  : Factor w/ 6 levels "a","b","c","d",..: 1 3 4 6 5 4 1 1 5 1 ...
      ..$ i  : int [1:100] 2 2 1 3 2 5 4 6 1 6 ...
      ..$ val: int [1:100] 94 98 45 23 67 53 55 41 40 100 ...
     $ df5:'data.frame':  100 obs. of  4 variables:
      ..$ A  : Factor w/ 6 levels "A","B","C","D",..: 4 1 2 5 5 1 6 1 4 3 ...
      ..$ b  : Factor w/ 6 levels "a","b","c","d",..: 5 1 3 6 6 5 1 4 6 4 ...
      ..$ i  : int [1:100] 1 6 2 5 4 1 6 4 6 4 ...
      ..$ val: int [1:100] 45 28 16 85 54 53 56 68 59 94 ...
    
    # define id columns
    cn <- c("i", "A", "b")
    
    common <- rbindlist(lapply(dfl, function(x) setDT(x)[, .SD, .SDcols = cn]))[
      , .(.cnt = .N), by = cn][.cnt == length(dfl)][, -".cnt"]
    stopifnot(nrow(common) > 0L)
    result <- lapply(dfl, function(x) x[common, on = cn, nomatch = 0L])
    
    str(result)
    
    List of 5
     $ df1:Classes ‘data.table’ and 'data.frame': 10 obs. of  4 variables:
      ..$ A  : Factor w/ 6 levels "A","B","C","D",..: 6 6 6 6 6 6 4 2 1 5
      ..$ b  : Factor w/ 6 levels "a","b","c","d",..: 4 4 4 6 6 3 2 3 4 2
      ..$ i  : int [1:10] 2 2 2 3 3 6 5 6 4 1
      ..$ val: int [1:10] 99 85 4 36 83 70 12 52 53 58
      ..- attr(*, ".internal.selfref")=<externalptr> 
     $ df2:Classes ‘data.table’ and 'data.frame': 11 obs. of  4 variables:
      ..$ A  : Factor w/ 6 levels "A","B","C","D",..: 6 6 4 4 2 1 5 5 4 1 ...
      ..$ b  : Factor w/ 6 levels "a","b","c","d",..: 4 3 2 2 3 4 4 4 1 1 ...
      ..$ i  : int [1:11] 2 6 5 5 6 4 1 1 5 3 ...
      ..$ val: int [1:11] 11 1 58 14 5 71 52 39 81 88 ...
      ..- attr(*, ".internal.selfref")=<externalptr> 
     $ df3:Classes ‘data.table’ and 'data.frame': 14 obs. of  4 variables:
      ..$ A  : Factor w/ 6 levels "A","B","C","D",..: 6 4 2 1 1 5 5 5 5 5 ...
      ..$ b  : Factor w/ 6 levels "a","b","c","d",..: 6 2 3 4 4 2 2 4 4 4 ...
      ..$ i  : int [1:14] 3 5 6 4 4 1 1 1 1 1 ...
      ..$ val: int [1:14] 25 60 18 78 59 26 32 39 77 28 ...
      ..- attr(*, ".internal.selfref")=<externalptr> 
     $ df4:Classes ‘data.table’ and 'data.frame': 14 obs. of  4 variables:
      ..$ A  : Factor w/ 6 levels "A","B","C","D",..: 6 6 6 4 2 2 5 5 4 4 ...
      ..$ b  : Factor w/ 6 levels "a","b","c","d",..: 6 3 3 2 3 3 2 2 1 1 ...
      ..$ i  : int [1:14] 3 6 6 5 6 6 1 1 5 5 ...
      ..$ val: int [1:14] 56 86 34 70 31 12 72 1 5 64 ...
      ..- attr(*, ".internal.selfref")=<externalptr> 
     $ df5:Classes ‘data.table’ and 'data.frame': 6 obs. of  4 variables:
      ..$ A  : Factor w/ 6 levels "A","B","C","D",..: 6 6 6 1 1 2
      ..$ b  : Factor w/ 6 levels "a","b","c","d",..: 4 6 3 4 1 4
      ..$ i  : int [1:6] 2 3 6 4 3 4
      ..$ val: int [1:6] 11 48 1 68 32 46
      ..- attr(*, ".internal.selfref")=<externalptr>
    

    In each dataframe, only a few rows remain, namely those whose combination of id values was identified as common:

    unlist(lapply(result, nrow))
    
    df1 df2 df3 df4 df5 
     10  11  14  14   6
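
    One caveat: the count-based test tallies rows, so an id combination that occurs twice in one dataframe is counted twice. A possible variant that deduplicates the id combinations per dataframe first is sketched below in base R. The function name common_key_rows is hypothetical, and the sketch assumes plain data.frames (not data.tables) as input.

    ```r
    # Hypothetical helper: keep rows whose id combination occurs in every
    # data frame, counting each combination at most once per data frame.
    common_key_rows <- function(dfl, cn) {
      # Distinct id combinations of each data frame
      keys <- lapply(dfl, function(x) unique(x[, cn, drop = FALSE]))
      # Inner joins: only combinations present in all data frames survive
      common <- Reduce(function(a, b) merge(a, b, by = cn), keys)
      # One string key per row, used for the membership test
      make_key <- function(d) {
        do.call(paste, c(unname(as.list(d[, cn, drop = FALSE])), sep = "\r"))
      }
      common_key <- make_key(common)
      lapply(dfl, function(x) x[make_key(x) %in% common_key, , drop = FALSE])
    }

    # (1, "x") occurs twice in df1 and never in df2, so a plain row count
    # would reach length(dfl) = 2 although the combination is not common.
    df1 <- data.frame(i = c(1, 1, 2), A = c("x", "x", "y"), val = 1:3)
    df2 <- data.frame(i = c(2, 3), A = c("y", "z"), val = 4:5)

    res <- common_key_rows(list(df1 = df1, df2 = df2), c("i", "A"))
    res
    ```

    Here only the combination (2, "y") is kept, since it is the only one present in both data frames.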
    
  • 2021-01-28 01:31

    Here's a more functional way that will work with any list of tables. First we extract all the 'N' columns and then get the intersection of all these values. Then we just filter each of the tables.

    library('tidyverse')
    
    tables <- list(Table1, Table2, Table3)
    
    common <- tables %>%
      map('N') %>%
      reduce(intersect)
    
    tables %>%
      map(filter, N %in% common)
    # [[1]]
    #    N Values
    # 1  5      1
    # 2 10      2
    # 3 15      3
    # 
    # [[2]]
    #    N Values
    # 1  5     -1
    # 2 10     -3
    # 3 15     -4
    # 
    # [[3]]
    #    N Values
    # 1  5      1
    # 2 10      5
    # 3 15      3
    
  • 2021-01-28 01:37

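    Once you find the "common denominator" (here Table1), you can filter the other tables against it. This works only if every value of N in Table1 also appears in each of the other tables: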

    Table2 <- Table2[Table2$N %in% Table1$N,]
    Table3 <- Table3[Table3$N %in% Table1$N,]
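
    Whether Table1 really is a common denominator can be checked before filtering. The sketch below uses illustrative data that is assumed here, not taken from this answer.

    ```r
    # Illustrative tables (assumed for this sketch)
    Table1 <- data.frame(N = c(5, 10, 15), Values = c(1, 2, 3))
    Table2 <- data.frame(N = c(5, 6, 10, 15), Values = c(-1, -2, -3, -4))
    Table3 <- data.frame(N = c(5, 6, 10, 12, 15), Values = c(1, 21, 5, 6, 3))

    # TRUE only if every N of Table1 appears in each of the other tables
    is_common <- all(sapply(list(Table2, Table3),
                            function(tb) all(Table1$N %in% tb$N)))
    stopifnot(is_common)

    Table2 <- Table2[Table2$N %in% Table1$N, ]
    Table3 <- Table3[Table3$N %in% Table1$N, ]
    ```

    If the check fails, Table1 itself contains rows that must be dropped, and an intersection over all tables is needed instead.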
    