Leave only those rows in matrices which have equal elements in a certain column


Let me show an example. Suppose we have 3 tables (focus on column N):

   Table 1         Table 2         Table 3
-------------   -------------   -------------
   N  Values       N  Values       N  Values
   5       1       5      -1       5       1
  10       2       6      -2       6      21
  15       3      10      -3      10       5
                  15      -4      12       6
                                  15       3

4 Answers

予麋鹿, 2021-01-28 01:21

Use set intersection to find the values of N that are common to all the tables:

> t1 <- data.frame(N = c(5, 10, 15), Values = c(1, 2, 3))
> t2 <- data.frame(N = c(5, 6, 10, 15), Values = c(-1, -2, -3, -4))
> t3 <- data.frame(N = c(5, 6, 10, 12, 15), Values = c(1, 21, 5, 6, 3))
> common <- intersect(intersect(t1$N, t2$N), t3$N)
> common
[1]  5 10 15


Then subset each table to keep only the rows with those common values:

> newt1 <- t1[t1$N %in% common, ]
> newt2 <- t2[t2$N %in% common, ]
> newt3 <- t3[t3$N %in% common, ]
> newt3
   N Values
1  5      1
3 10      5
5 15      3


This approach scales: you can write a function that takes a list of data frames and a column name and returns a list of the filtered data frames.
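The two steps can be wrapped in a function along those lines. This is a minimal sketch; the name keep_common_rows() is hypothetical. It takes a list of data frames (a list, not a vector, is R's container for several data frames) plus a column name, and uses Reduce() to fold intersect() over any number of tables instead of nesting the calls by hand:

```r
# Hypothetical helper: keep, in every data frame of `dfs`, only the rows
# whose value in column `col` occurs in all of the data frames
keep_common_rows <- function(dfs, col) {
  # values of `col` shared by every data frame
  common <- Reduce(intersect, lapply(dfs, function(d) d[[col]]))
  # subset each data frame; drop = FALSE keeps one-column frames as frames
  lapply(dfs, function(d) d[d[[col]] %in% common, , drop = FALSE])
}

t1 <- data.frame(N = c(5, 10, 15), Values = c(1, 2, 3))
t2 <- data.frame(N = c(5, 6, 10, 15), Values = c(-1, -2, -3, -4))
t3 <- data.frame(N = c(5, 6, 10, 12, 15), Values = c(1, 21, 5, 6, 3))

res <- keep_common_rows(list(t1 = t1, t2 = t2, t3 = t3), "N")
res$t3
#    N Values
# 1  5      1
# 3 10      5
# 5 15      3
```

Naming the list elements (t1 = t1, ...) makes the returned list easy to index by table name.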

I've used data frames here. The same approach works with matrices, except that the column is extracted with m[, "N"] instead of m$N.
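A minimal matrix sketch, assuming each matrix has a column named "N" (the matrices m1, m2 and m3 are made up to mirror the example tables). Only the column extraction changes, because `$` is invalid for atomic matrices:

```r
# Example matrices with named columns (made up to mirror the tables)
m1 <- cbind(N = c(5, 10, 15), Values = c(1, 2, 3))
m2 <- cbind(N = c(5, 6, 10, 15), Values = c(-1, -2, -3, -4))
m3 <- cbind(N = c(5, 6, 10, 12, 15), Values = c(1, 21, 5, 6, 3))
mats <- list(m1, m2, m3)

# values of column "N" present in every matrix
common <- Reduce(intersect, lapply(mats, function(m) m[, "N"]))
# drop = FALSE keeps the result a matrix even if only one row survives
newmats <- lapply(mats, function(m) m[m[, "N"] %in% common, , drop = FALSE])
newmats[[3]]
#       N Values
# [1,]  5      1
# [2,] 10      5
# [3,] 15      3
```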

轻奢々, 2021-01-28 01:23

I would like to propose a generic approach which works for an arbitrary number of dataframes as well as for multiple id columns. 

The dataframes may have different structures, i.e., different numbers and types of columns. The only requirement is that every dataframe contains all id columns, with the same names and types. In addition, the code detects when there are no common combinations of id values across the dataframes.

Suppose we have a list of dataframes dfl and a vector of column names cn which should be checked for common value combinations across all dataframes in the list:

dfl <- list(Table1 = Table1, Table2 = Table2, Table3 = Table3)
cn <- "N"

library(data.table)
# determine common combinations of id values
# (setDT() converts each dataframe to a data.table in place; unique() makes
# sure a combination repeated within one table is counted only once)
common <- rbindlist(lapply(dfl, function(x) unique(setDT(x)[, .SD, .SDcols = cn])))[
  , .(.cnt = .N), by = cn][.cnt == length(dfl)][, -".cnt"]
# stop if there are no common id values
stopifnot(nrow(common) > 0L)
# join with all data tables in dfl, keeping only rows which have common id values
result <- lapply(dfl, function(x) x[common, on = cn, nomatch = 0L])

result



$Table1
    N Values
1:  5      1
2: 10      2
3: 15      3

$Table2
    N Values
1:  5     -1
2: 10     -3
3: 15     -4

$Table3
    N Values
1:  5      1
2: 10      5
3: 15      3
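To verify a result like this, you can check that every filtered table ends up with exactly the same set of id combinations. The helper same_id_sets() below is hypothetical (it is not part of data.table): it pastes the id columns of each row into one key string and compares the sorted, de-duplicated keys across tables. It works on plain data frames and on data.tables, with one id column or several:

```r
# Hypothetical helper: TRUE if all tables contain exactly the same set of
# id combinations in the columns named by `cn`
same_id_sets <- function(tables, cn) {
  keys <- lapply(tables, function(x) {
    # one key string per row (id values joined by "\r", assumed not to occur
    # inside any id value), then de-duplicated and sorted
    sort(unique(do.call(paste, c(unname(as.list(x)[cn]), sep = "\r"))))
  })
  # compare every table's keys with those of the first table
  all(vapply(keys, identical, logical(1), keys[[1]]))
}

ok  <- list(data.frame(N = c(5, 10, 15), Values = 1:3),
            data.frame(N = c(15, 5, 10, 10), Values = 4:7))
bad <- list(data.frame(N = c(5, 10), Values = 1:2),
            data.frame(N = c(5, 12), Values = 3:4))
same_id_sets(ok, "N")   # TRUE
same_id_sets(bad, "N")  # FALSE
```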



Data

dfl <- structure(list(Table1 = structure(list(N = c(5L, 10L, 15L), Values = 1:3), .Names = c("N", 
"Values"), row.names = c(NA, 3L), class = "data.frame"), Table2 = structure(list(
    N = c(5L, 6L, 10L, 15L), Values = c(-1L, -2L, -3L, -4L)), .Names = c("N", 
"Values"), row.names = c(NA, 4L), class = "data.frame"), Table3 = structure(list(
    N = c(5L, 6L, 10L, 12L, 15L), Values = c(1L, 21L, 5L, 6L, 
    3L)), .Names = c("N", "Values"), row.names = c(NA, 5L), class = "data.frame")), .Names = c("Table1", 
"Table2", "Table3"))
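For comparison, here is a base-R sketch of the same idea without data.table, using merge(). The function name keep_common_combos() and the small example data are made up for illustration. Because the id columns are de-duplicated with unique() first, a combination that occurs several times in one dataframe is still treated as a single combination. Note that merge() sorts its result by the id columns and moves them to the front:

```r
# Hypothetical helper (base R only): keep the rows of every data frame in
# `dfl` whose combination of values in the id columns `cn` occurs in all of them
keep_common_combos <- function(dfl, cn) {
  # distinct id combinations per data frame; without unique(), a combination
  # repeated inside one data frame would be duplicated by the merges below
  ids <- lapply(dfl, function(x) unique(x[cn]))
  # inner-join the id sets pairwise: what remains is common to all data frames
  common <- Reduce(function(a, b) merge(a, b, by = cn), ids)
  # stop if there are no common id combinations
  stopifnot(nrow(common) > 0L)
  # merge() keeps only matching rows, sorts by `cn` and puts `cn` first
  lapply(dfl, function(x) merge(x, common, by = cn))
}

a <- data.frame(k1 = c(1, 1, 2, 3), k2 = c("x", "y", "x", "x"), v = 1:4)
b <- data.frame(k1 = c(1, 2, 2, 3), k2 = c("x", "x", "y", "y"), w = 5:8)
res <- keep_common_combos(list(a = a, b = b), c("k1", "k2"))
res$a
#   k1 k2 v
# 1  1  x 1
# 2  2  x 3
```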


Example with multiple id columns

# create sample data: 5 dataframes with 100 rows each and 3 id columns
set.seed(123L)
ndf <- 5L
dfl <- lapply(seq_len(ndf), function(i) {
  nr <- 100L
  nseq <- 1:6
  data.frame(A = sample(LETTERS[nseq], nr, replace = TRUE),
             b = sample(letters[nseq], nr, replace = TRUE),
             i = sample(nseq, nr, replace = TRUE),
             val = sample.int(nr, nr))
  })
dfl <- setNames(dfl, paste0("df", seq_along(dfl)))
str(dfl)



List of 5
 $ df1:'data.frame':  100 obs. of  4 variables:
  ..$ A  : Factor w/ 6 levels "A","B","C","D",..: 2 5 3 6 6 1 4 6 4 3 ...
  ..$ b  : Factor w/ 6 levels "a","b","c","d",..: 4 2 3 6 3 6 6 4 3 1 ...
  ..$ i  : int [1:100] 2 6 4 4 3 6 3 2 2 2 ...
  ..$ val: int [1:100] 79 1 77 71 61 46 15 99 42 45 ...
 $ df2:'data.frame':  100 obs. of  4 variables:
  ..$ A  : Factor w/ 6 levels "A","B","C","D",..: 6 1 6 4 3 3 5 1 3 5 ...
  ..$ b  : Factor w/ 6 levels "a","b","c","d",..: 3 3 2 1 3 2 4 4 6 3 ...
  ..$ i  : int [1:100] 2 5 2 2 2 5 1 5 2 3 ...
  ..$ val: int [1:100] 85 26 3 84 33 61 52 36 18 40 ...
 $ df3:'data.frame':  100 obs. of  4 variables:
  ..$ A  : Factor w/ 6 levels "A","B","C","D",..: 3 3 1 1 2 6 3 3 5 5 ...
  ..$ b  : Factor w/ 6 levels "a","b","c","d",..: 6 4 6 4 5 4 5 6 5 1 ...
  ..$ i  : int [1:100] 2 4 1 6 6 3 5 2 1 3 ...
  ..$ val: int [1:100] 81 73 22 99 84 51 57 88 93 61 ...
 $ df4:'data.frame':  100 obs. of  4 variables:
  ..$ A  : Factor w/ 6 levels "A","B","C","D",..: 6 6 3 5 3 6 1 1 5 4 ...
  ..$ b  : Factor w/ 6 levels "a","b","c","d",..: 1 3 4 6 5 4 1 1 5 1 ...
  ..$ i  : int [1:100] 2 2 1 3 2 5 4 6 1 6 ...
  ..$ val: int [1:100] 94 98 45 23 67 53 55 41 40 100 ...
 $ df5:'data.frame':  100 obs. of  4 variables:
  ..$ A  : Factor w/ 6 levels "A","B","C","D",..: 4 1 2 5 5 1 6 1 4 3 ...
  ..$ b  : Factor w/ 6 levels "a","b","c","d",..: 5 1 3 6 6 5 1 4 6 4 ...
  ..$ i  : int [1:100] 1 6 2 5 4 1 6 4 6 4 ...
  ..$ val: int [1:100] 45 28 16 85 54 53 56 68 59 94 ...



# define id columns
cn <- c("i", "A", "b")

common <- rbindlist(lapply(dfl, function(x) unique(setDT(x)[, .SD, .SDcols = cn])))[
  , .(.cnt = .N), by = cn][.cnt == length(dfl)][, -".cnt"]
stopifnot(nrow(common) > 0L)
result <- lapply(dfl, function(x) x[common, on = cn, nomatch = 0L])

str(result)




In each dataframe, only a few rows are left that share the common combinations of id values:

unlist(lapply(result, nrow))


盖世英雄少女心, 2021-01-28 01:31

Here's a more functional way that will work with any list of tables. First we extract all the 'N' columns and then get the intersection of all these values. Then we just filter each of the tables.

library('tidyverse')

tables <- list(Table1, Table2, Table3)

common <- tables %>%
  map('N') %>%
  reduce(intersect)

tables %>%
  map(filter, N %in% common)
# [[1]]
#    N Values
# 1  5      1
# 2 10      2
# 3 15      3
# 
# [[2]]
#    N Values
# 1  5     -1
# 2 10     -3
# 3 15     -4
# 
# [[3]]
#    N Values
# 1  5      1
# 2 10      5
# 3 15      3
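The filter step above compares a single column. If several id columns must match, the same pipeline can be sketched with joins: the distinct id combinations of each table are intersected with inner_join(), and semi_join() then keeps the matching rows of each table without adding columns or duplicating rows. This uses standard dplyr/purrr verbs but is an illustration, not part of the original answer; cn is an assumed vector of id column names:

```r
library(dplyr)
library(purrr)

Table1 <- data.frame(N = c(5, 10, 15), Values = c(1, 2, 3))
Table2 <- data.frame(N = c(5, 6, 10, 15), Values = c(-1, -2, -3, -4))
Table3 <- data.frame(N = c(5, 6, 10, 12, 15), Values = c(1, 21, 5, 6, 3))
tables <- list(Table1, Table2, Table3)
cn <- "N"  # assumed id columns; may hold several names

# distinct id combinations of each table, intersected via inner joins
common <- tables %>%
  map(~ distinct(select(.x, all_of(cn)))) %>%
  reduce(function(x, y) inner_join(x, y, by = cn))

# keep only the rows of each table that have a match in `common`
filtered <- map(tables, ~ semi_join(.x, common, by = cn))
```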

梦谈多话, 2021-01-28 01:37

Once you have found the "common denominator" (here Table1, whose N values all appear in the other tables), you can filter the other tables like this:

Table2 <- Table2[Table2$N %in% Table1$N,]
Table3 <- Table3[Table3$N %in% Table1$N,]
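The "common denominator" does not have to be spotted by eye. A minimal sketch that flags every table whose N values all occur in every table (if no table qualifies, such a denominator does not exist and the intersection approach is needed instead):

```r
Table1 <- data.frame(N = c(5, 10, 15), Values = c(1, 2, 3))
Table2 <- data.frame(N = c(5, 6, 10, 15), Values = c(-1, -2, -3, -4))
Table3 <- data.frame(N = c(5, 6, 10, 12, 15), Values = c(1, 21, 5, 6, 3))
tables <- list(Table1 = Table1, Table2 = Table2, Table3 = Table3)

# TRUE for a table whose N values are all contained in every table
is_denominator <- vapply(tables, function(t) {
  all(vapply(tables, function(u) all(t$N %in% u$N), logical(1)))
}, logical(1))

names(tables)[is_denominator]
# [1] "Table1"
```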