Let me show an example. Suppose we have 3 tables (focus on column N):
Table 1        Table 2        Table 3
N   Values     N   Values     N   Values
-------------  -------------  -------------
5   1          5   -1         5   1
10  2          6   -2         6   21
15  3          10  -3         10  5
               15  -4         12  6
                              15  3
Use a set intersection to find the values of N common to all the tables:
> t1 <-data.frame(N=c(5,10,15),Values=c(1,2,3))
> t2 <-data.frame(N=c(5,6,10,15),Values=c(-1,-2,-3,-4))
> t3 <-data.frame(N=c(5,6,10,12,15),Values=c(1,21,5,6,3))
> common<-intersect(intersect(t1$N,t2$N),t3$N)
> common
[1] 5 10 15
Then subset each table to keep only the rows with those common values:
> newt1<-t1[t1$N %in% common,]
> newt2<-t2[t2$N %in% common,]
> newt3<-t3[t3$N %in% common,]
> newt3
   N Values
1  5      1
3 10      5
5 15      3
This approach should scale: you can wrap it in a function that takes a list of data frames and a column name and returns a list of new data frames.
I've used data frames. The same approach will work with matrices, provided the column is indexed as m[, "N"] rather than m$N.
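As a minimal sketch of such a function in base R (the name filter_common and its arguments are hypothetical, chosen only for illustration), the intersection can be folded over the list with Reduce:

```r
# Hypothetical helper: keep only rows whose `col` value occurs in every table.
filter_common <- function(dfs, col) {
  # Intersect the column values across all data frames in the list
  common <- Reduce(intersect, lapply(dfs, function(d) d[[col]]))
  # Subset each data frame to the rows holding a common value
  lapply(dfs, function(d) d[d[[col]] %in% common, , drop = FALSE])
}

t1 <- data.frame(N = c(5, 10, 15), Values = c(1, 2, 3))
t2 <- data.frame(N = c(5, 6, 10, 15), Values = c(-1, -2, -3, -4))
t3 <- data.frame(N = c(5, 6, 10, 12, 15), Values = c(1, 21, 5, 6, 3))

res <- filter_common(list(t1 = t1, t2 = t2, t3 = t3), "N")
res$t3
```

A list is used rather than a vector because an atomic vector cannot hold data frames.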
I would like to propose a generic approach which works for an arbitrary number of dataframes as well as for multiple id columns.
The dataframes may have a different structure, i.e., different number and type of columns. The only requirement is that the dataframes share all id columns having the same name and type. In addition, it will detect if there are no common combinations of id values between the dataframes.
Suppose we have a list of dataframes dfl and a vector of column names cn which should be checked for common value combinations across all dataframes in the list:
dfl <- list(Table1, Table2, Table3)
cn <- "N"
library(data.table)
# determine common combinations of id values
common <- rbindlist(lapply(dfl, function(x) setDT(x)[, .SD, .SDcols = cn]))[
, .(.cnt = .N), by = cn][.cnt == length(dfl)][, -".cnt"]
# stop if there are no common id values
stopifnot(nrow(common) > 0L)
# join with all data tables in dfl, keeping only rows which have common id values
result <- lapply(dfl, function(x) x[common, on = cn, nomatch = 0L])
result
$Table1
    N Values
1:  5      1
2: 10      2
3: 15      3

$Table2
    N Values
1:  5     -1
2: 10     -3
3: 15     -4

$Table3
    N Values
1:  5      1
2: 10      5
3: 15      3
dfl <- structure(list(Table1 = structure(list(N = c(5L, 10L, 15L), Values = 1:3), .Names = c("N",
"Values"), row.names = c(NA, 3L), class = "data.frame"), Table2 = structure(list(
N = c(5L, 6L, 10L, 15L), Values = c(-1L, -2L, -3L, -4L)), .Names = c("N",
"Values"), row.names = c(NA, 4L), class = "data.frame"), Table3 = structure(list(
N = c(5L, 6L, 10L, 12L, 15L), Values = c(1L, 21L, 5L, 6L,
3L)), .Names = c("N", "Values"), row.names = c(NA, 5L), class = "data.frame")), .Names = c("Table1",
"Table2", "Table3"))
# create sample data: 5 dataframes with 100 rows each and 3 id columns
set.seed(123L)
ndf <- 5L
dfl <- lapply(seq_len(ndf), function(i) {
nr <- 100L
nseq <- 1:6
data.frame(A = sample(LETTERS[nseq], nr, replace = TRUE),
b = sample(letters[nseq], nr, replace = TRUE),
i = sample(nseq, nr, replace = TRUE),
val = sample.int(nr, nr))
})
dfl <- setNames(dfl, paste0("df", seq_along(dfl)))
str(dfl)
List of 5
 $ df1:'data.frame': 100 obs. of 4 variables:
  ..$ A  : Factor w/ 6 levels "A","B","C","D",..: 2 5 3 6 6 1 4 6 4 3 ...
  ..$ b  : Factor w/ 6 levels "a","b","c","d",..: 4 2 3 6 3 6 6 4 3 1 ...
  ..$ i  : int [1:100] 2 6 4 4 3 6 3 2 2 2 ...
  ..$ val: int [1:100] 79 1 77 71 61 46 15 99 42 45 ...
 $ df2:'data.frame': 100 obs. of 4 variables:
  ..$ A  : Factor w/ 6 levels "A","B","C","D",..: 6 1 6 4 3 3 5 1 3 5 ...
  ..$ b  : Factor w/ 6 levels "a","b","c","d",..: 3 3 2 1 3 2 4 4 6 3 ...
  ..$ i  : int [1:100] 2 5 2 2 2 5 1 5 2 3 ...
  ..$ val: int [1:100] 85 26 3 84 33 61 52 36 18 40 ...
 $ df3:'data.frame': 100 obs. of 4 variables:
  ..$ A  : Factor w/ 6 levels "A","B","C","D",..: 3 3 1 1 2 6 3 3 5 5 ...
  ..$ b  : Factor w/ 6 levels "a","b","c","d",..: 6 4 6 4 5 4 5 6 5 1 ...
  ..$ i  : int [1:100] 2 4 1 6 6 3 5 2 1 3 ...
  ..$ val: int [1:100] 81 73 22 99 84 51 57 88 93 61 ...
 $ df4:'data.frame': 100 obs. of 4 variables:
  ..$ A  : Factor w/ 6 levels "A","B","C","D",..: 6 6 3 5 3 6 1 1 5 4 ...
  ..$ b  : Factor w/ 6 levels "a","b","c","d",..: 1 3 4 6 5 4 1 1 5 1 ...
  ..$ i  : int [1:100] 2 2 1 3 2 5 4 6 1 6 ...
  ..$ val: int [1:100] 94 98 45 23 67 53 55 41 40 100 ...
 $ df5:'data.frame': 100 obs. of 4 variables:
  ..$ A  : Factor w/ 6 levels "A","B","C","D",..: 4 1 2 5 5 1 6 1 4 3 ...
  ..$ b  : Factor w/ 6 levels "a","b","c","d",..: 5 1 3 6 6 5 1 4 6 4 ...
  ..$ i  : int [1:100] 1 6 2 5 4 1 6 4 6 4 ...
  ..$ val: int [1:100] 45 28 16 85 54 53 56 68 59 94 ...
# define id columns
cn <- c("i", "A", "b")
common <- rbindlist(lapply(dfl, function(x) setDT(x)[, .SD, .SDcols = cn]))[
, .(.cnt = .N), by = cn][.cnt == length(dfl)][, -".cnt"]
stopifnot(nrow(common) > 0L)
result <- lapply(dfl, function(x) x[common, on = cn, nomatch = 0L])
str(result)
List of 5
 $ df1:Classes ‘data.table’ and 'data.frame': 10 obs. of 4 variables:
  ..$ A  : Factor w/ 6 levels "A","B","C","D",..: 6 6 6 6 6 6 4 2 1 5
  ..$ b  : Factor w/ 6 levels "a","b","c","d",..: 4 4 4 6 6 3 2 3 4 2
  ..$ i  : int [1:10] 2 2 2 3 3 6 5 6 4 1
  ..$ val: int [1:10] 99 85 4 36 83 70 12 52 53 58
  ..- attr(*, ".internal.selfref")=<externalptr>
 $ df2:Classes ‘data.table’ and 'data.frame': 11 obs. of 4 variables:
  ..$ A  : Factor w/ 6 levels "A","B","C","D",..: 6 6 4 4 2 1 5 5 4 1 ...
  ..$ b  : Factor w/ 6 levels "a","b","c","d",..: 4 3 2 2 3 4 4 4 1 1 ...
  ..$ i  : int [1:11] 2 6 5 5 6 4 1 1 5 3 ...
  ..$ val: int [1:11] 11 1 58 14 5 71 52 39 81 88 ...
  ..- attr(*, ".internal.selfref")=<externalptr>
 $ df3:Classes ‘data.table’ and 'data.frame': 14 obs. of 4 variables:
  ..$ A  : Factor w/ 6 levels "A","B","C","D",..: 6 4 2 1 1 5 5 5 5 5 ...
  ..$ b  : Factor w/ 6 levels "a","b","c","d",..: 6 2 3 4 4 2 2 4 4 4 ...
  ..$ i  : int [1:14] 3 5 6 4 4 1 1 1 1 1 ...
  ..$ val: int [1:14] 25 60 18 78 59 26 32 39 77 28 ...
  ..- attr(*, ".internal.selfref")=<externalptr>
 $ df4:Classes ‘data.table’ and 'data.frame': 14 obs. of 4 variables:
  ..$ A  : Factor w/ 6 levels "A","B","C","D",..: 6 6 6 4 2 2 5 5 4 4 ...
  ..$ b  : Factor w/ 6 levels "a","b","c","d",..: 6 3 3 2 3 3 2 2 1 1 ...
  ..$ i  : int [1:14] 3 6 6 5 6 6 1 1 5 5 ...
  ..$ val: int [1:14] 56 86 34 70 31 12 72 1 5 64 ...
  ..- attr(*, ".internal.selfref")=<externalptr>
 $ df5:Classes ‘data.table’ and 'data.frame': 6 obs. of 4 variables:
  ..$ A  : Factor w/ 6 levels "A","B","C","D",..: 6 6 6 1 1 2
  ..$ b  : Factor w/ 6 levels "a","b","c","d",..: 4 6 3 4 1 4
  ..$ i  : int [1:6] 2 3 6 4 3 4
  ..$ val: int [1:6] 11 48 1 68 32 46
  ..- attr(*, ".internal.selfref")=<externalptr>
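Note that counting rows identifies truly common combinations only if each combination occurs at most once per dataframe. The sample data contains duplicates, so some retained combinations are missing from individual dataframes (for example, D/b/5 is kept in df1 but does not occur in df5); applying unique() to each dataframe's id columns before rbindlist() avoids this. In each dataframe, only a few rows remain which share the selected combinations of id values: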
unlist(lapply(result, nrow))
df1 df2 df3 df4 df5
 10  11  14  14   6
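A sketch of the same approach that deduplicates the id combinations of each table before counting, so that repeated combinations within one table cannot inflate the count. The function name filter_common_ids and the toy tables a and b are hypothetical; as.data.table() is used instead of setDT() so the inputs are left unmodified.

```r
library(data.table)

# Hypothetical helper: `dfl` is a list of data frames, `cn` the id column names.
filter_common_ids <- function(dfl, cn) {
  dts <- lapply(dfl, as.data.table)  # copies, so the inputs are not modified
  # One row per distinct id combination per table
  ids <- rbindlist(lapply(dts, function(x) unique(x[, cn, with = FALSE])))
  # A combination is common if it was contributed by every table
  common <- ids[, .(.cnt = .N), by = cn][.cnt == length(dts), cn, with = FALSE]
  stopifnot(nrow(common) > 0L)
  # Inner join keeps only rows whose id combination is common to all tables
  lapply(dts, function(x) x[common, on = cn, nomatch = 0L])
}

# id 1 occurs twice in `a` but never in `b`, so it must not count as common
a <- data.frame(id = c(1, 1, 2), val = 1:3)
b <- data.frame(id = c(2, 3), val = 4:5)
res <- filter_common_ids(list(a = a, b = b), "id")
```

In this toy case only id 2 survives in both tables, whereas counting without unique() would also have kept id 1.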
Here's a more functional way that will work with any list of tables. First we extract all the 'N' columns and then get the intersection of all these values. Then we just filter each of the tables.
library('tidyverse')
tables <- list(Table1, Table2, Table3)
common <- tables %>%
map('N') %>%
reduce(intersect)
tables %>%
map(filter, N %in% common)
# [[1]]
# N Values
# 1 5 1
# 2 10 2
# 3 15 3
#
# [[2]]
# N Values
# 1 5 -1
# 2 10 -3
# 3 15 -4
#
# [[3]]
# N Values
# 1 5 1
# 2 10 5
# 3 15 3
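If the tables have to agree on several key columns rather than a single one, the same pipeline can be sketched with joins instead of %in%. This is an illustrative variant, not part of the answer above: the vector cn of key column names is an assumption, and the tables are rebuilt here only to keep the example self-contained.

```r
library(dplyr)
library(purrr)

tables <- list(
  data.frame(N = c(5, 10, 15), Values = c(1, 2, 3)),
  data.frame(N = c(5, 6, 10, 15), Values = c(-1, -2, -3, -4)),
  data.frame(N = c(5, 6, 10, 12, 15), Values = c(1, 21, 5, 6, 3))
)
cn <- "N"  # assumed key column(s); may hold several names

# Distinct key combinations per table, then keep those present in all tables
common <- tables %>%
  map(function(d) distinct(d[cn])) %>%
  reduce(function(a, b) inner_join(a, b, by = cn))

# semi_join keeps the rows of each table that have a match in `common`
filtered <- tables %>%
  map(function(d) semi_join(d, common, by = cn))
```

semi_join() returns only the columns of the original table, so no key columns are duplicated in the result.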
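If the N values of one table are a subset of those in every other table, that table acts as the "common denominator" (here Table1), and you can filter the others against it like this: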
Table2 <- Table2[Table2$N %in% Table1$N,]
Table3 <- Table3[Table3$N %in% Table1$N,]
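Whether such a table exists can be checked programmatically. The following is a small base R sketch; the names tables, ids and is_denominator are illustrative, and the three tables are rebuilt from the example data.

```r
Table1 <- data.frame(N = c(5, 10, 15), Values = c(1, 2, 3))
Table2 <- data.frame(N = c(5, 6, 10, 15), Values = c(-1, -2, -3, -4))
Table3 <- data.frame(N = c(5, 6, 10, 12, 15), Values = c(1, 21, 5, 6, 3))

tables <- list(Table1 = Table1, Table2 = Table2, Table3 = Table3)
ids <- lapply(tables, function(d) d$N)

# TRUE for a table whose N values all occur in every table of the list
is_denominator <- vapply(ids, function(v) {
  all(vapply(ids, function(w) all(v %in% w), logical(1)))
}, logical(1))

names(which(is_denominator))
```

If no table qualifies, the shortcut does not apply and the values have to be intersected across all tables instead.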