Question
I have two matrices and want to compare them by row name to find the differences.
> head(N1)
Total_Degree Transitivity Betweenness Closeness_All
2410016O06RIK 1 NaN 0.00000 0.0003124024
AGO1 4 0.1666667 37.00000 0.0003133814
APEX1 4 0.6666667 4.00000 0.0003144654
ATR 4 0.1666667 19.50000 0.0003128911
CASP3 24 0.0000000 806.00000 0.0002980626
CCND2 4 0.3333333 97.33333 0.0003132832
> head(N2)
Total_Degree Transitivity Betweenness Closeness_All
2410016O06RIK 1 NaN 0.0 2.279982e-04
ADI1 1 NaN 0.0 1.728877e-05
AGO1 3 0.0000000 40.0 2.284670e-04
AIRN 1 NaN 0.0 1.721733e-05
APEX1 3 0.6666667 2.0 2.288330e-04
ATR 3 0.3333333 19.5 2.281542e-04
Many of the row names in N1 also exist in N2. I want to compare them and write the differences to a new matrix. Rows that are unique to N1 or N2 should be labeled as belonging to N1 or N2.
I am not sure what the best criterion for the difference is. What I can think of is simply summing all values of a row in N1 and subtracting the sum of the corresponding row in N2.
For example, the output should be:
> head(Compared)
Comparison Unique
2410016O06RIK 0.0002 Common
AGO1 -1.83 Common
APEX1 2.24 Common
ATR 0.0034 Common
CASP3 830.00029 N1
ADI1 1.0007288 N2
Here, for row name 2410016O06RIK, all values of the row were summed in N1 and in N2, and N1-N2 was written in the Comparison column. Because this row is present in both matrices, Common was written in the Unique column.
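For reproducibility, the two head() excerpts above can be rebuilt as matrices. This is only a sketch of the six rows shown for each matrix, not the asker's full data; the helper names cols, common, only_N1 and only_N2 are introduced here for illustration.

```r
cols <- c("Total_Degree", "Transitivity", "Betweenness", "Closeness_All")

# the six rows printed by head(N1), entered row by row
N1 <- matrix(
  c( 1, NaN,         0.00000, 0.0003124024,
     4, 0.1666667,  37.00000, 0.0003133814,
     4, 0.6666667,   4.00000, 0.0003144654,
     4, 0.1666667,  19.50000, 0.0003128911,
    24, 0.0000000, 806.00000, 0.0002980626,
     4, 0.3333333,  97.33333, 0.0003132832),
  ncol = 4, byrow = TRUE,
  dimnames = list(c("2410016O06RIK", "AGO1", "APEX1", "ATR", "CASP3", "CCND2"), cols)
)

# the six rows printed by head(N2)
N2 <- matrix(
  c(1, NaN,        0.0, 2.279982e-04,
    1, NaN,        0.0, 1.728877e-05,
    3, 0.0000000, 40.0, 2.284670e-04,
    1, NaN,        0.0, 1.721733e-05,
    3, 0.6666667,  2.0, 2.288330e-04,
    3, 0.3333333, 19.5, 2.281542e-04),
  ncol = 4, byrow = TRUE,
  dimnames = list(c("2410016O06RIK", "ADI1", "AGO1", "AIRN", "APEX1", "ATR"), cols)
)

# which row names are shared, and which occur in only one matrix
common  <- intersect(rownames(N1), rownames(N2))
only_N1 <- setdiff(rownames(N1), rownames(N2))
only_N2 <- setdiff(rownames(N2), rownames(N1))
```

With these excerpts, four row names are shared, two occur only in N1 and two only in N2.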
Answer 1:
One way to do this in base R, using rowSums and merge:
If N1 and N2 are data.frames:
# compute the row sums and merge N1 and N2
N1$rs <- rowSums(N1, na.rm=TRUE)
N2$rs <- rowSums(N2, na.rm=TRUE)
comp <- merge(N1[, "rs", drop=FALSE], N2[, "rs", drop=FALSE], by="row.names", all=TRUE)
# label each row: the index is 1 (only in N1), 2 (only in N2) or 3 (in both)
comp$Unique <- with(comp, c("N1", "N2", "common")[(!is.na(rs.x)) + 2*(!is.na(rs.y))])
# difference of the row sums; NA when the row is missing from N1 or N2
comp$Comparison <- with(comp, rs.x-rs.y)
# keep only the columns you need (Row.names, Comparison, Unique):
comp <- comp[, c(1, 5, 4)]
If N1 and N2 are matrices:
# compute the row sums and merge N1 and N2
rs1 <- rowSums(N1, na.rm=TRUE)
rs2 <- rowSums(N2, na.rm=TRUE)
comp <- merge(N1, N2, by="row.names", all=TRUE)
# label each row: the index is 1 (only in N1), 2 (only in N2) or 3 (in both)
# (this assumes Total_Degree itself is never NA in N1 or N2)
comp$Unique <- with(comp, c("N1", "N2", "common")[as.numeric(!is.na(Total_Degree.x)) + 2*as.numeric(!is.na(Total_Degree.y))])
# difference of the row sums; NA when the row is missing from N1 or N2
comp$Comparison <- with(merge(as.data.frame(rs1), as.data.frame(rs2), all=TRUE, by="row.names"), rs1-rs2)
# keep only the columns you need:
comp <- comp[, c("Row.names", "Comparison", "Unique")]
Output of both methods:
comp
# Row.names Comparison Unique
#1 2410016O06RIK 0.0000844042 common
#2 ADI1 NA N2
#3 AGO1 -1.8332483856 common
#4 AIRN NA N2
#5 APEX1 3.0000856324 common
#6 ATR 0.8334181369 common
#7 CASP3 NA N1
#8 CCND2 NA N1
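The output above leaves Comparison as NA for rows found in only one matrix, whereas the question's example shows a value there (for CASP3 it appears to be that row's own sum in N1). A minimal sketch of that variant, assuming unmatched rows should simply carry their own row sum; compare_rowsums and the toy matrices A and B are hypothetical names, not part of the answer:

```r
# Hypothetical helper: row-sum comparison where rows found in only one
# matrix keep their own row sum instead of NA.
compare_rowsums <- function(N1, N2) {
  rs1 <- rowSums(N1, na.rm = TRUE)   # named by row name
  rs2 <- rowSums(N2, na.rm = TRUE)
  ids <- sort(union(names(rs1), names(rs2)))
  in1 <- ids %in% names(rs1)
  in2 <- ids %in% names(rs2)
  s1 <- unname(rs1[ids])             # NA where the row is absent from N1
  s2 <- unname(rs2[ids])             # NA where the row is absent from N2
  data.frame(
    Row.names  = ids,
    Comparison = ifelse(in1 & in2, s1 - s2, ifelse(in1, s1, s2)),
    Unique     = ifelse(in1 & in2, "common", ifelse(in1, "N1", "N2")),
    stringsAsFactors = FALSE
  )
}

# toy matrices (made-up values): g2 is shared, g1 and g3 are unique
A <- matrix(c(1, 2, 3, 4), ncol = 2, byrow = TRUE,
            dimnames = list(c("g1", "g2"), c("x", "y")))
B <- matrix(c(1, 1, 5, NaN), ncol = 2, byrow = TRUE,
            dimnames = list(c("g2", "g3"), c("x", "y")))

res <- compare_rowsums(A, B)
print(res)
```

Membership is tested with %in% on the row names rather than with is.na on a data column, so a genuine NA or NaN in the data is not mistaken for a missing row.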
Answer 2:
This is only part of the solution; in res you have a data.table to work with for the difference part:
require(data.table)
require(dplyr)
set.seed(2016)
dt1 <- data.table(V1 = c("a", "b", "c", "d"), V2 = rnorm(4))
dt2 <- data.table(V1 = c("c", "d", "e", "f"), V2 = rnorm(4))
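# earlier version, kept for reference (%nin% is not in base R; Hmisc
# provides it, or use !(V1 %in% ...) instead):
# common <- merge(dt1, dt2, by = "V1")[, Unique := "Common"]
# unique1 <- dt1[V1 %nin% dt2[, V1], ][, Unique := "N1"]
# unique2 <- dt2[V1 %nin% dt1[, V1], ][, Unique := "N2"]
# res <- rbind(common, unique1, unique2, fill = TRUE)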
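Small update after @Cath's answer, just for clarity:
# full outer join, then sum the two V2 columns and flag where each row came from
# Unique: 0 = in both, 1 = only in dt2 (V2.x is NA), 2 = only in dt1 (V2.y is NA)
allMerged <- merge(dt1, dt2, by = "V1", all = TRUE) %>%
  .[, RowSum := rowSums(.SD, na.rm = TRUE), .SDcols = grep("V2", names(.))] %>%
  .[, Unique := ((is.na(V2.x) + 2*is.na(V2.y)))]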
print(allMerged)
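The code above stops at a numeric Unique code and a sum over both V2 columns. A sketch of one way to finish the "difference part" with data.table, producing text labels and an N1 minus N2 difference; the toy values and the .N1/.N2 suffixes are choices made here for illustration, and V2 is assumed to contain no NA of its own:

```r
library(data.table)

# toy inputs (made-up values); V2 is assumed to contain no NA of its own
dt1 <- data.table(V1 = c("a", "b", "c", "d"), V2 = c(1, 2, 3, 4))
dt2 <- data.table(V1 = c("c", "d", "e", "f"), V2 = c(10, 20, 30, 40))

# full outer join; the suffixes make the origin of each V2 column explicit
res <- merge(dt1, dt2, by = "V1", all = TRUE, suffixes = c(".N1", ".N2"))

# after the join, an NA marks a key that was absent from that table
res[, Unique := ifelse(is.na(V2.N2), "N1",
                ifelse(is.na(V2.N1), "N2", "Common"))]

# N1 minus N2; stays NA for keys present in only one table
res[, Comparison := V2.N1 - V2.N2]

print(res)
```

Keys a and b come out as N1, c and d as Common, and e and f as N2; Comparison is filled only for the Common rows.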
Source: https://stackoverflow.com/questions/36420909/compare-matrices-to-find-the-differences