Compare matrices to find the differences

自古美人都是妖i 提交于 2019-12-18 09:45:03

问题


I have 2 matrices, I want to compare them (row.name wise) to find the difference.

> head(N1)
              Total_Degree Transitivity Betweenness Closeness_All
2410016O06RIK            1          NaN     0.00000  0.0003124024
AGO1                     4    0.1666667    37.00000  0.0003133814
APEX1                    4    0.6666667     4.00000  0.0003144654
ATR                      4    0.1666667    19.50000  0.0003128911
CASP3                   24    0.0000000   806.00000  0.0002980626
CCND2                    4    0.3333333    97.33333  0.0003132832

head(N2)
              Total_Degree Transitivity Betweenness Closeness_All
2410016O06RIK            1          NaN         0.0  2.279982e-04
ADI1                     1          NaN         0.0  1.728877e-05
AGO1                     3    0.0000000        40.0  2.284670e-04
AIRN                     1          NaN         0.0  1.721733e-05
APEX1                    3    0.6666667         2.0  2.288330e-04
ATR                      3    0.3333333        19.5  2.281542e-04

Many of the rows.name in N1 do exist in N2, I want to compare them and write the difference in a new matrix. Those which are unique to N1 or N2 should be mentioned that they either belong to N1 or N2.

I am not sure which is the best criteria to calculate the difference, what I can think of, is a simple addition of all values of a row in N1 and subtract that value from additive value of corresponding row in N2.

For example output should be:

> head(Compared)
                       Comparison Unique 
    2410016O06RIK        0.0002     Common
    AGO1                 -1.83      Common
    APEX1                 2.24      Common
    ATR                  0.0034     Common
    CASP3               830.00029   N1
    ADI1                1.0007288   N2

Here for row.name = 2410016O06RIK, all values from N1 and N2 were added and then N1-N2 was written in Comparison column, as this row was common in both matrices so common was written in Unique column.


回答1:


A way to go in base R, with rowSums and merge:

If N1 and N2 are data.frames:

# compute the row sums and merge N1 and N2
N1$rs <- rowSums(N1, na.rm=TRUE)
N2$rs <- rowSums(N2, na.rm=TRUE)
comp <- merge(N1[, "rs", drop=FALSE], N2[, "rs", drop=FALSE], by="row.names", all=TRUE)

# then compare the row sums and the variable "locations"
comp$Unique <- with(comp, c("N1", "N2", "common")[(!is.na(rs.x)) + 2*(!is.na(rs.y))])
comp$Comparison <- with(comp, rs.x-rs.y)

# keep only the variable you need:
comp <- comp[, c(1, 5, 4)]

If N1 and N2 are matrices:

# compute the row sums and merge N1 and N2
rs1 <- rowSums(N1, na.rm=TRUE)
rs2 <- rowSums(N2, na.rm=TRUE)
comp <- merge(N1, N2, by="row.names", all=TRUE)

# then compare the row sums and the variable "locations"
comp$Unique <- with(comp, c("N1", "N2", "common")[as.numeric(!is.na(Total_Degree.x)) + 2*as.numeric(!is.na(Total_Degree.y))])
comp$Comparison <- with(merge(as.data.frame(rs1), as.data.frame(rs2), all=TRUE, by="row.names"), rs1-rs2)

# keep only the variable you need:
comp <- comp[, c("Row.names", "Comparison", "Unique")]

output of both methods:

comp
#      Row.names    Comparison Unique
#1 2410016O06RIK  0.0000844042 common
#2          ADI1            NA     N2
#3          AGO1 -1.8332483856 common
#4          AIRN            NA     N2
#5         APEX1  3.0000856324 common
#6           ATR  0.8334181369 common
#7         CASP3            NA     N1
#8         CCND2            NA     N1



回答2:


That is a part of the solution, in res you have a data.table to work with for the difference part:

require(data.table)
require(dplyr)

set.seed(2016)
dt1 <- data.table(V1 = c("a", "b", "c", "d"), V2 = rnorm(4))
dt2 <- data.table(V1 = c("c", "d", "e", "f"), V2 = rnorm(4))

# common <- merge(dt1, dt2, by = "V1")[, Unique := "Common"]
# unique1 <- dt1[V1 %nin% dt2[, V1], ][, Unique := "N1"]
# unique2 <- dt2[V1 %nin% dt1[, V1], ][, Unique := "N2"]
# res <- rbind(common, unique1, unique2, fill = TRUE)

Small update after @Cath answer, just for clarity.

allMerged <- merge(dt1, dt2, by = "V1", all = TRUE) %>%
  .[, RowSum := rowSums(.SD, na.rm = TRUE), .SDcols = grep("V2", names(.))] %>%
  .[, Unique := ((is.na(V2.x) + 2*is.na(V2.y)))]

print(allMerged)


来源:https://stackoverflow.com/questions/36420909/compare-matrices-to-find-the-differences

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!