How to zero-normalize a molten dataframe?

Question

Let's say I have this molten data.frame:

molten <- data.frame(
  gene = c("a1", "b1", "a1", "b1", "a1", "b1"),
  count = c(3, 4, 5, 2, 6, 7),
  condition = c("A", "A", "B", "B", "C", "C")
)
#   gene count condition
# 1   a1     3         A
# 2   b1     4         A
# 3   a1     5         B
# 4   b1     2         B
# 5   a1     6         C
# 6   b1     7         C

which looks like this when unmolten:

library(dplyr)     # provides %>%
library(reshape2)  # provides dcast()

molten %>%
  dcast(gene ~ condition, value.var = "count")
#   gene A B C
# 1   a1 3 5 6
# 2   b1 4 2 7

How can I subtract column A from all the other numeric columns (B and C in this example)? I want the final output to be molten, but I don't know whether this can be done directly or whether I have to unmelt, subtract, and then melt again. The final output should look like this (shown unmolten):

#   gene A  B C
# 1   a1 0  2 3
# 2   b1 0 -2 3

Update:

I'm also interested in a more complex scenario:

molten <- data.frame(
  gene = c("a1", "b1", "a1", "b1", "a1", "b1"),
  count = c(3, 4, 5, 2, 6, 7),
  condition = c("A", "A", "B", "B", "C", "C"),
  day = c(0, 0, 1, 1, 2, 2)
)

The solution proposed by @eipi10 gives an error:

molten %>% 
  group_by(gene, condition) %>%
  mutate(count = count - count[day == 0])
Error: incompatible size (0), expecting 1 (the group size) or 1

This is my workaround:

x <- list(a1 = 3, b1 = 4)
molten %>% 
  group_by(gene, condition) %>%
  mutate(count = count - x[[gene]])
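
A variation on this workaround, as a minimal sketch: instead of hard-coding the baseline values, they can be looked up from the rows where day == 0 and matched back to every row by gene name. This assumes each gene has exactly one day == 0 row; the names baseline and normalized are only illustrative.

```r
molten <- data.frame(
  gene = c("a1", "b1", "a1", "b1", "a1", "b1"),
  count = c(3, 4, 5, 2, 6, 7),
  condition = c("A", "A", "B", "B", "C", "C"),
  day = c(0, 0, 1, 1, 2, 2)
)

# Named vector of baseline counts, one entry per gene (taken from day == 0)
base_rows <- molten[molten$day == 0, ]
baseline <- setNames(base_rows$count, as.character(base_rows$gene))

# Look up each row's baseline by gene name and subtract it
normalized <- molten
normalized$count <- molten$count - unname(baseline[as.character(molten$gene)])
```

Because the lookup is vectorized over the whole gene column, no grouping is needed.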

Answer 1:

library(dplyr)

molten %>% group_by(gene) %>%
  mutate(count = count - count[condition=="A"])

    gene count condition
  (fctr) (dbl)    (fctr)
1     a1     0         A
2     b1     0         A
3     a1     2         B
4     b1    -2         B
5     a1     3         C
6     b1     3         C

UPDATE: To answer your comment, in your second example, you group by gene and condition. Then you want to subtract the value of count for day==0. But day equals zero only when condition=="A". For condition "B" or "C" there's never a row where day==0. Here's what happens in an example where we do the subsetting ourselves:

m = molten

x = m$count[m$gene=="a1" & m$condition=="B"]

x
[1] 5

y = m$count[m$gene=="a1" & m$condition=="B" & m$day==0]

y
numeric(0)

numeric(0) is a numeric vector of length zero. Since x=5 and y=numeric(0) and we want x - y, we've asked R to return the result of 5 - numeric(0).

5 - numeric(0)

numeric(0)

length(numeric(0))

[1] 0

mutate is expecting the calculation to return a vector of length equal to either the number of rows in the group (1 in this case) or 1, but the length of the returned value was zero, causing the error.

I'm not exactly sure why 5 - numeric(0) returns numeric(0) while, for example, sum(numeric(0), 5) returns 5. Maybe there's a good reason for this, or maybe it's just one of those enchanting quirks that keep R programmers on their toes. In any case, the error is good here, because it helps us realize that there's actually no value to subtract when condition != "A" and that our code is therefore not doing what we thought it was.
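
On the 5 - numeric(0) puzzle: R's arithmetic operators work element by element, and when either operand has length zero the result has length zero as well, whereas sum() is a reduction that simply adds up whatever values it is given. As for the second example, here is a minimal sketch of one way to get the intended result, assuming each gene has exactly one row with day == 0: group by gene alone, so that the baseline row is present in every group. The name normalized is only illustrative.

```r
library(dplyr)

molten <- data.frame(
  gene = c("a1", "b1", "a1", "b1", "a1", "b1"),
  count = c(3, 4, 5, 2, 6, 7),
  condition = c("A", "A", "B", "B", "C", "C"),
  day = c(0, 0, 1, 1, 2, 2)
)

# Elementwise arithmetic with a zero-length operand yields a zero-length result
length(5 - numeric(0))

# sum() is a reduction over all supplied values, so the empty vector adds nothing
sum(numeric(0), 5)

# Group by gene only: each group then contains its own day == 0 row,
# so count[day == 0] has length 1 and recycles across the group
normalized <- molten %>%
  group_by(gene) %>%
  mutate(count = count - count[day == 0]) %>%
  ungroup()
```

If a gene had no day == 0 row, or more than one, count[day == 0] would not have length 1, so it may be worth checking that assumption before relying on the result.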

Answer 2:

require(reshape2)
require(magrittr)

subtract_num <- function(x, colname){
  ind = which(sapply(x, is.numeric))
  x[ind] = sapply(x[ind], subtract, x[colname])
  x
}

molten %>% 
  dcast(gene ~ condition, value.var = "count") %>% 
  subtract_num("A")

Result:

  gene A  B C
1   a1 0  2 3
2   b1 0 -2 3
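
Since the question asks for molten output, the wide result can be melted back afterwards. The following is a sketch of the full unmelt, subtract, melt round trip using reshape2, written without the subtract_num helper so that it stands on its own; the names wide, num_cols and long are only illustrative, and it assumes every column other than gene is numeric.

```r
library(reshape2)

molten <- data.frame(
  gene = c("a1", "b1", "a1", "b1", "a1", "b1"),
  count = c(3, 4, 5, 2, 6, 7),
  condition = c("A", "A", "B", "B", "C", "C")
)

# Unmelt: one row per gene, one column per condition
wide <- dcast(molten, gene ~ condition, value.var = "count")

# Subtract column A from every condition column (including A itself)
num_cols <- setdiff(names(wide), "gene")
wide[num_cols] <- wide[num_cols] - wide[["A"]]

# Melt back to long format with the original column names
long <- melt(wide, id.vars = "gene",
             variable.name = "condition", value.name = "count")
```

The count column of long should then match the output of the grouped mutate in the first answer.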

P.S.: Seems like I understood the desired output very differently from @eipi10.

Source: https://stackoverflow.com/questions/35349977/how-to-zero-normalize-a-molten-dataframe

Tags

dataframe

melt