Understand the `Reduce` function

孤独总比滥情好 2020-12-23 11:34

I have a question about the `Reduce` function in R. I read its documentation, but I am still a bit confused. I have 5 vectors of gene names. For example:



        
3 Answers
  • 2020-12-23 12:06

    `Reduce` takes a binary function and a list of data items and applies the function successively: it combines the first two elements, then combines that result with the third element, and so on. For example:

    Reduce(intersect,list(a,b,c))
    

    is the same as

    intersect(intersect(a,b),c)
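
    To make the equivalence concrete, here is a minimal sketch that compares `Reduce` with the nested call and with a hand-written loop. The sample vectors and the helper `fold_left` are made up for illustration and are not part of the question; the helper assumes a non-empty list.

    ```r
    # Made-up sample vectors (not the asker's data)
    a <- c("geneA", "geneB", "geneC")
    b <- c("geneA", "geneC", "geneD")
    c <- c("geneC", "geneA", "geneE")

    # Hypothetical helper: a hand-written left fold over a non-empty list
    fold_left <- function(f, xs) {
      acc <- xs[[1]]        # start with the first element
      for (x in xs[-1]) {   # combine the running result with each remaining element
        acc <- f(acc, x)
      }
      acc
    }

    by_reduce  <- Reduce(intersect, list(a, b, c))
    by_nesting <- intersect(intersect(a, b), c)
    by_loop    <- fold_left(intersect, list(a, b, c))

    # All three approaches give the same result
    identical(by_reduce, by_nesting)
    identical(by_reduce, by_loop)
    ```

    Only "geneA" and "geneC" occur in all three sample vectors, so those are the two genes that survive the fold.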
    

    However, I don't think that construct will help you here as it will only return those elements that are common to all vectors.

    To count the number of vectors that a gene appears in you could do the following:

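    vlist <- list(v1, v2, v3, v4, v5)
    # Cross-tabulate gene by source vector, then add a per-gene "Count" column
    tab <- table(gene = unlist(vlist),
                 vec = rep(paste0("v", 1:5), times = sapply(vlist, length)))
    addmargins(tab, 2, list(Count = function(x) sum(x[x > 0])))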
           vec
    gene    v1 v2 v3 v4 v5 Count
      geneA  1  1  0  1  0     3
      geneB  1  0  0  0  1     2
      geneC  0  1  0  0  1     2
      geneD  0  0  1  0  0     1
      geneE  0  0  1  1  0     2
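
    If you would still like to use `Reduce` for the counting, one possible sketch is to build a logical membership vector per input and add them up element-wise. This assumes `v1` to `v5` hold the values consistent with the table above; the names `genes`, `membership` and `counts` are illustrative only.

    ```r
    # Assumption: v1..v5 contain the values implied by the table above
    v1 <- c("geneA", "geneB")
    v2 <- c("geneA", "geneC")
    v3 <- c("geneD", "geneE")
    v4 <- c("geneA", "geneE")
    v5 <- c("geneB", "geneC")
    vlist <- list(v1, v2, v3, v4, v5)

    # Every distinct gene across all vectors
    genes <- sort(unique(unlist(vlist)))

    # One logical vector per input: is each gene present in that vector?
    membership <- lapply(vlist, function(v) genes %in% v)

    # Element-wise sum of the logical vectors = number of vectors containing each gene
    counts <- setNames(Reduce(`+`, membership), genes)
    counts
    # geneA geneB geneC geneD geneE
    #     3     2     2     1     2
    ```

    Because `%in%` only tests presence, a gene repeated within one vector is still counted once for that vector.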
    
  • 2020-12-23 12:12

    A nice way to see what `Reduce()` is doing is to run it with `accumulate=TRUE`. It then returns a vector or list whose nth element is the result after processing the first n elements of `x`. Here are a couple of examples:

    Reduce(`*`, x=list(5,4,3,2), accumulate=TRUE)
    # [1]   5  20  60 120
    
    i2 <- seq(0,100,by=2)
    i3 <- seq(0,100,by=3)
    i5 <- seq(0,100,by=5)
    Reduce(intersect, x=list(i2,i3,i5), accumulate=TRUE)
    # [[1]]
    #  [1]   0   2   4   6   8  10  12  14  16  18  20  22  24  26  28  30  32  34  36
    # [20]  38  40  42  44  46  48  50  52  54  56  58  60  62  64  66  68  70  72  74
    # [39]  76  78  80  82  84  86  88  90  92  94  96  98 100
    # 
    # [[2]]
    #  [1]  0  6 12 18 24 30 36 42 48 54 60 66 72 78 84 90 96
    # 
    # [[3]]
    # [1]  0 30 60 90
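
    The same inspection trick works with the other arguments of `Reduce()`. As a small sketch (the variable names are illustrative), `right = TRUE` folds from the right-hand end and `init` supplies a starting value that appears as an extra first element:

    ```r
    x <- 1:4

    # Running sums from the left (same values as cumsum(x))
    from_left <- Reduce(`+`, x, accumulate = TRUE)
    # [1]  1  3  6 10

    # Running sums from the right: element i is the sum of x[i], ..., x[4]
    from_right <- Reduce(`+`, x, accumulate = TRUE, right = TRUE)
    # [1] 10  9  7  4

    # With a starting value, the result has one more element than x
    with_init <- Reduce(`+`, x, init = 0, accumulate = TRUE)
    # [1]  0  1  3  6 10
    ```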
    
  • 2020-12-23 12:13

    Assuming the input values given at the end of this answer, the expression

    Reduce(intersect,list(a,b,c,d,e))
    ## character(0)
    

    gives the genes that are present in all vectors, not the genes that are present in at least two vectors. It is equivalent to:

    intersect(intersect(intersect(intersect(a, b), c), d), e)
    ## character(0)
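
    To see where the result becomes empty, one option is to trace the intermediate intersections with `accumulate = TRUE`. This sketch repeats the input values given at the end of this answer so that it runs on its own; `steps` is just an illustrative name.

    ```r
    a <- c("geneA", "geneB")
    b <- c("geneA", "geneC")
    c <- c("geneD", "geneE")
    d <- c("geneA", "geneE")
    e <- c("geneB", "geneC")

    # Keep every intermediate result of the fold, not just the last one
    steps <- Reduce(intersect, list(a, b, c, d, e), accumulate = TRUE)

    # Number of genes left after each step
    lengths(steps)
    # [1] 2 1 0 0 0
    ```

    After the second step only "geneA" is left, and it is not in `c`, so the running intersection is empty from the third step onward.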
    

    If we want the genes that are in at least two vectors:

    L <- list(a, b, c, d, e)
    u <- unlist(lapply(L, unique)) # or:  Reduce(c, lapply(L, unique))
    
    tab <- table(u)
    names(tab[tab > 1])
    ## [1] "geneA" "geneB" "geneC" "geneE"
    

    or

    sort(unique(u[duplicated(u)]))
    ## [1] "geneA" "geneB" "geneC" "geneE"
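
    The `table` approach generalizes to "at least k vectors". Here is a rough sketch; the function name `genes_in_at_least` is hypothetical, and the input values are repeated so that the block runs on its own.

    ```r
    a <- c("geneA", "geneB")
    b <- c("geneA", "geneC")
    c <- c("geneD", "geneE")
    d <- c("geneA", "geneE")
    e <- c("geneB", "geneC")
    L <- list(a, b, c, d, e)

    # Hypothetical helper: genes that occur in at least k of the given vectors
    genes_in_at_least <- function(vectors, k) {
      u <- unlist(lapply(vectors, unique))  # count each gene once per vector
      tab <- table(u)                       # number of vectors containing each gene
      names(tab)[as.vector(tab) >= k]
    }

    genes_in_at_least(L, 2)
    ## [1] "geneA" "geneB" "geneC" "geneE"
    genes_in_at_least(L, 3)
    ## [1] "geneA"
    ```

    With `k` equal to the number of vectors, this returns the same set of genes as `Reduce(intersect, L)`, which is empty for these inputs.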
    

    Note: The input values used were:

    a <- c("geneA","geneB")
    b <- c("geneA","geneC")
    c <- c("geneD","geneE")
    d <- c("geneA","geneE")
    e <- c("geneB","geneC")
    