Returning above and below rows of specific rows in r dataframe

后端 未结 4 677
清酒与你
清酒与你 2020-12-09 23:02

Consider any dataframe

            col1   col2    col3   col4
row.name11    A     23      x       y
row.name12    A     29      x       y
row.name13    B            


        
相关标签:
4条回答
  • I would simply proceed as follow:

    dat[(grep("row.name12",row.names(dat))-4):(grep("row.name13",row.names(dat))+4),]
    

    grep("row.name12",row.names(dat)) gives you the row number that have "row.name12" as name, so

    (grep("row.name12",row.names(dat))-4):(grep("row.name13",row.names(dat))+4)
    

    gives you a serie of row numbers ranging from the 4th row preceding the row named "row.name12" to the 4th row after the one named "row.name13".

    0 讨论(0)
  • 2020-12-09 23:12

    Create some sample data:

    > dat=data.frame(col1=letters,col2=sample(26),col3=sample(letters))
    > dat
       col1 col2 col3
    1     a   26    x
    2     b   12    i
    3     c   15    v
    ...
    

    Set our target vector (note I choose an edge case and overlapping cases), and find matching rows:

    > target=c("a","e","g","s")
    > match = which(dat$col1 %in% target)
    

    Create sequences from -2 to +2 of the matches (adjust for your needs) and merge:

    > getThese = unique(as.vector(mapply(seq,match-2,match+2)))
    > getThese
     [1] -1  0  1  2  3  4  5  6  7  8  9 17 18 19 20 21
    

    Fix the edge cases:

    > getThese = getThese[getThese > 0 & getThese <= nrow(dat)]
    > dat[getThese,]
       col1 col2 col3
    1     a   26    x
    2     b   12    i
    3     c   15    v
    4     d   22    d
    5     e    2    j
    6     f    9    l
    7     g    1    w
    8     h   21    n
    9     i   17    p
    17    q   18    a
    18    r   10    m
    19    s   24    o
    20    t   13    e
    21    u    3    k
    > 
    

    Remember our targets were a, e, g and s. You've now got those plus two rows above and two rows below for each, with no duplicates.

    If you are using row names, just create 'match' from those. I was using a column.

    I'd write a bunch more tests using the testthat package if this were my problem.

    0 讨论(0)
  • 2020-12-09 23:32

    Perhaps a combination of which() and %in% would help you:

    dat[which(rownames(dat) %in% c("row.name13")) + c(-1, 1), ]
    #            col1 col2 col3 col4
    # row.name12    A   29    x    y
    # row.name14    A   77    x    y
    

    In the above, we are trying to identify which row names in "dat" are "row.name13" (using which()), and the + c(-1, 1) tells R to return the row before and the row after. If you wanted to include the row, you could do something like + c(-1:1).

    To get the range of rows, switch the comma to a colon:

    dat[which(rownames(dat) %in% c("row.name13")) + c(-1:1), ]
    #            col1 col2 col3 col4
    # row.name12    A   29    x    y
    # row.name13    B   17    x    y
    # row.name14    A   77    x    y
    

    Update

    Matching a list is a little bit trickier, but without thinking about it too much, here is a possibility:

    myRows <- c("row.name12", "row.name13")
    rowRanges <- lapply(which(rownames(dat) %in% myRows), function(x) x + c(-1:1))
    # [[1]]
    # [1] 1 2 3
    # 
    # [[2]]
    # [1] 2 3 4
    #
    lapply(rowRanges, function(x) dat[x, ])
    # [[1]]
    #            col1 col2 col3 col4
    # row.name11    A   23    x    y
    # row.name12    A   29    x    y
    # row.name13    B   17    x    y
    # 
    # [[2]]
    #            col1 col2 col3 col4
    # row.name12    A   29    x    y
    # row.name13    B   17    x    y
    # row.name14    A   77    x    y
    

    This outputs a list of data.frames which might be handy since you might have duplicated rows (as there are in this example).

    Update 2: Using grep if it is more appropriate

    Here is a variation of your question, one which would be less convenient to solve using the which()...%in% approach.

    set.seed(1)
    dat1 <- data.frame(ID = 1:25, V1 = sample(100, 25, replace = TRUE))
    rownames(dat1) <- paste("rowname", sample(apply(combn(LETTERS[1:4], 2), 
                                                   2, paste, collapse = ""), 
                                             25, replace = TRUE), 
                           sprintf("%02d", 1:25), sep = ".")
    head(dat1)
    #               ID V1
    # rowname.AD.01  1 27
    # rowname.AB.02  2 38
    # rowname.AD.03  3 58
    # rowname.CD.04  4 91
    # rowname.AD.05  5 21
    # rowname.AD.06  6 90
    

    Now, imagine you wanted to identify the rows with AB and AC, but you don't have a list of the numeric suffixes.

    Here's a little function that can be used in such a scenario. It borrows a little from @Spacedman to make sure that the rows returned are within the range of the data (as per @flodel's suggestion).

    getMyRows <- function(data, matches, range) {
      rowMatches = lapply(unlist(lapply(matches, function(x)
        grep(x, rownames(data)))), function(y) y + range)
      rowMatches = lapply(rowMatches, function(x) x[x > 0 & x <= nrow(data)])
      lapply(rowMatches, function(x) data[x, ])
    }
    

    You can use it as follows (but I won't print the results here). First, specify the dataset, then the pattern(s) you want matched, then the range (in this example, three rows before and four rows after).

    getMyRows(dat1, c("AB", "AC"), -3:4)
    

    Applying it to the earlier example of matching row.name12 and row.name13, you can use it as follows: getMyRows(dat, c(12, 13), -1:1).

    You can also modify the function to make it more general (for example, to specify matching with a column instead of row names).

    0 讨论(0)
  • 2020-12-09 23:35

    Try that:

    extract.with.context <- function(x, rows, after = 0, before = 0) {
    
      match.idx  <- which(rownames(x) %in% rows)
      span       <- seq(from = -before, to = after)
      extend.idx <- c(outer(match.idx, span, `+`))
      extend.idx <- Filter(function(i) i > 0 & i <= nrow(x), extend.idx)
      extend.idx <- sort(unique(extend.idx))
    
      return(x[extend.idx, , drop = FALSE])
    }
    
    dat <- data.frame(x = 1:26, row.names = letters)
    extract.with.context(dat, c("a", "b", "j", "y"), after = 3, before = 1)
    #    x
    # a  1
    # b  2
    # c  3
    # d  4
    # e  5
    # i  9
    # j 10
    # k 11
    # l 12
    # m 13
    # x 24
    # y 25
    # z 26
    
    0 讨论(0)
提交回复
热议问题