Returning above and below rows of specific rows in r dataframe

后端未结

关注

 4  679

Consider any dataframe

            col1   col2    col3   col4
row.name11    A     23      x       y
row.name12    A     29      x       y
row.name13    B


                      
              相关标签:


      
      
        
          4条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  不要未来只要你来        
                
              
                            
                2020-12-09 23:11
              
            
            
                                                                       
I would simply proceed as follow:

dat[(grep("row.name12",row.names(dat))-4):(grep("row.name13",row.names(dat))+4),]


grep("row.name12",row.names(dat)) gives you the row number that have "row.name12" as name, so 

(grep("row.name12",row.names(dat))-4):(grep("row.name13",row.names(dat))+4)


gives you a serie of row numbers ranging from the 4th row preceding the row named "row.name12" to the 4th row after the one named "row.name13".
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  清酒与你        
                
              
                            
                2020-12-09 23:12
              
            
            
                                                                       
Create some sample data:

> dat=data.frame(col1=letters,col2=sample(26),col3=sample(letters))
> dat
   col1 col2 col3
1     a   26    x
2     b   12    i
3     c   15    v
...


Set our target vector (note I choose an edge case and overlapping cases), and find matching rows:

> target=c("a","e","g","s")
> match = which(dat$col1 %in% target)


Create sequences from -2 to +2 of the matches (adjust for your needs) and merge:

> getThese = unique(as.vector(mapply(seq,match-2,match+2)))
> getThese
 [1] -1  0  1  2  3  4  5  6  7  8  9 17 18 19 20 21


Fix the edge cases:

> getThese = getThese[getThese > 0 & getThese <= nrow(dat)]
> dat[getThese,]
   col1 col2 col3
1     a   26    x
2     b   12    i
3     c   15    v
4     d   22    d
5     e    2    j
6     f    9    l
7     g    1    w
8     h   21    n
9     i   17    p
17    q   18    a
18    r   10    m
19    s   24    o
20    t   13    e
21    u    3    k
> 


Remember our targets were a, e, g and s. You've now got those plus two rows above and two rows below for each, with no duplicates. 

If you are using row names, just create 'match' from those. I was using a column.

I'd write a bunch more tests using the testthat package if this were my problem.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  野的像风        
                
              
                            
                2020-12-09 23:32
              
            
            
                                                                       
Perhaps a combination of which() and %in% would help you:

dat[which(rownames(dat) %in% c("row.name13")) + c(-1, 1), ]
#            col1 col2 col3 col4
# row.name12    A   29    x    y
# row.name14    A   77    x    y


In the above, we are trying to identify which row names in "dat" are "row.name13" (using which()), and the + c(-1, 1) tells R to return the row before and the row after. If you wanted to include the row, you could do something like + c(-1:1).

To get the range of rows, switch the comma to a colon:

dat[which(rownames(dat) %in% c("row.name13")) + c(-1:1), ]
#            col1 col2 col3 col4
# row.name12    A   29    x    y
# row.name13    B   17    x    y
# row.name14    A   77    x    y




Update

Matching a list is a little bit trickier, but without thinking about it too much, here is a possibility:

myRows <- c("row.name12", "row.name13")
rowRanges <- lapply(which(rownames(dat) %in% myRows), function(x) x + c(-1:1))
# [[1]]
# [1] 1 2 3
# 
# [[2]]
# [1] 2 3 4
#
lapply(rowRanges, function(x) dat[x, ])
# [[1]]
#            col1 col2 col3 col4
# row.name11    A   23    x    y
# row.name12    A   29    x    y
# row.name13    B   17    x    y
# 
# [[2]]
#            col1 col2 col3 col4
# row.name12    A   29    x    y
# row.name13    B   17    x    y
# row.name14    A   77    x    y


This outputs a list of data.frames which might be handy since you might have duplicated rows (as there are in this example).

Update 2: Using grep if it is more appropriate

Here is a variation of your question, one which would be less convenient to solve using the which()...%in% approach.

set.seed(1)
dat1 <- data.frame(ID = 1:25, V1 = sample(100, 25, replace = TRUE))
rownames(dat1) <- paste("rowname", sample(apply(combn(LETTERS[1:4], 2), 
                                               2, paste, collapse = ""), 
                                         25, replace = TRUE), 
                       sprintf("%02d", 1:25), sep = ".")
head(dat1)
#               ID V1
# rowname.AD.01  1 27
# rowname.AB.02  2 38
# rowname.AD.03  3 58
# rowname.CD.04  4 91
# rowname.AD.05  5 21
# rowname.AD.06  6 90


Now, imagine you wanted to identify the rows with AB and AC, but you don't have a list of the numeric suffixes.

Here's a little function that can be used in such a scenario. It borrows a little from @Spacedman to make sure that the rows returned are within the range of the data (as per @flodel's suggestion).

getMyRows <- function(data, matches, range) {
  rowMatches = lapply(unlist(lapply(matches, function(x)
    grep(x, rownames(data)))), function(y) y + range)
  rowMatches = lapply(rowMatches, function(x) x[x > 0 & x <= nrow(data)])
  lapply(rowMatches, function(x) data[x, ])
}


You can use it as follows (but I won't print the results here). First, specify the dataset, then the pattern(s) you want matched, then the range (in this example, three rows before and four rows after). 

getMyRows(dat1, c("AB", "AC"), -3:4)


Applying it to the earlier example of matching row.name12 and row.name13, you can use it as follows: getMyRows(dat, c(12, 13), -1:1).

You can also modify the function to make it more general (for example, to specify matching with a column instead of row names).
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  我寻月下人不归        
                
              
                            
                2020-12-09 23:35
              
            
            
                                                                       
Try that:

extract.with.context <- function(x, rows, after = 0, before = 0) {

  match.idx  <- which(rownames(x) %in% rows)
  span       <- seq(from = -before, to = after)
  extend.idx <- c(outer(match.idx, span, `+`))
  extend.idx <- Filter(function(i) i > 0 & i <= nrow(x), extend.idx)
  extend.idx <- sort(unique(extend.idx))

  return(x[extend.idx, , drop = FALSE])
}

dat <- data.frame(x = 1:26, row.names = letters)
extract.with.context(dat, c("a", "b", "j", "y"), after = 3, before = 1)
#    x
# a  1
# b  2
# c  3
# d  4
# e  5
# i  9
# j 10
# k 11
# l 12
# m 13
# x 24
# y 25
# z 26

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
                             
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复