How to select range of columns in a dataframe based on their name and not their indexes?

梦谈多话 2020-12-21 09:50

In a pandas dataframe created like this:

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randint(10, size=(6, 6)),
                  colu         


        
5 Answers
  • 2020-12-21 10:21

    A solution using the dplyr package; note that you need to specify the rows you want to select beforehand:

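    library(dplyr)

    # Row names to keep, listed explicitly
    rowName2Match <- c("r1", "r5")
    
    df1 <- df %>% 
      select(matches("2"):matches("4")) %>%   # columns from the one matching "2" to the one matching "4"
      add_rownames() %>%                      # deprecated in newer dplyr; tibble::rownames_to_column() replaces it
      mutate(idRow = match(rowname, rowName2Match)) %>% 
      slice(which(!is.na(idRow))) %>%         # keep only the rows listed in rowName2Match
      select(-idRow)
    df1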
    
    > df1
    Source: local data frame [2 x 4]
    
      rowname    c2    c3    c4
       <chr> <int> <int> <int>
    1      r1     2     3     4
    2      r5     6     7     8
    
  • 2020-12-21 10:30

    This seems way too easy so perhaps I'm doing something wrong.

    df <- data.frame(c1=1:6, c2=2:7, c3=3:8, c4=4:9, c5=5:10, c6=6:11,
                     row.names=c('r1', 'r2', 'r3', 'r4', 'r5', 'r6'))
    
    
    df[c('r1','r2'),c('c1','c2')]
    
       c1 c2
    r1  1  2
    r2  2  3
    
  • 2020-12-21 10:32

    Adding onto @evan058's answer:

    subset(df[rownames(df) %in% c("r3", "r4", "r5"),], select=c1:c4)
    
       c1 c2 c3 c4
    r3  3  4  5  6
    r4  4  5  6  7
    r5  5  6  7  8
    

    But note that the : operator only works on the column names inside select; it will probably not work on row names, so you will have to write out the name of each row you want to include explicitly. It might be easier to group by a particular value of one of your other columns, or to create an index column as @evan058 mentioned in the comments.
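
    One possible way around listing every row name, sketched in base R: look up the positions of the first and last row names with match() and apply : to those positions. The data frame is the toy example used in the other answers; the names first and last are only illustrative, and the sketch assumes both row names exist and that the first one comes before the last.

```r
# Toy data frame, same shape as in the other answers
df <- data.frame(c1 = 1:6, c2 = 2:7, c3 = 3:8, c4 = 4:9, c5 = 5:10, c6 = 6:11,
                 row.names = c("r1", "r2", "r3", "r4", "r5", "r6"))

# Translate the boundary row names into integer positions
first <- match("r3", rownames(df))
last  <- match("r5", rownames(df))

# ":" works on the positions; select = c1:c4 handles the column-name range
res <- subset(df[first:last, ], select = c1:c4)
res
```

    This should give the same three rows (r3 to r5) and four columns (c1 to c4) as the %in% version above.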

  • 2020-12-21 10:33

    An alternative approach to subset, if you don't mind working with data.table, would be:

    data.table::setDT(df)
    df[1:3, c2:c4, with=FALSE]
       c2 c3 c4
    1:  2  3  4
    2:  3  4  5
    3:  4  5  6
    

    This still does not solve the problem of subsetting a range of rows by name, though (a data.table does not keep row names, as the 1:, 2:, 3: labels in the output show).

  • 2020-12-21 10:35

    It looks like you can accomplish this with subset:

    > df <- data.frame(c1=1:6, c2=2:7, c3=3:8, c4=4:9, c5=5:10, c6=6:11)
    > rownames(df) <- c('r1', 'r2', 'r3', 'r4', 'r5', 'r6')
    > subset(df, select=c1:c4)
       c1 c2 c3 c4
    r1  1  2  3  4
    r2  2  3  4  5
    r3  3  4  5  6
    r4  4  5  6  7
    r5  5  6  7  8
    r6  6  7  8  9
    > subset(df, select=c1:c2)
       c1 c2
    r1  1  2
    r2  2  3
    r3  3  4
    r4  4  5
    r5  5  6
    r6  6  7
    

    If you want to subset by row name range, this hack would do:

    > gRI <- function(df, rName) {which(match(rownames(df), rName) == 1)}
    > df[gRI(df,"r2"):gRI(df,"r4"),]
       c1 c2 c3 c4 c5 c6
    r2  2  3  4  5  6  7
    r3  3  4  5  6  7  8
    r4  4  5  6  7  8  9
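
    The same idea can be wrapped into one helper that takes a name range for both rows and columns, roughly the way label slicing with .loc works in pandas (both ends included). This is only a sketch: loc_range and its argument names are hypothetical, not an existing R function, and it assumes row and column names are unique.

```r
# Hypothetical helper (not part of base R): select rows and columns by name range
loc_range <- function(df, row_from, row_to, col_from, col_to) {
  # Positions of the boundary names; match() returns NA for unknown names
  rows <- match(c(row_from, row_to), rownames(df))
  cols <- match(c(col_from, col_to), colnames(df))
  if (anyNA(rows) || anyNA(cols)) stop("unknown row or column name")
  # drop = FALSE keeps a data frame even when a single column is selected
  df[rows[1]:rows[2], cols[1]:cols[2], drop = FALSE]
}

# Toy data frame, same shape as in the answer above
df <- data.frame(c1 = 1:6, c2 = 2:7, c3 = 3:8, c4 = 4:9, c5 = 5:10, c6 = 6:11,
                 row.names = c("r1", "r2", "r3", "r4", "r5", "r6"))

res <- loc_range(df, "r2", "r4", "c2", "c4")
res
```

    Failing early on an unknown name is a design choice here; without the check, an NA position would surface as a less readable error from the : operator.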
    