Python equivalent of R c() function, for dataframe column indices?

面向向阳花 2021-01-13 14:56

I would like to select specific columns from a pandas DataFrame by column index.

In particular, I would like to select columns using the column indices generated by R's c() function.

3 answers
  • 2021-01-13 15:04

    To answer the actual question,

    Python equivalent of R c() function, for dataframe column indices?

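    I'm using this definition of c():

    import numpy as np
    import pandas as pd

    # "x,y" -> ['x', 'y'] (labels); anything containing ":" -> np.r_[...] via eval (trusted input only)
    c = lambda v: v.split(',') if ":" not in v else eval(f'np.r_[{v}]')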
    

    Then we can do things like:

    df = pd.DataFrame({'x': np.random.randn(1000),
                       'y': np.random.randn(1000)})
    # row selection
    df.iloc[c('2:4,7:11,21:25')] 
    
    # columns by name
    df[c('x,y')] 
    
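    # columns by range: the columns of df.T are df's integer row labels 0..999
    # (on a wide frame, df.iloc[:, c('12:15,17:25')] selects columns by position)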
    df.T[c('12:15,17:25,500:750')]
    

    That's pretty much as close as it gets in terms of R-like syntax.
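Because c() above relies on eval, it will execute whatever string it is handed. As a minimal sketch (the name c_parse is hypothetical, not part of NumPy or pandas), the same "a:b,c" position strings can be parsed without eval. It only handles plain integers and start:stop ranges, using the Python/NumPy convention that stop is excluded:

```python
import numpy as np
import pandas as pd


def c_parse(spec):
    """Hypothetical eval-free variant of c(): "2:4,7:11,21" -> array of ints.

    Each comma-separated token is either an integer or a start:stop range,
    with Python/NumPy semantics (stop is excluded). Steps ("a:b:s") are not handled.
    """
    chunks = []
    for token in spec.split(","):
        token = token.strip()
        if ":" in token:
            # "7:11" -> 7, 8, 9, 10
            start, stop = (int(part) for part in token.split(":"))
            chunks.append(np.arange(start, stop))
        else:
            # a single position, e.g. "21"
            chunks.append(np.array([int(token)]))
    return np.concatenate(chunks)


df = pd.DataFrame({"x": np.random.randn(1000), "y": np.random.randn(1000)})
rows = c_parse("2:4, 7:11, 21:25")
print(rows)  # same values as np.r_[2:4, 7:11, 21:25]
print(df.iloc[rows])
```

This only produces positions; selecting columns by name would still go through a plain split(',') as in the one-liner.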

    For the curious

    Note that this c() carries a performance penalty compared with calling np.r_ directly: about 2× in the timings below, since it builds a string and runs it through eval. To paraphrase Knuth, let's not optimize prematurely ;-)

    %timeit np.r_[2:4, 7:11, 21:25]
    27.3 µs ± 786 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
    
    %timeit c("2:4, 7:11, 21:25")
    53.7 µs ± 977 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
    
  • 2021-01-13 15:19

    The equivalent is NumPy's np.r_. It concatenates integer slices without needing to call range() for each one:

    import numpy as np
    import pandas as pd

    np.r_[2:4, 7:11, 21:25]
    Out: array([ 2,  3,  7,  8,  9, 10, 21, 22, 23, 24])
    

    df = pd.DataFrame(np.random.randn(1000))
    df.iloc[np.r_[2:4, 7:11, 21:25]]
    Out: 
               0
    2   2.720383
    3   0.656391
    7  -0.581855
    8   0.047612
    9   1.416250
    10  0.206395
    21 -1.519904
    22  0.681153
    23 -1.208401
    24 -0.358545
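The example above selects rows. Since the question is about columns, here is a sketch of the same idea with .iloc[:, ...]; the 1000-column frame and its "c0".."c999" names are made up purely for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical wide frame: 3 rows x 1000 columns named "c0".."c999"
wide = pd.DataFrame(np.zeros((3, 1000)), columns=[f"c{i}" for i in range(1000)])

# np.r_ builds the 0-based positions; each slice excludes its stop value
cols = np.r_[2:4, 7:11, 21:25]

# .iloc[:, positions] keeps all rows and selects columns by position
subset = wide.iloc[:, cols]
print(list(subset.columns))
# ['c2', 'c3', 'c7', 'c8', 'c9', 'c10', 'c21', 'c22', 'c23', 'c24']
```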
    
  • 2021-01-13 15:26

    Putting @hrbrmstr's comment into an answer, because it solved my issue and I want to make it clear that this question is resolved. Also note that range(a, b) yields a, a+1, ..., b-1 and does not include b, whereas R's a:b does include b, so a:b becomes range(a, b+1).

    R's combine function

    c(4,12:26,69:85,96:99,134:928,933:935)
    

    is translated into Python as

    [4] + list(range(12,27)) + list(range(69,86)) + list(range(96,100)) + list(range(134,929)) + list(range(933,936))
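One more caveat, assuming the R numbers are column positions: R counts positions from 1, while pandas .iloc counts from 0, so a faithful translation also subtracts 1. A minimal sketch with a hypothetical helper r_c_to_iloc, where ranges are passed as (start, end) tuples and read the R way, with end included:

```python
import numpy as np


def r_c_to_iloc(*parts):
    """Hypothetical helper: convert R-style c() arguments to 0-based positions.

    Each part is either an int n (R position n) or a (start, end) tuple meaning
    R's start:end, which includes end.
    """
    chunks = []
    for p in parts:
        if isinstance(p, tuple):
            start, end = p
            chunks.append(np.arange(start, end + 1))  # +1 because R's a:b includes b
        else:
            chunks.append(np.array([p]))
    return np.concatenate(chunks) - 1  # shift from 1-based (R) to 0-based (pandas)


# R: c(4, 12:26, 69:85, 96:99, 134:928, 933:935)
idx = r_c_to_iloc(4, (12, 26), (69, 85), (96, 99), (134, 928), (933, 935))
print(idx[:5], len(idx))  # → [ 3 11 12 13 14] 835
```

Equivalently, np.r_[4, 12:27, 69:86, 96:100, 134:929, 933:936] - 1 gives the same array; either can be passed to df.iloc[:, idx], provided the frame has at least 935 columns.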
    