Python equivalent of R c() function, for dataframe column indices?

面向向阳花

I would like to select specific columns from a pandas DataFrame by column index.

In particular, I would like to select columns index by the column index generate

3 Answers

佛祖请我去吃肉

2021-01-13 15:04

To answer the actual question,

Python equivalent of R c() function, for dataframe column indices?

I'm using this definition of c()

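import numpy as np
import pandas as pd

# "a,b" -> list of names; "i:j,k:l" -> integer positions built with np.r_
# Caution: eval runs arbitrary code, so only pass strings you trust.
c = lambda v: v.split(',') if ":" not in v else eval(f'np.r_[{v}]')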

Then we can do things like:

df = pd.DataFrame({'x': np.random.randn(1000),
                   'y': np.random.randn(1000)})
# row selection
df.iloc[c('2:4,7:11,21:25')] 

# columns by name
df[c('x,y')] 

# columns by range (df has only two columns, so this indexes the transposed frame)
df.T[c('12:15,17:25,500:750')]

That's pretty much as close as it gets in terms of R-like syntax.
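
If the use of eval is a concern, the same string syntax can be parsed by hand. The following is only a sketch: the name c_safe is hypothetical, and it assumes every range has the plain start:stop form (no steps, no negative or open-ended bounds).

```python
import numpy as np

def c_safe(spec):
    """Hypothetical eval-free variant of c() for 'a,b' or 'i:j,k:l' strings."""
    parts = [p.strip() for p in spec.split(",")]
    if ":" not in spec:
        return parts  # plain names, e.g. 'x,y' -> ['x', 'y']
    pieces = []
    for part in parts:
        if ":" in part:
            start, stop = (int(n) for n in part.split(":"))
            pieces.append(np.arange(start, stop))  # stop is exclusive, as in np.r_
        else:
            pieces.append(np.array([int(part)]))   # a single position
    return np.concatenate(pieces)

print(c_safe("2:4, 7:11, 21:25"))
```

Unlike the lambda above, this version strips whitespace around names, so 'x, y' and 'x,y' give the same list.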

To the curious mind

Note that there is a performance penalty in using c() as defined above vs. calling np.r_ directly. To paraphrase Knuth, let's not optimize prematurely ;-)

%timeit np.r_[2:4, 7:11, 21:25]
27.3 µs ± 786 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

%timeit c("2:4, 7:11, 21:25")
53.7 µs ± 977 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

伪装坚强ぢ

2021-01-13 15:19

The equivalent is numpy's r_. It combines integer slices without needing to call ranges for each of them:

np.r_[2:4, 7:11, 21:25]
Out: array([ 2,  3,  7,  8,  9, 10, 21, 22, 23, 24])

df = pd.DataFrame(np.random.randn(1000))
df.iloc[np.r_[2:4, 7:11, 21:25]]
Out: 
           0
2   2.720383
3   0.656391
7  -0.581855
8   0.047612
9   1.416250
10  0.206395
21 -1.519904
22  0.681153
23 -1.208401
24 -0.358545
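
Since the question asks about columns rather than rows, the same index array can go in the second slot of iloc. A minimal sketch, assuming a made-up frame with 30 columns named col0 to col29:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: 5 rows, 30 columns named col0 ... col29
df = pd.DataFrame(np.random.randn(5, 30),
                  columns=[f"col{i}" for i in range(30)])

cols = np.r_[2:4, 7:11, 21:25]  # integer positions; each stop is exclusive
subset = df.iloc[:, cols]       # all rows, the selected columns by position
print(subset.columns.tolist())
```

Here iloc selects purely by position, so it works regardless of what the column labels are.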

死守一世寂寞

2021-01-13 15:26
Putting @hrbrmstr's comment into an answer, because it solved my issue and I want to make it clear that this question is resolved. In addition, please note that range(a,b) gives the numbers (a, a+1, ..., b-2, b-1), and doesn't include b.

R's combine function
```
c(4,12:26,69:85,96:99,134:928,933:935)
```
is translated into Python as
```
[4] + list(range(12,27)) + list(range(69,86)) + list(range(96,100)) + list(range(134,929)) + list(range(933,936))
```
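
As a quick check of the off-by-one rule, the list built from range() can be compared with the np.r_ form from the other answer. This is only a sketch; the variable names via_range and via_r are made up for the comparison.

```python
import numpy as np

# R: c(4, 12:26, 69:85, 96:99, 134:928, 933:935)  (both ends inclusive)
via_range = ([4] + list(range(12, 27)) + list(range(69, 86))
             + list(range(96, 100)) + list(range(134, 929))
             + list(range(933, 936)))

# Same numbers with np.r_; each stop is the R endpoint plus one
via_r = np.r_[4, 12:27, 69:86, 96:100, 134:929, 933:936]

print(via_r.tolist() == via_range)
```

One more difference to keep in mind: R positions start at 1 while pandas positions start at 0, so if these numbers are R column positions, each would also need to shift down by one before being passed to iloc.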