Select multiple columns by labels in pandas

前端未结

I've been looking through the Python documentation and the forums for ways to select columns, but every example of indexing columns is too simplistic.

Suppo

3 Answers

悲&欢浪女

2020-12-02 14:51
Just pick the columns you want directly:
```
df[['A','E','I','C']]
```
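As a minimal runnable sketch of the line above: the sample frame below (ten random columns named A to J) is an assumption made up for illustration, not data from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample frame with columns A..J (not from the original post).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((3, 10)), columns=list("ABCDEFGHIJ"))

# Passing a list of labels returns those columns in the order the list gives,
# regardless of how they are ordered in the original frame.
subset = df[["A", "E", "I", "C"]]
print(list(subset.columns))
```

The result keeps the order of the list, so this form also works for reordering columns.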
清酒与你

2020-12-02 14:58
Name- or Label-Based (using regular expression syntax)
```
df.filter(regex='[A-CEG-I]')   # does NOT depend on the column order
```
Note that any regular expression is allowed here, so this approach can be very general. E.g. if you wanted all columns starting with a capital or lowercase "A" you could use: df.filter(regex='^[Aa]')
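One caveat worth sketching: filter matches the pattern anywhere in the column name, so a character class written for single-letter headers can also pick up longer names. The column names below are hypothetical and chosen only to show the difference between an unanchored and an anchored pattern.

```python
import pandas as pd

# Hypothetical frame mixing single-letter and longer column names.
df = pd.DataFrame([[1, 2, 3, 4]], columns=["A", "D", "DATE", "Ab"])

# Unanchored: any name containing one of A-C, E, G-I is kept.
loose = df.filter(regex="[A-CEG-I]")

# Anchored: only names that consist of exactly one of those letters are kept.
strict = df.filter(regex="^[A-CEG-I]$")

print(list(loose.columns))
print(list(strict.columns))
```

With single-letter headers such as the ones in the question, both patterns select the same columns.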

Location-Based (depends on column order)
```
df[ list(df.loc[:,'A':'C']) + ['E'] + list(df.loc[:,'G':'I']) ]
```
Note that unlike the label-based method, the slices here follow the order in which the columns appear in the DataFrame, so 'A':'C' yields A, B, C only if your columns are alphabetically sorted. This is not necessarily a problem, however. For example, if your columns go ['A','C','B'], then you could replace 'A':'C' above with 'A':'B'.
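The order dependence can be checked on a small frame. This is only a sketch; the frame and the shuffled column order are assumptions made up for illustration.

```python
import pandas as pd

# Hypothetical one-row frame with columns A..I in alphabetical order.
df = pd.DataFrame([range(9)], columns=list("ABCDEFGHI"))

# Iterating a DataFrame yields its column labels, so list(...) of a
# column slice gives the labels covered by that slice.
cols = list(df.loc[:, "A":"C"]) + ["E"] + list(df.loc[:, "G":"I"])
selected = df[cols]

# Same data, but with B and C swapped in the column order.
shuffled = df[["A", "C", "B", "D", "E", "F", "G", "H", "I"]]
short_slice = list(shuffled.loc[:, "A":"C"])  # stops at C, so B is missed
full_slice = list(shuffled.loc[:, "A":"B"])   # runs through to B

print(cols, short_slice, full_slice)
```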

The Long Way

And for completeness, you always have the option shown by @Magdalena of simply listing each column individually, although it could be much more verbose as the number of columns increases:
```
df[['A','B','C','E','G','H','I']]   # does NOT depend on the column order
```
Results for any of the above methods
```
          A         B         C         E         G         H         I
0 -0.814688 -1.060864 -0.008088  2.697203 -0.763874  1.793213 -0.019520
1  0.549824  0.269340  0.405570 -0.406695 -0.536304 -1.231051  0.058018
2  0.879230 -0.666814  1.305835  0.167621 -1.100355  0.391133  0.317467
```
迷失自我

2020-12-02 15:01
How do I select multiple columns by labels in pandas?

Multiple label-based range slicing is not easily supported with pandas, but position-based slicing is, so let's try that instead:
```
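import numpy as np

# Shorthand: map a column label to its integer position
loc = df.columns.get_loc
# np.r_ concatenates the slices and scalars into one array of positions
df.iloc[:, np.r_[loc('A'):loc('C')+1, loc('E'), loc('G'):loc('I')+1]]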

          A         B         C         E         G         H         I
0 -1.666330  0.321260 -1.768185 -0.034774  0.023294  0.533451 -0.241990
1  0.911498  3.408758  0.419618 -0.462590  0.739092  1.103940  0.116119
2  1.243001 -0.867370  1.058194  0.314196  0.887469  0.471137 -1.361059
3 -0.525165  0.676371  0.325831 -1.152202  0.606079  1.002880  2.032663
4  0.706609 -0.424726  0.308808  1.994626  0.626522 -0.033057  1.725315
5  0.879802 -1.961398  0.131694 -0.931951 -0.242822 -1.056038  0.550346
6  0.199072  0.969283  0.347008 -2.611489  0.282920 -0.334618  0.243583
7  1.234059  1.000687  0.863572  0.412544  0.569687 -0.684413 -0.357968
8 -0.299185  0.566009 -0.859453 -0.564557 -0.562524  0.233489 -0.039145
9  0.937637 -2.171174 -1.940916 -1.553634  0.619965 -0.664284 -0.151388
```
Note that the +1 is added because the slices passed to np.r_ are position-based, so, as with iloc, the right endpoint is exclusive.
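To see what np.r_ actually hands to iloc, the intermediate position array can be inspected on its own. The one-row frame here is a hypothetical stand-in with columns A to J.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with columns A..J at positions 0..9.
df = pd.DataFrame([range(10)], columns=list("ABCDEFGHIJ"))
loc = df.columns.get_loc

# Each slice excludes its right endpoint, hence the +1 to include C and I.
positions = np.r_[loc("A"):loc("C") + 1, loc("E"), loc("G"):loc("I") + 1]
selected = df.iloc[:, positions]

print(positions)               # [0 1 2 4 6 7 8]
print(list(selected.columns))
```

This assumes unique column labels; with duplicated labels get_loc does not return a single integer, and the arithmetic above would not apply.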

Comments on Other Solutions
- filter is a nice and simple method for OP's headers, but this might not generalise well to arbitrary column names.
- The "location-based" solution with loc is a little closer to the ideal, but you cannot avoid creating intermediate DataFrames (that are eventually thrown out and garbage collected) to compute the final result range -- something that we would ideally like to avoid.
- Lastly, "pick your columns directly" is good advice as long as you have a manageably small number of columns to pick. It will, however, not be applicable in some cases where ranges span dozens (or possibly hundreds) of columns.
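For wide frames, the position-based idea can be wrapped in a small helper. The function below, select_label_ranges, is a hypothetical name and not part of pandas; it is a sketch that assumes unique column labels and that no column label is itself a tuple.

```python
import numpy as np
import pandas as pd

def select_label_ranges(df, *specs):
    """Hypothetical helper: each spec is a single label or an inclusive
    (start, stop) pair of labels. Assumes unique, non-tuple column labels."""
    get_loc = df.columns.get_loc
    parts = []
    for spec in specs:
        if isinstance(spec, tuple):
            start, stop = spec
            # +1 because np.arange excludes its right endpoint.
            parts.append(np.arange(get_loc(start), get_loc(stop) + 1))
        else:
            parts.append(np.array([get_loc(spec)]))
    # Only integer positions are built here; no intermediate DataFrames.
    return df.iloc[:, np.concatenate(parts)]

# Usage on a hypothetical 30-column frame named c0..c29.
wide = pd.DataFrame(np.zeros((2, 30)), columns=[f"c{i}" for i in range(30)])
picked = select_label_ranges(wide, ("c3", "c20"), "c25")
print(picked.shape)  # (2, 19)
```

Like the location-based approach, the ranges follow the column order of the frame, so the start label of each pair is expected to come before its stop label.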