Select multiple columns by labels in pandas

梦毁少年i 2020-12-02 14:15

I've been looking through the Python documentation and the forums for ways to select columns, but every example of indexing columns is too simplistic.

Suppo

3 Answers
  • 2020-12-02 14:51

    Just pick the columns you want directly....

    df[['A','E','I','C']]
    
  • 2020-12-02 14:58

    Name- or Label-Based (using regular expression syntax)

    df.filter(regex='[A-CEG-I]')   # does NOT depend on the column order
    

    Note that any regular expression is allowed here, so this approach can be very general. For example, to select all columns whose names start with an uppercase or lowercase "A", you could use: df.filter(regex='^[Aa]')
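
    One detail worth checking before relying on the character class above: filter keeps a label whenever re.search finds a match anywhere in it, so longer names that merely contain one of the letters are kept as well. A minimal sketch, assuming a hypothetical frame with one extra column named 'ZA' (invented here for illustration), shows the difference that anchoring the pattern makes:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: single-letter columns A..J plus one longer name, 'ZA'.
cols = list("ABCDEFGHIJ") + ["ZA"]
df = pd.DataFrame(np.zeros((3, len(cols))), columns=cols)

# Unanchored: 'ZA' is kept too, because it contains an 'A'.
loose = df.filter(regex="[A-CEG-I]")

# Anchored: only labels made of exactly one allowed letter are kept.
strict = df.filter(regex="^[A-CEG-I]$")

print(list(loose.columns))
print(list(strict.columns))
```

    With single-letter column names, as in the question, both patterns select the same columns.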

    Location-Based (depends on column order)

    df[ list(df.loc[:,'A':'C']) + ['E'] + list(df.loc[:,'G':'I']) ]
    

    Note that unlike the label-based method, this gives the intended columns only if they are in alphabetical order, because a slice such as 'A':'C' selects every column positioned between 'A' and 'C' in the DataFrame. This is not necessarily a problem, however. For example, if your columns go ['A','C','B'], then you could replace 'A':'C' above with 'A':'B'.
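
    To make the order dependence concrete, here is a small sketch on a hypothetical frame whose columns are deliberately out of alphabetical order (the column layout is an assumption for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose columns are not in alphabetical order.
df = pd.DataFrame(np.zeros((2, 5)), columns=["A", "C", "B", "E", "D"])

# A label slice includes both endpoints and everything positioned between them.
a_to_c = list(df.loc[:, "A":"C"].columns)   # stops at 'C', so 'B' is missed
a_to_b = list(df.loc[:, "A":"B"].columns)   # runs through 'C' up to 'B'

print(a_to_c)
print(a_to_b)
```

    The slice endpoints therefore have to be chosen from the actual column order rather than from the alphabet.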

    The Long Way

    And for completeness, you always have the option shown by @Magdalena of simply listing each column individually, although this becomes much more verbose as the number of columns increases:

    df[['A','B','C','E','G','H','I']]   # does NOT depend on the column order
    

    Results for any of the above methods

              A         B         C         E         G         H         I
    0 -0.814688 -1.060864 -0.008088  2.697203 -0.763874  1.793213 -0.019520
    1  0.549824  0.269340  0.405570 -0.406695 -0.536304 -1.231051  0.058018
    2  0.879230 -0.666814  1.305835  0.167621 -1.100355  0.391133  0.317467
    
  • 2020-12-02 15:01

    How do I select multiple columns by labels in pandas?

    Multiple label-based range slicing is not easily supported with pandas, but position-based slicing is, so let's try that instead:

    import numpy as np

    loc = df.columns.get_loc   # maps a column label to its integer position
    df.iloc[:, np.r_[loc('A'):loc('C')+1, loc('E'), loc('G'):loc('I')+1]]
    
              A         B         C         E         G         H         I
    0 -1.666330  0.321260 -1.768185 -0.034774  0.023294  0.533451 -0.241990
    1  0.911498  3.408758  0.419618 -0.462590  0.739092  1.103940  0.116119
    2  1.243001 -0.867370  1.058194  0.314196  0.887469  0.471137 -1.361059
    3 -0.525165  0.676371  0.325831 -1.152202  0.606079  1.002880  2.032663
    4  0.706609 -0.424726  0.308808  1.994626  0.626522 -0.033057  1.725315
    5  0.879802 -1.961398  0.131694 -0.931951 -0.242822 -1.056038  0.550346
    6  0.199072  0.969283  0.347008 -2.611489  0.282920 -0.334618  0.243583
    7  1.234059  1.000687  0.863572  0.412544  0.569687 -0.684413 -0.357968
    8 -0.299185  0.566009 -0.859453 -0.564557 -0.562524  0.233489 -0.039145
    9  0.937637 -2.171174 -1.940916 -1.553634  0.619965 -0.664284 -0.151388
    

    Note that the +1 is needed because integer slices, as used here inside np.r_ and by iloc, exclude the right endpoint.
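
    To see what np.r_ actually hands to iloc, the positions can be inspected on their own. This is a small sketch that assumes the columns are the ten letters A through J in order; only the column index is needed, not the data:

```python
import numpy as np
import pandas as pd

columns = pd.Index(list("ABCDEFGHIJ"))  # assumed column layout
loc = columns.get_loc

# Slices inside np.r_ expand like range(): the stop value is excluded,
# hence the +1 to include 'C' and 'I' themselves.
positions = np.r_[loc("A"):loc("C") + 1, loc("E"), loc("G"):loc("I") + 1]

print(positions)                  # integer positions passed to iloc
print(list(columns[positions]))   # the labels those positions refer to
```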


    Comments on Other Solutions

    • filter is a nice and simple method for OP's headers, but this might not generalise well to arbitrary column names.

    • The "location-based" solution with loc is a little closer to the ideal, but you cannot avoid creating intermediate DataFrames (that are eventually thrown out and garbage collected) to compute the final result range -- something that we would ideally like to avoid.

    • Lastly, "pick your columns directly" is good advice as long as you have a manageably small number of columns to pick. It will, however, not be applicable in some cases where ranges span dozens (or possibly hundreds) of columns.
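
    For the wide-frame case, the position-based idea can be wrapped in a small function. The sketch below is a hypothetical helper (select_label_ranges is not part of pandas) that assumes unique, non-tuple column labels. It resolves each range with Index.slice_locs, which already returns an exclusive stop position, so no +1 is needed and no intermediate DataFrame is built:

```python
import numpy as np
import pandas as pd

def select_label_ranges(df, *specs):
    """Select columns given single labels or (first, last) label pairs.

    Hypothetical helper, not part of pandas. Each pair is inclusive on
    both ends and is resolved against the column index only.
    Assumes unique column labels that are not themselves tuples.
    """
    parts = []
    for spec in specs:
        if isinstance(spec, tuple):
            # slice_locs returns (start, stop) with stop already past `last`.
            start, stop = df.columns.slice_locs(*spec)
            parts.append(np.arange(start, stop))
        else:
            parts.append(np.array([df.columns.get_loc(spec)]))
    return df.iloc[:, np.concatenate(parts)]

# Hypothetical wide frame with 100 columns named c0 .. c99.
wide = pd.DataFrame(np.zeros((2, 100)), columns=[f"c{i}" for i in range(100)])
subset = select_label_ranges(wide, ("c0", "c39"), "c50", ("c60", "c99"))
print(subset.shape)
```

    Like the loc-based slices, the ranges follow the column order of the frame, not alphabetical order.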
