select columns based on columns names containing a specific string in pandas

后端 未结 4 774
伪装坚强ぢ
伪装坚强ぢ 2020-12-03 01:51

I created a dataframe using the following:

df = pd.DataFrame(np.random.rand(10, 3), columns=[\'alp1\', \'alp2\', \'bet1\'])

I\'d like to ge

相关标签:
4条回答
  • 2020-12-03 02:15

    alternative methods:

    In [13]: df.loc[:, df.columns.str.startswith('alp')]
    Out[13]:
           alp1      alp2
    0  0.357564  0.108907
    1  0.341087  0.198098
    2  0.416215  0.644166
    3  0.814056  0.121044
    4  0.382681  0.110829
    5  0.130343  0.219829
    6  0.110049  0.681618
    7  0.949599  0.089632
    8  0.047945  0.855116
    9  0.561441  0.291182
    
    In [14]: df.loc[:, df.columns.str.contains('alp')]
    Out[14]:
           alp1      alp2
    0  0.357564  0.108907
    1  0.341087  0.198098
    2  0.416215  0.644166
    3  0.814056  0.121044
    4  0.382681  0.110829
    5  0.130343  0.219829
    6  0.110049  0.681618
    7  0.949599  0.089632
    8  0.047945  0.855116
    9  0.561441  0.291182
    
    0 讨论(0)
  • 2020-12-03 02:24

    You've several options, here's a couple:

    1 - filter with like:

    df.filter(like='alp')
    

    2 - filter with regex:

    df.filter(regex='alp')
    
    0 讨论(0)
  • 2020-12-03 02:27

    option 1
    Full numpy + pd.DataFrame

    m = np.core.defchararray.find(df.columns.values.astype(str), 'alp') >= 0
    pd.DataFrame(df.values[:, m], df.index, df.columns[m])
    
           alp1      alp2
    0  0.819189  0.356867
    1  0.900406  0.968947
    2  0.201382  0.658768
    3  0.700727  0.946509
    4  0.176423  0.290426
    5  0.132773  0.378251
    6  0.749374  0.983251
    7  0.768689  0.415869
    8  0.292140  0.457596
    9  0.214937  0.976780
    

    option 2
    numpy + loc

    m = np.core.defchararray.find(df.columns.values.astype(str), 'alp') >= 0
    df.loc[:, m]
    
           alp1      alp2
    0  0.819189  0.356867
    1  0.900406  0.968947
    2  0.201382  0.658768
    3  0.700727  0.946509
    4  0.176423  0.290426
    5  0.132773  0.378251
    6  0.749374  0.983251
    7  0.768689  0.415869
    8  0.292140  0.457596
    9  0.214937  0.976780
    

    timing
    numpy is faster

    0 讨论(0)
  • 2020-12-03 02:33

    In case @Pedro answer doesn't work here is official way of doing it for pandas 0.25

    Sample dataframe:

    >>> df = pd.DataFrame(np.array(([1, 2, 3], [4, 5, 6])),
    ...                   index=['mouse', 'rabbit'],
    ...                   columns=['one', 'two', 'three'])
    
             one two three
    mouse     1   2   3
    rabbit    4   5   6
    

    Select columns by name

    df.filter(items=['one', 'three'])
             one  three
    mouse     1      3
    rabbit    4      6
    

    Select columns by regular expression

    df.filter(regex='e$', axis=1) #ending with *e*, for checking containing just use it without *$* in the end
             one  three
    mouse     1      3
    rabbit    4      6
    

    Select rows containing 'bbi'

    df.filter(like='bbi', axis=0)
             one  two  three
    rabbit    4    5      6
    
    0 讨论(0)
提交回复
热议问题