Selecting pandas dataframe column by list

闹比i 2020-12-24 02:27

In one of my scripts I'm selecting several columns of a dataframe by a list of the column names. The following code works:

data = df[lst]

4 Answers
  • 2020-12-24 02:50

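    Here are a few other ways; building the column list with a list comprehension is much faster than the alternatives. (In recent pandas versions, & on an Index no longer performs a set intersection, so prefer df.columns.intersection(lst) there.)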

    In [1357]: df[df.columns & lst]
    Out[1357]:
       A  B
    0  1  4
    1  2  5
    2  3  6
    
    In [1358]: df[[c for c in df.columns if c in lst]]
    Out[1358]:
       A  B
    0  1  4
    1  2  5
    2  3  6
    

    Timings

    In [1360]: %timeit [c for c in df.columns if c in lst]
    100000 loops, best of 3: 2.54 µs per loop
    
    In [1359]: %timeit df.columns & lst
    1000 loops, best of 3: 231 µs per loop
    
    In [1362]: %timeit df.columns.intersection(lst)
    1000 loops, best of 3: 236 µs per loop
    
    In [1363]: %timeit np.intersect1d(df.columns, lst)
    10000 loops, best of 3: 26.6 µs per loop
    

    Details

    In [1365]: df
    Out[1365]:
       A  B  C  D  E  F
    0  1  4  7  1  5  7
    1  2  5  8  3  3  4
    2  3  6  9  5  6  3
    
    In [1366]: lst
    Out[1366]: ['A', 'R', 'B']
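
The approaches above differ in more than speed. As a minimal sketch, reusing the df from the details but with the list reordered to ['B', 'R', 'A'] (an assumption made here only to make the ordering visible; the names by_frame_order and by_list_order are illustrative), the comprehension can follow either the frame's column order or the list's order, and both skip names such as 'R' that are absent:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9],
                   'D': [1, 3, 5], 'E': [5, 3, 6], 'F': [7, 4, 3]})
lst = ['B', 'R', 'A']  # reordered for illustration; 'R' is not a column of df

# Iterating over df.columns keeps the DataFrame's own column order.
by_frame_order = [c for c in df.columns if c in lst]

# Iterating over lst keeps the order requested in the list.
by_list_order = [c for c in lst if c in df.columns]

print(df[by_frame_order].columns.tolist())  # ['A', 'B']
print(df[by_list_order].columns.tolist())   # ['B', 'A']
```

Which order is wanted depends on the caller; the comprehension makes that choice explicit.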
    
  • 2020-12-24 03:08

    I think you need Index.intersection:

    import pandas as pd
    
    df = pd.DataFrame({'A':[1,2,3],
                       'B':[4,5,6],
                       'C':[7,8,9],
                       'D':[1,3,5],
                       'E':[5,3,6],
                       'F':[7,4,3]})
    
    print (df)
       A  B  C  D  E  F
    0  1  4  7  1  5  7
    1  2  5  8  3  3  4
    2  3  6  9  5  6  3
    
    lst = ['A','R','B']
    
    print (df.columns.intersection(lst))
    Index(['A', 'B'], dtype='object')
    
    data = df[df.columns.intersection(lst)]
    print (data)
       A  B
    0  1  4
    1  2  5
    2  3  6
    

    Another solution with numpy.intersect1d:

    import numpy as np
    
    data = df[np.intersect1d(df.columns, lst)]
    print (data)
       A  B
    0  1  4
    1  2  5
    2  3  6
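
One detail worth knowing about the numpy.intersect1d variant is that it returns the common values sorted and de-duplicated, so the resulting column order may differ from both the frame and the list. The sketch below assumes a frame whose columns are deliberately out of alphabetical order, and the missing list is an illustrative addition for reporting names that were dropped:

```python
import numpy as np
import pandas as pd

# Columns are intentionally not in alphabetical order (assumption for the demo).
df = pd.DataFrame({'B': [4, 5, 6], 'A': [1, 2, 3], 'C': [7, 8, 9]})
lst = ['B', 'R', 'A']

# intersect1d returns the sorted, unique values present in both inputs.
common = np.intersect1d(df.columns, lst)
data = df[common]
print(data.columns.tolist())  # ['A', 'B']

# Illustrative extra: list the requested names that df does not have.
missing = [c for c in lst if c not in df.columns]
print(missing)  # ['R']
```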
    
  • 2020-12-24 03:11

    Please try this.

    Syntax: DataFrame[[list of columns]]

    For example: df[['a','b']]

    In the session below, the dataframe is named a:

    a
    
    Out[5]: 
        a  b   c
    0   1  2   3
    1  12  3  44
    

    x is the list of required columns to select:

    x = ['a','b']
    

    This gives the required selection:

    a[x]
    
    Out[7]: 
        a  b
    0   1  2
    1  12  3
    

    Performance:

    %timeit a[x]
    333 µs ± 9.27 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
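
Plain list indexing assumes every name in the list is a column. As a small sketch with the same frame a (the column name 'z' is hypothetical and chosen because it does not exist), current pandas raises a KeyError when any requested name is missing:

```python
import pandas as pd

a = pd.DataFrame({'a': [1, 12], 'b': [2, 3], 'c': [3, 44]})

x = ['a', 'b']
print(a[x].columns.tolist())  # both names exist, so the selection works

try:
    a[['a', 'z']]  # 'z' is a hypothetical name that is not a column of a
    raised = False
except KeyError as exc:
    raised = True
    print('KeyError:', exc)
```

If some names may be absent, filter the list against a.columns first.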
    
  • 2020-12-24 03:13

    Unpack the list with * inside a new list:

    data = df[[*lst]]

    It will give the desired result.
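
To make the unpacking concrete: [*lst] builds a new list from the items of any iterable, which is the same as list(lst). The sketch below assumes the column names arrive as a tuple named cols (an illustrative choice, not from the question) and compares the two spellings. Neither form skips names that are not columns.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
cols = ('A', 'B')  # a tuple, used here only for illustration

data = df[[*cols]]     # unpack the iterable into a new list
same = df[list(cols)]  # equivalent spelling

print(data.columns.tolist())  # ['A', 'B']
print(data.equals(same))      # True
```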
