Selecting multiple columns in a pandas dataframe

醉话见心 2020-11-22 00:08

I have data in different columns but I don't know how to extract it to save it in another variable.

index  a   b   c
1      2   3   4
2      3   4   5


        
19 Answers
  • 2020-11-22 00:15

    Try pandas.DataFrame.get (see the pandas docs), which accepts a list of column labels:

    import pandas as pd
    import numpy as np
    dates = pd.date_range('20200102', periods=6)
    df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))
    df.get(['A','C'])
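
    A minimal sketch of how get behaves on the question's data, assuming the "index" column in the question is the row index. The variable names (subset, missing, fallback) are only illustrative. Unlike [] indexing, get returns a default instead of raising when the key is not found:

```python
import pandas as pd

# The question's data; the "index" column is used as the row index (assumption).
df = pd.DataFrame({"a": [2, 3], "b": [3, 4], "c": [4, 5]}, index=[1, 2])

subset = df.get(["b", "c"])        # DataFrame holding only columns b and c
missing = df.get("z")              # no such column: returns None, no error
fallback = df.get("z", default=0)  # or whatever default you pass in

print(subset)
print(missing, fallback)
```

    This makes get convenient when a column may or may not be present, at the cost of hiding typos in column names.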
    
  • 2020-11-22 00:20

    Starting with pandas 0.21.0, using .loc or [] with a list containing one or more missing labels is deprecated in favor of .reindex. So, the answer to your question is:

    df1 = df.reindex(columns=['b','c'])

    In prior versions, using .loc[list-of-labels] would work as long as at least one of the keys was found (otherwise it would raise a KeyError). That behavior was deprecated with a warning message, and since pandas 1.0 any missing label raises a KeyError. The recommended alternative is to use .reindex().

    Read more at Indexing and Selecting Data
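
    To make the difference concrete, here is a small sketch on the question's data (the column name "z" is a made-up missing label, and the frame construction is an assumption about the question's layout). reindex fills a missing column with NaN, while plain list indexing refuses it:

```python
import pandas as pd

df = pd.DataFrame({"a": [2, 3], "b": [3, 4], "c": [4, 5]}, index=[1, 2])

# All requested labels exist: same result as df[["b", "c"]].
df1 = df.reindex(columns=["b", "c"])

# "z" does not exist: reindex adds it as an all-NaN column instead of raising.
df2 = df.reindex(columns=["b", "z"])

# Plain list indexing is strict about missing labels.
try:
    df[["b", "z"]]
    raised = False
except KeyError:
    raised = True

print(df2)
print("KeyError raised:", raised)
```

    If every label is known to exist, df[['b', 'c']] is the shorter spelling; reindex is mainly useful when some labels may be absent and NaN columns are acceptable.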

  • 2020-11-22 00:22

    You can also use df.pop(), which removes a single column from the DataFrame and returns it as a Series:

    >>> df = pd.DataFrame([('falcon', 'bird',    389.0),
    ...                    ('parrot', 'bird',     24.0),
    ...                    ('lion',   'mammal',   80.5),
    ...                    ('monkey', 'mammal', np.nan)],
    ...                   columns=('name', 'class', 'max_speed'))
    >>> df
         name   class  max_speed
    0  falcon    bird      389.0
    1  parrot    bird       24.0
    2    lion  mammal       80.5
    3  monkey  mammal        NaN
    
    >>> df.pop('class')
    0      bird
    1      bird
    2    mammal
    3    mammal
    Name: class, dtype: object
    
    >>> df
         name  max_speed
    0  falcon      389.0
    1  parrot       24.0
    2    lion       80.5
    3  monkey        NaN
    

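    So in your case, use df.pop('c'). Keep in mind that pop takes one column label at a time and deletes that column from the original DataFrame. Let me know if this helps.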
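
    Because pop modifies the DataFrame, it may not be what you want if the original should stay intact. A short sketch contrasting the two on the question's data (frame construction and variable names are assumptions for illustration):

```python
import pandas as pd

df = pd.DataFrame({"a": [2, 3], "b": [3, 4], "c": [4, 5]}, index=[1, 2])

# Non-destructive: copy the columns you want into a new variable.
extracted = df[["b", "c"]].copy()

# Destructive: pop returns one column as a Series and deletes it from df.
c_values = df.pop("c")

print(list(df.columns))         # "c" is no longer in the original
print(list(extracted.columns))  # the earlier copy is unaffected
```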

  • 2020-11-22 00:23

    As of pandas 0.11.0, columns can be sliced by label using the .loc indexer:

    df.loc[:, 'C':'E']
    

    is equivalent to

    df[['C', 'D', 'E']]  # or df.loc[:, ['C', 'D', 'E']]
    

    and returns columns C through E.


    A demo on a randomly generated DataFrame:

    import pandas as pd
    import numpy as np
    np.random.seed(5)
    df = pd.DataFrame(np.random.randint(100, size=(100, 6)), 
                      columns=list('ABCDEF'), 
                      index=['R{}'.format(i) for i in range(100)])
    df.head()
    
    Out: 
         A   B   C   D   E   F
    R0  99  78  61  16  73   8
    R1  62  27  30  80   7  76
    R2  15  53  80  27  44  77
    R3  75  65  47  30  84  86
    R4  18   9  41  62   1  82
    

    To get the columns from C to E (note that unlike integer slicing, 'E' is included in the columns):

    df.loc[:, 'C':'E']
    
    Out: 
          C   D   E
    R0   61  16  73
    R1   30  80   7
    R2   80  27  44
    R3   47  30  84
    R4   41  62   1
    R5    5  58   0
    ...
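
    The inclusive end point is the main difference from positional slicing. A small sketch on a toy frame (the 2x6 frame below is made up for illustration, not the random data above) comparing the label slice, the explicit list, and the positional equivalent with .iloc:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(2, 6), columns=list("ABCDEF"))

by_label_slice = df.loc[:, "C":"E"]  # label slice: the end label "E" is included
by_label_list = df[["C", "D", "E"]]  # explicit list of labels
by_position = df.iloc[:, 2:5]        # positions 2, 3, 4: the stop (5) is excluded

print(by_label_slice.equals(by_label_list))
print(by_label_slice.equals(by_position))
```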
    

    The same works for selecting rows based on labels. Get the rows 'R6' to 'R10' from those columns:

    df.loc['R6':'R10', 'C':'E']
    
    Out: 
          C   D   E
    R6   51  27  31
    R7   83  19  18
    R8   11  67  65
    R9   78  27  29
    R10   7  16  94
    

    .loc also accepts a boolean array, so you can select the columns whose corresponding entry in the array is True. For example, df.columns.isin(list('BCD')) returns array([False, True, True, True, False, False], dtype=bool): each entry is True if the column name is in the list ['B', 'C', 'D'] and False otherwise.

    df.loc[:, df.columns.isin(list('BCD'))]
    
    Out: 
          B   C   D
    R0   78  61  16
    R1   27  30  80
    R2   53  80  27
    R3   65  47  30
    R4    9  41  62
    R5   78   5  58
    ...
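
    Since the mask is an ordinary boolean array, it can also be inverted to select everything except the listed columns. A minimal sketch on a made-up 2x6 frame (not the random data above):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(2, 6), columns=list("ABCDEF"))

# One boolean per column: True where the name is B, C or D.
mask = df.columns.isin(list("BCD"))

selected = df.loc[:, mask]   # keep the columns where the mask is True
others = df.loc[:, ~mask]    # ~ flips the mask: keep everything else

print(list(selected.columns))
print(list(others.columns))
```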
    
  • 2020-11-22 00:24

    I've seen several answers on this, but one point remained unclear to me: how would you select the columns of interest? If you have their names gathered in a list, you can simply index the DataFrame with that list.

    Example

    print(extracted_features.shape)
    print(extracted_features)
    
    (63,)
    ['f000004' 'f000005' 'f000006' 'f000014' 'f000039' 'f000040' 'f000043'
     'f000047' 'f000048' 'f000049' 'f000050' 'f000051' 'f000052' 'f000053'
     'f000054' 'f000055' 'f000056' 'f000057' 'f000058' 'f000059' 'f000060'
     'f000061' 'f000062' 'f000063' 'f000064' 'f000065' 'f000066' 'f000067'
     'f000068' 'f000069' 'f000070' 'f000071' 'f000072' 'f000073' 'f000074'
     'f000075' 'f000076' 'f000077' 'f000078' 'f000079' 'f000080' 'f000081'
     'f000082' 'f000083' 'f000084' 'f000085' 'f000086' 'f000087' 'f000088'
     'f000089' 'f000090' 'f000091' 'f000092' 'f000093' 'f000094' 'f000095'
     'f000096' 'f000097' 'f000098' 'f000099' 'f000100' 'f000101' 'f000103']
    

    Here extracted_features is a list/NumPy array naming 63 columns. The original dataset has 103 columns, and to extract exactly those 63 I would use:

    dataset[extracted_features]
    

    And you will end up with a DataFrame that contains only those 63 columns.

    This is something you would use quite often in machine learning (more specifically, in feature selection). I would like to discuss other ways too, but I think those have already been covered by other Stack Overflow users. Hope this has been helpful!
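
    A self-contained sketch of the same idea, using a hypothetical 5-column stand-in for the 103-column dataset (the frame, the shortened name array, and the helper variable present are all assumptions for illustration). It also shows one way to guard against names that are not in the dataset, since indexing with a missing name raises a KeyError:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real dataset: columns f000001 .. f000005.
names = [f"f{i:06d}" for i in range(1, 6)]
dataset = pd.DataFrame(np.zeros((3, 5)), columns=names)

# Names to extract; "f000009" is deliberately absent from the dataset.
extracted_features = np.array(["f000002", "f000004", "f000009"])

# Keep only the names that actually exist, preserving their order.
present = [c for c in extracted_features if c in dataset.columns]
subset = dataset[present]

print(subset.shape)
print(list(subset.columns))
```

    If every name is guaranteed to exist, dataset[extracted_features] on its own is enough, as in the answer above.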

  • 2020-11-22 00:29

    If you want to get one element by row index and column name, you can use df['b'][0]. It is as simple as you can imagine.

    Alternatively, df.ix[0, 'b'] allowed mixed use of positions and labels.

    Note: ix has been deprecated since v0.20 in favour of loc / iloc, and it was removed in pandas 1.0.
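
    For current pandas versions, the same lookups can be sketched with loc, iloc and at. The frame below assumes the question's "index" column is the row index (labels 1 and 2). With an integer index like this, df['b'][0] looks for the label 0 rather than the first row, so the explicit indexers are the safer choice:

```python
import pandas as pd

df = pd.DataFrame({"a": [2, 3], "b": [3, 4], "c": [4, 5]}, index=[1, 2])

by_label = df.loc[1, "b"]      # row label 1, column label "b"
by_position = df.iloc[0, 1]    # first row, second column, purely positional
fast_scalar = df.at[1, "b"]    # label-based access for a single scalar

# Mixed use (row position + column label), which df.ix[0, 'b'] used to cover:
mixed = df.loc[df.index[0], "b"]

print(by_label, by_position, fast_scalar, mixed)
```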
