Filter pandas dataframe with specific column names in python

[愿得一人] 2021-02-13 02:43

I have a pandas DataFrame and a list as follows:

mylist = ['nnn', 'mmm', 'yyy']
mydata =
   xxx  yyy  zzz  nnn  ffffd  mmm
0    0   10    5    5      5    5
1    1    9


        
2 Answers
  • 2021-02-13 03:21

    Just pass a list of column names to index df:

    df[['nnn', 'mmm', 'yyy']]
    
       nnn  mmm  yyy
    0    5    5   10
    1    3    4    9
    2    7    0    8
    

    If you need to handle non-existent column names in your list, filter with df.columns.isin instead. Note that the result keeps the DataFrame's own column order, not the order of the list:

    df.loc[:, df.columns.isin(['nnn', 'mmm', 'yyy', 'zzzzzz'])]
    
       yyy  nnn  mmm
    0   10    5    5
    1    9    3    4
    2    8    7    0
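
If the order of the list matters, one possible variation is to filter the list itself before indexing. The sketch below is only an illustration: the nnn, mmm and yyy values follow the output shown above, while the xxx column and the names wanted, by_mask and by_list are made up for the example.

```python
import pandas as pd

# Sample frame: nnn, mmm and yyy follow the output shown above,
# the xxx column is invented for illustration.
df = pd.DataFrame({
    "xxx": [0, 1, 2],
    "yyy": [10, 9, 8],
    "nnn": [5, 3, 7],
    "mmm": [5, 4, 0],
})

wanted = ["nnn", "mmm", "yyy", "zzzzzz"]  # "zzzzzz" does not exist in df

# isin builds a boolean mask over df.columns, so the frame's own order wins.
by_mask = df.loc[:, df.columns.isin(wanted)]

# Filtering the list first keeps the requested order and drops unknown names.
present = [name for name in wanted if name in df.columns]
by_list = df[present]

print(list(by_mask.columns))  # ['yyy', 'nnn', 'mmm']
print(list(by_list.columns))  # ['nnn', 'mmm', 'yyy']
```

Both versions silently ignore the unknown name, so which one fits depends on whether the column order of the result matters.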
    
  • 2021-02-13 03:31

    You can pass mylist directly inside [] and pandas will select those columns for you.

    mydata_new = mydata[mylist]
    

    Not sure whether your yyy is a typo.

    Your code fails because each iteration of the loop rebinds mydata_new to a new Series, discarding the previous one.

    for item in mylist:
        mydata_new = mydata[item]  # <- overwritten on every iteration
    

    So you end up with a single Series (the last column in mylist) rather than the whole DataFrame you want.
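
To make the difference concrete, here is a small sketch comparing the two approaches. The data values are hypothetical (only the column names come from the question), and last_only and selected are names chosen for this example.

```python
import pandas as pd

# Hypothetical data reusing the question's column names.
mydata = pd.DataFrame({
    "xxx": [0, 1],
    "yyy": [10, 9],
    "nnn": [5, 3],
    "mmm": [5, 4],
})
mylist = ["nnn", "mmm", "yyy"]

# Loop version: the name is rebound on every pass, so only the last column survives.
for item in mylist:
    last_only = mydata[item]
print(type(last_only).__name__, last_only.name)  # Series yyy

# List version: a single indexing operation returns every requested column.
selected = mydata[mylist]
print(type(selected).__name__, list(selected.columns))  # DataFrame ['nnn', 'mmm', 'yyy']
```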


    If some names in the list are not in your DataFrame, you can check for that with

    len(set(mylist) - set(mydata.columns)) > 0
    

    and print the missing names with

    print(set(mylist) - set(mydata.columns))
    

    Then check whether they are typos or point to some other problem.
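
As a rough illustration of why the check is worth doing, the sketch below shows that indexing with a list containing an unknown name raises a KeyError. The data and the deliberately wrong name "zzzzzz" are assumptions made for the example, as are the variable names missing and raised.

```python
import pandas as pd

# Hypothetical data reusing the question's column names.
mydata = pd.DataFrame({"xxx": [0, 1], "yyy": [10, 9], "nnn": [5, 3], "mmm": [5, 4]})
mylist = ["nnn", "mmm", "yyy", "zzzzzz"]  # "zzzzzz" is deliberately absent

# The set difference gives the names that are in the list but not in the frame.
missing = sorted(set(mylist) - set(mydata.columns))
print(missing)  # ['zzzzzz']

# Indexing with the unchecked list fails as soon as one name is unknown.
try:
    mydata[mylist]
    raised = False
except KeyError as err:
    raised = True
    print("KeyError:", err)
```

Running the check first turns an exception into a readable list of the names that need fixing.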
