map multiple columns by a single dictionary in pandas

轻奢々 2021-01-03 04:06

I have a DataFrame with multiple columns containing 'yes' and 'no' strings. I want to convert all of them to a boolean dtype. To map one column, I would use

         


        
3 Answers
  • 2021-01-03 04:48

    You could use a stack/unstack idiom:

    df.stack().map(dict_map_yn_bool).unstack()
    

    Using @jezrael's setup:

    import pandas as pd

    df = pd.DataFrame({'Station':['no','yes'], 'nearby_subway_station':['yes','no']})
    dict_map_yn_bool={'yes':True, 'no':False}
    

    Then

    df.stack().map(dict_map_yn_bool).unstack()
    
      Station nearby_subway_station
    0   False                  True
    1    True                 False
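To see why the idiom works, here is a minimal sketch (same setup as above) that also prints the intermediate stacked Series: stack() collapses the frame into one Series keyed by (row, column), so a single Series.map call covers every cell, and unstack() restores the original shape.

```python
import pandas as pd

df = pd.DataFrame({'Station': ['no', 'yes'], 'nearby_subway_station': ['yes', 'no']})
dict_map_yn_bool = {'yes': True, 'no': False}

# stack() turns the 2-D frame into one Series indexed by (row, column),
# so one Series.map call can translate every cell.
stacked = df.stack()
print(stacked)

# unstack() pivots the column level back out, restoring the original layout.
result = stacked.map(dict_map_yn_bool).unstack()
print(result)
```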
    


  • 2021-01-03 04:48

    You can use applymap (deprecated since pandas 2.1 in favour of the equivalent DataFrame.map):

    import pandas as pd

    df = pd.DataFrame({'Station':['no','yes'], 'nearby_subway_station':['yes','no']})
    print (df)
      Station nearby_subway_station
    0      no                   yes
    1     yes                    no
    
    dict_map_yn_bool={'yes':True, 'no':False}
    
    # keep df unchanged so the following solutions start from the same strings
    df1 = df.applymap(dict_map_yn_bool.get)
    print (df1)
      Station nearby_subway_station
    0   False                  True
    1    True                 False
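Since applymap was deprecated in pandas 2.1 in favour of DataFrame.map (same element-wise call), a sketch that works on both older and newer versions might pick whichever method exists:

```python
import pandas as pd

df = pd.DataFrame({'Station': ['no', 'yes'], 'nearby_subway_station': ['yes', 'no']})
dict_map_yn_bool = {'yes': True, 'no': False}

# DataFrame.map exists from pandas 2.1 on; older versions only have applymap.
if hasattr(pd.DataFrame, 'map'):
    out = df.map(dict_map_yn_bool.get)
else:
    out = df.applymap(dict_map_yn_bool.get)
print(out)
```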
    

    Another solution:

    # work on a copy so df keeps its original 'yes'/'no' strings
    df2 = df.copy()
    for x in df2:
        df2[x] = df2[x].map(dict_map_yn_bool)
    print (df2)
      Station nearby_subway_station
    0   False                  True
    1    True                 False
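If only some columns hold 'yes'/'no' strings, the same Series.map can be limited to a list of columns. A minimal sketch; the extra `price` column and the `yn_cols` list are hypothetical, added only to show that other columns are left alone:

```python
import pandas as pd

df = pd.DataFrame({
    'Station': ['no', 'yes'],
    'nearby_subway_station': ['yes', 'no'],
    'price': [100, 200],  # hypothetical non-yes/no column, must stay untouched
})
dict_map_yn_bool = {'yes': True, 'no': False}

yn_cols = ['Station', 'nearby_subway_station']
# apply() passes each selected column to Series.map; 'price' is not touched.
df[yn_cols] = df[yn_cols].apply(lambda s: s.map(dict_map_yn_bool))
print(df)
```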
    

    Thanks to Jon Clements for a very nice idea, using replace:

    df = df.replace({'yes': True, 'no': False})
    print (df)
      Station nearby_subway_station
    0   False                  True
    1    True                 False
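Depending on the pandas version, replace may return bool columns or object columns holding True/False (recent versions deprecate this automatic downcast and may warn about it). If a real bool dtype matters, as in the question, one option is to cast explicitly. A sketch, assuming every cell is 'yes' or 'no':

```python
import pandas as pd

df = pd.DataFrame({'Station': ['no', 'yes'], 'nearby_subway_station': ['yes', 'no']})

# replace swaps the strings; astype(bool) then pins the dtype explicitly.
out = df.replace({'yes': True, 'no': False}).astype(bool)
print(out.dtypes)
```

Note that astype(bool) would turn any leftover non-empty string into True, so it is only safe once you know nothing besides 'yes' and 'no' is present.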
    

    There are some differences when some values are not in the dict:

    df = pd.DataFrame({'Station':['no','yes','no'], 'nearby_subway_station':['yes','no','a']})
    print (df)
      Station nearby_subway_station
    0      no                   yes
    1     yes                    no
    2      no                     a
    

    applymap with dict.get returns None for values missing from the dict: the None is kept in boolean or string columns, while in numeric columns it becomes NaN.

    df1 = df.applymap(dict_map_yn_bool.get)
    print (df1)
      Station nearby_subway_station
    0   False                  True
    1    True                 False
    2   False                  None
    

    map creates NaN:

    df2 = df.copy()
    for x in df2:
        df2[x] = df2[x].map(dict_map_yn_bool)
    
    print (df2)
      Station nearby_subway_station
    0   False                  True
    1    True                 False
    2   False                   NaN
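If you prefer the per-column map loop but want replace's behaviour of keeping unknown values, a callable with a fallback does it. A sketch using dict.get(v, v), which returns v itself when it is not a key:

```python
import pandas as pd

df = pd.DataFrame({'Station': ['no', 'yes', 'no'],
                   'nearby_subway_station': ['yes', 'no', 'a']})
dict_map_yn_bool = {'yes': True, 'no': False}

for x in df:
    # dict.get(v, v) returns the mapped value, or v unchanged when v is not a
    # key, so 'a' survives instead of turning into NaN.
    df[x] = df[x].map(lambda v: dict_map_yn_bool.get(v, v))
print(df)
```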
    

    replace creates neither NaN nor None; values not in the dict are left untouched:

    df = df.replace(dict_map_yn_bool)
    print (df)
      Station nearby_subway_station
    0   False                  True
    1    True                 False
    2   False                     a
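Because the approaches differ only on values missing from the dict, it can be worth detecting those values before converting anything. A minimal sketch using DataFrame.isin; reporting them and skipping the conversion is an illustrative choice, not something the answers above prescribe:

```python
import pandas as pd

df = pd.DataFrame({'Station': ['no', 'yes', 'no'],
                   'nearby_subway_station': ['yes', 'no', 'a']})
dict_map_yn_bool = {'yes': True, 'no': False}

# Boolean mask: True wherever a cell is not a key of the dict.
unknown = ~df.isin(list(dict_map_yn_bool))

if unknown.to_numpy().any():
    # Report the offending values per column instead of converting silently.
    for col in df.columns:
        if unknown[col].any():
            print(col, df.loc[unknown[col], col].tolist())
    # prints: nearby_subway_station ['a']
else:
    df = df.replace(dict_map_yn_bool)
```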
    
  • 2021-01-03 05:01

    I would use pandas.DataFrame.replace, as I think it is the simplest option and has built-in arguments to support this task. It is also a one-liner, as requested.

    First case, replace all instances of 'yes' or 'no':

    import pandas as pd
    import numpy as np
    from numpy import random
    
    # Generating the data, 20 rows by 5 columns.
    data = random.choice(['yes','no'], size=(20, 5), replace=True)
    col_names = ['col_{}'.format(a) for a in range(1,6)]
    df = pd.DataFrame(data, columns=col_names)
    
    # Pass parallel lists: each value in to_replace is swapped for the value at the same position. No dict needed.
    df_bool = df.replace(to_replace=['yes','no'], value=[True, False])
    

    Second case: you only want to replace values in a subset of columns. As described in the documentation for DataFrame.replace, use a nested dictionary whose top-level keys are the columns to operate on and whose values are dictionaries mapping old values to their replacements:

    dict_map_yn_bool = {'yes': True, 'no': False}
    replace_dict = {'col_1': dict_map_yn_bool,
                    'col_2': dict_map_yn_bool}
    df_bool = df.replace(to_replace=replace_dict)
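With many columns, the nested dictionary can be built with a comprehension instead of being typed out. A sketch; `cols_to_fix` is a hypothetical list of the columns to convert, and `col_3` shows that other columns keep their strings:

```python
import pandas as pd

df = pd.DataFrame({'col_1': ['yes', 'no'],
                   'col_2': ['no', 'yes'],
                   'col_3': ['yes', 'yes']})
dict_map_yn_bool = {'yes': True, 'no': False}

# Build {column: {old: new}} for every column that should be converted.
cols_to_fix = ['col_1', 'col_2']
replace_dict = {col: dict_map_yn_bool for col in cols_to_fix}
df_bool = df.replace(to_replace=replace_dict)
print(df_bool)
```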
    