pandas DataFrame combine_first and update methods have strange behavior

悲&欢浪女 2021-01-24 21:36

I'm running into a strange issue (or is it intended?) where combine_first or update causes values stored as bool to be upcast into

2 Answers
  • 2021-01-24 21:57

    Before updating, the DataFrame b is reindexed with reindex_like, so that b becomes

    In [5]: b.reindex_like(a)
    Out[5]: 
        a   b  isBool  isBool2
    0  45  45     NaN      NaN
    1 NaN NaN     NaN      NaN
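
    The reindexing step can be reproduced with a minimal sketch. The frames below are hypothetical stand-ins, since the original question's data is not shown here: a has two rows plus two bool columns, and b has only one row and no bool columns.

```python
import pandas as pd

# Hypothetical frames, assumed for illustration only.
a = pd.DataFrame(
    {"a": [45, 46], "b": [45, 46], "isBool": [True, False], "isBool2": [False, True]}
)
b = pd.DataFrame({"a": [45], "b": [45]})

# reindex_like aligns b to a's index and columns.
# Any row or column label missing from b is filled with NaN.
reindexed = b.reindex_like(a)

print(reindexed)
print(reindexed.dtypes)
```

    Every cell that b did not originally contain is NaN after this step, including the whole of 'isBool' and 'isBool2'.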
    

    numpy.where is then used to update the data frame.

    The unfortunate part is that when numpy.where is given two inputs of different types, the result uses the more general type. For example:

    In [20]: np.where(True, [True], [0])
    Out[20]: array([1])
    
    In [21]: np.where(True, [True], [1.0])
    Out[21]: array([ 1.])
    

    Since NaN in numpy is a floating-point value, the result is also floating point.

    In [22]: np.where(True, [True], [np.nan])
    Out[22]: array([ 1.])
    

    Therefore, after updating, your 'isBool' and 'isBool2' columns become floating point.
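
    To check the promotion directly, here is a small sketch that inspects the result dtype of numpy.where for each pairing. The array names are made up for the example.

```python
import numpy as np

mask = np.array([True, False])
bools = np.array([True, False])
ints = np.array([0, 0])
floats = np.array([np.nan, np.nan])

# Both branches are bool, so the result stays bool.
same = np.where(mask, bools, bools)

# bool mixed with int is promoted to the integer dtype of `ints`.
mixed_int = np.where(mask, bools, ints)

# bool mixed with NaN (a float) is promoted to float64.
mixed_float = np.where(mask, bools, floats)

print(same.dtype, mixed_int.dtype, mixed_float.dtype)

# np.result_type reports the promoted dtype without building an array.
print(np.result_type(np.bool_, np.float64))
```

    The last pairing is the one that matters here: once one side holds NaN, a bool column cannot keep its dtype.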

    I've filed this on the pandas issue tracker.

  • 2021-01-24 22:17

    This is a bug: update shouldn't touch unspecified columns. It is fixed here: https://github.com/pydata/pandas/pull/3021
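
    On a version without the fix, one possible workaround (a sketch, not something taken from the linked pull request) is to cast the affected columns back to bool after combining, as long as no missing values remain in them. The frames and the column list are hypothetical.

```python
import pandas as pd

# Hypothetical frames, assumed for illustration only.
a = pd.DataFrame({"a": [45, 46], "isBool": [True, False]})
b = pd.DataFrame({"a": [45]})

combined = a.combine_first(b)

# Columns that are known to hold booleans (an assumption of this sketch).
bool_columns = ["isBool"]

for col in bool_columns:
    # Casting NaN to bool would silently give True, so only cast
    # when the column has no missing values.
    if combined[col].notna().all():
        combined[col] = combined[col].astype(bool)

print(combined.dtypes)
```

    The cast is harmless on versions where the column already comes back as bool.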
