Why does pd.concat change the resulting datatype from int to float?

小鲜肉 2021-02-07 22:04

I have three dataframes: timestamp (with timestamps), dataSun (with timestamps of sunrise and sunset), dataData (with different climate data). Dataframe timestamp h

2 Answers
  • 2021-02-07 22:42

    As of pandas 1.0.0 I believe you have another option, which is to first use convert_dtypes. This converts the dataframe columns to dtypes that support pd.NA, avoiding the issues with NaNs discussed in this answer.
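A minimal sketch of that approach, assuming pandas 1.0 or later. The two small frames are hypothetical stand-ins for the question's `timestamp` and `dataSun`, with `data_sun` deliberately one row short so the concatenation has a gap to fill:

```python
import pandas as pd

# Hypothetical stand-ins for the question's frames; data_sun is missing
# the last index label on purpose.
timestamp = pd.DataFrame({"timestamp": [1521681600000, 1521681900000, 1521682200000]})
data_sun = pd.DataFrame({"sunrise": [1521696105000, 1521696105000]})

# Convert to the nullable extension dtypes first, then concatenate.
nullable = pd.concat(
    [timestamp.convert_dtypes(), data_sun.convert_dtypes()], axis=1
)

print(nullable.dtypes)
print(nullable)
```

The gap in `sunrise` should then be held as `pd.NA` in a nullable `Int64` column rather than as a float NaN.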

  • 2021-02-07 22:46

    Because of this -

    timestamp      7188 non-null int64
    sunrise        7176 non-null float64
    ...
    

    timestamp has 7188 non-null values, while sunrise and the columns after it have 7176. The remaining 12 values in those columns are missing, meaning they're NaNs.

    Since NaN is a float value, an int64 column cannot hold it: every other value in that column is automatically upcast to float, and float numbers that big are usually displayed in scientific notation.
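The upcast can be reproduced in isolation. This is a small illustration with a made-up series, not the question's data: reindexing to a label that does not exist introduces a NaN, and the dtype changes with it.

```python
import pandas as pd

# Two integer timestamps, stored as int64.
s = pd.Series([1521696105000, 1521740761000])
print(s.dtype)

# Index label 2 has no value, so pandas fills it with NaN and
# upcasts the whole series to float64.
widened = s.reindex([0, 1, 2])
print(widened.dtype)
print(widened)
```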

    That's the why, but that doesn't really solve your problem. Your options at this point are

    1. drop the rows with NaNs using dropna
    2. fill the NaNs with some default integer value using fillna

    (Now you may cast these columns back to int.)

    3. Alternatively, if you perform pd.concat with join='inner', NaNs are not introduced and the dtypes are preserved.

      pd.concat((timestamp, dataSun, dataData), axis=1, join='inner')
      
             timestamp        sunrise         sunset  temperature     pressure  \    
      0  1521681600000  1521696105000  1521740761000     2.490000  1018.000000   
      1  1521681900000  1521696105000  1521740761000     2.408333  1017.833333   
      2  1521682200000  1521696105000  1521740761000     2.326667  1017.666667   
      3  1521682500000  1521696105000  1521740761000     2.245000  1017.500000   
      4  1521682800000  1521696105000  1521740761000     2.163333  1017.333333   
      
         humidity  
      0      99.0  
      1      99.0  
      2      99.0  
      3      99.0  
      4      99.0 
      

    With option 3, an inner join is performed on the indexes of each dataframe.
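Options 1 and 2 can be sketched in the same way. The frame below is a hypothetical stand-in for the result of the outer concat, and the fill value `0` is only a placeholder; pick whatever default makes sense for your data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the concatenated frame: the NaN in the last
# row has already forced "sunrise" to float64.
merged = pd.DataFrame({
    "timestamp": [1521681600000, 1521681900000, 1521682200000],
    "sunrise": [1521696105000.0, 1521696105000.0, np.nan],
})

# Option 1: drop the rows that contain NaNs, then cast back to int64.
dropped = merged.dropna().astype({"sunrise": "int64"})

# Option 2: fill the NaNs with a default integer, then cast back to int64.
filled = merged.fillna({"sunrise": 0}).astype({"sunrise": "int64"})

print(dropped.dtypes)
print(filled)
```

Calling astype while NaNs are still present would raise an error, which is why the cast comes after dropna or fillna.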
