Convert Pandas column containing NaNs to dtype `int`

终归单人心 2020-11-22 11:18

I read data from a .csv file to a Pandas dataframe as below. For one of the columns, namely id, I want to specify the column type as int. The probl

17 answers
  •  盖世英雄少女心
    2020-11-22 11:51

    I ran into this issue while working with PySpark. Because PySpark is a Python front end for code running on the JVM, it requires type safety, so using float instead of int is not an option. I worked around it by wrapping pd.read_csv in a function that fills user-defined columns with user-defined fill values before casting them to the required type. Here is what I ended up using:

    import pandas as pd

    def custom_read_csv(file_path, custom_dtype=None, fill_values=None, **kwargs):
        # Without custom dtypes, defer entirely to pandas.
        if custom_dtype is None:
            return pd.read_csv(file_path, **kwargs)
        # dtype is applied below, so callers must not pass it as well.
        assert 'dtype' not in kwargs
        df = pd.read_csv(file_path, **kwargs)
        fill_values = fill_values or {}
        for col, typ in custom_dtype.items():
            # Replace NaNs with the column's fill value (default -1), then cast.
            df[col] = df[col].fillna(fill_values.get(col, -1)).astype(typ)
        return df
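
The core of the wrapper is the fill-then-cast step. Here is a minimal sketch of that step on its own, assuming a small made-up CSV held in memory (the `csv_text` sample and the `-1` sentinel are illustrative choices, not part of the original data):

```python
import io

import pandas as pd

# Hypothetical sample data: the second row has no id.
csv_text = "id,name\n1,a\n,b\n3,c\n"

df = pd.read_csv(io.StringIO(csv_text))

# The missing value makes pandas read `id` as a float column.
dtype_before = df["id"].dtype

# Fill the gap with a sentinel value, then cast to int.
df["id"] = df["id"].fillna(-1).astype(int)
ids = df["id"].tolist()
```

The sentinel has to be a value that cannot occur as a real id, otherwise missing rows become indistinguishable from genuine ones after the cast.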
    
