Convert Pandas column containing NaNs to dtype `int`


I read data from a .csv file to a Pandas dataframe as below. For one of the columns, namely id, I want to specify the column type as int. The probl

17 Answers

野的像风

2020-11-22 11:43

This assumes your DateColumn holds values formatted like 3312018.0 that should be converted to the string 03/31/2018, and that some records are missing or 0.

```
import pandas as pd

# Missing values become 0 so the float column can be cast to int.
df['DateColumn'] = df['DateColumn'].fillna(0).astype(int)
# Left-pad to 8 digits: 3312018 -> '03312018'.
df['DateColumn'] = df['DateColumn'].astype(str).str.zfill(8)
# Missing or zero records get a placeholder date.
df.loc[df['DateColumn'] == '00000000', 'DateColumn'] = '01011980'
df['DateColumn'] = pd.to_datetime(df['DateColumn'], format="%m%d%Y")
df['DateColumn'] = df['DateColumn'].dt.strftime('%m/%d/%Y')
```


既然无缘

2020-11-22 11:45

The lack of a NaN representation in integer columns is a pandas "gotcha".

The usual workaround is to simply use floats.
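A minimal sketch of what that workaround looks like in practice, assuming a hypothetical `id` column with one missing value. It only illustrates that pandas already stores such a column as `float64`, and that a direct cast to `int` fails while NaNs are present:

```python
import numpy as np
import pandas as pd

# Hypothetical data: an id column with one missing value.
df = pd.DataFrame({"id": [1, 2, np.nan]})

# pandas stores the column as float64 because NaN is a float.
print(df["id"].dtype)

# Casting straight to int raises while NaNs are present.
try:
    df["id"].astype(int)
    cast_failed = False
except ValueError:
    cast_failed = True
print(cast_failed)

# Integer-valued floats still compare equal to ints, so lookups keep working.
matches = df[df["id"] == 2]
print(len(matches))
```

Leaving the column as float is usually harmless for lookups and arithmetic, though very large ids can lose precision in a 64-bit float.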

野的像风

2020-11-22 11:45
As of pandas 1.0.0 you can use pandas.NA for missing values, which means integer columns with missing values are no longer forced to be floats.

When reading in your data all you have to do is:
```
df = pd.read_csv("data.csv", dtype={'id': 'Int64'})
```
Note that 'Int64' is in quotes and the I is capitalized. This distinguishes pandas' nullable 'Int64' from NumPy's int64.

As a side note, this also works with .astype():
```
df['id'] = df['id'].astype('Int64')
```
Documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/integer_na.html
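To see the effect without a file on disk, here is a small self-contained sketch. The CSV text and the `name` column are made up for illustration; `io.StringIO` stands in for "data.csv":

```python
import io

import pandas as pd

# Hypothetical CSV contents; the second row has no id.
csv_text = "id,name\n1,a\n,b\n3,c\n"

df = pd.read_csv(io.StringIO(csv_text), dtype={"id": "Int64"})

print(df["id"].dtype)          # nullable integer dtype, not float64
print(df["id"].isna().sum())   # number of missing ids

# Reductions skip missing values by default.
total = df["id"].sum()
print(total)
```

The missing entry is stored as `pd.NA` rather than NaN, so the remaining values stay integers.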
闹比i

2020-11-22 11:48

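First remove the rows that contain NaN, then convert the remaining rows to integer, and finally reinsert the removed rows. Note that reinserting NaNs into a plain `int` column upcasts it back to float, so the combined column needs the `object` or nullable `Int64` dtype.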
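The three steps above can be sketched as follows. This is only an illustration on a hypothetical Series named `s`, not the answerer's own code. It uses `object` dtype for the final column, which is one assumption about how to keep the integers intact after the NaN rows come back:

```python
import numpy as np
import pandas as pd

# Hypothetical column with one missing value.
s = pd.Series([1.0, np.nan, 3.0], name="id")

# 1. Set aside the rows that contain NaN.
missing = s[s.isna()]
# 2. Convert the remaining rows to plain integers.
present = s.dropna().astype(int)

# 3a. Naively reinserting the NaN rows upcasts everything to float again.
naive = pd.concat([present, missing]).sort_index()
print(naive.dtype)

# 3b. Reinserting as object dtype keeps the integers as integers.
combined = pd.concat(
    [present.astype(object), missing.astype(object)]
).sort_index()
print(combined.tolist())
```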

走了就别回头了

2020-11-22 11:50
If you absolutely want to combine integers and NaNs in a column, you can use the 'object' data type:
```
df['col'] = (
    df['col'].fillna(0)
    .astype(int)
    .astype(object)
    .where(df['col'].notnull())
)
```
This will replace NaNs with an integer (doesn't matter which), convert to int, convert to object and finally reinsert NaNs.

盖世英雄少女心

2020-11-22 11:51

I ran into this issue working with PySpark. Because it is a Python frontend for code running on a JVM, it requires type safety, and using float instead of int is not an option. I worked around the issue by wrapping pd.read_csv in a function that fills user-defined columns with user-defined fill values before casting them to the required type. Here is what I ended up using:

```
import pandas as pd

def custom_read_csv(file_path, custom_dtype=None, fill_values=None, **kwargs):
    """Read a CSV, filling NaNs in selected columns before casting them."""
    if custom_dtype is None:
        return pd.read_csv(file_path, **kwargs)
    # Types are applied after filling, so dtype must not reach read_csv.
    assert 'dtype' not in kwargs
    fill_values = fill_values or {}
    df = pd.read_csv(file_path, **kwargs)
    for col, typ in custom_dtype.items():
        # Columns without a user-defined fill value fall back to -1.
        fill_val = fill_values.get(col, -1)
        df[col] = df[col].fillna(fill_val).astype(typ)
    return df
```
