How to replace a value in pandas with NaN?


I am new to pandas and I am trying to load a CSV file into a DataFrame. My data has missing values represented as ?, and I am trying to replace them with the standard missing value, NaN.

6 Answers

野的像风

2020-12-01 07:11
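Use numpy.nan (see also the related question "Numpy - Replace a number with NaN"):
```
import numpy as np

# Element-wise: turn every cell equal to '?' into NaN.
# applymap returns a new DataFrame, so assign the result.
df = df.applymap(lambda x: np.nan if x == '?' else x)
```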
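As a small self-contained sketch of the same element-wise idea: the sample DataFrame and its column names below are made up for illustration. It uses Series.map per column rather than applymap, since applymap was deprecated in pandas 2.1 in favour of DataFrame.map and may not be available in every version.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; "?" marks a missing value.
df = pd.DataFrame({"age": [54, 31], "workclass": ["?", "Private"]})

# Visit every cell, column by column, and swap "?" for NaN.
cleaned = df.apply(lambda col: col.map(lambda x: np.nan if x == "?" else x))

print(cleaned)
```

For large frames a vectorised call such as df.replace is usually faster than a Python-level lambda on every cell.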

眼角桃花

2020-12-01 07:12

OK, I got it working with:

    # Replace '?' with NaN
    newraw = rawfile.replace('[?]', np.nan, regex=True)
    print(newraw[25:40])
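One thing to watch with the regex approach: an unanchored pattern such as '[?]' may also blank out cells that merely contain a question mark somewhere in longer text. A hedged variant, assuming you only want cells that consist of a single ? with optional surrounding spaces (the sample data is invented), anchors the pattern:

```python
import numpy as np
import pandas as pd

# Hypothetical sample; "note" holds real text that happens to contain "?".
rawfile = pd.DataFrame({
    "workclass": [" ?", "Private", "?"],
    "note": ["why?", "ok", "fine"],
})

# ^\s*\?\s*$ matches a whole cell made of optional whitespace around one "?".
newraw = rawfile.replace(r"^\s*\?\s*$", np.nan, regex=True)

print(newraw)
```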


情深已故

2020-12-01 07:15
There are many ways to do this, folks, but I think this one is best: if your CSV file uses some token for missing values, such as "missing", just pass that token to read_csv through na_values:
```
    rawfile = pd.read_csv("Property_train.csv", na_values=["missing"])
```
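A minimal sketch of the same idea applied to the ? marker from the question. The CSV text is inlined through io.StringIO so it runs without a file; the column names and rows are made up.

```python
import io

import pandas as pd

# Hypothetical CSV content; "?" and "missing" both stand for absent values.
csv_text = "age,workclass,education\n54,?,Some-college\n31,Private,missing\n"

# na_values lists extra strings that read_csv should parse as NaN.
df = pd.read_csv(io.StringIO(csv_text), na_values=["?", "missing"])

print(df)
```

Handling the marker at load time means no separate replace pass is needed afterwards.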
愿得一人

2020-12-01 07:17

df = df.replace({'?': np.nan})

Use a dictionary to replace any value with NaN.

死守一世寂寞

2020-12-01 07:24
You can replace it in just that column using replace (which returns a new object, so assign the result):
```
df['workclass'] = df['workclass'].replace('?', np.nan)
```
or in the whole df:
```
df = df.replace('?', np.nan)
```
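To make the difference between the two calls concrete, here is a small sketch on invented data. Note that replace does not modify the DataFrame in place by default, so the result has to be assigned.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with "?" in two different columns.
df = pd.DataFrame({
    "workclass": ["?", "Private"],
    "occupation": ["Sales", "?"],
})

# Column-only replacement: "occupation" keeps its "?".
df["workclass"] = df["workclass"].replace("?", np.nan)

# Whole-frame replacement: every remaining "?" becomes NaN in the copy.
whole = df.replace("?", np.nan)

print(df)
print(whole)
```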
UPDATE

OK, I figured out your problem: if you don't pass a separator character, read_csv uses a comma ',' as the separator by default.

Here is one problematic line from your data:
```
54, ?, 180211, Some-college, 10, Married-civ-spouse, ?, Husband, Asian-Pac-Islander, Male, 0, 0, 60, South, >50K
```
It actually uses a comma followed by a space as the separator, so when you passed na_values=['?'] nothing matched: every value has a leading space character in front of it, which is easy to overlook.

If you change your line to this:
```
rawfile = pd.read_csv(filename, header=None, names=DataLabels, sep=r',\s', engine='python', na_values=["?"])
```
then you should find that it all works:
```
27      54               NaN  180211  Some-college             10 
```
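An alternative sketch for the same comma-plus-space layout: read_csv also accepts skipinitialspace=True, which drops the space after each delimiter while keeping the default comma separator. The two rows and the column labels below are shortened, hypothetical stand-ins for the real file.

```python
import io

import pandas as pd

# Hypothetical excerpt: fields are separated by a comma and a space.
raw = "54, ?, 180211, Some-college\n38, Private, 215646, HS-grad\n"
names = ["age", "workclass", "fnlwgt", "education"]  # assumed labels

df = pd.read_csv(
    io.StringIO(raw),
    header=None,
    names=names,
    skipinitialspace=True,  # strip the space that follows each comma
    na_values=["?"],        # now "?" matches without the leading space
)

print(df)
```

This avoids a regex separator, so the default parser can still be used.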
执念已碎

2020-12-01 07:33
Sometimes the ? is surrounded by white space in files generated by systems like Informatica or HANA.

First you need to strip the white space in the DataFrame:
```
temp_df_trimmed = temp_df.apply(lambda x: x.str.strip() if x.dtype == "object" else x)
```
Then apply a function to replace the data:
```
temp_df_trimmed['RC'] = temp_df_trimmed['RC'].map(lambda x: np.nan if x=="?"  else x)
```
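The two steps can also be chained on a single column. A minimal sketch, assuming a text column named RC as in the snippet above; the sample values and the qty column are invented:

```python
import numpy as np
import pandas as pd

# Hypothetical data: "?" padded with spaces, as some export tools produce.
temp_df = pd.DataFrame({"RC": [" ? ", "A1", "?  "], "qty": [1, 2, 3]})

trimmed = temp_df.copy()
# Strip surrounding whitespace first, then turn the bare "?" into NaN.
trimmed["RC"] = trimmed["RC"].str.strip().replace("?", np.nan)

print(trimmed)
```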