Question
I import pandas as pd, run the code below, and get the following result.
Code:
import pandas as pd

traindataset = pd.read_csv('/Users/train.csv')
print(traindataset.dtypes)
print(traindataset.shape)
print(traindataset.iloc[25, 3])
traindataset.dropna(how='any')
print(traindataset.iloc[25, 3])
print(traindataset.shape)
Output
TripType int64
VisitNumber int64
Weekday object
Upc float64
ScanCount int64
DepartmentDescription object
FinelineNumber float64
dtype: object
(647054, 7)
nan
nan
(647054, 7)
[Finished in 2.2s]
From the result, the dropna line doesn't seem to work: the row count doesn't change and there is still a NaN in the DataFrame. How come? This is driving me crazy.
Answer 1:
You need to read the documentation (emphasis added):
Return object with labels on given axis omitted
dropna returns a new DataFrame. If you want it to modify the existing DataFrame, all you have to do is read further in the documentation:
inplace : boolean, default False
If True, do operation inplace and return None.
So to modify it in place, do traindataset.dropna(how='any', inplace=True).
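To make the difference concrete, here is a minimal sketch on a small made-up DataFrame (the data is hypothetical, standing in for train.csv). It shows that a plain dropna call leaves the original untouched, while inplace=True modifies it and returns None.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for train.csv
df = pd.DataFrame({"Upc": [1.0, np.nan, 3.0], "ScanCount": [1, 2, 3]})

# Without inplace, dropna returns a new DataFrame and leaves df alone
cleaned = df.dropna(how="any")
print(df.shape)       # (3, 2)
print(cleaned.shape)  # (2, 2)

# With inplace=True, df itself is modified and the call returns None
result = df.dropna(how="any", inplace=True)
print(result)         # None
print(df.shape)       # (2, 2)
```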
Answer 2:
Alternatively, you can use the notnull() method to select the rows that are not null.
For example, to select the non-null values from the columns country and variety of the DataFrame reviews:
answer = reviews.loc[(reviews.country.notnull()) & (reviews.variety.notnull())]
But here we are only selecting the relevant data; to remove null values you should use the dropna() method.
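As a rough illustration, the sketch below compares the notnull() mask with dropna restricted to the same two columns through its subset parameter. The reviews data here is invented for the example; only the column names country and variety come from the answer.

```python
import numpy as np
import pandas as pd

# Hypothetical sample; only the column names come from the answer above
reviews = pd.DataFrame({
    "country": ["Italy", None, "France", "Spain"],
    "variety": ["Red", "White", np.nan, "White"],
    "points": [87, 90, 85, 88],
})

# Boolean mask: keep rows where both columns are non-null
by_mask = reviews.loc[reviews.country.notnull() & reviews.variety.notnull()]

# dropna limited to the same two columns
by_dropna = reviews.dropna(subset=["country", "variety"])

# Both approaches keep the same rows
print(by_mask.equals(by_dropna))
```

Neither call changes reviews itself; both return a new DataFrame.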
Answer 3:
pd.DataFrame.dropna uses inplace=False by default. This is the norm with most Pandas operations; exceptions do exist, e.g. update.
Therefore, you must either assign back to your variable, or state explicitly inplace=True:
df = df.dropna(how='any') # assign back
df.dropna(how='any', inplace=True) # set inplace parameter
Stylistically, the former is often preferred because it supports method chaining, and the latter often yields little or no performance benefit.
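A small sketch of what chaining looks like when the result is assigned back. The data and the follow-up steps (reset_index, astype) are assumptions chosen only to illustrate the style, not part of the original question.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data
raw = pd.DataFrame({"Upc": [4011.0, np.nan, 3338.0], "ScanCount": [1, 2, -1]})

# Each method returns a new DataFrame, so the calls can be chained
tidy = (
    raw
    .dropna(how="any")          # drop rows containing any NaN
    .reset_index(drop=True)     # renumber the remaining rows from 0
    .astype({"Upc": "int64"})   # safe to cast once the NaN rows are gone
)
print(tidy)
```

A chain like this is not possible with inplace=True, because that call returns None.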
Answer 4:
This is my first post. I just spent a few hours debugging this exact issue and I would like to share how I fixed this issue.
I was converting my entire dataframe to a string and then placing that value back into the dataframe using similar code to what is displayed below: (please note, the code below will only convert the value to a string)
row_counter = 0
for ind, row in dataf.iterrows():
    cell_value = str(row['column_header'])
    dataf.loc[row_counter, 'column_header'] = cell_value
    row_counter += 1
After converting the entire dataframe to a string, I then used the dropna() function. The values that were previously NaN (considered a null value by pandas) had been converted to the string 'nan', which dropna() does not treat as missing.
In conclusion, drop blank values FIRST, before you start manipulating data in the CSV and converting its data type.
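The pitfall can be reproduced with a short sketch. The data is made up, and the conversion uses a list comprehension with str() rather than the loop above, but the effect on NaN is the same.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value
dataf = pd.DataFrame({"column_header": [1.5, np.nan, 3.0]})

# Converting every cell with str() turns NaN into the text 'nan'
stringified = pd.DataFrame(
    {"column_header": [str(v) for v in dataf["column_header"]]}
)
print(len(stringified.dropna()))  # 3, nothing is dropped

# Dropping missing values first, then converting, removes the row
clean = dataf.dropna()
clean_text = [str(v) for v in clean["column_header"]]
print(clean_text)  # ['1.5', '3.0']
```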
Source: https://stackoverflow.com/questions/33643843/cant-drop-nan-with-dropna-in-pandas