I import pandas as pd, run the code below, and get the following result.
Code:
traindataset = pd.read_csv('/Users/train.csv')
print traindataset.d
pd.DataFrame.dropna uses inplace=False by default. This is the norm with most Pandas operations; exceptions do exist, e.g. update. Therefore, you must either assign back to your variable, or state explicitly inplace=True:
df = df.dropna(how='any') # assign back
df.dropna(how='any', inplace=True) # set inplace parameter
Stylistically, the former is often preferred because it supports method chaining, and the latter often yields little or no performance benefit.
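To make the difference concrete, here is a minimal sketch using a small made-up DataFrame (the data and the variable names are assumptions for illustration, not taken from the question). It shows that calling dropna without assigning the result leaves the original untouched, and that assigning back allows further calls to be chained:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, not from the original question
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": ["x", "y", None]})

# Result is discarded, so df itself still holds every row
df.dropna(how="any")
rows_after_discard = len(df)

# Assign back: cleaned holds only the rows without missing values
cleaned = df.dropna(how="any")
rows_cleaned = len(cleaned)

# Because dropna returns a DataFrame, further methods can be chained
chained = df.dropna(how="any").reset_index(drop=True)

print(rows_after_discard, rows_cleaned, list(chained.index))
```

Only the first row has no missing value in either column, so it is the only row kept in cleaned and chained.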
This is my first post. I just spent a few hours debugging this exact issue and would like to share how I fixed it.
I was converting every value in my dataframe to a string and then writing that value back into the dataframe, using code similar to what is shown below (please note, the code below only converts the values to strings):
row_counter = 0
for ind, row in dataf.iterrows():
    cell_value = str(row['column_header'])
    dataf.loc[row_counter, 'column_header'] = cell_value
    row_counter += 1
After converting the entire dataframe to strings, I then called the dropna() function. The values that were previously NaN (considered a null value by pandas) had been converted to the string 'nan', which pandas does not treat as missing, so dropna() removed nothing.
In conclusion, drop blank values FIRST, before you start manipulating data in the CSV and converting its data type.
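The effect can be reproduced with a short sketch. The column name column_header comes from the snippet above; the sample values and the other variable names are made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one missing value
dataf = pd.DataFrame({"column_header": [1.5, np.nan, 2.5]})

# Convert every value with str() first: NaN becomes the text 'nan'
as_text = pd.DataFrame(
    {"column_header": [str(v) for v in dataf["column_header"]]}
)
rows_convert_then_drop = len(as_text.dropna())

# Drop missing values first, then convert the remaining values
dropped_first = dataf.dropna()
rows_drop_then_convert = len(dropped_first)
converted = [str(v) for v in dropped_first["column_header"]]

print(rows_convert_then_drop, rows_drop_then_convert, converted)
```

Converting first leaves all three rows in place because 'nan' is an ordinary string, while dropping first removes the missing row before any conversion happens.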
You need to read the documentation (emphasis added):
Return object with labels on given axis omitted
dropna returns a new DataFrame. If you want it to modify the existing DataFrame, all you have to do is read further in the documentation:
inplace : boolean, default False
If True, do operation inplace and return None.
So to modify it in place, do traindataset.dropna(how='any', inplace=True).
Alternatively, you can use the notnull() method to select the rows that are not null.
For example, to select the non-null values from the country and variety columns of the dataframe reviews:
answer=reviews.loc[(reviews.country.notnull()) & (reviews.variety.notnull())]
But here we are just selecting the relevant data; to remove null values you should use the dropna() method.
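For comparison, here is a small sketch that puts the notnull() selection next to dropna() restricted to the same two columns through its subset parameter. The reviews data below is invented for illustration, and the points column is an assumption added only to show that other columns are left alone:

```python
import pandas as pd

# Hypothetical stand-in for the reviews dataframe
reviews = pd.DataFrame({
    "country": ["Italy", None, "France"],
    "variety": ["Merlot", "Syrah", None],
    "points": [87, 90, 88],
})

# Boolean selection: keep rows where both columns are non-null
selected = reviews.loc[reviews.country.notnull() & reviews.variety.notnull()]

# dropna limited to the same two columns
dropped = reviews.dropna(subset=["country", "variety"])

print(selected.equals(dropped))
```

With this data both approaches keep only the first row. Neither one changes reviews itself, so the result still has to be assigned to a variable.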