What is the best way to account for NaN (not a number) values in a pandas DataFrame?
The following code:
import numpy as np
import pandas as pd
dfd =
Yet another way to count all the NaNs in a DataFrame:
num_nans = df.size - df.count().sum()
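To see why this works, here is a minimal sketch on a tiny frame. The data and the names num_nans_mask and nans_per_column are made up for illustration. df.size counts every cell, while df.count() counts only the non-NaN cells in each column, so the difference is the number of NaNs.

```python
import numpy as np
import pandas as pd

# Small hypothetical frame, used only for illustration: 9 cells, 3 of them NaN.
df = pd.DataFrame(
    [[1, np.nan, 100], [2, 12, np.nan], [np.nan, 14, 102]],
    columns=['group', 'value', 'value2'])

# size is rows * columns; count() gives the non-NaN cells per column.
num_nans = df.size - df.count().sum()

# Same total via a boolean mask: sum per column, then sum the column totals.
num_nans_mask = df.isna().sum().sum()

# Stopping after the first sum() gives the per-column breakdown instead.
nans_per_column = df.isna().sum()

print(num_nans, num_nans_mask)  # 3 3
```

If you need to know where the NaNs are rather than just how many there are, the per-column version is probably the more useful one.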
Timings:
import timeit
import numpy as np
import pandas as pd
df_scale = 100000
df = pd.DataFrame(
    [[1, np.nan, 100, 63], [2, np.nan, 101, 63], [2, 12, 102, 63],
     [2, 14, 102, 63], [2, 14, 102, 64], [1, np.nan, 200, 63]] * df_scale,
    columns=['group', 'value', 'value2', 'dummy'])
repeat = 3
numbers = 100
setup = """import pandas as pd
from __main__ import df
"""
def timer(statement, _setup=None):
    # Print the best (minimum) total time over `repeat` runs of `numbers` executions each.
    print(min(
        timeit.Timer(statement, setup=_setup or setup).repeat(
            repeat, numbers)))
timer('df.size - df.count().sum()')
timer('df.isna().sum().sum()')
timer('df.isnull().sum().sum()')
prints:
3.998805362999999
3.7503365439999996
3.689461442999999
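So on this 600,000-row, 4-column DataFrame the three approaches are pretty much equivalent: each total is for 100 executions, which works out to roughly 37 to 40 ms per call. (isnull is an alias of isna, so those two are expected to match.)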
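Before comparing speed it is worth confirming that the three expressions return the same count. The following is a rough sketch of that check on a smaller copy of the same data. The names candidates, counts and per_call_ms are my own, and the scale and repetition numbers are reduced so that it runs quickly, so the absolute times are not comparable to the ones above.

```python
import timeit
import numpy as np
import pandas as pd

df_scale = 1000  # assumption: much smaller than the 100000 used above
df = pd.DataFrame(
    [[1, np.nan, 100, 63], [2, np.nan, 101, 63], [2, 12, 102, 63],
     [2, 14, 102, 63], [2, 14, 102, 64], [1, np.nan, 200, 63]] * df_scale,
    columns=['group', 'value', 'value2', 'dummy'])

# The three expressions being compared, wrapped as callables.
candidates = {
    'size - count': lambda: df.size - df.count().sum(),
    'isna': lambda: df.isna().sum().sum(),
    'isnull': lambda: df.isnull().sum().sum(),
}

# Each 6-row block contains 3 NaNs, so every method should report 3 * df_scale.
counts = {name: int(fn()) for name, fn in candidates.items()}

# Best of 3 repeats of 10 executions, converted to milliseconds per call.
number = 10
per_call_ms = {
    name: min(timeit.repeat(fn, repeat=3, number=number)) / number * 1000
    for name, fn in candidates.items()
}

for name in candidates:
    print(f'{name}: {counts[name]} NaNs, {per_call_ms[name]:.3f} ms per call')
```

Passing callables to timeit.repeat avoids the `from __main__ import df` setup string, at the cost of a small function-call overhead in each measurement.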