I have a dataframe with the following columns:
site1 time1 site2 time2 site3 time3 site4 time4 site5 time5 ... time6 site7 time7 site8 time8 site9 time
I'd do this with just count:
train_df[sites].count(axis=1)
count specifically counts non-null values. The issue with your current implementation is that notnull yields boolean values, and bools are never null, so every cell is counted and each row reports the total number of columns.
df
one two three four five
a -0.166778 0.501113 -0.355322 bar False
b NaN NaN NaN NaN NaN
c -0.337890 0.580967 0.983801 bar False
d NaN NaN NaN NaN NaN
e 0.057802 0.761948 -0.712964 bar True
f -0.443160 -0.974602 1.047704 bar False
g NaN NaN NaN NaN NaN
h -0.717852 -1.053898 -0.019369 bar False
df.count(axis=1)
a 5
b 0
c 5
d 0
e 5
f 5
g 0
h 5
dtype: int64
And counting after notnull gives the full column count for every row:
df.notnull().count(axis=1)
a 5
b 5
c 5
d 5
e 5
f 5
g 5
h 5
dtype: int64
Also, trading count(axis=1) for sum(axis=1) should do the trick, since summing booleans counts the True values in each row:
train_df[sites].notnull().sum(axis=1)
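To make the difference concrete, here is a minimal sketch on a small made-up frame. The data and the three-column sites list are assumptions for illustration only; the real train_df from the question has more site columns. It compares the two working approaches with the one that always returns the column count.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for train_df; values are invented for illustration.
train_df = pd.DataFrame({
    "site1": [10.0, 5.0, np.nan],
    "site2": [3.0, np.nan, np.nan],
    "site3": [np.nan, np.nan, np.nan],
})
sites = ["site1", "site2", "site3"]  # assumed list of site column names

# count skips NaN, so this is the number of non-null sites per row.
per_row_count = train_df[sites].count(axis=1)

# notnull gives booleans; summing along axis=1 counts the True values per row.
per_row_sum = train_df[sites].notnull().sum(axis=1)

# Booleans are never null, so count sees every cell as present.
always_full = train_df[sites].notnull().count(axis=1)

print(per_row_count.tolist())  # [2, 1, 0]
print(per_row_sum.tolist())    # [2, 1, 0]
print(always_full.tolist())    # [3, 3, 3]
```

Note that sum needs axis=1 here: without it, pandas sums down each column and returns one value per site rather than one per row.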