How to fill null values in a dataset using Python based on two other columns?

会有一股神秘感。 Submitted on 2019-12-02 08:46:39

Question


I have a Titanic dataset. It has several attributes, and I was mainly working on these three:
1. Age
2. Embarked (the port from which the passenger embarked; there are 3 ports in total: S, Q and C)
3. Survived (0 = did not survive, 1 = survived)

I was filtering out the useless data. Then I needed to fill the null values in Age, so I counted how many passengers survived and how many didn't for each port of embarkation, i.e. S, Q and C.
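A minimal sketch of that counting step with pandas, assuming a small made-up DataFrame that uses the question's column names Survived, Embarked and Age (the values are illustrative, not the real Titanic data, and the variable name counts is hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: the question's column names, made-up values.
titanic = pd.DataFrame({
    "Survived": [0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1],
    "Embarked": ["S", "S", "S", "S", "S", "C", "C", "C", "C", "Q", "Q", "Q"],
    "Age": [22, np.nan, 40, 30, np.nan, 50, np.nan, 20, np.nan, np.nan, 60, 10],
})

# Number of passengers per port (columns) split by survival (rows).
counts = pd.crosstab(titanic["Survived"], titanic["Embarked"])
print(counts)
```

titanic.groupby(["Survived", "Embarked"]).size() gives the same six counts as a Series with a two-level index instead of a table.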

I found the mean age of the passengers who survived and of those who did not survive, for each of the ports S, Q and C. But now I have no idea how to put these 6 values (3 for survivors from S, Q and C, and 3 for non-survivors from S, Q and C) into the original titanic Age column. If I simply do titanic.Age.fillna('one of the six values'), it fills all the null values of Age with that single value, which I don't want.
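A small sketch of the situation described above, on made-up data with the question's column names (six_means and scalar_filled are hypothetical names): groupby produces all six means at once, and a scalar fillna then writes the same number into every missing Age regardless of the passenger's group.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: the question's column names, made-up values.
titanic = pd.DataFrame({
    "Survived": [0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1],
    "Embarked": ["S", "S", "S", "S", "S", "C", "C", "C", "C", "Q", "Q", "Q"],
    "Age": [22, np.nan, 40, 30, np.nan, 50, np.nan, 20, np.nan, np.nan, 60, 10],
})

# The "six values": mean Age for every (Survived, Embarked) pair; NaNs are skipped.
six_means = titanic.groupby(["Survived", "Embarked"])["Age"].mean()
print(six_means)

# A scalar fillna puts the SAME number (here the Survived=1 / S mean)
# into every missing Age, whatever group the passenger belongs to.
scalar_filled = titanic["Age"].fillna(six_means.loc[(1, "S")])
print(scalar_filled[titanic["Age"].isnull()])
```

Series.fillna also accepts a Series instead of a scalar; it then fills each missing entry from the value with the same index label, which is what makes a per-row, per-group fill possible.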

After spending some time on it, I tried this:

titanic[titanic.Survived==1][titanic.Embarked=='S'].Age.fillna(SurvivedS.Age.mean(),inplace=True)
titanic[titanic.Survived==1][titanic.Embarked=='Q'].Age.fillna(SurvivedQ.Age.mean(),inplace=True)
titanic[titanic.Survived==1][titanic.Embarked=='C'].Age.fillna(SurvivedC.Age.mean(),inplace=True)
titanic[titanic.Survived==0][titanic.Embarked=='S'].Age.fillna(DidntSurvivedS.Age.mean(),inplace=True)
titanic[titanic.Survived==0][titanic.Embarked=='Q'].Age.fillna(DidntSurvivedQ.Age.mean(),inplace=True)
titanic[titanic.Survived==0][titanic.Embarked=='C'].Age.fillna(DidntSurvivedC.Age.mean(),inplace=True)

This showed no error, but it still doesn't work. Any idea what I should do?
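A likely reason the attempt above has no effect: titanic[titanic.Survived==1][titanic.Embarked=='S'] is chained indexing. Each [...] step returns a new DataFrame, so .Age.fillna(..., inplace=True) fills a temporary copy that is thrown away, and titanic itself never changes. A minimal sketch of the same six-group idea that does write back, using one combined boolean mask with .loc on the original frame (made-up data; in_group and group_mean are hypothetical names, and each group's mean is computed inside the loop instead of from separate SurvivedS-style frames):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: the question's column names, made-up values.
titanic = pd.DataFrame({
    "Survived": [0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1],
    "Embarked": ["S", "S", "S", "S", "S", "C", "C", "C", "C", "Q", "Q", "Q"],
    "Age": [22, np.nan, 40, 30, np.nan, 50, np.nan, 20, np.nan, np.nan, 60, 10],
})

for survived in (0, 1):
    for port in ("S", "Q", "C"):
        # One mask for the group, combined with & instead of chained [...][...].
        in_group = (titanic["Survived"] == survived) & (titanic["Embarked"] == port)
        # Mean of the known ages in this group (NaNs are skipped).
        group_mean = titanic.loc[in_group, "Age"].mean()
        # .loc with a row mask and a column label assigns into titanic itself.
        titanic.loc[in_group & titanic["Age"].isnull(), "Age"] = group_mean

print(titanic)
```

Each row belongs to exactly one group, so filling one group never changes the mean computed for another.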


Answer 1:


I think you need groupby to get the mean age of each (survived, embarked) group, and then fillna with it:

titanic['age'] = titanic['age'].fillna(
    titanic.groupby(['survived','embarked'])['age'].transform('mean'))
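Because sns.load_dataset downloads the data, here is a self-contained sketch of the per-group fill on a small made-up frame that uses seaborn's lowercase column names. It relies on groupby(...).transform('mean'), which returns each row's group mean on the original index, and passes that Series to fillna so that only missing ages change. The last row is a hypothetical passenger with no embarked value: with the default dropna=True it belongs to no group, and in recent pandas versions its transform result is NaN, so its existing age is kept.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with seaborn-style lowercase column names.
titanic = pd.DataFrame({
    "survived": [0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1],
    "embarked": ["S", "S", "S", "S", "S", "C", "C", "C", "C", "Q", "Q", "Q", None],
    "age": [22, np.nan, 40, 30, np.nan, 50, np.nan, 20, np.nan, np.nan, 60, 10, 38],
})

# Mean age of each row's (survived, embarked) group, aligned with titanic's index.
group_mean = titanic.groupby(["survived", "embarked"])["age"].transform("mean")

# Fill only the missing ages; known ages (including row 12) are left untouched.
titanic["age"] = titanic["age"].fillna(group_mean)
print(titanic)
```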

import seaborn as sns

titanic = sns.load_dataset('titanic')
#check NaN rows in age
print (titanic[titanic['age'].isnull()].head(10))
    survived  pclass     sex  age  sibsp  parch      fare embarked   class  \
5          0       3    male  NaN      0      0    8.4583        Q   Third   
17         1       2    male  NaN      0      0   13.0000        S  Second   
19         1       3  female  NaN      0      0    7.2250        C   Third   
26         0       3    male  NaN      0      0    7.2250        C   Third   
28         1       3  female  NaN      0      0    7.8792        Q   Third   
29         0       3    male  NaN      0      0    7.8958        S   Third   
31         1       1  female  NaN      1      0  146.5208        C   First   
32         1       3  female  NaN      0      0    7.7500        Q   Third   
36         1       3    male  NaN      0      0    7.2292        C   Third   
42         0       3    male  NaN      0      0    7.8958        C   Third   

      who  adult_male deck  embark_town alive  alone  
5     man        True  NaN   Queenstown    no   True  
17    man        True  NaN  Southampton   yes   True  
19  woman       False  NaN    Cherbourg   yes   True  
26    man        True  NaN    Cherbourg    no   True  
28  woman       False  NaN   Queenstown   yes   True  
29    man        True  NaN  Southampton    no   True  
31  woman       False    B    Cherbourg   yes  False  
32  woman       False  NaN   Queenstown   yes   True  
36    man        True  NaN    Cherbourg   yes   True  
42    man        True  NaN    Cherbourg    no   True 

idx = titanic[titanic['age'].isnull()].index
titanic['age'] = titanic['age'].fillna(
    titanic.groupby(['survived','embarked'])['age'].transform('mean'))

#check if values were replaced
print (titanic.loc[idx].head(10))
    survived  pclass     sex        age  sibsp  parch      fare embarked  \
5          0       3    male  30.325000      0      0    8.4583        Q   
17         1       2    male  28.113184      0      0   13.0000        S   
19         1       3  female  28.973671      0      0    7.2250        C   
26         0       3    male  33.666667      0      0    7.2250        C   
28         1       3  female  22.500000      0      0    7.8792        Q   
29         0       3    male  30.203966      0      0    7.8958        S   
31         1       1  female  28.973671      1      0  146.5208        C   
32         1       3  female  22.500000      0      0    7.7500        Q   
36         1       3    male  28.973671      0      0    7.2292        C   
42         0       3    male  33.666667      0      0    7.8958        C   

     class    who  adult_male deck  embark_town alive  alone  
5    Third    man        True  NaN   Queenstown    no   True  
17  Second    man        True  NaN  Southampton   yes   True  
19   Third  woman       False  NaN    Cherbourg   yes   True  
26   Third    man        True  NaN    Cherbourg    no   True  
28   Third  woman       False  NaN   Queenstown   yes   True  
29   Third    man        True  NaN  Southampton    no   True  
31   First  woman       False    B    Cherbourg   yes  False  
32   Third  woman       False  NaN   Queenstown   yes   True  
36   Third    man        True  NaN    Cherbourg   yes   True  
42   Third    man        True  NaN    Cherbourg    no   True  

#check mean values
print (titanic.groupby(['survived','embarked'])['age'].mean())
survived  embarked
0         C           33.666667
          Q           30.325000
          S           30.203966
1         C           28.973671
          Q           22.500000
          S           28.113184
Name: age, dtype: float64
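The group means printed after the fill are exactly the values used as fillers (for example 30.325000 for survived 0 / embarked Q, which is what row 5 received). That is expected: if a group has known ages with mean m and the missing ones are set to m, the group's mean stays m. A small sketch that checks this on made-up data by capturing the means before the fill and comparing them afterwards (keys, means_before and means_after are hypothetical names):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with seaborn-style lowercase column names.
titanic = pd.DataFrame({
    "survived": [0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1],
    "embarked": ["S", "S", "S", "S", "S", "C", "C", "C", "C", "Q", "Q", "Q"],
    "age": [22, np.nan, 40, 30, np.nan, 50, np.nan, 20, np.nan, np.nan, 60, 10],
})

keys = ["survived", "embarked"]
means_before = titanic.groupby(keys)["age"].mean()

# Per-group mean fill, then recompute the group means.
titanic["age"] = titanic["age"].fillna(titanic.groupby(keys)["age"].transform("mean"))
means_after = titanic.groupby(keys)["age"].mean()

# Raises AssertionError if any group mean moved.
pd.testing.assert_series_equal(means_before, means_after)
print(means_after)
```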


Source: https://stackoverflow.com/questions/44586346/how-to-fill-null-values-in-a-dataset-using-python-that-matches-with-two-other-co
