COUNTIF in pandas python over multiple columns with multiple conditions

旧时难觅i 2020-12-06 06:58

I have a dataset wherein I am trying to determine the number of risk factors per person. So I have the following data:

Person_ID  Age  Smoker  Diabetes
              


        
3 Answers
  • 2020-12-06 07:26

    If you want to stick with pandas, you can use the following.

    Solution

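    # Return 1 if the cell holds 'Y', otherwise 0
    isY = lambda x: int(x == 'Y')
    # Sum the three risk indicators for one row
    countRiskFactors = lambda row: isY(row['Smoker']) + isY(row['Diabetes']) + int(row['Age'] > 45)

    # axis=1 applies the function to each row
    df['Risk_Factors'] = df.apply(countRiskFactors, axis=1)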
    

    How it works

    isY is a stored lambda function that checks whether the value of a cell is 'Y'; it returns 1 if so and 0 otherwise.

    countRiskFactors adds up the risk factors for a single row.

    The final line uses the apply method with the axis parameter set to 1, which applies the function (the first argument) row-wise along the DataFrame and returns a Series. That Series is then assigned to the DataFrame as a new column.

    Output of print(df):

       Person_ID  Age Smoker Diabetes  Risk_Factors
    0          1   30      Y        N             1
    1          2   45      N        N             0
    2          3   27      N        Y             1
    3          4   18      Y        Y             2
    4          5   55      Y        Y             3
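
For reference, here is a minimal self-contained sketch of the same row-wise idea. The sample rows are copied from the output table above; the function name count_risk_factors is my own and simply replaces the two lambdas with one named function.

```python
import pandas as pd

# Sample data copied from the output table in this answer.
df = pd.DataFrame({
    "Person_ID": [1, 2, 3, 4, 5],
    "Age": [30, 45, 27, 18, 55],
    "Smoker": ["Y", "N", "N", "Y", "Y"],
    "Diabetes": ["N", "N", "Y", "Y", "Y"],
})


def count_risk_factors(row):
    """Count how many of the three conditions hold for one row."""
    return (
        int(row["Age"] > 45)
        + int(row["Smoker"] == "Y")
        + int(row["Diabetes"] == "Y")
    )


# axis=1 passes each row (as a Series) to the function; the result is a
# Series with one count per row, aligned to the DataFrame's index.
risk_factors = df.apply(count_risk_factors, axis=1)
df["Risk_Factors"] = risk_factors
print(df)
```

Note that apply with axis=1 calls the Python function once per row, so on large frames it is likely to be slower than column-wise comparisons.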
    
  • 2020-12-06 07:34

    If you are starting from Excel and want to go to the next evolution, then I would recommend MS Access. It will be a lot easier than learning pandas for Python. You should just replace the CountIf() with:

    Risk Factor: IIF(Age>45, 1, 0) + IIF(Smoker="Y", 1, 0) + IIF(Diabetes="Y", 1, 0)
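
If you do end up in pandas after all, the IIF expression above maps fairly directly onto numpy.where, which takes a condition, a value if true, and a value if false. This is only a sketch; the sample rows are borrowed from the other answers on this page and are an assumption about what the question's data looks like.

```python
import numpy as np
import pandas as pd

# Sample data assumed from the other answers in this thread.
df = pd.DataFrame({
    "Age": [30, 45, 27, 18, 55],
    "Smoker": ["Y", "N", "N", "Y", "Y"],
    "Diabetes": ["N", "N", "Y", "Y", "Y"],
})

# Each np.where(cond, 1, 0) plays the role of one IIF(cond, 1, 0) term.
risk = (
    np.where(df["Age"] > 45, 1, 0)
    + np.where(df["Smoker"] == "Y", 1, 0)
    + np.where(df["Diabetes"] == "Y", 1, 0)
)

df["Risk_Factors"] = risk
print(df)
```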

  • 2020-12-06 07:38

    I would do this the following way.

    1. For each column, create a new boolean series using the column's condition
    2. Add those series row-wise

    (Note that this is simpler if your Smoker and Diabetes columns are already boolean (True/False) instead of strings.)

    It might look like this:

    import pandas as pd

    df = pd.DataFrame({'Age': [30,45,27,18,55],
                       'Smoker':['Y','N','N','Y','Y'],
                       'Diabetes': ['N','N','Y','Y','Y']})
    
       Age Diabetes Smoker
    0   30        N      Y
    1   45        N      N
    2   27        Y      N
    3   18        Y      Y
    4   55        Y      Y
    
    #Step 1
    risk1 = df.Age > 45
    risk2 = df.Smoker == "Y"
    risk3 = df.Diabetes == "Y"
    risk_df = pd.concat([risk1,risk2,risk3],axis=1)
    
         Age Smoker Diabetes
    0  False   True    False
    1  False  False    False
    2  False  False     True
    3  False   True     True
    4   True   True     True
    
    #Step 2
    df['Risk_Factors'] = risk_df.sum(axis=1)
    
       Age Diabetes Smoker  Risk_Factors
    0   30        N      Y             1
    1   45        N      N             0
    2   27        Y      N             1
    3   18        Y      Y             2
    4   55        Y      Y             3
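
To illustrate the note about boolean columns: if Smoker and Diabetes are stored as True/False, the string comparisons drop out and only the Age condition has to be built. A rough sketch, assuming the same five rows with the flags already converted to booleans:

```python
import pandas as pd

# Same sample rows as above, but with Smoker/Diabetes as booleans
# (an assumption for illustration; the original data uses 'Y'/'N').
df = pd.DataFrame({
    "Age": [30, 45, 27, 18, 55],
    "Smoker": [True, False, False, True, True],
    "Diabetes": [False, False, True, True, True],
})

# Boolean Series cast to int become 0/1, so they can be added directly.
risk_factors = (
    (df["Age"] > 45).astype(int)
    + df["Smoker"].astype(int)
    + df["Diabetes"].astype(int)
)

df["Risk_Factors"] = risk_factors
print(df)
```

If the columns start out as 'Y'/'N' strings, something like df["Smoker"] == "Y" produces the boolean version in one step.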
    