I have a dataset wherein I am trying to determine the number of risk factors per person. So I have the following data:
Person_ID Age Smoker Diabetes
If you want to stick with pandas. You can use the following...
isY = lambda x:int(x=='Y')
countRiskFactors = lambda row: isY(row['Smoker']) + isY(row['Diabetes']) + int(row["Age"]>45)
df['Risk_Factors'] = df.apply(countRiskFactors,axis=1)
How it works
isY - is a stored lambda function that checks if the value of a cell is Y returns 1 if it is otherwise 0 countRiskFactors - adds up the risk factors
the final line uses the apply method, with the paramater key set to 1, which applies the method -first parameter - row wise along the DataFrame and Returns a Series which is appended to the DataFrame.
output of print df
Person_ID Age Smoker Diabetes Risk_Factors
0 1 30 Y N 1
1 2 45 N N 0
2 3 27 N Y 1
3 4 18 Y Y 2
4 5 55 Y Y 3
If you are starting from excel and want to go to the next evolution then I would recommend MS access. It will be a lot easier then learning Panda for python. You should just replace the CountIf() with:
Risk Factor: IIF(Age>45, 1, 0) + IIF(Smoker="Y", 1, 0) + IIF(Diabetes="Y", 1, 0)
I would do this the following way.
(Note that this is simpler if your Smoker and Diabetes column is already boolean (True/False) instead of in strings.)
It might look like this:
df = pd.DataFrame({'Age': [30,45,27,18,55],
'Smoker':['Y','N','N','Y','Y'],
'Diabetes': ['N','N','Y','Y','Y']})
Age Diabetes Smoker
0 30 N Y
1 45 N N
2 27 Y N
3 18 Y Y
4 55 Y Y
#Step 1
risk1 = df.Age > 45
risk2 = df.Smoker == "Y"
risk3 = df.Diabetes == "Y"
risk_df = pd.concat([risk1,risk2,risk3],axis=1)
Age Smoker Diabetes
0 False True False
1 False False False
2 False False True
3 False True True
4 True True True
df['Risk_Factors'] = risk_df.sum(axis=1)
Age Diabetes Smoker Risk_Factors
0 30 N Y 1
1 45 N N 0
2 27 Y N 1
3 18 Y Y 2
4 55 Y Y 3