I have the data:
I also need to use the sqldf library.
My goal is to write a SELECT statement that returns the survival rates by gender. My statement must include the etitanic data frame (treated like a database table).
I do not know SQL very well, but as I understand it I have to write something like:
SELECT survival, gender
FROM etitanic
I am not sure how to achieve this in R; any suggestions would be helpful. I tried the following:
df = sqldf('select count(*) total from etitanic where survived group by sex')
df2 = t(df)
which gave me this:
Female Male
total 292 135
But I believe I need the percentages.
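Does SQL itself have to return the fractions? If not, you can let SQL return the counts and compute the fractions in R. Note that dividing by the total number of survivors, as below, gives each sex's share of all survivors rather than the survival rate within each sex: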
df <- sqldf('select count(*) Total from etitanic where survived group by sex')
df / sum(df)
# Total
#1 0.6838407
#2 0.3161593
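SQL has no dedicated percentage function, but you can count the survivors and the total number of passengers in each group and divide one by the other. Multiplying by 100.0 forces floating-point arithmetic, since SQLite (sqldf's default backend) would otherwise perform integer division and return 0. The query looks like this:
select sex
, 100.0 * sum(case when survived = 1 then 1 else 0 end) / count(*) as survival_pct
from etitanic
group by sex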
Use avg, like this:
sqldf('select sex, 100 * avg(survived) [%Survived] from etitanic group by sex')
sex %Survived
1 female 75.25773
2 male 20.51672
To double-check these numbers, note from with(etitanic, table(sex, survived)) that 292 females survived and 96 did not, so the survival rate is 100 * 292 / (292 + 96) = 75.25773%. Similarly, for males we get 100 * 135 / (135 + 523) = 20.51672%.
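The same cross-check can be done without SQL. Here is a minimal sketch in base R; it hard-codes the counts quoted above so that it runs on its own, and the names counts and rates are only illustrative. With the data loaded, with(etitanic, table(sex, survived)) could presumably be passed to prop.table in place of the hand-built matrix.

```r
# Counts quoted above (rows: sex, columns: survived = 0 or 1).
# Hard-coded here only to keep the sketch self-contained.
counts <- matrix(
  c(96, 292,
    523, 135),
  nrow = 2, byrow = TRUE,
  dimnames = list(sex = c("female", "male"), survived = c("0", "1"))
)

# Row-wise proportions: each row is divided by its own total,
# so the "1" column is the survival rate within each sex.
rates <- 100 * prop.table(counts, margin = 1)[, "1"]

print(round(rates, 5))
```

Using margin = 1 is what makes this a within-sex rate; dividing by the grand total or by the column total would answer a different question.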