Using the sqldf library from R to write a SELECT statement

Submitted by 心已入冬 on 2019-12-24 18:41:43

Question


I have the data:

library(earth)
data(etitanic)

I also need to use the sqldf package:

library(sqldf)

My goal is to write a SELECT statement that returns the survival rates by gender. My statement must include the etitanic data frame (treated like a database table).

I do not know SQL very well, but from my understanding I have to write something like

SELECT survived, sex
FROM   etitanic

I am not sure how to do this in R; any suggestions would be helpful. I tried the following:

df = sqldf('select count(*) total from etitanic where survived group by sex')
df2 = t(df)
colnames(df2)=c('Female','Male')

which gave me this:

      Female Male
total    292  135

But these are only counts of survivors; I believe I need percentages (survival rates).


Answer 1:


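Is SQL returning fractions a requirement? Why not simply let SQL return counts and then calculate fractions in R? To get survival rates (rather than each sex's share of all survivors), have SQL return both the number of survivors and the total number of passengers per sex:

df <- sqldf('select sex, sum(survived) Survived, count(*) Total from etitanic group by sex')
df$Rate <- df$Survived / df$Total
df
# female: Rate = 292 / 388, about 0.753
# male:   Rate = 135 / 658, about 0.205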
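Dividing each sex's survivor count by the total number of survivors gives that sex's share of all survivors, which is not the same as its survival rate. A minimal base-R sketch contrasts the two; it assumes the survivor and non-survivor counts quoted in Answer 3 (the named vectors are hypothetical stand-ins for a sqldf result, not the real etitanic data).

```r
# Hypothetical stand-ins for a sqldf result: counts per sex taken from
# the table(sex, survived) figures quoted in Answer 3.
survived     <- c(female = 292, male = 135)
not_survived <- c(female = 96,  male = 523)

# Share of all survivors belonging to each sex (what survivors / sum(survivors) gives).
share_of_survivors <- survived / sum(survived)

# Survival rate within each sex: survivors / everyone of that sex.
survival_rate <- survived / (survived + not_survived)

print(round(share_of_survivors, 4))
print(round(survival_rate, 4))
```

The first vector sums to 1 across sexes; the second does not, because each rate uses its own denominator.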



Answer 2:


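SQL has no dedicated percentage function, but you can compute the number of survivors and the total number of passengers in each group and then divide the two. Note that SQLite (the default sqldf back end) performs integer division when both operands are integers, so multiply by 100.0 first to get a floating-point percentage. The query looks like this:

sqldf('
select
    sex
  , 100.0 * sum(case when survived then 1 else 0 end) / count(1) as survival_pct
from etitanic
group by sex
')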
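The same sum-over-count logic can be mirrored in base R without a database, which is a handy way to sanity-check the SQL. This sketch assumes a hypothetical row-level data frame `toy` rebuilt from the counts quoted in Answer 3; it is not the real etitanic data.

```r
# Hypothetical row-level data rebuilt from the counts quoted in Answer 3
# (292/96 females survived/died, 135/523 males); not the real etitanic.
toy <- data.frame(
  sex      = rep(c("female", "female", "male", "male"), times = c(292, 96, 135, 523)),
  survived = rep(c(1L, 0L, 1L, 0L), times = c(292, 96, 135, 523))
)

# Per sex: the sum of an indicator (CASE WHEN survived THEN 1 ELSE 0 END)
# divided by the row count (COUNT(1)), scaled to a percentage.
survival_pct <- sapply(split(toy$survived, toy$sex), function(s) {
  100 * sum(ifelse(s == 1, 1, 0)) / length(s)
})
print(survival_pct)
```

R's `/` always returns a double, so unlike SQLite there is no integer-division trap here.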



Answer 3:


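Because survived is coded 0/1, its average within each group is the fraction who survived, so you can use avg like this: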

sqldf('select sex, 100 * avg(survived) [%Survived] from etitanic group by sex')

giving:

     sex %Survived
1 female  75.25773
2   male  20.51672

To double-check these numbers, note from with(etitanic, table(sex, survived)) that 292 females survived and 96 did not, so the female survival rate is 100 * 292 / (292 + 96) = 75.25773%; similarly, for males we get 100 * 135 / (135 + 523) = 20.51672%.
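The check above can be automated: the group mean of a 0/1 column (what avg(survived) computes) should equal survivors divided by group size from a contingency table. A minimal base-R sketch, assuming a hypothetical data frame `toy` rebuilt from the counts quoted above rather than the real etitanic:

```r
# Hypothetical row-level data rebuilt from the quoted counts; not etitanic itself.
toy <- data.frame(
  sex      = rep(c("female", "female", "male", "male"), times = c(292, 96, 135, 523)),
  survived = rep(c(1L, 0L, 1L, 0L), times = c(292, 96, 135, 523))
)

# What avg(survived) computes per group: the mean of a 0/1 column.
pct_from_mean <- 100 * tapply(toy$survived, toy$sex, mean)

# The manual double-check: survivors / (survivors + non-survivors).
tab <- table(toy$sex, toy$survived)
pct_from_table <- 100 * tab[, "1"] / rowSums(tab)

print(pct_from_mean)
print(all.equal(as.vector(pct_from_mean), as.vector(pct_from_table)))
```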



Source: https://stackoverflow.com/questions/48553697/using-the-sqldf-library-from-r-to-write-a-select-statement
