I am trying to calculate prevalence in SQL and am stuck writing the code. I want the query to be automated.
I have checked that I have a sample size of 1453477 and numbe
I am pretty sure that the logic that you want is something like this:
select avg( (condition_id = 12345)::int )
from disease;
Your version doesn't have the sample size, because you are filtering out people without the condition.
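To see the difference on a small case, here is a minimal sketch that runs both approaches on a made-up five-row table. It assumes Python's built-in sqlite3 module as a stand-in for PostgreSQL, so the `::int` cast is dropped (SQLite already evaluates a comparison to 0 or 1); the table contents are hypothetical.

```python
import sqlite3

# Hypothetical toy data: five people, one row each, two of them with condition 12345.
rows = [(1, 12345), (2, 999), (3, 12345), (4, 999), (5, 999)]

conn = sqlite3.connect(":memory:")
conn.execute("create table disease (person_id integer, condition_id integer)")
conn.executemany("insert into disease values (?, ?)", rows)

# The comparison yields 0 or 1 per row, so avg() is the share of rows with the condition.
prevalence = conn.execute(
    "select avg(condition_id = 12345) from disease"
).fetchone()[0]

# Filtering first throws away everyone without the condition,
# so numerator and denominator count the same rows.
filtered = conn.execute(
    "select count(condition_id) * 1.0 / count(person_id) "
    "from disease where condition_id = 12345"
).fetchone()[0]

print(prevalence, filtered)
```

On this toy table the unfiltered average gives 2 out of 5, while the filtered ratio can only ever be 1.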
If you have duplicate people in the data, then this is a little more complicated. One method is:
select (count(distinct person_id) filter (where condition_id = 12345)::numeric /
        count(distinct person_id)
       )
from disease;
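A rough way to check the per-person logic on toy data is sketched below. It assumes Python's sqlite3 module in place of PostgreSQL, and it swaps the `filter` clause for an equivalent `case` expression so that it also runs on older SQLite builds; the rows are invented for illustration.

```python
import sqlite3

# Hypothetical toy data: person 1 appears three times with the same condition.
rows = [(1, 12345), (1, 12345), (1, 12345), (2, 999), (3, 12345), (4, 999)]

conn = sqlite3.connect(":memory:")
conn.execute("create table disease (person_id integer, condition_id integer)")
conn.executemany("insert into disease values (?, ?)", rows)

# Row-based share: the duplicated person is counted three times.
per_row = conn.execute(
    "select avg(condition_id = 12345) from disease"
).fetchone()[0]

# Person-based share: distinct people with the condition over distinct people.
# The case expression returns NULL for non-matching rows, which count() ignores.
per_person = conn.execute(
    "select count(distinct case when condition_id = 12345 then person_id end) * 1.0 "
    "/ count(distinct person_id) from disease"
).fetchone()[0]

print(per_row, per_person)
```

Here two of the four distinct people have the condition, so the person-based figure is lower than the row-based one inflated by the duplicates.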
In your current query you count the number of rows in the disease table, once using the column condition_id, once using the column person_id. But the number of rows is the same - this is why you get 1 as a result.
I think you need to find the number of different values for these columns. This can be done using count distinct:
select (COUNT(DISTINCT condition_id)/COUNT(DISTINCT person_id)) as prevalence
from disease
where condition_id=12345;
You can cast in either of two ways:
count(...)/count(...)::numeric(6,4)
or
count(...)/count(...)::decimal
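The important point is to apply the cast to the denominator or the numerator (in this case, the denominator). Do not apply it to the result of the division, as in
(count(...)/count(...))::numeric(6,4)
which still performs integer division first, so the fractional part is already lost before the cast.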
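The effect of where the cast goes can be checked with a small sketch like the one below. It assumes Python's sqlite3 module as a stand-in for PostgreSQL (both truncate integer division), uses `cast(... as real)` instead of `::numeric`, and works on a hypothetical five-row table.

```python
import sqlite3

# Hypothetical toy data: two of five rows have condition 12345.
rows = [(1, 12345), (2, 999), (3, 12345), (4, 999), (5, 999)]

conn = sqlite3.connect(":memory:")
conn.execute("create table disease (person_id integer, condition_id integer)")
conn.executemany("insert into disease values (?, ?)", rows)

matches = "count(case when condition_id = 12345 then 1 end)"

# No cast: integer / integer truncates toward zero.
no_cast = conn.execute(
    f"select {matches} / count(*) from disease"
).fetchone()[0]

# Cast applied to the finished division: the fraction is already gone.
late_cast = conn.execute(
    f"select cast({matches} / count(*) as real) from disease"
).fetchone()[0]

# Cast applied to the denominator: the division itself is done in floating point.
early_cast = conn.execute(
    f"select {matches} / cast(count(*) as real) from disease"
).fetchone()[0]

print(no_cast, late_cast, early_cast)
```

Only the last query keeps the fraction, because one operand is non-integer before the division happens.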