You imply an assumption that the 90k records in your first query are all unique `id`s. Is that definite?

I ask because the implication from your second query is that they're not unique - one `id` can have multiple values over time, and have different `somevalue`s.

If the `id`s are not unique in the first dataset, you need to GROUP BY `id` or use DISTINCT in the first query.
Imagine that the 90k rows consist of 30k unique `id`s, and so have an average of 3 rows per `id`. Then imagine each of those 30k unique `id`s actually has 9 records in your time window, including rows where `somevalue <> x`. You will then get 3 x 9 = 27 records back per `id`, with every value counted multiple times.

And as those two numbers grow, the number of records in your second query grows with their product - the fan-out multiplies, it doesn't add.
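That fan-out can be reproduced on a toy scale. The sketch below uses SQLite via Python; the table and column names (`firstQuery`, `bigTable`, `value`) are illustrative stand-ins, not your actual schema:

```python
import sqlite3

# Toy reproduction of join fan-out: 3 duplicate rows per id in the first
# result set, joined to 9 rows per id in the big table, yields 3 * 9 = 27
# joined rows for that id -- so any SUM over them is inflated threefold.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE firstQuery (id INTEGER, somevalue TEXT)")
conn.execute("CREATE TABLE bigTable (id INTEGER, value INTEGER)")

# 3 non-unique rows for id 1 (as if the first query lacked DISTINCT)
conn.executemany("INSERT INTO firstQuery VALUES (?, ?)", [(1, "x")] * 3)
# 9 rows for id 1 inside the date window
conn.executemany("INSERT INTO bigTable VALUES (?, ?)", [(1, 10)] * 9)

rows = conn.execute("""
    SELECT bigTable.id, bigTable.value
    FROM firstQuery
    JOIN bigTable ON bigTable.id = firstQuery.id
""").fetchall()

print(len(rows))  # 27 -- not the 9 rows you might expect
```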
Alternative Query
If that's not the problem, an alternative query (which is not ideal, but possible) would be...
SELECT
bigTable.id,
SUM(bigTable.value) AS total
FROM
bigTable
WHERE
bigTable.date BETWEEN a AND b
GROUP BY
bigTable.id
HAVING
MAX(CASE WHEN bigTable.somevalue = x THEN 1 ELSE 0 END) = 1
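To sanity-check the `HAVING` approach, here is a small runnable sketch against SQLite with made-up data: `id` 1 has a `somevalue = 'x'` row inside the window so all of its in-window rows are summed exactly once, while `id` 2 never matches `'x'` and is filtered out:

```python
import sqlite3

# Hypothetical data exercising the GROUP BY ... HAVING MAX(CASE ...) query.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bigTable (id INTEGER, somevalue TEXT, date TEXT, value INTEGER)"
)
conn.executemany("INSERT INTO bigTable VALUES (?, ?, ?, ?)", [
    (1, "x", "2020-01-05", 10),
    (1, "y", "2020-01-06", 20),  # still counted once, though somevalue <> 'x'
    (2, "y", "2020-01-07", 30),  # id 2 never has somevalue = 'x' -> excluded
])

rows = conn.execute("""
    SELECT id, SUM(value) AS total
    FROM bigTable
    WHERE date BETWEEN '2020-01-01' AND '2020-01-31'
    GROUP BY id
    HAVING MAX(CASE WHEN somevalue = 'x' THEN 1 ELSE 0 END) = 1
""").fetchall()

print(rows)  # [(1, 30)] -- one row per id, no join fan-out
```

Because there is no join, each row of `bigTable` contributes to the sum at most once, which is exactly what the single-pass `HAVING` filter buys you.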