Writing Efficient Queries in SAS Using PROC SQL with Teradata

[愿得一人] 2021-02-06 11:58

EDIT: Here is a more complete set of code that shows exactly what's going on per the answer below.

libname output '/data/files/jeff'
%let DateStart = '01Jan


        
5 Answers
  •  一个人的身影
    2021-02-06 12:32

    You seem to assume that the 90k records returned by your first query all have unique ids. Is that definitely the case?

    I ask because your second query implies that they are not unique.
    - One id can have multiple values over time, and different values of somevalue

    If the ids are not unique in the first dataset, you need to GROUP BY id or use DISTINCT in the first query.
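
    A minimal sketch of a deduplicated first query. The original first query is not shown here, so this assumes it selects ids from bigTable where somevalue = x within the date window; the table and column names, and the placeholders a, b and x, are borrowed from the alternative query further down and may not match your real schema.

    ```sql
    -- Hypothetical first query: returns one row per id,
    -- no matter how many rows of that id match the filter.
    SELECT DISTINCT
      bigTable.id
    FROM
      bigTable
    WHERE
      bigTable.somevalue = x
      AND bigTable.date BETWEEN a AND b
    ```

    Using GROUP BY bigTable.id instead of DISTINCT should give the same set of ids.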

    Imagine that the 90k rows consist of 30k unique ids, averaging 3 rows per id.

    Then imagine that each of those 30k unique ids actually has 9 records in your time window, including rows where somevalue <> x.

    You will then get 3 x 9 = 27 records back per id.

    And as those two numbers grow, the number of records returned by your second query grows as their product.
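
    One way to check whether this is happening is to compare the row count of the first result with its number of distinct ids. This is only a sketch: firstResult is a hypothetical name for wherever the output of your first query is stored.

    ```sql
    -- firstResult is a hypothetical name for the output of the first query.
    -- If total_rows is larger than unique_ids, joining on id will multiply rows.
    SELECT
      COUNT(*)           AS total_rows,
      COUNT(DISTINCT id) AS unique_ids
    FROM
      firstResult
    ```

    With the illustrative numbers above, this would report 90,000 total rows against 30,000 unique ids, and the join would return 30,000 x 27 = 810,000 rows.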


    Alternative Query

    If that's not the problem, an alternative query (which is not ideal, but possible) would be...

    SELECT
      bigTable.id,
      SUM(bigTable.value) AS total
    FROM
      bigTable
    WHERE
      bigTable.date BETWEEN a AND b
    GROUP BY
      bigTable.id
    HAVING
      MAX(CASE WHEN bigTable.somevalue = x THEN 1 ELSE 0 END) = 1
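
    To see what the HAVING clause does, it may help to look at the per-id flag on its own. The three rows in the comment are made-up values for illustration only, not data from the question.

    ```sql
    -- Hypothetical rows inside the date window:
    --   id | value | somevalue
    --    1 |    10 | x
    --    1 |     5 | y
    --    2 |     7 | y
    --
    -- Per-id flag used by HAVING: 1 if any row of that id has somevalue = x.
    SELECT
      bigTable.id,
      MAX(CASE WHEN bigTable.somevalue = x THEN 1 ELSE 0 END) AS has_x
    FROM
      bigTable
    WHERE
      bigTable.date BETWEEN a AND b
    GROUP BY
      bigTable.id
    -- id 1 gets has_x = 1, so the query above keeps it with total = 15
    -- (both of its rows are summed, including the one where somevalue = y).
    -- id 2 gets has_x = 0, so the query above drops it.
    ```

    The filter is applied per id after grouping, so every row of a qualifying id in the window contributes to the sum, not only the rows where somevalue = x.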
    
