SAS: Hidden retain statement inside set statement?

前端 未结 1 1676
滥情空心
滥情空心 2021-01-18 14:35

Consider the following example:

/* Create two not too interesting datasets: */
Data ones (keep = A);
Do i = 1 to 3;
A = 1;
output;
End;
run;

Data numbers;
D         


        
相关标签:
1条回答
  • 2021-01-18 14:59

    Yes, any variable that is defined in a SET, MERGE, or UPDATE statement is automatically retained (not set to missing at the top of the data step loop). You can effectively ignore that with

    output;
    call missing(of <list of variables to clear out>);
    run;
    

    at the end of your data step.

    This is how MERGE works for many-to-one merges, by the way, and the reason that many-to-many merges don't usually work the way you want them to.


    The difference between the 'together' and the 'separate' cases is that in the separate case, you have two data sets with different variables. If you are running this in interactive mode, ie SAS Program Editor or Enhanced Editor (not EG or batch mode), you can use the data step debugger to see this a little more clearly. You would see the following:

    At the end of the last row of the ones dataset:

    i A B
    3 1 .
    

    Notice B exists, but is missing. Then it goes back to the top of the data step loop. All three variables are left alone since they're all from the data sets. Then it attempts to read from ones one more time, which generates:

    i A B
    . . .
    

    Then it realizes it cannot read from ones, and starts to read from numbers. At the end of the first row of the numbers dataset:

    i A B
    . . 1
    

    Then it goes to the top, again changes nothing; then it reads in a 2 for B.

    i A B
    . . 2
    

    Then it sets A to 2, per your program:

    i A B
    . 2 2
    

    Then it returns to the start of the data step loop again.

    i A B
    . 2 2
    

    Then it reads in B=3:

    i A B
    . 2 3
    

    Then it continues looping, for B=4, 5.

    Now, compare that to the single dataset. It will be nearly the same (with a small difference at the switch between datasets that does not yield a different result). Now we go to the step where A=2 B=2:

    i A B
    . 2 2
    

    Now when the data step reads in the next row, it has all three variables on it. So it yields:

    i A B
    . . 3
    

    Since it read in A=. from the row, it is setting it to missing. In the one-data-step version, it didn't have a value for A to read in, so it didn't replace the 2 with missing.

    0 讨论(0)
提交回复
热议问题