Consider the following example:
/* Create two not too interesting datasets: */
Data ones (keep = A);
Do i = 1 to 3;
A = 1;
output;
End;
run;
Data numbers;
D
Yes, any variable that is defined in a SET
, MERGE
, or UPDATE
statement is automatically retained (not set to missing at the top of the data step loop). You can effectively ignore that with
output;
call missing(of <list of variables to clear out>);
run;
at the end of your data step.
This is how MERGE
works for many-to-one merges, by the way, and the reason that many-to-many merges don't usually work the way you want them to.
The difference between the 'together' and the 'separate' cases is that in the separate case, you have two data sets with different variables. If you are running this in interactive mode, ie SAS Program Editor or Enhanced Editor (not EG or batch mode), you can use the data step debugger to see this a little more clearly. You would see the following:
At the end of the last row of the ones
dataset:
i A B
3 1 .
Notice B exists, but is missing. Then it goes back to the top of the data step loop. All three variables are left alone since they're all from the data sets. Then it attempts to read from ones
one more time, which generates:
i A B
. . .
Then it realizes it cannot read from ones
, and starts to read from numbers
. At the end of the first row of the numbers
dataset:
i A B
. . 1
Then it goes to the top, again changes nothing; then it reads in a 2 for B.
i A B
. . 2
Then it sets A to 2, per your program:
i A B
. 2 2
Then it returns to the start of the data step loop again.
i A B
. 2 2
Then it reads in B=3:
i A B
. 2 3
Then it continues looping, for B=4, 5.
Now, compare that to the single dataset. It will be nearly the same (with a small difference at the switch between datasets that does not yield a different result). Now we go to the step where A=2 B=2:
i A B
. 2 2
Now when the data step reads in the next row, it has all three variables on it. So it yields:
i A B
. . 3
Since it read in A=. from the row, it is setting it to missing. In the one-data-step version, it didn't have a value for A to read in, so it didn't replace the 2 with missing.