Question
I want to calculate the sum of var3 by var1. Could you show two ways to do the calculation: SQL, and a data step using if first.var1?
data have;
input var1 var2$ var3;
datalines;
1 a 3
1 a 4
1 a 3
2 b 5
2 b 3
3 c 1
;
run;
data want;
input var1 var2 $ var3 sum_by_var1;
datalines;
1 a 3 10
1 a 4 10
1 a 3 10
2 b 5 8
2 b 3 8
3 c 1 1
;
run;
My two attempts:
The SQL code below works on this small data set, but I wonder whether it will also work on large data sets, where it is hard to check the results by eye.
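proc sql;
create table new as
select
*
,sum(var3) as sum_by_var1 /* SAS remerges the group sum onto every row of the group */
from have
group by var1
order by var1
;
quit; /* PROC SQL is ended with QUIT; RUN has no effect here */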
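One way to check the SQL result on a large table without eyeballing it is to recompute the group sums independently and compare them. The sketch below is an illustration only, not part of the original question: it recreates HAVE, reruns the query above, computes the sums again with PROC MEANS, and lets PROC COMPARE report any group whose sums disagree. The data set names CHECK_SUMS and SQL_SUMS are hypothetical.

```sas
* Recreate HAVE so this sketch runs on its own;
data have;
  input var1 var2 $ var3 @@;
  datalines;
1 a 3  1 a 4  1 a 3  2 b 5  2 b 3  3 c 1
;
run;

* The SQL approach from the question (group sums remerged onto each row);
proc sql;
  create table new as
  select *, sum(var3) as sum_by_var1
  from have
  group by var1
  order by var1;
quit;

* Independent check: one row per group with the sum of VAR3 from PROC MEANS;
proc means data=have noprint nway;
  class var1;
  var var3;
  output out=check_sums(keep=var1 sum_by_var1) sum=sum_by_var1;
run;

* Reduce the SQL result to one row per group;
proc sql;
  create table sql_sums as
  select distinct var1, sum_by_var1
  from new
  order by var1;
quit;

* Compare the two sets of group sums, matching rows on VAR1;
proc compare base=check_sums compare=sql_sums;
  id var1;
run;
```

Comparing at the group level (one row per var1) keeps the ID variable unique, which PROC COMPARE expects; if every group matches, PROC COMPARE reports no unequal values.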
The data step below doesn't work:
data new2;
set have;
by var1;
if first.var1 then
by_var1 + var3;
run;
Answer 1:
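To fix your calculation in the data step you need:
- RETAIN, to carry the running sum by var1 from one row to the next (strictly, the sum statement sum_by_var1 + var3 already retains its variable, so the explicit RETAIN is optional);
- OUTPUT, to write a row only once the sum by var1 is complete, that is, when the last observation for var1 (last.var1) is reached;
- if you want the sum on every row (as in your WANT data set), a join back to your HAVE table.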
Fix:
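data new2;
set have;
by var1;
retain sum_by_var1;                 /* optional: the sum statement below already retains */
if first.var1 then sum_by_var1 = 0; /* reset at the start of each var1 group */
sum_by_var1 + var3;                 /* accumulate var3 within the group */
if last.var1 then output;           /* write one row per group, after its last row */
run;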
Output:
var1=1 var2=a var3=3 sum_by_var1=10
var1=2 var2=b var3=3 sum_by_var1=8
var1=3 var2=c var3=1 sum_by_var1=1
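The answer mentions joining the group sums back to HAVE to get one row per original observation. A minimal sketch of that step, assuming HAVE is sorted by var1 (it is in the question); SUMS is a hypothetical name for the one-row-per-group result of the fix above.

```sas
* Recreate HAVE (already sorted by VAR1) so this sketch runs on its own;
data have;
  input var1 var2 $ var3 @@;
  datalines;
1 a 3  1 a 4  1 a 3  2 b 5  2 b 3  3 c 1
;
run;

* One row per VAR1 group holding the group total, as in the fix above;
data sums;
  set have;
  by var1;
  if first.var1 then sum_by_var1 = 0;
  sum_by_var1 + var3;
  if last.var1 then output;
  keep var1 sum_by_var1;
run;

* Attach the group total to every row of HAVE;
data want;
  merge have sums;
  by var1;
run;
```

Because SUMS has exactly one row per var1, this is a one-to-many merge, and WANT carries sum_by_var1 values of 10, 10, 10, 8, 8, 1.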
Answer 2:
Here are fully worked examples that illustrate how to do this, first without and then with a grouping variable. One method uses PROC MEANS plus a data step and the other uses PROC SQL. In these examples I compute an average, but you can replace MEAN with SUM to get your desired result.
******************************************************;
*Add average value to a dataset;
*Solution 1 - PROC MEANS + Data step;
******************************************************;
proc means data=sashelp.class noprint;
output out=avg_values mean(height)=avg_height;
run;
data class_data;
set sashelp.class;
if _n_=1 then
set avg_values;
run;
proc print data=class_data;
run;
*Solution 2 - PROC SQL - note the remerge NOTE in the log;
PROC SQL;
Create table class_sql as
select *, mean(height) as avg_height
from sashelp.class;
quit;
******************************************************;
*Add average value to a dataset - with grouping variables;
*Solution 1 - PROC MEANS + Data step;
******************************************************;
proc means data=sashelp.class noprint nway;
class sex;
output out=avg_values mean(height)=avg_height;
run;
*sort data before merge;
proc sort data=sashelp.class out=class;
by sex;
run;
data class_data;
merge class avg_values;
by sex;
run;
proc print data=class_data;
run;
*Solution 2 - PROC SQL - note the remerge NOTE in the log;
PROC SQL;
Create table class_sql as
select *, mean(height) as avg_height
from sashelp.class
group by sex;
quit;
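Applied to the question's data, the PROC MEANS approach with SUM in place of MEAN might look like the sketch below. It assumes HAVE as defined in the question, already sorted by var1, so the PROC SORT step from Solution 1 is not needed; SUMS is a hypothetical data set name.

```sas
* Recreate HAVE (already sorted by VAR1) so this sketch runs on its own;
data have;
  input var1 var2 $ var3 @@;
  datalines;
1 a 3  1 a 4  1 a 3  2 b 5  2 b 3  3 c 1
;
run;

* Group sums of VAR3 by VAR1: SUM replaces MEAN from the examples above;
proc means data=have noprint nway;
  class var1;
  output out=sums(drop=_type_ _freq_) sum(var3)=sum_by_var1;
run;

* Merge the group sums back onto every row of HAVE;
data want;
  merge have sums;
  by var1;
run;
```

If HAVE were not already in var1 order, it would need a PROC SORT by var1 before the merge, as in Solution 1.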
Answer 3:
When the sum statement (+) is used to accumulate, you need to reset the accumulator to missing at the start of each group. Additionally, since you want the group sum associated with each row in the group, you need to compute the full sum before associating it with the rows. The double DOW loop is a common solution: the DOW approach places the SET and BY statements inside a DO loop. Here, "double" means there is one loop that computes a statistic over a group and a second loop that outputs the group's rows.
data want;
* loop over all rows in group to compute the sum;
var3_sum_over_var1 = .;
do _n_ = 1 by 1 until (last.var1);
set have;
by var1;
var3_sum_over_var1 + var3;
end;
* associate var3_sum_over_var1 with each row as it is output;
do _n_ = 1 to _n_;
set have;
OUTPUT;
end;
run;
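Note: the start and stop bounds of a DO loop are evaluated once, when the loop begins, and cannot be changed while the loop is iterating. That is why do _n_ = 1 to _n_; works as desired: the stop value is the group size counted by the first loop, even though the loop immediately resets _n_ to 1.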
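A tiny standalone sketch of that rule, using a hypothetical variable N in place of _N_ and a hypothetical data set LOOP_DEMO: the stop value 3 is captured before the loop sets N to 1, so the loop still runs three times.

```sas
data loop_demo;
  * Pretend the first loop counted 3 rows in the group;
  n = 3;
  * The stop value (3) is evaluated before N is reset to the start value 1;
  do n = 1 to n;
    output;
  end;
run;
```

LOOP_DEMO ends up with three rows (n = 1, 2, 3), which mirrors how the second loop in the answer outputs exactly as many rows as the first loop read.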
Source: https://stackoverflow.com/questions/50032292/sas-sum-by-group