问题
I have over 200k small datasets with the same variables (n<1000 and usually n<100) that I want to concatenate into a master dataset. I have tried using a macro that uses a data step to just iterate through all of the new datasets and concatenate with the master with "set master new:", but this is taking a really long time. Also, if I try to run at the same time, the call execute data step says that I am out of memory on a huge server box. For reference, all of the small datasets together are just over 5 Gigs. Any suggestions would be greatly appreciated. Here is what I have so far:
%macro catDat(name, nbr) ;
/*call in new dataset */
data new ;
set libin.&name ;
run ;
/* reorder names */
proc sql noprint;
create table new as
select var1, var2, var3
from new;
quit;
%if &nbr = 1 %then %do ;
data master;
set new;
run;
%end;
%if &nbr > 1 %then %do ;
data master;
set master new ;
run;
%end ;
%mend;
/* concatenate datasets */
data runthis ;
set datasetNames ;
call execute('%catdat('||datasetname||','||_n_||')');
run;
Resolved: see Bob's comments below.
回答1:
Try using PROC APPEND
instead of your "new" dataset; that will be much, much faster:
%macro DOIT;
proc sql noprint;
select count(*) into : num_recs
from datasetNames;
quit;
%do i=1 %to &num_recs;
data _null_;
i = &i;
set datasetNames point=i;
call symput('ds_name',datasetname);
stop;
run; /* UPDATE: added this line */
%if &i = 1 %then %do;
/* Initialize MASTER with variables in the order you wish */
data master(keep=var1 var2 var3);
retain var1 var2 var3;
set libin.&ds_name;
stop;
run;
%end;
proc append base=master data=libin.&ds_name(keep=var1 var2 var3);
run;
%end;
%mend DOIT;
PROC APPEND
will add each dataset into your new "master" without rebuilding it each time as you are doing now.
This also avoids using CALL EXECUTE
, removing that memory issue you were running into (caused by generating so much code into the execution stack).
来源:https://stackoverflow.com/questions/13442882/efficiently-concatenate-many-sas-datasets