I have ~2300 CSV files, and the variable name in column 1 is different in each file. I want to merge all files by panelistID (column 2) and run frequencies on column 1 to get f
filename mycsv "*.csv";
data mydataset(drop=tmp);
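    infile mycsv dsd dlm=',' eov=eov;
    retain mat_pen_id;
    input @; * load the next record first, because eov is only set once a new file has been reached;
    if _n_ = 1 or eov then do; * _n_ = 1 is true for the first line of the first file only;
        * with wildcard-concatenated input files, eov is true for the first line of each later file;
        input mat_pen_id :$20. tmp :$20.; * header row, so keep the column 1 name as the ID;
        eov = 0; * reset the flag by hand, SAS does not clear it;
    end;
    else do;
        input mat_pen panelistID;
        output; * write data rows only, never the header row;
    end;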
run;
proc sort data=mydataset;
    by panelistID;
run;
proc transpose data=mydataset out=wide_data;
    by panelistID;
    id mat_pen_id;
    var mat_pen;
run;
proc print data=wide_data;
run;
This will give you a dataset called wide_data like:
obs  panelistID  mat1_pen1  mat2_pen2  mat3_pen3  etc
1    10075001    0          22         33
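For the frequencies the question asks about, one option is to run PROC FREQ on the long dataset before it is transposed. This is only a minimal sketch, assuming mydataset holds the columns mat_pen_id and mat_pen as in the data step above:

```sas
/* One output row per combination of source variable and observed value. */
proc freq data=mydataset;
    tables mat_pen_id * mat_pen / list;
run;
```

The list option prints the cross-tabulation as a flat list, which tends to be easier to scan than a grid when there are thousands of source variables.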
Simply use a wildcard on the infile statement to read in all the files, and the filename= option to store the current file's name in a temporary variable _f, copying it into f (the filename= variable itself is not written to the output dataset). Then manipulate f and var accordingly.
data big;
    length _f f $256;
    infile "*.csv" truncover filename=_f dlm=',';
    input var panelistID;
    f = _f; /* copy after INPUT so f names the file this record came from */
run;
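As one possible way to "manipulate f", the sketch below derives a short name from the file path and uses it as the transpose ID. The names big_named, wide_named, source and value are made up for illustration. It assumes that each file name, without its extension, is usable as a variable name, that each panelist appears at most once per file, and that header rows can be recognised by a missing panelistID:

```sas
data big_named;
    length _f $256 source $32;
    infile "*.csv" dsd dlm=',' truncover filename=_f;
    input value panelistID;  /* header rows fail numeric conversion and come back missing */
    if missing(panelistID) then delete;  /* drop those header rows */
    /* last path component, then the part before the first dot */
    source = scan(scan(_f, -1, '/\'), 1, '.');
run;

proc sort data=big_named;
    by panelistID;
run;

proc transpose data=big_named out=wide_named(drop=_name_);
    by panelistID;
    id source;
    var value;
run;
```

Reading the header rows as numbers writes invalid-data notes to the log. That is harmless here, but with ~2300 files it is worth knowing where the notes come from.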