Split large SAS dataset into smaller datasets

长情又很酷 2021-01-13 01:10

I need some assistance with splitting a large SAS dataset into smaller datasets.

Each month I'll have a dataset containing a few million records. This number will

6 answers

攒了一身酷 (OP)

2021-01-13 01:45
A more efficient option, if you have room in memory to store one of the smaller datasets, is a hash solution. Here's an example using basically what you're describing in the question:
```
data in_data;
  do recid = 1 to 1.000001e7;
    datavar = 1;
    output;
  end;
run;


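data _null_;
  if 0 then set in_data;                 /* compile-time only: adds recid and datavar to the PDV */
  declare hash h_out();
  h_out.defineKey('_n_');                /* _n_ is reused below as a per-chunk row counter */
  h_out.defineData('recid','datavar');
  h_out.defineDone();

  do filenum = 1 by 1 until (eof);       /* one iteration per output dataset */
    do _n_ = 1 to 250000 until (eof);    /* read up to 250k rows into the hash */
      set in_data end=eof;
      h_out.add();
    end;
    h_out.output(dataset:cats('file_',filenum));  /* writes file_1, file_2, ... */
    h_out.clear();                       /* empty the hash for the next chunk */
  end;
  stop;
run;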
```
We define a hash object keyed on a row counter, fill it with 250k records at a time, write it out with the output method, and clear it before the next chunk. The test dataset has 10,000,010 records, so this produces 40 full datasets plus file_41 holding the last 10. We could use a hash-of-hashes here as well, particularly if something other than "every 250k records" drove the split, but then you'd have to fit all of the records in memory, not just 250k of them.
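For comparison, here is a rough sketch of the hash-of-hashes variant mentioned above. The split criterion grp = mod(recid, 4) + 1 and the grp_ output names are made up for illustration, and this has not been run against real data, so treat it as a starting point. As noted, it holds every record in memory until the end.
```
data _null_;
  length grp 8;
  declare hash h_grp;                  /* reference only; instantiated per group below */
  declare hash hoh(ordered:'a');       /* outer hash: one entry per group */
  hoh.defineKey('grp');
  hoh.defineData('grp','h_grp');
  hoh.defineDone();
  declare hiter hi('hoh');

  do until (eof);
    set in_data end=eof;
    grp = mod(recid, 4) + 1;           /* hypothetical split criterion */
    if hoh.find() ne 0 then do;        /* first record seen for this group */
      h_grp = _new_ hash();
      h_grp.defineKey('recid');        /* assumes recid is unique */
      h_grp.defineData('recid','datavar');
      h_grp.defineDone();
      hoh.add();
    end;
    h_grp.add();                       /* store the record in its group's hash */
  end;

  do while (hi.next() = 0);            /* walk the groups, one dataset each */
    h_grp.output(dataset:cats('grp_',grp));
  end;
  stop;
run;
```
With this criterion you would expect four datasets, grp_1 through grp_4. The per-group hash is created the first time a group value appears, so the set of output datasets does not need to be known in advance.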

Note also that we could do this without specifying the variables explicitly, but it requires having a useful ID on the dataset:
```
data _null_;
  if 0 then set in_data;
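  declare hash h_out(dataset:'in_data(obs=0)');  /* obs=0: take the variable definitions, load no rows */
  h_out.defineKey('recid');
  h_out.defineData(all:'y');                     /* every variable in in_data becomes hash data */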
  h_out.defineDone();

  do filenum = 1 by 1 until (eof);
    do _n_ = 1 to 250000 until (eof);
      set in_data end=eof;
      h_out.add();
    end;
    h_out.output(dataset:cats('file_',filenum));
    h_out.clear();
  end;
  stop;
run;
```
Because we use the dataset option on the constructor (necessary for the all:'y' functionality), _n_ can no longer serve as the hash key, so we need a record ID that exists on the dataset itself. Hopefully there is such a variable; if not, one could be added with a view.
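If the dataset has no ID variable, a view along these lines could supply one without making a second copy of the data. The names in_data_noid and in_data_v are hypothetical.
```
/* Hypothetical input: in_data_noid has no unique ID variable */
data in_data_v / view=in_data_v;
  set in_data_noid;
  recid = _n_;        /* row number serves as the unique key */
run;
```
The splitting step should then be able to read from in_data_v instead of in_data, both in the set statement and in the hash constructor's dataset option.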