Duration of state in SAS

Submitted by China☆狼群 on 2019-12-12 01:28:19

Question


I have a question concerning SAS and the analysis of the duration of a certain state of a variable. I want to find how long each individual in my dataset stays in state a continuously until state b occurs. If state c occurs after state a, the duration should be set to zero. Note that I would also set the duration to zero if pre_period is in state a, but if I get another state a afterwards, that should be counted.

The data looks kind of like this:

    pre_period    week1 week2 week3 week4 week5 week6 week7 ...
id1 b             b     a     a     a     b     c     c     ...
id2 a             a     a     a     b     a     b     b     ...
id3 b             b     a     a     b     a     a     b     ...
id4 c             c     c     a     a     a     a     a     ...
id5 a             b     a     b     b     a     a     b     ...
id6 b             a     a     a     a     a     a     a     ...

The sample set as SAS code:

data work.sample_data;
input id $ pre_period $  (week1-week7) ($);
datalines;
id1 b b a a a b c c
id2 a a a a b a b b
id3 b b a a b a a b
id4 c c c a a a a a
id5 a b a b b a a b
id6 b a a a a a a a
;

So for id1 that should give me a duration of 3, for id2 1, for id3 2 and 2, for id4 5, for id5 1 and 2, and for id6 7.

The output should look something like this:

    dur1 dur2 dur3 dur4 ...
id1 3    .    .    .    ...
id2 1    .    .    .    ...
id3 2    2    .    .    ...
id4 5    .    .    .    ...
id5 1    2    .    .    ...
id6 7    .    .    .    ...

I am a beginner in SAS and have not found a way to solve this problem. Note that the dataset contains several thousand rows and roughly a thousand columns, so one individual might have several intervals of state a, all of which I want to capture (hence several duration variables in the output).

I am grateful for any advice. Thanks!


Answer 1:


In cases like this, it can be wise to think in terms of a finite state machine. That way, it is quite easy to extend the state machine later on if your requirements change.
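
For readers who have not met the pattern before, here is a toy sketch of the idea in a data step. It is not the program developed in this answer: the state names idle and counting, the variable symbol, and the hard-coded input sequence are all made up for illustration. It only tracks whether we are currently inside a run of a.

```sas
/* Toy two-state machine (illustrative names, not from the question). */
data _null_;
    length state $ 8;
    state = 'idle';
    /* Feed a short, hard-coded sequence of weekly states through the machine. */
    do symbol = 'b', 'a', 'a', 'b';
        /* Each transition depends only on the current state and the input. */
        if state = 'idle' and symbol = 'a' then state = 'counting';
        else if state = 'counting' and symbol ne 'a' then state = 'idle';
        put symbol= state=;
    end;
run;
```

The log shows the state after each input. Adding a new requirement then means adding a state or a transition rather than reworking nested conditions.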

A duration is valid when the following conditions hold (including the implicit one that follows from your expected results):

  • A continuous run of state a is counted if
    • it ends with state b, or
    • it is still in state a when the data set ends,
    • and it does not start in the first week while the pre-period state is also a.

First of all, we have to take care of the pre-period requirement. We can call this state pre_period_locked_state:

    do week = 1 to last_week;
        if current_state = pre_period_locked_state then do;
            if 'a' not = pre_period or 'a' not = week_state then do;
                current_state = duration_state;
            end;
        end;

The full program further down spells this transition out in two branches, but the resulting durations are the same.

The next thing to dissect is the case where the state is not a, here called no_duration_state:

        if current_state = no_duration_state then do;
            if 'a' = week_state then do;
                 current_state = duration_state;
            end;
        end;

This is our idle state; it only changes when a new duration starts. The next state is named duration_state and is defined as:

        if current_state = duration_state then do;
            if 'a' = week_state then do;
                duration_count = duration_count + 1;
            end;
            if ('a' not = week_state or week = last_week) and 0 < duration_count then do;
                current_state = dispatch_state;
            end;
        end;

The first part, the duration counter, is probably self-explanatory. The second part detects when a duration ends, either because the state is no longer a or because the last week has been reached.

Now on to the dispatch_state:

        if current_state = dispatch_state then do;
            if 'b' = week_state or 'a' = week_state and week = last_week then do;
                duration{duration_index} = duration_count;
                duration_index = duration_index + 1;
            end;
            duration_count = 0; 
            current_state = no_duration_state;
        end;

This takes care of the indexing of the output columns and also makes sure that only valid durations are stored.
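
The dispatch condition above mixes or and and without parentheses. In SAS, and is evaluated before or, so it groups the way the rules require. The small check below is only a sketch to make that grouping visible. The variable names as_written, explicit and regrouped are made up, and the input values are picked by hand.

```sas
data _null_;
    /* A run that is ended by b in week 5 of 7. */
    week_state = 'b';
    week = 5;
    last_week = 7;
    /* As written in the dispatch state: AND is evaluated before OR. */
    as_written = ('b' = week_state or 'a' = week_state and week = last_week);
    /* The same grouping, spelled out with parentheses. */
    explicit = (('b' = week_state) or ('a' = week_state and week = last_week));
    /* A different, unintended grouping, shown for contrast. */
    regrouped = (('b' = week_state or 'a' = week_state) and week = last_week);
    put as_written= explicit= regrouped=;
run;
```

The log should show as_written=1 explicit=1 regrouped=0: a run ended by b is stored in any week, while a run still in state a is stored only in the last week.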

I added id7 below, since the sample data did not contain any duration that ended with a state other than b.

data work.sample_data;
input id $ pre_period $  (week1-week7) ($);
datalines;
id1 b b a a a b c c
id2 a a a a b a b b
id3 b b a a b a a b
id4 c c c a a a a a
id5 a b a b b a a b
id6 b a a a a a a a
id7 b a a c a a a a
;

The full SAS state machine:

data work.duration_fsm;
    set work.sample_data;
    array weeks{*} week1-week7;
    array duration{*} dur1-dur7;

    *states;
    initial_reset_state = 'initial_reset_state';
    pre_period_locked_state = 'pre_period_locked_state';
    duration_state = 'duration_state';
    no_duration_state = 'no_duration_state';
    dispatch_state = 'dispatch_state';
    length current_state $ 50;

    *initial values;
    current_state = initial_reset_state;
    last_week = dim(weeks);

    keep id dur1-dur7;

    do week = 1 to last_week;
        *reset the counters once per row;
        if current_state = initial_reset_state then do;
            duration_count = 0;
            duration_index = 1;
            current_state = pre_period_locked_state;
        end;
        week_state = weeks{week};
        *skip a run of state a that carries over from the pre period;
        if current_state = pre_period_locked_state then do;
            if 'a' not = pre_period and 'a' = week_state then do;
                current_state = duration_state;
            end;
            else if 'a' = pre_period and 'a' not = week_state then do;
                current_state = no_duration_state;
            end;
        end;
        *idle until a new run of state a starts;
        if current_state = no_duration_state then do;
            if 'a' = week_state then do;
                current_state = duration_state;
            end;
        end;
        *count the weeks in state a and detect the end of the run;
        if current_state = duration_state then do;
            if 'a' = week_state then do;
                duration_count = duration_count + 1;
            end;
            if ('a' not = week_state or week = last_week) and 0 < duration_count then do;
                current_state = dispatch_state;
            end;
        end;
        *store the run only if it ended with b or reached the last week;
        if current_state = dispatch_state then do;
            if 'b' = week_state or 'a' = week_state and week = last_week then do;
                duration{duration_index} = duration_count;
                duration_index = duration_index + 1;
            end;
            duration_count = 0;
            current_state = no_duration_state;
        end;
    end;
run;

This will output work.duration_fsm:

+-----+------+------+------+------+------+------+------+
| id  | dur1 | dur2 | dur3 | dur4 | dur5 | dur6 | dur7 |
+-----+------+------+------+------+------+------+------+
| id1 |    3 |      |      |      |      |      |      |
| id2 |    1 |      |      |      |      |      |      |
| id3 |    2 |    2 |      |      |      |      |      |
| id4 |    5 |      |      |      |      |      |      |
| id5 |    1 |    2 |      |      |      |      |      |
| id6 |    7 |      |      |      |      |      |      |
| id7 |    4 |      |      |      |      |      |      |
+-----+------+------+------+------+------+------+------+
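
As a cross-check on the table above, the same three rules can also be applied with a plain counting loop. This is only a sketch and not the state machine from this answer. The macro variables n_weeks and n_dur, the variables run_length, n_stored and locked, and the output data set work.duration_loop are names made up here. It assumes the week columns are named week1 to weekN without gaps and that work.sample_data (with id7) has already been created. It also shows one possible way to size the arrays for wider data: n weeks can hold at most ceil(n/2) separate runs of a.

```sas
/* Assumption: set n_weeks to the real number of week columns. */
%let n_weeks = 7;
/* Integer division in %eval gives ceil(n_weeks / 2). */
%let n_dur = %eval((&n_weeks + 1) / 2);

data work.duration_loop;
    set work.sample_data;
    array weeks{*} week1-week&n_weeks;
    array duration{*} dur1-dur&n_dur;
    keep id dur1-dur&n_dur;

    run_length = 0;               /* length of the current run of a        */
    n_stored = 0;                 /* number of durations stored so far     */
    locked = (pre_period = 'a');  /* 1 while a run from the pre-period may
                                     still be going on                     */

    do week = 1 to dim(weeks);
        if weeks{week} = 'a' then do;
            /* Count the week unless the run carried over from pre_period. */
            if not locked then run_length = run_length + 1;
        end;
        else do;
            /* A run (if any) has ended: keep it only when ended by b. */
            if run_length > 0 and weeks{week} = 'b' then do;
                n_stored = n_stored + 1;
                duration{n_stored} = run_length;
            end;
            run_length = 0;
            locked = 0;
        end;
    end;

    /* A run that is still in state a when the data ends is also kept. */
    if run_length > 0 then do;
        n_stored = n_stored + 1;
        duration{n_stored} = run_length;
    end;
run;

proc print data=work.duration_loop noobs;
run;
```

With the sample data, this should give the same non-missing durations as the table above, in dur1 to dur4 instead of dur1 to dur7. The state machine remains the easier version to extend if more states or rules are added later.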


Source: https://stackoverflow.com/questions/29165340/duration-of-state-in-sas
