How to remove duplicate records/observations WITHOUT sorting in SAS?

孤独总比滥情好 2021-02-08 14:21

I wonder if there is a way to remove duplicate records WITHOUT sorting. Sometimes I want to keep the original order and only remove the duplicated records.


8 Answers
  • 2021-02-08 15:02

    The two examples given in the original post are not identical.

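    • distinct in proc sql removes only rows that are fully identical
    • nodupkey in proc sort removes any row whose key (BY) variables match those of a row already kept, even if the other variables differ. To remove only fully identical rows, use the noduprecs option; it compares each record with the previous one written to the output, so sort by all variables (by _all_) to bring identical rows together.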
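
    The examples from the question are not shown here, so the following is only a minimal sketch of the difference, assuming a hypothetical dataset named have with a key variable id and a non-key variable value. Note that both procedures reorder the rows.

    ```sas
    /* Hypothetical input: id is the key, value is a non-key variable. */
    data have;
        input id value;
        datalines;
    2 10
    1 20
    2 10
    1 30
    ;
    run;

    /* DISTINCT drops only rows that match on every selected column:
       the second "2 10" row goes, "1 20" and "1 30" both stay. */
    proc sql;
        create table distinct_rows as
        select distinct * from have;
    quit;

    /* NODUPKEY keeps one row per value of the BY variable(s),
       so only one row per id survives. */
    proc sort data=have out=one_per_key nodupkey;
        by id;
    run;

    /* Using every variable as a key makes NODUPKEY drop only
       fully identical rows, like DISTINCT above. */
    proc sort data=have out=no_identical_rows nodupkey;
        by _all_;
    run;
    ```

    With this input, distinct_rows and no_identical_rows each hold three rows, while one_per_key holds two.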

    If you only need to find records that share key variables, another option is to build a dataset containing just the key variable(s), work out which keys occur more than once, and then apply a format to the original data to flag the duplicate records. If the dataset has more than one key variable, create a new variable that concatenates all the key values (converted to character if needed).
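
    One possible reading of the format idea, as a rough sketch rather than a definitive implementation. The dataset have, the key id, the format name dupkey, and the flag variable is_dup_key are all hypothetical names chosen for illustration. The original data is never sorted, so its row order is preserved.

    ```sas
    /* Hypothetical input with a single numeric key variable. */
    data have;
        input id value;
        datalines;
    2 10
    1 20
    2 15
    3 30
    ;
    run;

    /* Step 1: find the keys that occur more than once. */
    proc sql;
        create table dup_keys as
        select id
        from have
        group by id
        having count(*) > 1;
    quit;

    /* Step 2: build a CNTLIN dataset mapping duplicated keys to 'Y'
       and every other value to 'N'. */
    data fmt_in;
        retain fmtname 'dupkey' type 'N';
        length label $1 hlo $1;
        /* Emit the OTHER row first so the format exists even if no key repeats. */
        if _n_ = 1 then do;
            hlo = 'O';
            label = 'N';
            output;
            hlo = ' ';
        end;
        set dup_keys(rename=(id=start));
        label = 'Y';
        output;
    run;

    proc format cntlin=fmt_in;
    run;

    /* Step 3: flag records in their original order. */
    data flagged;
        set have;
        is_dup_key = put(id, dupkey.);
    run;
    ```

    This flags every record whose key is repeated, including the first occurrence; deciding which of those records to keep is a separate step.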

  • 2021-02-08 15:03

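    This is the fastest way I can think of. It requires no sort step, provided the input is already ordered by the key variables (here person_id and stay); BY-group processing stops with an error if it is not.

    data output_data_name;
        /* sortedby= only asserts the existing order; it does not sort the data */
        set input_data_name (
            sortedby = person_id stay
            keep =
                person_id
                stay
                /* ... more variables ... */);
        by person_id stay;
        /* keep the first record of each person_id/stay group */
        if first.stay then output;
    run;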
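
    For input that is not already ordered by its keys, a different technique (not the one in this answer) is a DATA step hash object that remembers which keys have been seen. The sketch below assumes a hypothetical dataset have with the same key names, person_id and stay; the first occurrence of each key is kept and the original row order is preserved.

    ```sas
    /* Hypothetical unsorted input. */
    data have;
        input person_id stay;
        datalines;
    3 1
    1 2
    3 1
    1 1
    1 2
    ;
    run;

    data want;
        /* Create the lookup table once, keyed on the key variables. */
        if _n_ = 1 then do;
            declare hash seen();
            seen.defineKey('person_id', 'stay');
            seen.defineDone();
        end;
        set have;
        /* check() returns 0 when the current key is already stored. */
        if seen.check() ne 0 then do;
            seen.add();
            output;
        end;
    run;
    ```

    The hash object lives in memory, so this trades the cost of a sort for memory proportional to the number of distinct keys.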
    