Question
I have the following data set:
Date ID Company
Jan05 1 Coca-Cola
Jan05 2 Coca-Cola
Jan05 3 Coca-Cola
Jan05 4 Apple
Jan05 5 Apple
Jan05 6 Apple
Jan05 7 Microsoft
Feb05 1 McDonald
Feb05 2 McDonald
Feb05 3 McDonald
Feb05 4 McDonald
Feb05 5 McDonald
Feb05 6 Microsoft
.
.
.
Jan06 1 Apple
Jan06 2 Apple
Jan06 3 Apple
Jan06 4 Apple
Jan06 5 Apple
Jan06 6 Apple
Jan06 7 Apple
Feb06 1 McDonald
Feb06 2 McDonald
Feb06 3 McDonald
Feb06 4 McDonald
Feb06 5 McDonald
Feb06 6 Lenova
Feb06 7 Lenova
.
.
Jan07 1 Apple
Jan07 2 Apple
Jan07 3 Apple
Jan07 4 Microsoft
Jan07 5 Lenovo
Jan07 6 Apple
Jan07 7 Apple
Feb07 1 TJmax
Feb07 2 TJMax
Feb07 3 TJMax
Feb07 4 TJMax
Feb07 5 TJMax
Feb07 6 TJMax
Feb07 7 TJMax
.
.
.
.
... and so on until Jul15
What I want to do is the following:
1. Compare Jan05 with Jan06, then Jan06 with Jan07, ..., Feb05 with Feb06, then Feb06 with Feb07, and so on for each month, and compute the median of ID for each date using only the companies that are present on both dates.
2. I don't want to create a new dataset each time I compute a median of ID. I merely want to make sure that a company is present on both dates (say Jan05 and Jan06), then compute the median of ID.
What's the best way to do this in SAS?
My end result should look like this:
Date Median_ID
Jan05 2
Jan06 4
Jan06 4
Jan07 3
Feb05 3
Feb06 3
Feb06 0
Feb07 0
As you can see from the result, in Jan05 and Jan06 the only company that matches is Apple. In Jan06 and Jan07, the only matching company is again Apple. So we take the median of ID over the observations where the companies match.
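A minimal sketch of the logic described above, assuming the data set is called `have` with a numeric SAS date `Date` (read with the `monyy5.` informat, so every date is the first day of its month), a numeric `ID` and a character `Company`. It uses only the January rows from the sample data. The view name `matched` and the variable `Pair_Start` are hypothetical names introduced for illustration. A single PROC SQL view keeps the rows whose company also appears in the same month of the adjacent year, so no separate data set is built for each comparison, and PROC MEANS then computes the median ID for each date within each pair:

```sas
/* January rows copied from the question's sample data */
data have;
  input Date monyy5. ID Company $32.;
  format Date monyy5.;
  datalines;
Jan05 1 Coca-Cola
Jan05 2 Coca-Cola
Jan05 3 Coca-Cola
Jan05 4 Apple
Jan05 5 Apple
Jan05 6 Apple
Jan05 7 Microsoft
Jan06 1 Apple
Jan06 2 Apple
Jan06 3 Apple
Jan06 4 Apple
Jan06 5 Apple
Jan06 6 Apple
Jan06 7 Apple
Jan07 1 Apple
Jan07 2 Apple
Jan07 3 Apple
Jan07 4 Microsoft
Jan07 5 Lenovo
Jan07 6 Apple
Jan07 7 Apple
;
run;

/* Hypothetical view MATCHED: a view, not a stored data set.        */
/* Pair_Start identifies each pair (earlier date, same month + 12). */
proc sql;
  create view matched as
  /* Earlier date of a pair: company also present 12 months later */
  select a.Date as Pair_Start format=monyy5., a.Date, a.ID
  from have as a
  where exists (select * from have as b
                where b.Company = a.Company
                  and b.Date = intnx('month', a.Date, 12))
  union all
  /* Later date of a pair: company also present 12 months earlier */
  select intnx('month', a.Date, -12) as Pair_Start format=monyy5., a.Date, a.ID
  from have as a
  where exists (select * from have as b
                where b.Company = a.Company
                  and b.Date = intnx('month', a.Date, -12));
quit;

/* Median ID for each date within each pair of dates */
proc means data=matched noprint nway;
  class Pair_Start Date;
  var ID;
  output out=want(drop=_type_ _freq_) median=Median_ID;
run;
```

Each pair of dates gets two rows, one per date, so Jan06 appears twice, as in the desired output. On these January rows the sketch gives 5 for Jan05 (the median of Apple's IDs 4, 5 and 6), 4 for Jan06 in both pairs, and 3 for Jan07. Pairs with no company in common produce no rows rather than 0. Company names are compared case-sensitively, so values such as TJmax and TJMax would count as different companies unless they are normalized first (for example with UPCASE).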
Answer 1:
It isn't clear how you've calculated the end results from your sample data; it would be easier to follow your explanation if you included all the intermediate steps for one month, e.g. Jan05. However, this seems like something that you could approach with some SQL similar to the following:
/* Read the sample data; Date is stored as a SAS date (first day of the month) */
data have;
  input Date monyy5. ID Company $32.;
  format Date monyy5.;
  cards;
Jan05 1 Coca-Cola
Jan05 2 Coca-Cola
Jan05 3 Coca-Cola
Jan05 4 Apple
Jan05 5 Apple
Jan05 6 Apple
Jan05 7 Microsoft
Feb05 1 McDonald
Feb05 2 McDonald
Feb05 3 McDonald
Feb05 4 McDonald
Feb05 5 McDonald
Feb05 6 Microsoft
Jan06 1 Apple
Jan06 2 Apple
Jan06 3 Apple
Jan06 4 Apple
Jan06 5 Apple
Jan06 6 Apple
Jan06 7 Apple
Feb06 1 McDonald
Feb06 2 McDonald
Feb06 3 McDonald
Feb06 4 McDonald
Feb06 5 McDonald
Feb06 6 Lenova
Feb06 7 Lenova
Jan07 1 Apple
Jan07 2 Apple
Jan07 3 Apple
Jan07 4 Microsoft
Jan07 5 Lenovo
Jan07 6 Apple
Jan07 7 Apple
Feb07 1 TJmax
Feb07 2 TJMax
Feb07 3 TJMax
Feb07 4 TJMax
Feb07 5 TJMax
Feb07 6 TJMax
Feb07 7 TJMax
;
run;
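proc sql;
  create table want as
  /* Pair each row with the rows for the same month one year later */
  select a.date, median(a.ID) as Median_ID
  from have as a
       inner join have as b
    on  month(a.date) = month(b.date)
    and year(a.date)  = year(b.date) - 1
    /* keep only rows where the same ID belongs to the same company in both years */
    and a.ID      = b.ID
    and a.company = b.company
  group by a.date
  ;
quit;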
Source: https://stackoverflow.com/questions/31661121/compare-and-keep-character-observations-that-are-the-same