Question
Hi, I have two data frames:
df1 = data.frame(PersonId1=c(1,2,3,4,5,6,7,8,9,10,1),PersonId2=c(11,12,13,14,15,16,17,18,19,20,11),
Played_together = c(1,0,0,1,1,0,0,0,1,0,1),
Event=c(1,1,1,1,2,2,2,2,2,2,2),
Utility=c(20,-2,-5,10,30,2,1,.5,50,-1,60))
df2 = data.frame(PersonId1=c(11,15,9,1),PersonId2=c(1,5,19,11),
Played_together = c(1,1,1,1),
Event=c(1,2,2,2))
Where df1 looks like this:
PersonId1 PersonId2 Played_together Event Utility
1 1 11 1 1 20.0
2 2 12 0 1 -2.0
3 3 13 0 1 -5.0
4 4 14 1 1 10.0
5 5 15 1 2 30.0
6 6 16 0 2 2.0
7 7 17 0 2 1.0
8 8 18 0 2 0.5
9 9 19 1 2 50.0
10 10 20 0 2 -1.0
11 1 11 1 2 60.0
and df2 looks like this:
PersonId1 PersonId2 Played_together Event
1 11 1 1 1
2 15 5 1 2
3 9 19 1 2
4 1 11 1 2
Note that df2 is not simply the rows of df1 where Played_together == 1 (for example, the pair PersonId1 = 4, PersonId2 = 14 is not present in df2).
Also note that although df2 is a subset of df1, the order in which the two individuals appear in df2 is arbitrary. For example, row 1 of df1 has PersonId1 = 1 and PersonId2 = 11 for event 1, but row 1 of df2 has PersonId1 = 11 and PersonId2 = 1 for event 1. These two cases are exactly the same, and I want to look up the values of Utility in df1 for the rows of df2. The match has to be made within each event. The final output should look like this:
PersonId1 PersonId2 Played_together Event Utility
1 11 1 1 1 20
2 15 5 1 2 30
3 9 19 1 2 50
4 1 11 1 2 60
I know that a merge function exists in R, but I do not know what to do when the lookup IDs can appear in either order. I would appreciate it if someone could help me out. Thanks in advance.
Answer 1:
Here is what I have for you:
library(dplyr)

# First join matches df2 rows whose IDs are flipped relative to df1;
# second join matches rows whose IDs are in the same order as in df1.
rbind(left_join(df2, df1,
                by = c("PersonId2" = "PersonId1", "PersonId1" = "PersonId2",
                       "Played_together" = "Played_together", "Event" = "Event")),
      left_join(df2, df1,
                by = c("PersonId1" = "PersonId1", "PersonId2" = "PersonId2",
                       "Played_together" = "Played_together", "Event" = "Event"))) %>%
  filter(!is.na(Utility))  # drop the rows that a given join left unmatched
Basically, it seems that your data sometimes has the person IDs flipped. We can bind two joins together (one for each ID order) and then filter out the rows whose Utility is NA.
Your output looks like this:
PersonId1 PersonId2 Played_together Event Utility
1 11 1 1 1 20
2 15 5 1 2 30
3 9 19 1 2 50
4 1 11 1 2 60
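Since the question mentions merge, the same two-join idea can be sketched in base R as well. This is only a sketch under the assumption that a pair never appears in df1 in both orders for the same event; the names flipped, straight and result are made up for illustration. Because merge does an inner join by default, no NA filter is needed here.

```r
df1 <- data.frame(PersonId1 = c(1,2,3,4,5,6,7,8,9,10,1),
                  PersonId2 = c(11,12,13,14,15,16,17,18,19,20,11),
                  Played_together = c(1,0,0,1,1,0,0,0,1,0,1),
                  Event = c(1,1,1,1,2,2,2,2,2,2,2),
                  Utility = c(20,-2,-5,10,30,2,1,.5,50,-1,60))
df2 <- data.frame(PersonId1 = c(11,15,9,1), PersonId2 = c(1,5,19,11),
                  Played_together = c(1,1,1,1),
                  Event = c(1,2,2,2))

# df2 rows whose IDs are flipped relative to df1:
# df2$PersonId1 is matched against df1$PersonId2 and vice versa.
flipped <- merge(df2, df1,
                 by.x = c("PersonId1", "PersonId2", "Played_together", "Event"),
                 by.y = c("PersonId2", "PersonId1", "Played_together", "Event"))

# df2 rows whose IDs are in the same order as in df1.
straight <- merge(df2, df1,
                  by = c("PersonId1", "PersonId2", "Played_together", "Event"))

# Key columns keep the df2 names in both pieces, so they stack cleanly.
result <- rbind(flipped, straight)
```

Note that merge may reorder rows, so result is not guaranteed to follow the row order of df2.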
Answer 2:
One solution is to create a "Team" column from PersonId1 and PersonId2 in the form min(PersonId):max(PersonId), so that both orderings of a pair produce the same key. Then join df1 and df2 on Team and Event to get the desired data.
library(dplyr)

df2 %>% rowwise() %>%
  mutate(Team = paste0(min(PersonId1, PersonId2), ":", max(PersonId1, PersonId2))) %>%
  inner_join(df1 %>% rowwise() %>%
               mutate(Team =
                        paste0(min(PersonId1, PersonId2), ":", max(PersonId1, PersonId2))),
             by = c("Team", "Event")) %>%
  select(PersonId1 = PersonId1.x, PersonId2 = PersonId2.x,
         Played_together = Played_together.x, Event, Utility) %>%
  as.data.frame()
# PersonId1 PersonId2 Played_together Event Utility
# 1 11 1 1 1 20
# 2 15 5 1 2 30
# 3 9 19 1 2 50
# 4 1 11 1 2 60
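The order-independent key used above can also be built without rowwise(), using the vectorized pmin and pmax from base R and a match lookup. The following is a minimal sketch, not the answerer's code: pair_key is a hypothetical helper name, and it assumes each pair appears at most once per event in df1, since match only returns the first hit.

```r
df1 <- data.frame(PersonId1 = c(1,2,3,4,5,6,7,8,9,10,1),
                  PersonId2 = c(11,12,13,14,15,16,17,18,19,20,11),
                  Played_together = c(1,0,0,1,1,0,0,0,1,0,1),
                  Event = c(1,1,1,1,2,2,2,2,2,2,2),
                  Utility = c(20,-2,-5,10,30,2,1,.5,50,-1,60))
df2 <- data.frame(PersonId1 = c(11,15,9,1), PersonId2 = c(1,5,19,11),
                  Played_together = c(1,1,1,1),
                  Event = c(1,2,2,2))

# Hypothetical helper: builds "smallerId:largerId:Event" for every row,
# so (1, 11) and (11, 1) in the same event map to the same key.
pair_key <- function(d) {
  paste(pmin(d$PersonId1, d$PersonId2),
        pmax(d$PersonId1, d$PersonId2),
        d$Event, sep = ":")
}

# Look up each df2 key in df1; rows with no match get NA.
result <- df2
result$Utility <- df1$Utility[match(pair_key(df2), pair_key(df1))]
```

Because the lookup is indexed by the rows of df2, result keeps the row order of df2, and the IDs stay in the order in which df2 lists them.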
Source: https://stackoverflow.com/questions/51088367/r-match-values-from-2-dataframes-based-on-multiple-condtions-when-the-order-o