Note: I changed the example from when I first posted. My first example was too simplified to capture the real problem.
I have two data frames which are sorted differently in one column. I want to match one column and then merge in the value from the second column. The second column needs to stay in the same order.
So I have this:
state<-c("IA","IA","IA","IL","IL","IL")
value1<-c(1,2,3,4,5,6)
s1<-data.frame(state,value1)
state<-c("IL","IL","IL","IA","IA","IA")
value2<-c(3,4,5,6,7,8)
s2<-data.frame(state,value2)
s1
s2
which returns this:
> s1
state value1
1 IA 1
2 IA 2
3 IA 3
4 IL 4
5 IL 5
6 IL 6
> s2
state value2
1 IL 3
2 IL 4
3 IL 5
4 IA 6
5 IA 7
6 IA 8
and I want this:
state value1 value2
1 IA 1 6
2 IA 2 7
3 IA 3 8
4 IL 4 3
5 IL 5 4
6 IL 6 5
I'm about to drive myself silly trying to solve this. Seems like it should be a simple subscript problem.
There are several ways to do this (it is R, after all) but I think the most clear is creating an index. We need a function that creates a sequential index (starting at one and ending with the number of observations).
seq_len(3)
> [1] 1 2 3
But we need to calculate this index within each grouping variable (state). For this we can use R's ave
function. It takes a numeric as the first argument, then the grouping factors, and finally the function to be applied in each group.
s1$index <- with(s1,ave(value1,state,FUN=seq_len))
s2$index <- with(s2,ave(value2,state,FUN=seq_len))
(Note the use of with
, which tells R to search for the variables within the environment/dataframe. This is better practice than using s1$value1, s2$value2, etc.)
Now we can simply merge (join) the two data frames (by the variables present in the both data frames: state and index).
merge(s1,s2)
which gives
state index value1 value2
1 IA 1 1 6
2 IA 2 2 7
3 IA 3 3 8
4 IL 1 4 3
5 IL 2 5 4
6 IL 3 6 5
For this to work, there should be the same number of observations by state in each of the data frames.
[Edit: commented the code for clarity.] [Edit: Used seq_len instead of creating a new function as suggested by hadley.]
NOTE: Check the 5th comment on the answer above. Solution should be
s1$index <- with(s1,ave(value1,state,FUN=seq_along))
s2$index <- with(s2,ave(value2,state,FUN=seq_along))
Tested and working.
来源:https://stackoverflow.com/questions/1301759/mixed-merge-in-r-subscript-solution