问题
I have a panel dataset: panel
and a dataset with a list of events: Events
. For the panel dataset, an equal panelID
shows that two observations belong together.
panelID = c(1:50)
year= c(2001:2010)
country = c("NLD", "GRC", "GBR")
n <- 2
library(data.table)
set.seed(123)
Panel <- data.table(panelID = rep(sample(panelID), each = n),
country = rep(sample(country, length(panelID), replace = T), each = n),
year = c(replicate(length(panelID), sample(year, n))),
some_NA = sample(0:5, 6),
some_NA_factor = sample(0:5, 6),
norm = round(runif(100)/10,2),
Income = round(rnorm(10,-5,5),2),
Happiness = sample(10,10),
Sex = round(rnorm(10,0.75,0.3),2),
Age = sample(100,100),
Educ = round(rnorm(10,0.75,0.3),2))
Panel[, uniqueID := .I] # Creates a unique ID
Panel[Panel == 0] <- NA
Events <- fread(
"Event_Type country year
A NLD 2005
C NLD 2004
A GBR 2006
B GBR 2003
A GRC 2002
D GRC 2007",
header = TRUE)
================================================================================ EDIT:
Events <- fread(
"Event_Type country year
A NLD 2005
A NLD 2004
A GBR 2006
A GBR 2003
A GRC 2002
A GRC 2007",
header = TRUE)
EDITED DESIRED OUTCOME:
panleID country year 2002 2003 2004 2005 2006 2007
1 NLD 2002 NA NA 1 1 NA NA
1 NLD 2006 NA NA 1 1 NA NA
=======================================================================
I would like the value in the column Event_Type
to be added to Panel
, if the year
of the event is between two panel observation (and in the same country).
As an example, let's take the following panel observation:
panleID country year
1 NLD 2002
1 NLD 2006
Panel
will get 4 extra columns A
to D
. The column A
, will get a 1
in the column if the event in country NLD
in the year 2005 (first line Events
, takes place in one of or between the two years. Since it does this results in the following:
panleID country year A B C D
1 NLD 2002 1 NA NA NA
1 NLD 2006 1 NA NA NA
I know that merging with the same year goes as follows:
merge(Panel, dcast(Events, iso + country ~ Event_Type),
by = c("country", "year"))
But how should I make the merge if I want the values to be equal to or in between the two panelID
years?
回答1:
Here is a go at your problem using data.table
The code can be shortened, but I always find it useful (expecially on SO) to show all steps in between for easy error-checking and validation.
#first, summarise Panel, to get the time-span of the panelID
Panel.short <- Panel[, .(country = unique(country),
start = min(year),
end = max(year) ),
by = .(panelID)]
# panelID country start end
# 1: 1 NLD 2002 2006
#perform left non-equi join
Panel.short.joined <- Events[ Panel.short, on =.(country, year >= start, year <= end), mult = "all"][]
# Event_Type country year year.1 panelID
# 1: A NLD 2002 2006 1
# 2: C NLD 2002 2006 1
#cast to wide
Panel.final <- dcast( Panel.short.joined,
panelID + country ~ Event_Type,
fun.aggregate = length )
# panelID country A C
# 1: 1 NLD 1 1
#perform update join on the original Panel
Panel[, `:=`(A=0, B=0, C=0, D=0)][
Panel.final,
`:=`( A = i.A, C = i.C), # <- add B = i.B and D = i.D here
on = .( panelID )][]
# panelID country year A B C D
# 1: 1 NLD 2002 1 0 1 0
# 2: 1 NLD 2006 1 0 1 0
回答2:
This is similar to @Wimpel but changes the order to:
- Cast
Events
to wide - Update by reference the year range by
panelID
- Non-equi update join
# cast Event
Events_cast <- dcast(Events, country + year~Event_Type, length)
# update by reference for join later
Panel[, `:=`(start = min(year), end = max(year)), by = panelID]
# dcast sorts the rhs alphabetically
cols <- sort(unique(Events[['Event_Type']]))
# non-equi update join
Panel[Events_cast,
on = .(country,
start <= year,
end >= year),
(cols) := mget(cols)]
#clean up data frame
setnafill(Panel, fill = 0L, cols = cols)
Panel[, `:=`(start = NULL, end = NULL)]
Panel
回答3:
I would consider making use of 'between' and '.SD'. Having trouble following your example, so typically:
DT[between(startYear, endYear, incbounds=FALSE,][, dcast(,.SD, cat1 ~ cat2 ...)]
Note: By passing data.table to cast with .SD, you can use i to subset further.
来源:https://stackoverflow.com/questions/58890738/conditional-merge-based-an-event-happening-between-two-panel-observations