问题
I have a time-series panel dataset which is structured in the following way:
df <- data.frame(
year = c(2012L, 2013L, 2014L, 2012L, 2013L, 2014L),
id = c(1L, 1L, 1L, 2L, 2L, 2L),
c = c(11L, 13L, 13L, 16L, 15L, 15L)
)
#> year id c
#> 1 2012 1 11
#> 2 2013 1 13
#> 3 2014 1 13
#> 4 2012 2 16
#> 5 2013 2 15
#> 6 2014 2 15
I would like to find the cross-correlation between values in column C given their id number. Something similar to this:
#> 1 2
#> 1 1 0.8
#> 2 0.8 1
I have been using dplyr package to find the cross-correlation between two variables in my panel data but for some reason, I can't do the same for cross correlation in one veriable grouped by id.
回答1:
If you are already using tidyverse
tools, you should try widyr.
Its functions reshape to wide, get the correlations, and give you back a tidy data frame again.
(Note I changed the sample data slightly to match akaDrHouse's answer.
df <- data.frame(
year = c(2012L, 2013L, 2014L, 2012L, 2013L, 2014L),
id = c(1L, 1L, 1L, 2L, 2L, 2L),
c = c(11L, 13L, 13L, 16L, 15L, 156L)
)
df
#> year id c
#> 1 2012 1 11
#> 2 2013 1 13
#> 3 2014 1 13
#> 4 2012 2 16
#> 5 2013 2 15
#> 6 2014 2 156
widyr::pairwise_cor(df, id, year, c)
#> # A tibble: 2 x 3
#> item1 item2 correlation
#> <int> <int> <dbl>
#> 1 2 1 0.4946525
#> 2 1 2 0.4946525
widyr::pairwise_cor(df, id, year, c, upper = FALSE)
#> # A tibble: 1 x 3
#> item1 item2 correlation
#> <int> <int> <dbl>
#> 1 1 2 0.4946525
回答2:
Do you mean something like the following? I used the reshape package to cast based on the value of your id, followed by the cor()
function in baseR.
> mydf <- data.frame(year=c("12","13","14","12","13","14"),id=c(1,1,1,2,2,2),c=c(11,13,13,16,15,156))
> library(reshape2)
> mydf
year id c
1 12 1 11
2 13 1 13
3 14 1 13
4 12 2 16
5 13 2 15
6 14 2 156
> my_wide_data <- dcast(mydf, year~id,value.var="c")
> cor(my_wide_data[,2:3])
1 2
1 1.0000000 0.4946525
2 0.4946525 1.0000000
So @Henrik's comment was much more simple and elegant, so including here.
> cor(unstack(mydf[ , -1], c ~ id))
X1 X2
X1 1.0000000 0.4946525
X2 0.4946525 1.0000000
来源:https://stackoverflow.com/questions/44348269/correlation-matrix-in-panel-data-in-r