Between/within standard deviations

后端未结

关注

 2  1254

When working on a hierarchical/multilevel/panel dataset, it may be very useful to adopt a package which returns the within- and between-group standard deviations of the avai

相关标签:

2条回答

南笙

2021-01-03 01:37

I know this question is four years old, but recently I wanted to do the same in R and came up with the following function. It depends on dplyr and tibble. Where: df is the dataframe, columns is a numerical vector to subset the dataframe and individuals is the column with the individuals.

xtsumR<-function(df,columns,individuals){
  df<-dplyr::arrange_(df,individuals)
  panel<-tibble::tibble()
  for (i in columns){
    v<-df %>% dplyr::group_by_() %>%
      dplyr::summarize_(
        mean=mean(df[[i]]),
        sd=sd(df[[i]]),
        min=min(df[[i]]),
        max=max(df[[i]])
      )
    v<-tibble::add_column(v,variacao="overal",.before=-1)
    v2<-aggregate(df[[i]],list(df[[individuals]]),"mean")[[2]]
    sdB<-sd(v2)
    varW<-df[[i]]-rep(v2,each=12) #
    varW<-varW+mean(df[[i]])
    sdW<-sd(varW)
    minB<-min(v2)
    maxB<-max(v2)
    minW<-min(varW)
    maxW<-max(varW)
    v<-rbind(v,c("between",NA,sdB,minB,maxB),c("within",NA,sdW,minW,maxW))
    panel<-rbind(panel,v)
  }
  var<-rep(names(df)[columns])
  n1<-rep(NA,length(columns))
  n2<-rep(NA,length(columns))
  var<-c(rbind(var,n1,n1))
  panel$var<-var
  panel<-panel[c(6,1:5)]
  names(panel)<-c("variable","variation","mean","standard.deviation","min","max")
  panel[3:6]<-as.numeric(unlist(panel[3:6]))
  panel[3:6]<-round(unlist(panel[3:6]),2)
  return(panel)
}

0 讨论(0)

不知归路

2021-01-03 01:41

I don't know what your stata command should reproduce, but to answer the second part of question about hierarchical structure , it is easy to do this with list. For example, you define a structure like this:

tree = list(
      "var1" = list(
         "panel" = list(type ='p',mean = 1,sd=0)
         ,"cluster" = list(type = 'c',value = c(5,8,10)))
      ,"var2" = list(
          "panel" = list(type ='p',mean = 2,sd=0.5)
         ,"cluster" = list(type="c",value =c(1,2)))
)

To create this lapply is convinent to work with list

tree <- lapply(list('var1','var2'),function(x){ 
  ll <- list(panel= list(type ='p',mean = rnorm(1),sd=0), ## I use symbol here not name
             cluster= list(type = 'c',value = rnorm(3)))  ## R prefer symbols
})
names(tree) <-c('var1','var2')

You can view he structure with str

str(tree)
List of 2
 $ var1:List of 2
  ..$ panel  :List of 3
  .. ..$ type: chr "p"
  .. ..$ mean: num 0.284
  .. ..$ sd  : num 0
  ..$ cluster:List of 2
  .. ..$ type : chr "c"
  .. ..$ value: num [1:3] 0.0722 -0.9413 0.6649
 $ var2:List of 2
  ..$ panel  :List of 3
  .. ..$ type: chr "p"
  .. ..$ mean: num -0.144
  .. ..$ sd  : num 0
  ..$ cluster:List of 2
  .. ..$ type : chr "c"
  .. ..$ value: num [1:3] -0.595 -1.795 -0.439

Edit after OP clarification

I think that package reshape2 is what you want. I will demonstrate this here.

The idea here is in order to do the multilevel analysis we need to reshape the data.

First to divide the variables into two groups :identifier and measured variables. library(reshape2) dat.m <- melt(dat,id.vars=c('son_id','mom_id')) ## other columns are measured

str(dat.m)
'data.frame':   21 obs. of  4 variables:
 $ son_id  : Factor w/ 3 levels "1","2","3": 1 2 3 1 2 1 2 1 2 3 ...
 $ mom_id  : Factor w/ 3 levels "1","2","3": 1 1 1 2 2 3 3 1 1 1 ...
 $ variable: Factor w/ 3 levels "hispanic","mom_smoke",..: 1 1 1 1 1 1 1 2 2 2 ...
 $ value   : num  1 1 1 0 0 0 0 1 0 0 ..

Once your have data in "moten" form , you can "cast" to rearrange it in the shape that you want:

# mom1 means for all variable
 acast(dat.m,variable~mom_id,mean)
                           1    2      3
hispanic           1.0000000    0    0.0
mom_smoke          0.3333333    1    0.5
son_birthweigth 3943.3333333 4160 2977.5
# Within-mother variance for birthweigth

acast(dat.m,variable~mom_id,function(x) sum((x-mean(x))^2))
                           1    2    3
hispanic           0.0000000    0  0.0
mom_smoke          0.6666667    0  0.5
son_birthweigth 5066.6666667 3200 12.5

## overall mean of each variable
acast(dat.m,variable~.,mean)
[,1]
hispanic           0.4285714
mom_smoke          0.5714286
son_birthweigth 3729.2857143

0 讨论(0)