How to apply a Shapiro-Wilk test by groups in R?

萌比男神i

I have a dataframe where all my 90 variables have integer data, of the type:

code | variable1 | variable2 | variable3 | ...

AB | 2 | 3 | 10 |

3 Answers

春和景丽

2021-01-24 10:34

Using the built-in mtcars data set in R:

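mydata <- mtcars
# Run shapiro.test() once per column and keep W and the p-value
kk <- Map(function(x) {
  res <- shapiro.test(x)
  cbind(res$statistic, res$p.value)
}, mydata)
library(plyr)
# Stack the per-column results into one data frame
myout <- ldply(kk)
names(myout) <- c("var", "W", "p.value")
myout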
    var         W      p.value
1   mpg 0.9475648 1.228816e-01
2   cyl 0.7533102 6.058378e-06
3  disp 0.9200127 2.080660e-02
4    hp 0.9334191 4.880736e-02
5  drat 0.9458838 1.100604e-01
6    wt 0.9432578 9.265551e-02
7  qsec 0.9732511 5.935208e-01
8    vs 0.6322636 9.737384e-08
9    am 0.6250744 7.836356e-08
10 gear 0.7727857 1.306847e-05
11 carb 0.8510972 4.382401e-04
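
For readers who would rather not load plyr, here is a minimal base R sketch of the same per-column table. The names `res` and `out` are only illustrative and do not come from the answer above; the approach assumes every column is numeric, as in mtcars.

```r
# Sketch: per-column Shapiro-Wilk results using base R only.
# `res` and `out` are illustrative names, not part of the original answer.
res <- lapply(mtcars, shapiro.test)  # one "htest" object per column

out <- data.frame(
  var       = names(res),
  W         = vapply(res, function(r) unname(r$statistic), numeric(1)),
  p.value   = vapply(res, function(r) r$p.value, numeric(1)),
  row.names = NULL  # use plain 1..n row names instead of variable names
)
out
```

Each test is computed once and then read twice, so the statistic and the p-value always come from the same call.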

时光说笑

2021-01-24 10:41

The answer by @GegznaV is excellent, but the tidyverse has since gained newer constructs: tidyr::pivot_longer() supersedes tidyr::gather(), and the tidyverse authors now recommend the nest/unnest pattern.

I also replaced broom::tidy() with broom::glance(), since it returns the summary statistics for more kinds of models (e.g. aov()).

Here is @GegznaV's example rewritten in the updated tidyverse syntax:

library(tidyverse)
library(broom)

mtcars %>% 
  select(-am, -wt) %>%
  pivot_longer(
    cols = everything(),
    names_to = "variable_name",
    values_to = "value"
  ) %>% 
  nest(data = -variable_name) %>% 
  mutate(
    shapiro = map(data, ~shapiro.test(.x$value)),
    glanced = map(shapiro, glance)
  ) %>% 
  unnest(glanced) %>% 
  select(variable_name, W = statistic, p.value) %>% 
  arrange(variable_name)

which gives the same result:

# A tibble: 9 x 3
  variable_name     W      p.value
  <chr>         <dbl>        <dbl>
1 carb          0.851 0.000438    
2 cyl           0.753 0.00000606  
3 disp          0.920 0.0208      
4 drat          0.946 0.110       
5 gear          0.773 0.0000131   
6 hp            0.933 0.0488      
7 mpg           0.948 0.123       
8 qsec          0.973 0.594       
9 vs            0.632 0.0000000974
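
Since the question asks about testing by groups, the same nest/unnest pattern can probably be extended to run one test per group and variable. The sketch below is only an illustration: the grouping column of the asker's data is not shown in full, so `cyl` stands in as a hypothetical grouping variable, and `by_group` is a made-up name. Note that shapiro.test() needs at least 3 non-missing values per group.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

# Sketch: one Shapiro-Wilk test per (group, variable) combination.
# `cyl` is a stand-in for the real grouping column (an assumption).
by_group <- mtcars %>%
  select(cyl, mpg, hp) %>%
  pivot_longer(
    cols = -cyl,                 # keep the grouping column out of the reshape
    names_to = "variable_name",
    values_to = "value"
  ) %>%
  nest(data = value) %>%         # one row per cyl x variable_name
  mutate(glanced = map(data, ~ glance(shapiro.test(.x$value)))) %>%
  unnest(glanced) %>%
  select(cyl, variable_name, W = statistic, p.value) %>%
  arrange(cyl, variable_name)

by_group
```

With three `cyl` levels and two variables this should return six rows, one per combination.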

暗喜

2021-01-24 10:56

Example with mtcars data.

library(tidyverse)
library(broom)

mtcars %>% 
    select(-am, -wt) %>% # Remove unnecessary columns
    gather(key = "variable_name", value = "value") %>%
    group_by(variable_name)  %>% 
    do(broom::tidy(shapiro.test(.$value)))  %>% 
    ungroup()  %>% 
    select(variable_name, W = statistic, `p-value` = p.value)

The result:

# A tibble: 9 x 3
  variable_name     W    `p-value`
  <chr>         <dbl>        <dbl>
1 carb          0.851 0.000438    
2 cyl           0.753 0.00000606  
3 disp          0.920 0.0208      
4 drat          0.946 0.110       
5 gear          0.773 0.0000131   
6 hp            0.933 0.0488      
7 mpg           0.948 0.123       
8 qsec          0.973 0.594       
9 vs            0.632 0.0000000974
