I have a dataframe where all my 90 variables have integer data, of the type:
code | variable1 | variable2 | variable3 | ...
AB | 2 | 3 | 10 |
Using the built-in mtcars data in R:
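library(plyr)
mydata <- mtcars
# Run shapiro.test() once per column and keep W and the p-value
kk <- Map(function(x) {
  st <- shapiro.test(x)
  cbind(st$statistic, st$p.value)
}, mydata)
# Stack the per-column results; the id column holds the variable name
myout <- ldply(kk)
names(myout) <- c("var", "W", "p.value")
myout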
var W p.value
1 mpg 0.9475648 1.228816e-01
2 cyl 0.7533102 6.058378e-06
3 disp 0.9200127 2.080660e-02
4 hp 0.9334191 4.880736e-02
5 drat 0.9458838 1.100604e-01
6 wt 0.9432578 9.265551e-02
7 qsec 0.9732511 5.935208e-01
8 vs 0.6322636 9.737384e-08
9 am 0.6250744 7.836356e-08
10 gear 0.7727857 1.306847e-05
11 carb 0.8510972 4.382401e-04
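If you would rather not load plyr, the same table can probably be built in base R. This is only a sketch: the data frame below is made up to resemble the one in the question (a character code column plus integer variables), and the names mydata, num_cols and res are placeholders, not anything from the original post.

```r
# Hypothetical data shaped like the question's: a 'code' column plus integer variables
set.seed(1)
mydata <- data.frame(
  code = sprintf("ID%02d", 1:30),
  variable1 = rpois(30, 5),
  variable2 = rpois(30, 10),
  variable3 = rpois(30, 20)
)

# Keep only the numeric columns so the character 'code' column is skipped
num_cols <- mydata[vapply(mydata, is.numeric, logical(1))]

# Run shapiro.test() once per column and keep W and the p-value
res <- lapply(num_cols, function(x) {
  st <- shapiro.test(x)
  c(W = unname(st$statistic), p.value = st$p.value)
})

# Bind the per-column results into one data frame with a 'var' column
myout <- data.frame(var = names(res), do.call(rbind, res), row.names = NULL)
myout
```

Note that shapiro.test() needs between 3 and 5000 non-missing values per column and stops with an error if all values in a column are identical, so with 90 real variables you may need to filter out such columns first.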
The answer by @GegznaV was excellent, but the tidyverse has since gained newer constructs such as tidyr::pivot_longer, which replaces tidyr::gather, and the tidyverse authors now recommend the nest-unnest syntax.
Also, I replaced broom::tidy with broom::glance, as it gives the statistics for more models (e.g. aov()).
Here's the same example of @GegznaV rewritten in the updated tidyverse syntax:
library(tidyverse)
library(broom)
mtcars %>%
  select(-am, -wt) %>%
  pivot_longer(
    cols = everything(),
    names_to = "variable_name",
    values_to = "value"
  ) %>%
  nest(data = -variable_name) %>%
  mutate(
    shapiro = map(data, ~ shapiro.test(.x$value)),
    glanced = map(shapiro, glance)
  ) %>%
  unnest(glanced) %>%
  select(variable_name, W = statistic, p.value) %>%
  arrange(variable_name)
which gives the same result:
# A tibble: 9 x 3
variable_name W p.value
<chr> <dbl> <dbl>
1 carb 0.851 0.000438
2 cyl 0.753 0.00000606
3 disp 0.920 0.0208
4 drat 0.946 0.110
5 gear 0.773 0.0000131
6 hp 0.933 0.0488
7 mpg 0.948 0.123
8 qsec 0.973 0.594
9 vs 0.632 0.0000000974
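For a data frame like the one in the question, which also has a non-numeric code column, the pipeline would need to drop that column before pivoting. A possible adaptation is sketched below; the data are invented for illustration, and select(where(is.numeric)) assumes a reasonably recent dplyr (1.0.0 or later).

```r
library(tidyverse)
library(broom)

# Hypothetical data shaped like the question's: a 'code' column plus integer variables
set.seed(1)
mydata <- tibble(
  code = sprintf("ID%02d", 1:30),
  variable1 = rpois(30, 5),
  variable2 = rpois(30, 10)
)

res <- mydata %>%
  select(where(is.numeric)) %>%   # drops the character 'code' column
  pivot_longer(
    cols = everything(),
    names_to = "variable_name",
    values_to = "value"
  ) %>%
  nest(data = -variable_name) %>%
  mutate(glanced = map(data, ~ glance(shapiro.test(.x$value)))) %>%
  select(-data) %>%               # the nested raw values are no longer needed
  unnest(glanced) %>%
  select(variable_name, W = statistic, p.value)
res
```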
Example with mtcars data.
library(tidyverse)
library(broom)
mtcars %>%
  select(-am, -wt) %>%  # Remove unnecessary columns
  gather(key = "variable_name", value = "value") %>%
  group_by(variable_name) %>%
  do(broom::tidy(shapiro.test(.$value))) %>%
  ungroup() %>%
  select(variable_name, W = statistic, `p-value` = p.value)
The result:
# A tibble: 9 x 3
variable_name W `p-value`
<chr> <dbl> <dbl>
1 carb 0.851 0.000438
2 cyl 0.753 0.00000606
3 disp 0.920 0.0208
4 drat 0.946 0.110
5 gear 0.773 0.0000131
6 hp 0.933 0.0488
7 mpg 0.948 0.123
8 qsec 0.973 0.594
9 vs 0.632 0.0000000974