Using shapiro.test on multiple columns in a data frame


伪装坚强ぢ

It seems like a pretty simple question, but I can't find the answer.

I have a data frame (let's call it df) containing n=100 columns (C1, <


3 answers

囚心锁ツ

2020-12-28 10:37

To apply a function over the rows or columns of a data frame, use the apply family:

df <- data.frame(a=rnorm(100), b=rnorm(100))    
df.shapiro <- apply(df, 2, shapiro.test)
df.shapiro
$a

    Shapiro-Wilk normality test

data:  newX[, i]
W = 0.9895, p-value = 0.6276


$b

    Shapiro-Wilk normality test

data:  newX[, i]
W = 0.9854, p-value = 0.3371

Note that column names are preserved, and df.shapiro is a named list.

Now, if you want, say, a vector of p-values, all you have to do is extract them from the corresponding list elements:

unlist(lapply(df.shapiro, function(x) x$p.value))
        a         b 
0.6275521 0.3370931
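The same extraction can be written a little more defensively with vapply(), which checks that each extracted value really is a single number. This is only a minimal sketch on simulated data like the example above; the seed and the name p_values are illustrative assumptions, not part of the original answer.

```r
set.seed(1)  # illustrative seed so the run is reproducible
df <- data.frame(a = rnorm(100), b = rnorm(100))

# lapply() walks the data frame column by column and keeps the column names
df.shapiro <- lapply(df, shapiro.test)

# vapply() fails loudly if any extracted p-value is not a single numeric value
p_values <- vapply(df.shapiro, function(x) x$p.value, numeric(1))
p_values
```

The result is a named numeric vector, so p_values["a"] or p_values[p_values < 0.05] work as usual.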


悲&欢浪女

2020-12-28 10:39

Use do.call with rbind and lapply for a simpler, more compact solution:

df <- data.frame(a = rnorm(100), b = rnorm(100), c = rnorm(100))
do.call(rbind, lapply(df, function(x) shapiro.test(x)[c("statistic", "p.value")]))
#>   statistic p.value    
#> a 0.986224  0.3875904  
#> b 0.9894938 0.6238027
#> c 0.9652532 0.009694794
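One thing to be aware of: binding lists with rbind gives a matrix whose cells are list elements rather than plain numbers, which makes sorting or filtering awkward. If a regular numeric data frame is wanted, something along these lines might do. It is a sketch, and the helper name shapiro_row and the seed are made up for illustration.

```r
set.seed(1)  # illustrative seed
df <- data.frame(a = rnorm(100), b = rnorm(100), c = rnorm(100))

# Hypothetical helper: return a plain numeric vector of length 2 per column
shapiro_row <- function(x) {
  res <- shapiro.test(x)
  c(statistic = unname(res$statistic), p.value = res$p.value)
}

# vapply() builds a 2 x ncol(df) numeric matrix; t() puts one column of df per row
res <- as.data.frame(t(vapply(df, shapiro_row, c(statistic = 0, p.value = 0))))

# Both result columns are ordinary numeric vectors, so sorting works directly
res[order(res$p.value), ]
```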


悲哀的现实

2020-12-28 10:42
I don't think this is a sensible approach to data analysis, but the underlying task, applying a function to each column of a data frame, is a general one. It is easily done with sapply() or lapply(). apply() also works, but for data frames one of the other two is the better choice.
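To illustrate why lapply()/sapply() are usually the safer choice for data frames: apply() converts its input to a matrix first, so a data frame with mixed column types gets coerced to a single type. A small sketch, using a made-up data frame called mixed that is not part of the question:

```r
# Hypothetical mixed-type data frame, for illustration only
mixed <- data.frame(x = c(1.5, 2.5, 3.5), g = c("a", "b", "c"))

# sapply()/lapply() see each column with its own type
col_classes <- sapply(mixed, class)
col_classes

# apply() goes through as.matrix(), so every column is character here
apply_classes <- apply(mixed, 2, class)
apply_classes
```

With all-numeric columns, as in the Shapiro-Wilk case, both routes give the same numbers, so this only matters once non-numeric columns are present.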

Here is an example, using some dummy data:
```
set.seed(42)
df <- data.frame(Gaussian = rnorm(50), Poisson = rpois(50, 2), 
                 Uniform = runif(50))
```
Now apply the shapiro.test() function to each column. It returns a list-like object (class "htest"), so we collect the results in a list with lapply().
```
lshap <- lapply(df, shapiro.test)
lshap[[1]] ## look at the first column results

R> lshap[[1]]

    Shapiro-Wilk normality test

data:  X[[1L]]
W = 0.9802, p-value = 0.5611
```
You will need to extract the things you want from these objects, which all have the structure:
```
R> str(lshap[[1]])
List of 4
 $ statistic: Named num 0.98
  ..- attr(*, "names")= chr "W"
 $ p.value  : num 0.561
 $ method   : chr "Shapiro-Wilk normality test"
 $ data.name: chr "X[[1L]]"
 - attr(*, "class")= chr "htest"
```
If you want the statistic and p.value components for every element of lshap, use sapply() this time, which arranges the results nicely:
```
lres <- sapply(lshap, `[`, c("statistic","p.value"))

R> lres
          Gaussian Poisson Uniform 
statistic 0.9802   0.9371  0.918   
p.value   0.5611   0.01034 0.001998
```
Given how many columns you have, I'd transpose lres so that each column of df becomes a row:
```
R> t(lres)
         statistic p.value 
Gaussian 0.9802    0.5611  
Poisson  0.9371    0.01034 
Uniform  0.918     0.001998
```
If you plan on doing anything with the p-values from this exercise, I suggest you start thinking about how to correct for multiple comparisons before you shoot yourself in the foot with a 30-cal.
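As a rough sketch of what such a correction could look like, base R's p.adjust() can be applied to the vector of p-values. The choice of the "holm" and "BH" methods here is an assumption for illustration; which correction is appropriate depends on the analysis.

```r
set.seed(42)
df <- data.frame(Gaussian = rnorm(50), Poisson = rpois(50, 2),
                 Uniform = runif(50))

# One raw Shapiro-Wilk p-value per column
p_raw <- vapply(df, function(x) shapiro.test(x)$p.value, numeric(1))

# "holm" controls the family-wise error rate;
# "BH" controls the false discovery rate
p_holm <- p.adjust(p_raw, method = "holm")
p_bh <- p.adjust(p_raw, method = "BH")

# Raw and adjusted p-values side by side, one row per column of df
cbind(raw = p_raw, holm = p_holm, BH = p_bh)
```

Adjusted p-values are never smaller than the raw ones, so with 100 columns some results that look significant on their own will no longer be after correction.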