Say I have the following data:
colA <- c("SampA", "SampB", "SampC")
colB <- c(21, 20, 30)
colC <- c(15, 14, 12)
colD <- c(10, 22, 18)
df &l
Package magrittr pipes (%>%) are not a good way to process by rows.
Maybe the following is what you want.
df %>%
select(-colA) %>%
t() %>% as.data.frame() %>%
summarise_all(sd)
# V1 V2 V3
#1 5.507571 4.163332 9.165151
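The transpose trick above can also be sketched in base R, without pipes. This is only an illustration, assuming the df from the question (rebuilt here so the snippet runs on its own); the names tdf, sd_by_col and sd_by_row are made up for the example.

```r
# Rebuild the example data from the question
df <- data.frame(
  colA = c("SampA", "SampB", "SampC"),
  colB = c(21, 20, 30),
  colC = c(15, 14, 12),
  colD = c(10, 22, 18)
)

# Transpose the numeric columns: every original row becomes a column
tdf <- as.data.frame(t(df[-1]))

# Column-wise sd of the transposed data, i.e. the row-wise sd of df
sd_by_col <- sapply(tdf, sd)

# apply() over rows (MARGIN = 1) gives the same values without transposing
sd_by_row <- apply(df[-1], 1, sd)

print(sd_by_col)
print(sd_by_row)
```

For a handful of columns, apply() is probably the shortest route; the transpose only pays off when the next step expects column-wise functions.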
Here is another way, using pmap to get the row-wise mean and sd:
library(purrr)
library(dplyr)
library(tidyr)  # provides unnest()
f1 <- function(x) tibble(Mean = mean(x), SD = sd(x))
df %>%
# select the numeric columns
select_if(is.numeric) %>%
# apply the f1 rowwise to get the mean and sd in transmute
transmute(out = pmap(., ~ f1(c(...)))) %>%
# unnest the list column
unnest %>%
# bind with the original dataset
bind_cols(df, .)
# colA colB colC colD Mean SD
#1 SampA 21 15 10 15.33333 5.507571
#2 SampB 20 14 22 18.66667 4.163332
#3 SampC 30 12 18 20.00000 9.165151
Try this, using rowSds from the matrixStats package:
library(dplyr)
library(matrixStats)
columns <- c('colB', 'colC', 'colD')
df %>%
mutate(Mean= rowMeans(.[columns]), stdev=rowSds(as.matrix(.[columns])))
Returns
colA colB colC colD Mean stdev
1 SampA 21 15 10 15.33333 5.507571
2 SampB 20 14 22 18.66667 4.163332
3 SampC 30 12 18 20.00000 9.165151
Your data
colA <- c("SampA", "SampB", "SampC")
colB <- c(21, 20, 30)
colC <- c(15, 14, 12)
colD <- c(10, 22, 18)
df <- data.frame(colA, colB, colC, colD)
df
A different tidyverse approach could be:
df %>%
rowid_to_column() %>%
gather(var, val, -c(colA, rowid)) %>%
group_by(rowid) %>%
summarise(rsds = sd(val)) %>%
left_join(df %>%
rowid_to_column(), by = c("rowid" = "rowid")) %>%
select(-rowid)
rsds colA colB colC colD
<dbl> <fct> <dbl> <dbl> <dbl>
1 5.51 SampA 21 15 10
2 4.16 SampB 20 14 22
3 9.17 SampC 30 12 18
This works in four steps. First, it creates a row ID. Second, it reshapes the data from wide to long, leaving out colA and the row ID. Third, it groups by row ID and calculates the standard deviation of each group. Finally, it joins the result back to the original df on the row ID.
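The same four steps can be sketched in base R, which may make each stage easier to inspect. This is a rough equivalent rather than the tidyverse code itself, assuming the df from the question; df_id, long, rsds, out and num_cols are names invented for the illustration.

```r
# Rebuild the example data from the question
df <- data.frame(
  colA = c("SampA", "SampB", "SampC"),
  colB = c(21, 20, 30),
  colC = c(15, 14, 12),
  colD = c(10, 22, 18)
)
num_cols <- c("colB", "colC", "colD")

# Step 1: add a row ID (on a copy, so df itself is untouched)
df_id <- cbind(rowid = seq_len(nrow(df)), df)

# Step 2: wide to long, one row per (rowid, value) pair
long <- data.frame(
  rowid = rep(df_id$rowid, times = length(num_cols)),
  val   = unlist(df_id[num_cols], use.names = FALSE)
)

# Step 3: standard deviation within each row ID
rsds <- aggregate(val ~ rowid, data = long, FUN = sd)
names(rsds)[names(rsds) == "val"] <- "rsds"

# Step 4: join back to the original data on the row ID, then drop the ID
out <- merge(rsds, df_id, by = "rowid")
out$rowid <- NULL
print(out)
```

Printing long after step 2 shows why the grouping works: each row ID appears once per numeric column, so sd() sees exactly the values from one original row.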
Or alternatively, using rowwise() and do():
df %>%
rowwise() %>%
do(data.frame(., rsds = sd(unlist(.[2:length(.)]))))
colA colB colC colD rsds
* <fct> <dbl> <dbl> <dbl> <dbl>
1 SampA 21 15 10 5.51
2 SampB 20 14 22 4.16
3 SampC 30 12 18 9.17
You can use pmap, or rowwise (or group by colA), along with mutate:
library(tidyverse)
df %>% mutate(sd = pmap_dbl(.[-1], ~sd(c(...)))) # pmap_dbl returns a numeric column; same as transform(df, sd = apply(df[-1],1,sd))
#> colA colB colC colD sd
#> 1 SampA 21 15 10 5.507571
#> 2 SampB 20 14 22 4.163332
#> 3 SampC 30 12 18 9.165151
df %>% rowwise() %>% mutate(sd = sd(c(colB,colC,colD)))
#> Source: local data frame [3 x 5]
#> Groups: <by row>
#>
#> # A tibble: 3 x 5
#> colA colB colC colD sd
#> <fct> <dbl> <dbl> <dbl> <dbl>
#> 1 SampA 21 15 10 5.51
#> 2 SampB 20 14 22 4.16
#> 3 SampC 30 12 18 9.17
df %>% group_by(colA) %>% mutate(sd = sd(c(colB,colC,colD)))
#> # A tibble: 3 x 5
#> # Groups: colA [3]
#> colA colB colC colD sd
#> <fct> <dbl> <dbl> <dbl> <dbl>
#> 1 SampA 21 15 10 5.51
#> 2 SampB 20 14 22 4.16
#> 3 SampC 30 12 18 9.17
I see this post is a bit old, but there are some pretty complicated answers so I thought I'd suggest an easier (and faster) approach.
Calculating means of rows is trivial, just use rowMeans:
rowMeans(df[, c('colB', 'colC', 'colD')])
This is vectorised and very fast.
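Base R has no 'rowSds' function (matrixStats provides one), but it is not hard to write one. Here is the 'rowVars' that I use.
rowVars <- function(x, na.rm = FALSE) {
  # Vectorised row-wise sample variance
  # Count the values actually used per row so NAs do not inflate the denominator
  n <- if (na.rm) rowSums(!is.na(x)) else ncol(x)
  rowSums((x - rowMeans(x, na.rm = na.rm))^2, na.rm = na.rm) / (n - 1)
}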
To calculate sd:
sqrt(rowVars(df[, c('colB', 'colC', 'colD')]))
Again, this is vectorised and fast, which can be important if the input matrix is large.
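As a quick sanity check, the vectorised result can be compared with the slower apply() version. The sketch below assumes the df from the question and restates the rowVars formula without the na.rm argument so that it runs on its own; sd_vectorised and sd_apply are names invented for the illustration.

```r
# Rebuild the example data from the question
df <- data.frame(
  colA = c("SampA", "SampB", "SampC"),
  colB = c(21, 20, 30),
  colC = c(15, 14, 12),
  colD = c(10, 22, 18)
)

# Same formula as rowVars above, restated without the na.rm argument
rowVars <- function(x) {
  rowSums((x - rowMeans(x))^2) / (ncol(x) - 1)
}

cols <- c("colB", "colC", "colD")

# Vectorised: one pass of rowMeans/rowSums over the whole table
sd_vectorised <- sqrt(rowVars(df[, cols]))

# Reference: call sd() once per row
sd_apply <- apply(df[, cols], 1, sd)

# TRUE if both approaches agree up to floating-point tolerance
print(all.equal(unname(sd_vectorised), unname(sd_apply)))
```

The two differ only in speed: apply() calls sd() once per row, so the gap should grow with the number of rows.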