Problem
Solution
I went with the solution provided by @thelatemail because I'm trying to stick with tidyverse and thus dplyr--I'm still new to R, so I'm taking baby steps and taking advantage of helper libraries. Thank you everyone for taking the time to contribute solutions.
df_new <- df_inh %>%
  select(
    isolate,
    Phenotype,
    which(
      sapply( ., function( x ) sd( x ) != 0 )
    )
  )
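For anyone who wants to try the idea on a small reproducible example, here is a minimal sketch. It assumes dplyr 1.0.0 or later, where `where()` can be used inside `select()`; the toy `df_inh` and its `gene1` to `gene3` columns are made up for illustration and are not the real data. The `is.numeric()` guard is my addition so that `sd()` is never called on the character columns.

```r
library(dplyr)

# Hypothetical toy data standing in for the real df_inh
df_inh <- data.frame(
  isolate   = c("a", "b", "c"),
  Phenotype = c("R", "S", "R"),
  gene1     = c(1, 1, 1),  # constant column, sd is 0
  gene2     = c(0, 1, 0),  # varying column, sd is not 0
  gene3     = c(0, 0, 0),  # constant column, sd is 0
  stringsAsFactors = FALSE
)

df_new <- df_inh %>%
  select(
    isolate,
    Phenotype,
    # keep numeric columns whose standard deviation is non-zero
    where(function(x) is.numeric(x) && sd(x) != 0)
  )

names(df_new)
```

In this toy case only `isolate`, `Phenotype` and `gene2` should remain. Columns containing `NA` would need extra handling (for example `na.rm = TRUE`), which this sketch does not attempt.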
Question
I'm trying to select columns if the column name is "isolate" or "Phenotype" or if the standard deviation of the column values is not 0.
I have tried the following code.
df_new <- df_inh %>%
  # remove isolate and Phenotype column for now, don't want to calculate their standard deviation
  select(
    -isolate,
    -Phenotype
  ) %>%
  # remove columns with all 1's or all 0's by calculating column standard deviation
  select_if(
    function( col ) return( sd( col ) != 0 )
  ) %>%
  # add back the isolate and Phenotype columns
  select(
    isolate,
    Phenotype
  )
I also tried this
df_new <- df_inh %>%
  select_if(
    function( col ) {
      if ( col == 'isolate' | col == 'Phenotype' ) {
        return( TRUE )
      }
      else {
        return( sd( col ) != 0 )
      }
    }
  )
I can select columns by standard deviation or by column name; however, I cannot do both simultaneously.
Answer 1:
I'm not sure you can do this with select_if alone, but one way is to combine two select operations and then bind the columns, using mtcars as sample data.
library(dplyr)
bind_cols(mtcars %>% select_if(function(x) sum(x) > 1000),
          mtcars %>% select(mpg, cyl))
#    disp  hp  mpg cyl
#1  160.0 110 21.0   6
#2  160.0 110 21.0   6
#3  108.0  93 22.8   4
#4  258.0 110 21.4   6
#5  360.0 175 18.7   8
#6  225.0 105 18.1   6
#7  360.0 245 14.3   8
#8  146.7  62 24.4   4
#....
However, if a column satisfies both conditions (that is, it is selected by select_if as well as by select), it will appear twice in the result.
We can also use base R, which returns the same columns (with the named ones first) but uses unique to avoid selecting a column twice.
sel_names <- c("mpg", "cyl")
mtcars[unique(c(sel_names, names(mtcars)[sapply(mtcars, sum) > 1000]))]
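The same de-duplication idea can also be kept inside dplyr. The following is only a sketch, assuming dplyr 1.0.0 or later for `all_of()`. The variable names `keep_by_name`, `keep_by_rule` and `cols` are mine, and `disp` is deliberately listed by name so that it matches both criteria.

```r
library(dplyr)

# Columns requested by name; disp also satisfies the sum rule below
keep_by_name <- c("mpg", "disp")

# Columns selected by the rule (column sum greater than 1000)
keep_by_rule <- names(mtcars)[sapply(mtcars, sum) > 1000]

# union() keeps each name once, in order of first appearance
cols <- union(keep_by_name, keep_by_rule)

result <- mtcars %>% select(all_of(cols))
names(result)
```

Here `disp` should appear only once even though it matches both the name list and the rule.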
So for your case, the two versions would be:
bind_cols(df_inh %>% select_if(function(x) sd(x) != 0),
          df_inh %>% select(isolate, Phenotype))
and
sel_names <- c("isolate", "Phenotype")
df_inh[unique(c(sel_names, names(df_inh)[sapply(df_inh, sd) != 0]))]
Answer 2:
I wouldn't use tidyverse functions at all for this task.
df_new <- df_inh[, c(grep("isolate", names(df_inh)),
                     grep("Phenotype", names(df_inh)),
                     which(sapply(df_inh, sd) != 0))]
The code above simply indexes with [] by each criterion, using grep for the column names and which for the standard deviation test.
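A self-contained variant of this base R indexing might look like the sketch below. It uses a made-up toy `df_inh` (the `gene1` to `gene3` columns are hypothetical) and combines the two criteria with a logical OR instead of `grep` and `which`. The `is.numeric()` guard is an addition of mine so that `sd()` is only applied to numeric columns.

```r
# Hypothetical toy data standing in for the real df_inh
df_inh <- data.frame(
  isolate   = c("a", "b", "c"),
  Phenotype = c("R", "S", "R"),
  gene1     = c(1, 1, 1),
  gene2     = c(0, 1, 0),
  gene3     = c(0, 0, 0),
  stringsAsFactors = FALSE
)

# TRUE for numeric columns whose standard deviation is non-zero
varies <- sapply(df_inh, function(x) is.numeric(x) && sd(x) != 0)

# TRUE for columns that must always be kept
by_name <- names(df_inh) %in% c("isolate", "Phenotype")

# Keep a column if either criterion holds; drop = FALSE keeps a data frame
df_new <- df_inh[, by_name | varies, drop = FALSE]
names(df_new)
```

Because the selection is a single logical vector, a column can never be picked twice, and the original column order is preserved.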
Source: https://stackoverflow.com/questions/55584714/how-to-select-columns-by-name-or-their-standard-deviation-simultaneously