tidyverse

How can I draw a random sample from a dataset, proportionate to size, based on different proportions for each value of a factor variable, in R

北城余情 submitted on 2021-01-01 17:51:34
Question: I want to draw a random sample from my dataset, using a different proportion for each value of a factor variable and weighting rows by the values stored in another column. A dplyr solution using pipes is preferred, as it can be inserted easily into longer code. Take the iris dataset as an example. The Species column has three values with 50 rows each. Let's also assume the sample weights are stored in the column Sepal.Length. If I have to sample equal proportions (or equal rows) per species, the
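One possible way to approach the question above, as a minimal sketch rather than a definitive answer: keep the per-species proportions in a named vector and apply slice_sample() to each group through group_modify(). The vector `props`, its values, and the helper `sample_one_group` are assumptions made for illustration; only iris, Species, and Sepal.Length come from the question.

```r
library(dplyr)

set.seed(42)  # make the random draw reproducible

# Hypothetical per-species sampling proportions (not from the question)
props <- c(setosa = 0.2, versicolor = 0.4, virginica = 0.5)

# Hypothetical helper: `rows` holds one group's rows, `key` is a one-row
# tibble with that group's Species value
sample_one_group <- function(rows, key) {
  p <- props[[as.character(key$Species)]]
  # weight_by makes rows with larger Sepal.Length more likely to be drawn
  slice_sample(rows, prop = p, weight_by = Sepal.Length)
}

sampled <- iris %>%
  group_by(Species) %>%
  group_modify(sample_one_group) %>%
  ungroup()

count(sampled, Species)
```

Looking the proportion up by group key keeps everything inside one pipe, so the step can be dropped into a longer dplyr chain.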

Referring to column names inside dplyr's across()

断了今生、忘了曾经 submitted on 2020-12-29 14:28:38
Question: Is it possible to refer to column names in a lambda function inside across()?

    df <- tibble(age = c(12, 45), sex = c('f', 'f'))
    allowed_values <- list(age = 18:100, sex = c("f", "m"))
    df %>%
      mutate(across(c(age, sex), c(valid = ~ .x %in% allowed_values[[COLNAME]])))

I just came across this question, where the OP asks about validating columns in a data frame based on a list of allowed values. dplyr just gained across() and it seems like a natural choice, but we need column names to look up the
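For reference, dplyr (1.0.0 and later) provides cur_column(), which returns the name of the column currently being processed inside across(). The sketch below plugs it in where the question uses the COLNAME placeholder; the result name `checked` is an assumption for illustration, and this is one possible approach rather than the answer from the original thread.

```r
library(dplyr)

df <- tibble(age = c(12, 45), sex = c("f", "f"))
allowed_values <- list(age = 18:100, sex = c("f", "m"))

# cur_column() gives the current column's name, which is used here
# to look up the matching entry in allowed_values
checked <- df %>%
  mutate(across(
    c(age, sex),
    list(valid = ~ .x %in% allowed_values[[cur_column()]])
  ))

checked
```

The named list `list(valid = ...)` makes across() append `_valid` to each source column name, giving age_valid and sex_valid.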

R - Obtaining the highest/lowest value in a set of columns defined by the value in a different dataframe

与世无争的帅哥 submitted on 2020-12-15 06:02:48
Question: I have two dataframes: one (A) containing the start and end dates (Julian dates, so a continuous count of days) of an event, and the other (B) containing values at dates from the start to beyond the end dates in the first dataframe. The start date in A is stable; the end date varies. For each row, I want to identify the value with the greatest magnitude of change (highest and/or lowest values) between the start and end dates in the series in B, then write it to a new dataframe. Example
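The example data is cut off in the excerpt above, so the following is only a sketch under an assumed layout: A has one row per event with `id`, `start`, and `end`, and B has one column per Julian day named `d100`, `d101`, and so on. All column names and values here are hypothetical. The idea is to reshape B to long form, join it to A, keep only the days inside each event's window, and summarise.

```r
library(dplyr)
library(tidyr)

# Hypothetical data; the real layout is not shown in the excerpt
A <- tibble(id = 1:2, start = c(100, 100), end = c(102, 104))
B <- tibble(
  id   = 1:2,
  d100 = c(1, 5), d101 = c(4, 3), d102 = c(2, 9),
  d103 = c(8, 1), d104 = c(0, 7), d105 = c(10, -20)
)

result <- B %>%
  # one row per (id, day) instead of one column per day
  pivot_longer(-id, names_to = "day", names_prefix = "d", values_to = "value") %>%
  mutate(day = as.numeric(day)) %>%
  inner_join(A, by = "id") %>%
  # keep only the days inside each event's start/end window
  filter(day >= start, day <= end) %>%
  group_by(id) %>%
  summarise(highest = max(value), lowest = min(value), .groups = "drop")

result
```

Working in long form avoids selecting a different set of columns per row, since the window becomes an ordinary filter on `day`.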

How can I add stars to broom package's tidy() function output?

旧巷老猫 submitted on 2020-11-27 01:53:35
Question: I have been using the broom package's tidy() function in R to print my model summaries. However, tidy() returns p-values without stars, which looks odd to the many people who are used to seeing stars in model summaries. Does anyone know a way to add stars to the output?

Answer 1: We can use the convenient function stars.pval from gtools to do this:

    library(gtools)
    library(broom)
    library(dplyr)
    data(mtcars)
    mtcars %>%
      lm(mpg ~ wt + qsec, .) %>%
      tidy %>%
      mutate(signif = stars.pval
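The answer's code is cut off in this excerpt. As a self-contained sketch of the same idea (not necessarily the original author's exact code), stars.pval() can be applied to the p.value column that tidy() returns. The names `model` and `tidied` are assumptions for illustration.

```r
library(gtools)
library(broom)
library(dplyr)

# Same model as in the excerpt, fitted on the built-in mtcars data
model <- lm(mpg ~ wt + qsec, data = mtcars)

tidied <- tidy(model) %>%
  # stars.pval() maps each p-value to a significance code such as "***";
  # as.character() drops the legend attribute it attaches
  mutate(signif = as.character(stars.pval(p.value)))

tidied
```

The significance codes follow the usual R convention, the same cutoffs printed by summary() for an lm fit.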