tidyverse | 易学教程

How can I draw a random sample from a dataset, proportionate to size, based on different proportions for each value of a factor variable, in R

Question: I want to draw a random sample from my dataset, using a different proportion for each value of a factor variable and using sampling weights stored in another column. A dplyr solution using pipes is preferred, as it can be inserted easily into a longer pipeline. Take the iris dataset as an example. The Species column has three values with 50 rows each. Assume the sampling weights are stored in the column Sepal.Length. If I have to sample equal proportions (or equal numbers of rows) per species, the …
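One way to approach the task described above, sketched under assumptions: the per-species fractions in `props` are hypothetical values chosen for illustration (the excerpt does not give any), and the split-then-sample pattern is just one possible approach, not necessarily the one given in the original answers. It relies on `dplyr::slice_sample()` with its `prop` and `weight_by` arguments and on `purrr::imap()`.

```r
library(dplyr)
library(purrr)

# Hypothetical sampling fraction for each species (an assumption, not from the question)
props <- c(setosa = 0.1, versicolor = 0.2, virginica = 0.4)

set.seed(42)  # make the random draw reproducible

sampled <- iris %>%
  split(.$Species) %>%                       # named list: one data frame per species
  imap(~ slice_sample(
    .x,
    prop = props[[.y]],                      # .y is the species name of this piece
    weight_by = Sepal.Length                 # rows with larger Sepal.Length are more likely
  )) %>%
  bind_rows()                                # stack the per-species samples again

count(sampled, Species)
```

With 50 rows per species, these fractions should draw 5, 10 and 20 rows respectively. Because the fractions live in a named vector, changing a proportion does not require touching the pipeline.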

Referring to column names inside dplyr's across()

Question: Is it possible to refer to column names in a lambda function inside across()?

    df <- tibble(age = c(12, 45), sex = c('f', 'f'))
    allowed_values <- list(age = 18:100, sex = c("f", "m"))
    df %>% mutate(across(c(age, sex), c(valid = ~ .x %in% allowed_values[[COLNAME]])))

I just came across this question, where the OP asks about validating columns in a data frame against a list of allowed values. dplyr just gained across() and it seems like a natural choice, but we need the column names to look up the …
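A minimal sketch of one way to get at the column name, assuming dplyr 1.0.0 or later: `cur_column()` returns the name of the column currently being processed inside `across()`, so it can stand in for the `COLNAME` placeholder in the question. The `.names` pattern used here is a choice made for this sketch, and the excerpt does not show which approach the original answers took.

```r
library(dplyr)

df <- tibble(age = c(12, 45), sex = c("f", "f"))
allowed_values <- list(age = 18:100, sex = c("f", "m"))

result <- df %>%
  mutate(across(
    c(age, sex),
    # cur_column() gives the current column's name as a string,
    # which is used to look up that column's allowed values
    ~ .x %in% allowed_values[[cur_column()]],
    .names = "{.col}_valid"                  # new columns: age_valid, sex_valid
  ))

result
```

Here `age_valid` is FALSE for 12 and TRUE for 45, and `sex_valid` is TRUE for both rows, since "f" is in the allowed set.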

R - Obtaining the highest/lowest value in a set of columns defined by the value in a different dataframe

Question: I have two data frames: one (A) containing the start and end dates (as Julian dates, i.e. a continuous count of days) of an event, and another (B) containing values at dates from the start to beyond the end dates in A. The start date in A is fixed; the end date varies. For each row, I want to identify the value with the greatest magnitude of change (the highest and/or lowest value) between the start and end dates in the series in B, then write it to a new data frame. Example …

How can I add stars to broom package's tidy() function output?

Question: I have been using the broom package's tidy() function in R to print my model summaries. However, tidy() returns p-values without significance stars, which looks unfamiliar to the many people who are used to seeing stars in model summaries. Does anyone know a way to add stars to the output?

Answer 1: The stars.pval function from gtools does this conveniently:

    library(gtools)
    library(broom)
    library(dplyr)
    data(mtcars)
    mtcars %>%
      lm(mpg ~ wt + qsec, .) %>%
      tidy %>%
      mutate(signif = stars.pval …
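The excerpt cuts off mid-call, so the following is only a plausible sketch of the idea, not the answer's exact code. It assumes that the `p.value` column produced by `tidy()` is what gets passed to `gtools::stars.pval()`. The `as.vector()` wrapper is an addition made here to drop the legend attribute that `stars.pval()` attaches to its result.

```r
library(gtools)
library(broom)
library(dplyr)

model_summary <- lm(mpg ~ wt + qsec, data = mtcars) %>%
  tidy() %>%                                 # one row per model term
  mutate(
    # stars.pval() maps p-values to "***", "**", "*", "." or " ";
    # as.vector() strips the attached legend attribute (our addition)
    signif = as.vector(stars.pval(p.value))
  )

model_summary
```

The result keeps the usual `tidy()` columns (term, estimate, std.error, statistic, p.value) and adds a character column `signif` holding the stars.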