tidyverse | 易学教程

How can I mutate multiple variables using dplyr?

Question: Given a tbl_df object df containing multiple variables (i.e. Var.50, Var.100, Var.150 and Var.200), each measured twice (i.e. P1 and P2), I want to mutate a new set of the same variables from the repeated measurements (for example, average P1 and P2, creating P3 for each corresponding variable). Similar questions have been asked before, but there do not seem to be clear answers using dplyr. Example data: df <- structure(list(P1.Var.50 = c(134.242050170898, 52.375, 177.126017252604 ), P1.Var.100 =
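
The example data above is cut off, so the following is only a minimal sketch on made-up values. It assumes the columns follow the `P1.<var>` / `P2.<var>` naming pattern described in the question and averages each pair into a new `P3.<var>` column. The data values and the `vars` vector are hypothetical, and the arithmetic is plain data-frame arithmetic rather than a dplyr-only idiom.

```r
library(dplyr)

# Hypothetical stand-in for the truncated example data.
df <- tibble(
  P1.Var.50  = c(10, 20, 30),
  P1.Var.100 = c(1, 2, 3),
  P2.Var.50  = c(30, 40, 50),
  P2.Var.100 = c(3, 4, 5)
)

# Variable stems shared by both measurements (assumed naming pattern).
vars <- c("Var.50", "Var.100")

# Element-wise mean of the P1 and P2 columns, matched by position in `vars`.
p3 <- (df[paste0("P1.", vars)] + df[paste0("P2.", vars)]) / 2
names(p3) <- paste0("P3.", vars)

# Append the new P3 columns to the original data.
out <- bind_cols(df, p3)
```

Selecting both blocks with the same `vars` vector keeps the P1 and P2 columns aligned even if they are stored in a different order in the data.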

In R, use nonstandard evaluation to select specific variables from data.frames

Question: I've got several large-ish data.frames set up like a relational database, and I'd like to write a single function that looks up whatever variable I need, grabs it from the data.frame that holds it, and adds it to the data.frame I'm currently working on. I've got a way to do this that works, but it requires temporarily making a list of all the data.frames, which seems inefficient. I suspect that nonstandard evaluation would solve this problem for me, but I'm not sure how to do it. Here's what works

Cumulative aggregates within tidyverse

Question: Say I have a tibble (or data.table) which consists of two columns: a <- tibble(id = rep(c("A", "B"), each = 6), val = c(1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1)) Furthermore, I have a function called myfun which takes a numeric vector of arbitrary length as input and returns a single number. For example, you can think of myfun as being the standard deviation. Now I would like to add a third column to my tibble (called result) which contains the outputs of myfun applied to val cumulated and grouped
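
One possible way to read this request, sketched under assumptions: the grouping variable is taken to be `id` (the excerpt is cut off before it says so), and `myfun` is set to `sd` as the question suggests. For each row, the function is applied to all values of `val` from the start of the group up to that row.

```r
library(dplyr)

a <- tibble(id  = rep(c("A", "B"), each = 6),
            val = c(1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1))

# Stand-in for the question's myfun: numeric vector in, single number out.
myfun <- function(x) sd(x)

res <- a %>%
  group_by(id) %>%  # assumption: cumulate within each id
  mutate(result = vapply(
    seq_along(val),
    function(i) myfun(val[seq_len(i)]),  # values 1..i of the current group
    numeric(1)
  )) %>%
  ungroup()
```

With `sd`, the first row of each group is `NA` because a single value has no standard deviation. This recomputes `myfun` on a growing prefix, so the cost is quadratic in the group size.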

tidyverse - delete a column within a nested column/list

Question: I have the following data: (Note: I'm using the current GitHub version of dplyr within tidyverse, which offers some new experimental functions, like condense, which I'm using below, but I think that's not relevant for my problem/question). library(tidyverse) library(corrr) dat <- data.frame(grp = rep(1:4, each = 25), Q1 = sample(c(1:5, NA), 100, replace = TRUE), Q2 = sample(c(1:5, NA), 100, replace = TRUE), Q3 = sample(c(1:5, NA), 100, replace = TRUE), Q4 = sample(c(1:5, NA), 100, replace =

R How to Pass a function as a String Inside another Function

Question: Any assistance on this little conundrum would be much appreciated. I am trying to pass an argument to the tq_transmute function from the tidyquant package; the value for the argument is a function, however I would like to pass it as a string (outside the scope of the example below I’ll be passing it via a Shiny selectInput). I have tried every way I can think of to turn the string 'apply.quarterly' into the object apply.quarterly accepted by the mutate_fun argument. The commented
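
For the general string-to-function step, base R offers `match.fun()` and `do.call()`. The sketch below uses `"cumsum"` as a hypothetical stand-in for the selected function name so that it runs without extra packages. Whether `tq_transmute` accepts the resulting object for `mutate_fun` depends on how that function captures its argument, which the excerpt does not show, so this may not resolve the tidyquant case by itself.

```r
# Hypothetical stand-in: in the question the string would come from a Shiny selectInput.
fun_name <- "cumsum"

# match.fun() looks up a function by name and raises an error if none is found.
fun <- match.fun(fun_name)
by_object <- fun(1:4)

# do.call() also accepts the function name as a string directly.
by_string <- do.call(fun_name, list(1:4))
```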

Using mutate over multiple columns with a for loop to recode values

Question: I need to recode the values of multiple columns of a data frame using a side table. The values are geographic identifiers that I must replace with place names. So I decided to write a loop, but what works outside the loop no longer works inside it: I can't use mutate in a for loop. My real data contains 274 columns, 38 of which need recoding. These columns have many different names (they aren't called "places")
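
A loop-free alternative can be sketched with `across()` and `match()`, assuming dplyr 1.0 or later. Everything in the example is hypothetical: the side table `lookup`, its `id` and `name` columns, the data `dat`, and the column names in `cols_to_recode` stand in for the real 38 columns, which the excerpt does not list.

```r
library(dplyr)

# Hypothetical side table mapping geographic identifiers to place names.
lookup <- tibble(id = c(101, 102, 103), name = c("Paris", "Lyon", "Nice"))

# Hypothetical data: two columns to recode, one to leave untouched.
dat <- tibble(home = c(101, 103), work = c(102, 101), age = c(30, 45))
cols_to_recode <- c("home", "work")

# match() finds each identifier's row in the side table; indexing
# lookup$name with that position returns the corresponding place name.
recoded <- dat %>%
  mutate(across(all_of(cols_to_recode), ~ lookup$name[match(.x, lookup$id)]))
```

Identifiers missing from the side table come back as `NA`, which makes unmatched codes easy to spot.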

How to do cumulative filtering with `purrr::accumulate`?

Question: I'm looking for an approach to do something like this: # this doesnt work # accumulate(1:8, ~filter(mtcars, carb >= .x)) so that I can examine some summary statistics at different cutoff values. I could simply do # this works but redundant filtering is done map2(list(mtcars), 1:8, ~filter(.x, carb >= .y)) But since my data is rather large, it doesn't make sense to filter out values that were already filtered out in the step just before. In essence, this just duplicates the original dataframe a
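
One way to get the cumulative behaviour described here is to seed `accumulate()` with the full data through `.init`, so that each step filters the previous step's result instead of the original data. This is a sketch rather than a benchmarked solution, and it relies on the cutoffs being in increasing order; `cutoffs`, `steps`, and `filtered` are names introduced for illustration.

```r
library(dplyr)
library(purrr)

cutoffs <- 1:8

# .x is the data frame produced by the previous step, .y is the next cutoff.
steps <- accumulate(cutoffs, ~ filter(.x, carb >= .y), .init = mtcars)

# The first element is the .init value (mtcars itself); drop it.
filtered <- steps[-1]
names(filtered) <- cutoffs

# Number of rows remaining at each cutoff.
rows_left <- map_int(filtered, nrow)
```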

Unclear warning when defining custom pipe operator

Question: In my process I need to perform many dplyr::inner_joins. I thought I might define a custom pipe operator for it as explained here: library(tidyverse) library(rlang) df1 <- tibble(a = 1:10, b = 11:20) df2 <- tibble(a = 1:10, c = 21:30) `%J>%` <- function(lhs, rhs){ inner_join(lhs, rhs) } df1 %J>% df2 This works as expected and I get: Joining, by = "a" # A tibble: 10 x 3 a b c <int> <int> <int> 1 1 11 21 2 2 12 22 3 3 13 23 4 4 14 24 5 5 15 25 6 6 16 26 7 7 17 27 8 8 18 28 9 9 19 29 10 10 20 30

(R) Cleaner way to use map() with list-columns

Question: I am trying to move away from rowwise() for list-columns, as I have heard that the tidyverse team is in the process of axing it. However, I am not used to the purrr functions, so I feel like there must be a better way of doing the following: I create a list-column containing a tibble for each species. I then want to go into each tibble and take the mean of certain variables. The first case uses map and the second is the rowwise solution, which I personally feel is cleaner. Does anyone know a
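
The question's own code is not included in the excerpt, so this is only a sketch of the `map` pattern it describes. It assumes the built-in `iris` data as a stand-in (the excerpt mentions one tibble per species) and tidyr 1.0 or later for the `nest(data = ...)` syntax; the column name `mean_sepal_length` is made up.

```r
library(dplyr)
library(tidyr)
library(purrr)

# One row per species, with the remaining columns nested in a list-column `data`.
nested <- iris %>%
  nest(data = -Species)

# map_dbl() visits each nested data frame and returns one number per row.
res <- nested %>%
  mutate(mean_sepal_length = map_dbl(data, ~ mean(.x$Sepal.Length)))
```

Using a typed variant such as `map_dbl()` rather than `map()` yields a plain numeric column, so no later unnesting is needed for the summary.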

Reorder factors by increasing frequency

Question: How do I reorder factor-valued columns by frequency, in increasing order? While the forcats package provides an explicit way to reorder a factor based on its frequency (fct_infreq()), it does so in decreasing frequency order. I need the reverse order of the factor frequency/counts. E.g. library(forcats) set.seed(555) df <- data.frame(x=factor(sample(as.character(1:10), 100, replace=TRUE))) table(df$x) 1 10 2 3 4 5 6 7 8 9 9 10 12 14 10 10 5 12 8 10 levels(fct_infreq(df$x)) [1] "3" "2" "7"
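
Since `fct_infreq()` orders levels by decreasing frequency, one straightforward sketch is to reverse its result with `fct_rev()`, also from forcats. The variable name `x_inc` is introduced here for illustration; the data setup repeats the question's own example.

```r
library(forcats)

set.seed(555)
df <- data.frame(x = factor(sample(as.character(1:10), 100, replace = TRUE)))

# fct_infreq() sorts levels from most to least frequent;
# fct_rev() flips that order, giving least to most frequent.
x_inc <- fct_rev(fct_infreq(df$x))

levels(x_inc)
table(x_inc)
```

Reversing the whole level order also reverses how ties are arranged: levels with equal counts appear in the opposite order to the one `fct_infreq()` gives them.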