tidyverse | 易学教程

R Function to identify non-matching rows

Question: I am trying to compare two data.frames: "V1" represents my CRM, and "V2" represents leads that I would like to send out. V1 has roughly 8k elements and V2 has roughly 25k elements. I need to compare every row in V2 to every row in V1 and discard every instance where a V2 element exists in V1. I would then like to return only the elements that do not appear, either exactly or loosely, in V1 into the Leads column. The goal is to send out a lead (V2) that does not exist in the CRM (V1). I've made some good

Passing column name into function

Question: I have a simple problem with non-standard evaluation: passing a variable name as an argument into a function. As a reproducible example, here's a simple task: taking the mean of one variable, mpg, from the mtcars dataset. My end goal is to have a function where I can input the dataset and the variable and get the mean. So, without a function: library(tidyverse) mtcars %>% summarise(mean = mean(mpg)) #> mean #> 1 20.09062 I've tried to use get() for non-standard evaluation, but I'm getting

Create numerically encoded dummy variables efficiently in R?

Question: How can we transform data of the form: df <- structure(list(customer_number = c(3, 3, 1, 1, 3), item = c("milkshake","burger", "apple", "burger", "water") ), row.names = c(NA, -5L), class = "data.frame") # customer_number item # 1 3 milkshake # 2 3 burger # 3 1 apple # 4 1 burger # 5 3 water into numerically encoded dummy variables, like this: data.frame(customer_number=c(1,3), item_milkshake=c(0,1), item_burger=c(1,1), item_apple=c(1,0), item_water=c(0,1)) # customer_number item_milkshake item

Renaming multiple columns with dplyr rename(across(

Question: Hey, I'm trying to rename some columns by adding the prefix "Last_" with the new version of dplyr, but I keep getting this error: Error: `across()` must only be used inside dplyr verbs. This is my code: data %>% rename(across(everything(), ~paste0("Last_", .))) dplyr version: v1.0.2 Answer 1: We can use rename_with instead of rename: library(dplyr) library(stringr) data %>% rename_with(~str_c("Last_", .), everything()) Reproducible example: data(iris) head(iris) %>% rename_with(~str_c("Last_", .), .cols =
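
The rename_with approach named in the answer excerpt above can be sketched as a self-contained snippet. This is a minimal illustration, not the original answer's full code: it assumes dplyr 1.0.0 or later is installed, swaps in base R's paste0 for stringr's str_c to avoid a second dependency, and the variable name renamed is hypothetical.

```r
library(dplyr)

# Prefix every column name of a small built-in data set with "Last_".
# rename_with() applies a function to the selected column names,
# which is why it works where rename(across(...)) raises an error.
renamed <- head(iris) %>%
  rename_with(~ paste0("Last_", .x), .cols = everything())

# Inspect the new column names.
print(names(renamed))
```

Because .cols accepts any tidyselect expression, the same call could be limited to a subset of columns, for example .cols = starts_with("Sepal").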

Error message when installing xml2 R package

Question: After updating to R 4.0.0 on my Windows machine, I can't install some packages such as xml2 (the same goes for foreign and nnet). When I try to install, I get this error message: * installing *source* package 'foreign' ... ** package 'foreign' successfully unpacked and MD5 sums checked ** using staged installation ** libs *** arch - i386 "c:/rtools40/mingw32/bin/"gcc -I"C:/PROGRA~1/R/R-40~1.0/include" -DNDEBUG -O2 -Wall -std=gnu99 -mfpmath=sse -msse2 -mstackrealign -c R_systat.c -o R_systat.o

Error with select function from dplyr

Question: When I use the select function from dplyr, it doesn't work and gives me an error stating that the column names I want to select are unused arguments. However, if I specify dplyr before the function call, like so: "dplyr::select", then it works as normal. Here is a sample df: sampledf <- structure(list(CRN = c(5497L, 6515L, 7248L, 36956L, 37021L), varA = structure(c(2L, 2L, 2L, 2L, 2L), .Label = c("A", "B"), class = "factor"), varB = c(NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA

Assign event number based on Date of occurrence in R dataframe

Question: How can I assign an event number based on the date of occurrence, satisfying the following conditions? If the event occurs for at least 3 consecutive days (or more), assign event number e1 and so on, and mutate (join) with the original data frame. If the occurrence does not last 3 consecutive days, assign NA and mutate with the original data frame. How can I achieve this in the time series dts? The output data frame would be like dts_output (done manually). dts<-structure(list(Date = structure(c(16442,

How to optimize case_when in a function?

Question: I would like to write a function that creates a binning variable based on some raw data. Specifically, I have a dataset with the age values for each respondent, and I would like to write a function that classifies each person into an age group, where the age groups are a parameter of that function. This is what I started with: data <- data.frame(age = 18:100) foo <- function(data, brackets = list(18:24, 25:34, 35:59)) { require(tidyverse) tmp <- data %>% drop_na(age) %>% mutate(age_bracket =

How to calculate several slopes from linear regressions in tidyverse

Question: I have measured the methane concentration in soil incubations (closed jars with soil in them) over time. To calculate the methane production rate, I need to fit a second-order polynomial regression model to the relationship between methane concentration (ch4_umol) and time (stamp). I would like to add two new columns to my dataset: the value of the regression line slope and the R-squared value. I would like to calculate these two values for each "jar_camp". Can anyone help with this? That