tidyverse

R Function to identify non-matching rows

房东的猫 submitted on 2021-02-04 21:20:46
Question: I am trying to compare two data.frames: "V1" represents my CRM, and "V2" represents leads that I would like to send out. V1 has roughly 8k elements; V2 has roughly 25k elements. I need to compare every row in V2 to every row in V1 and discard every instance where a V2 element exists in V1. I would then like to return only the elements that do not appear, either exactly or loosely, in V1 into the Leads column. The goal is to send out a lead (V2) that does not exist in the CRM (V1). I've made some good
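
One common way to keep only the rows of V2 that have no counterpart in V1 is an anti-join. The sketch below is only an illustration under stated assumptions: the column name `company`, the sample values, and the helper `normalise()` are hypothetical, since the question's actual data is not shown. "Loose" matching is approximated here by ignoring case and surrounding whitespace; real fuzzy matching would need more than this.

```r
library(dplyr)

# Hypothetical stand-ins for the CRM (V1) and the leads (V2).
crm   <- tibble(company = c("Acme Corp", "Globex", "Initech"))
leads <- tibble(company = c("acme corp ", "Umbrella", "Initech", "Hooli"))

# Assumed notion of a "loose" match: same text after trimming and lower-casing.
normalise <- function(x) tolower(trimws(x))

new_leads <- leads %>%
  mutate(key = normalise(company)) %>%
  # anti_join() keeps rows of the left table with no match in the right table.
  anti_join(crm %>% mutate(key = normalise(company)), by = "key") %>%
  select(-key)

print(new_leads)
```

Because `anti_join()` matches on keys rather than comparing every pair of rows, it avoids an explicit 25k by 8k loop.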

Passing column name into function

 ̄綄美尐妖づ submitted on 2021-02-04 21:06:56
Question: I have a simple problem with non-standard evaluation: passing a variable name as an argument into a function. As a reproducible example, here's a simple thing: taking the mean of one variable, mpg, from the mtcars dataset. My end goal is to have a function where I can input the dataset and the variable, and get the mean. So without a function: library(tidyverse) mtcars %>% summarise(mean = mean(mpg)) #> mean #> 1 20.09062 I've tried to use get() for non-standard evaluation, but I'm getting
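
A minimal sketch of one way to write such a function, using the embrace operator `{{ }}` for unquoted column names and the `.data` pronoun for column names held in strings. The function names `mean_of()` and `mean_of_chr()` are made up for illustration and do not come from the question.

```r
library(dplyr)

# Unquoted column name: {{ var }} forwards the caller's expression to summarise().
mean_of <- function(data, var) {
  data %>% summarise(mean = mean({{ var }}))
}

# Column name as a string: look it up through the .data pronoun.
mean_of_chr <- function(data, var) {
  data %>% summarise(mean = mean(.data[[var]]))
}

res_bare <- mean_of(mtcars, mpg)
res_chr  <- mean_of_chr(mtcars, "mpg")
print(res_bare)
```

Both calls should reproduce the 20.09062 shown in the question for `mpg`.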

Passing column name into function

两盒软妹~` submitted on 2021-02-04 21:05:59
Question: I have a simple problem with non-standard evaluation: passing a variable name as an argument into a function. As a reproducible example, here's a simple thing: taking the mean of one variable, mpg, from the mtcars dataset. My end goal is to have a function where I can input the dataset and the variable, and get the mean. So without a function: library(tidyverse) mtcars %>% summarise(mean = mean(mpg)) #> mean #> 1 20.09062 I've tried to use get() for non-standard evaluation, but I'm getting

Create numerically encoded dummy variables efficiently in R?

。_饼干妹妹 submitted on 2021-02-04 19:58:55
Question: How can we transform data of the form df <- structure(list(customer_number = c(3, 3, 1, 1, 3), item = c("milkshake","burger", "apple", "burger", "water") ), row.names = c(NA, -5L), class = "data.frame") # customer_number item # 1 3 milkshake # 2 3 burger # 3 1 apple # 4 1 burger # 5 3 water into numerically encoded dummy variables, like this: data.frame(customer_number=c(1,3), item_milkshake=c(0,1), item_burger=c(1,1), item_apple=c(1,0), item_water=c(0,1)) # customer_number item_milkshake item
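
One possible approach, sketched with `tidyr::pivot_wider()`: mark each customer/item pair with a 1 and spread the items into columns, filling missing combinations with 0. The helper column name `present` is an assumption introduced for this example; whether this is "efficient" enough for large data is not something the sketch demonstrates.

```r
library(dplyr)
library(tidyr)

df <- structure(
  list(customer_number = c(3, 3, 1, 1, 3),
       item = c("milkshake", "burger", "apple", "burger", "water")),
  row.names = c(NA, -5L), class = "data.frame"
)

dummies <- df %>%
  distinct(customer_number, item) %>%  # guard against repeated purchases
  mutate(present = 1) %>%              # hypothetical indicator column
  pivot_wider(names_from = item, values_from = present,
              names_prefix = "item_", values_fill = 0) %>%
  arrange(customer_number)

print(dummies)
```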

Renaming multiple columns with dplyr rename(across(

你。 submitted on 2021-02-04 19:08:53
Question: Hey, I'm trying to rename some columns by adding "Last_" with the new version of dplyr, but I keep getting this error: Error: `across()` must only be used inside dplyr verbs. This is my code: data %>% rename(across(everything(), ~paste0("Last_", .))) dplyr version: v1.0.2 Answer 1: We can use rename_with instead of rename: library(dplyr) library(stringr) data %>% rename_with(~str_c("Last_", .), everything()) Reproducible example: data(iris) head(iris) %>% rename_with(~str_c("Last_", .), .cols =

Error message when installing xml2 R package

南笙酒味 submitted on 2021-02-04 08:13:06
Question: After updating to R 4.0.0 on my Windows machine, I can't install some packages such as xml2 (the same goes for foreign and nnet). When I try to install, I get this error message: * installing *source* package 'foreign' ... ** package 'foreign' successfully unpacked and MD5 sums checked ** using staged installation ** libs *** arch - i386 "c:/rtools40/mingw32/bin/"gcc -I"C:/PROGRA~1/R/R-40~1.0/include" -DNDEBUG -O2 -Wall -std=gnu99 -mfpmath=sse -msse2 -mstackrealign -c R_systat.c -o R_systat.o

Error with select function from dplyr

試著忘記壹切 submitted on 2021-02-04 04:45:32
Question: When I use the select function from dplyr, it doesn't work and gives me an error stating that the column names that I want to select are unused arguments. However, if I specify dplyr before the function call, like so: "dplyr::select", then it works as normal. Here is a sample df: sampledf <- structure(list(CRN = c(5497L, 6515L, 7248L, 36956L, 37021L), varA = structure(c(2L, 2L, 2L, 2L, 2L), .Label = c("A", "B"), class = "factor"), varB = c(NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA
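
This symptom is typical of another function named `select` masking dplyr's. The sketch below simulates that situation rather than diagnosing the asker's session: the one-argument `select()` defined here is a hypothetical stand-in for whatever package or object is doing the masking, and the data frame is a shortened version of the sample.

```r
library(dplyr)

# Shortened, simplified stand-in for the question's sample data.
sampledf <- data.frame(CRN = c(5497L, 6515L, 7248L),
                       varA = c("B", "B", "B"),
                       varB = NA_integer_)

# Hypothetical masking function: it takes a single argument, so extra
# column names are reported as unused arguments.
select <- function(obj) obj

masked_error <- tryCatch(
  select(sampledf, CRN, varA),
  error = function(e) conditionMessage(e)
)
print(masked_error)

# Naming the namespace explicitly bypasses the mask.
picked <- dplyr::select(sampledf, CRN, varA)
print(names(picked))
```

In a real session, `conflicts(detail = TRUE)` lists which attached packages define the same name, which can help find the culprit.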

Assign event number based on Date of occurrence in R dataframe

℡╲_俬逩灬. submitted on 2021-01-29 19:50:40
Question: How can I assign an event number based on the date of occurrence, satisfying the following conditions? If the event occurs for at least 3 consecutive days (or more), assign event number e1 and so on, and mutate (join) with the original data frame. If the occurrence does not span 3 consecutive days, assign NA and mutate with the original data frame. How can I achieve this for the time series dts? The output data frame would be like dts_output (done manually). dts<-structure(list(Date = structure(c(16442,
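
A rough sketch of the run-labelling idea, under the assumption that the data holds one row per day on which the event occurred. The question's `dts` is truncated, so the dates below are invented, and the helper columns `run_id` and `long_run` are hypothetical names.

```r
library(dplyr)

# Invented example: a 3-day run, an isolated day, then a 4-day run.
dts <- tibble(Date = as.Date(c("2015-01-07", "2015-01-08", "2015-01-09",
                               "2015-01-12",
                               "2015-01-20", "2015-01-21", "2015-01-22",
                               "2015-01-23")))

dts_output <- dts %>%
  arrange(Date) %>%
  # A new run starts whenever the gap to the previous date is not one day.
  mutate(run_id = cumsum(c(TRUE, as.numeric(diff(Date)) != 1))) %>%
  group_by(run_id) %>%
  mutate(long_run = n() >= 3) %>%
  ungroup() %>%
  # Number only the runs of 3+ days; shorter runs get NA.
  mutate(event = if_else(long_run,
                         paste0("e", cumsum(long_run & !duplicated(run_id))),
                         NA_character_)) %>%
  select(Date, event)

print(dts_output)
```

If the real data has one row per calendar day plus an indicator column, the same idea applies after filtering to the days on which the event occurred and joining the labels back by `Date`.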

How to optimize case_when in a function?

北城以北 submitted on 2021-01-29 18:44:45
Question: I would like to write a function that creates a binning variable based on some raw data. Specifically, I have a dataset with the age values for each respondent and I would like to write a function that classifies that person into an age group, where the age group is a parameter of that function. This is what I started with: data <- data.frame(age = 18:100) foo <- function(data, brackets = list(18:24, 25:34, 35:59)) { require(tidyverse) tmp <- data %>% drop_na(age) %>% mutate(age_bracket =
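
As an alternative to building a `case_when()` dynamically, base R's `cut()` can bin on breakpoints derived from the brackets. This is a sketch, not the asker's function: `bin_age()` is a made-up name, and it assumes the brackets are sorted, contiguous integer ranges like the defaults in the question.

```r
library(dplyr)

bin_age <- function(data, brackets = list(18:24, 25:34, 35:59)) {
  lower  <- vapply(brackets, function(b) as.numeric(min(b)), numeric(1))
  upper  <- vapply(brackets, function(b) as.numeric(max(b)), numeric(1))
  labels <- paste(lower, upper, sep = "-")
  data %>%
    filter(!is.na(age)) %>%
    # Left-closed intervals [18, 25), [25, 35), [35, 60); ages outside become NA.
    mutate(age_bracket = cut(age, breaks = c(lower, max(upper) + 1),
                             labels = labels, right = FALSE))
}

data   <- data.frame(age = 18:100)
binned <- bin_age(data)
print(head(binned))
```

Ages above the last bracket come back as `NA` here; whether that is the desired behaviour depends on the part of the question that is cut off.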

How to calculate several slopes from linear regressions in tidyverse

大兔子大兔子 submitted on 2021-01-29 15:31:18
Question: I have measured the methane concentration in soil incubations (closed jars with soil in them) over time. To calculate the methane production rate, I need to fit a second-order polynomial regression model to the relationship between methane concentration (ch4_umol) and time (stamp). I would like to add two new columns to my dataset: the value of the regression line slope and the R-squared value. I would like to calculate these two values for each "jar_camp". Can anyone help with this? That
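
A sketch of one way to fit a model per group and attach the results as columns. The data below is invented, `fit_jar()` is a hypothetical helper, and "slope" is taken to mean the linear coefficient of the raw second-order polynomial (the rate at `stamp = 0`); the question may intend a different definition.

```r
library(dplyr)

# Invented data: two jars with a known quadratic trend plus small fixed noise.
noise <- c(0.1, -0.1, 0.05, -0.05, 0.1, -0.1)
incub <- bind_rows(
  tibble(jar_camp = "A", stamp = 0:5, ch4_umol = 1 + 2 * (0:5) + 0.5 * (0:5)^2 + noise),
  tibble(jar_camp = "B", stamp = 0:5, ch4_umol = 3 + 1 * (0:5) + 0.1 * (0:5)^2 - noise)
)

# Fit ch4_umol ~ stamp + stamp^2 for one jar and return slope and R-squared.
fit_jar <- function(d) {
  fit <- lm(ch4_umol ~ stamp + I(stamp^2), data = d)
  tibble(slope = unname(coef(fit)["stamp"]),
         r_squared = summary(fit)$r.squared)
}

rates <- incub %>%
  group_by(jar_camp) %>%
  group_modify(~ fit_jar(.x)) %>%
  ungroup()

# Add the two per-jar values back onto every row of the original data.
incub_rates <- left_join(incub, rates, by = "jar_camp")
print(rates)
```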