fuzzyjoin

fuzzyjoin with dates in R

浪尽此生 submitted on 2020-11-29 09:51:11
Question: I am working on a project where I am analyzing individual-level survey data within countries based on the outcomes of sports matches across countries, and I am not sure what the most efficient way to produce the merge I want is. I am working with two separate datasets. One contains individual-level data nested within countries. The data might look something like this:
country <- c(rep("Country A", 4), rep("Country B", 6))
date <- c("2000-01-01", "2000-01-02", "2000-01-03", "2000-01-04", rep(…
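One common way to link respondents to match outcomes is to attach the most recent match in the same country on or before each interview date. The sketch below does this in base R with findInterval(). That makes it a rolling join rather than a fuzzyjoin call, so treat it as a swapped-in technique.

Several pieces are hypothetical:
- the `matches` table and its `match_date` and `outcome` columns,
- the helper `last_match_before()`,
- every date except the four Country A dates from the question.

```r
# Hypothetical survey data: one row per respondent (Country B dates are made up)
survey <- data.frame(
  country = c(rep("Country A", 4), rep("Country B", 3)),
  date = as.Date(c("2000-01-01", "2000-01-02", "2000-01-03", "2000-01-04",
                   "2000-01-01", "2000-01-02", "2000-01-03")),
  stringsAsFactors = FALSE
)

# Hypothetical match results: one row per match per country
matches <- data.frame(
  country = c("Country A", "Country A", "Country B"),
  match_date = as.Date(c("1999-12-31", "2000-01-03", "2000-01-02")),
  outcome = c("win", "loss", "draw"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for each respondent, attach the latest match in the
# same country on or before the interview date (NA if there is none)
last_match_before <- function(survey, matches) {
  out <- survey
  out$match_date <- as.Date(NA)
  out$outcome <- NA_character_
  for (ctry in unique(survey$country)) {
    s_idx <- which(survey$country == ctry)
    m <- matches[matches$country == ctry, ]
    if (nrow(m) == 0) next
    m <- m[order(m$match_date), ]  # findInterval needs sorted breakpoints
    # Index of the last match_date <= interview date; 0 means no earlier match
    pos <- findInterval(as.numeric(survey$date[s_idx]), as.numeric(m$match_date))
    has <- pos > 0
    out$match_date[s_idx[has]] <- m$match_date[pos[has]]
    out$outcome[s_idx[has]] <- m$outcome[pos[has]]
  }
  out
}

res <- last_match_before(survey, matches)
res
```

If data.table is already in use, its rolling joins (`roll = TRUE`) perform the same "latest match on or before" lookup.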

stringdist_join results in NAs

烂漫一生 submitted on 2020-06-27 12:22:12
Question: I am experimenting with the stringdist package to make fuzzy joins, and I have run into a problem that I do not understand and cannot find an answer for. I want to join these two data tables with the "dl" method, and the result contains an NA, which I do not understand at all. Maybe one of you has an explanation for this. The code:
library(fuzzyjoin)
test1<-as.data.frame(test1<-c("techniker"))
test2<-as.data.frame(test2<-c("technician"))
setnames(test2,1,"label")
setnames(test1,1,"label")
x <-…
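This sketch assumes the truncated call relied on stringdist_join()'s default `max_dist = 2`. If so, a likely explanation is that the two labels are further apart than that threshold. No pair qualifies, so a left or full join fills the y columns with NA.

The check below uses base R's utils::adist() (plain Levenshtein distance) as a stand-in for the "dl" method. The two measures agree for this pair because no transposition of adjacent letters helps.

```r
# Labels from the question
a <- "techniker"
b <- "technician"

# Levenshtein distance; utils::adist returns a 1 x 1 matrix for two strings
d <- utils::adist(a, b)[1, 1]
d  # 4: "ker" vs "cian" needs three substitutions and one insertion

# A pair is kept only if its distance is <= max_dist
default_max_dist <- 2    # assumed default of stringdist_join()
d <= default_max_dist    # FALSE: no match, hence NA in a left/full join
d <= 4                   # TRUE once max_dist is raised to 4
```

In fuzzyjoin, the corresponding change is passing a larger `max_dist`; adding `distance_col = "dist"` shows the computed distances. A threshold of 4 is loose for short strings, though, so it may admit unwanted matches.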

fuzzyjoin two data frames using data.table

旧时模样 submitted on 2020-01-01 18:18:15
Question: I have been working on a fuzzyjoin to join two data frames together; however, due to memory issues, the join fails with "cannot allocate memory of…". So I am trying to join the data using data.table. A sample of the data is below. df1 looks like:
  ID      f_date     ACCNUM               flmNUM   start_date end_date
1 50341   2002-03-08 0001104659-02-000656 2571187  2002-09-07 2003-08-30
2 1067983 2009-11-25 0001047469-09-010426 91207220 2010-05-27 2011-05-19
3 804753  2004-05-14 0001193125-04-088404 4805453  2004-11-13 2005-11-05
4 …
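fuzzy_join() evaluates each match function on every pairing of unique x and y values, so memory grows with the product of the two tables' sizes. A cheaper pattern is to join exactly on ID first and then filter on the date range. That way only same-ID pairs are ever built.

The base-R sketch below assumes the goal is to match rows of a second table to df1 rows with the same ID whose date lies between start_date and end_date. That goal is not stated in the excerpt. `df2` and its values are hypothetical; the df1 rows are the first two from the question.

```r
# First two rows of df1 from the question (subset of columns)
df1 <- data.frame(
  ID = c(50341, 1067983),
  ACCNUM = c("0001104659-02-000656", "0001047469-09-010426"),
  start_date = as.Date(c("2002-09-07", "2010-05-27")),
  end_date = as.Date(c("2003-08-30", "2011-05-19")),
  stringsAsFactors = FALSE
)

# Hypothetical second table: dated events to attach by ID and date range
df2 <- data.frame(
  ID = c(50341, 50341, 1067983, 999),
  date = as.Date(c("2002-10-01", "2004-01-01", "2010-06-15", "2010-01-01"))
)

# Step 1: exact join on ID, so only same-ID pairs are created
pairs <- merge(df1, df2, by = "ID")

# Step 2: keep pairs whose date falls inside [start_date, end_date]
res <- pairs[pairs$date >= pairs$start_date & pairs$date <= pairs$end_date, ]
rownames(res) <- NULL
res
```

data.table's non-equi joins (the `on =` argument with `>=`/`<=` conditions) perform the exact-plus-range match in one step, without materializing the intermediate same-ID pairs. That matters when some IDs have many rows.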

Passing arguments into multiple match_fun functions in R fuzzyjoin::fuzzy_join

有些话、适合烂在心里 submitted on 2019-12-22 01:15:55
Question: I was answering these two questions and got an adequate solution, but I had trouble using fuzzy_join to pass arguments into the match_fun that I extracted from fuzzyjoin::stringdist_join. In this case, I'm using a mix of multiple match_fun functions, including this customized match_fun_stringdist as well as == and <= for exact and inequality matching. The error message I'm getting is:
# Error in mf(rep(u_x, n_y), rep(u_y, each = n_x), ...): object 'ignore_case' not found
# Data:
library(data.table,
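The error suggests the extracted match function still refers to `ignore_case`, a variable that only existed inside stringdist_join()'s own environment. One workaround is a small function factory that captures its settings in a closure, so nothing extra has to flow through fuzzy_join().

This is a sketch under two assumptions:
- `make_string_matcher()` is a hypothetical helper.
- utils::adist() (Levenshtein distance) stands in for the stringdist package's distance functions.

```r
# Hypothetical factory: returns a match function with its settings captured
make_string_matcher <- function(max_dist = 1, ignore_case = FALSE) {
  force(max_dist)
  force(ignore_case)
  function(v1, v2) {
    # fuzzy_join passes two equal-length vectors and expects one logical per pair
    d <- vapply(seq_along(v1), function(i) {
      as.numeric(utils::adist(v1[i], v2[i], ignore.case = ignore_case)[1, 1])
    }, numeric(1))
    d <= max_dist
  }
}

name_match <- make_string_matcher(max_dist = 1, ignore_case = TRUE)
name_match(c("Ann", "Betsy", "Charlie"), c("ann", "Betty", "Charles"))
# TRUE TRUE FALSE
```

Such a matcher could then sit next to exact and inequality rules. For example, with a three-column `by`, `match_fun = list(name_match, `==`, `<=`)` gives each column its own rule.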

Match some columns exactly, and some partially with inner_join

霸气de小男生 submitted on 2019-12-10 11:37:06
Question: I have two data frames from different sources that refer to the same people, but due to errors in self-reported data, the dates may be slightly off. Example data:
df1 <- data.frame(name= c("Ann", "Betsy", "Charlie", "Dave"), dob= c(as.Date("2000-01-01", "%Y-%m-%d"), as.Date("2001-01-01", "%Y-%m-%d"), as.Date("2002-01-01", "%Y-%m-%d"), as.Date("2003-01-01", "%Y-%m-%d")), stringsAsFactors=FALSE)
df2 <- data.frame(name= c("Ann", "Charlie", "Elmer", "Fred"), dob= c(as.Date("2000-01-11", "%Y-%m-…
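A base-R sketch of "exact on some columns, approximate on others": merge on name exactly, then keep pairs whose dates of birth differ by at most a tolerance.

The excerpt is cut off, so some inputs are hypothetical:
- the 30-day tolerance,
- every df2 date after Ann's.

df1 is as given in the question.

```r
# df1 as in the question
df1 <- data.frame(
  name = c("Ann", "Betsy", "Charlie", "Dave"),
  dob = as.Date(c("2000-01-01", "2001-01-01", "2002-01-01", "2003-01-01")),
  stringsAsFactors = FALSE
)
# Ann's date is from the question; the other df2 dates are made up
df2 <- data.frame(
  name = c("Ann", "Charlie", "Elmer", "Fred"),
  dob = as.Date(c("2000-01-11", "2002-03-15", "2001-06-01", "2003-01-02")),
  stringsAsFactors = FALSE
)

tolerance_days <- 30  # hypothetical cutoff for "slightly off"

# Exact match on name; suffixes distinguish the two dob columns
pairs <- merge(df1, df2, by = "name", suffixes = c(".1", ".2"))

# Approximate match on dob: keep pairs within the tolerance
gap <- abs(as.numeric(difftime(pairs$dob.1, pairs$dob.2, units = "days")))
res <- pairs[gap <= tolerance_days, ]
res
```

Here Charlie's pair (73 days apart) is dropped and only Ann's survives. In fuzzyjoin, the same idea is usually written as fuzzy_inner_join() with `by = c("name", "dob")` and one match function per column: `==` for name, and something like `function(x, y) abs(x - y) <= 30` for dob.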