fuzzyjoin

Passing arguments into multiple match_fun functions in R fuzzyjoin::fuzzy_join

ぐ巨炮叔叔 posted on 2019-12-04 18:14:46
I was answering these two questions and got an adequate solution, but I had trouble passing arguments through fuzzy_join into the match_fun that I extracted from fuzzyjoin::stringdist_join. In this case, I'm using a mix of multiple match_fun functions, including this customized match_fun_stringdist and also == and <= for exact and criteria matching. The error message I'm getting is:

    # Error in mf(rep(u_x, n_y), rep(u_y, each = n_x), ...): object 'ignore_case' not found

    # Data:
    library(data.table, quietly = TRUE)
    Address1 <- c("786, GALI NO 5, XYZ","rambo, 45, strret 4, atlast, pqr","23/4, 23RD FLOOR,

R fill new column based on interval from another dataset (lookup)

别来无恙 posted on 2019-12-02 16:09:05
Question: Let's say I have this dataset:

    df1 = data.frame(groupID = c(rep("a", 6), rep("b", 6), rep("c", 6)),
                     testid = c(111, 222, 333, 444, 555, 666, 777, 888, 999, 1010, 1111, 1212, 1313, 1414, 1515, 1616, 1717, 1818))
    df1
       groupID testid
    1        a    111
    2        a    222
    3        a    333
    4        a    444
    5        a    555
    6        a    666
    7        b    777
    8        b    888
    9        b    999
    10       b   1010
    11       b   1111
    12       b   1212
    13       c   1313
    14       c   1414
    15       c   1515
    16       c   1616
    17       c   1717
    18       c   1818

And I have this second dataset:

    df2 = data.frame(groupID = c("a", "a", "a", "a", "b", "b", "b", "c", "c", "c"

R fill new column based on interval from another dataset (lookup)

Deadly posted on 2019-12-02 12:08:58
Let's say I have this dataset:

    df1 = data.frame(groupID = c(rep("a", 6), rep("b", 6), rep("c", 6)),
                     testid = c(111, 222, 333, 444, 555, 666, 777, 888, 999, 1010, 1111, 1212, 1313, 1414, 1515, 1616, 1717, 1818))
    df1
       groupID testid
    1        a    111
    2        a    222
    3        a    333
    4        a    444
    5        a    555
    6        a    666
    7        b    777
    8        b    888
    9        b    999
    10       b   1010
    11       b   1111
    12       b   1212
    13       c   1313
    14       c   1414
    15       c   1515
    16       c   1616
    17       c   1717
    18       c   1818

And I have this second dataset:

    df2 = data.frame(groupID = c("a", "a", "a", "a", "b", "b", "b", "c", "c", "c"),
                     testid = c(222, 333, 555, 666, 777, 999, 1010, 1313, 1616, 1818),
                     bd = c(1, 1, 2, 2, 0, 1, 1, 1, 1,

Doing a “fuzzyjoin” (and non-fuzzyjoin) in combination with a merge in data.table

有些话、适合烂在心里 posted on 2019-12-02 06:21:33
Question: I am using multiple databases. For each of these databases I have created a key called matchcode. This matchcode is a combination of a country code and a year. Usually, when I merge these datasets, I simply do:

    dfA <- merge(dfA, dfB, by = "matchcode", all.x = TRUE, allow.cartesian = FALSE)

The problem is that sometimes the years do not completely match:

    dfA <- read.table( text = "A B C D E F G iso year matchcode
    1 0 1 1 1 0 1 0 NLD 2010 NLD2010
    2 1 0 0 0 1 0 1 NLD 2014 NLD2014
    3 0 0 0 1 1 0 0 AUS

Doing a “fuzzy” and non-fuzzy, many to 1 merge with data.table

主宰稳场 posted on 2019-12-02 04:15:28
Question: Let's assume I have two databases, dfA and dfB. One has individual observations and the other has country-level data (which applies to multiple observations from the same year and country). For each of these databases I have created a key called matchcode. This matchcode is a combination of a country code and a year.

    dfA <- read.table( text = "A B C D E F G iso year matchcode
    1 0 1 1 1 0 1 0 NLD 2010 NLD2010
    2 1 0 0 0 1 0 1 NLD 2014 NLD2014
    3 0 0 0 1 1 0 0 AUS 2010 AUS2010
    4 1 0 1 0 0

Doing a “fuzzyjoin” (and non-fuzzyjoin) in combination with a merge in data.table

China☆狼群 posted on 2019-12-02 00:40:29
I am using multiple databases. For each of these databases I have created a key called matchcode. This matchcode is a combination of a country code and a year. Usually, when I merge these datasets, I simply do:

    dfA <- merge(dfA, dfB, by = "matchcode", all.x = TRUE, allow.cartesian = FALSE)

The problem is that sometimes the years do not completely match:

    dfA <- read.table( text = "A B C D E F G iso year matchcode
    1 0 1 1 1 0 1 0 NLD 2010 NLD2010
    2 1 0 0 0 1 0 1 NLD 2014 NLD2014
    3 0 0 0 1 1 0 0 AUS 2010 AUS2010
    4 1 0 1 0 0 1 0 AUS 2006 AUS2006
    5 0 1 0 1 0 1 1 USA 2008 USA2008
    6 0 0 1 0 0 0 1 USA 2010

Simultaneous fuzzy and non-fuzzy join

痴心易碎 posted on 2019-12-01 10:56:40
Say I have this data frame:

    # Set random seed
    set.seed(33550336)

    # Number of IDs
    n <- 5

    # Create data frames
    df <- data.frame(ID = rep(1:n, each = 10),
                     loc = seq(10, 100, by = 10))
    #    ID loc
    # 1   1  10
    # 2   1  20
    # 3   1  30
    # 4   1  40
    # 5   1  50
    # 6   1  60
    # 7   1  70
    # 8   1  80
    # 9   1  90
    # 10  1 100
    # 11  2  10
    # 12  2  20
    # 13  2  30
    # 14  2  40
    # 15  2  50
    # 16  2  60
    # 17  2  70
    # 18  2  80
    # 19  2  90
    # 20  2 100
    # 21  3  10
    # 22  3  20
    # 23  3  30
    # 24  3  40
    # 25  3  50
    # 26  3  60
    # 27  3  70
    # 28  3  80
    # 29  3  90
    # 30  3 100
    # 31  4  10
    # 32  4  20
    # 33  4  30
    # 34  4  40
    # 35  4  50
    # 36  4  60
    # 37  4  70
    # 38  4  80
    # 39  4  90
    # 40  4 100
    # 41  5  10
    # 42  5

Simultaneous fuzzy and non-fuzzy join

那年仲夏 posted on 2019-12-01 08:00:27
Question: Say I have this data frame:

    # Set random seed
    set.seed(33550336)

    # Number of IDs
    n <- 5

    # Create data frames
    df <- data.frame(ID = rep(1:n, each = 10),
                     loc = seq(10, 100, by = 10))
    #    ID loc
    # 1   1  10
    # 2   1  20
    # 3   1  30
    # 4   1  40
    # 5   1  50
    # 6   1  60
    # 7   1  70
    # 8   1  80
    # 9   1  90
    # 10  1 100
    # 11  2  10
    # 12  2  20
    # 13  2  30
    # 14  2  40
    # 15  2  50
    # 16  2  60
    # 17  2  70
    # 18  2  80
    # 19  2  90
    # 20  2 100
    # 21  3  10
    # 22  3  20
    # 23  3  30
    # 24  3  40
    # 25  3  50
    # 26  3  60
    # 27  3  70
    # 28  3  80
    # 29  3  90
    # 30  3 100
    # 31  4  10
    # 32