Find matches between two data frames and write the result as a data frame

Anonymous (unverified), submitted on 2019-12-03 02:16:02

Question:

I have two data frames that have been cleaned and merged into a single CSV file. They look like this:

  **Source                     Master**
    chang chun petrochemical   CHANG CHUN GROUP
    chang chun plastics        CHURCH AND DWIGHT CO INC
    church  dwight             CITRIX SYSTEMS ASIA PACIFIC P L
    citrix systems  pacific    CNH INDUSTRIAL N.V

From these, I need to take the first Source name, check it against each of the Master names, find the relevant match, and output the result as another data frame, and likewise for the rest. Only a few rows are shown above; the real data has about 20k values.

My output must look like this

  **Source                     Master                            Result**
    chang chun petrochemical   CHANG CHUN GROUP                  CHANG CHUN GROUP
    chang chun plastics        CHURCH AND DWIGHT CO INC          CHANG CHUN GROUP
    church  dwight             CITRIX SYSTEMS ASIA PACIFIC P L   CHURCH AND DWIGHT CO INC
    citrix systems  pacific    CNH INDUSTRIAL N.V                CITRIX SYSTEMS ASIA PACIFIC P L

I tried several of the approaches from the linked question "Merging through fuzzy matching of variables in R", but no luck so far.

Thanks in advance!

When I use the above code on a large data set, the result is this:

code used:

Mast <- pmatch(Names$I_sender_O_Receiver_Customer, Master.Names$MOD, nomatch=NA) 

OUTPUT

 [1] NA NA  2  3 NA NA NA  6 NA NA  9 NA NA NA 12 NA NA NA 13 14 15 16 NA 18 19 20 21 22 NA 24 NA 26 NA 28 NA NA NA 30 NA NA 33 NA 35 36 37 NA 39 40 NA NA 43 NA 45 46 NA 48 49 50 51 52 53 54 55 56 57 58 NA
[68] 60 61 62 NA NA NA NA 64 NA 66 67 68 69 70 71 72 73 NA 75 76 77 78 NA 79 80 81 NA 83 84 85 86 87 88

CODE:

Mast <- sapply(Names$I_sender_O_Receiver_Customer, function(x) {
  agrep(x, Master.Names$MOD, value = TRUE)
})

OUTPUT:

[[1]]
character(0)

[[2]]
character(0)

[[3]]
[1] " CHURCH AND DWIGHT CO INC"

[[4]]
[1] " CITRIX SYSTEMS ASIA PACIFIC P L"

[[5]]
character(0)

Even with a for loop, no result is produced.

Code:

for (i in seq_len(nrow(df$ICIS_Cust_Names))) {
  df$reslt[i] <- grep(x = str_split(df$ICIS_Cust_Names[i], " ")[[1]][1], df$Master_Names[i], value = TRUE)
}
print(df$reslt)

Code 2: a for loop over just 100 rows

for (i in 100) {
  gr1$x[i] = agrep(gr1$ICIS_Cust_Names[i], gr2$Master_Names, value = TRUE, max = list(del = 0.2, ins = 0.3, sub = 0.4))
  gr2$Y[i] = agrep(gr1$ICIS_Cust_Names[i], gr2$Master_Names, value = FALSE, max = list(del = 0.2, ins = 0.3, sub = 0.4))
}

Result:

NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 

Error

Error in `$<-.data.frame`(`*tmp*`, "x", value = c(NA, NA, " church  dwight  " :
  replacement has 3 rows, data has 100

Looking at the results above, the code compares each Source value only with the Master value in the same row. What I want is to take the first element of Source, check it against all the elements of Master, and come up with a match, and then do the same for the rest. I would appreciate it if someone could correct my code. Thanks in advance!

Answer 1:

If you want to check the Master.Names only against the first word in Names, this could do the trick:

Names$Mast <- NA
for (i in seq_len(nrow(Names))) {
  # Upper-case the first word of the source name and search for it in the master names
  Names$Mast[i] <- grep(toupper(x = strsplit(Names[i, 1], " ")[[1]][1]), Master.Names$V1, value = TRUE)
}

Edit

Using sapply instead of a loop could gain you some speed:

Names$Mast <- sapply(Names$V1, function(x) {
  # Same lookup as in the loop: first word of x, upper-cased, searched in the master names
  grep(toupper(x = strsplit(x, " ")[[1]][1]), Master.Names$V1, value = TRUE)
})

Results

> Names
                        V1                            Mast
1 chang chun petrochemical                CHANG CHUN GROUP
2      chang chun plastics                CHANG CHUN GROUP
3            church dwight        CHURCH AND DWIGHT CO INC
4   citrix systems pacific CITRIX SYSTEMS ASIA PACIFIC P L
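
Both versions above assume that grep() finds exactly one master name for every row. With no hit, the assignment inside the loop fails, and when the number of hits varies from row to row, sapply returns a list instead of a character vector. A minimal sketch of one way to guard against that, assuming it is acceptable to keep only the first hit and to record NA when nothing matches (the helper name first_word_match and the sample vectors are hypothetical, not part of the original answer):

```r
# Hypothetical helper: exactly one result per source name, NA when nothing matches
first_word_match <- function(source, master) {
  vapply(as.character(source), function(x) {
    # First whitespace-delimited word of the source name, upper-cased
    key <- toupper(strsplit(trimws(x), " +")[[1]][1])
    # fixed = TRUE treats the key as literal text, not as a regular expression
    hits <- grep(key, master, value = TRUE, fixed = TRUE)
    if (length(hits) > 0) hits[1] else NA_character_
  }, character(1), USE.NAMES = FALSE)
}

master <- c("CHANG CHUN GROUP", "CHURCH AND DWIGHT CO INC",
            "CITRIX SYSTEMS ASIA PACIFIC P L", "CNH INDUSTRIAL N.V")
src <- c("chang chun petrochemical", "church dwight", "unknown company")

res <- first_word_match(src, master)
print(data.frame(Source = src, Result = res))
```

Because vapply is told to expect character(1), a row that somehow produced more or fewer values would raise an error immediately rather than silently changing the shape of the result.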

Data

Master.Names <- read.csv(text="CHANG CHUN GROUP
CHURCH AND DWIGHT CO INC
CITRIX SYSTEMS ASIA PACIFIC P L
CNH INDUSTRIAL N.V", header=FALSE)

Names <- read.csv(text="chang chun petrochemical
chang chun plastics
church dwight
citrix systems pacific", header=FALSE)
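
For the roughly 20k names mentioned in the question, calling grep() once per row may become slow. A possible alternative is to compare first words with match(), which is vectorized. This is only a sketch, and it assumes that a source name should be paired with the master name that starts with the same first word. That is slightly stricter than the grep() approach above, which finds the word anywhere in the master name. The vectors src and master and the helper first_word are illustrative stand-ins, not names from the original post.

```r
master <- c("CHANG CHUN GROUP", "CHURCH AND DWIGHT CO INC",
            "CITRIX SYSTEMS ASIA PACIFIC P L", "CNH INDUSTRIAL N.V")
src <- c("chang chun petrochemical", "chang chun plastics",
         "church  dwight", "citrix systems  pacific")

# First word of every name, upper-cased so the comparison ignores case
first_word <- function(x) toupper(sub(" .*$", "", trimws(x)))

# For each source key, match() gives the position of the first equal master key,
# or NA when no master name starts with that word
idx <- match(first_word(src), first_word(master))

result <- data.frame(Source = src, Result = master[idx])
print(result)
```

If several master names share a first word, match() keeps only the first of them, so this variant suits data where the first word identifies the company reasonably well.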

