I want to group_by similar values (not just identical ones), and I don't know how to do it.
I mean, I have a df with a column called 'name' that has similar values like: ARPO, AR
Here is an example input:
df <- tibble::tribble(
~name, ~number, ~ind,
"ARPO", "405162", 5,
"ARPO S.L.", "504653", 22,
"ARPOS", "900232", 1,
"ARPO", "504694", 12,
"ARPO", "400304", 42,
"JJJJ", "401605", 2,
"JJJJ", "900029", 31,
"BBBBB", "400090", 25,
"BBBBB", "403004", 33,
"JJJJ", "900222", 2,
"BBBBB", "403967", 11,
"BBBB", "400304", 52,
"JJJJ", "404308", 200,
"ARPO", "403898", 2,
"ARPO", "158159", 24,
"BBBBBBB", "700805", 2,
"ARPO S.L.", "900245", 24,
"JJJJ", "501486", 2,
"JJJJ", "400215", 210,
"JJJJ", "504379", 26,
"HARPO", "900222", 400,
"BBBBB", "109700", 46,
"ARPO", "142173", 14,
"BBBBB", "400586", 22,
"ARPO", "401605", 322
)
I found a similar solution here: Group together levels with similar names R
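x <- df$name
groups <- list()
i <- 1
# Take the first remaining name, collect every remaining name that
# approximately matches it, store them as one group, then drop them from x.
while (length(x) > 0) {
  id <- agrep(x[1], x, ignore.case = TRUE, max.distance = 0.1)
  groups[[i]] <- x[id]
  x <- x[-id]
  i <- i + 1
}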
So, from that point, you can create a group variable:
df$group <- ""
# Label each row with the group that contains its name.
for (j in seq_along(groups)) {
  df$group <- ifelse(df$name %in% groups[[j]], paste0("group_", j), df$group)
}
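Once the group column exists, it can be passed to group_by like any other column. Here is a minimal sketch, assuming dplyr is installed; df_small is a hypothetical stand-in for the df above, and its group labels are filled in by hand for illustration rather than computed:

```r
library(dplyr)

# Hypothetical stand-in for df with the `group` column already present.
# The labels below are assumed for illustration, not produced by the loop.
df_small <- tibble::tribble(
  ~name,       ~ind, ~group,
  "ARPO",         5, "group_1",
  "ARPO S.L.",   22, "group_1",
  "ARPOS",        1, "group_1",
  "JJJJ",         2, "group_2",
  "JJJJ",        31, "group_2",
  "BBBBB",       25, "group_3",
  "BBBB",        52, "group_3"
)

# One row per group: which names it covers, how many rows, and the sum of ind.
summary_by_group <- df_small %>%
  group_by(group) %>%
  summarise(
    names     = paste(unique(name), collapse = ", "),
    n         = n(),
    total_ind = sum(ind),
    .groups   = "drop"
  )
```

Listing the distinct names per group is a quick way to check whether max.distance grouped too much or too little.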
Maybe you can find a simpler solution, but this works!
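One possible simplification is to fold both steps into a single helper that returns the labels directly. This is only a sketch: group_similar and its max_distance argument are names I made up, it uses base R's agrep just like the loop above, and it assumes the input has no NA values. As with the original loop, the result depends on the order of the names, because each group is seeded by the first name that is still unlabelled.

```r
# Hypothetical helper: returns one "group_<i>" label per element of x.
# Assumes x is a character vector without NA values.
group_similar <- function(x, max_distance = 0.1) {
  labels <- rep(NA_character_, length(x))
  i <- 1
  while (anyNA(labels)) {
    # Seed the next group with the first name that has no label yet.
    first <- which(is.na(labels))[1]
    hits <- agrep(x[first], x, ignore.case = TRUE, max.distance = max_distance)
    # Keep only hits that are still unlabelled, and always include the seed.
    hits <- union(first, hits[is.na(labels[hits])])
    labels[hits] <- paste0("group_", i)
    i <- i + 1
  }
  labels
}

example_names <- c("ARPO", "ARPO S.L.", "ARPOS", "JJJJ", "HARPO", "JJJJ")
example_groups <- group_similar(example_names)
```

With a data frame this would be used as df$group <- group_similar(df$name).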