Group by similar results in R

青春惊慌失措 2021-01-29 05:14

I want to group_by similar (not identical) values, and I don't know how to do it.

I have a df with a column called 'name' that contains similar values such as: ARPO, AR

2 answers

遥遥无期 (original poster)

2021-01-29 05:32

Here is an example input:

df <- tibble::tribble(
  ~name,       ~number,       ~ind,
  "ARPO",      "405162",      5,
  "ARPO S.L.", "504653",      22,
  "ARPOS",     "900232",      1,
  "ARPO",      "504694",      12,
  "ARPO",      "400304",      42,
  "JJJJ",      "401605",      2,
  "JJJJ",      "900029",      31,
  "BBBBB",     "400090",      25,
  "BBBBB",     "403004",      33,
  "JJJJ",      "900222",      2,
  "BBBBB",     "403967",      11,
  "BBBB",      "400304",      52,
  "JJJJ",      "404308",      200,
  "ARPO",      "403898",      2,
  "ARPO",      "158159",      24,
  "BBBBBBB",   "700805",      2,
  "ARPO S.L.", "900245",      24,
  "JJJJ",      "501486",      2,
  "JJJJ",      "400215",      210,
  "JJJJ",      "504379",      26,
  "HARPO",     "900222",      400,
  "BBBBB",     "109700",      46,
  "ARPO",      "142173",      14,
  "BBBBB",     "400586",      22,
  "ARPO",      "401605",      322
)

I found a similar solution here: Group together levels with similar names R

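x <- df$name

groups <- list()
i <- 1
while (length(x) > 0) {
  # Approximate matches to the first remaining name (agrep also matches substrings)
  id <- agrep(x[1], x, ignore.case = TRUE, max.distance = 0.1)
  groups[[i]] <- x[id]
  # Drop the names just grouped, then continue with what is left
  x <- x[-id]
  i <- i + 1
}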
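To see why a name like HARPO lands in the same group as ARPO, it may help to look at what agrep() returns for a single pattern. agrep() looks for approximate matches within each string, so a name that merely contains the pattern also counts. A small check, using a hand-picked subset of the names above (the vector name `candidates` is just for illustration):

```r
# A few names taken from the example data
candidates <- c("ARPO", "ARPO S.L.", "ARPOS", "HARPO", "JJJJ")

# value = TRUE returns the matching strings instead of their positions
matched <- agrep("ARPO", candidates, ignore.case = TRUE,
                 max.distance = 0.1, value = TRUE)
print(matched)
```

If substring matches such as HARPO are not wanted, a stricter whole-string comparison (for example one based on adist()) might be a better fit.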

With the groups list built, you can create a group variable:

df$group <- ""

# Label each row with the group that contains its name
for (j in seq_along(groups)) {
  df$group <- ifelse(df$name %in% groups[[j]], paste0("group_", j), df$group)
}

Maybe you can find a simpler solution, but this works!
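One possibly simpler variant is to fold both steps into a single helper and then summarise by the new label. This is only a sketch: `assign_fuzzy_groups` and `small_df` are hypothetical names, the data is a small subset of the example above, and base R's aggregate() stands in for a dplyr group_by()/summarise() call.

```r
# Hypothetical helper: one group label per element of names_vec,
# using the same greedy agrep() pass as the loop above.
assign_fuzzy_groups <- function(names_vec, max_distance = 0.1) {
  labels <- rep(NA_character_, length(names_vec))
  remaining <- unique(names_vec)
  i <- 1
  while (length(remaining) > 0) {
    hit <- agrep(remaining[1], remaining, ignore.case = TRUE,
                 max.distance = max_distance, value = TRUE)
    # Always include the pattern itself so the loop is guaranteed to shrink
    hit <- union(remaining[1], hit)
    labels[names_vec %in% hit] <- paste0("group_", i)
    remaining <- setdiff(remaining, hit)
    i <- i + 1
  }
  labels
}

# Subset of the example data, as a plain data.frame
small_df <- data.frame(
  name = c("ARPO", "ARPO S.L.", "ARPOS", "JJJJ", "JJJJ",
           "BBBBB", "BBBBBBB", "HARPO"),
  ind  = c(5, 22, 1, 2, 31, 25, 2, 400),
  stringsAsFactors = FALSE
)

small_df$group <- assign_fuzzy_groups(small_df$name)

# Sum ind within each fuzzy group
totals <- aggregate(ind ~ group, data = small_df, FUN = sum)
print(totals)
```

Because the pass is greedy, the result depends on the order of the names: whichever name comes first becomes the pattern for its group.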
