Group by similar results in R

青春惊慌失措 2021-01-29 05:14

I want to group_by similar (not identical) values, and I don't know how to do it.

I mean, I have a df with a column called 'name' that has similar values like: ARPO, AR

2 answers
  •  遥遥无期
    2021-01-29 05:32

    Here is some example input:

    df <- tibble::tribble(
      ~name,       ~number,       ~ind,
      "ARPO",      "405162",      5,
      "ARPO S.L.", "504653",      22,
      "ARPOS",     "900232",      1,
      "ARPO",      "504694",      12,
      "ARPO",      "400304",      42,
      "JJJJ",      "401605",      2,
      "JJJJ",      "900029",      31,
      "BBBBB",     "400090",      25,
      "BBBBB",     "403004",      33,
      "JJJJ",      "900222",      2,
      "BBBBB",     "403967",      11,
      "BBBB",      "400304",      52,
      "JJJJ",      "404308",      200,
      "ARPO",      "403898",      2,
      "ARPO",      "158159",      24,
      "BBBBBBB",   "700805",      2,
      "ARPO S.L.", "900245",      24,
      "JJJJ",      "501486",      2,
      "JJJJ",      "400215",      210,
      "JJJJ",      "504379",      26,
      "HARPO",     "900222",      400,
      "BBBBB",     "109700",      46,
      "ARPO",      "142173",      14,
      "BBBBB",     "400586",      22,
      "ARPO",      "401605",      322
    )
    

    I found a similar solution in the question "Group together levels with similar names R":

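    x <- df$name
    
    # Repeatedly take the first remaining name, collect every remaining name
    # that approximately matches it, and remove those names from the pool
    groups <- list()
    i <- 1
    while (length(x) > 0) {
    
      id <- agrep(x[1], x, ignore.case = TRUE, max.distance = 0.1)
      groups[[i]] <- x[id]   # names assigned to group i
      x <- x[-id]            # x[1] always matches itself, so x shrinks
      i <- i + 1
    
    }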
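
    The loop depends on how agrep() matches. As I read the R documentation, a fractional max.distance is turned into an edit count by rounding up (0.1 x 4 characters gives 1 edit), and the pattern is matched against substrings of each name rather than the whole name. A minimal check using a few names from the example data:

    ```r
    # A few names taken from the example data above
    names_demo <- c("ARPO", "ARPO S.L.", "ARPOS", "HARPO", "JJJJ", "BBBB")

    # For the 4-character pattern "ARPO", max.distance = 0.1 allows 1 edit.
    # Names that contain "ARPO" as a substring match with 0 edits.
    idx <- agrep("ARPO", names_demo, ignore.case = TRUE, max.distance = 0.1)

    # idx is 1 2 3 4: every ARPO variant, including "HARPO", but not
    # "JJJJ" or "BBBB", which would need 4 substitutions
    print(names_demo[idx])
    ```

    This is why "HARPO" ends up in the ARPO group. Because the match is one-directional, the grouping also depends on which name comes first: with "ARPO S.L." as the seed, plain "ARPO" would need 5 insertions and would not match.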
    
    

    From there, you can create a group variable:

    # Label each row with the group that contains its name
    df$group <- ""
    
    for (j in seq_along(groups)) {
      df$group <- ifelse(df$name %in% groups[[j]], paste0("group_", j), df$group)
    }
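
    A vectorised alternative to the for loop (my own sketch, not part of the linked solution) is to build a named lookup vector once and index it by name. The small groups list and df below are hypothetical stand-ins shaped like the objects created above:

    ```r
    # Hypothetical `groups` list shaped like the one the while loop builds:
    # one character vector of names per group, duplicates allowed
    groups <- list(
      c("ARPO", "ARPO S.L.", "ARPO"),
      c("JJJJ", "JJJJ"),
      c("BBBBB", "BBBB")
    )
    df <- data.frame(name = c("ARPO", "JJJJ", "BBBB", "ARPO S.L."),
                     stringsAsFactors = FALSE)

    # Named lookup vector: each name maps to its group label
    group_labels <- paste0("group_", seq_along(groups))
    lookup <- setNames(rep(group_labels, lengths(groups)), unlist(groups))

    # `[` matches names exactly; unname() drops the carried-over names.
    # Names missing from the lookup would become NA rather than "".
    df$group <- unname(lookup[df$name])
    print(df)
    ```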
    

    Maybe you can find a simpler solution, but this works!
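
    Once every row has a group label, the original goal from the question becomes an ordinary dplyr group_by(). A sketch, assuming dplyr is installed; the tibble is a hypothetical stand-in built from a few rows of the example data, labelled the way the code above would label them:

    ```r
    library(dplyr)

    # Stand-in for df after the group column has been added
    df <- tibble::tibble(
      name  = c("ARPO", "ARPO S.L.", "HARPO", "JJJJ", "JJJJ", "BBBBB", "BBBB"),
      ind   = c(5, 22, 400, 2, 31, 25, 52),
      group = c("group_1", "group_1", "group_1",
                "group_2", "group_2", "group_3", "group_3")
    )

    # One row per group: which names it merged, row count, and total ind
    summary_df <- df %>%
      group_by(group) %>%
      summarise(
        names     = paste(unique(name), collapse = ", "),
        n         = n(),
        total_ind = sum(ind),
        .groups   = "drop"
      )
    print(summary_df)
    ```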
