Group values by unique elements

隐瞒了意图╮ 2020-12-12 03:29

I have a vector that looks like this:

a <- c("A110","A110","A110","B220","B220","C330","D440","D440","D440","D440","D440","D440",


        
1 answer
  • 2020-12-12 04:00

    First of all, (I assume) this is your vector

    a <- c("A110","A110","A110","B220","B220","C330","D440","D440","D440","D440","D440","D440","E550")
    

    As for possible solutions, here are a few (I can't find a good duplicate right now):

    as.integer(factor(a))
    # [1] 1 1 1 2 2 3 4 4 4 4 4 4 5
    

    Or

    cumsum(!duplicated(a))
    # [1] 1 1 1 2 2 3 4 4 4 4 4 4 5
    

    Or

    match(a, unique(a))
    # [1] 1 1 1 2 2 3 4 4 4 4 4 4 5
    

    Also, rle will work similarly in your specific scenario:

    with(rle(a), rep(seq_along(values), lengths))
    # [1] 1 1 1 2 2 3 4 4 4 4 4 4 5
    

    Or (which is practically the same)

    data.table::rleid(a)
    # [1] 1 1 1 2 2 3 4 4 4 4 4 4 5
    

    Be advised, though, that the four approaches (counting rle and rleid as one) behave differently in other scenarios. Consider the following vector:

    a <- c("B110","B110","B110","A220","A220","C330","D440","D440","B110","B110","E550")
    

    And the results of the 4 different solutions:

    1.

    as.integer(factor(a))
    # [1] 2 2 2 1 1 3 4 4 2 2 5
    

    The factor solution begins with 2 because factor() sorts its levels by default, so "A220" becomes level 1 and "B110" level 2, regardless of the order in which the values appear in a. Hence this solution only gives sequential group numbers if your vector is sorted, so don't use it otherwise.
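    If the sorted-level ordering is the only concern, one possible workaround (a sketch, not part of the original answer) is to pass the levels to factor() in order of first appearance. The name ids_first_seen is purely illustrative. Note that this reproduces the match/unique result, so separate runs of the same value still share one ID.

```r
a <- c("B110","B110","B110","A220","A220","C330","D440","D440","B110","B110","E550")

# Default behaviour: levels are sorted, so "A220" is level 1 and "B110" level 2
levels(factor(a))
# [1] "A220" "B110" "C330" "D440" "E550"

# Supplying the levels in order of first appearance (illustrative variable name)
ids_first_seen <- as.integer(factor(a, levels = unique(a)))
ids_first_seen
# [1] 1 1 1 2 2 3 4 4 1 1 5
```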

    2.

    cumsum(!duplicated(a))
    # [1] 1 1 1 2 2 3 4 4 4 4 5
    

    This cumsum/duplicated solution got confused because "B110" was already present at the beginning: the later "B110" values are flagged as duplicates, so the counter does not increase and they fall into the same group as the preceding "D440","D440".
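    To see where the counter stalls, it may help to look at the intermediate logical vector. The second half of the sketch below is an assumption on my part rather than something from the original answer: it compares each element with its predecessor instead of using duplicated(), which yields run-based IDs in base R. The names is_new and run_ids are illustrative.

```r
a <- c("B110","B110","B110","A220","A220","C330","D440","D440","B110","B110","E550")

# TRUE only the first time a value is ever seen; the second run of "B110" is all FALSE
is_new <- !duplicated(a)
is_new
# [1]  TRUE FALSE FALSE  TRUE FALSE  TRUE  TRUE FALSE FALSE FALSE  TRUE

# Alternative sketch: start a new group whenever a value differs from the previous one
# (assumes a has no NA values)
run_ids <- cumsum(c(TRUE, a[-1] != a[-length(a)]))
run_ids
# [1] 1 1 1 2 2 3 4 4 5 5 6
```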

    3.

    match(a, unique(a))
    # [1] 1 1 1 2 2 3 4 4 1 1 5
    

    This match/unique solution put ones at the end because unique keeps only the first occurrence of "B110"; every "B110" is therefore matched to the same position and lands in the same group, regardless of where it appears.

    4.

    with(rle(a), rep(seq_along(values), lengths))
    # [1] 1 1 1 2 2 3 4 4 5 5 6
    

    This solution only cares about runs of consecutive equal values, hence the separate runs of "B110" were put into different groups.
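    As a rough illustration of why the rle approach is run-based, the sketch below unpacks what rle() returns for this vector before the IDs are rebuilt. The names r and run_ids are illustrative.

```r
a <- c("B110","B110","B110","A220","A220","C330","D440","D440","B110","B110","E550")

r <- rle(a)
r$values   # one entry per run, so "B110" appears twice
# [1] "B110" "A220" "C330" "D440" "B110" "E550"
r$lengths  # how long each run is
# [1] 3 2 1 2 2 1

# Number the runs 1..6 and repeat each number by its run length
run_ids <- rep(seq_along(r$values), r$lengths)
run_ids
# [1] 1 1 1 2 2 3 4 4 5 5 6
```

    Which variant to pick depends on whether repeated values that are not adjacent should share a group (match/unique) or not (rle).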
