Rename levels of a factor conditional on unique levels of another factor

礼貌的吻别 2021-01-25 21:14

Let's say I have a data frame like so:

df<- 
   plantfam        plantsp                  plantcn
   Asteraceae                               fuzzy leaf
   Aster         


        
1 Answer
  • 2021-01-25 21:50

    Here's the code block first, then I'll explain what I did:

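    library(dplyr)

    temp <- df %>% 
            # keep only the rows that still need a species name
            filter(is.na(plantsp)) %>% 
            # collapse duplicates to one row per family / common-name pair
            group_by(plantfam, plantcn) %>% 
            summarize(plantsp=NA) %>%
            # number the common names within each family
            group_by(plantfam) %>%
            mutate(dummy = cumsum(!is.na(plantcn))) %>%
            # build the new name, then drop the helper column
            mutate(plantsp = paste0(plantfam, " morpho", dummy)) %>%
            select(-dummy)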
    

    First, I keep only the entries that need a new name, dropping everything else with filter(is.na(plantsp)).

    Then I collapse redundant entries into one row per plantfam and plantcn pair with group_by(plantfam, plantcn) %>% summarize(plantsp=NA).

    Next, I add a dummy variable that holds a running count of the plantcn values within each plantfam group with mutate(dummy = cumsum(!is.na(plantcn))).

    I use this dummy variable to build the string you want with mutate(plantsp = paste0(plantfam, " morpho", dummy)).

    Finally, I get rid of the dummy column with select(-dummy).
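
    To see what the counter does on its own, here is a minimal sketch on a made-up vector (cn is a hypothetical stand-in for the plantcn values of one plantfam group). Note that an NA entry would not advance the count, so it would share a number with the row before it:

```r
# Hypothetical plantcn values for a single plantfam group.
cn <- c("fuzzy leaf", "non-fuzzy leaf", NA, "hairy stem")

# !is.na(cn) is TRUE for every non-missing name; cumsum() turns that
# into a running count, so each non-missing name gets the next integer.
counter <- cumsum(!is.na(cn))
print(counter)
# [1] 1 2 2 3
```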

    This is the resulting data frame:

    temp
        plantfam        plantcn            plantsp
           <chr>          <chr>              <chr>
    1 Asteraceae     fuzzy leaf Asteraceae morpho1
    2 Asteraceae non-fuzzy leaf Asteraceae morpho2
    3    Poaceae          3vien    Poaceae morpho1
    

    You can add back the entries that already had plantsp names with:

    new.df <- df %>% 
              filter(!is.na(plantsp)) %>%
              full_join( ., temp, by = c("plantfam","plantsp","plantcn"))
    new.df
    

    NOTE: This collapses duplicate rows. You will need to do something a little more complicated if you want to keep redundant entries.
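
    One possible way to keep the redundant rows is sketched below. It assumes that rows sharing the same plantfam and plantcn should get the same morpho name. The sample data and the names lookup and morpho are made up for illustration, since the original df is not shown in full. The idea is to build a small lookup table of new names, join it back onto the full data frame, and fill only the missing plantsp values:

```r
library(dplyr)

# Hypothetical sample data, including a duplicated unnamed entry.
df <- tibble(
  plantfam = c("Asteraceae", "Asteraceae", "Asteraceae", "Poaceae", "Poaceae"),
  plantsp  = c(NA, NA, NA, NA, "Poa annua"),
  plantcn  = c("fuzzy leaf", "fuzzy leaf", "non-fuzzy leaf", "3vien", "bluegrass")
)

# One row per unnamed (plantfam, plantcn) pair, numbered within each
# family in order of first appearance.
lookup <- df %>%
  filter(is.na(plantsp)) %>%
  distinct(plantfam, plantcn) %>%
  group_by(plantfam) %>%
  mutate(morpho = paste0(plantfam, " morpho", row_number())) %>%
  ungroup()

# Join the generated names back on; rows that already have a plantsp
# keep it, because coalesce() only fills the missing values.
new.df <- df %>%
  left_join(lookup, by = c("plantfam", "plantcn")) %>%
  mutate(plantsp = coalesce(plantsp, morpho)) %>%
  select(-morpho)

print(new.df)
```

    Unlike the summarize() version above, this numbers the common names in the order they first appear rather than alphabetically, so the numbering may differ on real data.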
