Let's say I have a dataframe like so:
df<-
plantfam plantsp plantcn
Asteraceae fuzzy leaf
Aster
Here's the code block first, then I'll explain what I did:
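library(dplyr)

temp <- df %>%
  # keep only the rows that still need a name
  filter(is.na(plantsp)) %>%
  # collapse duplicate family / common-name pairs
  group_by(plantfam, plantcn) %>%
  summarize(plantsp = NA) %>%
  # number the common names within each family
  group_by(plantfam) %>%
  mutate(dummy = cumsum(!is.na(plantcn))) %>%
  # build the placeholder name and drop the counter
  mutate(plantsp = paste0(plantfam, " morpho", dummy)) %>%
  select(-dummy)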
The first thing I suggest is to drop the entries that do not need changing, with filter(is.na(plantsp)).
Then aggregate redundant entries with group_by(plantfam, plantcn) %>% summarize(plantsp=NA).
I added a dummy variable that keeps a running count of the non-missing plantcn values within each plantfam group, with mutate(dummy = cumsum(!is.na(plantcn))).
I use this dummy variable to create the string you want, with mutate(plantsp = paste0(plantfam, " morpho", dummy)).
Finally, get rid of the dummy column with select(-dummy).
This is the resulting data frame:
temp
  plantfam   plantcn        plantsp
  <chr>      <chr>          <chr>
1 Asteraceae fuzzy leaf     Asteraceae morpho1
2 Asteraceae non-fuzzy leaf Asteraceae morpho2
3 Poaceae    3vien          Poaceae morpho1
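If you want to run the steps yourself, here is a minimal sketch. The input data frame is made up for illustration (it is not the original data), chosen so that it gives the same three rows as above. I also pass .groups = "drop_last" to summarize(), which assumes dplyr 1.0.0 or later; it keeps the result grouped by plantfam, so the second group_by() is not needed.

```r
library(dplyr)

# Hypothetical input, invented for illustration only
df <- tibble(
  plantfam = c("Asteraceae", "Asteraceae", "Asteraceae", "Poaceae", "Poaceae"),
  plantsp  = c(NA, NA, NA, NA, "Poa annua"),
  plantcn  = c("fuzzy leaf", "fuzzy leaf", "non-fuzzy leaf", "3vien", "bluegrass")
)

temp <- df %>%
  filter(is.na(plantsp)) %>%                              # unnamed plants only
  group_by(plantfam, plantcn) %>%
  summarize(plantsp = NA, .groups = "drop_last") %>%      # still grouped by plantfam
  mutate(dummy = cumsum(!is.na(plantcn))) %>%             # running count per family
  mutate(plantsp = paste0(plantfam, " morpho", dummy)) %>%
  select(-dummy) %>%
  ungroup()

temp
```

The two duplicate "fuzzy leaf" rows collapse into one, which is why temp has three rows rather than four.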
You can add back the entries that already had plantsp names with:
new.df <- df %>%
  filter(!is.na(plantsp)) %>%
  full_join(temp, by = c("plantfam", "plantsp", "plantcn"))
new.df
NOTE: You will need to do something a little more complicated if you want to keep redundant entries, since summarize() collapses each plantfam/plantcn pair to a single row.
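One possible way to keep the redundant rows, as a sketch rather than the only approach: build a lookup table with one label per plantfam/plantcn pair, left_join() it back onto the original df, and fill in only the missing names with coalesce(). The example data and the names lookup and morpho are made up for illustration. Here the numbering follows the order in which each common name first appears, which may differ from the sorted order that summarize() gives.

```r
library(dplyr)

# Hypothetical input, invented for illustration only
df <- tibble(
  plantfam = c("Asteraceae", "Asteraceae", "Asteraceae", "Poaceae", "Poaceae"),
  plantsp  = c(NA, NA, NA, NA, "Poa annua"),
  plantcn  = c("fuzzy leaf", "fuzzy leaf", "non-fuzzy leaf", "3vien", "bluegrass")
)

# One label per distinct (plantfam, plantcn) pair among the unnamed plants
lookup <- df %>%
  filter(is.na(plantsp)) %>%
  distinct(plantfam, plantcn) %>%
  group_by(plantfam) %>%
  mutate(morpho = paste0(plantfam, " morpho", row_number())) %>%
  ungroup()

# Attach the labels to every original row; existing names are left alone
new.df <- df %>%
  left_join(lookup, by = c("plantfam", "plantcn")) %>%
  mutate(plantsp = coalesce(plantsp, morpho)) %>%
  select(-morpho)

new.df
```

Because the join goes back onto df itself, new.df has the same number of rows as df, and both "fuzzy leaf" rows get the same label.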