I have a data frame in R that is supposed to have duplicates. However, there are some duplicates that I would need to remove. In particular, I only want to
Try
df[with(df, c(x[-1] != x[-nrow(df)], TRUE)), ]
# x y
#1 A 1
#2 B 2
#3 C 3
#4 A 4
#5 B 5
#6 C 6
#7 A 7
#9 B 9
#10 C 10
Here, we are comparing each element with the one next to it. This can be done by removing the first element from the column and comparing the result with the same column with its last element removed (so that the lengths become equal).
df$x[-1] #first element removed
#[1] B C A B C A B B C
df$x[-nrow(df)]
#[1] A B C A B C A B B #last element `C` removed
df$x[-1]!=df$x[-nrow(df)]
#[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE
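In the above, the length is 1 less than the `nrow` of `df`, as we removed one element. In order to compensate for that, we can concatenate a `TRUE` at the end and then use this index for subsetting the dataset. Because the `TRUE` is appended at the end, the last row of each run of consecutive duplicates is the one that is kept (row 9 rather than row 8 here).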
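For anyone who wants to run the whole thing end to end, here is a minimal sketch. The data frame is an assumption: it is reconstructed from the printed output above (ten rows, with `y` running from 1 to 10), and the names `n`, `keep` and `res` are only illustrative.

```r
# Assumed input, reconstructed from the output shown above
df <- data.frame(
  x = c("A", "B", "C", "A", "B", "C", "A", "B", "B", "C"),
  y = 1:10,
  stringsAsFactors = FALSE
)

n <- nrow(df)

# TRUE where a row's x differs from the x of the row after it;
# the appended TRUE means the last row is always kept
keep <- c(df$x[-1] != df$x[-n], TRUE)

res <- df[keep, ]
res
```

If the first row of each run should be kept instead of the last, the same comparison can be used with the `TRUE` placed in front, i.e. `c(TRUE, df$x[-1] != df$x[-n])`, which for this data would drop row 9 rather than row 8.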