I have a data frame from which I want to remove consecutive duplicates (in base R). I know rle
may be helpful here but can't think of how to use it. The exa
Here is a fast solution using filter:
dat[(filter(dat,c(-1,1))!= 0)[,1],]
v1 v2
1 A Jan
3 E May
4 B Feb
7 A Jan
8 D Apr
10 A Mar
11 B Feb
12 E May
15 B Feb
18 C Mar
19 D Apr
NA <NA> <NA>
The last row comes back as NA, so you need to add the last row of the original data to the result.
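One way to handle that trailing NA, sketched on a small made-up data frame (the toy data and the name keep are assumptions for illustration, not taken from the question): the filter marks the last row of each run, and the final row always closes a run, so its NA can simply be set to TRUE.

```r
# Hypothetical toy data; the question's real `dat` is not reproduced here.
dat <- data.frame(
  v1 = factor(c("A", "E", "E", "B", "B", "B")),
  v2 = factor(c("Jan", "May", "May", "Feb", "Feb", "Feb"))
)

# x[i] - x[i + 1] on the integer codes of v1: nonzero where a run ends.
# Like the [, 1] in the answer, this only compares the first column.
keep <- as.vector(stats::filter(as.integer(dat$v1), c(-1, 1)) != 0)

# The last position has no successor, so the filter yields NA there;
# the final row always ends a run, so keep it.
keep[length(keep)] <- TRUE

res <- dat[keep, ]
```

Here res should contain original rows 1, 3 and 6, with no NA row at the end.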
Here's a way, not with rle, but a way nonetheless:
dat[with(dat, c(TRUE, diff(as.numeric(interaction(v1, v2))) != 0)), ]
This assumes you're using factor columns, as your sample data implies.
Using rle, I came up with this:
ind <- cumsum(rle(as.character(dat$v1))$lengths)
dat[ind, ]
ind holds the index of the last row in each run of consecutive entries.
EDIT:
A simple solution to Matthew's comment would be:
dat[15, 2] <- "May"
dat[cumsum(rle(paste0(dat$v1, dat$v2))$lengths), ]
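As a rough, self-contained sketch of the same rle idea, here is a version that compares both columns and shows how to pick either the first or the last row of each run. The toy data, the names key, runs, first_idx and last_idx, and the "|" separator are all assumptions added for illustration; the separator is there so that two different (v1, v2) pairs cannot paste into the same string, provided the values themselves never contain "|".

```r
# Hypothetical toy data for illustration only.
dat <- data.frame(
  v1 = c("A", "E", "E", "B", "B", "A"),
  v2 = c("Jan", "May", "May", "Feb", "May", "Jan"),
  stringsAsFactors = FALSE
)

# One key per row, built from both columns.
key <- paste(dat$v1, dat$v2, sep = "|")

runs <- rle(key)
last_idx  <- cumsum(runs$lengths)          # last row of each run
first_idx <- last_idx - runs$lengths + 1L  # first row of each run

res_last  <- dat[last_idx, ]
res_first <- dat[first_idx, ]
```

Because the rows inside a run are identical, res_first and res_last hold the same values and differ only in their row names.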