问题
I have a dataset of the following type (first row is the header):
content
is always textmerge
is always a logical
id1 id2 start_line end_line content merge
A B 1 1 "aaaa" TRUE
A B 4 4 "aa mm" TRUE
A B 5 5 "boool" TRUE
A B 6 6 "omw" TRUE
C D 6 6 "hear!" TRUE
C D 7 7 " me out!" TRUE
C D 21 21 "hello" FALSE
Problem: I need to merge following a very specific criteria:
- Rows that have
merge = FALSE
must remain as is - Rows that have: same
id1
, sameid2
and consecutivestart_line
:- Need to be appended on the column
content
- The
end_line
value needs to change to the last row
- Need to be appended on the column
So, the expected result would be:
id1 id2 start_line end_line content merge
A B 1 1 "aaaa" TRUE
A B 4 6 "aa mm boool omw" TRUE
C D 6 7 "hear! me out!" TRUE
C D 21 21 "hello" FALSE
Notice in the example that:
- The minimal merge is with two lines (example of ids: C-D, originally 6th and 7th rows)
- There can be multiple lines to merge (example of ids A-B, originally rows 2nd, 3rd, 4th)
I have attempted a very large, and inefficient series of loops, that only merge two lines. That is why I am not posting my attempt here.
回答1:
Using dplyr
you can try :
library(dplyr)
df %>%
group_by(id1, id2, grp = cumsum(c(TRUE, diff(start_line) > 1))) %>%
summarise(start_line = first(start_line),
end_line = last(end_line),
content = paste(content, collapse = " "),
merge = any(merge))
# id1 id2 grp start_line end_line content merge
# <chr> <chr> <int> <int> <int> <chr> <lgl>
#1 A B 1 1 1 aaaa TRUE
#2 A B 2 4 6 aa mm boool omw TRUE
#3 C D 2 6 7 hear! me out! TRUE
#4 C D 3 21 21 hello FALSE
data
df <- structure(list(id1 = c("A", "A", "A", "A", "C", "C", "C"), id2 = c("B",
"B", "B", "B", "D", "D", "D"), start_line = c(1L, 4L, 5L, 6L,
6L, 7L, 21L), end_line = c(1L, 4L, 5L, 6L, 6L, 7L, 21L), content = c("aaaa",
"aa mm", "boool", "omw", "hear!", " me out!", "hello"), merge = c(TRUE,
TRUE, TRUE, TRUE, TRUE, TRUE, FALSE)), class = "data.frame",
row.names = c(NA, -7L))
来源:https://stackoverflow.com/questions/63108159/r-concatenate-cell-in-dataframe-by-group-depending-on-another-cell-value