Question
I have a dataset like the one below:
ID. Invoice. Date of Invoice. paid or not.
1 1 10/31/2019 yes
1 1 10/31/2019 yes
1 2 11/30/2019 no
1 3 12/31/2019 no
2 1 09/30/2019 no
2 2 10/30/2019 no
2 3 11/30/2019 yes
3 1 7/31/2019 no
3 2 9/30/2019 yes
3 3 12/31/2019 no
4 1 7/31/2019 yes
4 2 9/30/2019 no
4 3 12/31/2019 yes
I would like to assess each customer's willingness to pay. If a customer has paid a newer invoice while an older invoice is still unpaid, I give them a "good" score. So customers 2, 3 and 4 get "good", and customer 1 gets "bad".
So the final data will have one more column, with the value "good" or "bad":
ID. Invoice. Date of Invoice. paid or not. Bad or good
1 1 10/31/2019 yes bad
1 1 10/31/2019 yes bad
1 2 11/30/2019 no bad
1 3 12/31/2019 no bad
2 1 09/30/2019 no good
2 2 10/30/2019 no good
2 3 11/30/2019 yes good
3 1 7/31/2019 no good
3 2 9/30/2019 yes good
3 3 12/31/2019 no good
4 1 7/31/2019 yes good
4 2 9/30/2019 no good
4 3 12/31/2019 yes good
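A minimal base R sketch of the rule as stated (a customer is "good" if some paid invoice comes after at least one unpaid invoice). It assumes the columns are renamed to syntactic names such as paid_or_not, and that sorting by the parsed invoice date gives the invoice order; the helper name score_customer is hypothetical.

```r
# Data from the question's table, with column names simplified to syntactic ones
df <- data.frame(
  ID = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4),
  Invoice = c(1, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3),
  Date_of_Invoice = c("10/31/2019", "10/31/2019", "11/30/2019", "12/31/2019",
                      "09/30/2019", "10/30/2019", "11/30/2019",
                      "7/31/2019", "9/30/2019", "12/31/2019",
                      "7/31/2019", "9/30/2019", "12/31/2019"),
  paid_or_not = c("yes", "yes", "no", "no",
                  "no", "no", "yes",
                  "no", "yes", "no",
                  "yes", "no", "yes")
)

# Parse the m/d/Y strings and sort each customer's invoices chronologically
df$date <- as.Date(df$Date_of_Invoice, format = "%m/%d/%Y")
df <- df[order(df$ID, df$date), ]

# Hypothetical helper: "good" if a "yes" appears after at least one earlier "no"
score_customer <- function(paid) {
  unpaid_before <- cumsum(paid == "no") > 0  # TRUE once any unpaid invoice has been seen
  if (any(paid == "yes" & unpaid_before)) "good" else "bad"
}

# ave() needs one value per row, so repeat the customer's label over the group
df$score <- ave(df$paid_or_not, df$ID,
                FUN = function(v) rep(score_customer(v), length(v)))
df
```

On this data the score column comes out as "bad" for customer 1 and "good" for customers 2, 3 and 4, matching the expected output above. Note that customer 1's duplicated first row does not affect the result, because only the position of the first "no" matters.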
Answer 1:
The logic is not entirely clear. Perhaps we can sort each ID's invoices by date, group by 'ID', and check whether 'yes' appears in any row except the first:
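library(dplyr)
library(lubridate)

df1 %>%
  # parse the "m/d/Y" strings into Date objects so sorting is chronological
  mutate(Date_of_Invoice = mdy(Date_of_Invoice)) %>%
  arrange(ID, Date_of_Invoice) %>%
  group_by(ID) %>%
  # "good" if any invoice after the earliest one was paid, otherwise "bad"
  mutate(flag = c('bad', 'good')[1 + any(paid_or_not[-1] == "yes")])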
# A tibble: 9 x 5
# Groups: ID [3]
# ID Invoice Date_of_Invoice paid_or_not flag
# <int> <int> <date> <chr> <chr>
#1 1 1 2019-09-30 no good
#2 1 2 2019-10-30 no good
#3 1 3 2019-11-30 yes good
#4 2 1 2019-10-31 yes bad
#5 2 2 2019-11-30 no bad
#6 2 3 2019-12-31 no bad
#7 3 1 2019-07-31 no good
#8 3 2 2019-09-30 yes good
#9 3 3 2019-12-31 no good
data
df1 <- structure(list(ID = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L), Invoice = c(1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), Date_of_Invoice = c("09/30/2019",
"10/30/2019", "11/30/2019", "10/31/2019", "11/30/2019", "12/31/2019",
"7/31/2019", "9/30/2019", "12/31/2019"), paid_or_not = c("no",
"no", "yes", "yes", "no", "no", "no", "yes", "no")), class = "data.frame", row.names = c(NA,
-9L))
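The c('bad', 'good')[1 + any(...)] idiom works because TRUE and FALSE are coerced to 1 and 0, so the index is 2 ("good") or 1 ("bad"). A small illustration, using a hypothetical wrapper flag_any_later_paid around the same expression. The last call uses customer 1 from the question's table, whose first paid row appears twice: this check returns "good" there, while the question's expected output says "bad".

```r
# Hypothetical wrapper around the expression used in Answer 1
flag_any_later_paid <- function(paid) {
  # paid[-1] drops the earliest invoice; TRUE/FALSE become 1/0 in the index
  c("bad", "good")[1 + any(paid[-1] == "yes")]
}

flag_any_later_paid(c("no", "no", "yes"))         # [1] "good"  (customer 1 in Answer 1's data)
flag_any_later_paid(c("yes", "no", "no"))         # [1] "bad"   (customer 2 in Answer 1's data)
flag_any_later_paid(c("yes", "yes", "no", "no"))  # [1] "good"  (customer 1 in the question's table)
```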
Answer 2:
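Assuming the rows are already sorted by `Date of Invoice.` within each ID, here is a base R solution using ave():
# within each ID: "bad" if the "yes" is on the first invoice, otherwise "good" (assumes exactly one "yes" per ID)
df$`good or bad.` <- ave(df$`paid or not.`, df$ID., FUN = function(v) ifelse(which(v == "yes") == 1, "bad", "good"))
which gives
> df
ID. Invoice. Date of Invoice. paid or not. good or bad.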
1 1 1 09/30/2019 no good
2 1 2 10/30/2019 no good
3 1 3 11/30/2019 yes good
4 2 1 10/31/2019 yes bad
5 2 2 11/30/2019 no bad
6 2 3 12/31/2019 no bad
7 3 1 7/31/2019 no good
8 3 2 9/30/2019 yes good
9 3 3 12/31/2019 no good
DATA
df <- structure(list(ID. = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L), Invoice. = c(1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), `Date of Invoice.` = c("09/30/2019",
"10/30/2019", "11/30/2019", "10/31/2019", "11/30/2019", "12/31/2019",
"7/31/2019", "9/30/2019", "12/31/2019"), `paid or not.` = c("no",
"no", "yes", "yes", "no", "no", "no", "yes", "no")), class = "data.frame", row.names = c(NA,
-9L))
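The which(v == "yes") == 1 test assumes each customer has exactly one "yes". For a customer with no "yes", ifelse() returns a zero-length vector and ave() stops with a "replacement has length zero" error; with several "yes" values the labels get recycled across the group. A sketch that always returns one label per group, under the same assumption that rows are sorted by date. The helper label_customer and the unpaid customer 4 are hypothetical additions for illustration.

```r
# Answer 2's ID. and paid or not. columns, plus a hypothetical customer 4
# who has paid nothing (the original ifelse() version fails on this group)
df <- data.frame(
  ID. = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L),
  `paid or not.` = c("no", "no", "yes", "yes", "no", "no",
                     "no", "yes", "no", "no", "no"),
  check.names = FALSE
)

# Hypothetical helper: "good" if any invoice after the first is paid, else "bad".
# Agrees with Answer 2 whenever a customer has exactly one "yes".
label_customer <- function(v) {
  label <- if (any(which(v == "yes") > 1)) "good" else "bad"
  rep(label, length(v))  # one value per row so ave() can assign it back
}

df$`good or bad.` <- ave(df$`paid or not.`, df$ID., FUN = label_customer)
df
```

Customers 1 and 3 are labelled "good", customer 2 "bad", and the unpaid customer 4 "bad" instead of causing an error.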
Source: https://stackoverflow.com/questions/60121743/generate-a-new-variable-based-on-the-aging-of-another-variable