Generate a new variable based on the aging of another variable

Question

I have a dataset that looks like this:

ID. Invoice. Date of Invoice.  paid or not.  

1    1         10/31/2019       yes
1    1         10/31/2019       yes
1    2         11/30/2019       no
1    3         12/31/2019       no

2    1         09/30/2019       no
2    2         10/30/2019       no
2    3         11/30/2019       yes

3    1         7/31/2019        no
3    2         9/30/2019        yes
3    3         12/31/2019       no

4    1         7/31/2019        yes
4    2         9/30/2019        no
4    3         12/31/2019       yes

I would like to gauge each customer's willingness to pay. If a customer has paid a newer invoice while an older invoice is still unpaid, I give that customer a "good" score; otherwise the score is "bad". So customers 2, 3, and 4 get "good" and customer 1 gets "bad".

The final data will have one more column, with the values good and bad:

ID. Invoice. Date of Invoice. paid or not. Bad or good

1    1         10/31/2019       yes          bad
1    1         10/31/2019       yes          bad
1    2         11/30/2019       no           bad
1    3         12/31/2019       no           bad

2    1         09/30/2019       no           good
2    2         10/30/2019       no           good
2    3         11/30/2019       yes          good

3    1         7/31/2019        no           good
3    2         9/30/2019        yes          good
3    3         12/31/2019       no           good

4    1         7/31/2019        yes          good
4    2         9/30/2019        no           good
4    3         12/31/2019       yes          good

Answer 1:

The logic is not entirely clear. One option is to sort by date, group by 'ID', and check whether any row other than the first in each group is 'yes':

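library(dplyr)
library(lubridate)
df1 %>%
   # parse the month/day/year strings so that sorting is chronological
   mutate(Date_of_Invoice = mdy(Date_of_Invoice)) %>%
   arrange(ID, Date_of_Invoice) %>%
   group_by(ID) %>%
   # TRUE + 1 selects 'good', FALSE + 1 selects 'bad'
   mutate(flag = c('bad', 'good')[1 + any(paid_or_not[-1] == "yes")])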
# A tibble: 9 x 5
# Groups:   ID [3]
#     ID Invoice Date_of_Invoice paid_or_not flag 
#  <int>   <int> <date>          <chr>       <chr>
#1     1       1 2019-09-30      no          good 
#2     1       2 2019-10-30      no          good 
#3     1       3 2019-11-30      yes         good 
#4     2       1 2019-10-31      yes         bad  
#5     2       2 2019-11-30      no          bad  
#6     2       3 2019-12-31      no          bad  
#7     3       1 2019-07-31      no          good 
#8     3       2 2019-09-30      yes         good 
#9     3       3 2019-12-31      no          good

data

df1 <- structure(list(ID = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L), Invoice = c(1L, 
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), Date_of_Invoice = c("09/30/2019", 
"10/30/2019", "11/30/2019", "10/31/2019", "11/30/2019", "12/31/2019", 
"7/31/2019", "9/30/2019", "12/31/2019"), paid_or_not = c("no", 
"no", "yes", "yes", "no", "no", "no", "yes", "no")), class = "data.frame", row.names = c(NA, 
-9L))
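
If the rule is read strictly, so that a customer is "good" only when some paid invoice comes after at least one unpaid one, the check above can be tightened. The following is a minimal sketch under that assumption, not part of the original answer. The names `df2` and `res` are hypothetical; customer 4 is taken from the question's table, and customer 5, who paid every invoice, is invented here to show where the two readings differ.

```r
library(dplyr)
library(lubridate)

# Hypothetical example data: customers 1-3 as in df1, customer 4 from the
# question's table, and an invented customer 5 who paid every invoice.
df2 <- data.frame(
  ID = rep(1:5, each = 3),
  Invoice = rep(1:3, times = 5),
  Date_of_Invoice = c("09/30/2019", "10/30/2019", "11/30/2019",
                      "10/31/2019", "11/30/2019", "12/31/2019",
                      "7/31/2019", "9/30/2019", "12/31/2019",
                      "7/31/2019", "9/30/2019", "12/31/2019",
                      "7/31/2019", "9/30/2019", "12/31/2019"),
  paid_or_not = c("no", "no", "yes",
                  "yes", "no", "no",
                  "no", "yes", "no",
                  "yes", "no", "yes",
                  "yes", "yes", "yes"),
  stringsAsFactors = FALSE
)

res <- df2 %>%
  mutate(Date_of_Invoice = mdy(Date_of_Invoice)) %>%
  arrange(ID, Date_of_Invoice) %>%
  group_by(ID) %>%
  # cumany() becomes TRUE at the first unpaid invoice and stays TRUE, so the
  # condition holds only for a paid invoice that follows an unpaid one.
  mutate(flag = if_else(any(paid_or_not == "yes" & cumany(paid_or_not == "no")),
                        "good", "bad")) %>%
  ungroup()

# One row per customer with the resulting score
distinct(res, ID, flag)
```

Under this reading customers 1, 3, and 4 are scored "good" and customers 2 and 5 "bad". The check on `paid_or_not[-1]` would instead score customer 5 "good", because it only asks whether any invoice after the first is paid.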

Answer 2:

Assuming the rows are already ordered by 'Date of Invoice.' within each ID, here is a base R solution using ave:

# "good" if any invoice after the first is paid, otherwise "bad"; any() keeps
# this safe when a customer has no paid invoice or several paid invoices
df$`good or bad.` <- ave(df$`paid or not.`, df$ID., FUN = function(v) if (any(which(v == "yes") > 1)) "good" else "bad")

such that

> df
  ID. Invoice. Date of Invoice. paid or not. good or bad.
1   1        1       09/30/2019           no         good
2   1        2       10/30/2019           no         good
3   1        3       11/30/2019          yes         good
4   2        1       10/31/2019          yes          bad
5   2        2       11/30/2019           no          bad
6   2        3       12/31/2019           no          bad
7   3        1        7/31/2019           no         good
8   3        2        9/30/2019          yes         good
9   3        3       12/31/2019           no         good

DATA

df <- structure(list(ID. = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L), Invoice. = c(1L, 
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), `Date of Invoice.` = c("09/30/2019", 
"10/30/2019", "11/30/2019", "10/31/2019", "11/30/2019", "12/31/2019", 
"7/31/2019", "9/30/2019", "12/31/2019"), `paid or not.` = c("no", 
"no", "yes", "yes", "no", "no", "no", "yes", "no")), class = "data.frame", row.names = c(NA, 
-9L))
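
The `ave` approach depends on the rows already being in date order within each customer. If that cannot be assumed, the dates can be parsed and the rows sorted first. Below is a minimal base R sketch, assuming the dates are always written as month/day/year. The names `df_shuffled`, `inv_date`, `sorted`, and `score` are hypothetical and introduced only for illustration.

```r
# Same nine invoices as in df, with syntactic column names and the rows
# deliberately shuffled to show that the input order does not matter.
df_shuffled <- data.frame(
  ID = c(2L, 1L, 3L, 1L, 2L, 3L, 1L, 3L, 2L),
  Invoice = c(3L, 3L, 2L, 1L, 1L, 3L, 2L, 1L, 2L),
  Date_of_Invoice = c("12/31/2019", "11/30/2019", "9/30/2019",
                      "09/30/2019", "10/31/2019", "12/31/2019",
                      "10/30/2019", "7/31/2019", "11/30/2019"),
  paid_or_not = c("no", "yes", "yes", "no", "yes", "no", "no", "no", "no"),
  stringsAsFactors = FALSE
)

# Parse the strings as dates so that sorting is chronological, not alphabetical.
df_shuffled$inv_date <- as.Date(df_shuffled$Date_of_Invoice, format = "%m/%d/%Y")
sorted <- df_shuffled[order(df_shuffled$ID, df_shuffled$inv_date), ]

# "good" if any invoice after the customer's earliest one is paid, else "bad".
sorted$score <- ave(sorted$paid_or_not, sorted$ID,
                    FUN = function(v) if (any(v[-1] == "yes")) "good" else "bad")
sorted
```

Sorting on the parsed `inv_date` rather than on the raw strings matters here, because "7/31/2019" and "09/30/2019" do not sort chronologically as text.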

Source: https://stackoverflow.com/questions/60121743/generate-a-new-variable-based-on-the-aging-of-another-variable

Tags

dplyr

gdata