Question
I have a dataset with an account number and a "days past due" status for every observation. For each account number, as soon as the "days past due" column hits a code like "DLQ3", I want to remove the rest of the rows for that account (even if DLQ3 is the first observation for that account).
My dataset looks like:
Obs_month Acc_No OS_Bal Days_past_due
201005 2000000031 3572.68 NORM
201006 2000000031 4036.78 NORM
200810 2000000049 39741.97 NORM
200811 2000000049 38437.54 DLQ3
200812 2000000049 23923.98 DLQ1
200901 2000000049 35063.88 NORM
So, for account 2000000049, I want to remove all the rows after 200811 (that is, 200812 and 200901), since the account is in default from that point on.
In short, I want to find the first observation where each account hits DLQ3 and remove all of that account's rows after it.
What I tried was to subset the data to the DLQ3 observations, sort them by observation month in ascending order, and build a unique list of the account numbers that hit DLQ3 together with the first month in which they did. After that I thought I could do a left_join with the original data and use ifelse, but the flow is dicey.
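For reference, the attempt described above could be sketched in base R roughly as follows. This is only a sketch under assumptions: it uses merge() in place of left_join, the names first_dlq3 and First_DLQ3 are illustrative, and it assumes at least one DLQ3 row exists (aggregate() fails on an empty subset).

```r
# Example data from the question
data <- read.table(text = "Obs_month Acc_No OS_Bal Days_past_due
201005 2000000031 3572.68 NORM
201006 2000000031 4036.78 NORM
200810 2000000049 39741.97 NORM
200811 2000000049 38437.54 DLQ3
200812 2000000049 23923.98 DLQ1
200901 2000000049 35063.88 NORM", header = TRUE, stringsAsFactors = FALSE)

# First month in which each account shows DLQ3
# (assumes at least one DLQ3 row exists)
dlq3 <- data[data$Days_past_due == "DLQ3", ]
first_dlq3 <- aggregate(Obs_month ~ Acc_No, data = dlq3, FUN = min)
names(first_dlq3)[2] <- "First_DLQ3"  # illustrative column name

# Left join: accounts that never hit DLQ3 get NA in First_DLQ3
joined <- merge(data, first_dlq3, by = "Acc_No", all.x = TRUE)

# Keep rows up to and including the first DLQ3 month;
# accounts without any DLQ3 keep all of their rows
keep <- is.na(joined$First_DLQ3) | joined$Obs_month <= joined$First_DLQ3
result <- joined[keep, names(data)]
result <- result[order(result$Acc_No, result$Obs_month), ]
```

Comparing Obs_month values directly works here because the YYYYMM format sorts chronologically when read as a number.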
Answer 1:
The following function scans your data frame account by account and finds the first row carrying the DLQ3 tag. It then removes all rows for that account number that occur after that row. It relies on each account's rows already being in chronological order.
scan_table <- function(data_frame, due_column, acct_column, due_tag) {
  keep <- rep(TRUE, nrow(data_frame))
  for (acct in unique(data_frame[[acct_column]])) {
    # row positions of this account, in the order they appear
    rows <- which(data_frame[[acct_column]] == acct)
    # positions within rows where the tag occurs
    hit <- which(data_frame[[due_column]][rows] == due_tag)
    if (length(hit) > 0) {
      # remove rows past the first tag, for this account only
      keep[rows[seq_along(rows) > hit[1]]] <- FALSE
    }
  }
  data_frame[keep, , drop = FALSE]
}
Usage, assuming df holds the data frame from the question:
scan_table(df, 'Days_past_due', 'Acc_No', 'DLQ3')
Let me know if you wanted something different.
Answer 2:
Given your example:
data <- read.table(text=
"Obs_month Acc_No OS_Bal Days_past_due
201005 2000000031 3572.68 NORM
201006 2000000031 4036.78 NORM
200810 2000000049 39741.97 NORM
200811 2000000049 38437.54 DLQ3
200812 2000000049 23923.98 DLQ1
200901 2000000049 35063.88 NORM", stringsAsFactors = FALSE, header = TRUE)
First, sort it by account number and observation month:
data <- data[with(data, order(Acc_No, Obs_month)), ]
Then define a function that lets you set the code indicating expiry ("DLQ3" or "DLQ1" in your example):
sbst <- function(data, pattern){
  # position of the first row whose status matches the given code
  indx <- which(data$Days_past_due %in% pattern)[1]
  if (is.na(indx)) {
    # the account never hits the code: keep every row
    return(data)
  }
  data[seq_len(indx), ]
}
Finally, apply the function to each account and combine the resulting list of data frames into the final data frame:
Reduce(rbind, lapply(split(data, data$Acc_No), sbst, pattern = "DLQ3"))
# Obs_month Acc_No OS_Bal Days_past_due
#1 201005 2000000031 3572.68 NORM
#2 201006 2000000031 4036.78 NORM
#3 200810 2000000049 39741.97 NORM
#4 200811 2000000049 38437.54 DLQ3
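The same per-account truncation can also be sketched without the split/lapply step, using ave() with cumsum() to count how many DLQ3 rows precede each row within its account. This is an alternative technique, not part of the answer above. The names is_hit, hits_before and trimmed are illustrative, and the rows are assumed to be sorted by account and month first.

```r
# Example data from the question
data <- read.table(text = "Obs_month Acc_No OS_Bal Days_past_due
201005 2000000031 3572.68 NORM
201006 2000000031 4036.78 NORM
200810 2000000049 39741.97 NORM
200811 2000000049 38437.54 DLQ3
200812 2000000049 23923.98 DLQ1
200901 2000000049 35063.88 NORM", header = TRUE, stringsAsFactors = FALSE)

# Sort so that each account's rows are in chronological order
data <- data[order(data$Acc_No, data$Obs_month), ]

# 1 where the status equals the code, 0 elsewhere
is_hit <- as.integer(data$Days_past_due == "DLQ3")

# Within each account, count the hits seen strictly before each row
hits_before <- ave(is_hit, data$Acc_No, FUN = function(x) cumsum(x) - x)

# Keep a row only if no hit precedes it; the first DLQ3 row itself is kept
trimmed <- data[hits_before == 0, ]
```

Subtracting x from the running total is what keeps the first DLQ3 row: its own hit is excluded from the count, so only the rows after it are dropped.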
Source: https://stackoverflow.com/questions/48219968/how-to-remove-rows-after-a-particular-observation-is-seen-for-the-first-time