How to remove rows after a particular observation is seen for the first time

问题

I have a dataset wherein I have account number and "days past due" with every observation. For every account number, as soon as the "days past due" column hits a code like "DLQ3", I want to remove rest of the rows for that account (even if DLQ3 is the first observation for that account).

My dataset looks like:

Obs_month  Acc_No       OS_Bal      Days_past_due
201005     2000000031   3572.68     NORM
201006     2000000031   4036.78     NORM
200810     2000000049   39741.97    NORM
200811     2000000049   38437.54    DLQ3
200812     2000000049   23923.98    DLQ1
200901     2000000049   35063.88    NORM

So, for account 2000000049, I want to remove all the rows post the date 200812 as now it's in default.

So in all, I want to see when the account hits DLQ3 and when it does I want to remove all the rows post the first DLQ3 observation.

What I tried was to subset the data with all DLQ3 observations and order the observation month in ascending order and getting an unique list of account number which have DLQ3 and their first month of hitting DLQ3. Post that I thought I could do some left_join with the orginal data and use ifelse but the flow is dicey.

回答1:

The following function will scan your data frame and find the row containing the DLQ3 tag. It will then remove all rows for that account number that occur after that tag.

scan_table <- function(data_frame, due_column, acct_column, due_tag) {
    for(i in 1:nrow(data_frame)) {
        if(data_frame[i,c(due_column)] == due_tag) {
            # remove rows past here, for this account 
            acct_num <- data_frame[i,c(acct_column)]
            top_frame <- data_frame[1:i,] # cut point
            sub_frame <- subset(data_frame, Acc_No != acct_num)
            final_frame <- unique(do.call('rbind', list(top_frame, sub_frame)))
            return(final_frame)
        }
    }
}

Example:

df

Usage:

scan_table(df, 'Days_past_due', 'Acc_No', 'DLQ3')

Let me know if you wanted something different.

回答2:

Given your example

data <- read.table(text=
"Obs_month  Acc_No       OS_Bal      Days_past_due
201005     2000000031   3572.68     NORM
201006     2000000031   4036.78     NORM
200810     2000000049   39741.97    NORM
200811     2000000049   38437.54    DLQ3
200812     2000000049   23923.98    DLQ1
200901     2000000049   35063.88    NORM", stringsAsFactors=F, header=T)

I will sort it

data <- data[with(data, order(Acc_No, Obs_month)), ]

and define a function that allows you to set the code indicating expiry ("DLQ3" or "DLQ1" from your example)

sbst <- function(data, pattern){
  if( all(data$Days_past_due %in% "NORM") == TRUE){
    return(data)} else{
      indx <- min(grep(1, match(data$Days_past_due, pattern, nomatch = 0)))
      data <- data[1:indx,]
      return(data)
    }
}

Finally, apply the function and aggregate the lists of data.frame into final data.frame

Reduce(rbind, lapply(split(data, data$Acc_No), sbst, patter="DLQ3"))
#  Obs_month     Acc_No   OS_Bal Days_past_due
#1    201005 2000000031  3572.68          NORM
#2    201006 2000000031  4036.78          NORM
#3    200810 2000000049 39741.97          NORM
#4    200811 2000000049 38437.54          DLQ3

来源：https://stackoverflow.com/questions/48219968/how-to-remove-rows-after-a-particular-observation-is-seen-for-the-first-time

标签

database

bigdata

data-modeling

finance