My objective is to predict the next 3 events of each id_num based on their previous events

限于喜欢 submitted on 2019-12-25 00:34:20

Question


I am new to data science and I am working on a model whose data looks roughly like the sample data shown below. However, in the original data there are many more id_num values and Events. My objective is to predict the next 3 events of each id_num based on their previous Events.

Please help me solve this, or suggest a method for solving it, using R.


Answer 1:


The simplest "prediction" is to assume that the sequence of letters will repeat for each id_num. I hope this is in line with what the OP understands by "prediction".

The code

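library(data.table)
# DT is the sample data.table defined in the "Data" section below.
# For each id_num, append the first 3 values of the repeated sequence.
DT[, .(Events = append(Events, head(rep(Events, 3L), 3L))), by = id_num]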

creates

    id_num Events
 1:      1      A
 2:      1      B
 3:      1      C
 4:      1      D
 5:      1      E
 6:      1      A
 7:      1      B
 8:      1      C
 9:      2      B
10:      2      E
11:      2      B
12:      2      E
13:      2      B
14:      3      E
15:      3      A
16:      3      E
17:      3      A
18:      3      E
19:      3      A
20:      3      E
21:      4      C
22:      4      C
23:      4      C
24:      4      C
25:      5      F
26:      5      G
27:      5      F
28:      5      G
29:      5      F
    id_num Events

data.table is used here because of its easy-to-use grouping syntax and because I'm familiar with it.

Explanation

For each id_num, the existing sequence of letters is replicated 3 times using rep(), which guarantees at least 3 values are available even when the group has only one event. Only the first 3 of these values are then taken using head(), and these 3 values are appended to the existing sequence for that id_num.
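The steps described above can be traced on a single group in base R. This is only a minimal sketch: the vector events is a stand-in for the Events of id_num 2 from the sample data, and the intermediate variable names are made up for illustration.

```r
# Events of one group (here: id_num 2 from the sample data)
events <- c("B", "E")

# Step 1: replicate the sequence 3 times so that at least 3 values exist
repeated <- rep(events, 3L)        # "B" "E" "B" "E" "B" "E"

# Step 2: keep only the first 3 values as the "predicted" events
next_three <- head(repeated, 3L)   # "B" "E" "B"

# Step 3: append the predictions to the existing sequence
predicted <- append(events, next_three)
print(predicted)                   # "B" "E" "B" "E" "B"
```

The result matches rows 9 to 13 of the output shown above for id_num 2.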

Some tuning

There are two possible optimisations:

  1. If the sequence of values is much longer than the number of values to predict n_pred, simply repeating the long sequence n_pred times is a waste.
  2. The call to append() can be avoided if the existing sequence is repeated one additional time and the result is truncated to the desired length.

So, the optimised code looks like:

n_pred <- 3L
DT[, .(Events = head(rep(Events, 1L + ceiling(n_pred / .N)), .N + n_pred)), by = id_num]

Note that .N is a special symbol in data.table syntax containing the number of rows in a group. head() now returns the original sequence plus the predicted values.
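As a cross-check, the optimised expression can be compared with a variant based on base R's rep_len(), which recycles a vector to a requested length. The rep_len() variant is not part of the original answer; it is offered here only as a possible alternative, and the names opt and alt are made up for this sketch. The data are copied from the Data section.

```r
library(data.table)

# Sample data, copied from the "Data" section of the answer
DT <- data.table(
  id_num = c(rep(1L, 5L), 2L, 2L, rep(3L, 4L), 4L, 5L, 5L),
  Events = c(LETTERS[1:5], "B", "E", rep(c("E", "A"), 2L), "C", "F", "G")
)
n_pred <- 3L

# Variant from the answer: repeat just often enough, then truncate with head()
opt <- DT[, .(Events = head(rep(Events, 1L + ceiling(n_pred / .N)), .N + n_pred)),
          by = id_num]

# Alternative (an assumption, not from the answer): recycle to the target length
alt <- DT[, .(Events = rep_len(Events, .N + n_pred)), by = id_num]

# Both variants are expected to yield the same table
print(all.equal(opt, alt))
```

For example, for id_num 1 the group has .N = 5 rows, so the sequence is repeated 1 + ceiling(3 / 5) = 2 times (10 values) and then cut to 5 + 3 = 8 values.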

Data

DT <- data.table(
  id_num = c(rep(1L, 5L), 2L, 2L, rep(3L, 4L), 4L, 5L, 5L),
  Events = c(LETTERS[1:5], "B", "E", rep(c("E", "A"), 2L), "C", "F", "G")
)
DT
    id_num Events
 1:      1      A
 2:      1      B
 3:      1      C
 4:      1      D
 5:      1      E
 6:      2      B
 7:      2      E
 8:      3      E
 9:      3      A
10:      3      E
11:      3      A
12:      4      C
13:      5      F
14:      5      G


Source: https://stackoverflow.com/questions/45250893/my-objective-is-to-predict-the-next-3-events-of-each-id-num-based-on-their-previ
