Question
I am new to data science and I am working on a model that looks roughly like the sample data shown below. However, in the original data there are many id_num and Events values. My objective is to predict the next 3 events of each id_num based on their previous Events.
Please help me solve this, or suggest a method for solving it, using R.
Answer 1:
The simplest "prediction" is to assume that the sequence of letters will repeat for each id_num. I hope this is in line with what the OP understands by "prediction".
The code
library(data.table)
DT[, .(Events = append(Events, head(rep(Events, 3L), 3L))), by = id_num]
creates
    id_num Events
 1:      1      A
 2:      1      B
 3:      1      C
 4:      1      D
 5:      1      E
 6:      1      A
 7:      1      B
 8:      1      C
 9:      2      B
10:      2      E
11:      2      B
12:      2      E
13:      2      B
14:      3      E
15:      3      A
16:      3      E
17:      3      A
18:      3      E
19:      3      A
20:      3      E
21:      4      C
22:      4      C
23:      4      C
24:      4      C
25:      5      F
26:      5      G
27:      5      F
28:      5      G
29:      5      F
    id_num Events
data.table is used here because of its easy-to-use grouping function and because I'm acquainted with it.
Explanation
For each id_num, the existing sequence of letters is replicated 3 times using rep() to ensure there are enough values to fill at least the next 3 positions. However, only the first 3 values are taken using head(). These 3 values are then appended to the existing sequence for each id_num.
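The three steps can be traced for a single group in base R. This is only an illustrative sketch: the variable names events, candidates, predicted and result are made up for the example, and the values are those of id_num 2 in the sample data.

# Events for one id_num (id_num 2 in the sample data)
events <- c("B", "E")

# Step 1: replicate the whole sequence 3 times so that at least
# 3 candidate values are available, even for a group of length 1
candidates <- rep(events, 3L)

# Step 2: keep only the first 3 candidates as the "predicted" events
predicted <- head(candidates, 3L)

# Step 3: append the predictions to the existing sequence
result <- append(events, predicted)
print(result)

Inside DT[, ..., by = id_num] the same expression is evaluated once per group, with Events holding only that group's values.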
Some tuning
There are two possible optimisations:
- If the sequence of values is much longer than the number of values to predict, n_pred, simply repeating the long sequence n_pred times is a waste.
- The call to append() can be avoided if the existing sequence is repeated one more time.
So, the optimised code looks like:
n_pred <- 3L
DT[, .(Events = head(rep(Events, 1L + ceiling(n_pred / .N)), .N + n_pred)), by = id_num]
Note that .N is a special symbol in data.table syntax containing the number of rows in a group. head() now returns the original sequence plus the predicted values.
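To see how the repetition count behaves, the per-group expression can be wrapped in a plain base R function. This is a sketch only: extend_sequence is a hypothetical helper, not part of data.table, and length(x) stands in for .N.

# Hypothetical helper: base R version of the per-group expression,
# with length(x) playing the role of .N
extend_sequence <- function(x, n_pred = 3L) {
  n <- length(x)
  # one copy for the original values, plus just enough copies
  # to supply n_pred additional values
  reps <- 1L + ceiling(n_pred / n)
  # keep the original n values followed by the n_pred predicted ones
  head(rep(x, reps), n + n_pred)
}

# n = 5: reps = 2, so rep() yields 10 values and head() keeps 8
print(extend_sequence(LETTERS[1:5]))

# n = 1: reps = 4, so rep() yields 4 values and head() keeps all 4
print(extend_sequence("C"))

With 5 existing values the sequence is repeated only twice, whereas the first version always triples it, so the saving grows with the length of the group.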
Data
DT <- data.table(
id_num = c(rep(1L, 5L), 2L, 2L, rep(3L, 4L), 4L, 5L, 5L),
Events = c(LETTERS[1:5], "B", "E", rep(c("E", "A"), 2L), "C", "F", "G")
)
DT
    id_num Events
 1:      1      A
 2:      1      B
 3:      1      C
 4:      1      D
 5:      1      E
 6:      2      B
 7:      2      E
 8:      3      E
 9:      3      A
10:      3      E
11:      3      A
12:      4      C
13:      5      F
14:      5      G
Source: https://stackoverflow.com/questions/45250893/my-objective-is-to-predict-the-next-3-events-of-each-id-num-based-on-their-previ