I'm trying to solve a machine learning problem. I have a specific dataset with a time-series element. For this problem I'm using a well-known Python library - sklea
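You can obtain the desired cross-validation splits without using one of sklearn's built-in splitters: build the (train, test) index pairs yourself and pass them as the cv argument. Here's an example: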
import numpy as np
from sklearn.svm import SVR
from sklearn.feature_selection import RFECV
# Generate some data.
N = 10
X_train = np.random.randn(N, 3)
y_train = np.random.randn(N)
# Define the splits.
idxs = np.arange(N)
cv_splits = [(idxs[:i], idxs[i:]) for i in range(1, N)]
# Create the RFE object and compute a cross-validated score.
svr = SVR(kernel="linear")
rfecv = RFECV(estimator=svr, step=1, cv=cv_splits)
rfecv.fit(X_train, y_train)
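To make the shape of these hand-built splits concrete, here is a minimal sketch that prints them for a small N (N = 5 is chosen only for illustration; the list comprehension is the same as above). Each split trains on an initial segment and tests on everything after it, so no training index ever comes after a test index.

```python
import numpy as np

N = 5  # small value chosen only for illustration
idxs = np.arange(N)

# Same construction as above: train on the first i samples, test on the rest.
cv_splits = [(idxs[:i], idxs[i:]) for i in range(1, N)]

for train, test in cv_splits:
    print(train.tolist(), test.tolist())
# [0] [1, 2, 3, 4]
# [0, 1] [2, 3, 4]
# [0, 1, 2] [3, 4]
# [0, 1, 2, 3] [4]
```

Note that the earliest splits train on very few samples; if that is a concern, starting the range at a larger value than 1 is one possible adjustment.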
In the meantime, a time-series splitter (TimeSeriesSplit) was added to the library: http://scikit-learn.org/stable/modules/cross_validation.html#time-series-split
Example from the doc:
>>> import numpy as np
>>> from sklearn.model_selection import TimeSeriesSplit
>>> X = np.array([[1, 2], [3, 4], [1, 2], [3, 4], [1, 2], [3, 4]])
>>> y = np.array([1, 2, 3, 4, 5, 6])
>>> tscv = TimeSeriesSplit(n_splits=3)
>>> print(tscv)
TimeSeriesSplit(n_splits=3)
>>> for train, test in tscv.split(X):
...     print("%s %s" % (train, test))
[0 1 2] [3]
[0 1 2 3] [4]
[0 1 2 3 4] [5]
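Since the cv argument of RFECV also accepts a splitter object, TimeSeriesSplit can be passed to it directly instead of a hand-built list of index pairs. The following is only a sketch under assumptions: the data is synthetic, the rows are assumed to be ordered in time, and the sizes (20 samples, 3 features, 3 splits) are arbitrary illustration values.

```python
import numpy as np
from sklearn.feature_selection import RFECV
from sklearn.model_selection import TimeSeriesSplit
from sklearn.svm import SVR

# Synthetic data; rows are assumed to be ordered in time.
rng = np.random.RandomState(0)
X_train = rng.randn(20, 3)
y_train = rng.randn(20)

# Each split trains on an initial segment and tests on the block that follows it.
tscv = TimeSeriesSplit(n_splits=3)

# Recursive feature elimination, scored with the time-ordered splits.
svr = SVR(kernel="linear")
rfecv = RFECV(estimator=svr, step=1, cv=tscv)
rfecv.fit(X_train, y_train)

print(rfecv.n_features_)  # number of features kept
print(rfecv.support_)     # boolean mask over the 3 input features
```

With random data the selected features carry no meaning; the point is only that the splitter plugs into RFECV the same way the manual splits do.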