sklearn: User defined cross validation for time series data

执念已碎 2021-02-14 10:52

I'm trying to solve a machine learning problem. I have a specific dataset with a time-series element. For this problem I'm using the well-known Python library - sklea

2 answers
  • 2021-02-14 11:45

    You can build the desired cross-validation splits by hand, without relying on sklearn's built-in splitters, and pass them as the cv argument. Here's an example:

    import numpy as np
    
    from sklearn.svm import SVR
    from sklearn.feature_selection import RFECV
    
    # Generate some data.
    N = 10
    X_train = np.random.randn(N, 3)
    y_train = np.random.randn(N)
    
    # Define the splits.
    idxs = np.arange(N)
    cv_splits = [(idxs[:i], idxs[i:]) for i in range(1, N)]
    
    # Create the RFE object and compute a cross-validated score.
    svr = SVR(kernel="linear")
    rfecv = RFECV(estimator=svr, step=1, cv=cv_splits)
    rfecv.fit(X_train, y_train)
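
    The splits above start with a single training sample and end with a single test sample, and scores such as R² are not well defined on one test point. A minimal sketch of a variant with a minimum training size and a fixed test horizon follows; the helper name `expanding_window_splits` and its parameters `min_train` and `horizon` are hypothetical, not part of sklearn:

```python
import numpy as np


def expanding_window_splits(n_samples, min_train=3, horizon=2):
    """Hypothetical helper: return (train_idx, test_idx) pairs in which
    every training index precedes every test index."""
    idxs = np.arange(n_samples)
    splits = []
    # The training window grows by one sample per split; the test window
    # is always the next `horizon` samples.
    for end in range(min_train, n_samples - horizon + 1):
        splits.append((idxs[:end], idxs[end:end + horizon]))
    return splits


cv_splits = expanding_window_splits(10)
for train_idx, test_idx in cv_splits:
    print(train_idx, test_idx)
```

    The resulting list can be passed as cv=cv_splits in the same way as in the example above.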
    
  • 2021-02-14 11:46

    Meanwhile this was added to the library: http://scikit-learn.org/stable/modules/cross_validation.html#time-series-split

    Example from the doc:

    >>> import numpy as np
    >>> from sklearn.model_selection import TimeSeriesSplit
    
    >>> X = np.array([[1, 2], [3, 4], [1, 2], [3, 4], [1, 2], [3, 4]])
    >>> y = np.array([1, 2, 3, 4, 5, 6])
    >>> tscv = TimeSeriesSplit(n_splits=3)
    >>> print(tscv)  
    TimeSeriesSplit(n_splits=3)
    >>> for train, test in tscv.split(X):
    ...     print("%s %s" % (train, test))
    [0 1 2] [3]
    [0 1 2 3] [4]
    [0 1 2 3 4] [5]
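
    To tie this back to the question's use case, the splitter object can be passed directly as the cv argument of an estimator that accepts one. The sketch below assumes synthetic data and reuses RFECV with a linear SVR as in the other answer; the data shape, the seed, and n_splits=3 are arbitrary choices for illustration:

```python
import numpy as np
from sklearn.feature_selection import RFECV
from sklearn.model_selection import TimeSeriesSplit
from sklearn.svm import SVR

# Synthetic data (an assumption for illustration): 30 time-ordered samples,
# 3 features, with the target driven mostly by the first feature.
rng = np.random.default_rng(0)
X = rng.standard_normal((30, 3))
y = X[:, 0] + 0.1 * rng.standard_normal(30)

# Each split trains on an initial segment and tests on the samples after it.
tscv = TimeSeriesSplit(n_splits=3)

rfecv = RFECV(estimator=SVR(kernel="linear"), step=1, cv=tscv)
rfecv.fit(X, y)

print(rfecv.n_features_)  # number of features selected
print(rfecv.support_)     # boolean mask of selected features
```

    Because TimeSeriesSplit never places a test index before a training index, the feature selection is scored only on data that comes after the data it was fitted on.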
    