Plotting decision boundary for High Dimension Data

忘掉有多难 2021-02-02 11:38

I am building a model for a binary classification problem where each data point has 300 dimensions (I am using 300 features). I am using a PassiveAg

1 answer
  • 2021-02-02 11:55

    One way is to impose a Voronoi tessellation on your 2D plot, i.e. color it based on proximity to the 2D data points (a different color for each predicted class label). See the recent paper by Migut et al., 2015.

    This is a lot easier than it sounds using a meshgrid and scikit-learn's KNeighborsClassifier (this is an end-to-end example with the Iris dataset; replace the first few lines with your own model/code):

    import numpy as np
    import matplotlib.pyplot as plt
    # import from the public scikit-learn modules; the private submodule paths
    # (e.g. sklearn.neighbors.classification) no longer exist in current releases
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.datasets import load_iris
    from sklearn.manifold import TSNE
    from sklearn.linear_model import LogisticRegression
    
    # replace the below by your data and model
    iris = load_iris()
    X,y = iris.data, iris.target
    X_Train_embedded = TSNE(n_components=2).fit_transform(X)
    print(X_Train_embedded.shape)  # (n_samples, 2)
    model = LogisticRegression(max_iter=1000).fit(X, y)  # higher max_iter so the solver can converge
    y_predicted = model.predict(X)
    # replace the above by your data and model
    
    # create meshgrid
    resolution = 100 # 100x100 background pixels
    X2d_xmin, X2d_xmax = np.min(X_Train_embedded[:,0]), np.max(X_Train_embedded[:,0])
    X2d_ymin, X2d_ymax = np.min(X_Train_embedded[:,1]), np.max(X_Train_embedded[:,1])
    xx, yy = np.meshgrid(np.linspace(X2d_xmin, X2d_xmax, resolution), np.linspace(X2d_ymin, X2d_ymax, resolution))
    
    # approximate Voronoi tessellation on a resolution x resolution grid using 1-NN:
    # each grid point takes the predicted label of its nearest embedded data point
    background_model = KNeighborsClassifier(n_neighbors=1).fit(X_Train_embedded, y_predicted)
    voronoiBackground = background_model.predict(np.c_[xx.ravel(), yy.ravel()])
    voronoiBackground = voronoiBackground.reshape((resolution, resolution))
    
    # plot: background colored by predicted class, points colored by true class
    plt.contourf(xx, yy, voronoiBackground)
    plt.scatter(X_Train_embedded[:,0], X_Train_embedded[:,1], c=y)
    plt.show()
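
    For the 300-feature binary case in the question, the same steps can be sketched as below. This is only an illustration under assumptions: the data is synthetic (make_classification), LogisticRegression stands in for whatever classifier you actually use, and the helper name voronoi_background is made up for this example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.manifold import TSNE
from sklearn.neighbors import KNeighborsClassifier


def voronoi_background(X2d, labels, resolution=100):
    """Label a resolution x resolution grid over X2d by its nearest 2D point (1-NN)."""
    xs = np.linspace(X2d[:, 0].min(), X2d[:, 0].max(), resolution)
    ys = np.linspace(X2d[:, 1].min(), X2d[:, 1].max(), resolution)
    xx, yy = np.meshgrid(xs, ys)
    knn = KNeighborsClassifier(n_neighbors=1).fit(X2d, labels)
    background = knn.predict(np.c_[xx.ravel(), yy.ravel()])
    return xx, yy, background.reshape(xx.shape)


# Synthetic stand-in for the asker's data: 200 points, 300 features, 2 classes.
X, y = make_classification(n_samples=200, n_features=300, n_informative=20,
                           random_state=0)

# Stand-in model; swap in your own fitted classifier here.
model = LogisticRegression(max_iter=1000).fit(X, y)
y_predicted = model.predict(X)

# Embed the 300-dimensional points in 2D, then color the plane by predicted label.
X2d = TSNE(n_components=2, random_state=0).fit_transform(X)
xx, yy, background = voronoi_background(X2d, y_predicted)
```

    The returned xx, yy and background can be passed to plt.contourf exactly as in the Iris example. Only the data and model lines change; the grid and 1-NN steps do not depend on the number of input features.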
    

    Note that rather than precisely plotting your decision boundary, this will just give you an estimate of roughly where the boundary should lie (especially in regions with few data points, the true boundary can deviate from this). It will draw a line between two data points belonging to different classes, but will place it in the middle (there is indeed guaranteed to be a decision boundary between those points in this case, but it does not necessarily have to be in the middle).
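
    A tiny sketch of that midpoint behaviour, using two made-up 2D points with different predicted labels (the coordinates and probe positions are illustrative only):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Two embedded points of different predicted classes, one unit apart on the x-axis.
points = np.array([[0.0, 0.0], [1.0, 0.0]])
labels = np.array([0, 1])
knn = KNeighborsClassifier(n_neighbors=1).fit(points, labels)

# Probe along the segment between them: the 1-NN label flips at x = 0.5,
# regardless of where the original model's boundary actually lies.
probe_x = np.array([0.1, 0.4, 0.6, 0.9])
probe = np.c_[probe_x, np.zeros_like(probe_x)]
predicted = knn.predict(probe)
print(predicted)  # [0 0 1 1]
```

    The background therefore only tells you that the boundary passes somewhere between the two points, not where.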

    There are also some experimental approaches to better approximate the true decision boundary, e.g. this one on GitHub.
