Linear Discriminant Analysis inverse transform

忘了有多久 2020-12-21 12:10

I am trying to use Linear Discriminant Analysis from the scikit-learn library to perform dimensionality reduction on my data, which has more than 200 features. But I could no

2 Answers
  • 2020-12-21 12:37

    The inverse of the LDA does not necessarily make sense because it loses a lot of information.

    For comparison, consider the PCA. Here we get a coefficient matrix that is used to transform the data. We can do dimensionality reduction by stripping rows from the matrix. To get the inverse transform, we first invert the full matrix and then remove the columns corresponding to the removed rows.

    The LDA does not give us a full matrix. We only get a reduced matrix that cannot be directly inverted. It is possible to take the pseudoinverse, but this gives a poorer reconstruction than if we had the full matrix at our disposal.

    Consider a simple example:

    import numpy as np
    
    C = np.ones((3, 3)) + np.eye(3)  # full transform matrix
    U = C[:2, :]  # dimensionality reduction matrix
    V1 = np.linalg.inv(C)[:, :2]  # PCA-style reconstruction matrix
    print(V1)
    #array([[ 0.75, -0.25],
    #       [-0.25,  0.75],
    #       [-0.25, -0.25]])
    
    V2 = np.linalg.pinv(U)  # LDA-style reconstruction matrix
    print(V2)
    #array([[ 0.63636364, -0.36363636],
    #       [-0.36363636,  0.63636364],
    #       [ 0.09090909,  0.09090909]])
    

    If we have the full matrix, we get a different inverse transform (V1) than if we simply invert the reduced transform (V2). That is because in the second case we lost all information about the discarded components.
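
    The difference between the two can be made concrete with a small check. This is only a sketch reusing C, U, V1 and V2 from the example above; the sample point `x` and the names `x_pca` and `x_lda` are made up for illustration. Both matrices undo the reduction in the sense that reducing a reconstruction gives back the same reduced coordinates, but neither brings back the original point:

```python
import numpy as np

C = np.ones((3, 3)) + np.eye(3)   # full transform matrix
U = C[:2, :]                      # reduced transform (last row dropped)
V1 = np.linalg.inv(C)[:, :2]      # reconstruction from the full inverse
V2 = np.linalg.pinv(U)            # reconstruction from the pseudoinverse

# Both are right inverses of U: reducing a reconstruction loses nothing more.
right_inverse_v1 = np.allclose(U @ V1, np.eye(2))
right_inverse_v2 = np.allclose(U @ V2, np.eye(2))

# Neither is a left inverse: reconstructing a reduced point does not return x.
x = np.array([1.0, 2.0, 3.0])     # arbitrary sample point (illustrative)
x_pca = V1 @ (U @ x)
x_lda = V2 @ (U @ x)

print(right_inverse_v1, right_inverse_v2)
print(x_pca, x_lda)
```

    Among all points that reduce to the same coordinates, the pseudoinverse picks the one with the smallest norm, which is why V2 differs from V1 even though both are consistent with U.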

    You have been warned. If you still want to do the inverse LDA transform, here is a function:

    import matplotlib.pyplot as plt
    import numpy as np
    
    from sklearn import datasets
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.utils import check_array
    from sklearn.utils.validation import check_is_fitted
    
    
    def inverse_transform(lda, x):
        if lda.solver == 'lsqr':
            raise NotImplementedError("(inverse) transform not implemented for 'lsqr' "
                                      "solver (use 'svd' or 'eigen').")
        check_is_fitted(lda, ['xbar_', 'scalings_'], all_or_any=any)
    
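        x = check_array(x)
    
        # transform() returns only the leading columns of scalings_ (the 'eigen'
        # solver stores one column per feature), so invert just those columns
        inv = np.linalg.pinv(lda.scalings_[:, :x.shape[1]])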
        if lda.solver == 'svd':
            x_back = np.dot(x, inv) + lda.xbar_
        elif lda.solver == 'eigen':
            x_back = np.dot(x, inv)
    
        return x_back
    
    
    iris = datasets.load_iris()
    
    X = iris.data
    y = iris.target
    target_names = iris.target_names
    
    lda = LinearDiscriminantAnalysis()
    Z = lda.fit(X, y).transform(X)
    
    Xr = inverse_transform(lda, Z)
    
    # plot first two dimensions of original and reconstructed data
    plt.plot(X[:, 0], X[:, 1], '.', label='original')
    plt.plot(Xr[:, 0], Xr[:, 1], '.', label='reconstructed')
    plt.legend()
    plt.show()
    

    You see, the result of the inverse transform does not have much to do with the original data (well, it's possible to guess the direction of the projection). A considerable part of the variation is gone for good.
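
    To put a number on this instead of reading it off the plot, here is a rough sketch under the same setup (iris, default 'svd' solver). It repeats the pseudoinverse step from the function above inline; `round_trip_ok`, `rank_reconstructed` and `rmse` are just illustrative names:

```python
import numpy as np
from sklearn import datasets
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = datasets.load_iris(return_X_y=True)

lda = LinearDiscriminantAnalysis(solver="svd")
Z = lda.fit(X, y).transform(X)    # 4 features reduced to 2 components

# Same pseudoinverse reconstruction as the 'svd' branch of inverse_transform.
Xr = Z @ np.linalg.pinv(lda.scalings_[:, :Z.shape[1]]) + lda.xbar_

# Projecting the reconstruction again returns the same reduced coordinates ...
round_trip_ok = np.allclose(lda.transform(Xr), Z)

# ... but the centered reconstruction only spans a 2-D subspace of the data.
rank_reconstructed = np.linalg.matrix_rank(Xr - lda.xbar_)

# Root-mean-square difference between original and reconstructed features.
rmse = np.sqrt(np.mean((X - Xr) ** 2))

print(round_trip_ok, rank_reconstructed, rmse)
```

    So the reconstruction is consistent with the projection, yet everything outside the two discriminant directions is missing from it.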

  • 2020-12-21 12:38

    There is no inverse transform because, in general, you cannot return from the lower-dimensional feature space to your original coordinate space.

    Think of it like looking at your 2-dimensional shadow projected on a wall. You can't get back to your 3-dimensional geometry from a single shadow because information is lost during the projection.
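
    The shadow analogy can be written down in a few lines of NumPy. This is only a toy sketch: the projection matrix `P` and the points `a` and `b` are made up for illustration and have nothing to do with LDA itself:

```python
import numpy as np

# Orthographic "shadow" on the x-y wall: drop the z coordinate.
P = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])

a = np.array([1.0, 2.0, 0.5])
b = np.array([1.0, 2.0, 7.0])   # same x and y, different depth

shadow_a = P @ a
shadow_b = P @ b
same_shadow = np.array_equal(shadow_a, shadow_b)

# The pseudoinverse maps the shadow back into 3-D, but it can only guess z = 0.
guess = np.linalg.pinv(P) @ shadow_a

print(same_shadow, guess)
```

    Two different points cast the same shadow, so no function of the shadow alone can tell them apart.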

    To address your comment regarding PCA, consider a data set of 10 random 3-dimensional vectors:

    In [1]: import numpy as np
    
    In [2]: from sklearn.decomposition import PCA
    
    In [3]: X = np.random.rand(30).reshape(10, 3)
    

    Now, what happens if we apply the Principal Components Transformation (PCT) and apply dimensionality reduction by keeping only the top 2 (out of 3) PCs, then apply the inverse transform?

    In [4]: pca = PCA(n_components=2)
    
    In [5]: pca.fit(X)
    Out[5]: 
    PCA(copy=True, iterated_power='auto', n_components=2, random_state=None,
      svd_solver='auto', tol=0.0, whiten=False)
    
    In [6]: Y = pca.transform(X)
    
    In [7]: X.shape
    Out[7]: (10, 3)
    
    In [8]: Y.shape
    Out[8]: (10, 2)
    
    In [9]: XX = pca.inverse_transform(Y)
    
    In [10]: X[0]
    Out[10]: array([ 0.95780971,  0.23739785,  0.06678655])
    
    In [11]: XX[0]
    Out[11]: array([ 0.87931369,  0.34958407, -0.01145125])
    

    Obviously, the inverse transform did not reconstruct the original data. The reason is that by dropping the lowest PC, we lost information. Next, let's see what happens if we retain all PCs (i.e., we do not apply any dimensionality reduction):

    In [12]: pca2 = PCA(n_components=3)
    
    In [13]: pca2.fit(X)
    Out[13]: 
    PCA(copy=True, iterated_power='auto', n_components=3, random_state=None,
      svd_solver='auto', tol=0.0, whiten=False)
    
    In [14]: Y = pca2.transform(X)
    
    In [15]: XX = pca2.inverse_transform(Y)
    
    In [16]: X[0]
    Out[16]: array([ 0.95780971,  0.23739785,  0.06678655])
    
    In [17]: XX[0]
    Out[17]: array([ 0.95780971,  0.23739785,  0.06678655])
    

    In this case, we were able to reconstruct the original data because we didn't throw away any information (since we retained all the PCs).
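
    How much is lost can be read off the singular values. The following is a minimal sketch that does the PCA by hand with a plain NumPy SVD (it is not scikit-learn's implementation, and the helper `reconstruct` is a made-up name): the squared reconstruction error with 2 of 3 components equals the squared singular value of the dropped component, and it vanishes when all components are kept.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((10, 3))           # 10 random 3-dimensional vectors

mean = X.mean(axis=0)
# Rows of Vt are the principal axes; s holds the singular values.
_, s, Vt = np.linalg.svd(X - mean, full_matrices=False)


def reconstruct(k):
    """Project onto the top-k principal axes and map back to 3-D."""
    Y = (X - mean) @ Vt[:k].T     # reduced coordinates, shape (10, k)
    return Y @ Vt[:k] + mean


err2 = np.sum((X - reconstruct(2)) ** 2)   # squared error with 2 of 3 PCs
err3 = np.sum((X - reconstruct(3)) ** 2)   # squared error with all 3 PCs

print(err2, s[2] ** 2, err3)
```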

    The situation with LDA is even worse because the maximum number of components that can be retained is not 200 (the number of features for your input data); rather, the maximum number of components you can retain is n_classes - 1. So if, for example, you were doing a binary classification problem (2 classes), the LDA transform would be going from 200 input dimensions down to just a single dimension.
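
    The n_classes - 1 limit comes from the between-class scatter matrix, which is built from the class means only. A small sketch with synthetic data (the sizes and the names `S_b`, `overall_mean` and `rank` are chosen here for illustration) shows that its rank does not exceed n_classes - 1, no matter how many features there are:

```python
import numpy as np

rng = np.random.default_rng(0)
n_classes, n_per_class, n_features = 3, 50, 10   # illustrative sizes

X = rng.normal(size=(n_classes * n_per_class, n_features))
y = np.repeat(np.arange(n_classes), n_per_class)
X += 3.0 * rng.normal(size=(n_classes, n_features))[y]   # shift each class

overall_mean = X.mean(axis=0)
S_b = np.zeros((n_features, n_features))
for k in range(n_classes):
    Xk = X[y == k]
    d = (Xk.mean(axis=0) - overall_mean)[:, None]
    S_b += len(Xk) * (d @ d.T)    # between-class scatter

# Explicit tolerance so that round-off is not counted as an extra direction.
rank = np.linalg.matrix_rank(S_b, tol=1e-8)

print(rank)
```

    The weighted deviations of the class means from the overall mean sum to zero, so they span at most n_classes - 1 directions, and those are the only directions LDA can use to separate the classes.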
