PCA projection and reconstruction in scikit-learn


予麋鹿 2021-01-30 05:43

I can perform PCA in scikit-learn with the code below. X_train has 279180 rows and 104 columns.

from sklearn.decomposition import PCA
pca = PCA(n_components=30)
X_train_pca = pca.fit_transform(X_train)


      
      
        
2 answers

情歌与酒 (original poster) 2021-01-30 06:24

You can do:

proj = pca.inverse_transform(X_train_pca)


That way you do not have to worry about how to do the multiplications.

What you obtain after pca.fit_transform or pca.transform is what is usually called the "loadings" of each sample (many texts call these the scores instead): the coefficients of the linear combination of the components_ (the principal axes in feature space) that best describes that sample.

The projection you are aiming at is back in the original signal space. This means that you need to go back into signal space using the components and the loadings.
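To make "linear combination of the components_" concrete, here is a small sketch that rebuilds a single sample by hand. The random data, the seed, and the names `Z` and `x_hat` are assumptions for illustration only; they are not from the question.

```python
import numpy as np
from sklearn.decomposition import PCA

# Illustrative random data; the shape is an assumption, not the asker's dataset.
rng = np.random.RandomState(0)
X = rng.randn(100, 50)

pca = PCA(n_components=30).fit(X)
Z = pca.transform(X)

# Each row of Z holds 30 coefficients, one per principal axis.
print(Z.shape)                # (100, 30)
print(pca.components_.shape)  # (30, 50)

# Rebuild the first sample as mean + sum_k Z[0, k] * components_[k].
x_hat = pca.mean_.copy()
for k in range(pca.n_components_):
    x_hat += Z[0, k] * pca.components_[k]

# The explicit loop matches inverse_transform for that sample.
assert np.allclose(x_hat, pca.inverse_transform(Z[:1])[0])
```

The loop is only there to show the sum term by term; in practice the matrix product does the same thing for all samples at once.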

So there are three steps to disambiguate here. Here you have, step by step, what you can do using the PCA object and how it is actually calculated:


pca.fit estimates the components (using an SVD on the centered X_train):

from sklearn.decomposition import PCA
import numpy as np
from numpy.testing import assert_array_almost_equal

# Random stand-in data: 100 samples, 50 features
X_train = np.random.randn(100, 50)

pca = PCA(n_components=30)
pca.fit(X_train)

# SVD of the centered data; the rows of VT are the principal axes
U, S, VT = np.linalg.svd(X_train - X_train.mean(0))

# scikit-learn fixes the sign of each component, so compare up to sign
assert_array_almost_equal(np.abs(VT[:30]), np.abs(pca.components_))
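If it helps, the same SVD also explains the other fitted attributes. The sketch below assumes random stand-in data and forces svd_solver="full" so the comparison with a plain NumPy SVD is like for like. Note that the rows of VT and components_ agree only up to sign, so the last check compares the absolute dot product with 1.

```python
import numpy as np
from sklearn.decomposition import PCA

# Illustrative random data (an assumption, not the asker's dataset).
rng = np.random.RandomState(0)
X = rng.randn(100, 50)
k = 30

pca = PCA(n_components=k, svd_solver="full").fit(X)
U, S, VT = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)

# Singular values of the centered data match the fitted attribute.
assert np.allclose(S[:k], pca.singular_values_)

# The variance along each axis is S**2 / (n_samples - 1).
assert np.allclose(S[:k] ** 2 / (X.shape[0] - 1), pca.explained_variance_)

# Each row pair is the same unit vector up to sign: |<v_i, c_i>| == 1.
dots = np.abs(np.sum(VT[:k] * pca.components_, axis=1))
assert np.allclose(dots, 1.0)
```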

pca.transform calculates the loadings as you describe:

X_train_pca = pca.transform(X_train)

X_train_pca2 = (X_train - pca.mean_).dot(pca.components_.T)

assert_array_almost_equal(X_train_pca, X_train_pca2)
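One detail that is easy to miss: when you transform data that was not used for fitting, the centering still uses the mean learned from the training set. A minimal sketch, where `X_new` is a hypothetical batch of unseen data with a deliberately shifted mean:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
X_train = rng.randn(100, 50)
X_new = rng.randn(20, 50) + 0.5  # hypothetical unseen data, shifted mean

pca = PCA(n_components=30).fit(X_train)

# transform centers with the mean learned from X_train ...
manual = (X_new - pca.mean_).dot(pca.components_.T)
assert np.allclose(pca.transform(X_new), manual)

# ... not with the mean of the new data itself.
wrong = (X_new - X_new.mean(axis=0)).dot(pca.components_.T)
assert not np.allclose(pca.transform(X_new), wrong)
```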

pca.inverse_transform maps the loadings back into signal space, which gives the projection onto the components that you are interested in:

X_projected = pca.inverse_transform(X_train_pca)
X_projected2 = X_train_pca.dot(pca.components_) + pca.mean_

assert_array_almost_equal(X_projected, X_projected2)
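Two properties of this round trip are worth checking for yourself. The sketch below uses random stand-in data (an assumption): projecting an already projected point changes nothing, and keeping every component makes the round trip exact.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
X = rng.randn(100, 50)

pca = PCA(n_components=30).fit(X)
X_projected = pca.inverse_transform(pca.transform(X))

# Projecting an already projected point changes nothing (idempotence).
X_twice = pca.inverse_transform(pca.transform(X_projected))
assert np.allclose(X_projected, X_twice)

# With all 50 components kept, nothing is discarded and the round trip is exact.
pca_full = PCA(n_components=50).fit(X)
X_roundtrip = pca_full.inverse_transform(pca_full.transform(X))
assert np.allclose(X, X_roundtrip)
```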



You can now evaluate the projection loss:

loss = ((X_train - X_projected) ** 2).mean()
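As a rough way to see how this loss depends on the number of components, the sketch below compares a few values and checks the loss against the discarded singular values. The helper `projection_loss` and the random data are hypothetical, introduced only for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
X = rng.randn(100, 50)

# Singular values of the centered data, largest first.
S = np.linalg.svd(X - X.mean(axis=0), compute_uv=False)

def projection_loss(X, k):
    """Mean squared reconstruction error when keeping k components."""
    pca = PCA(n_components=k, svd_solver="full").fit(X)
    X_projected = pca.inverse_transform(pca.transform(X))
    return ((X - X_projected) ** 2).mean()

losses = {k: projection_loss(X, k) for k in (10, 30, 50)}

# The summed squared error equals the energy in the discarded singular values.
assert np.isclose(losses[30], (S[30:] ** 2).sum() / X.size)

# More components means lower loss; keeping all 50 leaves essentially none.
assert losses[10] > losses[30] > losses[50]
assert np.isclose(losses[50], 0.0)
```

This loss is measured on the same data the PCA was fitted on, so it says nothing about how well unseen data would be reconstructed.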

    
             
                                                        
            
            
              
                