Question
This question duplicates an earlier one, which remains unanswered at the time of writing.
Why is the explained_variance_ratio_ from TruncatedSVD not in descending order, as it would be from PCA? In my experience the first element of the list is always the lowest; at the second element the value jumps up and then goes in descending order from there. Why is explained_variance_ratio_[0] < explained_variance_ratio_[1] (> explained_variance_ratio_[2] > explained_variance_ratio_[3] ...)? Does this mean the second "component" actually explains the most variance (not the first)?
Code to reproduce behavior:
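import numpy as np
from sklearn.decomposition import TruncatedSVD
n_components = 50
X_test = np.random.rand(50, 100)  # 50 samples, 100 features, values in [0, 1), not centered
model = TruncatedSVD(n_components=n_components, algorithm='randomized')
model.fit_transform(X_test)
model.explained_variance_ratio_  # first entry comes out smaller than the second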
Answer 1:
If you scale the data first, I think the explained variance ratios will be in descending order. TruncatedSVD, unlike PCA, does not center its input, and StandardScaler does that centering for it:
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.preprocessing import StandardScaler
n_components = 50
X_test = np.random.rand(50, 100)
# Center each feature to zero mean and scale it to unit variance
scaler = StandardScaler()
X_test = scaler.fit_transform(X_test)
model = TruncatedSVD(n_components=n_components, algorithm='randomized')
model.fit_transform(X_test)
model.explained_variance_ratio_  # now in descending order
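To see why centering seems to be the part that matters, here is a minimal NumPy-only sketch. It assumes the explained variance of a component is the variance of the data projected onto it, divided by the total per-feature variance, which is my understanding of how TruncatedSVD computes it. The helper projected_variance_ratio is a made-up name for illustration, not part of scikit-learn, and it uses an exact SVD rather than the randomized solver.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((50, 100))  # same shape as the example above, entries in [0, 1)


def projected_variance_ratio(X, k):
    """Hypothetical helper: variance of X projected onto its top-k right
    singular vectors, as a fraction of the total per-feature variance."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    projected = X @ Vt[:k].T  # shape (n_samples, k)
    return projected.var(axis=0) / X.var(axis=0).sum()


# Uncentered: the first singular vector mostly points along the mean of the
# rows, so the projection onto it has a large mean but a small variance.
raw = projected_variance_ratio(X, 5)

# Centered: every projection has zero mean, so its variance is the squared
# singular value divided by n_samples, and singular values come back sorted.
centered = projected_variance_ratio(X - X.mean(axis=0), 5)

print("uncentered:", raw)
print("centered:  ", centered)
```

On the uncentered data the first ratio comes out smaller than the second, while on the centered data the ratios decrease. So the first component still has the largest singular value; it just explains little variance, because most of what it captures is the mean rather than the spread around it.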
Source: https://stackoverflow.com/questions/54411576/sci-kit-learn-truncatedsvd-explained-variance-ratio-not-in-descending-order