Question
I'm training a model in PyTorch and I want to use a truncated SVD decomposition of the input. To compute the SVD, I transfer the input, which is a PyTorch CUDA tensor, to the CPU, perform the truncation with TruncatedSVD from scikit-learn, and then transfer the result back to the GPU. The following is the code for my model:
import torch
import torch.nn as nn
from sklearn.decomposition import TruncatedSVD

class ImgEmb(nn.Module):
    def __init__(self, input_size, hidden_size):
        super(ImgEmb, self).__init__()
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.drop = nn.Dropout(0.2)
        # Integer division: layer sizes and n_components must be ints
        self.mlp = nn.Linear(input_size // 2, hidden_size)
        self.relu = nn.Tanh()
        self.svd = TruncatedSVD(n_components=input_size // 2)

    def forward(self, input):
        # Round trip GPU -> CPU -> GPU on every forward pass
        svd = self.svd.fit_transform(input.cpu())
        svd_tensor = torch.from_numpy(svd)
        svd_tensor = svd_tensor.cuda()
        mlp = self.mlp(svd_tensor)
        res = self.relu(mlp)
        return res
Is there a way to implement truncated SVD without transferring the data back and forth between the GPU and the CPU? (The transfer is very time-consuming and not efficient at all.)
Answer 1:
You could directly use PyTorch's SVD and truncate it manually, or you can use the truncated SVD from TensorLy, with the PyTorch backend:
import tensorly as tl
tl.set_backend('pytorch')
U, S, V = tl.truncated_svd(matrix, n_eigenvecs=10)
However, the GPU SVD does not scale very well to large matrices. You can also use TensorLy's partial SVD, which still copies your input to the CPU but is much faster if you keep only a few eigenvalues, because it uses a sparse eigendecomposition. In scikit-learn's TruncatedSVD, you can also set algorithm='arpack' to use SciPy's sparse SVD, which again might be faster if you only need a few components.
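The first option mentioned in this answer, using PyTorch's own SVD and truncating it manually, could look roughly like the sketch below. The function name truncated_svd_transform, the matrix shape, and k are illustrative assumptions and not part of the original question. The function returns U_k * S_k, which corresponds to what scikit-learn's fit_transform produces (up to the sign of each component).

```python
import torch

def truncated_svd_transform(x: torch.Tensor, k: int) -> torch.Tensor:
    """Project x onto its top-k singular directions (hypothetical helper)."""
    # Thin SVD, computed on whatever device x lives on: x = U @ diag(S) @ Vh
    U, S, Vh = torch.linalg.svd(x, full_matrices=False)
    # Singular values come back in descending order, so slicing keeps the top k
    return U[:, :k] * S[:k]

# Assumed example data: a batch of 32 samples with 16 features
device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.randn(32, 16, device=device)
reduced = truncated_svd_transform(x, k=8)  # shape (32, 8), same device as x
```

This still computes the full thin SVD before discarding components, so the scaling caveat above applies. If only a few components are needed, torch.svd_lowrank(x, q=k) is a randomized, approximate alternative that also runs on the GPU.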
Answer 2:
How do you transfer a CUDA tensor to the CPU?
If you have a CUDA tensor, you can transfer it to the CPU and convert it to a NumPy array with the instruction below, where y_val is a PyTorch tensor on the GPU:
y_val = y_val.detach().cpu().numpy()
Source: https://stackoverflow.com/questions/58026949/truncate-svd-decomposition-of-pytorch-tensor-without-transfering-to-cpu