Question
I would like to build a torch.nn.Embedding from tensors on Databricks (the node type is p2.8xlarge), using Python 3.
My code:
import numpy as np
import torch
from torch import nn
num_embedding, num_dim = 14000, 300
embedding = nn.Embedding(num_embedding, num_dim)
row, col = 800000, 302
t = [[x for x in range(col)] for _ in range(row)]
t1 = torch.tensor(t)
print(t1.shape) # torch.Size([800000, 302])
t1.dtype, t1.nelement() # torch.int64, 241600000
type(t1), t1.device, (t1.nelement() * t1.element_size())/(1024**3) # (torch.Tensor, device(type='cpu'), 1.8000602722167969)
tt = embedding(t1) # error [enforce fail at CPUAllocator.cpp:64] . DefaultCPUAllocator: can't allocate memory: you tried to allocate 288,000,000,000 bytes. Error code 12 (Cannot allocate memory)
t2 = t1.cuda()
t2.device, t2.shape, t2.grad, t2.nelement(), t2.element_size(), (t2.nelement() * t2.element_size())/(1024**3) # (device(type='cuda', index=0), torch.Size([800000, 302]), None, 241600000, 8, 1.8000602722167969)
embedding_cuda = embedding.cuda()
embedding_cuda(t2) # CUDA out of memory. Tried to allocate 270.01 GiB (GPU 0; 11.17 GiB total capacity; 7.16 GiB already allocated; 2.01 GiB free; 8.88 GiB reserved in total by PyTorch)
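To make the figures from the two errors easier to compare, I converted everything to the same unit (1 GiB = 1024**3 bytes); the byte count below is copied from the DefaultCPUAllocator message above:
print(288000000000 / 1024**3)   # ~268.2 GiB requested on CPU
print(270.01 * 1024**3)         # ~2.9e11 bytes requested on GPU
print(241600000 * 8 / 1024**3)  # ~1.8 GiB, the int64 input tensor itself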
I do not understand this: the input tensor itself is less than 2 GB (about 1.8 GB), so why does the allocation fail on both CPU and GPU? Why do the CPU and the GPU have to allocate something as large as 270.01 GiB?
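The only way I can make the 270.01 GiB figure add up is if embedding(t1) had to materialize a float32 output of shape (row, col, num_dim), i.e. one 300-dimensional vector per index. A rough sketch of that guess (I may be misreading how nn.Embedding works):
row, col, num_dim = 800000, 302, 300   # same values as above
out_bytes = row * col * num_dim * 4    # float32 output of shape (row, col, num_dim), if my guess is right
print(out_bytes, out_bytes / 1024**3)  # 289920000000 bytes, ~270.01 GiB
If that is what happens it would explain the number, but I did not expect the output to be so much larger than the 1.8 GB input.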
Did I miss something?
Thanks.
Source: https://stackoverflow.com/questions/64869711/pytorch-allocate-memory-for-small-size-tensor-on-cpu-and-gpu-but-got-error-on-a