PyTorch: allocating memory for a small tensor fails on both CPU and GPU on a node with more than 400 GB

Submitted by 南笙酒味 on 2021-02-11 14:57:38

Question


I would like to build a torch.nn.Embedding with tensors on Databricks (the node is a p2.8xlarge) using Python 3.

My code:

  import numpy as np
  import torch
  from torch import nn

  num_embedding, num_dim = 14000, 300
  embedding = nn.Embedding(num_embedding, num_dim)
  row, col = 800000, 302
  t = [[x for x in range(col)] for _ in range(row)] 
  
  t1 = torch.tensor(t)
  print(t1.shape) # torch.Size([800000, 302])
  
  t1.dtype, t1.nelement() # torch.int64, 241600000

  type(t1), t1.device, (t1.nelement() * t1.element_size())/(1024**3) # (torch.Tensor, device(type='cpu'), 1.8000602722167969)

  tt = embedding(t1) # error [enforce fail at CPUAllocator.cpp:64] . DefaultCPUAllocator: can't allocate memory: you tried to allocate 288,000,000,000 bytes. Error code 12 (Cannot allocate memory)

  t2 = t1.cuda()
  t2.device, t2.shape, t2.grad, t2.nelement(), t2.element_size(), (t2.nelement() * t2.element_size())/(1024**3) # (device(type='cuda', index=0), torch.Size([800000, 302]), None, 241600000, 8, 1.8000602722167969)

  embedding_cuda = embedding.cuda()
  embedding_cuda(t2) #  CUDA out of memory. Tried to allocate 270.01 GiB (GPU 0; 11.17 GiB total capacity; 7.16 GiB already allocated; 2.01 GiB free; 8.88 GiB reserved in total by PyTorch)

I do not understand: the given tensor is less than 2 GB (1.8 GB), so why can it not be allocated on either the CPU or the GPU? Why do the CPU and GPU have to allocate as much as 270.01 GiB?
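
For what it's worth, the 270.01 GiB in the CUDA error is exactly the size of the output tensor the lookup would produce, not the size of the input: nn.Embedding returns one num_dim-sized float32 vector per index, so the output has shape (row, col, num_dim) rather than (row, col). The CPU allocator is failing on the same kind of output allocation. A quick back-of-the-envelope check in plain Python, assuming only the shapes from the code above:

  # The lookup maps each of the row * col indices to a 300-dim
  # float32 vector, so the output has shape (row, col, num_dim).
  row, col, num_dim = 800000, 302, 300
  out_bytes = row * col * num_dim * 4   # 4 bytes per float32 element
  print(out_bytes)                      # 289920000000
  print(out_bytes / 1024**3)            # ~270.01 -- the GiB figure in the CUDA error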

Did I miss something?
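
In case it helps the discussion, here is a minimal chunked-lookup sketch that avoids materializing the full ~270 GiB result at once; chunk_rows is an illustrative value, and the sketch assumes the downstream code can consume the output piece by piece:

  # Look the indices up in row chunks so that only one chunk's
  # output (about 0.34 GB per chunk here) is alive at a time.
  chunk_rows = 1000  # illustrative; tune to the memory actually available
  with torch.no_grad():  # a pure lookup needs no autograd buffers
      for start in range(0, t1.shape[0], chunk_rows):
          chunk = embedding(t1[start:start + chunk_rows])  # shape (<=1000, 302, 300)
          # ... consume `chunk` here, e.g. aggregate it or write it to disk ...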

Thanks.

Source: https://stackoverflow.com/questions/64869711/pytorch-allocate-memory-for-small-size-tensor-on-cpu-and-gpu-but-got-error-on-a
