Why do my google cloud compute instances always unexpectedly restart?

后端 未结 2 386
闹比i
闹比i 2021-01-02 04:59

Help! Help! Help!

It is really annoying and I almost cannot bear it anymore! I\'m using google cloud compute engine instances but they often unexpectedly restart wit

2条回答
  •  栀梦
    栀梦 (楼主)
    2021-01-02 05:21

    The issue is right here:

    all GPUs are in use

    If you check the official documentation about GPU:

    GPU instances must terminate for host maintenance events, but can automatically restart. These maintenance events typically occur once per week, but can occur more frequently when necessary. You must configure your workloads to handle these maintenance events cleanly. Specifically, long-running workloads like machine learning and high-performance computing (HPC) must handle the interruption of host maintenance events. Learn how to handle host maintenance events on instances with GPUs.

    This is because an instance that has a GPU attached cannot be migrated to another host for maintenance as it happens for the rest of the virtual machines. To get a physical GPU attached to the instance and bare metal performance you are using GPU passthrough , which sadly means if the host has to go through maintenance the VM is going down with it.

提交回复
热议问题