Why do my Google Cloud compute instances always restart unexpectedly?

闹比i 2021-01-02 04:59

Help! Help! Help!

It is really annoying and I almost cannot bear it anymore! I'm using Google Compute Engine instances, but they often unexpectedly restart wit

2 Answers
  • 2021-01-02 05:11

    This sounds like a preemptible VM instance.

    Preemptible instances function like normal instances, but have the following limitations:

    • Compute Engine might terminate preemptible instances at any time due to system events. The probability that Compute Engine will terminate a preemptible instance for a system event is generally low, but might vary from day to day and from zone to zone depending on current conditions.
    • Compute Engine always terminates preemptible instances after they run for 24 hours.
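
    The same information is also exposed from inside the VM through the Compute Engine metadata server. The sketch below assumes it runs on the instance itself and that the `scheduling/preemptible` and `preempted` entries return `TRUE` or `FALSE`; `md_get` and `describe_preemption` are hypothetical helper names introduced only for illustration.

    ```shell
    # Metadata server base URL; only reachable from inside a Compute Engine VM.
    MD_BASE="http://metadata.google.internal/computeMetadata/v1/instance"

    # Fetch one metadata entry. Prints nothing and returns non-zero
    # when the metadata server cannot be reached.
    md_get() {
      curl -sf --max-time 2 -H "Metadata-Flavor: Google" "$MD_BASE/$1"
    }

    # Turn the two TRUE/FALSE values into a short verdict.
    # $1 = value of scheduling/preemptible, $2 = value of preempted
    describe_preemption() {
      case "$1:$2" in
        TRUE:TRUE) echo "preemptible, currently being preempted" ;;
        TRUE:*)    echo "preemptible, not preempted right now" ;;
        FALSE:*)   echo "not preemptible" ;;
        *)         echo "unknown (metadata server not reachable?)" ;;
      esac
    }

    # Usage on the VM:
    #   describe_preemption "$(md_get scheduling/preemptible)" "$(md_get preempted)"
    ```

    If the verdict is "not preemptible", preemption is ruled out and the restarts have another cause.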

    To check whether your instance is preemptible using the gcloud CLI, run:

    gcloud compute instances describe instance-name --format="(scheduling.preemptible)"
    

    Result

    scheduling:
      preemptible: false
    

    Replace "instance-name" with the real name of your instance.

    Or simply use the UI: click on the compute instance and scroll down to its availability policies, where preemptibility is shown.

    To check for system operations performed on your instance, you can review them using the following command:

    gcloud compute operations list 
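
    The raw list can be long, so it may help to annotate each entry with what it means. This is a rough sketch, assuming an authenticated gcloud and the documented system operation types (`compute.instances.preempted`, `compute.instances.hostError`, and so on); `explain_operation` and `list_restart_causes` are hypothetical names, not part of gcloud.

    ```shell
    # Explain a Compute Engine operation type in plain words.
    explain_operation() {
      case "$1" in
        compute.instances.preempted)                  echo "preemptible VM was preempted" ;;
        compute.instances.hostError)                  echo "host hardware or software failure" ;;
        compute.instances.terminateOnHostMaintenance) echo "stopped for host maintenance" ;;
        compute.instances.migrateOnHostMaintenance)   echo "live-migrated for host maintenance" ;;
        compute.instances.automaticRestart)           echo "restarted automatically by Compute Engine" ;;
        *)                                            echo "other: $1" ;;
      esac
    }

    # Print the 20 most recent operations, newest first, with an explanation.
    # Not called here because it needs gcloud and project credentials.
    list_restart_causes() {
      gcloud compute operations list \
        --format="value(insertTime,operationType,targetLink.basename())" \
        --sort-by="~insertTime" --limit=20 |
      while read -r when op target; do
        printf '%s  %s  %s\n' "$when" "$target" "$(explain_operation "$op")"
      done
    }
    ```

    Running `list_restart_causes` and looking at the entries around the time of a restart should show whether it was a preemption, a host error, or host maintenance.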
    
  • 2021-01-02 05:21

    The issue is right here:

    all GPUs are in use

    If you check the official documentation about GPU:

    GPU instances must terminate for host maintenance events, but can automatically restart. These maintenance events typically occur once per week, but can occur more frequently when necessary. You must configure your workloads to handle these maintenance events cleanly. Specifically, long-running workloads like machine learning and high-performance computing (HPC) must handle the interruption of host maintenance events. Learn how to handle host maintenance events on instances with GPUs.

    This is because an instance with a GPU attached cannot be live-migrated to another host for maintenance, as other virtual machines can. To get a physical GPU attached to the instance with bare-metal performance, you are using GPU passthrough, which sadly means that if the host has to go through maintenance, the VM goes down with it.
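
    One way to handle these events is to watch the `maintenance-event` entry of the metadata server from inside the VM and save state when a termination is announced. The following is only a sketch under that assumption: `on_maintenance_event` and `watch_maintenance` are hypothetical names, and the "saving checkpoint" line is a placeholder for whatever your workload needs to do.

    ```shell
    # To see the policy currently set on the instance (run from your workstation):
    #   gcloud compute instances describe instance-name --format="yaml(scheduling)"

    MAINT_URL="http://metadata.google.internal/computeMetadata/v1/instance/maintenance-event"

    # Map a maintenance-event value to the action the workload should take.
    on_maintenance_event() {
      case "$1" in
        NONE)                          echo "continue" ;;
        MIGRATE_ON_HOST_MAINTENANCE)   echo "continue" ;;
        TERMINATE_ON_HOST_MAINTENANCE) echo "checkpoint" ;;
        *)                             echo "unknown" ;;
      esac
    }

    # Block until the value changes, react, then wait again.
    # Only works inside a Compute Engine VM, so it is not called here.
    watch_maintenance() {
      while event=$(curl -sf -H "Metadata-Flavor: Google" "$MAINT_URL?wait_for_change=true"); do
        if [ "$(on_maintenance_event "$event")" = "checkpoint" ]; then
          echo "saving checkpoint..."   # placeholder: replace with your real save step
        fi
      done
    }
    ```

    Starting `watch_maintenance` in the background next to a long-running training job gives the job a chance to checkpoint before the host goes down.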
