Keras uses GPU for first 2 epochs, then stops using it

渐次进展 2021-01-26 12:42

I prepare the dataset and save it as an HDF5 file. I have a custom data generator that subclasses Sequence from Keras and generates batches from the HDF5 file.

Now, when

2 answers
  •  北海茫月
    2021-01-26 13:11

    Can you try configuring the GPU as described in the TensorFlow guide at https://www.tensorflow.org/guide/gpu?

    Here is how I have done it in my program:

    from platform import python_version

    import keras
    import tensorflow as tf

    # Report the interpreter and library versions in use
    print("Running Jupyter Notebook using python version: {}".format(python_version()))
    print("Running tensorflow version: {}".format(tf.__version__))
    print("Running tensorflow.keras version: {}".format(tf.keras.__version__))
    print("Running keras version: {}".format(keras.__version__))

    gpus = tf.config.experimental.list_physical_devices('GPU')
    print("Num GPUs Available: ", len(gpus))
    if gpus:
      # Restrict TensorFlow to only allocate 2GB of memory on the first GPU
      try:
        tf.config.experimental.set_virtual_device_configuration(
            gpus[0],
            [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=2048)])
        logical_gpus = tf.config.experimental.list_logical_devices('GPU')
        print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
      except RuntimeError as e:
        # Virtual devices must be set before GPUs have been initialized
        print(e)


    Here is the output of the code above:

    Running Jupyter Notebook using python version: 3.7.7
    Running tensorflow version: 2.1.0
    Running tensorflow.keras version: 2.2.4-tf
    Running keras version: 2.3.1
    Num GPUs Available:  1
    1 Physical GPUs, 1 Logical GPUs
    

    Your values may differ. memory_limit=2048 is the amount of GPU memory, in MB, that TensorFlow is allowed to allocate on the device.

    To confirm the allocation, use nvidia-smi; with this configuration Keras will not grow its memory usage beyond the limit. You said that training becomes very slow after 2 epochs: does the kernel die, or does the system hang or restart? Without this configuration, the issue I ran into was that the system simply hangs. If you are running Ubuntu 18.04 LTS, open the System Monitor tool before executing all cells in the notebook and, once the run starts, watch the Resources tab. It shows visually how many cores are in use; a steady, continuous increase means something is wrong.
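The nvidia-smi check mentioned above could be scripted roughly as follows, so that memory usage can be logged between epochs instead of read off the terminal. This is only a sketch: the helper names `parse_gpu_memory` and `read_gpu_memory` are made up for illustration, and it assumes nvidia-smi reports plain integers (in MiB) for both queried fields.

```python
import shutil
import subprocess

# Ask nvidia-smi for used and total memory per GPU as bare CSV numbers (MiB).
QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Parse 'used, total' lines into a list of (used, total) integer tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        rows.append((used, total))
    return rows


def read_gpu_memory():
    """Return per-GPU (used, total) in MiB, or [] if nvidia-smi is not on PATH."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_memory(result.stdout)


if __name__ == "__main__":
    for index, (used, total) in enumerate(read_gpu_memory()):
        print(f"GPU {index}: {used} MiB used of {total} MiB")
```

With the 2048 MB limit in place, the used figure for the training process should stay close to that cap; if it keeps climbing, the limit was probably set after the GPU had already been initialized.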

    Do:

    • A fresh run
    • Or Restart & Run All

    Suspected Issue: How to prevent tensorflow from allocating the totality of a GPU memory?
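For the suspected issue, the TensorFlow GPU guide linked above also describes an alternative to a fixed memory_limit: letting TensorFlow grow its GPU allocation on demand. A minimal sketch using the `TF_FORCE_GPU_ALLOW_GROWTH` environment variable from that guide is shown below. It is not what the program above does, and whether it helps in this particular case is an assumption.

```python
import os

# The variable must be set before TensorFlow initializes the GPU,
# so set it before importing tensorflow at all.
os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"

# import tensorflow as tf  # import TensorFlow only after the variable is set

print("TF_FORCE_GPU_ALLOW_GROWTH =", os.environ["TF_FORCE_GPU_ALLOW_GROWTH"])
```

In a notebook this needs a fresh kernel, since a TensorFlow session that has already touched the GPU will not pick up the change.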
