How does one train multiple models in a single script in TensorFlow when there are GPUs present?

暗喜 2021-01-30 14:36

Say I have access to a number of GPUs in a single machine (for the sake of argument, assume 8 GPUs, each with a maximum of 8 GB of memory, in a single machine with some amount of RAM a

4 Answers
  •  攒了一身酷
    2021-01-30 14:38

    An easy solution: give each model its own session and graph.

    This works on the following platform: TensorFlow 1.12.0, Keras 2.1.6-tf, Python 3.6.7, Jupyter Notebook.

    Key code:

    with session.as_default():
        with session.graph.as_default():
            # do something about an ANN model
    

    Full code:

    import tensorflow as tf
    from tensorflow import keras
    import gc
    
    def limit_memory():
        """Release unused memory resources and force garbage collection."""
        keras.backend.clear_session()
        keras.backend.get_session().close()
        tf.reset_default_graph()
        gc.collect()
        # Optionally let GPU memory grow on demand instead of preallocating:
        # cfg = tf.ConfigProto()
        # cfg.gpu_options.allow_growth = True
        # keras.backend.set_session(tf.Session(config=cfg))
        keras.backend.set_session(tf.Session())
        gc.collect()
    
    
    def create_and_train_ANN_model(hyper_parameter):
        print('create and train my ANN model')
        info = ...  # placeholder: results about this ANN model
        return info
    
    for i in range(10):
        limit_memory()
        session = tf.Session()
        keras.backend.set_session(session)
        with session.as_default():
            with session.graph.as_default():
                hyper_parameter = ...  # placeholder: a set of hyper-parameters
                info = create_and_train_ANN_model(hyper_parameter)
        limit_memory()
    

    Inspired by this link: Keras (Tensorflow backend) Error - Tensor input_1:0, specified in either feed_devices or fetch_devices was not found in the Graph
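    An alternative to the session/graph bookkeeping above (a sketch, not from this answer) is to run each model in its own subprocess, pinned to one GPU via `CUDA_VISIBLE_DEVICES`; all GPU memory is released automatically when the process exits. The names `train_one_model` and the hyper-parameter dicts are hypothetical, and the TensorFlow import is only indicated as a comment:

    ```python
    import multiprocessing as mp
    import os

    def train_one_model(gpu_id, hyper_parameters, result_queue):
        """Hypothetical worker: pin this process to a single GPU and train there."""
        # Must be set *before* TensorFlow is imported, which is why the import
        # would live inside the worker rather than at the top of the file.
        os.environ['CUDA_VISIBLE_DEVICES'] = str(gpu_id)
        # import tensorflow as tf
        # ... build and train one model here ...
        result_queue.put({'gpu': gpu_id, 'hyper_parameters': hyper_parameters})

    # 'fork' keeps this snippet runnable at module level; with 'spawn' (safer if
    # the parent process has already touched CUDA) wrap the launch code in a
    # __main__ guard so children do not re-execute it on import.
    ctx = mp.get_context('fork')
    result_queue = ctx.Queue()
    workers = [
        ctx.Process(target=train_one_model, args=(gpu_id, {'lr': lr}, result_queue))
        for gpu_id, lr in enumerate([1e-3, 1e-4])
    ]
    for w in workers:
        w.start()
    results = [result_queue.get() for _ in workers]  # drain before join()
    for w in workers:
        w.join()
    ```

    Compared with reusing one process, this trades some startup cost per model for complete isolation: no leaked graphs, no stale sessions, and the models on different GPUs can train concurrently.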
