How does one train multiple models in a single script in TensorFlow when there are GPUs present?

暗喜 2021-01-30 14:36

Say I have access to a number of GPUs in a single machine (for the sake of argument, assume 8 GPUs, each with a maximum of 8 GB of memory, in a single machine with some amount of RAM a

4 Answers
  •  攒了一身酷
    2021-01-30 14:38

    An easy solution: give each model its own session and graph.

    This works on the following platform: TensorFlow 1.12.0, Keras 2.1.6-tf, Python 3.6.7, Jupyter Notebook.

    Key code:

    with session.as_default():
        with session.graph.as_default():
            # do something about an ANN model
    

    Full code:

    import tensorflow as tf
    from tensorflow import keras
    import gc
    
    def limit_memory():
        """Release unused memory resources and force garbage collection."""
        keras.backend.clear_session()
        keras.backend.get_session().close()
        tf.reset_default_graph()
        gc.collect()
        # Optionally let GPU memory grow on demand instead of preallocating:
        # cfg = tf.ConfigProto()
        # cfg.gpu_options.allow_growth = True
        # keras.backend.set_session(tf.Session(config=cfg))
        keras.backend.set_session(tf.Session())
        gc.collect()
    
    
    def create_and_train_ANN_model(hyper_parameter):
        print('create and train my ANN model')
        info = ...  # placeholder: results about this ANN model
        return info
    
    for i in range(10):
        limit_memory()
        session = tf.Session()
        keras.backend.set_session(session)
        with session.as_default():
            with session.graph.as_default():
                hyper_parameter = ...  # placeholder: a set of hyper-parameters
                info = create_and_train_ANN_model(hyper_parameter)
        limit_memory()
    

    Inspired by this link: Keras (Tensorflow backend) Error - Tensor input_1:0, specified in either feed_devices or fetch_devices was not found in the Graph
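    An alternative to the session/graph bookkeeping above (a sketch, not from this answer) is to run each model in its own subprocess, pinned to one GPU via `CUDA_VISIBLE_DEVICES`; all GPU memory is released automatically when the process exits. The names `train_one_model` and the hyper-parameter dicts are hypothetical, and the TensorFlow import is only indicated as a comment:

    ```python
    import multiprocessing as mp
    import os

    def train_one_model(gpu_id, hyper_parameters, result_queue):
        """Hypothetical worker: pin this process to a single GPU and train there."""
        # Must be set *before* TensorFlow is imported, which is why the import
        # would live inside the worker rather than at the top of the file.
        os.environ['CUDA_VISIBLE_DEVICES'] = str(gpu_id)
        # import tensorflow as tf
        # ... build and train one model here ...
        result_queue.put({'gpu': gpu_id, 'hyper_parameters': hyper_parameters})

    # 'fork' keeps this snippet runnable at module level; with 'spawn' (safer if
    # the parent process has already touched CUDA) wrap the launch code in a
    # __main__ guard so children do not re-execute it on import.
    ctx = mp.get_context('fork')
    result_queue = ctx.Queue()
    workers = [
        ctx.Process(target=train_one_model, args=(gpu_id, {'lr': lr}, result_queue))
        for gpu_id, lr in enumerate([1e-3, 1e-4])
    ]
    for w in workers:
        w.start()
    results = [result_queue.get() for _ in workers]  # drain before join()
    for w in workers:
        w.join()
    ```

    Compared with reusing one process, this trades some startup cost per model for complete isolation: no leaked graphs, no stale sessions, and the models on different GPUs can train concurrently.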
