Say I have access to a number of GPUs in a single machine (for the sake of argument assume 8GPUs each with max memory of 8GB each in one single machine with some amount of RAM a
As I understand, firstly tensorflow constructs a symbolic graph and infers the derivatives based on chain rule. Then allocates memory for all (necessary) tensors, including some inputs and outputs of layers for efficiency. When running a session, data will be loaded into the graph but in general, memory use will not change any more.
The error you met, I guess, may be caused by constructing several models in one GPU.
Isolating your training/evaluation code from the hyper parameters is a good choice, as @user2476373 proposed. But I am using bash script directly, not task spooler (may be it's more convenient), e.g.
CUDA_VISIBLE_DEVICES=0 python train.py --lrn_rate 0.01 --weight_decay_rate 0.001 --momentum 0.9 --batch_size 8 --max_iter 60000 --snapshot 5000
CUDA_VISIBLE_DEVICES=0 python eval.py
Or you can write a 'for' loop in the bash script, not necessarily in python script. Noting that I used CUDA_VISIBLE_DEVICES=0
at beginning of the script (the index could be 7 if you have 8 GPUs in one machine). Because based on my experience, I've found that tensorflow uses all GPUs in one machine if I didn't specify operations use which GPU with the code like this
with tf.device('/gpu:0'):
If you want to try multi-GPU implementation, there is some example.
Hope this could help you.