Question
I've installed H2O 3.11.0.266 on Ubuntu 16.04 with CUDA 8.0 and libcudnn.so.5.1.10, so I believe H2O should be able to find my GPUs.
However, when I run h2o.init() in Python, I see no evidence that it is actually using my GPUs. I see:
- H2O cluster total cores: 8
- H2O cluster allowed cores: 8
which is the same as I had in the previous version (pre GPU).
Also, http://127.0.0.1:54321/flow/index.html shows only 8 cores as well.
I wonder whether something isn't properly installed, or whether the latest h2o.init() simply doesn't report information about available GPUs yet.
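To at least confirm from Python that the cluster has an XGBoost backend, I tried the following (a minimal sketch; H2OXGBoostEstimator.available() is the call I believe reports this, but treat the call name as an assumption):
import h2o
from h2o.estimators.xgboost import H2OXGBoostEstimator

h2o.init()
# Should return True if the H2O cluster was built with XGBoost support,
# and False (with a warning) otherwise -- the call name is my assumption.
print(H2OXGBoostEstimator.available())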
Many thanks in advance.
[edit] I should have mentioned that 3.11.0.266 is supposed to be the version that supports GPUs.
[edit] Thanks for all the suggestions. I'm now running H2O 3.13.0.337.
I also found this command useful (it watches every process that currently has a /dev/nvidia* device open):
sudo watch -n 0.1 'ps f -o user,pgrp,pid,pcpu,pmem,start,time,command -p `/usr/bin/lsof -n -w -t /dev/nvidia*`'
But, I'm a tad puzzled.
When I run XGBoost, I clearly see that the GPUs are very active (30 to 40% utilization), as are all 8 of my CPU cores, which I guess must be managing the GPUs. XGBoost finishes my classification problem in 20 seconds.
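For reference, this is roughly how I train it (a minimal sketch: the file path and response column are placeholders for my actual data, and backend="gpu" / gpu_id are the parameters I understand select the device):
import h2o
from h2o.estimators.xgboost import H2OXGBoostEstimator

h2o.init()

# Placeholder data -- substitute your own classification frame.
train = h2o.import_file("my_training_data.csv")   # hypothetical path
response = "label"                                # hypothetical column
train[response] = train[response].asfactor()      # make it a classification
predictors = [c for c in train.columns if c != response]

# backend/gpu_id reflect my understanding of how to pin training to a GPU.
model = H2OXGBoostEstimator(ntrees=100, backend="gpu", gpu_id=0)
model.train(x=predictors, y=response, training_frame=train)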
GLM runs pretty fast, so it's a little hard to tell whether it's using my GPUs (it's done in less than a second). It does start the clock in the STARTED column displayed by the ps program.
USER PGRP PID %CPU %MEM STARTED TIME COMMAND
user 3380 3380 116 12.0 10:52:56 04:36:36 /usr/local/anaconda2/bin/java -ea -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -jar /usr/local/anaconda2/lib/python2.7/site-packages/h
Distributed Random Forest starts the clock too, but doesn't seem to use any GPU processing, though it does use all the CPU cores.
GBM is similar. It takes 1.5 minutes to train on the same problem, compared to 20 seconds for XGBoost. Since the algorithms are similar, I would have expected them to take a similar amount of time and use the GPUs in a similar way, so I find this surprising.
I'm convinced that XGBoost is working the GPUs, but I'm not sure if any of the other algorithms are.
[added]
By way of comparison, on H2O 3.13.0.341, notice the difference in temperature(!) and GPU utilization percentage.
Here's what gpustat -cup shows when I run xgboost:
[0] GeForce GTX 1080 | 64°C, 90 % | 1189 / 8105 MB | clem:java/31183(191M)
Here's what it shows when I run Distributed Random Forest (similar results occur for GBM and DeepLearning):
[0] GeForce GTX 1080 | 51°C, 5 % | 1187 / 8105 MB | clem:java/31183(189M)
Answer 1:
You will need the GPU-enabled version of H2O, available on the H2O download page. It is not clear from your question whether you are using regular H2O or GPU-enabled H2O; however, if you are using GPU-enabled H2O and have the proper dependencies, it should see your GPUs. The current dependency list is as follows (a quick way to check the libraries is sketched after the list):
- Ubuntu 16.04
- CUDA 8.0
- cuDNN 5.1
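One quick way to sanity-check the CUDA and cuDNN libraries from Python is simply to try loading them (a sketch, assuming the shared libraries are on the default loader path; the .so names below match the versions listed above but may differ on your system):
import ctypes

# Try to dlopen the CUDA runtime and cuDNN; an OSError means the dynamic
# loader cannot find the library (check LD_LIBRARY_PATH / ldconfig).
for lib in ("libcudart.so.8.0", "libcudnn.so.5"):
    try:
        ctypes.CDLL(lib)
        print(lib, "loaded OK")
    except OSError as err:
        print(lib, "NOT found:", err)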
I have opened a JIRA ticket to add some metadata to the h2o.init() printout so that you'll see information about your GPUs there (in a future release).
Answer 2:
From a terminal window, run the nvidia-smi tool and look at the utilization. If it's 0%, you're not using the GPUs.
In the example below, you can see Volatile GPU Utilization is 0%, so the GPUs are not being used.
$ nvidia-smi
Tue May 30 13:50:11 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 370.28                 Driver Version: 370.28                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1080    Off  | 0000:02:00.0     Off |                  N/A |
| 27%   30C    P8    10W / 180W |      1MiB /  8113MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 1080    Off  | 0000:03:00.0      On |                  N/A |
| 27%   31C    P8     9W / 180W |     38MiB /  8112MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    1      1599    G   /usr/lib/xorg/Xorg                              36MiB |
+-----------------------------------------------------------------------------+
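If you'd rather check utilization programmatically, something along these lines should work (a sketch; it assumes nvidia-smi is on your PATH and supports the --query-gpu/--format flags shown):
import subprocess

# Ask nvidia-smi for per-GPU utilization as bare CSV (no header, no units).
out = subprocess.check_output([
    "nvidia-smi",
    "--query-gpu=index,utilization.gpu",
    "--format=csv,noheader,nounits",
]).decode()

for line in out.strip().splitlines():
    idx, util = (field.strip() for field in line.split(","))
    print("GPU %s utilization: %s%%" % (idx, util))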
I use the following handy little script to monitor GPU utilization for myself.
$ cat bin/gputop
#!/bin/bash
watch -d -n 0.5 nvidia-smi
Source: https://stackoverflow.com/questions/44269267/how-can-i-tell-if-h2o-3-11-0-266-is-running-with-gpus