new repository

author: Craig Jennings <c@cjennings.net> 2024-04-07 13:41:34 -0500
committer: Craig Jennings <c@cjennings.net> 2024-04-07 13:41:34 -0500
commit: 754bbf7a25a8dda49b5d08ef0d0443bbf5af0e36 (patch)
tree: f1190704f78f04a2b0b4c977d20fe96a828377f1 /devdocs/docker/compose%2Fgpu-support%2Findex.html
1 files changed, 119 insertions, 0 deletions
diff --git a/devdocs/docker/compose%2Fgpu-support%2Findex.html b/devdocs/docker/compose%2Fgpu-support%2Findex.html
new file mode 100644
index 00000000..02d3394f
--- /dev/null
+++ b/devdocs/docker/compose%2Fgpu-support%2Findex.html
@@ -0,0 +1,119 @@
+<h1>Enabling GPU access with Compose</h1>
+
+<p>Compose services can define GPU device reservations if the Docker host has such devices and the Docker daemon is configured accordingly. Before you start, install the <a href="https://docs.docker.com/config/containers/resource_constraints/#gpu">prerequisites</a> if you have not already done so.</p> <p>The examples in the following sections focus specifically on giving service containers access to GPU devices with Docker Compose. You can use either <code class="language-plaintext highlighter-rouge">docker-compose</code> or <code class="language-plaintext highlighter-rouge">docker compose</code> commands.</p> <h3 id="use-of-service-runtime-property-from-compose-v23-format-legacy">Use of service <code class="language-plaintext highlighter-rouge">runtime</code> property from Compose v2.3 format (legacy)</h3> <p>Docker Compose v1.27.0+ switched to the Compose Specification schema, which combines all properties from the 2.x and 3.x versions. This re-enabled service properties such as <a href="../compose-file/compose-file-v2/index#runtime">runtime</a> for providing GPU access to service containers. However, this approach gives no control over specific properties of the GPU devices.</p> <div class="highlight"><pre class="highlight" data-language="">services:
+  test:
+    image: nvidia/cuda:10.2-base
+    command: nvidia-smi
+    runtime: nvidia
+
+</pre></div> <h3 id="enabling-gpu-access-to-service-containers">Enabling GPU access to service containers</h3> <p>Docker Compose v1.28.0+ lets you define GPU reservations using the <a href="https://github.com/compose-spec/compose-spec/blob/master/deploy/#devices">device</a> structure defined in the Compose Specification. This provides more granular control over a GPU reservation, because custom values can be set for the following device properties:</p> <ul> <li>
+<a href="https://github.com/compose-spec/compose-spec/blob/master/deploy/#capabilities" target="_blank" rel="noopener" class="_">capabilities</a> - value specified as a list of strings (e.g. <code class="language-plaintext highlighter-rouge">capabilities: [gpu]</code>). You must set this field in the Compose file. Otherwise, service deployment returns an error.</li> <li>
+<a href="https://github.com/compose-spec/compose-spec/blob/master/deploy/#count" target="_blank" rel="noopener" class="_">count</a> - value specified as an int or the value <code class="language-plaintext highlighter-rouge">all</code>, representing the number of GPU devices that should be reserved (provided the host holds that number of GPUs).</li> <li>
+<a href="https://github.com/compose-spec/compose-spec/blob/master/deploy/#device_ids" target="_blank" rel="noopener" class="_">device_ids</a> - value specified as a list of strings representing GPU device IDs from the host. You can find the device ID in the output of <code class="language-plaintext highlighter-rouge">nvidia-smi</code> on the host.</li> <li>
+<a href="https://github.com/compose-spec/compose-spec/blob/master/deploy/#driver" target="_blank" rel="noopener" class="_">driver</a> - value specified as a string (e.g. <code class="language-plaintext highlighter-rouge">driver: 'nvidia'</code>).</li> <li>
+<a href="https://github.com/compose-spec/compose-spec/blob/master/deploy/#options" target="_blank" rel="noopener" class="_">options</a> - key-value pairs representing driver-specific options.</li> </ul> <blockquote> <p><strong>Note</strong></p> <p>You must set the <code class="language-plaintext highlighter-rouge">capabilities</code> field. Otherwise, service deployment returns an error.</p> <p><code class="language-plaintext highlighter-rouge">count</code> and <code class="language-plaintext highlighter-rouge">device_ids</code> are mutually exclusive. You must define only one of them at a time.</p> </blockquote> <p>For more information on these properties, see the <code class="language-plaintext highlighter-rouge">deploy</code> section in the <a href="https://github.com/compose-spec/compose-spec/blob/master/deploy/#devices" target="_blank" rel="noopener" class="_">Compose Specification</a>.</p> <p>Example of a Compose file for running a service with access to 1 GPU device:</p> <div class="highlight"><pre class="highlight" data-language="">services:
+  test:
+    image: nvidia/cuda:10.2-base
+    command: nvidia-smi
+    deploy:
+      resources:
+        reservations:
+          devices:
+            - driver: nvidia
+              count: 1
+              capabilities: [gpu]
+</pre></div> <p>Run with Docker Compose:</p> <div class="highlight"><pre class="highlight" data-language="">$ docker-compose up
+Creating network "gpu_default" with the default driver
+Creating gpu_test_1 ... done
+Attaching to gpu_test_1    
+test_1  | +-----------------------------------------------------------------------------+
+test_1  | | NVIDIA-SMI 450.80.02    Driver Version: 450.80.02    CUDA Version: 11.1     |
+test_1  | |-------------------------------+----------------------+----------------------+
+test_1  | | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
+test_1  | | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
+test_1  | |                               |                      |               MIG M. |
+test_1  | |===============================+======================+======================|
+test_1  | |   0  Tesla T4            On   | 00000000:00:1E.0 Off |                    0 |
+test_1  | | N/A   23C    P8     9W /  70W |      0MiB / 15109MiB |      0%      Default |
+test_1  | |                               |                      |                  N/A |
+test_1  | +-------------------------------+----------------------+----------------------+
+test_1  |                                                                                
+test_1  | +-----------------------------------------------------------------------------+
+test_1  | | Processes:                                                                  |
+test_1  | |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
+test_1  | |        ID   ID                                                   Usage      |
+test_1  | |=============================================================================|
+test_1  | |  No running processes found                                                 |
+test_1  | +-----------------------------------------------------------------------------+
+gpu_test_1 exited with code 0
+
+</pre></div> <p>If neither <code class="language-plaintext highlighter-rouge">count</code> nor <code class="language-plaintext highlighter-rouge">device_ids</code> is set, all GPUs available on the host are used by default.</p> <div class="highlight"><pre class="highlight" data-language="">services:
+  test:
+    image: tensorflow/tensorflow:latest-gpu
+    command: python -c "import tensorflow as tf;print(tf.test.gpu_device_name())"
+    deploy:
+      resources:
+        reservations:
+          devices:
+            - capabilities: [gpu]
+</pre></div> <div class="highlight"><pre class="highlight" data-language="">$ docker-compose up
+Creating network "gpu_default" with the default driver
+Creating gpu_test_1 ... done
+Attaching to gpu_test_1
+test_1  | I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
+.....
+test_1  | I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402]
+Created TensorFlow device (/device:GPU:0 with 13970 MB memory) -&gt; physical GPU (device: 0, name: Tesla T4, pci bus id: 0000:00:1e.0, compute capability: 7.5)
+test_1  | /device:GPU:0
+gpu_test_1 exited with code 0
+</pre></div> <p>On machines hosting multiple GPUs, the <code class="language-plaintext highlighter-rouge">device_ids</code> field can be set to target specific GPU devices, and <code class="language-plaintext highlighter-rouge">count</code> can be used to limit the number of GPU devices assigned to a service container. If <code class="language-plaintext highlighter-rouge">count</code> exceeds the number of available GPUs on the host, the deployment fails with an error.</p> <div class="highlight"><pre class="highlight" data-language="">$ nvidia-smi
++-----------------------------------------------------------------------------+
+| NVIDIA-SMI 450.80.02    Driver Version: 450.80.02    CUDA Version: 11.0     |
+|-------------------------------+----------------------+----------------------+
+| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
+| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
+|                               |                      |               MIG M. |
+|===============================+======================+======================|
+|   0  Tesla T4            On   | 00000000:00:1B.0 Off |                    0 |
+| N/A   72C    P8    12W /  70W |      0MiB / 15109MiB |      0%      Default |
+|                               |                      |                  N/A |
++-------------------------------+----------------------+----------------------+
+|   1  Tesla T4            On   | 00000000:00:1C.0 Off |                    0 |
+| N/A   67C    P8    11W /  70W |      0MiB / 15109MiB |      0%      Default |
+|                               |                      |                  N/A |
++-------------------------------+----------------------+----------------------+
+|   2  Tesla T4            On   | 00000000:00:1D.0 Off |                    0 |
+| N/A   74C    P8    12W /  70W |      0MiB / 15109MiB |      0%      Default |
+|                               |                      |                  N/A |
++-------------------------------+----------------------+----------------------+
+|   3  Tesla T4            On   | 00000000:00:1E.0 Off |                    0 |
+| N/A   62C    P8    11W /  70W |      0MiB / 15109MiB |      0%      Default |
+|                               |                      |                  N/A |
++-------------------------------+----------------------+----------------------+
+</pre></div> <p>To enable access only to GPU-0 and GPU-3 devices:</p> <div class="highlight"><pre class="highlight" data-language="">services:
+  test:
+    image: tensorflow/tensorflow:latest-gpu
+    command: python -c "import tensorflow as tf;tf.test.gpu_device_name()"
+    deploy:
+      resources:
+        reservations:
+          devices:
+          - driver: nvidia
+            device_ids: ['0', '3']
+            capabilities: [gpu]
+
+</pre></div> <div class="highlight"><pre class="highlight" data-language="">$ docker-compose up
+...
+Created TensorFlow device (/device:GPU:0 with 13970 MB memory) -&gt; physical GPU (device: 0, name: Tesla T4, pci bus id: 0000:00:1b.0, compute capability: 7.5)
+...
+Created TensorFlow device (/device:GPU:1 with 13970 MB memory) -&gt; physical GPU (device: 1, name: Tesla T4, pci bus id: 0000:00:1e.0, compute capability: 7.5)
+...
+gpu_test_1 exited with code 0
+</pre></div> 
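+<p>The <code class="language-plaintext highlighter-rouge">count</code> field can also cap the reservation on a multi-GPU host. As a minimal sketch, assuming the four-GPU host listed earlier and reusing the illustrative <code class="language-plaintext highlighter-rouge">test</code> service from the first example, a Compose file like this would reserve two GPUs without naming specific devices:</p> <div class="highlight"><pre class="highlight" data-language="">services:
+  test:
+    image: nvidia/cuda:10.2-base
+    command: nvidia-smi
+    deploy:
+      resources:
+        reservations:
+          devices:
+            - driver: nvidia
+              # Assumption for illustration: the host has at least 2 GPUs.
+              # Do not combine count with device_ids; they are mutually exclusive.
+              count: 2
+              capabilities: [gpu]
+</pre></div> <p>This page does not state which two devices would be assigned. When the choice of device matters, <code class="language-plaintext highlighter-rouge">device_ids</code> is the field to use instead.</p>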
+<p><a href="https://docs.docker.com/search/?q=documentation">documentation</a>, <a href="https://docs.docker.com/search/?q=docs">docs</a>, <a href="https://docs.docker.com/search/?q=docker">docker</a>, <a href="https://docs.docker.com/search/?q=compose">compose</a>, <a href="https://docs.docker.com/search/?q=GPU%20access">GPU access</a>, <a href="https://docs.docker.com/search/?q=NVIDIA">NVIDIA</a>, <a href="https://docs.docker.com/search/?q=samples">samples</a></p>
+<div class="_attribution">
+  <p class="_attribution-p">
+    &copy; 2019 Docker, Inc.<br>Licensed under the Apache License, Version 2.0.<br>Docker and the Docker logo are trademarks or registered trademarks of Docker, Inc. in the United States and/or other countries.<br>Docker, Inc. and other parties may also have trademark rights in other terms used herein.<br>
+    <a href="https://docs.docker.com/compose/gpu-support/" class="_attribution-link">https://docs.docker.com/compose/gpu-support/</a>
+  </p>
+</div>