Running the samples
As a final demo, let's run the CUDA sample programs inside a container. The first step is to install the required packages and check out the NVIDIA Docker source code.
$ sudo apt install git
$ git clone https://github.com/NVIDIA/nvidia-docker.git
$ cd nvidia-docker/
As in the previous article, let's start with the deviceQuery program.
$ docker build -t sample:deviceQuery samples/ubuntu-16.04/deviceQuery
(snip)
Successfully built f771d146a5a1
$ docker images sample:deviceQuery
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
sample              deviceQuery         f771d146a5a1        2 minutes ago       1.93 GB
$ nvidia-docker run --rm sample:deviceQuery
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce GTX 1050 Ti"
  CUDA Driver Version / Runtime Version          8.0 / 8.0
  CUDA Capability Major/Minor version number:    6.1
  Total amount of global memory:                 4039 MBytes (4235001856 bytes)
  ( 6) Multiprocessors, (128) CUDA Cores/MP:     768 CUDA Cores
  GPU Max Clock rate:                            1392 MHz (1.39 GHz)
  Memory Clock rate:                             3504 Mhz
  Memory Bus Width:                              128-bit
  L2 Cache Size:                                 1048576 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z):  (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z):  (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 8.0, CUDA Runtime Version = 8.0, NumDevs = 1, Device0 = GeForce GTX 1050 Ti
Result = PASS
Next, let's try nbody as well. Each sample's Dockerfile sets WORKDIR to that sample's directory under /usr/, builds it with make, and launches the sample in the CMD field (a rough sketch of this structure is shown below). As a result, you can replace the default command with "./sample-program options" when invoking nvidia-docker run, which lets you run the sample with whatever options you like, as the benchmark runs below demonstrate.
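For reference, the following is only a minimal sketch of the general shape of such a Dockerfile, not the exact file from the repository: the base image tag, the cuda-samples package name, and the samples path are assumptions for illustration, so check samples/ubuntu-16.04/deviceQuery/Dockerfile for the real contents.

# Sketch of a sample Dockerfile (image tag, package name, and path are assumptions)
FROM nvidia/cuda:8.0-devel-ubuntu16.04

# Install the CUDA sample sources into the image
RUN apt-get update && \
    apt-get install -y --no-install-recommends cuda-samples-8-0 && \
    rm -rf /var/lib/apt/lists/*

# Build inside the individual sample directory
WORKDIR /usr/local/cuda/samples/1_Utilities/deviceQuery
RUN make

# Default command; can be overridden on the nvidia-docker run command line
CMD ./deviceQuery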
$ docker build -t sample:nbody samples/ubuntu-16.04/nbody
(snip)
Successfully built 32144285f8f4
$ docker images sample:nbody
REPOSITORY          TAG                 IMAGE ID            CREATED              SIZE
sample              nbody               32144285f8f4        About a minute ago   1.93 GB
$ nvidia-docker run --rm sample:nbody
(snip)
6144 bodies, total time for 10 iterations: 6.136 ms
= 61.522 billion interactions per second
= 1230.441 single-precision GFLOP/s at 20 flops per interaction

$ nvidia-docker run --rm sample:nbody ./nbody -benchmark -numbodies=8192
(snip)
number of bodies = 8192
8192 bodies, total time for 10 iterations: 12.298 ms
= 54.568 billion interactions per second
= 1091.357 single-precision GFLOP/s at 20 flops per interaction

$ nvidia-docker run --rm sample:nbody ./nbody -benchmark -numbodies=65536
(snip)
number of bodies = 65536
65536 bodies, total time for 10 iterations: 636.416 ms
= 67.487 billion interactions per second
= 1349.736 single-precision GFLOP/s at 20 flops per interaction

$ nvidia-docker run --rm sample:nbody ./nbody -benchmark -numbodies=8192 -cpu
(snip)
> Simulation with CPU
number of bodies = 8192
8192 bodies, total time for 10 iterations: 27903.188 ms
= 0.024 billion interactions per second
= 0.481 single-precision GFLOP/s at 20 flops per interaction
Even inside a container, the GPU is clearly doing the work: the GPU benchmarks above reach roughly 1,100 to 1,350 single-precision GFLOP/s, while the same 8192-body run with the -cpu option manages only 0.481 GFLOP/s.
GPGPU in virtual machines
GPGPU is not limited to containers; it can also be used on virtual machines such as KVM.
For example, on hosts with an IOMMU, the GPU can be passed through to a guest so that the guest uses it directly. There are also technologies that virtualize the GPU itself and share it among guests, such as Intel's GVT-g.
When the GPU is handled as a plain PCI device, however, it can be assigned to only one guest at a time. On top of that, CUDA 8.0 has to be installed and supported inside the guest in the first place. Because of these various constraints, containers are the more straightforward way to put the GPU to work.
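For reference, here is a minimal sketch of what PCI passthrough on a KVM host can look like, assuming an Intel host managed with libvirt; the PCI address, guest name, and gpu.xml file are hypothetical, and the exact procedure depends on your hardware and distribution.

# Enable the IOMMU on an Intel host: add "intel_iommu=on" to
# GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, then apply and reboot
$ sudo update-grub && sudo reboot

# Find the GPU's PCI address (01:00.0 in the following is hypothetical)
$ lspci -nn | grep -i nvidia

# Confirm that IOMMU groups are populated
$ ls /sys/kernel/iommu_groups/

# Attach the GPU to a libvirt/KVM guest; gpu.xml contains a <hostdev>
# element pointing at the PCI address above (guest name is hypothetical)
$ virsh attach-device ubuntu-guest gpu.xml --config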
Column: Open Source Conference 2017 Tokyo/Spring is this weekend
It will be announced again in Ubuntu Weekly Topics as well, but Open Source Conference 2017 Tokyo/Spring takes place this weekend.
The Ubuntu Japanese Team plans to take part on both days with a seminar and a booth. Outside of the seminar slot you will find the team at the booth, so please feel free to drop by.
Incidentally, Tama Zoological Park is just one station away from the station nearest the venue.