Trying to Solve Various Problems with GPGPU

Trying to Solve Various Small Problems on the GPU
Terumi YAMADA

About me

• Terumi Yamada (in training)

• Big fan of SIMD

• Twitter: telmin_orca

Contents
• About me

• Lead-in

• Solving the traveling salesman problem

• Running Aobench

• Summary

OpenCL
What is OpenCL?

Stealth marketing

NVIDIA

• GeForce GTX 580

• Fermi

• 512 CUDA cores

• 3GB RAM

• PCIe 2.0

AMD

• Radeon HD 7970

• GCN

• 2048 stream processors

• 3GB RAM

• PCIe 3.0

HOST

• Intel Core i7 2600K

• Sandy Bridge

• 8GB RAM

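Solving the traveling salesman problem

The traveling salesman problem?

Solution methods

• Genetic algorithms

• Ant colony optimization

• μ-opt method

• LK (Lin-Kernighan) method

2-opt method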

[Figure: a 2-opt exchange on nodes i, k, l, j, shown before and after]
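As a rough illustration of the 2-opt exchange pictured above, here is a minimal CPU sketch in Python. The function names `tour_length` and `two_opt_swap`, the list-based tour representation, and the four-point example are assumptions made for illustration; they are not taken from the talk's implementation.

```python
import math

def tour_length(points, tour):
    """Total length of the closed tour that visits points in the given order."""
    n = len(tour)
    return sum(
        math.dist(points[tour[a]], points[tour[(a + 1) % n]])
        for a in range(n)
    )

def two_opt_swap(tour, i, k):
    """Reverse the segment tour[i..k], which replaces two edges with two new ones."""
    return tour[:i] + tour[i:k + 1][::-1] + tour[k + 1:]

# Corners of a unit square, visited in an order whose edges cross (hypothetical data).
points = [(0.0, 0.0), (1.0, 1.0), (1.0, 0.0), (0.0, 1.0)]
crossed = [0, 1, 2, 3]
uncrossed = two_opt_swap(crossed, 1, 2)

print(tour_length(points, crossed), tour_length(points, uncrossed))
```

Reversing one segment is all a 2-opt move does, so the cost of the method lies in deciding which segment to reverse.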

Parallel 2-opt

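• Applying the SIMD 2-opt method to GPGPU and evaluating it

• 74th National Convention of the Information Processing Society of Japan (IPSJ), GPU session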


What is the expensive part?
[Figure: a 2-opt exchange on nodes i, k, l, j, shown before and after]

CPU -> GPU

Tour length calculation
Shortest tour selection
Shortest tour exchange
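The three steps above might be sketched on the CPU as follows. This is only an assumed structure: the names `move_delta` and `best_two_opt_step` are hypothetical, and the comments merely suggest how each stage could map onto GPU work-items and a reduction. The talk's actual kernels are not shown in the slides.

```python
import math

def move_delta(points, tour, i, k):
    """Change in tour length if the segment tour[i..k] were reversed."""
    n = len(tour)
    a, b = points[tour[i - 1]], points[tour[i]]
    c, d = points[tour[k]], points[tour[(k + 1) % n]]
    return (math.dist(a, c) + math.dist(b, d)) - (math.dist(a, b) + math.dist(c, d))

def best_two_opt_step(points, tour):
    """Apply the single best 2-opt exchange, or return the tour unchanged."""
    n = len(tour)
    # Step 1, length calculation: independent per candidate, so each (i, k)
    # pair could be evaluated by its own work-item.
    candidates = [
        (move_delta(points, tour, i, k), i, k)
        for i in range(1, n - 1)
        for k in range(i + 1, n)
    ]
    # Step 2, selection: a minimum reduction over all candidates.
    delta, i, k = min(candidates)
    if delta >= -1e-12:
        return tour, 0.0  # no exchange shortens the tour
    # Step 3, exchange: reverse the chosen segment.
    return tour[:i] + tour[i:k + 1][::-1] + tour[k + 1:], delta

# Hypothetical data: unit-square corners in a self-crossing order.
points = [(0.0, 0.0), (1.0, 1.0), (1.0, 0.0), (0.0, 1.0)]
improved, gain = best_two_opt_step(points, [0, 1, 2, 3])
print(improved, gain)
```

Only the four edge lengths touched by a move are recomputed, so step 1 does constant work per candidate while the number of candidates grows quadratically with the tour size.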

Result
Size       CPU       NVIDIA    AMD
100,000    152.241   114.02    2472.06
120,000    235.05    168.58    3487.41
140,000    296.395   266.211
160,000    427.161   328.547

...?
＼(^o^)／

Aobench?

• Ambient Occlusion benchmark

• Created by @syoyo

• A floating-point benchmark

Ambient Occlusion

• Global Illumination

• Indirect light

• Fairly expensive to compute

What is the expensive part?

• Intersection

• 3 spheres + 1 plane = 4 objects

• AO samples: 64 * 64 = 4096
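To make the intersection cost concrete, here is a small Python sketch of a ray-sphere test plus a count of how many such tests the ambient-occlusion rays could need. The function names are hypothetical, the ray direction is assumed to be unit length, and the count assumes one full set of AO rays per pixel, so it is an upper bound rather than a measurement.

```python
import math

def ray_sphere_hit(origin, direction, center, radius):
    """Distance along a unit-length ray to the nearest sphere hit, or None on a miss."""
    oc = [o - c for o, c in zip(origin, center)]
    b = sum(x * d for x, d in zip(oc, direction))
    c = sum(x * x for x in oc) - radius * radius
    disc = b * b - c
    if disc <= 0.0:
        return None  # the ray misses or only grazes the sphere
    t = -b - math.sqrt(disc)
    return t if t > 0.0 else None

def intersection_tests(width, height, ao_samples, objects=4):
    """Upper bound on ray-object tests spent on ambient-occlusion rays."""
    return width * height * ao_samples * objects

hit = ray_sphere_hit((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 0.0, 5.0), 1.0)
miss = ray_sphere_hit((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (3.0, 0.0, 5.0), 1.0)
tests = intersection_tests(256, 256, 64 * 64)
print(hit, miss, tests)
```

Every test is independent of the others, which is why this workload suits one work-item per pixel.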

Result

Image size     AO samples   CPU      NVIDIA   AMD
256 * 256      64 * 64      6.30     0.057    0.061
512 * 512      64 * 64      24.58    0.213    0.131
1024 * 1024    64 * 64      96.735   0.831    0.4462
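The ratios behind these figures can be worked out with a few lines of Python. The numbers are copied from the table as given; the dictionary layout and the `speedup` helper are only an illustrative assumption.

```python
# Timings copied from the Result table (units as in the source).
results = {
    "256 * 256": {"CPU": 6.30, "NVIDIA": 0.057, "AMD": 0.061},
    "512 * 512": {"CPU": 24.58, "NVIDIA": 0.213, "AMD": 0.131},
    "1024 * 1024": {"CPU": 96.735, "NVIDIA": 0.831, "AMD": 0.4462},
}

def speedup(row, device):
    """How many times faster the device is than the CPU for one table row."""
    return row["CPU"] / row[device]

for size, row in results.items():
    print(size, round(speedup(row, "NVIDIA"), 1), round(speedup(row, "AMD"), 1))
```

The NVIDIA ratio stays roughly constant as the image grows, while the AMD ratio keeps climbing.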

[ASCII art] I have no idea what is going on.

Device type: Unknown
???
Max resource 2D width/height: 16384/16384
Total GPU memory size: 3072 MB
Total CPU cached space size: 508 MB
Total CPU uncached space size: 1788 MB
GPU engine clock: 925 MHz
GPU memory clock: 1375 MHz
Number of timing loops: 100
[ 16 bytes] CPU->GPU= 800.000 KB/sec, GPU->CPU 533.333 KB/sec
[ 32 bytes] CPU->GPU= 1.600 MB/sec, GPU->CPU 1.067 MB/sec
[ 32768 bytes] CPU->GPU= 1.638 GB/sec, GPU->CPU 1.638 GB/sec
...
[1073741824 bytes] CPU->GPU= 6.705 GB/sec, GPU->CPU 2.771 GB/sec
calResAllocRemote2D() returned an error when trying to allocate 1874853888 bytes (uncached)!
Peak CPU->GPU Bandwidth = 6.705 GB/sec [data size = 536870912 bytes]
Peak GPU->CPU Bandwidth = 4.369 GB/sec [data size = 131072 bytes]
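One way to read the log above is to turn each bandwidth figure back into the time a single transfer takes. The sketch below assumes 1 KB = 1000 bytes and 1 GB = 1000**3 bytes, which the log does not state, and the helper name is hypothetical.

```python
def seconds_per_transfer(size_bytes, bytes_per_second):
    """Time one transfer takes at the reported bandwidth."""
    return size_bytes / bytes_per_second

# Figures from the log, under the unit assumptions stated above.
small = seconds_per_transfer(16, 800.000e3)        # 16-byte CPU->GPU copy
large = seconds_per_transfer(1073741824, 6.705e9)  # 1 GiB CPU->GPU copy

print(small, large)
```

Under those assumptions the 16-byte and 32-byte rows both come out at about 20 microseconds, which suggests small transfers are dominated by a fixed per-transfer overhead rather than by bandwidth.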

????
GeForce GTX 580

Quick Mode

Host to Device Bandwidth, 1 Device(s), Paged memory, direct access
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 5561.7

Device to Host Bandwidth, 1 Device(s), Paged memory, direct access
33554432 5466.2

Device to Device Bandwidth, 1 Device(s)
33554432 138261.9

?????
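__kernel void map_test(__global int* src, __global int* dst, const int limit)
{
    // One work-item per element.
    int id = get_global_id(0);

    // Skip work-items outside [0, limit); id == limit would read src[-1].
    if (id >= limit) return;

    // Copy the source array into the destination in reverse order.
    dst[id] = src[limit - 1 - id];
}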
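For reference, the effect of this kernel can be emulated on the CPU. The Python sketch below is an assumption-laden stand-in: `global_size` plays the role of the NDRange global work size, which is often rounded up past `limit`, and the guard is written as `>=` so that the work-item with index `limit` is also skipped.

```python
def map_test(src, limit, global_size):
    """Emulate the kernel: work-item gid writes dst[gid] = src[limit - 1 - gid]."""
    dst = [None] * limit
    for gid in range(global_size):  # one loop iteration per work-item
        if gid >= limit:            # padded work-items must do nothing
            continue
        dst[gid] = src[limit - 1 - gid]
    return dst

# Five elements processed by eight work-items (hypothetical sizes).
print(map_test([1, 2, 3, 4, 5], 5, 8))
```

The kernel does one read and one write per work-item with almost no arithmetic, so its run time mostly reflects launch overhead and memory access.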

1000 ~

!

          NVIDIA     AMD
1000      0.355824   1.70634
          0.16186    0.7224
10000     3.54601    14.1305
          1.697      6.1982
100000    35.4747    128.583
          16.213     58.0289

• For GPGPU, go with the GeForce GTX 580

• As for the Radeon HD 7970...

• A slow starter carrying a nagging weakness

• But if the kernel gets bigger...
