InfiniBand performance/CentOS and Open MPI
From Teknologisk videncenter
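Moving from 1 Gb Ethernet to 4x QDR InfiniBand raised the maximum rate from approx. 114 MB/s (23 usec minimum latency) to approx. 664 MB/s (17 usec minimum latency), roughly a 5.8x increase. This is not quite good enough: both runs exclude the openib BTL, so the InfiniBand run still goes through TCP on the IPoIB interface ib0 instead of native InfiniBand. Next step: working with InfiniBand PSM.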
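A quick back-of-the-envelope check puts these numbers against the nominal link data rates. The sketch below assumes 125 MB/s for 1 Gb Ethernet (1000 Mbit/s / 8, before Ethernet framing and TCP/IP headers) and 4000 MB/s for 4x QDR InfiniBand (4 lanes x 10 Gbit/s signalling with 8b/10b encoding, i.e. 32 Gbit/s of data). MB here means 10^6 bytes, which is what the pong figures appear to use. `pct` is a hypothetical helper, not part of any tool on this page.

```shell
#!/bin/sh
# pct A B : print 100*A/B with one decimal (awk does the floating point)
pct() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", 100 * a / b }'
}

# Assumed link data rates (MB = 10^6 bytes):
#   1 Gb Ethernet : 1000 Mbit/s / 8                      =  125 MB/s
#   4x QDR IB     : 4 lanes * 10 Gbit/s * 8/10 (8b/10b)  = 4000 MB/s
echo "1 GbE  : $(pct 114.47 125)% of link data rate"
echo "4x QDR : $(pct 664.06 4000)% of link data rate"

# Ratios between the two runs (max rate and min latency from the summaries)
echo "Throughput IB/Eth : $(pct 664.06 114.47)%"
echo "Latency    IB/Eth : $(pct 17.40 23.60)%"
```

This prints 91.6% for Ethernet and 16.6% for QDR: the Ethernet run is close to wire speed, while the IPoIB run uses only about a sixth of the InfiniBand data rate. The ratio lines give 580.1% (about 5.8x the throughput) and 73.7% (latency cut by roughly a quarter).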
Over 1 Gb Ethernet
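Both runs on this page use the same mpiexec command and differ only in the interface passed to `btl_tcp_if_include`. `--mca btl ^openib` excludes Open MPI's openib (verbs) BTL, so MPI messages go through the TCP BTL, and `--mca btl_tcp_if_include <if>` restricts that TCP traffic to one interface (eth0 for Ethernet, ib0 for IPoIB). The helper `pong_cmd` below is hypothetical: it only prints the command line for a given interface so the two variants can be generated from one place.

```shell
#!/bin/sh
# pong_cmd IFACE : print the mpiexec command used on this page for IFACE.
# Hypothetical helper; pipe its output to sh to actually run the benchmark.
pong_cmd() {
    iface="$1"                       # eth0 = 1 Gb Ethernet, ib0 = IPoIB
    hosts="10.0.1.102,10.0.1.101"    # the two nodes used in these tests
    # --mca btl ^openib          : exclude the openib (verbs) BTL -> TCP is used
    # --mca btl_tcp_if_include X : let the TCP BTL use only interface X
    echo "mpiexec --mca btl ^openib --mca btl_tcp_if_include $iface -H $hosts /home/pong"
}

pong_cmd eth0
```

Usage: `pong_cmd ib0 | sh` runs the InfiniBand variant; swapping the interface name is the only change between the two sections.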
[root@centos1 bin]# mpiexec --mca btl ^openib --mca btl_tcp_if_include eth0 -H 10.0.1.102,10.0.1.101 /home/pong
Hello from 1 of 2
Hello from 0 of 2
Timer accuracy of ~0.953674 usecs
8 bytes took 121 usec ( 0.132 MB/sec)
16 bytes took 58 usec ( 0.552 MB/sec)
32 bytes took 60 usec ( 1.065 MB/sec)
64 bytes took 239 usec ( 0.536 MB/sec)
128 bytes took 256 usec ( 1.000 MB/sec)
256 bytes took 94 usec ( 5.450 MB/sec)
512 bytes took 268 usec ( 3.818 MB/sec)
1024 bytes took 151 usec ( 13.570 MB/sec)
2048 bytes took 311 usec ( 13.165 MB/sec)
4096 bytes took 444 usec ( 18.443 MB/sec)
8192 bytes took 272 usec ( 60.227 MB/sec)
16384 bytes took 424 usec ( 77.256 MB/sec)
32768 bytes took 687 usec ( 95.377 MB/sec)
65536 bytes took 1693 usec ( 77.419 MB/sec)
131072 bytes took 2788 usec ( 94.024 MB/sec)
262144 bytes took 5031 usec ( 104.209 MB/sec)
524288 bytes took 9545 usec ( 109.855 MB/sec)
1048576 bytes took 18478 usec ( 113.495 MB/sec)
Asynchronous ping-pong
8 bytes took 56 usec ( 0.286 MB/sec)
16 bytes took 49 usec ( 0.652 MB/sec)
32 bytes took 54 usec ( 1.183 MB/sec)
64 bytes took 258 usec ( 0.496 MB/sec)
128 bytes took 243 usec ( 1.054 MB/sec)
256 bytes took 246 usec ( 2.081 MB/sec)
512 bytes took 293 usec ( 3.495 MB/sec)
1024 bytes took 292 usec ( 7.012 MB/sec)
2048 bytes took 227 usec ( 18.046 MB/sec)
4096 bytes took 381 usec ( 21.502 MB/sec)
8192 bytes took 252 usec ( 65.014 MB/sec)
16384 bytes took 405 usec ( 80.894 MB/sec)
32768 bytes took 682 usec ( 96.111 MB/sec)
65536 bytes took 1555 usec ( 84.293 MB/sec)
131072 bytes took 2936 usec ( 89.282 MB/sec)
262144 bytes took 4778 usec ( 109.732 MB/sec)
524288 bytes took 9541 usec ( 109.902 MB/sec)
1048576 bytes took 18320 usec ( 114.473 MB/sec)
Bi-directional asynchronous ping-pong
8 bytes took 47 usec ( 0.339 MB/sec)
16 bytes took 51 usec ( 0.627 MB/sec)
32 bytes took 48 usec ( 1.335 MB/sec)
64 bytes took 219 usec ( 0.584 MB/sec)
128 bytes took 201 usec ( 1.274 MB/sec)
256 bytes took 255 usec ( 2.007 MB/sec)
512 bytes took 265 usec ( 3.866 MB/sec)
1024 bytes took 318 usec ( 6.439 MB/sec)
2048 bytes took 381 usec ( 10.751 MB/sec)
4096 bytes took 200 usec ( 40.953 MB/sec)
8192 bytes took 254 usec ( 64.525 MB/sec)
16384 bytes took 421 usec ( 77.825 MB/sec)
32768 bytes took 735 usec ( 89.159 MB/sec)
65536 bytes took 1538 usec ( 85.220 MB/sec)
131072 bytes took 3177 usec ( 82.509 MB/sec)
262144 bytes took 5665 usec ( 92.551 MB/sec)
524288 bytes took 11319 usec ( 92.639 MB/sec)
1048576 bytes took 28445 usec ( 73.727 MB/sec)
Max rate = 114.472840 MB/sec Min latency = 23.603439 usec
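The summary line can be cross-checked against the table. The pong source is not shown here, but the numbers are consistent with rate = 2 x bytes / round-trip time (e.g. 2 x 1048576 / 18478 usec = 113.495 MB/sec) and with "Min latency" being half of the smallest round-trip time. Since the printed usec values are rounded, the sketch below recovers the one-way latency from the rate column as bytes / rate. `summarize` is a hypothetical helper, and the formula is an inference from the data, not pong's documented behaviour.

```shell
#!/bin/sh
# summarize: read pong output on stdin and print the largest rate and the
# smallest one-way latency implied by the "N bytes took T usec ( R MB/sec)" lines.
# Assumption inferred from the numbers on this page, not from pong's source:
#   R = 2 * N / T  (T = round trip in usec), so one-way latency T / 2 = N / R.
summarize() {
    awk '
        /bytes took/ {
            line = $0
            sub(/.*[(] */, "", line)      # keep only "R MB/sec)"
            rate = line + 0               # numeric prefix of that string
            if (rate <= 0) next
            lat = $1 / rate               # one-way latency in usec
            if (n == 0 || rate > maxrate) maxrate = rate
            if (n == 0 || lat < minlat) minlat = lat
            n++
        }
        END {
            if (n > 0)
                printf "Max rate = %.3f MB/sec Min latency = %.3f usec\n", maxrate, minlat
        }
    '
}
```

Usage: `summarize < eth0.log` (the log file name is hypothetical). For the Ethernet output above this should give 114.473 MB/sec and 23.599 usec, matching the reported summary to within the rounding of the rate column.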
Over 4x QDR InfiniBand (IPoIB on ib0)
[root@centos1 bin]# mpiexec --mca btl ^openib --mca btl_tcp_if_include ib0 -H 10.0.1.102,10.0.1.101 /home/pong
Hello from 1 of 2
Hello from 0 of 2
Timer accuracy of ~0.953674 usecs
8 bytes took 146 usec ( 0.109 MB/sec)
16 bytes took 119 usec ( 0.269 MB/sec)
32 bytes took 58 usec ( 1.105 MB/sec)
64 bytes took 122 usec ( 1.049 MB/sec)
128 bytes took 183 usec ( 1.400 MB/sec)
256 bytes took 215 usec ( 2.381 MB/sec)
512 bytes took 195 usec ( 5.257 MB/sec)
1024 bytes took 155 usec ( 13.215 MB/sec)
2048 bytes took 137 usec ( 29.878 MB/sec)
4096 bytes took 180 usec ( 45.510 MB/sec)
8192 bytes took 223 usec ( 73.497 MB/sec)
16384 bytes took 284 usec ( 115.398 MB/sec)
32768 bytes took 446 usec ( 146.915 MB/sec)
65536 bytes took 1020 usec ( 128.508 MB/sec)
131072 bytes took 1210 usec ( 216.653 MB/sec)
262144 bytes took 2058 usec ( 254.752 MB/sec)
524288 bytes took 3883 usec ( 270.034 MB/sec)
1048576 bytes took 4247 usec ( 493.802 MB/sec)
Asynchronous ping-pong
8 bytes took 66 usec ( 0.242 MB/sec)
16 bytes took 53 usec ( 0.602 MB/sec)
32 bytes took 54 usec ( 1.183 MB/sec)
64 bytes took 36 usec ( 3.555 MB/sec)
128 bytes took 36 usec ( 7.111 MB/sec)
256 bytes took 38 usec ( 13.506 MB/sec)
512 bytes took 37 usec ( 27.709 MB/sec)
1024 bytes took 38 usec ( 54.025 MB/sec)
2048 bytes took 54 usec ( 76.017 MB/sec)
4096 bytes took 118 usec ( 69.414 MB/sec)
8192 bytes took 124 usec ( 132.153 MB/sec)
16384 bytes took 166 usec ( 197.186 MB/sec)
32768 bytes took 260 usec ( 252.182 MB/sec)
65536 bytes took 407 usec ( 322.060 MB/sec)
131072 bytes took 601 usec ( 436.141 MB/sec)
262144 bytes took 946 usec ( 554.189 MB/sec)
524288 bytes took 1739 usec ( 602.968 MB/sec)
1048576 bytes took 3158 usec ( 664.057 MB/sec)
Bi-directional asynchronous ping-pong
8 bytes took 40 usec ( 0.399 MB/sec)
16 bytes took 38 usec ( 0.839 MB/sec)
32 bytes took 35 usec ( 1.839 MB/sec)
64 bytes took 38 usec ( 3.377 MB/sec)
128 bytes took 37 usec ( 6.927 MB/sec)
256 bytes took 39 usec ( 13.175 MB/sec)
512 bytes took 35 usec ( 29.217 MB/sec)
1024 bytes took 39 usec ( 52.378 MB/sec)
2048 bytes took 57 usec ( 71.882 MB/sec)
4096 bytes took 60 usec ( 136.348 MB/sec)
8192 bytes took 91 usec ( 180.366 MB/sec)
16384 bytes took 166 usec ( 197.470 MB/sec)
32768 bytes took 255 usec ( 256.895 MB/sec)
65536 bytes took 522 usec ( 251.145 MB/sec)
131072 bytes took 975 usec ( 268.895 MB/sec)
262144 bytes took 1842 usec ( 284.663 MB/sec)
524288 bytes took 3575 usec ( 293.301 MB/sec)
1048576 bytes took 8998 usec ( 233.071 MB/sec)
Max rate = 664.056547 MB/sec Min latency = 17.404556 usec
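To see at which message sizes the InfiniBand run gains, the two outputs can be compared size by size. The sketch below assumes each run's output was saved to a file and compares only the "Asynchronous ping-pong" section, printing the IPoIB rate divided by the Ethernet rate. `compare_async` and the file names in the usage line are hypothetical.

```shell
#!/bin/sh
# compare_async ETH_LOG IB_LOG : for each message size in the asynchronous
# ping-pong section, print IB rate / Ethernet rate.
compare_async() {
    awk '
        FNR == 1 { sec = 0 }                       # new file: back to first section
        /^Asynchronous ping-pong/ { sec = 1; next }
        /^Bi-directional/         { sec = 2; next }
        sec == 1 && /bytes took/ {
            line = $0
            sub(/.*[(] */, "", line)               # keep only "R MB/sec)"
            rate = line + 0
            if (FNR == NR)                         # first file: Ethernet rates
                eth[$1] = rate
            else if (($1 in eth) && eth[$1] > 0)   # second file: IB vs Ethernet
                printf "%8d bytes: %6.2fx\n", $1, rate / eth[$1]
        }
    ' "$1" "$2"
}
```

Usage: `compare_async eth0.log ib0.log`. With the figures on this page, 1048576-byte messages come out at about 5.80x (664.057 vs 114.473 MB/sec), while 8-byte messages are at about 0.85x (0.242 vs 0.286 MB/sec), so the gain is concentrated in the larger messages.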