InfiniBand performance/CentOS and Open MPI
From Teknologisk videncenter
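Overall, moving from 1 Gbps Ethernet to 4x QDR InfiniBand raised the maximum rate from approx. 114 MB/s (23.6 usec minimum latency) to approx. 676 MB/s (16.0 usec minimum latency). Both runs below use Open MPI's TCP transport with the openib BTL excluded, so the InfiniBand figures are for IP over InfiniBand on ib0. This is not quite good enough; the next step is Working with InfiniBand PSM.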
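The headline comparison can be checked with a little arithmetic. The Python sketch below takes the two "Max rate ... Min latency" summary lines from the runs and additionally assumes nominal payload rates of 125 MB/s for 1 Gbps Ethernet and 4000 MB/s for 4x QDR (40 Gbps signalling, 32 Gbps after 8b/10b encoding), with 1 MB = 10^6 bytes. These nominal figures are assumptions, not measurements. Under them, the Ethernet run uses about 92% of its link while the ib0 run uses only about 17%.

```python
# Headline figures copied from the two "Max rate ... Min latency" summary lines.
ETH_RATE_MBPS, ETH_LAT_US = 114.472840, 23.603439  # 1 Gbps Ethernet (eth0)
IB_RATE_MBPS, IB_LAT_US = 676.050497, 15.974045    # 4x QDR InfiniBand, TCP over ib0

# Assumed nominal payload rates in MB/s (1 MB = 10**6 bytes), not measured values:
# 1 Gbps Ethernet -> 125 MB/s; 4x QDR -> 40 Gbps signalling, 32 Gbps after 8b/10b -> 4000 MB/s.
ETH_NOMINAL_MBPS = 1e9 / 8 / 1e6
QDR_NOMINAL_MBPS = 40e9 * 8 / 10 / 8 / 1e6


def summarize():
    """Speedup, latency reduction and link utilization implied by the two runs."""
    return {
        "bandwidth_speedup": IB_RATE_MBPS / ETH_RATE_MBPS,
        "latency_reduction": 1 - IB_LAT_US / ETH_LAT_US,
        "eth_link_utilization": ETH_RATE_MBPS / ETH_NOMINAL_MBPS,
        "ib_link_utilization": IB_RATE_MBPS / QDR_NOMINAL_MBPS,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:.3f}")
```

The low utilization on ib0 is consistent with the traffic going through the TCP/IP stack rather than a native InfiniBand transport.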
Over 1 Gbps Ethernet (eth0)
[root@centos1 bin]# mpiexec --mca btl ^openib --mca btl_tcp_if_include eth0 -H 10.0.1.102,10.0.1.101 /home/pong
Hello from 1 of 2
Hello from 0 of 2
Timer accuracy of ~0.953674 usecs
8 bytes took 121 usec ( 0.132 MB/sec)
16 bytes took 58 usec ( 0.552 MB/sec)
32 bytes took 60 usec ( 1.065 MB/sec)
64 bytes took 239 usec ( 0.536 MB/sec)
128 bytes took 256 usec ( 1.000 MB/sec)
256 bytes took 94 usec ( 5.450 MB/sec)
512 bytes took 268 usec ( 3.818 MB/sec)
1024 bytes took 151 usec ( 13.570 MB/sec)
2048 bytes took 311 usec ( 13.165 MB/sec)
4096 bytes took 444 usec ( 18.443 MB/sec)
8192 bytes took 272 usec ( 60.227 MB/sec)
16384 bytes took 424 usec ( 77.256 MB/sec)
32768 bytes took 687 usec ( 95.377 MB/sec)
65536 bytes took 1693 usec ( 77.419 MB/sec)
131072 bytes took 2788 usec ( 94.024 MB/sec)
262144 bytes took 5031 usec ( 104.209 MB/sec)
524288 bytes took 9545 usec ( 109.855 MB/sec)
1048576 bytes took 18478 usec ( 113.495 MB/sec)
Asynchronous ping-pong
8 bytes took 56 usec ( 0.286 MB/sec)
16 bytes took 49 usec ( 0.652 MB/sec)
32 bytes took 54 usec ( 1.183 MB/sec)
64 bytes took 258 usec ( 0.496 MB/sec)
128 bytes took 243 usec ( 1.054 MB/sec)
256 bytes took 246 usec ( 2.081 MB/sec)
512 bytes took 293 usec ( 3.495 MB/sec)
1024 bytes took 292 usec ( 7.012 MB/sec)
2048 bytes took 227 usec ( 18.046 MB/sec)
4096 bytes took 381 usec ( 21.502 MB/sec)
8192 bytes took 252 usec ( 65.014 MB/sec)
16384 bytes took 405 usec ( 80.894 MB/sec)
32768 bytes took 682 usec ( 96.111 MB/sec)
65536 bytes took 1555 usec ( 84.293 MB/sec)
131072 bytes took 2936 usec ( 89.282 MB/sec)
262144 bytes took 4778 usec ( 109.732 MB/sec)
524288 bytes took 9541 usec ( 109.902 MB/sec)
1048576 bytes took 18320 usec ( 114.473 MB/sec)
Bi-directional asynchronous ping-pong
8 bytes took 47 usec ( 0.339 MB/sec)
16 bytes took 51 usec ( 0.627 MB/sec)
32 bytes took 48 usec ( 1.335 MB/sec)
64 bytes took 219 usec ( 0.584 MB/sec)
128 bytes took 201 usec ( 1.274 MB/sec)
256 bytes took 255 usec ( 2.007 MB/sec)
512 bytes took 265 usec ( 3.866 MB/sec)
1024 bytes took 318 usec ( 6.439 MB/sec)
2048 bytes took 381 usec ( 10.751 MB/sec)
4096 bytes took 200 usec ( 40.953 MB/sec)
8192 bytes took 254 usec ( 64.525 MB/sec)
16384 bytes took 421 usec ( 77.825 MB/sec)
32768 bytes took 735 usec ( 89.159 MB/sec)
65536 bytes took 1538 usec ( 85.220 MB/sec)
131072 bytes took 3177 usec ( 82.509 MB/sec)
262144 bytes took 5665 usec ( 92.551 MB/sec)
524288 bytes took 11319 usec ( 92.639 MB/sec)
1048576 bytes took 28445 usec ( 73.727 MB/sec)
Max rate = 114.472840 MB/sec Min latency = 23.603439 usec
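The MB/sec column appears to be computed as twice the message size divided by the printed time, with 1 MB = 10^6 bytes; for example 2 × 1048576 / 18478 ≈ 113.495. That suggests each printed time is a full round trip, and the "Min latency" figure is consistent with half of the shortest round trip (47 usec / 2 = 23.5, against the reported 23.6, since the printed times are rounded). The Python sketch below checks this against a few rows copied from the Ethernet run. The formula is inferred from the output, not taken from the pong source, and the helper names are hypothetical.

```python
import re

# One result row as printed by the pong program, e.g. "8 bytes took 121 usec ( 0.132 MB/sec)".
ROW = re.compile(r"(\d+) bytes took (\d+) usec \(\s*([\d.]+) MB/sec\)")


def parse_rows(text):
    """Return (size_bytes, time_usec, reported_mb_per_sec) for every result row in text."""
    return [(int(size), int(usec), float(rate)) for size, usec, rate in ROW.findall(text)]


def predicted_rate(size_bytes, time_usec):
    # Inferred, not taken from the pong source: the message crosses the link twice per
    # printed time, and 1 MB = 10**6 bytes, so bytes per usec equals MB/sec.
    return 2 * size_bytes / time_usec


def summary(rows):
    """Approximate the 'Max rate' / 'Min latency' line from the (rounded) printed rows."""
    max_rate = max(rate for _, _, rate in rows)
    min_latency = min(usec for _, usec, _ in rows) / 2  # half of the shortest round trip
    return max_rate, min_latency


# A few rows copied verbatim from the 1 Gbps Ethernet run above.
SAMPLE = """\
8 bytes took 121 usec ( 0.132 MB/sec)
1048576 bytes took 18478 usec ( 113.495 MB/sec)
1048576 bytes took 18320 usec ( 114.473 MB/sec)
8 bytes took 47 usec ( 0.339 MB/sec)
1048576 bytes took 28445 usec ( 73.727 MB/sec)
"""

if __name__ == "__main__":
    rows = parse_rows(SAMPLE)
    for size, usec, rate in rows:
        print(f"{size:>8} B: reported {rate:8.3f}, predicted {predicted_rate(size, usec):8.3f} MB/sec")
    print("max rate = %.3f MB/sec, min latency ~ %.1f usec" % summary(rows))
```

Small rows deviate slightly more from the prediction because a 1 usec rounding error matters more on a 47 usec time than on an 18 ms one.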
Over 4x QDR InfiniBand (TCP over IPoIB on ib0)
[root@centos1 bin]# mpiexec --mca btl ^openib --mca btl_tcp_if_include ib0 -H 10.0.1.102,10.0.1.101 /home/pong
Hello from 1 of 2
Hello from 0 of 2
Timer accuracy of ~1.192093 usecs
8 bytes took 129 usec ( 0.124 MB/sec)
16 bytes took 79 usec ( 0.405 MB/sec)
32 bytes took 97 usec ( 0.660 MB/sec)
64 bytes took 104 usec ( 1.231 MB/sec)
128 bytes took 80 usec ( 3.196 MB/sec)
256 bytes took 72 usec ( 7.111 MB/sec)
512 bytes took 53 usec ( 19.347 MB/sec)
1024 bytes took 91 usec ( 22.487 MB/sec)
2048 bytes took 128 usec ( 31.992 MB/sec)
4096 bytes took 134 usec ( 61.138 MB/sec)
8192 bytes took 94 usec ( 174.415 MB/sec)
16384 bytes took 152 usec ( 215.422 MB/sec)
32768 bytes took 365 usec ( 179.541 MB/sec)
65536 bytes took 581 usec ( 225.587 MB/sec)
131072 bytes took 1296 usec ( 202.265 MB/sec)
262144 bytes took 2255 usec ( 232.504 MB/sec)
524288 bytes took 3513 usec ( 298.476 MB/sec)
1048576 bytes took 4194 usec ( 500.034 MB/sec)
Asynchronous ping-pong
8 bytes took 44 usec ( 0.363 MB/sec)
16 bytes took 34 usec ( 0.945 MB/sec)
32 bytes took 38 usec ( 1.688 MB/sec)
64 bytes took 34 usec ( 3.754 MB/sec)
128 bytes took 37 usec ( 6.927 MB/sec)
256 bytes took 42 usec ( 12.202 MB/sec)
512 bytes took 33 usec ( 31.123 MB/sec)
1024 bytes took 42 usec ( 48.806 MB/sec)
2048 bytes took 47 usec ( 86.767 MB/sec)
4096 bytes took 54 usec ( 152.034 MB/sec)
8192 bytes took 82 usec ( 199.766 MB/sec)
16384 bytes took 110 usec ( 298.132 MB/sec)
32768 bytes took 200 usec ( 327.626 MB/sec)
65536 bytes took 322 usec ( 406.925 MB/sec)
131072 bytes took 515 usec ( 509.033 MB/sec)
262144 bytes took 894 usec ( 586.563 MB/sec)
524288 bytes took 1666 usec ( 629.371 MB/sec)
1048576 bytes took 3102 usec ( 676.050 MB/sec)
Bi-directional asynchronous ping-pong
8 bytes took 39 usec ( 0.409 MB/sec)
16 bytes took 41 usec ( 0.780 MB/sec)
32 bytes took 35 usec ( 1.839 MB/sec)
64 bytes took 34 usec ( 3.781 MB/sec)
128 bytes took 38 usec ( 6.711 MB/sec)
256 bytes took 32 usec ( 16.026 MB/sec)
512 bytes took 36 usec ( 28.443 MB/sec)
1024 bytes took 36 usec ( 56.887 MB/sec)
2048 bytes took 55 usec ( 74.695 MB/sec)
4096 bytes took 98 usec ( 83.600 MB/sec)
8192 bytes took 135 usec ( 121.198 MB/sec)
16384 bytes took 145 usec ( 226.051 MB/sec)
32768 bytes took 241 usec ( 271.887 MB/sec)
65536 bytes took 563 usec ( 232.849 MB/sec)
131072 bytes took 1331 usec ( 196.974 MB/sec)
262144 bytes took 2535 usec ( 206.831 MB/sec)
524288 bytes took 4167 usec ( 251.648 MB/sec)
1048576 bytes took 6992 usec ( 299.932 MB/sec)
Max rate = 676.050497 MB/sec Min latency = 15.974045 usec
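The headline speedup of about 5.9x holds for 1 MiB messages, but the gain varies with message size. This Python sketch divides the asynchronous ping-pong times of the Ethernet run by those of the ib0 run, using the numbers printed above; the function name is hypothetical. For 8 to 32 byte messages the round trip improves only by about 1.3 to 1.4x, while from 64 bytes upward the ratio lies roughly between 3x and 9x.

```python
# Asynchronous ping-pong times (usec) copied from the two runs above.
SIZES = [2 ** k for k in range(3, 21)]  # 8 bytes .. 1048576 bytes
ETH_ASYNC_USEC = [56, 49, 54, 258, 243, 246, 293, 292, 227,
                  381, 252, 405, 682, 1555, 2936, 4778, 9541, 18320]
IB_ASYNC_USEC = [44, 34, 38, 34, 37, 42, 33, 42, 47,
                 54, 82, 110, 200, 322, 515, 894, 1666, 3102]


def speedups(sizes, eth_usec, ib_usec):
    """Map message size to (Ethernet time / ib0 time); > 1 means ib0 was faster."""
    if not (len(sizes) == len(eth_usec) == len(ib_usec)):
        raise ValueError("all three lists must have the same length")
    return {size: eth / ib for size, eth, ib in zip(sizes, eth_usec, ib_usec)}


if __name__ == "__main__":
    for size, ratio in speedups(SIZES, ETH_ASYNC_USEC, IB_ASYNC_USEC).items():
        print(f"{size:>8} bytes: {ratio:4.1f}x")
```

The per-size ratios are noisy because each figure comes from a single run of the benchmark.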