Infiniband performance/CentOS and openMPI
From Teknologisk videncenter
Revision as of 15:43, 28 April 2012 by Heth
Switching from 1 Gbps Ethernet to TCP over the ib0 interface (4x QDR InfiniBand) raised the peak rate from approx. 114 MB/s (23 us latency) to approx. 630 MB/s (15 us latency). That is not quite good enough. Work is continuing with InfiniBand PSM.
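To put those figures in context, the sketch below computes the speedup and the share of each link's nominal data rate that was reached. The nominal rates are assumptions that are not stated on this page: 125 MB/s for 1 Gbps Ethernet, and 4000 MB/s for 4x QDR (40 Gbit/s signalling with 8b/10b encoding, leaving 32 Gbit/s for data).

```python
# Measured peak rates from the two runs on this page (MB/s, 1 MB = 10**6 bytes).
ETH_MEASURED = 114.472840
IB_MEASURED = 629.596523

# Assumed nominal data rates (not taken from the page).
ETH_NOMINAL = 1e9 / 8 / 1e6          # 1 Gbit/s -> 125 MB/s
IB_NOMINAL = 40e9 * 0.8 / 8 / 1e6    # 4x QDR with 8b/10b encoding -> 4000 MB/s


def fraction_of_nominal(measured, nominal):
    """Share of the nominal link rate that the benchmark reached."""
    return measured / nominal


speedup = IB_MEASURED / ETH_MEASURED
print(f"speedup: {speedup:.2f}x")
print(f"Ethernet: {fraction_of_nominal(ETH_MEASURED, ETH_NOMINAL):.1%} of nominal")
print(f"4x QDR:   {fraction_of_nominal(IB_MEASURED, IB_NOMINAL):.1%} of nominal")
```

On these assumptions the speedup is roughly 5.5x. The Ethernet run sits close to its wire limit, while the run over ib0 uses only a small part of the link, which fits the "not quite good enough" verdict.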
Over 1 Gbps Ethernet
[root@centos1 bin]# mpiexec --mca btl ^openib --mca btl_tcp_if_include eth0 -H 10.0.1.102,10.0.1.101 /home/pong
Hello from 1 of 2
Hello from 0 of 2
Timer accuracy of ~0.953674 usecs
8 bytes took 121 usec ( 0.132 MB/sec)
16 bytes took 58 usec ( 0.552 MB/sec)
32 bytes took 60 usec ( 1.065 MB/sec)
64 bytes took 239 usec ( 0.536 MB/sec)
128 bytes took 256 usec ( 1.000 MB/sec)
256 bytes took 94 usec ( 5.450 MB/sec)
512 bytes took 268 usec ( 3.818 MB/sec)
1024 bytes took 151 usec ( 13.570 MB/sec)
2048 bytes took 311 usec ( 13.165 MB/sec)
4096 bytes took 444 usec ( 18.443 MB/sec)
8192 bytes took 272 usec ( 60.227 MB/sec)
16384 bytes took 424 usec ( 77.256 MB/sec)
32768 bytes took 687 usec ( 95.377 MB/sec)
65536 bytes took 1693 usec ( 77.419 MB/sec)
131072 bytes took 2788 usec ( 94.024 MB/sec)
262144 bytes took 5031 usec ( 104.209 MB/sec)
524288 bytes took 9545 usec ( 109.855 MB/sec)
1048576 bytes took 18478 usec ( 113.495 MB/sec)
Asynchronous ping-pong
8 bytes took 56 usec ( 0.286 MB/sec)
16 bytes took 49 usec ( 0.652 MB/sec)
32 bytes took 54 usec ( 1.183 MB/sec)
64 bytes took 258 usec ( 0.496 MB/sec)
128 bytes took 243 usec ( 1.054 MB/sec)
256 bytes took 246 usec ( 2.081 MB/sec)
512 bytes took 293 usec ( 3.495 MB/sec)
1024 bytes took 292 usec ( 7.012 MB/sec)
2048 bytes took 227 usec ( 18.046 MB/sec)
4096 bytes took 381 usec ( 21.502 MB/sec)
8192 bytes took 252 usec ( 65.014 MB/sec)
16384 bytes took 405 usec ( 80.894 MB/sec)
32768 bytes took 682 usec ( 96.111 MB/sec)
65536 bytes took 1555 usec ( 84.293 MB/sec)
131072 bytes took 2936 usec ( 89.282 MB/sec)
262144 bytes took 4778 usec ( 109.732 MB/sec)
524288 bytes took 9541 usec ( 109.902 MB/sec)
1048576 bytes took 18320 usec ( 114.473 MB/sec)
Bi-directional asynchronous ping-pong
8 bytes took 47 usec ( 0.339 MB/sec)
16 bytes took 51 usec ( 0.627 MB/sec)
32 bytes took 48 usec ( 1.335 MB/sec)
64 bytes took 219 usec ( 0.584 MB/sec)
128 bytes took 201 usec ( 1.274 MB/sec)
256 bytes took 255 usec ( 2.007 MB/sec)
512 bytes took 265 usec ( 3.866 MB/sec)
1024 bytes took 318 usec ( 6.439 MB/sec)
2048 bytes took 381 usec ( 10.751 MB/sec)
4096 bytes took 200 usec ( 40.953 MB/sec)
8192 bytes took 254 usec ( 64.525 MB/sec)
16384 bytes took 421 usec ( 77.825 MB/sec)
32768 bytes took 735 usec ( 89.159 MB/sec)
65536 bytes took 1538 usec ( 85.220 MB/sec)
131072 bytes took 3177 usec ( 82.509 MB/sec)
262144 bytes took 5665 usec ( 92.551 MB/sec)
524288 bytes took 11319 usec ( 92.639 MB/sec)
1048576 bytes took 28445 usec ( 73.727 MB/sec)
Max rate = 114.472840 MB/sec Min latency = 23.603439 usec
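The printed numbers suggest how pong derives its columns. The pong source is not shown on this page, so this is an inference from the output only: the rate matches 2 x bytes / time (the message crosses the link in both directions within the timed interval), and "Min latency" is close to half of the shortest time. A minimal check against a few rows of the Ethernet run:

```python
# Rows copied from the Ethernet output above: (bytes, usec, printed MB/sec).
ROWS = [
    (8, 121, 0.132),
    (1024, 151, 13.570),
    (1048576, 18478, 113.495),
]


def round_trip_rate(nbytes, usec):
    """Assumed formula: the payload crosses the link twice per timed interval.

    Bytes per microsecond is numerically equal to MB/s (1 MB = 10**6 bytes).
    """
    return 2 * nbytes / usec


for nbytes, usec, printed in ROWS:
    computed = round_trip_rate(nbytes, usec)
    print(f"{nbytes:>8} bytes: computed {computed:8.3f}, printed {printed:8.3f}")

# The shortest time in the Ethernet run is 47 usec (bi-directional, 8 bytes).
print(f"half of shortest time: {47 / 2} usec (printed min latency: 23.603439)")
```

The small differences are expected, since the usec column is printed as a whole number while the rate is presumably computed from the unrounded time.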
Over 4xQDR
[root@centos1 bin]# mpiexec --mca btl ^openib --mca btl_tcp_if_include ib0 -H 10.0.1.102,10.0.1.101 /home/pong
Hello from 1 of 2
Hello from 0 of 2
Timer accuracy of ~0.953674 usecs
8 bytes took 84 usec ( 0.191 MB/sec)
16 bytes took 68 usec ( 0.471 MB/sec)
32 bytes took 84 usec ( 0.763 MB/sec)
64 bytes took 101 usec ( 1.269 MB/sec)
128 bytes took 126 usec ( 2.034 MB/sec)
256 bytes took 130 usec ( 3.940 MB/sec)
512 bytes took 132 usec ( 7.753 MB/sec)
1024 bytes took 114 usec ( 17.971 MB/sec)
2048 bytes took 126 usec ( 32.538 MB/sec)
4096 bytes took 113 usec ( 72.489 MB/sec)
8192 bytes took 143 usec ( 114.532 MB/sec)
16384 bytes took 289 usec ( 113.398 MB/sec)
32768 bytes took 505 usec ( 129.782 MB/sec)
65536 bytes took 1095 usec ( 119.694 MB/sec)
131072 bytes took 1437 usec ( 182.431 MB/sec)
262144 bytes took 2516 usec ( 208.379 MB/sec)
524288 bytes took 3891 usec ( 269.488 MB/sec)
1048576 bytes took 3487 usec ( 601.442 MB/sec)
Asynchronous ping-pong
8 bytes took 43 usec ( 0.373 MB/sec)
16 bytes took 34 usec ( 0.945 MB/sec)
32 bytes took 32 usec ( 1.988 MB/sec)
64 bytes took 41 usec ( 3.121 MB/sec)
128 bytes took 38 usec ( 6.711 MB/sec)
256 bytes took 34 usec ( 15.017 MB/sec)
512 bytes took 39 usec ( 26.349 MB/sec)
1024 bytes took 40 usec ( 51.131 MB/sec)
2048 bytes took 58 usec ( 70.699 MB/sec)
4096 bytes took 60 usec ( 136.348 MB/sec)
8192 bytes took 72 usec ( 227.548 MB/sec)
16384 bytes took 95 usec ( 345.324 MB/sec)
32768 bytes took 153 usec ( 428.159 MB/sec)
65536 bytes took 326 usec ( 402.162 MB/sec)
131072 bytes took 811 usec ( 323.196 MB/sec)
262144 bytes took 1298 usec ( 403.935 MB/sec)
524288 bytes took 2064 usec ( 508.034 MB/sec)
1048576 bytes took 3331 usec ( 629.597 MB/sec)
Bi-directional asynchronous ping-pong
8 bytes took 39 usec ( 0.412 MB/sec)
16 bytes took 37 usec ( 0.866 MB/sec)
32 bytes took 36 usec ( 1.778 MB/sec)
64 bytes took 40 usec ( 3.196 MB/sec)
128 bytes took 36 usec ( 7.111 MB/sec)
256 bytes took 37 usec ( 13.855 MB/sec)
512 bytes took 31 usec ( 33.038 MB/sec)
1024 bytes took 39 usec ( 52.378 MB/sec)
2048 bytes took 52 usec ( 78.807 MB/sec)
4096 bytes took 63 usec ( 130.151 MB/sec)
8192 bytes took 116 usec ( 141.398 MB/sec)
16384 bytes took 178 usec ( 184.235 MB/sec)
32768 bytes took 250 usec ( 262.288 MB/sec)
65536 bytes took 560 usec ( 234.038 MB/sec)
131072 bytes took 204089 usec ( 1.284 MB/sec)
262144 bytes took 2302 usec ( 227.760 MB/sec)
524288 bytes took 3068 usec ( 341.782 MB/sec)
1048576 bytes took 7490 usec ( 279.988 MB/sec)
Max rate = 629.596523 MB/sec Min latency = 15.497208 usec
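One row in the bi-directional run over ib0 stands out: 131072 bytes took 204089 usec, far slower than the rows on either side of it. A small parser, sketched below, can flag such rows automatically. The regular expression and the factor-of-10 threshold are choices made for this illustration and are not part of pong.

```python
import re

# Matches lines such as "131072 bytes took 204089 usec ( 1.284 MB/sec)".
ROW = re.compile(r"(\d+) bytes took\s+(\d+) usec \(\s*([\d.]+) MB/sec\)")


def parse_rows(text):
    """Return (bytes, usec, mb_per_sec) tuples for every result line in text."""
    return [(int(b), int(u), float(r)) for b, u, r in ROW.findall(text)]


def flag_outliers(rows, factor=10):
    """Flag rows whose time exceeds `factor` times both neighbouring rows."""
    flagged = []
    for i in range(1, len(rows) - 1):
        usec = rows[i][1]
        if usec > factor * rows[i - 1][1] and usec > factor * rows[i + 1][1]:
            flagged.append(rows[i])
    return flagged


# Four consecutive rows copied from the bi-directional run over ib0.
SAMPLE = """\
65536 bytes took 560 usec ( 234.038 MB/sec)
131072 bytes took 204089 usec ( 1.284 MB/sec)
262144 bytes took 2302 usec ( 227.760 MB/sec)
524288 bytes took 3068 usec ( 341.782 MB/sec)
"""

for nbytes, usec, rate in flag_outliers(parse_rows(SAMPLE)):
    print(f"suspicious: {nbytes} bytes took {usec} usec ({rate} MB/sec)")
```

A single slow sample like this one is probably worth re-running before any conclusion is drawn from it.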