Infiniband performance/CentOS and openMPI

From Teknologisk videncenter
Revision as of 19:56, 28 April 2012 by Heth (talk | contribs) (Over 4xQDR)

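Moving from 1 Gbps Ethernet to 4x QDR InfiniBand raised the peak rate from approx. 114 MB/s (23.6 us minimum latency) to approx. 664 MB/s (17.4 us minimum latency) in the runs recorded below. Both runs exclude the openib BTL (--mca btl ^openib), so the InfiniBand figures are for TCP over the ib0 interface. This is not quite good enough; work continues with InfiniBand PSM.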
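The comparison can be made concrete with a little arithmetic. The sketch below takes the "Max rate" and "Min latency" values printed at the end of the two runs on this page and relates them to nominal link rates. The link figures are assumptions that are not stated on this page: 125 MB/s for 1 Gbps Ethernet, and 4000 MB/s for 4x QDR (40 Gbps signalling, 32 Gbps of data after 8b/10b encoding). The name `summarize` is only illustrative.

```python
# Peak figures copied from the "Max rate / Min latency" lines of the two runs.
ETH = {"rate_mb_s": 114.472840, "latency_us": 23.603439}
IB = {"rate_mb_s": 664.056547, "latency_us": 17.404556}

# Assumed nominal payload rates in MB/s (1 MB = 10**6 bytes), not measured here.
ETH_LINK_MB_S = 1e9 / 8 / 1e6             # 1 Gbps Ethernet -> 125 MB/s
IB_LINK_MB_S = 4 * 10e9 * 0.8 / 8 / 1e6   # 4 lanes x 10 Gbps, 8b/10b -> 4000 MB/s


def summarize(eth, ib):
    """Return speedup, latency ratio and link utilisation for both runs."""
    return {
        "speedup": ib["rate_mb_s"] / eth["rate_mb_s"],
        "latency_ratio": ib["latency_us"] / eth["latency_us"],
        "eth_utilisation": eth["rate_mb_s"] / ETH_LINK_MB_S,
        "ib_utilisation": ib["rate_mb_s"] / IB_LINK_MB_S,
    }


if __name__ == "__main__":
    for name, value in summarize(ETH, IB).items():
        print(f"{name:16s} {value:.3f}")
```

Under these assumptions the Ethernet run uses roughly nine tenths of its link, while the InfiniBand run uses about one sixth of it.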

Over 1 Gbps Ethernet

[root@centos1 bin]#  mpiexec --mca btl ^openib --mca btl_tcp_if_include eth0 -H 10.0.1.102,10.0.1.101 /home/pong
Hello from 1 of 2
Hello from 0 of 2
Timer accuracy of ~0.953674 usecs

       8 bytes took       121 usec (   0.132 MB/sec)
      16 bytes took        58 usec (   0.552 MB/sec)
      32 bytes took        60 usec (   1.065 MB/sec)
      64 bytes took       239 usec (   0.536 MB/sec)
     128 bytes took       256 usec (   1.000 MB/sec)
     256 bytes took        94 usec (   5.450 MB/sec)
     512 bytes took       268 usec (   3.818 MB/sec)
    1024 bytes took       151 usec (  13.570 MB/sec)
    2048 bytes took       311 usec (  13.165 MB/sec)
    4096 bytes took       444 usec (  18.443 MB/sec)
    8192 bytes took       272 usec (  60.227 MB/sec)
   16384 bytes took       424 usec (  77.256 MB/sec)
   32768 bytes took       687 usec (  95.377 MB/sec)
   65536 bytes took      1693 usec (  77.419 MB/sec)
  131072 bytes took      2788 usec (  94.024 MB/sec)
  262144 bytes took      5031 usec ( 104.209 MB/sec)
  524288 bytes took      9545 usec ( 109.855 MB/sec)
 1048576 bytes took     18478 usec ( 113.495 MB/sec)

  Asynchronous ping-pong

       8 bytes took        56 usec (   0.286 MB/sec)
      16 bytes took        49 usec (   0.652 MB/sec)
      32 bytes took        54 usec (   1.183 MB/sec)
      64 bytes took       258 usec (   0.496 MB/sec)
     128 bytes took       243 usec (   1.054 MB/sec)
     256 bytes took       246 usec (   2.081 MB/sec)
     512 bytes took       293 usec (   3.495 MB/sec)
    1024 bytes took       292 usec (   7.012 MB/sec)
    2048 bytes took       227 usec (  18.046 MB/sec)
    4096 bytes took       381 usec (  21.502 MB/sec)
    8192 bytes took       252 usec (  65.014 MB/sec)
   16384 bytes took       405 usec (  80.894 MB/sec)
   32768 bytes took       682 usec (  96.111 MB/sec)
   65536 bytes took      1555 usec (  84.293 MB/sec)
  131072 bytes took      2936 usec (  89.282 MB/sec)
  262144 bytes took      4778 usec ( 109.732 MB/sec)
  524288 bytes took      9541 usec ( 109.902 MB/sec)
 1048576 bytes took     18320 usec ( 114.473 MB/sec)

  Bi-directional asynchronous ping-pong

       8 bytes took        47 usec (   0.339 MB/sec)
      16 bytes took        51 usec (   0.627 MB/sec)
      32 bytes took        48 usec (   1.335 MB/sec)
      64 bytes took       219 usec (   0.584 MB/sec)
     128 bytes took       201 usec (   1.274 MB/sec)
     256 bytes took       255 usec (   2.007 MB/sec)
     512 bytes took       265 usec (   3.866 MB/sec)
    1024 bytes took       318 usec (   6.439 MB/sec)
    2048 bytes took       381 usec (  10.751 MB/sec)
    4096 bytes took       200 usec (  40.953 MB/sec)
    8192 bytes took       254 usec (  64.525 MB/sec)
   16384 bytes took       421 usec (  77.825 MB/sec)
   32768 bytes took       735 usec (  89.159 MB/sec)
   65536 bytes took      1538 usec (  85.220 MB/sec)
  131072 bytes took      3177 usec (  82.509 MB/sec)
  262144 bytes took      5665 usec (  92.551 MB/sec)
  524288 bytes took     11319 usec (  92.639 MB/sec)
 1048576 bytes took     28445 usec (  73.727 MB/sec)

 Max rate = 114.472840 MB/sec  Min latency = 23.603439 usec
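The MB/sec column in the output above appears to be derived from the message size and the elapsed time. Checking a few rows suggests that the payload is counted twice (once in each direction of the ping-pong) and that 1 MB means 10^6 bytes. This is inferred from the numbers only, since the source of pong is not shown here; `round_trip_rate` is a hypothetical helper, and the 1% tolerance allows for the rounding of the usec column.

```python
def round_trip_rate(nbytes, usec):
    """MB/s assuming the payload crosses the link twice in `usec` microseconds."""
    # bytes per microsecond is numerically equal to 10**6 bytes per second
    return 2 * nbytes / usec


# (bytes, usec, reported MB/sec) rows copied from the Ethernet run above.
ROWS = [
    (8, 121, 0.132),
    (262144, 5031, 104.209),
    (1048576, 18478, 113.495),
]


def matches_reported(rows, tolerance=0.01):
    """True if every recomputed rate is within `tolerance` of the reported one."""
    return all(
        abs(round_trip_rate(n, t) - reported) <= tolerance * reported
        for n, t, reported in rows
    )


if __name__ == "__main__":
    for n, t, reported in ROWS:
        print(f"{n:8d} bytes: computed {round_trip_rate(n, t):8.3f}, reported {reported:8.3f}")
    print("consistent:", matches_reported(ROWS))
```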

Over 4x QDR InfiniBand

[root@centos1 bin]#  mpiexec --mca btl ^openib --mca btl_tcp_if_include ib0 -H 10.0.1.102,10.0.1.101 /home/pong
Hello from 1 of 2
Hello from 0 of 2
Timer accuracy of ~0.953674 usecs

       8 bytes took       146 usec (   0.109 MB/sec)
      16 bytes took       119 usec (   0.269 MB/sec)
      32 bytes took        58 usec (   1.105 MB/sec)
      64 bytes took       122 usec (   1.049 MB/sec)
     128 bytes took       183 usec (   1.400 MB/sec)
     256 bytes took       215 usec (   2.381 MB/sec)
     512 bytes took       195 usec (   5.257 MB/sec)
    1024 bytes took       155 usec (  13.215 MB/sec)
    2048 bytes took       137 usec (  29.878 MB/sec)
    4096 bytes took       180 usec (  45.510 MB/sec)
    8192 bytes took       223 usec (  73.497 MB/sec)
   16384 bytes took       284 usec ( 115.398 MB/sec)
   32768 bytes took       446 usec ( 146.915 MB/sec)
   65536 bytes took      1020 usec ( 128.508 MB/sec)
  131072 bytes took      1210 usec ( 216.653 MB/sec)
  262144 bytes took      2058 usec ( 254.752 MB/sec)
  524288 bytes took      3883 usec ( 270.034 MB/sec)
 1048576 bytes took      4247 usec ( 493.802 MB/sec)

  Asynchronous ping-pong

       8 bytes took        66 usec (   0.242 MB/sec)
      16 bytes took        53 usec (   0.602 MB/sec)
      32 bytes took        54 usec (   1.183 MB/sec)
      64 bytes took        36 usec (   3.555 MB/sec)
     128 bytes took        36 usec (   7.111 MB/sec)
     256 bytes took        38 usec (  13.506 MB/sec)
     512 bytes took        37 usec (  27.709 MB/sec)
    1024 bytes took        38 usec (  54.025 MB/sec)
    2048 bytes took        54 usec (  76.017 MB/sec)
    4096 bytes took       118 usec (  69.414 MB/sec)
    8192 bytes took       124 usec ( 132.153 MB/sec)
   16384 bytes took       166 usec ( 197.186 MB/sec)
   32768 bytes took       260 usec ( 252.182 MB/sec)
   65536 bytes took       407 usec ( 322.060 MB/sec)
  131072 bytes took       601 usec ( 436.141 MB/sec)
  262144 bytes took       946 usec ( 554.189 MB/sec)
  524288 bytes took      1739 usec ( 602.968 MB/sec)
 1048576 bytes took      3158 usec ( 664.057 MB/sec)

  Bi-directional asynchronous ping-pong

       8 bytes took        40 usec (   0.399 MB/sec)
      16 bytes took        38 usec (   0.839 MB/sec)
      32 bytes took        35 usec (   1.839 MB/sec)
      64 bytes took        38 usec (   3.377 MB/sec)
     128 bytes took        37 usec (   6.927 MB/sec)
     256 bytes took        39 usec (  13.175 MB/sec)
     512 bytes took        35 usec (  29.217 MB/sec)
    1024 bytes took        39 usec (  52.378 MB/sec)
    2048 bytes took        57 usec (  71.882 MB/sec)
    4096 bytes took        60 usec ( 136.348 MB/sec)
    8192 bytes took        91 usec ( 180.366 MB/sec)
   16384 bytes took       166 usec ( 197.470 MB/sec)
   32768 bytes took       255 usec ( 256.895 MB/sec)
   65536 bytes took       522 usec ( 251.145 MB/sec)
  131072 bytes took       975 usec ( 268.895 MB/sec)
  262144 bytes took      1842 usec ( 284.663 MB/sec)
  524288 bytes took      3575 usec ( 293.301 MB/sec)
 1048576 bytes took      8998 usec ( 233.071 MB/sec)

 Max rate = 664.056547 MB/sec  Min latency = 17.404556 usec
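To compare runs without reading every row, the output can be parsed and the best rate per test phase extracted. This is a minimal sketch that assumes the line format shown on this page. The regular expression, the `parse_pong` and `peak_rates` names, and the "first phase" label for the untitled first block are choices made here and are not part of pong. The sample contains only the last two rows of each phase of the 4x QDR run.

```python
import re

ROW = re.compile(r"^\s*(\d+) bytes took\s+(\d+) usec \(\s*([\d.]+) MB/sec\)")


def parse_pong(text):
    """Map each phase title to a list of (bytes, usec, mb_per_sec) rows."""
    phases = {}
    current = "first phase"  # the first block has no title in the output
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            phases.setdefault(current, []).append(
                (int(m.group(1)), int(m.group(2)), float(m.group(3)))
            )
        elif line.strip().endswith("ping-pong"):
            current = line.strip()
    return phases


def peak_rates(phases):
    """Best reported MB/sec for each phase."""
    return {name: max(r[2] for r in rows) for name, rows in phases.items()}


SAMPLE = """\
   524288 bytes took      3883 usec ( 270.034 MB/sec)
  1048576 bytes took      4247 usec ( 493.802 MB/sec)

  Asynchronous ping-pong

   524288 bytes took      1739 usec ( 602.968 MB/sec)
  1048576 bytes took      3158 usec ( 664.057 MB/sec)

  Bi-directional asynchronous ping-pong

   524288 bytes took      3575 usec ( 293.301 MB/sec)
  1048576 bytes took      8998 usec ( 233.071 MB/sec)
"""

if __name__ == "__main__":
    for name, rate in peak_rates(parse_pong(SAMPLE)).items():
        print(f"{name}: {rate} MB/sec")
```

Taking the maximum per phase rather than the last row matters for the bi-directional phase, where the rate at 1048576 bytes is lower than at 524288 bytes.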
