Difference between revisions of "Infiniband performance/CentOS and openMPI"

From Teknologisk videncenter

Latest revision as of 06:24, 28 August 2012

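Moving from 1 Gbps Ethernet to 4x QDR InfiniBand (still using Open MPI's TCP transport, bound to the ib0 interface) raised the maximum rate from approx. 114 MB/s (23.6 us minimum latency) to approx. 676 MB/s (16.0 us minimum latency). That is not quite good enough, so work continues with Infiniband PSM.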
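To put these figures in perspective, the sketch below computes the speedup and compares each measured rate with the nominal data rate of its link. The nominal rates (125 MB/s for 1 Gbps Ethernet; 4000 MB/s for 4x QDR, i.e. 40 Gbps signalling with 8b/10b encoding) are assumptions added for illustration and do not come from the measurements. The function and variable names are hypothetical.

```python
# Maximum rates reported by the two pong runs, in MB/s (1 MB = 10**6 bytes).
MEASURED_MB_S = {"1GbE": 114.472840, "4xQDR": 676.050497}

# Nominal link data rates in MB/s. These are assumptions, not measurements:
#   1 Gbps Ethernet:   1e9 bits/s / 8                    = 125 MB/s
#   4x QDR InfiniBand: 40 Gbps signalling * 8/10 (8b/10b) = 32 Gbps = 4000 MB/s
NOMINAL_MB_S = {"1GbE": 125.0, "4xQDR": 4000.0}


def speedup(old_rate, new_rate):
    """How many times faster the new rate is than the old one."""
    return new_rate / old_rate


def link_utilisation(measured, nominal):
    """Fraction of the nominal link data rate that was actually achieved."""
    return measured / nominal


if __name__ == "__main__":
    s = speedup(MEASURED_MB_S["1GbE"], MEASURED_MB_S["4xQDR"])
    print(f"speedup: {s:.2f}x")
    for link, rate in MEASURED_MB_S.items():
        u = link_utilisation(rate, NOMINAL_MB_S[link])
        print(f"{link}: {rate:.1f} MB/s, {u:.1%} of nominal")
```

Under these assumptions the Ethernet run is close to saturating its link, while the TCP-over-ib0 run uses only a small fraction of what 4x QDR can carry, which is consistent with the "not quite good enough" verdict.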

Over 1 Gbps Ethernet

[root@centos1 bin]#  mpiexec --mca btl ^openib --mca btl_tcp_if_include eth0 -H 10.0.1.102,10.0.1.101 /home/pong
Hello from 1 of 2
Hello from 0 of 2
Timer accuracy of ~0.953674 usecs

       8 bytes took       121 usec (   0.132 MB/sec)
      16 bytes took        58 usec (   0.552 MB/sec)
      32 bytes took        60 usec (   1.065 MB/sec)
      64 bytes took       239 usec (   0.536 MB/sec)
     128 bytes took       256 usec (   1.000 MB/sec)
     256 bytes took        94 usec (   5.450 MB/sec)
     512 bytes took       268 usec (   3.818 MB/sec)
    1024 bytes took       151 usec (  13.570 MB/sec)
    2048 bytes took       311 usec (  13.165 MB/sec)
    4096 bytes took       444 usec (  18.443 MB/sec)
    8192 bytes took       272 usec (  60.227 MB/sec)
   16384 bytes took       424 usec (  77.256 MB/sec)
   32768 bytes took       687 usec (  95.377 MB/sec)
   65536 bytes took      1693 usec (  77.419 MB/sec)
  131072 bytes took      2788 usec (  94.024 MB/sec)
  262144 bytes took      5031 usec ( 104.209 MB/sec)
  524288 bytes took      9545 usec ( 109.855 MB/sec)
 1048576 bytes took     18478 usec ( 113.495 MB/sec)

  Asynchronous ping-pong

       8 bytes took        56 usec (   0.286 MB/sec)
      16 bytes took        49 usec (   0.652 MB/sec)
      32 bytes took        54 usec (   1.183 MB/sec)
      64 bytes took       258 usec (   0.496 MB/sec)
     128 bytes took       243 usec (   1.054 MB/sec)
     256 bytes took       246 usec (   2.081 MB/sec)
     512 bytes took       293 usec (   3.495 MB/sec)
    1024 bytes took       292 usec (   7.012 MB/sec)
    2048 bytes took       227 usec (  18.046 MB/sec)
    4096 bytes took       381 usec (  21.502 MB/sec)
    8192 bytes took       252 usec (  65.014 MB/sec)
   16384 bytes took       405 usec (  80.894 MB/sec)
   32768 bytes took       682 usec (  96.111 MB/sec)
   65536 bytes took      1555 usec (  84.293 MB/sec)
  131072 bytes took      2936 usec (  89.282 MB/sec)
  262144 bytes took      4778 usec ( 109.732 MB/sec)
  524288 bytes took      9541 usec ( 109.902 MB/sec)
 1048576 bytes took     18320 usec ( 114.473 MB/sec)

  Bi-directional asynchronous ping-pong

       8 bytes took        47 usec (   0.339 MB/sec)
      16 bytes took        51 usec (   0.627 MB/sec)
      32 bytes took        48 usec (   1.335 MB/sec)
      64 bytes took       219 usec (   0.584 MB/sec)
     128 bytes took       201 usec (   1.274 MB/sec)
     256 bytes took       255 usec (   2.007 MB/sec)
     512 bytes took       265 usec (   3.866 MB/sec)
    1024 bytes took       318 usec (   6.439 MB/sec)
    2048 bytes took       381 usec (  10.751 MB/sec)
    4096 bytes took       200 usec (  40.953 MB/sec)
    8192 bytes took       254 usec (  64.525 MB/sec)
   16384 bytes took       421 usec (  77.825 MB/sec)
   32768 bytes took       735 usec (  89.159 MB/sec)
   65536 bytes took      1538 usec (  85.220 MB/sec)
  131072 bytes took      3177 usec (  82.509 MB/sec)
  262144 bytes took      5665 usec (  92.551 MB/sec)
  524288 bytes took     11319 usec (  92.639 MB/sec)
 1048576 bytes took     28445 usec (  73.727 MB/sec)

 Max rate = 114.472840 MB/sec  Min latency = 23.603439 usec
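The MB/sec column appears to count the message in both directions: for the 1048576-byte asynchronous row, 2 x 1048576 bytes / 18320 usec is about 114.47 MB/s, which matches the reported 114.473. This formula is inferred from the numbers, not taken from the pong source. The sketch below (with hypothetical helper names) parses result rows and recomputes the rate so the relationship can be checked.

```python
import re

# Matches rows such as " 1048576 bytes took     18320 usec ( 114.473 MB/sec)".
ROW = re.compile(r"(\d+) bytes took\s+(\d+) usec \(\s*([\d.]+) MB/sec\)")


def parse_rows(text):
    """Return (bytes, usec, reported_mb_s) tuples for every result row in text."""
    return [(int(b), int(t), float(r)) for b, t, r in ROW.findall(text)]


def round_trip_rate(nbytes, usec):
    """Assumed formula: the message is counted in both directions.

    Bytes per microsecond is numerically equal to MB/s (1 MB = 10**6 bytes).
    """
    return 2 * nbytes / usec


# A few rows copied from the asynchronous ping-pong section above.
SAMPLE = """\
       8 bytes took        56 usec (   0.286 MB/sec)
  524288 bytes took      9541 usec ( 109.902 MB/sec)
 1048576 bytes took     18320 usec ( 114.473 MB/sec)
"""

if __name__ == "__main__":
    for nbytes, usec, reported in parse_rows(SAMPLE):
        print(nbytes, reported, round(round_trip_rate(nbytes, usec), 3))
```

Small deviations are expected because the usec column is rounded to whole microseconds.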

Over 4x QDR InfiniBand

[root@centos1 bin]#  mpiexec --mca btl ^openib\
 --mca btl_tcp_if_include ib0 -H 10.0.1.102,10.0.1.101 /home/pong
Hello from 1 of 2
Hello from 0 of 2
Timer accuracy of ~1.192093 usecs

       8 bytes took       129 usec (   0.124 MB/sec)
      16 bytes took        79 usec (   0.405 MB/sec)
      32 bytes took        97 usec (   0.660 MB/sec)
      64 bytes took       104 usec (   1.231 MB/sec)
     128 bytes took        80 usec (   3.196 MB/sec)
     256 bytes took        72 usec (   7.111 MB/sec)
     512 bytes took        53 usec (  19.347 MB/sec)
    1024 bytes took        91 usec (  22.487 MB/sec)
    2048 bytes took       128 usec (  31.992 MB/sec)
    4096 bytes took       134 usec (  61.138 MB/sec)
    8192 bytes took        94 usec ( 174.415 MB/sec)
   16384 bytes took       152 usec ( 215.422 MB/sec)
   32768 bytes took       365 usec ( 179.541 MB/sec)
   65536 bytes took       581 usec ( 225.587 MB/sec)
  131072 bytes took      1296 usec ( 202.265 MB/sec)
  262144 bytes took      2255 usec ( 232.504 MB/sec)
  524288 bytes took      3513 usec ( 298.476 MB/sec)
 1048576 bytes took      4194 usec ( 500.034 MB/sec)

  Asynchronous ping-pong

       8 bytes took        44 usec (   0.363 MB/sec)
      16 bytes took        34 usec (   0.945 MB/sec)
      32 bytes took        38 usec (   1.688 MB/sec)
      64 bytes took        34 usec (   3.754 MB/sec)
     128 bytes took        37 usec (   6.927 MB/sec)
     256 bytes took        42 usec (  12.202 MB/sec)
     512 bytes took        33 usec (  31.123 MB/sec)
    1024 bytes took        42 usec (  48.806 MB/sec)
    2048 bytes took        47 usec (  86.767 MB/sec)
    4096 bytes took        54 usec ( 152.034 MB/sec)
    8192 bytes took        82 usec ( 199.766 MB/sec)
   16384 bytes took       110 usec ( 298.132 MB/sec)
   32768 bytes took       200 usec ( 327.626 MB/sec)
   65536 bytes took       322 usec ( 406.925 MB/sec)
  131072 bytes took       515 usec ( 509.033 MB/sec)
  262144 bytes took       894 usec ( 586.563 MB/sec)
  524288 bytes took      1666 usec ( 629.371 MB/sec)
 1048576 bytes took      3102 usec ( 676.050 MB/sec)

  Bi-directional asynchronous ping-pong

       8 bytes took        39 usec (   0.409 MB/sec)
      16 bytes took        41 usec (   0.780 MB/sec)
      32 bytes took        35 usec (   1.839 MB/sec)
      64 bytes took        34 usec (   3.781 MB/sec)
     128 bytes took        38 usec (   6.711 MB/sec)
     256 bytes took        32 usec (  16.026 MB/sec)
     512 bytes took        36 usec (  28.443 MB/sec)
    1024 bytes took        36 usec (  56.887 MB/sec)
    2048 bytes took        55 usec (  74.695 MB/sec)
    4096 bytes took        98 usec (  83.600 MB/sec)
    8192 bytes took       135 usec ( 121.198 MB/sec)
   16384 bytes took       145 usec ( 226.051 MB/sec)
   32768 bytes took       241 usec ( 271.887 MB/sec)
   65536 bytes took       563 usec ( 232.849 MB/sec)
  131072 bytes took      1331 usec ( 196.974 MB/sec)
  262144 bytes took      2535 usec ( 206.831 MB/sec)
  524288 bytes took      4167 usec ( 251.648 MB/sec)
 1048576 bytes took      6992 usec ( 299.932 MB/sec)

 Max rate = 676.050497 MB/sec  Min latency = 15.974045 usec
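The summary line seems to follow directly from the tables: the maximum rate is the largest MB/sec value in any section, and the minimum latency is about half of the shortest round-trip time (32 usec / 2 = 16 usec, against the reported 15.974). That reading is an inference from the output, not something confirmed by the pong source. A minimal sketch, with a hypothetical `summarise` helper, that derives both values from result rows:

```python
import re

# Matches rows such as " 1048576 bytes took      3102 usec ( 676.050 MB/sec)".
ROW = re.compile(r"(\d+) bytes took\s+(\d+) usec \(\s*([\d.]+) MB/sec\)")


def summarise(text):
    """Return (max_rate_mb_s, min_latency_usec) for the rows found in text.

    Assumption: latency is half of the shortest round-trip time.
    """
    rows = [(int(b), int(t), float(r)) for b, t, r in ROW.findall(text)]
    max_rate = max(rate for _, _, rate in rows)
    min_latency = min(usec for _, usec, _ in rows) / 2
    return max_rate, min_latency


# A few rows copied from the 4x QDR run above.
SAMPLE = """\
     256 bytes took        32 usec (  16.026 MB/sec)
     512 bytes took        36 usec (  28.443 MB/sec)
 1048576 bytes took      3102 usec ( 676.050 MB/sec)
 1048576 bytes took      6992 usec ( 299.932 MB/sec)
"""

if __name__ == "__main__":
    rate, latency = summarise(SAMPLE)
    print(f"Max rate = {rate} MB/sec  Min latency = {latency} usec")
```

Feeding the complete output of a run into `summarise` makes it easy to compare several runs without reading every table by eye.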