Difference between revisions of "Beowulf Aug2010/Installation/Test Applicationer"
From Teknologisk videncenter
m (New page: ==Brug 100% CPU== Skal måske startes flere gange for at bruge begge kerne. <source lang=bash> #!/bin/bash while : ; do true done </source>)
m (→Memory test)
(11 intermediate revisions by the same user not shown)
<accesscontrol>teacher</accesscontrol>
==Use 100% CPU==
The script may need to be started more than once, one instance per core, to load both cores.
<source lang=bash>
#!/bin/bash
while : ; do
  true
done
</source>
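A single instance of the loop above keeps only one core busy. The following is a minimal sketch of starting one loop per core and stopping them all after a fixed time. It assumes `nproc` from GNU coreutils is installed; the function names `worker_count` and `burn_cpu` are illustrative and not part of the original page.

```shell
#!/bin/bash
# Sketch: one busy loop per core, stopped after a fixed number of seconds.
# Assumption: nproc (GNU coreutils) is available. Function names are illustrative.

# Print the requested worker count, or the number of online cores if none is given.
worker_count() {
    if [ -n "${1:-}" ]; then
        echo "$1"
    else
        nproc
    fi
}

# burn_cpu [workers] [seconds]: start the busy loops, wait, then kill them.
burn_cpu() {
    local workers seconds pids i
    workers="$(worker_count "${1:-}")"
    seconds="${2:-10}"
    pids=""
    i=0
    while [ "$i" -lt "$workers" ]; do
        # Same loop as the script above, run in a background subshell.
        ( while : ; do true; done ) &
        pids="$pids $!"
        i=$((i + 1))
    done
    sleep "$seconds"
    # Word splitting of $pids is intended: one PID per argument.
    kill $pids 2>/dev/null
    wait 2>/dev/null
    echo "ran $workers workers for $seconds s"
}
```

Calling `burn_cpu 2 30` would load two cores for 30 seconds; with no arguments it uses every online core for 10 seconds.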
==Testing MPI bandwidth==
The test was run on our Dell PowerEdge 1750s, using the onboard network cards connected to a 3550 100 Mbit/s switch.
<pre>
Hello from 0 of 2
Timer accuracy of ~0.953674 usecs

Hello from 1 of 2
8 bytes took 331 usec ( 0.048 MB/sec)
16 bytes took 206 usec ( 0.155 MB/sec)
32 bytes took 213 usec ( 0.301 MB/sec)
64 bytes took 229 usec ( 0.559 MB/sec)
128 bytes took 266 usec ( 0.962 MB/sec)
256 bytes took 334 usec ( 1.533 MB/sec)
512 bytes took 472 usec ( 2.170 MB/sec)
1024 bytes took 758 usec ( 2.701 MB/sec)
2048 bytes took 1112 usec ( 3.683 MB/sec)
4096 bytes took 1402 usec ( 5.843 MB/sec)
8192 bytes took 2138 usec ( 7.664 MB/sec)
16384 bytes took 3555 usec ( 9.218 MB/sec)
32768 bytes took 6403 usec ( 10.235 MB/sec)
65536 bytes took 12508 usec ( 10.479 MB/sec)
131072 bytes took 23635 usec ( 11.091 MB/sec)
262144 bytes took 46029 usec ( 11.390 MB/sec)
524288 bytes took 90590 usec ( 11.575 MB/sec)
1048576 bytes took 179719 usec ( 11.669 MB/sec)

Asynchronous ping-pong

8 bytes took 212 usec ( 0.075 MB/sec)
16 bytes took 205 usec ( 0.156 MB/sec)
32 bytes took 211 usec ( 0.303 MB/sec)
64 bytes took 237 usec ( 0.540 MB/sec)
128 bytes took 264 usec ( 0.969 MB/sec)
256 bytes took 336 usec ( 1.524 MB/sec)
512 bytes took 470 usec ( 2.179 MB/sec)
1024 bytes took 753 usec ( 2.720 MB/sec)
2048 bytes took 1076 usec ( 3.807 MB/sec)
4096 bytes took 1407 usec ( 5.822 MB/sec)
8192 bytes took 2141 usec ( 7.653 MB/sec)
16384 bytes took 3661 usec ( 8.950 MB/sec)
32768 bytes took 6428 usec ( 10.195 MB/sec)
65536 bytes took 12507 usec ( 10.480 MB/sec)
131072 bytes took 23708 usec ( 11.057 MB/sec)
262144 bytes took 46073 usec ( 11.380 MB/sec)
524288 bytes took 90696 usec ( 11.561 MB/sec)
1048576 bytes took 179756 usec ( 11.667 MB/sec)

Bi-directional asynchronous ping-pong

8 bytes took 197 usec ( 0.081 MB/sec)
16 bytes took 189 usec ( 0.169 MB/sec)
32 bytes took 196 usec ( 0.327 MB/sec)
64 bytes took 200 usec ( 0.640 MB/sec)
128 bytes took 230 usec ( 1.114 MB/sec)
256 bytes took 324 usec ( 1.580 MB/sec)
512 bytes took 454 usec ( 2.256 MB/sec)
1024 bytes took 735 usec ( 2.786 MB/sec)
2048 bytes took 1075 usec ( 3.810 MB/sec)
4096 bytes took 1421 usec ( 5.765 MB/sec)
8192 bytes took 2187 usec ( 7.491 MB/sec)
16384 bytes took 3630 usec ( 9.027 MB/sec)
32768 bytes took 6534 usec ( 10.030 MB/sec)
65536 bytes took 13344 usec ( 9.823 MB/sec)
131072 bytes took 25400 usec ( 10.321 MB/sec)
262144 bytes took 49689 usec ( 10.551 MB/sec)
524288 bytes took 124407 usec ( 8.429 MB/sec)
1048576 bytes took 261184 usec ( 8.029 MB/sec)

Max rate = 11.669048 MB/sec Min latency = 94.413757 usec
</pre>
These results do not differ much from those in the [[Weekend_Projekt_-_Test_Cluster|test cluster]].
<br/><br/>
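The MB/sec column in the output above appears consistent with counting each message once in both directions, that is, rate = 2 × bytes / elapsed microseconds, with 1 MB = 10^6 bytes. This is inferred from the printed figures, not taken from the benchmark source. A 100 Mbit/s link carries at most 12.5 MB/s, so the rate can also be expressed as a share of the line rate. The sketch below recomputes both values; the function names are illustrative.

```shell
# Sketch: recompute the MB/sec column and compare it with the link's line rate.
# Assumption (inferred from the printed figures, not from the benchmark source):
# rate = 2 * bytes / elapsed_usec, with 1 MB = 10^6 bytes.
# Function names are illustrative.

# pingpong_rate BYTES USEC -> MB/sec with three decimals
pingpong_rate() {
    LC_ALL=C awk -v bytes="$1" -v usec="$2" 'BEGIN { printf "%.3f\n", 2 * bytes / usec }'
}

# link_utilization MB_PER_SEC LINK_MBIT -> percent of the nominal line rate
link_utilization() {
    LC_ALL=C awk -v rate="$1" -v mbit="$2" 'BEGIN { printf "%.1f\n", rate * 8 / mbit * 100 }'
}

pingpong_rate 1048576 179719    # largest message of the first run
link_utilization 11.669 100     # the same rate on a 100 Mbit/s link
```

The two calls print 11.669 and 93.4, so the largest messages in the first run use roughly 93% of the nominal 100 Mbit/s.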
Here are the results when using the Intel network card through a 100 Mbit/s switch:
<pre>
Hello from 0 of 2
Timer accuracy of ~0.953674 usecs

8 bytes took 504 usec ( 0.032 MB/sec)
16 bytes took 266 usec ( 0.120 MB/sec)
32 bytes took 199 usec ( 0.322 MB/sec)
Hello from 1 of 2
64 bytes took 262 usec ( 0.489 MB/sec)
128 bytes took 262 usec ( 0.977 MB/sec)
256 bytes took 491 usec ( 1.042 MB/sec)
512 bytes took 460 usec ( 2.227 MB/sec)
1024 bytes took 1012 usec ( 2.024 MB/sec)
2048 bytes took 1084 usec ( 3.778 MB/sec)
4096 bytes took 1780 usec ( 4.602 MB/sec)
8192 bytes took 2179 usec ( 7.519 MB/sec)
16384 bytes took 3735 usec ( 8.774 MB/sec)
32768 bytes took 6395 usec ( 10.248 MB/sec)
65536 bytes took 12623 usec ( 10.384 MB/sec)
131072 bytes took 13000 usec ( 20.165 MB/sec)
262144 bytes took 24516 usec ( 21.386 MB/sec)
524288 bytes took 47452 usec ( 22.098 MB/sec)
1048576 bytes took 91873 usec ( 22.827 MB/sec)

Asynchronous ping-pong

8 bytes took 434 usec ( 0.037 MB/sec)
16 bytes took 189 usec ( 0.169 MB/sec)
32 bytes took 234 usec ( 0.273 MB/sec)
64 bytes took 221 usec ( 0.579 MB/sec)
128 bytes took 482 usec ( 0.531 MB/sec)
256 bytes took 325 usec ( 1.576 MB/sec)
512 bytes took 714 usec ( 1.435 MB/sec)
1024 bytes took 743 usec ( 2.756 MB/sec)
2048 bytes took 1255 usec ( 3.264 MB/sec)
4096 bytes took 1419 usec ( 5.773 MB/sec)
8192 bytes took 2247 usec ( 7.292 MB/sec)
16384 bytes took 3574 usec ( 9.168 MB/sec)
32768 bytes took 6691 usec ( 9.794 MB/sec)
65536 bytes took 13006 usec ( 10.078 MB/sec)
131072 bytes took 13254 usec ( 19.779 MB/sec)
262144 bytes took 24478 usec ( 21.419 MB/sec)
524288 bytes took 47442 usec ( 22.102 MB/sec)
1048576 bytes took 91929 usec ( 22.813 MB/sec)

Bi-directional asynchronous ping-pong

8 bytes took 490 usec ( 0.033 MB/sec)
16 bytes took 247 usec ( 0.130 MB/sec)
32 bytes took 248 usec ( 0.258 MB/sec)
64 bytes took 247 usec ( 0.518 MB/sec)
128 bytes took 248 usec ( 1.032 MB/sec)
256 bytes took 498 usec ( 1.028 MB/sec)
512 bytes took 528 usec ( 1.939 MB/sec)
1024 bytes took 748 usec ( 2.738 MB/sec)
2048 bytes took 1261 usec ( 3.248 MB/sec)
4096 bytes took 1521 usec ( 5.386 MB/sec)
8192 bytes took 2270 usec ( 7.218 MB/sec)
16384 bytes took 3785 usec ( 8.658 MB/sec)
32768 bytes took 6564 usec ( 9.984 MB/sec)
65536 bytes took 13499 usec ( 9.710 MB/sec)
131072 bytes took 19491 usec ( 13.450 MB/sec)
262144 bytes took 36962 usec ( 14.185 MB/sec)
524288 bytes took 71936 usec ( 14.577 MB/sec)
1048576 bytes took 160475 usec ( 13.068 MB/sec)

Max rate = 22.826658 MB/sec Min latency = 94.413757 usec
</pre>
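To compare runs like the ones above without reading every row, the peak rate can be pulled out of saved benchmark output. This is a minimal sketch that assumes the output is piped in or read from a file and that the rows keep the `N bytes took T usec ( R MB/sec)` layout shown above; `peak_rate` is an illustrative name.

```shell
# peak_rate: read ping-pong output on stdin and print the message size
# with the highest MB/sec figure. Lines without "bytes took" are ignored.
peak_rate() {
    LC_ALL=C awk '/bytes took/ {
        gsub(/[()]/, " ")            # drop the parentheses so the rate is its own field
        if ($6 + 0 > best) { best = $6 + 0; size = $1 }
    }
    END { if (size != "") printf "%s bytes: %.3f MB/sec\n", size, best }'
}

# Three rows copied from the Intel network card run.
peak_rate <<'EOF'
65536 bytes took 12623 usec ( 10.384 MB/sec)
131072 bytes took 13000 usec ( 20.165 MB/sec)
1048576 bytes took 91873 usec ( 22.827 MB/sec)
EOF
```

Because the function reads standard input, a whole saved run can be passed with `peak_rate < run.txt`, where `run.txt` is a hypothetical file name.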
==HDD test==
http://www.linux-mag.com/id/7906/2/
<pre>
iostat -x -m /dev/sda1 1
</pre>
<pre>
*************************************
root@NewClusterH:~# hdparm -Tt /dev/sdb

/dev/sdb:
Timing cached reads: 19890 MB in 2.00 seconds = 9969.08 MB/sec
Timing buffered disk reads: 2318 MB in 3.00 seconds = 772.65 MB/sec
*************************************

Write test on the SSD

********************************
root@NewClusterH:/ssd# dd count=500 bs=100M if=/dev/zero of=/ssd/test.img
500+0 records in
500+0 records out
52428800000 bytes (52 GB) copied, 68,0062 s, 771 MB/s
********************************

Read test on the SSD

********************************
root@NewClusterH:/ssd# dd count=500 bs=100M if=/ssd/test.img of=/dev/null
500+0 records in
500+0 records out
52428800000 bytes (52 GB) copied, 63,5971 s, 824 MB/s
********************************

Write test on the WD

********************************
root@NewClusterH:/raid5# dd count=500 bs=100M if=/dev/zero of=/raid5/test.img
500+0 records in
500+0 records out
52428800000 bytes (52 GB) copied, 150,079 s, 349 MB/s
********************************

Read test on the WD

********************************
root@NewClusterH:/raid5# dd count=500 bs=100M of=/dev/null if=/raid5/test.img
500+0 records in
500+0 records out
52428800000 bytes (52 GB) copied, 156,591 s, 335 MB/s
********************************
</pre>
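The MB/s figure that `dd` prints is the byte count divided by the elapsed time, with 1 MB = 10^6 bytes. The elapsed times above use a decimal comma, which comes from the system locale. The sketch below recomputes the rate from the two numbers; `dd_rate` is an illustrative name.

```shell
# dd_rate BYTES SECONDS -> MB/s (1 MB = 10^6 bytes), rounded to a whole number.
# SECONDS may use a decimal comma, as in the dd output above.
dd_rate() {
    local secs
    secs="$(printf '%s' "$2" | tr ',' '.')"
    LC_ALL=C awk -v bytes="$1" -v secs="$secs" 'BEGIN { printf "%.0f\n", bytes / secs / 1000000 }'
}

dd_rate 52428800000 68,0062    # SSD write
dd_rate 52428800000 150,079    # WD write
```

The two calls print 771 and 349, matching the rates reported by `dd` for the SSD and WD write tests.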
==Memory test==
http://www.cs.virginia.edu/stream/ref.html#start
<pre>
rael@newclusterh:~/stream$ gcc -O stream.c -o stream
rael@newclusterh:~/stream$ ./stream
-------------------------------------------------------------
STREAM version $Revision: 5.9 $
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 2000000, Offset = 0
Total memory required = 45.8 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Printing one line per active thread....
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 2170 microseconds.
   (= 2170 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:       12071.0251       0.0027       0.0027       0.0027
Scale:      11632.6684       0.0028       0.0028       0.0028
Add:        13065.5196       0.0037       0.0037       0.0037
Triad:      13072.3065       0.0037       0.0037       0.0037
-------------------------------------------------------------
Solution Validates
-------------------------------------------------------------
rael@newclusterh:~/stream$
</pre>
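STREAM derives each rate from the bytes a kernel moves per pass divided by its best time: Copy and Scale touch two arrays, Add and Triad touch three, and each array holds N 8-byte elements. The sketch below recomputes the memory footprint and the best time implied by a reported rate. The function names are illustrative, and the unit conventions (2^20 bytes per MB for the footprint, 10^6 bytes per MB for the rates) are assumptions that agree with the figures printed above.

```shell
# stream_memory_mb N -> MB needed for the three arrays of 8-byte doubles
# (assumes 1 MB = 2^20 bytes for this figure).
stream_memory_mb() {
    LC_ALL=C awk -v n="$1" 'BEGIN { printf "%.1f\n", 3 * 8 * n / 1048576 }'
}

# stream_best_time_us ARRAYS N RATE_MB_S -> best time in microseconds implied by a rate.
# ARRAYS is 2 for Copy and Scale, 3 for Add and Triad (assumes 1 MB = 10^6 bytes).
stream_best_time_us() {
    LC_ALL=C awk -v a="$1" -v n="$2" -v rate="$3" 'BEGIN { printf "%.0f\n", a * 8 * n / rate }'
}

stream_memory_mb 2000000
stream_best_time_us 2 2000000 12071.0251   # Copy
stream_best_time_us 3 2000000 13072.3065   # Triad
```

The calls print 45.8, 2651 and 3672, which agree with the reported 45.8 MB and with the minimum times of 0.0027 s and 0.0037 s in the table.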
===Links===
http://home.comcast.net/~fbui/bandwidth.html
[[Category:CoE]]
Latest revision as of 14:10, 5 April 2011