Difference between revisions of "Linux Cluster til Center of Excelence/nVidia GPU"

From Teknologisk videncenter
 
(22 intermediate revisions by 3 users not shown)
= Cheap way to get started =
Use an nVidia [[ Linux Cluster til Center of Excelence/nVidia GPU#CUDA|CUDA]]-compatible graphics card, e.g. a [http://www.midtdata.dk/hardware_1000180575_XFX-Geforce-9800GTX+---512MB-DDR3---PCI-E---OC.html GeForce 9800GTX+] for DKK 1,361 incl. VAT.
 
= nVidia Tesla C1060 GPU =
*Price as of 14 May 2010: DKK [http://www.edbpriser.dk/Product/Details.aspx?q=tesla&sp=all&pid=483374 7,980]
[[Image:Tesla s1060 1.gif|S1060 GPU]]
{| border="1" style="margin: 0 auto; text-align: center;" cellpadding="5" cellspacing="0"
|+Information taken from [http://www.nvidia.com/object/product_tesla_c1060_us.html Tesla C1060 Specifications]
|-
|Form Factor ||10.5" x 4.376", Dual Slot
|-
|# of Tesla GPUs ||1
|-
|# of Streaming Processor Cores ||240
|-
|Frequency of processor cores ||1.3 GHz
|-
|Single Precision floating point performance (peak) ||933 GFlops
|-
|Double Precision floating point performance (peak) ||78 GFlops
|-
|Floating Point Precision ||IEEE 754 single & double
|-
|Total Dedicated Memory ||4 GB GDDR3
|-
|Memory Speed ||800 MHz
|-
|Memory Interface ||512-bit
|-
|Memory Bandwidth ||102 GB/sec
|-
|Max Power Consumption ||187.8 W
|-
|System Interface ||PCIe x16
|-
|Auxiliary Power Connectors ||6-pin & 8-pin
|-
|Thermal Solution ||Active fan sink
|-
|Software Development Tools ||[http://www.nvidia.com/object/tesla_software.html C-based CUDA Toolkit]
|}
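The peak figures in the table (933 and 78, in GFlops, and 102 GB/sec) can be cross-checked from the core count, clock and memory figures. The following is a minimal sketch under assumptions that the table does not state: that the exact core clock behind the rounded 1.3 GHz is 1.296 GHz, that each core issues 3 single-precision flops per cycle, that there are 30 double-precision units each issuing 2 flops per cycle, and that the memory transfers data twice per memory clock. All names are illustrative.

```python
# Cross-check of the Tesla C1060 peak figures from the specification table.
# Values marked ASSUMPTION are not stated in the table.

CORES = 240                # streaming processor cores (from the table)
CLOCK_GHZ = 1.296          # ASSUMPTION: exact clock behind the rounded "1.3 GHz"
SP_FLOPS_PER_CYCLE = 3     # ASSUMPTION: multiply-add (2 flops) plus multiply (1 flop)
DP_UNITS = 30              # ASSUMPTION: one double-precision unit per 8 cores
DP_FLOPS_PER_CYCLE = 2     # ASSUMPTION: one fused multiply-add per cycle

MEM_CLOCK_MHZ = 800        # memory speed (from the table)
BUS_WIDTH_BITS = 512       # memory interface (from the table)
TRANSFERS_PER_CLOCK = 2    # ASSUMPTION: double data rate memory


def peak_gflops(units, clock_ghz, flops_per_cycle):
    """Peak throughput in GFlops for `units` execution units."""
    return units * clock_ghz * flops_per_cycle


def bandwidth_gb_per_s(mem_clock_mhz, bus_width_bits, transfers_per_clock):
    """Peak memory bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return mem_clock_mhz * 1e6 * transfers_per_clock * bytes_per_transfer / 1e9


sp_gflops = peak_gflops(CORES, CLOCK_GHZ, SP_FLOPS_PER_CYCLE)
dp_gflops = peak_gflops(DP_UNITS, CLOCK_GHZ, DP_FLOPS_PER_CYCLE)
mem_bw = bandwidth_gb_per_s(MEM_CLOCK_MHZ, BUS_WIDTH_BITS, TRANSFERS_PER_CLOCK)

print(f"single precision: {sp_gflops:.2f} GFlops")
print(f"double precision: {dp_gflops:.2f} GFlops")
print(f"memory bandwidth: {mem_bw:.1f} GB/s")
```

Under these assumptions the computed values land within rounding distance of the listed figures, which suggests the table quotes theoretical peaks rather than measured throughput.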
  
 
= nVidia Tesla S1070 =
The Tesla S1070 is a 1U enclosure with 4 GPUs split into two sections. It must be connected to one or two host PCs. Price approx. DKK 60,000 (it does not appear to be available for purchase in Denmark).
{|
|[[Image:Tesla s1070 1.gif|thumb|500px|left|4 GPUs in the enclosure with 4 x 240 cores = 960 cores]]
Line 38: Line 75:
|Software Development Tools ||[http://www.nvidia.com/object/tesla_software.html C-based CUDA Toolkit]
|}
=Comparison of GPUs=
{| border="1" style="margin: 0 auto; text-align: center;" cellpadding="5" cellspacing="0"
|-
|Model || GFlops || Cores || Mem. bandwidth GB/s || Price (DKK) || DKK per GFlop
|-
|GTS 250 || 470 || 128 || 70 || 915 || 1.95
|}
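The last column of the table is the price divided by the peak GFlops figure. The following is a minimal sketch of that calculation. It uses the GTS 250 row and, for comparison, the Tesla C1060 price (DKK 7,980) and peak figure (933, assumed to be single-precision GFlops) quoted earlier on this page. The function name and the dictionary are illustrative and not part of the original page.

```python
def price_per_gflop(price_dkk, gflops):
    """Return the price in DKK for one GFlop of peak performance."""
    if gflops <= 0:
        raise ValueError("gflops must be positive")
    return price_dkk / gflops


# (price in DKK, peak GFlops); the C1060 entry is added here for comparison only.
gpus = {
    "GTS 250": (915, 470),
    "Tesla C1060": (7980, 933),
}

for model, (price, gflops) in gpus.items():
    print(f"{model}: {price_per_gflop(price, gflops):.2f} DKK per GFlop")
```

The comparison only reflects peak single-precision throughput per krone. It ignores memory size, double-precision performance and power consumption, all of which differ between the cards.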
=CUDA=
''C''ompute ''U''nified ''D''evice ''A''rchitecture
*[http://developer.nvidia.com/page/home.html CUDA Developer homepage] **
*See nVidia's [http://www.nvidia.com/object/cuda_home_new.html CUDA zone]
*See nVidia's [http://developer.nvidia.com/object/gpu_computing_online.html Online Seminars]
==Documentation==
*[http://developer.download.nvidia.com/GPU_Programming_Guide/GPU_Programming_Guide_G80.pdf Programming Guide CUDA 8]
*[http://developer.download.nvidia.com/compute/cuda/2_3/toolkit/docs/NVIDIA_CUDA_BestPracticesGuide_2.3.pdf CUDA C best practices guide]
*[http://developer.nvidia.com/object/cuda-by-example.html CUDA by example] (Also available on Safari)
==Links to the article series by Rob Farber==
*[http://www.drdobbs.com/high-performance-computing/207200659 CUDA, Supercomputing for the Masses: Part 1] CUDA lets you work with familiar programming concepts while developing software that can run on a GPU
*[http://www.drdobbs.com/high-performance-computing/207402986 CUDA, Supercomputing for the Masses: Part 2] A first kernel
*[http://www.drdobbs.com/high-performance-computing/207603131 CUDA, Supercomputing for the Masses: Part 3] Error handling and global memory performance limitations
*[http://www.drdobbs.com/architecture-and-design/208401741 CUDA, Supercomputing for the Masses: Part 4] Understanding and using shared memory (1)
*[http://www.drdobbs.com/high-performance-computing/208801731 CUDA, Supercomputing for the Masses: Part 5] Understanding and using shared memory (2)
*[http://www.drdobbs.com/architecture-and-design/209601096 CUDA, Supercomputing for the Masses: Part 6] Global memory and the CUDA profiler
*[http://www.drdobbs.com/high-performance-computing/210102115 CUDA, Supercomputing for the Masses: Part 7] Double the fun with next-generation CUDA hardware
*[http://www.drdobbs.com/architecture-and-design/210602684 CUDA, Supercomputing for the Masses: Part 8] Using libraries with CUDA
*[http://www.drdobbs.com/high-performance-computing/211800683 CUDA, Supercomputing for the Masses: Part 9] Extending High-level Languages with CUDA
*[http://www.drdobbs.com/architecture-and-design/212903437 CUDA, Supercomputing for the Masses: Part 10] CUDPP, a powerful data-parallel CUDA library
*[http://www.drdobbs.com/high-performance-computing/215900921 CUDA, Supercomputing for the Masses: Part 11] Revisiting CUDA memory spaces
*[http://www.drdobbs.com/high-performance-computing/217500110 CUDA, Supercomputing for the Masses: Part 12] CUDA 2.2 Changes the Data Movement Paradigm
*[http://www.drdobbs.com/high-performance-computing/218100902 CUDA, Supercomputing for the Masses: Part 13] Using texture memory in CUDA
*[http://www.drdobbs.com/high-performance-computing/220601124 CUDA, Supercomputing for the Masses: Part 14] Debugging CUDA and using CUDA-GDB
*[http://www.drdobbs.com/architecture-and-design/222600097 CUDA, Supercomputing for the Masses: Part 15] Using Pixel Buffer Objects with CUDA and OpenGL
== Links to course descriptions ==
*[http://courses.ece.illinois.edu/ece498/al/index.html Applied Parallel Programming] UNIVERSITY OF ILLINOIS '''(+++)'''
*[http://developer.nvidia.com/object/gpu_computing_online.html GPU Computing Online Seminars, June and July 2010]
== Software ==
*[http://developer.nvidia.com/object/cuda_3_0_downloads.html GetCUDA]
**[http://developer.download.nvidia.com/compute/cuda/3_0/docs/GettingStartedWindows.pdf Getting Started in Windows environment]
== Perl ==
*[http://psilambda.com/products/kappa/ Kappa] CUDA made easier
*[http://search.cpan.org/~brian/KappaCUDA-1.1.1/KappaCUDA.pod KappaCUDA] Perl module
=Links=
*[http://research.microsoft.com/en-us/um/redmond/events/escience2008/matsuoka-escience2008.pdf Tsunami cluster] Interesting article covering, among other things, power consumption
*[http://forums.nvidia.com/lofiversion/index.php?t70731.html MatLAB and Tesla] How-to example with Linux
*[http://www.computer.org/portal/c/document_library/get_file?uuid=6fefff2f-cf51-487b-b744-18ac6f3872ac&groupId=53319 IEEE on GPU's] '''Good HeTh'''
[[Category:Cluster]] [[Category:Linux]][[Category:CoE]][[Category:CUDA]]

Latest revision as of 10:04, 21 November 2010

Cheap way to get started

Use an nVidia CUDA-compatible graphics card, e.g. a GeForce 9800GTX+ for DKK 1,361 incl. VAT.

nVidia Tesla C1060 GPU

  • Price as of 14 May 2010: DKK 7,980

S1060 GPU

{| border="1" style="margin: 0 auto; text-align: center;" cellpadding="5" cellspacing="0"
|+Information taken from Tesla C1060 Specifications
|-
|Form Factor ||10.5" x 4.376", Dual Slot
|-
|# of Tesla GPUs ||1
|-
|# of Streaming Processor Cores ||240
|-
|Frequency of processor cores ||1.3 GHz
|-
|Single Precision floating point performance (peak) ||933 GFlops
|-
|Double Precision floating point performance (peak) ||78 GFlops
|-
|Floating Point Precision ||IEEE 754 single & double
|-
|Total Dedicated Memory ||4 GB GDDR3
|-
|Memory Speed ||800 MHz
|-
|Memory Interface ||512-bit
|-
|Memory Bandwidth ||102 GB/sec
|-
|Max Power Consumption ||187.8 W
|-
|System Interface ||PCIe x16
|-
|Auxiliary Power Connectors ||6-pin & 8-pin
|-
|Thermal Solution ||Active fan sink
|-
|Software Development Tools ||C-based CUDA Toolkit
|}

nVidia Tesla S1070

The Tesla S1070 is a 1U enclosure with 4 GPUs split into two sections. It must be connected to one or two host PCs. Price approx. DKK 60,000 (it does not appear to be available for purchase in Denmark).

4 GPUs in the enclosure with 4 x 240 cores = 960 cores
The Tesla S1070 connects to one or two host PCs (there are two PCI Express channels)

Specifications

{| border="1" style="margin: 0 auto; text-align: center;" cellpadding="5" cellspacing="0"
|+Information taken from Tesla S1070 Specifications
|-
|Number of Tesla GPUs ||4
|-
|Number of Streaming Processor Cores ||960 (240 per processor)
|-
|Frequency of processor cores ||1.296 to 1.44 GHz
|-
|Single Precision floating point performance (peak) ||3.73 to 4.14 TFlops
|-
|Double Precision floating point performance (peak) ||311 to 345 GFlops
|-
|Floating Point Precision ||IEEE 754 single & double
|-
|Total Dedicated Memory ||16 GB
|-
|Memory Interface ||512-bit
|-
|Memory Bandwidth ||408 GB/sec
|-
|Max Power Consumption ||800 W (typical)
|-
|System Interface ||PCIe x16 or x8
|-
|Software Development Tools ||C-based CUDA Toolkit
|}
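
Most of the S1070 totals are four times the per-GPU figures. The following is a minimal sketch that checks this. The per-GPU memory (4 GB) and bandwidth (102 GB/sec) are taken from the C1060 specifications on this page and are assumed to apply to each S1070 GPU. The figure of 3 single-precision flops per core per cycle is an assumption that the specifications do not state.

```python
# Aggregate Tesla S1070 figures derived from per-GPU values.
GPUS = 4                        # from the S1070 specifications
CORES_PER_GPU = 240             # from the S1070 specifications
MEMORY_GB_PER_GPU = 4           # ASSUMPTION: same as the C1060
BANDWIDTH_GB_S_PER_GPU = 102    # ASSUMPTION: same as the C1060
SP_FLOPS_PER_CYCLE = 3          # ASSUMPTION: not stated in the specifications
CLOCK_RANGE_GHZ = (1.296, 1.44) # from the S1070 specifications

total_cores = GPUS * CORES_PER_GPU
total_memory_gb = GPUS * MEMORY_GB_PER_GPU
total_bandwidth_gb_s = GPUS * BANDWIDTH_GB_S_PER_GPU

# Peak single-precision TFlops at the low and high end of the clock range.
sp_tflops = [
    total_cores * clock_ghz * SP_FLOPS_PER_CYCLE / 1000
    for clock_ghz in CLOCK_RANGE_GHZ
]

print(f"cores: {total_cores}")
print(f"memory: {total_memory_gb} GB")
print(f"bandwidth: {total_bandwidth_gb_s} GB/s")
print(f"single precision: {sp_tflops[0]:.3f} to {sp_tflops[1]:.3f} TFlops")
```

Under these assumptions the computed range is about 3.732 to 4.147 TFlops. The upper value sits slightly above the listed 4.14, which may mean the listed figure is truncated rather than rounded.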

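Comparison of GPUs

{| border="1" style="margin: 0 auto; text-align: center;" cellpadding="5" cellspacing="0"
|-
|Model || GFlops || Cores || Mem. bandwidth GB/s || Price (DKK) || DKK per GFlop
|-
|GTS 250 || 470 || 128 || 70 || 915 || 1.95
|}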

CUDA

Compute Unified Device Architecture

Documentation

Links to the article series by Rob Farber

Links to course descriptions

Software

Perl

Links