HPC Challenge Benchmark Results - HPCC Results - Optimized Runs Only - 41 Systems


Manufacturer/Processor Type - Speed - Count - Threads - Processes

Includes the manufacturer/processor type, processor speed, number of processors, threads, and number of processes.
Hovering the mouse over this column in each row displays additional information, including: manufacturer, system name, interconnect, MPI, affiliation, and submission date.

Computer System

Name and version of Message Passing Interface (MPI) implementation.

Run Type

Run Type indicates whether the benchmark was a base run or an optimized run.

Processors

Processors is the number of processors used in the benchmark, as entered in the submission form by the benchmark submitter.

G-HPL ( system performance )

HPL solves a randomly generated dense linear system of equations in double-precision (IEEE 64-bit) floating-point arithmetic using MPI. The matrix is stored in a two-dimensional block-cyclic fashion, and multiple code variants are provided for the computational kernels and communication patterns. The solution method is LU factorization through Gaussian elimination with partial row pivoting, followed by backward substitution. Unit: Tera Flops per Second
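The solution method can be illustrated with a minimal serial sketch. It is not HPL's distributed, blocked implementation: it eliminates on the augmented matrix [A | b], which applies the lower-triangular factor to b as the factorization proceeds, and then performs backward substitution. The function name `lu_solve` is hypothetical.

```python
def lu_solve(a, b):
    """Solve a x = b by Gaussian elimination with partial row pivoting,
    followed by backward substitution. Inputs are copied, not modified."""
    n = len(a)
    # Augmented matrix [A | b]; row operations on b perform the forward step.
    m = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(a)]
    for k in range(n):
        # Partial pivoting: choose the row with the largest |entry| in column k.
        p = max(range(k, n), key=lambda i: abs(m[i][k]))
        if m[p][k] == 0.0:
            raise ValueError("matrix is singular")
        m[k], m[p] = m[p], m[k]
        for i in range(k + 1, n):
            factor = m[i][k] / m[k][k]  # multiplier, i.e. entry L[i][k]
            for j in range(k, n + 1):
                m[i][j] -= factor * m[k][j]
    # Backward substitution on the upper-triangular system U x = y.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x
```

For example, `lu_solve([[0, 1], [1, 0]], [2, 3])` only succeeds because pivoting swaps the rows before eliminating; without it the first pivot would be zero.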

G-PTRANS (A=A+B^T, MPI) ( system performance )

PTRANS (A=A+B^T, MPI) implements a parallel matrix transpose for two-dimensional block-cyclic storage. It is an important benchmark because it heavily exercises the computer's communications on a realistic problem in which pairs of processors communicate with each other simultaneously, which makes it a useful test of the total communications capacity of the network. Unit: Giga Bytes per Second
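Why PTRANS pairs processors can be sketched from the ownership rule of a 2D block-cyclic layout: on a `p_rows` x `p_cols` process grid, block (i, j) lives on process (i mod p_rows, j mod p_cols). Computing block (i, j) of A + B^T needs block (j, i) of B, which usually lives elsewhere. The sketch below assumes A and B share the same distribution and that block indexing starts on process (0, 0); the helper names are hypothetical.

```python
def owner(block_row, block_col, p_rows, p_cols):
    """Process-grid coordinates owning global block (block_row, block_col)
    under a 2D block-cyclic distribution starting at process (0, 0)."""
    return (block_row % p_rows, block_col % p_cols)


def transpose_partners(n_blocks, p_rows, p_cols):
    """For A = A + B^T on an n_blocks x n_blocks block matrix, return the set
    of (receiver, sender) process pairs that must exchange data."""
    pairs = set()
    for i in range(n_blocks):
        for j in range(n_blocks):
            dst = owner(i, j, p_rows, p_cols)  # holds block (i, j) of A
            src = owner(j, i, p_rows, p_cols)  # holds block (j, i) of B
            if dst != src:
                pairs.add((dst, src))
    return pairs
```

On a 2 x 2 grid, the off-diagonal processes (0, 1) and (1, 0) must swap blocks with each other, while diagonal processes need no communication.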

S-DGEMM ( single MPI process )

The single MPI process DGEMM benchmark measures the floating-point execution rate of double-precision real matrix-matrix multiplication performed by the DGEMM subroutine from the BLAS (Basic Linear Algebra Subprograms). It is run on a single computational process chosen at random. Unit: Giga Flops per Second
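As a reminder of what DGEMM computes, here is an unoptimized reference sketch of the non-transposed case, C := alpha*A*B + beta*C, together with the conventional 2*m*n*k operation count used to turn a timing into a rate. The names `dgemm` and `gflops` are hypothetical stand-ins, not the BLAS interface, and taking "Giga" as 1e9 is an assumption.

```python
def dgemm(alpha, a, b, beta, c):
    """Reference C := alpha*A*B + beta*C with no transposes.
    A is m x k, B is k x n, C is m x n; C is updated in place and returned."""
    m, k, n = len(a), len(b), len(b[0])
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                acc += a[i][p] * b[p][j]
            # As in BLAS, C is not read when beta is zero.
            c[i][j] = alpha * acc + (beta * c[i][j] if beta != 0.0 else 0.0)
    return c


def gflops(m, n, k, seconds):
    """Rate from the conventional count of 2*m*n*k flops
    (one multiply and one add per inner-product term)."""
    return 2.0 * m * n * k / seconds / 1e9
```

An optimized BLAS performs the same arithmetic; the benchmark measures how fast it runs.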

EP-DGEMM ( embarrassingly parallel )

The embarrassingly parallel DGEMM benchmark measures the floating-point execution rate of double-precision real matrix-matrix multiplication performed by the DGEMM subroutine from the BLAS (Basic Linear Algebra Subprograms). It is run in an embarrassingly parallel manner: all computational processes perform the benchmark at the same time, and the arithmetic average of their rates is reported. Unit: Giga Flops per Second
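The reporting step can be sketched as follows, assuming each process multiplied its own n x n matrices and timed itself. Each process's rate is computed first, and then the rates are averaged. The function name and the 1e9 scaling are assumptions for illustration.

```python
from statistics import fmean


def ep_dgemm_gflops(n, seconds_per_process):
    """Arithmetic mean of per-process DGEMM rates for n x n matrices,
    using the conventional 2*n**3 flop count per process."""
    if not seconds_per_process:
        raise ValueError("need at least one timing")
    rates = [2.0 * n**3 / t / 1e9 for t in seconds_per_process]
    return fmean(rates)
```

Averaging rates is not the same as dividing total work by the slowest time: two processes with n = 1000 that take 1 s and 2 s average 1.5 Gflop/s, whereas 4e9 flops over 2 s would give 2.0.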

S-STREAM ( single MPI process )

The single MPI process STREAM benchmark is a simple synthetic benchmark program that measures sustainable memory bandwidth and the corresponding computation rate for simple vector kernels. It is run on a single computational process chosen at random. Unit: Giga Bytes per Second
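A sketch of one STREAM vector kernel (Triad, a = b + q*c) and of the byte accounting used to convert a timing into bandwidth. The per-element byte counts follow the usual STREAM convention for 8-byte doubles: Copy and Scale touch two arrays, while Add and Triad touch three. The helper names and the 1e9 definition of a gigabyte are assumptions. A Python loop is far too slow to measure real memory bandwidth; this only shows the kernel's shape.

```python
# Bytes moved per element (8-byte doubles), by STREAM's counting convention.
BYTES_PER_ELEMENT = {"copy": 16, "scale": 16, "add": 24, "triad": 24}


def stream_triad(a, b, c, q):
    """Triad kernel: a[i] = b[i] + q * c[i] (reads b and c, writes a)."""
    for i in range(len(a)):
        a[i] = b[i] + q * c[i]
    return a


def stream_gbps(kernel, n, seconds):
    """Bandwidth in GB/s for one pass of `kernel` over n-element arrays."""
    return BYTES_PER_ELEMENT[kernel] * n / seconds / 1e9
```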

EP-STREAM ( per process )

The embarrassingly parallel STREAM benchmark is a simple synthetic benchmark program that measures sustainable memory bandwidth and the corresponding computation rate for simple vector kernels. It is run in an embarrassingly parallel manner: all computational processes perform the benchmark at the same time, and the average of their results is computed. Unit: Giga Bytes per Second

S-Random Access ( single MPI process )

The single MPI process Random Access benchmark (also called GUPs) measures the rate at which the computer can update pseudo-random locations of its memory. The single-CPU version runs the code locally on a randomly chosen processor. No explicit communication is performed, so the benchmark reveals the performance of the local memory subsystem. Unit: Giga Updates per Second
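The update loop can be sketched like this. It is modeled on our understanding of the HPCC reference kernel: a 64-bit shift-register generator with feedback polynomial 0x7, where each value is XORed into the table slot selected by its low bits. Treat the constant, the seed of 1, and the helper names as assumptions. Because XOR is its own inverse, replaying the same update stream restores the table, which gives a simple correctness check.

```python
MASK64 = (1 << 64) - 1
POLY = 0x7  # feedback polynomial (assumed to match the reference generator)


def next_ran(ran):
    """Shift left one bit; if the old top bit was set, XOR in POLY."""
    return ((ran << 1) & MASK64) ^ (POLY if ran >> 63 else 0)


def random_access(table, n_updates, ran=1):
    """XOR n_updates pseudo-random values into pseudo-random table slots.
    len(table) must be a power of two so that masking yields a valid index."""
    size = len(table)
    if size == 0 or size & (size - 1):
        raise ValueError("table size must be a power of two")
    for _ in range(n_updates):
        ran = next_ran(ran)
        table[ran & (size - 1)] ^= ran
    return table


def gups(n_updates, seconds):
    """Giga updates per second."""
    return n_updates / seconds / 1e9
```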

EP-RandomAccess ( embarrassingly parallel )

The embarrassingly parallel Random Access benchmark (also called GUPs) measures the rate at which the computer can update pseudo-random locations of its memory. The embarrassingly parallel version runs the code locally on each processor. No explicit communication is performed, although shared-memory effects may occur. Unit: Giga Updates per Second

G-Random Access ( system performance )

Global Random Access (also called GUPs) measures the rate at which the computer can update pseudo-random locations of its memory; this rate is expressed in billions (giga) of updates per second (GUP/s). The MPI version generates the update sequence locally and then distributes it using all-to-all collective communication. Unit: Giga Updates per Second
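The generate-locally, distribute-with-all-to-all scheme can be simulated in a single process. Each simulated rank produces its own update stream and buckets each update by the rank that owns the target slot of a block-distributed table. An all-to-all exchange then delivers bucket [src][dst] to dst, which applies the update. This sketch does not reproduce the real benchmark's sequence partitioning: `seed_for` is a hypothetical per-rank seed, and the generator constant is assumed. Because XOR updates commute, the result must match applying every stream to one serial table.

```python
MASK64 = (1 << 64) - 1
POLY = 0x7  # assumed feedback polynomial


def next_ran(ran):
    return ((ran << 1) & MASK64) ^ (POLY if ran >> 63 else 0)


def seed_for(rank):
    # Hypothetical per-rank starting value, for illustration only.
    return ((rank + 1) * 0x9E3779B97F4A7C15) & MASK64


def global_random_access(n_procs, local_size, updates_per_proc):
    """Simulate the MPI scheme; returns each rank's slice of the table."""
    total = n_procs * local_size
    if total & (total - 1):
        raise ValueError("global table size must be a power of two")
    tables = [[r * local_size + i for i in range(local_size)] for r in range(n_procs)]
    # Phase 1: each rank generates updates and buckets them by owning rank.
    send = [[[] for _ in range(n_procs)] for _ in range(n_procs)]
    for src in range(n_procs):
        ran = seed_for(src)
        for _ in range(updates_per_proc):
            ran = next_ran(ran)
            owner = (ran & (total - 1)) // local_size
            send[src][owner].append(ran)
    # Phase 2: all-to-all exchange; dst receives send[src][dst] from every src
    # and applies each update to its local slice.
    for dst in range(n_procs):
        for src in range(n_procs):
            for ran in send[src][dst]:
                tables[dst][(ran & (total - 1)) - dst * local_size] ^= ran
    return tables
```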

S-FFT ( single MPI process )

The single MPI process FFT benchmark measures the floating-point execution rate of a double-precision complex one-dimensional Discrete Fourier Transform (DFT). The vector size is a power of two. Unit: Giga Flops per Second
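The power-of-two restriction is what makes a radix-2 FFT applicable. Below is a minimal recursive Cooley-Tukey sketch, plus a rate helper that uses 5 n log2(n), the operation count conventionally credited to a complex FFT of length n. Treat that count and the helper names as assumptions rather than the benchmark's exact accounting.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]  # twiddle factor * odd term
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def fft_gflops(n, seconds):
    """Rate using the conventional 5 * n * log2(n) operation count (assumed)."""
    return 5.0 * n * math.log2(n) / seconds / 1e9
```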

EP-FFT ( embarrassingly parallel )

Embarrassingly Parallel FFT performs the same test as FFT but in an embarrassingly parallel fashion: the code is run locally on each processor. No explicit communication is performed, although shared-memory effects may occur. Unit: Giga Flops per Second

G-FFT ( system performance )

Global FFT performs the same test as FFT but across the entire system, distributing the input vector in block fashion across all processes. Unit: Giga Flops per Second
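In a block distribution, each process holds one contiguous slice of the vector, unlike the cyclic layouts used by HPL and PTRANS. A minimal sketch of the index mapping, assuming for simplicity that the vector length divides evenly among the processes (`block_owner` is a hypothetical name):

```python
def block_owner(global_index, n, n_procs):
    """Map a global index of a length-n vector, block-distributed over
    n_procs processes, to (process, local index). Assumes n % n_procs == 0."""
    if n % n_procs:
        raise ValueError("sketch assumes n divisible by n_procs")
    block = n // n_procs  # elements per process
    return divmod(global_index, block)
```

With n = 16 and 4 processes, process 1 holds global indices 4 through 7.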

Maximum Ping-Pong Latency

Maximum Ping-Pong Latency reports the maximum latency over a number of non-simultaneous ping-pong tests. The ping-pongs are performed between as many distinct pairs of processors as possible, subject to an upper bound on the time it takes to complete this test. The test uses MPI standard send and receive routines. Unit: microseconds
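The ping-pong method can be sketched without MPI: one endpoint sends a small message, the other echoes it back, and one-way latency is estimated as half the average round-trip time. This is a stand-in, not the benchmark. A local socket pair and a thread replace two MPI processes and their send/receive calls, so the numbers only illustrate the calculation. The reported figure would then be the maximum over all tested pairs.

```python
import socket
import threading
import time


def recv_exact(sock, n):
    """Receive exactly n bytes (a stream socket may return partial reads)."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf += chunk
    return bytes(buf)


def ping_pong_latency_us(msg_size=8, iterations=1000):
    """Half the mean round-trip time of a msg_size-byte message, in microseconds."""
    a, b = socket.socketpair()
    try:
        def echo():
            for _ in range(iterations):
                b.sendall(recv_exact(b, msg_size))

        t = threading.Thread(target=echo)
        t.start()
        msg = b"x" * msg_size
        start = time.perf_counter()
        for _ in range(iterations):
            a.sendall(msg)
            recv_exact(a, msg_size)
        elapsed = time.perf_counter() - start
        t.join()
    finally:
        a.close()
        b.close()
    return elapsed / iterations / 2 * 1e6


def max_ping_pong_latency(pair_latencies_us):
    """Report the worst (largest) latency observed across processor pairs."""
    return max(pair_latencies_us)
```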

Randomly-Ordered Ring Latency ( per process )

Randomly-Ordered Ring Latency reports the latency in the ring communication pattern. The communicating processes are ordered randomly in the ring, and the result is averaged over various random assignments of processes in the ring.
Unit: microseconds
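A sketch of how randomly ordered rings can be formed: shuffle the ranks, and let each rank's left and right neighbors be its predecessor and successor in the shuffled order, wrapping around at the ends. A measurement taken on each ring is then averaged over the rings. The `measure` callable stands in for an actual latency or bandwidth test and, like the other names here, is hypothetical.

```python
import random
from statistics import fmean


def ring_neighbors(order):
    """Given a ring order (a permutation of ranks), return {rank: (left, right)}."""
    p = len(order)
    return {order[i]: (order[i - 1], order[(i + 1) % p]) for i in range(p)}


def random_rings(n_ranks, n_rings, seed=0):
    """Build n_rings rings, each from an independent random ordering of ranks."""
    rng = random.Random(seed)
    rings = []
    for _ in range(n_rings):
        order = list(range(n_ranks))
        rng.shuffle(order)
        rings.append(ring_neighbors(order))
    return rings


def average_over_rings(measure, rings):
    """Average a per-ring measurement over several random assignments."""
    return fmean(measure(ring) for ring in rings)
```

Passing `list(range(p))` to `ring_neighbors` yields the naturally ordered ring of consecutive ranks.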

Minimum Ping-Pong Bandwidth

Minimum Ping-Pong Bandwidth reports the minimum bandwidth over a number of non-simultaneous ping-pong tests. The ping-pongs are performed between as many distinct pairs of processors as possible, subject to an upper bound on the time it takes to complete this test. The test uses MPI standard send and receive routines. Unit: Giga Bytes per second
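Turning ping-pong timings into the reported figure can be sketched as follows. For each pair, the message size is divided by the one-way time, taken as half the round trip, and the minimum is taken over all pairs. The half-round-trip convention, the 1e9 definition of a gigabyte, and the function names are assumptions made for illustration.

```python
def one_way_bandwidth_gbps(msg_bytes, round_trip_seconds):
    """Bandwidth of one ping-pong: message size over half the round-trip time."""
    return msg_bytes / (round_trip_seconds / 2) / 1e9


def min_ping_pong_bandwidth(measurements):
    """measurements: iterable of (msg_bytes, round_trip_seconds), one per pair.
    The slowest pair determines the reported value."""
    return min(one_way_bandwidth_gbps(m, t) for m, t in measurements)
```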

Randomly Ordered Ring Bandwidth ( per process )

Randomly Ordered Ring Bandwidth reports the bandwidth achieved in the ring communication pattern. The communicating processes are ordered randomly in the ring, and the result is averaged over various random assignments of processes in the ring.
Unit: Giga Bytes per second per process

Naturally Ordered Ring Bandwidth ( per process )

Naturally Ordered Ring Bandwidth reports the bandwidth achieved in the ring communication pattern. The ring is formed with consecutive processes in MPI_COMM_WORLD.
Unit: Giga Bytes per second per process

Description above

See the row above this column for this column's description.

Description below

See the row below this column for this column's description.