HPC Challenge Benchmark Results - Global and per Processor Results - Optimized Runs Only - 41 Systems

Home	Rules	News	Download	FAQ	Links	Collaborators	Sponsors	Upload	Results

Manufacturer/Processor Type, Speed, Count, Threads, Processes

Includes the manufacturer/processor type, processor speed, number of processors, threads, and number of processes.
Move mouse over this column for each row to display additional information, including; manufacturer, system name, interconnect, MPI, affiliation, and submission date.

Run Type

Run Type, indicates whether the benchmark was a base run or was optimized.

Processors

Processors, this is the number of processors used in the benchmark, entered in the form by the benchmark submitter.

G-HPL ( system performance )

HPL, Solves a randomly generated dense linear system of equations in double floating-point precision (IEEE 64-bit) arithmetic using MPI. The linear system matrix is stored in a two-dimensional block-cyclic fashion and multiple variants of code are provided for computational kernels and communication patterns. The solution method is LU factorization through Gaussian elimination with partial row pivoting followed by a backward substitution. Unit: Tera Flops per Second

PP-HPL ( per processor )

G-PTRANS (A=A+B^T, MPI) ( system performance )

PTRANS (A=A+B^T, MPI), Implements a parallel matrix transpose for two-dimensional block-cyclic storage. It is an important benchmark because it exercises the communications of the computer heavily on a realistic problem where pairs of processors communicate with each other simultaneously. It is a useful test of the total communications capacity of the network. Unit: Giga Bytes per Second

PP-PTRANS (A=A+B^T, MPI) ( per processor )

G-RandomAccess ( system performance )

Global RandomAccess, also called GUPs, measures the rate at which the computer can update pseudo-random locations of its memory - this rate is expressed in billions (giga) of updates per second (GUP/s). Unit: Giga Updates per Second

PP-RandomAccess ( per processor )

PP-RandomAccess, also called GUPs, measures the rate at which the computer can update pseudo-random locations of its memory - this rate is expressed in billions (giga) of updates per second (GUP/s). Unit: Giga Updates per Second

EP-STREAM ( per process )

The Embarrassingly Parallel STREAM benchmark is a simple synthetic benchmark program that measures sustainable memory bandwidth and the corresponding computation rate for simple numerical vector kernels. It is run in embarrassingly parallel manner - all computational processes perform the benchmark at the same time, the arithmetic average rate is reported. Unit: Giga Bytes per Second

G-FFT ( system performance )

Global FFT, performs the same test as FFT but across the entire system by distributing the input vector in block fashion across all the processes. Unit: Giga Flops per Second

PP-FFT ( per processor )

PP-FFT, performs the same test as FFT but across the entire system by distributing the input vector in block fashion across all the processes. Unit: Giga Flops per Second

EP-DGEMM ( per process )

Embarrassingly Parallel DGEMM, benchmark measures the floating-point execution rate of double precision real matrix-matrix multiply performed by the DGEMM subroutine from the BLAS (Basic Linear Algebra Subprograms). It is run in embarrassingly parallel manner - all computational processes perform the benchmark at the same time, the arithmetic average rate is reported. Unit: Giga Flops per Second

Randomly Ordered Ring Bandwidth ( per process )

Randomly Ordered Ring Bandwidth, reports bandwidth achieved in the ring communication pattern. The communicating processes are ordered randomly in the ring (with respect to the natural ordering of the MPI default communicator). The result is averaged over various random assignments of processes in the ring. Unit: Giga Bytes per second per process

Randomly-Ordered Ring Latency ( per process )

Randomly-Ordered Ring Latency ( per process ), reports latency in the ring communication pattern. The communicating processes are ordered randomly in the ring (with respect to the natural ordering of the MPI default communicator) in the ring. The result is averaged over various random assignments of processes in the ring. Unit: micro-seconds

Global and per Processor Results - Optimized Runs Only - 41 Systems - Generated on Thu Jun 23 15:56:50 2022