System Information
Affiliation: | Argonne National Lab - LCF | URL: | www.alcf.anl.gov
Location: | USA, Illinois, Argonne | System Use: | Government
System Manufacturer: | IBM | System Name: | Blue Gene/P
Interconnect Manufacturer: | IBM | Interconnect Type: | Torus
Operating System: | Blue Gene CNK | MPI: | MPICH 2
MPI Wtick: | 0.000000001176471 | BLAS: | ESSL 4.3
Language: | C | Compiler: | IBM XL C/C++ 9.00
Compiler Flags: | -DHPCC_MEMALLCTR -g -O3 -qhot -qsmp=omp -qmaxmem=-1 -DBGPOPT | Processor Type: | PowerPC 450
Processor Speed: | 0.85 GHz | Total Processors: | 40960
Processors Entered: | 32768 | Processors determined: | 131072
Cores per chip: | 4 | HPL Processes: | 32768
MPI Processes: | 32768 | Threads Entered: | 4
Threads determined: | 4 | FLOPs per cycle: |
Theoretical peak: | 557 TFlop/s | Total memory: | 81920 GiB
FFT library: |
Explain Optimizations:
The RandomAccess algorithm is similar in principle to the algorithm …
The IBM Blue Gene/P system allows applications to use the messaging DMA hardware directly, in parallel with MPI messaging. To enable this direct DMA mode, an initialization call that sets up the DMA FIFOs must be executed before MPI_Init is called. For this purpose, the optimized HPCC code adds a function, dma_init, which is invoked immediately before MPI_Init. This mechanism was added to support special messaging situations and is used in a number of production codes, including QCD. It is also well documented in the Blue Gene Redbook.
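The start-up constraint described above (DMA FIFO setup strictly before MPI initialization) can be modelled in a few lines of C. This is a minimal sketch, not the optimized HPCC code: `dma_init_stub` and `mpi_init_stub` are hypothetical stand-ins for `dma_init` and `MPI_Init`, whose real bodies are not given in this report, and the boolean flag only models whether the FIFOs have been set up.

```c
#include <stdbool.h>

/* Hypothetical model of the FIFO state; the real dma_init programs the
   Blue Gene/P DMA hardware, which is not shown in this report. */
static bool dma_fifos_ready = false;

/* Stand-in for the optimized code's dma_init(). */
static void dma_init_stub(void) {
    dma_fifos_ready = true;
}

/* Stand-in for MPI_Init(): fails (-1) if the DMA FIFOs are not yet set up. */
static int mpi_init_stub(void) {
    return dma_fifos_ready ? 0 : -1;
}

/* Required order: DMA FIFO setup first, then MPI initialization. */
static int startup(void) {
    dma_init_stub();
    return mpi_init_stub();
}
```

In a real build the stubs would be replaced by the optimized code's dma_init and by MPI_Init(&argc, &argv), keeping the same order.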
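The theoretical peak in the System Information table can be reproduced from the other entries. This sketch assumes that "Total Processors" counts the 40960 Blue Gene/P compute chips (4 cores each) and that each PowerPC 450 core completes 4 flops per cycle (two fused multiply-adds on its dual FPU). The FLOPs per cycle field is blank in the table, so that value is an assumption back-solved from the reported 557 TFlop/s. The sketch also derives memory per chip from the total.

```c
/* Values taken from the System Information table. */
#define NODES           40960   /* "Total Processors": assumed to be chips */
#define CORES_PER_CHIP  4
#define CLOCK_HZ        0.85e9
#define FLOPS_PER_CYCLE 4       /* assumption: 2 FMAs per cycle per core */
#define TOTAL_MEM_GIB   81920.0

/* Theoretical peak in flop/s: chips x cores x clock x flops per cycle. */
static double peak_flops(void) {
    return (double)NODES * CORES_PER_CHIP * CLOCK_HZ * FLOPS_PER_CYCLE;
}

/* Memory per chip in GiB. */
static double mem_per_node_gib(void) {
    return TOTAL_MEM_GIB / NODES;
}
```

With these inputs peak_flops() returns about 5.5706e14 flop/s (557.056 TFlop/s), and mem_per_node_gib() returns 2.0.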
HPL
HPL: | 173.362 TFlop/s | HPL time: | 57080.7 seconds
HPL eps: | 1.11022e-16 | HPL Rnorm1: | 0.00000115715
HPL Anorm1: | 615309 | HPL AnormI: | 615249
HPL Xnorm1: | 4574500 | HPL XnormI: | 11.9858
HPL N: | 2457601 | HPL NB: | 120
HPL NProw: | 128 | HPL NPcol: | 256
HPL depth: | 1 | HPL NBdiv: | 6
HPL NBmin: | 6 | HPL CPfact: | C
HPL CRfact: | R | HPL CPtop: | 3
HPL order: | R
HPL dMach EPS: | 1.110223e-16 | HPL sMach EPS: | 0.00000005960464
HPL dMach sfMin: | 0 | HPL sMach sfMin: | 1.175494e-38
HPL dMach Base: | 2 | HPL sMach Base: | 2
HPL dMach Prec: | 2.220446e-16 | HPL sMach Prec: | 0.0000001192093
HPL dMach mLen: | 53 | HPL sMach mLen: | 24
HPL dMach Rnd: | 1 | HPL sMach Rnd: | 1
HPL dMach eMin: | -1021 | HPL sMach eMin: | -125
HPL dMach rMin: | 0 | HPL sMach rMin: | 1.175494e-38
HPL dMach eMax: | 1024 | HPL sMach eMax: | 128
HPL dMach rMax: | 1.797693e308 | HPL sMach rMax: | 3.402823e38
dweps: | 1.110223e-16 | sweps: | 0.00000005960464
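Two checks on the HPL entries, sketched in C. First, HPL counts 2/3 N^3 + 3/2 N^2 floating-point operations for an N × N solve. Dividing that count by the HPL time in seconds for N = 2457601 reproduces the reported 173.362 TFlop/s, and the 128 × 256 process grid accounts for the 32768 HPL processes. Second, most dMach and sMach entries correspond to `<float.h>` constants on an IEEE 754 system. The struct and function names below are hypothetical. The single-precision sfMin and rMin equal FLT_MIN, so their double-precision counterparts would normally be DBL_MIN (about 2.225074e-308). The 0 shown in the table is probably that value rounded for display.

```c
#include <float.h>

/* Operation count HPL uses for an N x N LU solve: 2/3 N^3 + 3/2 N^2. */
static double hpl_flops(double n) {
    return (2.0 / 3.0) * n * n * n + 1.5 * n * n;
}

/* Achieved rate in flop/s from problem size and wall time in seconds. */
static double hpl_rate(double n, double seconds) {
    return hpl_flops(n) / seconds;
}

/* Hypothetical container for dMach/sMach-style machine parameters. */
typedef struct {
    double eps;    /* unit roundoff: base^(1 - mlen) / 2 with rounding */
    double sfmin;  /* smallest positive normalized number */
    int    base;   /* radix */
    double prec;   /* eps * base */
    int    mlen;   /* mantissa length in base digits */
    int    emin;   /* minimum exponent before gradual underflow */
    int    emax;   /* maximum exponent before overflow */
    double rmax;   /* largest finite number */
} mach_params;

/* Double precision, read from <float.h> (assumes IEEE 754 round-to-nearest). */
static mach_params dmach(void) {
    mach_params m = { DBL_EPSILON / 2.0, DBL_MIN, FLT_RADIX, DBL_EPSILON,
                      DBL_MANT_DIG, DBL_MIN_EXP, DBL_MAX_EXP, DBL_MAX };
    return m;
}

/* Single-precision counterpart. */
static mach_params smach(void) {
    mach_params m = { FLT_EPSILON / 2.0, FLT_MIN, FLT_RADIX, FLT_EPSILON,
                      FLT_MANT_DIG, FLT_MIN_EXP, FLT_MAX_EXP, FLT_MAX };
    return m;
}
```

On an IEEE 754 host, dmach() yields mlen 53, emin -1021 and emax 1024, and smach() yields 24, -125 and 128, matching the table.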
PTRANS
PTRANS: | 625.204 GB/s | PTRANS time: | 18.948 seconds
PTRANS residual: | 0 | PTRANS N: | 1228800
PTRANS NB: | 120 | PTRANS NProw: | 128
PTRANS NPcol: | 256
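PTRANS measures the rate of the parallel update A = A^T + B, distributed here over the same 128 × 256 process grid as HPL with N = 1228800 (about half the HPL N). The serial kernel below shows only the arithmetic. The benchmark's cost is dominated by the exchange of blocks between process pairs across the grid, which this sketch omits. The function name is hypothetical.

```c
#include <stddef.h>

/* Serial sketch of A = A^T + B for an n x n row-major matrix.
   Results go to a separate buffer so that entries of A still needed
   for the transpose are not overwritten. */
static void ptrans_serial(size_t n, const double *a, const double *b, double *out) {
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            out[i * n + j] = a[j * n + i] + b[i * n + j];
}
```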
STREAM
S-STREAM Copy: | 5.43815 GB/s | S-STREAM Scale: | 3.62631 GB/s
S-STREAM Add: | 3.97957 GB/s | S-STREAM Triad: | 3.97997 GB/s
EP-STREAM Copy: | 5.43754 GB/s | EP-STREAM Scale: | 3.6263 GB/s
EP-STREAM Add: | 3.9796 GB/s | EP-STREAM Triad: | 3.97996 GB/s
STREAM Vector Size: | 61440050 | STREAM Threads: | 4
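The four STREAM kernels are simple vector loops, and each GB/s figure follows from the bytes a kernel is counted as moving per element: 16 for Copy and Scale, 24 for Add and Triad. The sketch below restates the kernels in C with OpenMP pragmas, matching the 4 STREAM threads reported. It is not the HPCC source, and `stream_gbs` is a hypothetical helper. Assuming the vector size is per array, the three arrays at 61440050 elements occupy 61440050 × 24 = 1,474,561,200 bytes.

```c
/* Bytes counted per element: 8 per array read plus 8 per array written. */
enum { COPY_BYTES = 16, SCALE_BYTES = 16, ADD_BYTES = 24, TRIAD_BYTES = 24 };

/* A signed loop index is used because OpenMP versions before 3.0 require
   one; without OpenMP enabled, the pragmas are ignored. */
static void stream_copy(long n, double *c, const double *a) {
    #pragma omp parallel for
    for (long j = 0; j < n; j++) c[j] = a[j];
}

static void stream_scale(long n, double *b, const double *c, double q) {
    #pragma omp parallel for
    for (long j = 0; j < n; j++) b[j] = q * c[j];
}

static void stream_add(long n, double *c, const double *a, const double *b) {
    #pragma omp parallel for
    for (long j = 0; j < n; j++) c[j] = a[j] + b[j];
}

static void stream_triad(long n, double *a, const double *b, const double *c, double q) {
    #pragma omp parallel for
    for (long j = 0; j < n; j++) a[j] = b[j] + q * c[j];
}

/* Bandwidth in GB/s (10^9 bytes per second) for one timed kernel pass. */
static double stream_gbs(int bytes_per_elem, long n, double seconds) {
    return (double)bytes_per_elem * (double)n / seconds / 1e9;
}
```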
RandomAccess
S-RandomAccess: | 0.0096932 Gup/s | EP-RandomAccess: | 0.00969341 Gup/s
G-RandomAccess: | 103.18 Gup/s | G-RandomAccess N: | 4398046511104
G-RandomAccess time: | 170.5 seconds | G-RandomAccess Check Time: | 1009.14 seconds
G-RandomAccess Errors: | 0 | G-RandomAccess Errors Fraction: | 0
G-RandomAccess TimeBound: | -1 | G-RandomAccess ExeUpdates: | 17592186044416
RandomAccess N: | 134217728
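RandomAccess applies XOR updates at pseudo-random table locations. The G-RandomAccess entries are consistent with the usual HPCC rules. The table holds N = 4398046511104 = 2^42 words, and ExeUpdates = 17592186044416 = 2^44 = 4N updates. Performing 2^44 updates in 170.5 seconds gives 103.18 Gup/s. Below is a serial sketch of the update loop using an HPCC-style shift-and-XOR generator with polynomial 0x7. The fixed starting value of 1 and the function names are assumptions for illustration. The distributed version exchanges updates between processes instead of indexing one local table. Because XOR is its own inverse, replaying the same update stream restores the table, which gives a cheap correctness check.

```c
#include <stdint.h>

#define POLY UINT64_C(0x7)  /* feedback polynomial for the update stream */

/* Advance the stream: shift left and fold in POLY if the top bit was set. */
static uint64_t ra_next(uint64_t ran) {
    return (ran << 1) ^ ((ran >> 63) ? POLY : 0);
}

/* Apply n_updates XOR updates to a table whose size is a power of two. */
static void ra_update(uint64_t *table, uint64_t table_size, uint64_t n_updates) {
    uint64_t ran = 1;  /* assumption: fixed starting value */
    for (uint64_t i = 0; i < n_updates; i++) {
        ran = ra_next(ran);
        table[ran & (table_size - 1)] ^= ran;
    }
}

/* Giga-updates per second from an update count and elapsed seconds. */
static double gups(double updates, double seconds) {
    return updates / seconds / 1e9;
}
```

For example, gups(17592186044416.0, 170.5) evaluates to about 103.18.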
FFT
S-FFT: | 1.21389 GFlop/s | EP-FFT: | 1.21354 GFlop/s
MPIFFT: | 5079.59 GFlop/s | MPIFFT N: | 549755813888
MPIFFT Max Error: | 0.0000000000000024651 | MPIFFT time0: | 0.397244 seconds
MPIFFT time1: | 4.26304 seconds | MPIFFT time2: | 2.08924 seconds
MPIFFT time3: | 5.30936 seconds | MPIFFT time4: | 3.88742 seconds
MPIFFT time5: | 4.96885 seconds | MPIFFT time6: | 0.189394 seconds
FFTEnblk: | 16 | FFTEnp: | 8
FFTEl2size: | 1048576
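HPCC counts a length-N complex FFT as 5 N log2 N floating-point operations. For the MPIFFT run, N = 549755813888 = 2^39, and the seven phase times time0 through time6 sum to about 21.10 seconds. Dividing 5 × 2^39 × 39 by that sum reproduces the reported 5079.59 GFlop/s. Treating the total time as the sum of the phase times is an assumption that happens to match these numbers, and the helper names below are hypothetical.

```c
#include <stdint.h>

/* log2 of a power of two, by counting shifts. */
static int ilog2_u64(uint64_t n) {
    int k = 0;
    while (n > 1) { n >>= 1; k++; }
    return k;
}

/* GFlop/s for a length-n complex FFT counted as 5 n log2 n operations,
   with the total time taken as the sum of the per-phase times. */
static double fft_gflops(uint64_t n, const double *phase_times, int n_phases) {
    double total = 0.0;
    for (int i = 0; i < n_phases; i++) total += phase_times[i];
    return 5.0 * (double)n * ilog2_u64(n) / total / 1e9;
}
```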
DGEMM
S-DGEMM: | 9.67524 GFlop/s | EP-DGEMM: | 9.67646 GFlop/s
DGEMM N: | 7837
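DGEMM rates are based on the 2N^3 operations of an N × N matrix multiply-accumulate, C = beta C + alpha A B. The reference loop below shows that arithmetic for row-major matrices. It is a sketch with hypothetical names, not the blocked BLAS routine (ESSL, per the BLAS entry) that produced the 9.7 GFlop/s figures above.

```c
#include <stddef.h>

/* Reference C = beta*C + alpha*A*B for n x n row-major matrices.
   Optimized BLAS libraries block this loop nest for cache and registers. */
static void dgemm_ref(size_t n, double alpha, const double *a, const double *b,
                      double beta, double *c) {
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = beta * c[i * n + j] + alpha * sum;
        }
    }
}

/* Operation count used to convert DGEMM time into flop/s: 2 n^3. */
static double dgemm_flops(double n) {
    return 2.0 * n * n * n;
}
```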
RandomRing Latency/Bandwidth
RandomRing Latency: | 6.23889 usec | RandomRing Bandwidth: | 0.0219922 GB/s
NaturalRing Latency/Bandwidth
NaturalRing Latency: | 4.85518 usec | NaturalRing Bandwidth: | 0.743607 GB/s
PingPong Latency/Bandwidth
Maximum PingPong Latency: | 6.61654 usec | Maximum PingPong Bandwidth: | 0.385704 GB/s
Minimum PingPong Latency: | 3.58265 usec | Minimum PingPong Bandwidth: | 0.379582 GB/s
Average PingPong Latency: | 5.06575 usec | Average PingPong Bandwidth: | 0.385048 GB/s
Size of Data Types
char: | 1 byte | short: | 2 bytes
int: | 4 bytes | long: | 4 bytes
void ptr: | 4 bytes | float: | 4 bytes
double: | 8 bytes | size_t: | 4 bytes
s64Int: | 8 bytes | u64Int: | 8 bytes
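The sizes above describe a 32-bit (ILP32) environment: int, long, pointers, and size_t are all 4 bytes, while the explicit 64-bit types used by RandomAccess are 8 bytes. The sketch below collects the same sizeof values for whatever host compiles it. s64Int and u64Int are HPCC's type names, redefined here as int64_t and uint64_t purely as an assumption. On a typical 64-bit LP64 host, long, void * and size_t report 8 bytes instead of 4.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: stand-ins for HPCC's 64-bit integer typedefs. */
typedef int64_t  s64Int;
typedef uint64_t u64Int;

typedef struct {
    size_t char_sz, short_sz, int_sz, long_sz, ptr_sz;
    size_t float_sz, double_sz, size_t_sz, s64_sz, u64_sz;
} type_sizes;

/* sizeof values for the compiling host, in the table's order. */
static type_sizes host_type_sizes(void) {
    type_sizes t = { sizeof(char), sizeof(short), sizeof(int), sizeof(long),
                     sizeof(void *), sizeof(float), sizeof(double),
                     sizeof(size_t), sizeof(s64Int), sizeof(u64Int) };
    return t;
}
```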
OpenMP
M_OPENMP: | 200505 | OpenMP Num Threads: | 4
OpenMP Num Procs: | 4 | OpenMP Max Threads: | 4
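The first OpenMP entry, 200505, appears to be the value of the _OPENMP macro, a yyyymm date that identifies the supported OpenMP specification. 200505 corresponds to OpenMP 2.5. A small lookup makes the mapping explicit. The helper names are hypothetical, and only a few well-known releases are listed.

```c
#include <stddef.h>

/* Map an _OPENMP value (yyyymm release date) to a specification version. */
static const char *openmp_version(long yyyymm) {
    static const struct { long date; const char *version; } known[] = {
        {200505, "2.5"}, {200805, "3.0"}, {201107, "3.1"},
        {201307, "4.0"}, {201511, "4.5"}, {201811, "5.0"},
    };
    for (size_t i = 0; i < sizeof known / sizeof known[0]; i++)
        if (known[i].date == yyyymm)
            return known[i].version;
    return "unknown";
}

/* _OPENMP for this build, or 0 when compiled without OpenMP support. */
static long build_openmp_date(void) {
#ifdef _OPENMP
    return _OPENMP;
#else
    return 0;
#endif
}
```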
Memory
MemProc: | -1 | MemSpec: | -1
MemVal: | -1
CPS
CPS_HPCC_FFT_235: | 0 | CPS_HPCC_FFTW_ESTIMATE: | 0
CPS_HPCC_MEMALLCTR: | 1 | CPS_HPL_USE_GETPROCESSTIMES: | 0
CPS_RA_SANDIA_NOPT: | 0 | CPS_RA_SANDIA_OPT2: | 0
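The CPS entries appear to record whether each named preprocessor symbol was defined at build time. This is consistent with CPS_HPCC_MEMALLCTR being 1 while the Compiler Flags include -DHPCC_MEMALLCTR. This report does not show how HPCC itself produces these lines. The sketch below illustrates one way to turn #ifdef tests into 0/1 values for three of the symbols, with hypothetical names.

```c
#include <stdio.h>

/* 1 if the symbol was defined on the command line (e.g. -DHPCC_MEMALLCTR), else 0. */
#ifdef HPCC_MEMALLCTR
enum { HAS_HPCC_MEMALLCTR = 1 };
#else
enum { HAS_HPCC_MEMALLCTR = 0 };
#endif

#ifdef HPCC_FFT_235
enum { HAS_HPCC_FFT_235 = 1 };
#else
enum { HAS_HPCC_FFT_235 = 0 };
#endif

#ifdef HPCC_FFTW_ESTIMATE
enum { HAS_HPCC_FFTW_ESTIMATE = 1 };
#else
enum { HAS_HPCC_FFTW_ESTIMATE = 0 };
#endif

/* Print the flags in the same "name: | value" style as the table above. */
static void print_cps(FILE *out) {
    fprintf(out, "CPS_HPCC_FFT_235: | %d\n", HAS_HPCC_FFT_235);
    fprintf(out, "CPS_HPCC_FFTW_ESTIMATE: | %d\n", HAS_HPCC_FFTW_ESTIMATE);
    fprintf(out, "CPS_HPCC_MEMALLCTR: | %d\n", HAS_HPCC_MEMALLCTR);
}
```

Building the sketch with -DHPCC_MEMALLCTR prints 1 on the CPS_HPCC_MEMALLCTR line, as in this report.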