System Information | ||||
Affiliation: | Oak Ridge National Lab | URL: | http://nccs.gov/ | |
Location: | USA, Tennessee, Oak Ridge | System Use: | Research | |
System Manufacturer: | Cray Inc. | System Name: | XT3 Dual-Core | |
Interconnect Manufacturer: | Cray Inc. | Interconnect Type: | Cray SeaStar | |
Operating System: | Unicos/lc 1.5.25 | MPI: | xt-mpt 1.5.25 | |
MPI Wtick: | 0.000001 | BLAS: | ACML 3.0 | |
Language: | C | Compiler: | PGI 6.1.4 | |
Compiler Flags: | -fastsse | Processor Type: | AMD Opteron | |
Processor Speed: | 2.6 GHz | Total Processors: | 10424 | |
Processors Entered: | 10404 | Processors determined: | 10404 | |
Cores per chip: | 2 | HPL Processes: | 10404 | |
MPI Processes: | 10404 | Threads Entered: | 1 | |
Threads determined: | 1 | FLOPs per cycle: | 2 | |
Theoretical peak: | 54.1 TFlop/s | Total memory: | GiB | |
FFT library: | ||||
Explain Optimizations: | ||||
* Replaced the STREAM C code with equivalent assembly code.
* Used the vendor's optimization of the MPI_Cart_* functions to place communicating neighbors on the same physical node.
* Used the MPIRandomAccess optimization from Sandia, which aggregates many small messages into fewer large ones that are then exchanged via alltoall operations. This work is documented at: http://www.cs.sandia.gov/~sjplimp/algorithms.html#gups |
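The message-combining idea behind the MPIRandomAccess optimization can be illustrated with a small simulation. This is only a sketch of the general aggregation technique, not the Sandia code: it uses no MPI, and the function names (`bucket_by_owner`, `alltoall`), the block distribution of the table, and the toy table size are all assumptions made for illustration.

```python
from collections import defaultdict

def bucket_by_owner(updates, table_size, num_ranks):
    """Group (global_index, value) updates by the rank that owns the index.

    Assumes a block distribution in which table_size divides evenly
    among num_ranks (a hypothetical layout chosen for this sketch).
    """
    per_rank = table_size // num_ranks
    buckets = defaultdict(list)
    for index, value in updates:
        buckets[index // per_rank].append((index, value))
    return dict(buckets)

def alltoall(outgoing):
    """Simulate one alltoall: each rank receives every batch addressed to it.

    Returns the per-rank inboxes and the number of batched messages sent.
    """
    num_ranks = len(outgoing)
    inbox = [[] for _ in range(num_ranks)]
    messages = 0
    for buckets in outgoing:
        for dest, batch in buckets.items():
            inbox[dest].extend(batch)
            messages += 1  # one combined message per (sender, destination) pair
    return inbox, messages

# Toy data: 4 ranks, 16 table entries, pending updates listed per sending rank.
pending = [
    [(1, 0xA), (9, 0xB), (10, 0xC), (11, 0xD)],
    [(0, 0x1), (2, 0x2), (15, 0x3)],
    [],
    [(4, 0x5), (5, 0x6), (6, 0x7), (7, 0x8)],
]
outgoing = [bucket_by_owner(u, table_size=16, num_ranks=4) for u in pending]
inbox, messages = alltoall(outgoing)
total_updates = sum(len(u) for u in pending)
print(f"{total_updates} updates delivered in {messages} combined messages")
```

Sending each update on its own would cost one message per update; batching by destination caps the count at one message per sender and destination pair in each exchange round.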
HPL | ||||
HPL: | 43.5056 Tflop/s | HPL time: | 18485.3 seconds | |
HPL eps: | 1.11022e-16 | HPL Rnorm1: | 0.000000226069 | |
HPL Anorm1: | 266722 | HPL AnormI: | 266717 | |
HPL Xnorm1: | 2223320 | HPL XnormI: | 13.0527 | |
HPL N: | 1064520 | HPL NB: | 60 | |
HPL NProw: | 102 | HPL NPcol: | 102 | |
HPL depth: | 1 | HPL NBdiv: | 2 | |
HPL NBmin: | 4 | HPL CPfact: | R | |
HPL CRfact: | R | HPL CPtop: | 1 | |
HPL order: | R | |||
HPL dMach EPS: | 1.110223e-16 | HPL sMach EPS: | 0.00000005960464 | |
HPL dMach sfMin: | 0 | HPL sMach sfMin: | 1.175494e-38 | |
HPL dMach Base: | 2 | HPL sMach Base: | 2 | |
HPL dMach Prec: | 2.220446e-16 | HPL sMach Prec: | 0.0000001192093 | |
HPL dMach mLen: | 53 | HPL sMach mLen: | 24 | |
HPL dMach Rnd: | 1 | HPL sMach Rnd: | 1 | |
HPL dMach eMin: | -1021 | HPL sMach eMin: | -125 | |
HPL dMach rMin: | 0 | HPL sMach rMin: | 1.175494e-38 | |
HPL dMach eMax: | 1024 | HPL sMach eMax: | 128 | |
HPL dMach rMax: | 1.797693e308 | HPL sMach rMax: | 3.402823e38 | |
dweps: | 1.110223e-16 | sweps: | 0.00000005960464 |
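The headline HPL figures can be cross-checked from the values reported above. The sketch below assumes the usual conventions: theoretical peak is processes times clock rate times FLOPs per cycle, and the HPL operation count is 2/3 N^3 + 3/2 N^2. The variable names are illustrative only.

```python
# Values copied from the System Information and HPL sections.
processes = 10404        # HPL Processes
clock_hz = 2.6e9         # Processor Speed
flops_per_cycle = 2      # FLOPs per cycle
n = 1064520              # HPL N
hpl_time_s = 18485.3     # HPL time, assumed to be in seconds
reported_tflops = 43.5056

# Theoretical peak: every process retires flops_per_cycle operations per cycle.
peak_tflops = processes * clock_hz * flops_per_cycle / 1e12

# Assumed HPL operation count for an N x N LU solve: 2/3 N^3 + 3/2 N^2.
hpl_flops = (2.0 / 3.0) * n**3 + 1.5 * n**2
hpl_tflops = hpl_flops / hpl_time_s / 1e12

efficiency = reported_tflops / peak_tflops

print(f"peak       = {peak_tflops:.4f} TFlop/s")
print(f"HPL (calc) = {hpl_tflops:.4f} TFlop/s")
print(f"efficiency = {efficiency:.1%}")
```

Under these assumptions the recomputed rate agrees with the reported 43.5056 Tflop/s to within rounding, and the run reaches roughly 80% of theoretical peak.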
PTRANS | ||||
PTRANS: | 2038.92 GB/s | PTRANS time: | 1.11157 seconds | |
PTRANS residual: | 0 | PTRANS N: | 532260 | |
PTRANS NB: | 63 | PTRANS NProw: | 102 | |
PTRANS NPcol: | 102 |
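The PTRANS rate can be reproduced from the matrix order and the elapsed time, assuming the rate counts each 8-byte double of the N x N matrix once and that GB means 10^9 bytes. This is a consistency check on the reported numbers, not the benchmark's own code.

```python
n = 532260           # PTRANS N
time_s = 1.11157     # PTRANS time
bytes_per_double = 8

# Assumption: every matrix element is moved once during the transpose.
bytes_moved = bytes_per_double * n * n
gb_per_s = bytes_moved / time_s / 1e9

print(f"PTRANS (calc) = {gb_per_s:.2f} GB/s")
```

The result matches the reported 2038.92 GB/s. PTRANS N is also exactly half of HPL N, and both benchmarks use the same 102 x 102 process grid.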
STREAM | ||||
S-STREAM Copy: | 5.38758 GB/s | S-STREAM Scale: | 4.13775 GB/s | |
S-STREAM Add: | 3.44623 GB/s | S-STREAM Triad: | 5.17433 GB/s | |
EP-STREAM Copy: | 2.54167 GB/s | EP-STREAM Scale: | 2.23672 GB/s | |
EP-STREAM Add: | 2.05365 GB/s | EP-STREAM Triad: | 2.55092 GB/s | |
STREAM Vector Size: | 36305920 | STREAM Threads: | 1 |
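Two quantities are useful when reading the STREAM rows: the memory footprint of the three arrays, and how much per-process bandwidth remains when every process runs at once (EP) compared with a single process (S). The sketch below assumes three arrays of 8-byte doubles, as in standard STREAM. Reading the EP/S ratio as the two cores of a chip sharing one memory path is an interpretation, not something stated in the results.

```python
vector_size = 36305920   # STREAM Vector Size (elements per array)
bytes_per_double = 8
num_arrays = 3           # assumption: standard STREAM arrays a, b, c

footprint_bytes = vector_size * bytes_per_double * num_arrays

# Reported rates in GB/s: (single process, embarrassingly parallel)
rates = {
    "Copy":  (5.38758, 2.54167),
    "Scale": (4.13775, 2.23672),
    "Add":   (3.44623, 2.05365),
    "Triad": (5.17433, 2.55092),
}
ratios = {kernel: ep / single for kernel, (single, ep) in rates.items()}

print(f"array footprint = {footprint_bytes / 2**20:.1f} MiB per process")
for kernel, ratio in ratios.items():
    print(f"{kernel:5s}: EP/S = {ratio:.2f}")
```

Each kernel keeps roughly half to 60% of its single-process rate under EP.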
RandomAccess | ||||
S-RandomAccess: | 0.0181075 Gup/s | EP-RandomAccess: | 0.0101491 Gup/s | |
G-RandomAccess: | 10.6711 Gup/s | G-RandomAccess N: | 1099511627776 | |
G-RandomAccess time: | 4.06674 seconds | G-RandomAccess Check Time: | 23.9385 seconds | |
G-RandomAccess Errors: | 0 | G-RandomAccess Errors Fraction: | 0 | |
G-RandomAccess TimeBound: | 4621.32 | G-RandomAccess ExeUpdates: | 43396665408 | |
RandomAccess N: | 67108864 |
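The G-RandomAccess rate follows directly from the executed update count and the elapsed time. The sketch below also compares the executed updates with the benchmark's nominal count of four updates per table entry. Treating the shortfall as a result of the time bound is an assumption based on the TimeBound field, not something the results state.

```python
table_size = 1099511627776    # G-RandomAccess N
exe_updates = 43396665408     # G-RandomAccess ExeUpdates
time_s = 4.06674              # G-RandomAccess time

gups = exe_updates / time_s / 1e9

# Nominal work for RandomAccess is 4 updates per table entry.
nominal_updates = 4 * table_size
fraction_executed = exe_updates / nominal_updates

print(f"table size is 2^40: {table_size == 2**40}")
print(f"G-RandomAccess (calc) = {gups:.4f} Gup/s")
print(f"fraction of nominal updates executed = {fraction_executed:.4f}")
```

The computed rate matches the reported 10.6711 Gup/s. About 1% of the nominal update count was executed.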
FFT | ||||
S-FFT: | 0.735931 GFlop/s | EP-FFT: | 0.653607 GFlop/s | |
MPIFFT: | 1122.7 GFlop/s | MPIFFT N: | 68719476736 | |
MPIFFT Max Error: | 0.00000000000000221632 | MPIFFT time0: | 0 seconds | |
MPIFFT time1: | 2.58289 seconds | MPIFFT time2: | 1.32588 seconds | |
MPIFFT time3: | 2.12467 seconds | MPIFFT time4: | 2.33529 seconds | |
MPIFFT time5: | 2.47716 seconds | MPIFFT time6: | 0.00000190735 seconds | |
FFTEnblk: | 16 | FFTEnp: | 4 | |
FFTEl2size: | 1048576 |
DGEMM | ||||
S-DGEMM: | 4.79513 GFlop/s | EP-DGEMM: | 4.79356 GFlop/s | |
DGEMM N: | 5218 |
RandomRing Latency/Bandwidth | ||||
RandomRing Latency: | 17.0356 usec | RandomRing Bandwidth: | 0.0820073 GB/s |
NaturalRing Latency/Bandwidth | ||||
NaturalRing Latency: | 16.1443 usec | NaturalRing Bandwidth: | 0.201726 GB/s |
PingPong Latency/Bandwidth | ||||
Maximum PingPong Latency: | 8.68738 usec | Maximum PingPong Bandwidth: | 1.15307 GB/s | |
Minimum PingPong Latency: | 5.36442 usec | Minimum PingPong Bandwidth: | 1.14708 GB/s | |
Average PingPong Latency: | 7.00607 usec | Average PingPong Bandwidth: | 1.15009 GB/s |
Size of Data Types | ||||
char: | 1 byte | short: | 2 bytes | |
int: | 4 bytes | long: | 8 bytes | |
void ptr: | 8 bytes | float: | 4 bytes | |
double: | 8 bytes | size_t: | 8 bytes | |
s64Int: | 8 bytes | u64Int: | 8 bytes |
OpenMP | ||||
M OpenMP: | -1 | OpenMP Num Threads: | 0 | |
OpenMP Num Procs: | 0 | OpenMP Max Threads: | 0 |
Memory | ||||
MemProc: | -1 | MemSpec: | -1 | |
MemVal: | -1 |
CPS | ||||
CPS_HPCC_FFT_235: | CPS_HPCC_FFTW_ESTIMATE: | |||
CPS_HPCC_MEMALLCTR: | CPS_HPL_USE_GETPROCESSTIMES: | |||
CPS_RA_SANDIA_NOPT: | CPS_RA_SANDIA_OPT2: |