The STREAM2 Home Page
Introduction
STREAM2 is an attempt to extend the functionality of the STREAM benchmark in two important ways:
- STREAM2 measures sustained bandwidth at all levels of the cache hierarchy, and
- STREAM2 more clearly exposes the performance differences between reads and writes.
STREAM2 is based on the same ideas as STREAM, but uses a different set of vector kernels:
- FILL: similar to bzero(), but fills with a constant instead of zero
- COPY: similar to bcopy(), and the same as STREAM Copy
- DAXPY: similar to STREAM Triad, but overwrites one of the input vectors instead of writing results to a third vector
- SUM: sum reduction on a single vector -- reads only, no writes
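As a rough illustration, the four kernels could be written in C along the following lines. This is only a sketch, not a translation of the Fortran77 source: the function names are hypothetical, the elements are assumed to be 8-byte doubles, and the FILL body (storing a constant `q`) is inferred from the description above.

```c
#include <stddef.h>

/* FILL: store the constant q into every element (like bzero(), but nonzero). */
void stream2_fill(double *a, double q, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = q;
}

/* COPY: a(i) = b(i), the same operation as STREAM Copy. */
void stream2_copy(double *a, const double *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = b[i];
}

/* DAXPY: a(i) = a(i) + q*b(i); the result overwrites the input vector a. */
void stream2_daxpy(double *a, const double *b, double q, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = a[i] + q * b[i];
}

/* SUM: sum reduction over one vector; reads only, no writes to memory. */
double stream2_sum(const double *a, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum = sum + a[i];
    return sum;
}
```

In a real benchmark the result of `stream2_sum` would have to be used (printed or stored) so that the compiler cannot remove the loop.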
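| Kernel | Code | Bytes/iter read | Bytes/iter written | FLOPS/iter |
|--------|------|-----------------|--------------------|------------|
| COPY | a(i) = b(i) | 8 (+8) | 8 | 0 |
| DAXPY | a(i) = a(i) + q*b(i) | 16 | 8 | 2 |
| SUM | sum = sum + a(i) | 8 | 0 | 1 |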
Table 1: Characteristics of the STREAM2 kernels. The value
in parentheses in the "Bytes/iter read" column indicates the number of
additional bytes read per iteration on machines with a "write allocate"
cache policy.
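The byte counts in Table 1 are what turn a measured time into a bandwidth figure. The sketch below shows one way to do that conversion in C. The struct, the constant names, and the function are hypothetical, and 1 MB is assumed to mean 10^6 bytes; whether the write-allocate traffic is counted is left as a flag, since the table lists it separately.

```c
/* Per-iteration memory traffic for one kernel, taken from Table 1. */
typedef struct {
    const char *name;
    int bytes_read;          /* bytes read per iteration */
    int bytes_read_walloc;   /* extra bytes read under write allocate */
    int bytes_written;       /* bytes written per iteration */
} kernel_traffic;

/* Rows of Table 1 (hypothetical constant names). */
static const kernel_traffic STREAM2_COPY  = { "COPY",   8, 8, 8 };
static const kernel_traffic STREAM2_DAXPY = { "DAXPY", 16, 0, 8 };
static const kernel_traffic STREAM2_SUM   = { "SUM",    8, 0, 0 };

/* Bandwidth in MB/s (1 MB = 1e6 bytes, an assumption) for a kernel that
 * ran `iterations` loop iterations in `seconds`. If count_write_allocate
 * is nonzero, the extra write-allocate reads are included in the total. */
double kernel_mbytes_per_sec(kernel_traffic k, double iterations,
                             double seconds, int count_write_allocate)
{
    double bytes_per_iter = k.bytes_read + k.bytes_written;
    if (count_write_allocate)
        bytes_per_iter += k.bytes_read_walloc;
    return bytes_per_iter * iterations / seconds / 1.0e6;
}
```

For example, one million COPY iterations in half a second correspond to 32 MB/s by the nominal count, or 48 MB/s if the write-allocate reads are included.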
Source Code
The STREAM2 source code is provided in Fortran77 -- you are welcome to translate it to C, but I have not gotten around to it yet. The control flow is a bit more complex than STREAM because of the looping over multiple iterations of many different vector lengths.
The main feature is that the same amount of work is done for each vector length, so the shorter vector lengths are iterated many times and the longer vector lengths fewer times.
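One simple way to keep the work constant is to fix a total element budget and divide it by the vector length. The C sketch below illustrates the idea; the function name and the budget parameter are hypothetical, and the actual Fortran77 source may choose its repetition counts differently.

```c
#include <stddef.h>

/* Number of passes over a vector of length n so that roughly
 * total_elements elements are processed in all. Short vectors get many
 * passes, long vectors get few, and at least one pass is always made. */
size_t reps_for_length(size_t total_elements, size_t n)
{
    if (n == 0)
        return 0;               /* nothing to iterate over */
    size_t reps = total_elements / n;
    return reps > 0 ? reps : 1;
}
```

With a budget of 1,000,000 elements, a 100-element vector would be traversed 10,000 times and a 1,000,000-element vector once, so each vector length performs about the same number of loop iterations.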
Sample Results
Here are some sample results from machines in my house and office. The machines are described in Table 2.
| Machine | CPU | MHz | L1 Data Cache | L2 Data Cache | Peak L2 Cache Bandwidth | Bus Width @ Speed | Peak Memory Bandwidth |
|---------|-----|-----|---------------|---------------|-------------------------|-------------------|-----------------------|
| IBM RS/6000-397 | POWER2-SC | 160 | 128 kB @ 160 MHz | none | N/A | 256 bits @ 80 MHz | 2560 MB/s |
| Upgraded Mac clone | PowerPC G3 | 367.5 | 32 kB @ 367.5 MHz | 512 kB @ 183.75 MHz | 2940 MB/s | 64 bits @ 52.5 MHz | 420 MB/s |
| PowerComputing PowerCurve 601/120 | PowerPC 601 | 120 | 64 kB (I+D) @ 120 MHz | 256 kB @ 40 MHz | 320 MB/s | 64 bits @ 40 MHz | 320 MB/s |
| Mac Quadra 650 | Motorola 68040 | 33 | 8 kB @ 33 MHz | none | N/A | 32 bits @ 33 MHz | 132 MB/s |
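The peak memory bandwidth figures for these machines are consistent with a simple product of bus width (in bytes) and bus clock. The small C helper below, with a hypothetical name, reproduces that arithmetic; it assumes one transfer per bus clock and 1 MB = 10^6 bytes.

```c
/* Peak bandwidth in MB/s for a bus of the given width and clock,
 * assuming one full-width transfer per clock cycle. */
double peak_mbytes_per_sec(int bus_width_bits, double bus_mhz)
{
    return (bus_width_bits / 8.0) * bus_mhz;
}
```

For instance, a 256-bit bus at 80 MHz gives 32 bytes x 80 MHz = 2560 MB/s, and a 64-bit bus at 52.5 MHz gives 8 bytes x 52.5 MHz = 420 MB/s.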
[Sample results by kernel: FILL, COPY, DAXPY, SUM]