about the Japanese Earth Simulator vector processor, in particular the
length of any given vector module. Today, an article on nanowires
from the Hot Chips conference gave (or implied) some of this
information. Question: am I interpreting this correctly? If you
have more information than I do and I have made a mistake, please
post a correction, and perhaps copy me on it.
The Earth Simulator has 5120 processors and a peak performance of 40
Teraflops, which I knew. But today I learned that it has 640 nodes,
which works out to 8 processors per node (5120/640 = 8). I take that
to mean a "vector length" of 8. This surprises me; I would have
expected the vector length to be more like 32 or 64. Apparently not.
If we divide 40 Tflops by 5,120 processors, we get about 7.8 Gflops
for each processor. Assuming the processors use multiply-add (aka
multiply-accumulate), that's 3.9 giga mult-adds per processor per
second. Assuming 64-bit precision and a 64-bit data width per
processor, data must be fetched and stored at roughly a 4 GHz rate,
slightly faster than the clock rate of the current fastest P4. This
means the memory must be on-chip, because there is no way, today, to
move data over a conventional bus at that rate.
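To make the arithmetic above easy to check, here is a small Python sketch. It only restates figures from this post (40 Tflops peak, 5120 processors, 640 nodes). It assumes, as the text does, that each multiply-add counts as 2 flops and moves one 64-bit word to or from memory. The constant names are my own and do not come from any Earth Simulator documentation.

```python
# Back-of-envelope check of the Earth Simulator figures quoted in this post.
# Assumptions (same as the text): each multiply-add counts as 2 flops, and
# each multiply-add moves one 64-bit (8-byte) word to or from memory.

PEAK_FLOPS = 40e12          # 40 Tflops peak for the whole machine
PROCESSORS = 5120
NODES = 640
FLOPS_PER_MULT_ADD = 2
BYTES_PER_WORD = 8          # 64-bit data width

procs_per_node = PROCESSORS // NODES                 # 5120 / 640
flops_per_proc = PEAK_FLOPS / PROCESSORS             # peak flops per processor
mult_adds_per_proc = flops_per_proc / FLOPS_PER_MULT_ADD
words_per_sec = mult_adds_per_proc                   # one word per mult-add (assumed)
bytes_per_sec = words_per_sec * BYTES_PER_WORD

print(f"processors per node:          {procs_per_node}")
print(f"Gflops per processor:         {flops_per_proc / 1e9:.2f}")
print(f"giga mult-adds per second:    {mult_adds_per_proc / 1e9:.2f}")
print(f"GB/s per processor (assumed): {bytes_per_sec / 1e9:.2f}")
```

At one 64-bit word per multiply-add, this comes to about 3.9 billion words, or roughly 31 GB/s, per processor.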
In contrast, there is the proposed Cray Red Storm supercomputer,
built from 10,368 Opterons. Surprisingly, it is based on groups of 4
(not 8) Opterons per circuit board. I guess they couldn't fit 8 on a
reasonably sized board along with cooling and memory. In any event,
its peak performance is also 40 Tflops, or about 3.86 Gflops per
Opteron. That's two flops per clock at roughly 1.9 GHz, which the
Opteron's SSE2 unit can do at 64-bit precision.
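The same back-of-envelope check works for Red Storm. It uses only the figures quoted above (40 Tflops peak, 10,368 Opterons, 4 per board) and assumes the 2 double-precision flops per clock stated in the text. As before, the names are just illustrative constants.

```python
# Back-of-envelope check of the Red Storm figures quoted in this post.
# Assumption (as stated in the text): each Opteron's SSE2 unit retires
# 2 double-precision (64-bit) flops per clock.

PEAK_FLOPS = 40e12          # 40 Tflops peak for the whole machine
OPTERONS = 10_368
OPTERONS_PER_BOARD = 4
FLOPS_PER_CLOCK = 2

boards = OPTERONS // OPTERONS_PER_BOARD
flops_per_opteron = PEAK_FLOPS / OPTERONS
implied_clock_hz = flops_per_opteron / FLOPS_PER_CLOCK   # clock needed to hit peak

print(f"boards:              {boards}")
print(f"Gflops per Opteron:  {flops_per_opteron / 1e9:.2f}")
print(f"implied clock (GHz): {implied_clock_hz / 1e9:.2f}")
```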
Remember, Red Storm is a proposal, while the Earth Simulator actually
exists and is busily crunching numbers as you read this.
For "vector-friendly" problems such as Linpack, the Japanese vector
processor can achieve higher real-world performance than the
Opteron-based system. Conversely, some real-world algorithms are
better suited to the Opteron-based system. This is probably why the
US Government's official body promoting supercomputers is pursuing
both kinds of machine.
It seems to me that the Opteron-based approach has a HUGE cost
advantage over a vector processor. There is very little demand (one?)
for an Earth Simulator vector processor, so the processor design must
be customized for just one vector-processing computer. _Fiercely_
expensive development costs (the admitted Earth Simulator price tag
is $400 million) amortized over a production run of one computer?
Ouch!
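To put the price tag in per-unit terms, here is a trivial division. It treats the whole $400 million as a cost to be spread over the hardware, which is a simplification and not a real cost model. `amortized_cost` is a hypothetical helper name.

```python
# Spread the quoted $400 million Earth Simulator price tag over its hardware.
# Simplification: the whole price tag is treated as cost to amortize; this is
# not a real cost model. amortized_cost is a hypothetical helper.

PRICE_TAG_USD = 400e6
PROCESSORS = 5120
NODES = 640
MACHINES_BUILT = 1

def amortized_cost(total_usd: float, units: int) -> float:
    """Return the share of total_usd carried by each of `units` units."""
    if units <= 0:
        raise ValueError("units must be positive")
    return total_usd / units

per_machine = amortized_cost(PRICE_TAG_USD, MACHINES_BUILT)
per_node = amortized_cost(PRICE_TAG_USD, NODES)
per_processor = amortized_cost(PRICE_TAG_USD, PROCESSORS)

print(f"per machine:   ${per_machine:,.0f}")
print(f"per node:      ${per_node:,.0f}")
print(f"per processor: ${per_processor:,.0f}")
```

With a production run of one machine, the entire bill lands on that machine. Spread over its 5120 processors, it still comes to about $78,000 each.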
Again, if you have more information on the Earth Simulator design,
please share it with us.