View Full Version : IBM Hurricane chipset leads x86 tpmc 4-way


Robert Myers
04-01-2005, 06:40 AM
Greetings,

In the competition for 4-way performance, IBM xSeries 366 with X3
architecture and 3.66GHz Xeons is bested only by a P5 entry from IBM
and an Itanium entry from HP:

http://www.tpc.org/tpcc/results/tpccadvanced3.asp

Filter for 4-way systems and sort on tpmc.

HP's most competitive x86 entry is Opteron, with about 80% of the
performance of the xSeries 366. Whatever the debilities of the
off-chip memory controller and NetBurst, IBM seems to have overcome
them for this benchmark.

RM

Rob Stow
04-01-2005, 07:08 AM
Robert Myers wrote:
<snip>
> HP's most competitive x86 entry is Opteron, with about 80% of the
> performance of the xSeries 366. Whatever the debilities of the
> off-chip memory controller and NetBurst, IBM seems to have overcome
> them for this benchmark.

More like 87%. And the cost/TPMC is twice as high for the IBM.

Also, if you look at the entries for Microsoft SQL Server 2000
the results are much closer: 141504 vs 130623, or 92%. Makes
me wonder what the HP box could do if it was running IBM's DB2 -
especially since such a big fuss was made when IBM ported it to
Opteron long before Intel had an AMD64 chip.
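
The percentages traded above are plain tpmC ratios. A minimal sketch of the arithmetic, using only the two SQL Server 2000 figures quoted in this post; the variable names are illustrative, and which figure belongs to which box is an assumption inferred from the order of the sentence:

```python
# tpmC figures quoted in the post for the Microsoft SQL Server 2000 entries.
# Assumption: the larger number is the IBM xSeries 366 result and the
# smaller one is the HP Opteron result.
ibm_tpmc = 141504
hp_tpmc = 130623

# Relative throughput of the HP entry against the IBM entry.
ratio = hp_tpmc / ibm_tpmc
print(f"HP/IBM tpmC ratio: {ratio:.1%}")
```

This only compares throughput. The cost per tpmC mentioned above is a separate column in the TPC-C listings and is not modelled here.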

George Macdonald
04-01-2005, 11:31 AM
On Fri, 01 Apr 2005 09:40:30 -0500, Robert Myers <rmyers1400@comcast.net>
wrote:
<snip>
> HP's most competitive x86 entry is Opteron, with about 80% of the
> performance of the xSeries 366. Whatever the debilities of the
> off-chip memory controller and NetBurst, IBM seems to have overcome
> them for this benchmark.

I figured Hurricane might be something special.:-) What is Dell to do
now?... beg for a umm, Hurricane?:-) BTW Sun seems to be notably missing
from those charts apart from some somewhat outdated systems. It'd be
interesting to see where the V40z fits in.

--
Rgds, George Macdonald

Rick Jones
04-01-2005, 01:51 PM
In comp.sys.intel George Macdonald <fammacd=!SPAM^nothanks@tellurian.com> wrote:
> I figured Hurricane might be something special.:-) What is Dell to do
> now?... beg for a umm, Hurricane?:-) BTW Sun seems to be notably missing
> from those charts apart from some somewhat outdated systems. It'd be
> interesting to see where the V40z fits in.

I seem to recall Sun going on record (pre-Opteron days) about no
longer being terribly impressed with the TPC-C benchmark and stating
that as the reason they were no longer publishing TPC-C results.
There may be some stuff on that topic on their website. I suspect we
would not see TPC-C results for a V40z unless Sun had a change of
heart.

rick jones
--
Wisdom Teeth are impacted, people are affected by the effects of events.
these opinions are mine, all mine; HP might not want them anyway... :)
feel free to post, OR email to raj in cup.hp.com but NOT BOTH...

Robert Myers
04-01-2005, 01:54 PM
On Fri, 01 Apr 2005 14:31:34 -0500, George Macdonald
<fammacd=!SPAM^nothanks@tellurian.com> wrote:
> I figured Hurricane might be something special.:-) What is Dell to do
> now?... beg for a umm, Hurricane?:-)

As well as I understand the importance of the chipset, I'm kind of
amazed IBM managed to make so much out of it. I'm sure that somebody
at Intel is paying attention.

RM

Rob Stow
04-01-2005, 04:23 PM
Robert Myers wrote:
> As well as I understand the importance of the chipset, I'm kind of
> amazed IBM managed to make so much out of it. I'm sure that somebody
> at Intel is paying attention.

I was wondering about that myself. Until Hurricane, Xeon
couldn't come close to competing with Opteron, and now it looks
like it can actually outperform Opteron. Xeon without Hurricane
is about to become a lot harder to sell, so the prospect of IBM
not sharing Hurricane has probably caused a lot of anxiety at
Intel.

I am looking forward to seeing reviews that show what Hurricane
does for Xeon in other benchmarks.

Yousuf Khan
04-01-2005, 08:52 PM
Rick Jones wrote:
> I seem to recall Sun going on record (pre-Opteron days) about no
> longer being terribly impressed with the TPC-C benchmark and stating
> that as the reason they were no longer publishing TPC-C results.
> There may be some stuff on that topic on their website. I suspect we
> would not see TPC-C results for a V40z unless Sun had a change of
> heart.

Well that was because those Ultrasparcs couldn't compete against anybody
either on absolute perf or price/perf. Nowadays, they have a compelling
price/perf story, so you'll likely see them publish again.

Yousuf Khan

Robert Myers
04-02-2005, 03:31 AM
On Sat, 02 Apr 2005 00:23:34 GMT, Rob Stow <rob.stow.nospam@shaw.ca>
wrote:
> I was wondering about that myself. Until Hurricane, Xeon
> couldn't come close to competing with Opteron, and now it looks
> like it can actually outperform Opteron. Xeon without Hurricane
> is about to become a lot harder to sell, so the prospect of IBM
> not sharing Hurricane has probably caused a lot of anxiety at
> Intel.

Intel has become a mystery. One assumes that Intel understands both
the market and the engineering. The surprise for Intel (probably) is
the need to make Xeon competitive in the 4P 64-bit space, a slot it
had reserved for Itanium. Now Intel has to scramble--or maybe they've
already got an answer well along.

There is an interview with an IBM project manager at

http://www.techworld.com/opsys/features/index.cfm?FeatureID=1204

<quote>

At design time, there was a maniacal focus on latency reduction. When
you can cut the time it takes it gets from one point to the next you
can increase performance, so chipset latency has been cut by two and
half times -- down from 265 nanoseconds to 108 nanoseconds.

The way we do that is through snoop bus filtering. It looks across to
the other bus -- because the system uses two buses, two per CPU -- and
the snoop filter does intelligent caching. It can see what's in the
other cache without having to send traffic across the FSB to find out.
Other chipsets cannot do that and need the L3. And if you don't need
L3 cache you shouldn't have to pay the premium to buy it.

</quote>

There's lots of mystification about mainframe features, but that seems
to be the crux of the matter as far as performance is concerned.
Surely Intel can figure it out, one would think.
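
The "two and half times" in the quote can be checked against the quoted nanosecond figures. A quick sketch, with variable names chosen for illustration only and no information beyond the two numbers in the interview excerpt:

```python
# Chipset latency figures as quoted in the interview excerpt.
old_latency_ns = 265.0
new_latency_ns = 108.0

# How many times shorter the new latency is.
speedup = old_latency_ns / new_latency_ns

# Fraction of the original latency that was removed.
reduction = 1.0 - new_latency_ns / old_latency_ns

print(f"latency ratio: {speedup:.2f}x, reduction: {reduction:.0%}")
```

Note that this says nothing about what was measured or on which chipset the 265 ns baseline was taken. It only confirms that the two numbers are consistent with the stated factor.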

But the mystery of Intel remains. Whatever are they thinking? I
think they believe they are still going to pull Itanium out of the
fire, that's what I think they're thinking, and I'd still hesitate to
bet against their doing it--technically, at least.

RM

George Macdonald
04-03-2005, 07:12 AM
On Fri, 01 Apr 2005 16:54:37 -0500, Robert Myers <rmyers1400@comcast.net>
wrote:
> As well as I understand the importance of the chipset, I'm kind of
> amazed IBM managed to make so much out of it. I'm sure that somebody
> at Intel is paying attention.

You mean they must be thinking: "hmmm, dual FSB.... why didn't we think of
that"?:-) It seems like an insanely simple idea but with an entry cost in
development which maybe only IBM could contemplate. It'll be interesting
to see how long they keep it in-house but it looks, at first glance, like
it blows everything else out of the water... in the Intel server sphere
anyway. I figure Mikey's probably taken the week-end off on this one.:-)

--
Rgds, George Macdonald

Robert Myers
04-03-2005, 01:11 PM
On Sun, 03 Apr 2005 11:12:10 -0400, George Macdonald
<fammacd=!SPAM^nothanks@tellurian.com> wrote:
> You mean they must be thinking: "hmmm, dual FSB.... why didn't we think of
> that"?:-) It seems like an insanely simple idea but with an entry cost in
> development which maybe only IBM could contemplate. It'll be interesting
> to see how long they keep it in-house but it looks, at first glance, like
> it blows everything else out of the water... in the Intel server sphere
> anyway. I figure Mikey's probably taken the week-end off on this one.:-)

I'm really confused about where the dual FSB's are, and I find the
links out there practically unreadable because of the factor of two
problem. Dual (Intel) something or others are going to have dual
FSB's, and one link says X3 takes advantage of the dual FSB of Potomac
and Cranford. Are the dual FSB's for dual core chips? One link says
the dual FSB makes the Intel FSB competitive with HyperTransport.
Dual FSB for a single core chip?

Intel apparently has its own chipset on the way to accommodate its own
dual FSB's, so IBM is apparently just a little bit ahead of the curve
(dual FSB is, after all, bandwidth, and Keith here, who's being
suspiciously silent during this exchange, only wants to talk about
latency as did the IBM project manager previously mentioned elsewhere
in this thread). That is to say, the magic of Hurricane is apparently
latency and not bandwidth, although if you've got a frontside bus
jammed with multiple memory requests, I suppose the two might be
indistinguishable.

I think it's pretty apparent I'm confused, between dual cores, dual
FSB's, factors of two, and all the codenames. What's more, it seems
pretty clear that NetBurst isn't dead. Anybody commented on _that_?

RM

Tony Hill
04-03-2005, 06:25 PM
On Sat, 02 Apr 2005 00:23:34 GMT, Rob Stow <rob.stow.nospam@shaw.ca>
wrote:
> I was wondering about that myself. Until Hurricane, Xeon
> couldn't come close to competing with Opteron, and now it looks
> like it can actually outperform Opteron. Xeon without Hurricane
> is about to become a lot harder to sell, so the prospect of IBM
> not sharing Hurricane has probably caused a lot of anxiety at
> Intel.

Keep in mind that there's a LOT more at play here than just a new
chipset. They are also using new processors with a 33% faster bus
speed than previous XeonMP chips which probably plays a LARGE part in
things. Toss in 64-bit support and I think that a LOT of people are
putting WAY too much emphasis on the chipset itself.

> I am looking forward to seeing reviews that show what Hurricane
> does for Xeon in other benchmarks.

Let's also wait to see how HP's new DL580G3 and Dell's PowerEdge 6850
servers compare. Both of these servers make use of the new XeonMP
processors but pair them with Intel's E8500 chipset.

Sure, the Hurricane chipset does add a few extra features, but I'm not
convinced that it's in any way the one and only reason why the
performance of these new systems has improved so much.

-------------
Tony Hill
hilla <underscore> 20 <at> yahoo <dot> ca

George Macdonald
04-03-2005, 10:55 PM
On Sun, 03 Apr 2005 17:11:48 -0400, Robert Myers <rmyers1400@comcast.net>
wrote:
> I'm really confused about where the dual FSB's are, and I find the
> links out there practically unreadable because of the factor of two
> problem. Dual (Intel) something or others are going to have dual
> FSB's, and one link says X3 takes advantage of the dual FSB of Potomac
> and Cranford. Are the dual FSB's for dual core chips? One link says
> the dual FSB makes the Intel FSB competitive with HyperTransport.
> Dual FSB for a single core chip?

My *impression* is that instead of having say, two CPUs on a single FSB, as
would be normal with an Intel system, each sits on its own FSB and the
chipset handles the interface, arbitration, snooping etc. between the two.
Beyond that we'll just have to wait for the details I suppose.
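
A toy model of that arrangement, purely as an illustration. The class and method names are hypothetical, and nothing here reflects how Hurricane is actually implemented. It assumes two buses and a chipset-side table recording which bus may hold each cache line, so that a miss on one bus is snooped on the other bus only when the table says a copy might be there:

```python
class SnoopFilter:
    """Toy chipset-side snoop filter for a multi-bus system (hypothetical)."""

    def __init__(self, num_buses=2):
        self.num_buses = num_buses
        # Cache line address -> set of bus ids whose caches may hold the line.
        self.holders = {}
        self.snoops_sent = 0
        self.snoops_filtered = 0

    def read(self, bus, line):
        """Handle a cache miss for `line` issued by a CPU on `bus`.

        Returns the list of other buses that actually had to be snooped.
        """
        others = [b for b in range(self.num_buses) if b != bus]
        may_hold = self.holders.get(line, set())
        # Only buses recorded as possible holders receive snoop traffic.
        snooped = [b for b in others if b in may_hold]
        self.snoops_sent += len(snooped)
        self.snoops_filtered += len(others) - len(snooped)
        # The requesting bus now holds a copy of the line.
        self.holders.setdefault(line, set()).add(bus)
        return snooped


sf = SnoopFilter()
first = sf.read(bus=0, line=0x1000)   # no recorded holder: nothing to snoop
second = sf.read(bus=1, line=0x1000)  # bus 0 may hold it: snoop bus 0
third = sf.read(bus=1, line=0x2000)   # new line: nothing to snoop
print(first, second, third, sf.snoops_sent, sf.snoops_filtered)
```

The model ignores writes, evictions and invalidation, all of which a real filter has to track. It only shows why a lookup in the chipset can replace a broadcast across the other bus.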

<snip>
> What's more, it seems
> pretty clear that NetBurst isn't dead. Anybody commented on _that_?

You mean Netburst really does exist?:-)

--
Rgds, George Macdonald

Robert Myers
04-04-2005, 12:42 AM
On Sun, 03 Apr 2005 22:25:18 -0400, Tony Hill
<hilla_nospam_20@yahoo.ca> wrote:
> Keep in mind that there's a LOT more at play here than just a new
> chipset. They are also using new processors with a 33% faster bus
> speed than previous XeonMP chips which probably plays a LARGE part in
> things. Toss in 64-bit support and I think that a LOT of people are
> putting WAY too much emphasis on the chipset itself.

The claimed reductions in latency, "from 265 nanoseconds to 108
nanoseconds," are hard to argue away. If that's an accurate measure
of relative latency and not a hyped-up marketing claim, it will have a
big impact.

<snip>
> Sure, the Hurricane chipset does add a few extra features, but I'm not
> convinced that it's in any way the one and only reason why the
> performance of these new systems has improved so much.

"One and only?" Not likely. Most significant? From looking at the
effects of latency on tpc-c in other situations, I'd bet it is. To
see what Intel has as a counter we will, indeed, have to wait and see.

RM

Robert Myers
04-04-2005, 12:51 AM
On Mon, 04 Apr 2005 02:55:53 -0400, George Macdonald
<fammacd=!SPAM^nothanks@tellurian.com> wrote:
> My *impression* is that instead of having say, two CPUs on a single FSB, as
> would be normal with an Intel system, each sits on its own FSB and the
> chipset handles the interface, arbitration, snooping etc. between the two.
> Beyond that we'll just have to wait for the details I suppose.

That makes sense.

<snip>
> You mean Netburst really does exist?:-)

Well, apparently, and (although I've given up on code names--there are
just too many right now), there is a possibility it will survive yet
another cycle of reincarnation to 65nm.

RM

Felger Carbon
04-04-2005, 06:33 AM
"Robert Myers" <rmyers1400@comcast.net> wrote in message
news:bpv151pncm4anval2etvta8pc7ks59e5k8@4ax.com...
> Well, apparently, and (although I've given up on code names--there
> are just too many right now), there is a possibility it will survive
> yet another cycle of reincarnation to 65nm.

Robert, Geo. was making a funny. Netburst is the P4
microarchitecture.

Robert Myers
04-04-2005, 10:36 AM
On Mon, 04 Apr 2005 14:33:47 GMT, "Felger Carbon" <fmsfnf@jfoops.net>
wrote:
> Robert, Geo. was making a funny. Netburst is the P4
> microarchitecture.

Well, yes, I do understand that. I've lost track of the code names,
but when Intel simultaneously announced the cancellation of a Netburst
project and the start of more than one multiple core Pentium-M
projects, I assumed, and I don't think I was alone, that NetBurst, or
the Pentium 4 architecture, however you wish to refer to it, was at
the end of its life. Apparently not.

RM

daytripper
04-04-2005, 02:20 PM
On Mon, 04 Apr 2005 14:36:15 -0400, Robert Myers <rmyers1400@comcast.net>
wrote:
> Well, yes, I do understand that. I've lost track of the code names,
> but when Intel simultaneously announced the cancellation of a Netburst
> project and the start of more than one multiple core Pentium-M
> projects, I assumed, and I don't think I was alone, that NetBurst, or
> the Pentium 4 architecture, however you wish to refer to it, was at
> the end of its life. Apparently not.

Hold on to your original thought. It's not wrong.

/daytripper (What's old is new again...)

Robert Myers
04-04-2005, 03:25 PM
On Mon, 04 Apr 2005 18:20:30 -0400, daytripper
<day_trippr@REMOVEyahoo.com> wrote:
> Hold on to your original thought. It's not wrong.
>
> /daytripper (What's old is new again...)

But, if I'm not mistaken, Presler is a NetBurst core and it will
appear at 65 nm. That's one more scale shrink than I thought NetBurst
had to live.

RM

daytripper
04-04-2005, 04:22 PM
On Mon, 04 Apr 2005 19:25:07 -0400, Robert Myers <rmyers1400@comcast.net>
wrote:
> But, if I'm not mistaken, Presler is a NetBurst core and it will
> appear at 65 nm. That's one more scale shrink than I thought NetBurst
> had to live.

The roadmaps are loaded with sacrificial elements.
P4 is a Dead Chip Walking...

/daytripper

Tony Hill
04-04-2005, 06:26 PM
On Mon, 04 Apr 2005 02:55:53 -0400, George Macdonald
<fammacd=!SPAM^nothanks@tellurian.com> wrote:
> My *impression* is that instead of having say, two CPUs on a single FSB, as
> would be normal with an Intel system, each sits on its own FSB and the
> chipset handles the interface, arbitration, snooping etc. between the two.
> Beyond that we'll just have to wait for the details I suppose.

More to the point here, I believe that in the specific case of the
Hurricane chipset it's actually 2 CPUs per bus instead of 4 CPUs as
would normally be the case. This should work fine for the current
XeonMP chips as the Xeon seems to scale ok at this rate. Going by
most benches the 2P Xeon didn't seem to suffer much due to lack of
bandwidth, but 4P Xeon systems got clobbered for this reason.

Of course, the problem here is that dual-core chips should just bring
this bottleneck right back again, since then the "dual FSB" Hurricane
will be back to one bus per 4 cores.
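
The bus-sharing argument reduces to a division. A small sketch, assuming sockets are spread evenly across the buses; the function name is made up for illustration:

```python
def cores_per_bus(sockets, cores_per_socket, buses):
    """Cores sharing each front-side bus, assuming an even spread of sockets."""
    return sockets * cores_per_socket // buses


# Traditional 4P Xeon: four single-core CPUs on one shared bus.
single_bus_4p = cores_per_bus(sockets=4, cores_per_socket=1, buses=1)

# Dual-bus 4P, as described above: two single-core CPUs per bus.
dual_bus_4p = cores_per_bus(sockets=4, cores_per_socket=1, buses=2)

# Same dual-bus board populated with dual-core chips.
dual_bus_4p_dual_core = cores_per_bus(sockets=4, cores_per_socket=2, buses=2)

print(single_bus_4p, dual_bus_4p, dual_bus_4p_dual_core)
```

On these assumptions the dual-core case lands back at four cores per bus, which is the same loading as the traditional single-bus 4P layout.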

-------------
Tony Hill
hilla <underscore> 20 <at> yahoo <dot> ca

Tony Hill
04-04-2005, 06:26 PM
On Mon, 04 Apr 2005 04:42:17 -0400, Robert Myers
<rmyers1400@comcast.net> wrote:
> The claimed reductions in latency, "from 265 nanoseconds to 108
> nanoseconds," are hard to argue away. If that's an accurate measure
> of relative latency and not a hyped-up marketing claim, it will have a
> big impact.

If it's remotely accurate, then yes. Unfortunately they really didn't
provide any context for this. Is this in comparison to the previous
IBM chipset? And is this just straight latency to memory for a single
chip on a single access or some sort of average? If it's just
straight latency then the original 265ns number was pretty weak to
begin with. Intel's latest desktop chipsets are down under 100ns and
their servers should be somewhere around 130-150ns (though I haven't
seen many tests for the latter).

<snip>
> "One and only?" Not likely. Most significant? From looking at the
> effects of latency on tpc-c in other situations, I'd bet it is. To
> see what Intel has as a counter we will, indeed, have to wait and see.

64-bit support should offer about a 10% improvement all on its own.
Combine that with a 20% increase in clock speed and a 66% faster
system bus... Also, if I understand the whole "dual bus" idea
properly (i.e. 2 buses with 2 CPUs connected to each one in a 4P system
vs. 4 processors on a single bus in a traditional Xeon system) I think
this could make up for a lot of the difference as well. This is
exactly how Intel's new E8500 chipset's "dual bus" design operates as
well.
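
As a rough illustration of how such estimates stack up, here is a sketch under the crude assumption that the gains are independent and multiply, which real workloads rarely honour. The function name is illustrative, and the bus figure is deliberately left out because a faster bus does not translate directly into throughput:

```python
from math import prod


def combined_speedup(gains):
    """Multiply fractional gains together, e.g. 0.10 for a 10% improvement."""
    return prod(1.0 + g for g in gains)


# The 64-bit and clock speed estimates from the post above.
estimate = combined_speedup([0.10, 0.20])
print(f"combined estimate: {estimate:.2f}x")
```

Treat the result as an optimistic bound on those two factors alone, not as a prediction for any particular benchmark.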

A nice chipset to be sure, but I think people are singing the praises
a bit too much and too soon. My guess is that it's going to end
up being no more than 5% faster than Intel's new E8500 chipset. Sure,
it's a hell of a lot faster than the previous generation of
chipset/processor combination, but it's how this compares to current
chipset/processor combos that matters. IBM just happened to be first
out the door with benchmarks this time around, but I expect others to
follow suit soon enough.

-------------
Tony Hill
hilla <underscore> 20 <at> yahoo <dot> ca

Robert Myers
04-05-2005, 03:24 AM
On Mon, 04 Apr 2005 22:26:05 -0400, Tony Hill
<hilla_nospam_20@yahoo.ca> wrote:
On Mon, 04 Apr 2005 04:42:17 -0400, Robert Myers <rmyers1400@comcast.net> wrote:
On Sun, 03 Apr 2005 22:25:18 -0400, Tony Hill <hilla_nospam_20@yahoo.ca> wrote:
Keep in mind that there's a LOT more at play here than just a new chipset. They are also using new processors with a 33% faster bus speed than previous XeonMP chips which probably plays a LARGE part in things. Toss in 64-bit support and I think that a LOT of people are putting WAY too much emphasis on the chipset itself.
The claimed reductions in latency, "from 265 nanoseconds to 108 nanoseconds," are hard to argue away. If that's an accurate measure of relative latency and not a hyped-up marketing claim, it will have a big impact.
If it's remotely accurate, then yes. Unfortunately they really didn't provide any context for this. Is this in comparison to the previous IBM chipset? And is this just straight latency to memory for a single chip on a single access or some sort of average? If it's just straight latency than the original 265ns number was pretty weak to begin with, Intel's latest desktop chipsets are down under 100ns and their servers should be somewhere around 130-150ns (though I haven't seen many tests for the latter).
Since the quote didn't provide nearly enough information to interpret
the latency claims as absolute numbers, I was careful to characterize
it as a measure of relative latency. I'm assuming that IBM would have
the integrity to do an apples-to-apples comparison with their own
hardware, no matter what the absolute numbers may mean. Were the
project manager in marketing, he might have been shrewd enough to say
that they shaved over a hundred nanoseconds off the chipset latency.
I'd be reluctant to say that IBM server chipsets had high latency for
server chipsets based on that sound bite. You can do as you please,
but see another comparison to a previous Summit generation below.
I am looking forward to seeing reviews that show what Hurricane does for Xeon in other benchmarks.
Let's also wait to see how HP's new DL580G3 and Dell's PowerEdge 6850 servers compare. Both of these servers make use of the new XeonMP processors but pair it with Intel's i8500 chipset. Sure, the Hurricane chipset does add a few extra features, but I'm not convinced that it's in any way the one and only reason why the performance of these new systems has improved so much.
"One and only?" Not likely. Most significant? From looking at the effects of latency on tpc-c in other situations, I'd bet it is. To see what Intel has as a counter we will, indeed, have to wait and see.
64-bit support should offer about a 10% improvement all on it's own. Combine that with a 20% increase in clock speed and a 66% faster system bus... Also, if I understand the whole "dual bus" idea properly (ie 2 buses with 2 CPUs connected to each one in a 4P system vs. 4 processors on a single bus in a traditional Xeon system) I think this could make up for a lot of the difference as well. This is exactly how Intel's new E8500 chipset's "dual bus" design operates as well.
I'd be surprised to learn that server applications are driving
frontside bus bandwidth requirements. One of the reasons you can get
away with hanging so much hardware off a frontside bus in server
applications is that server CPUs spend so much of their time stalled
for memory--a latency, not a bandwidth, problem. Predictable,
computationally-intensive calculations are typically the most
demanding of bandwidth.
A nice chipset to be sure, but I think people are singing the praises a bit too much and too soon. My guess is that it's only going to end up being no more than 5% faster than Intel's new E8500 chipset. Sure, it's a hell of a lot faster than the previous generation of chipset/processor combination, but it's how this compares to current chipset/processor combos that matters. IBM just happened to be first out the door with benchmarks this time around, but I expect others to follow suit soon enough.
IBM may have taken a lesson from HP:

http://www.lostcircuits.com/tradeshow/idf_2002/4.shtml

<quote>

The server market with its higher longevity of equipment was hurt even
worse than the desktop market, in addition, the platforms available
for the IPF were designed for future scalability and expandability and
somewhat missed the current economic requirements. Examples are the
i870 and the IBM EXA (Summit) chipsets geared towards the very
high-end and comparable with 80,000 lbs trucks. To drop a bomb into
this scenario, Hewlett Packard showcased their zx1, comparable with a
high performance street bike to outrun the competition before they
even know what hit them.

The concept is fairly simple. Take the IPF 64 bit architecture, pare
it free of all excessive fat and provide a platform suitable for both
IA64 as well as for the IPF-compatible PA-RISC processor line.
Features trimmed off comprise the 32 MB L4 cache (IBM EXA), Memory
Mirroring to ensure hot-swapping of DIMMs and x-way scalability. The
result is an up to 4-way scalable platform with enhanced ECC or rather
memory protection to allow Chip Kill. Heart of the chipset is the zx1
Memory & I/O controller featuring eight I/O links to PCI and PCI-X as
well as AGP-4X (to be upgraded to AGP-8X). On the other side, the zx1
controller offers links to no less than 12 memory expander chips
capable of handling up to 64 DIMMs for 128 GB of system memory. Memory
bandwidth scales from 8.5 GB/s in direct-attached designs (without the
optional expanders) to 12.8 GB/s using the expander chips that further
act like registers to decrease the signal load on the memory bus.

This is, however, not the key advantage of the zx1. Because of the
high complexity and scalability, the i870 and EAX chipset are
relatively slow. That is, in addition to the 32 ns latency intrinsic
to McKinley for each memory access, the arbitration within the complex
maze of superscalable interconnections cause another roughly 270 ns
latency until the requested data get back to the processor, so we are
talking about a total of 300 ns access time for a memory request. The
zx1 on the other hand manages to do the quarter mile in 11.2 seconds,
er, make that 112 ns for the memory access latency which is almost 3
times as fast (in direct-attached configurations). Adding the expander
chips costs another 25 ns but compensates with higher bandwidth and
the zx1 is still about twice as fast as the competition.

</quote>

That may also help to put the stated latencies into some perspective
(previous generation Summit compared to zx1 in almost the same way).
Notice the disappearance of the L4 cache (and X3 does away with L3, as
well). A three-year program from IBM? The timing is just about
right.

Intel can design a chip that will come close in performance? I'm sure
they can. Will they? Intel's track record on chipsets has been
spotty (to be charitable, at that).

The only real problem left in computation is getting the data where
you want it when you need it. The parts that do the computing are
almost afterthoughts compared to the machinery dedicated to getting
instructions and data to arrive on time and coping with what happens
when they don't. It's about time the memory subsystem got more
attention, and I hope this isn't the end of it.

.... Of course, you could rid yourself of most of these problems
entirely by changing the whole computing paradigm, but that's for
another thread.

RM

Robert Myers
04-05-2005, 03:35 AM
On Mon, 04 Apr 2005 20:22:06 -0400, daytripper
<day_trippr@REMOVEyahoo.com> wrote:
On Mon, 04 Apr 2005 19:25:07 -0400, Robert Myers <rmyers1400@comcast.net> wrote:
On Mon, 04 Apr 2005 18:20:30 -0400, daytripper <day_trippr@REMOVEyahoo.com> wrote:
On Mon, 04 Apr 2005 14:36:15 -0400, Robert Myers <rmyers1400@comcast.net> wrote:
Well, yes, I do understand that. I've lost track of the code names, but when Intel simultaneously announced the cancellation of a Netburst project and the start of more than one multiple core Pentium-M projects, I assumed, and I don't think I was alone, that NetBurst, or the Pentium 4 architecture, however you wish to refer to it, was at the end of its life. Apparently not.
Hold on to your original thought. It's not wrong. /daytripper (What's old is new again...)
But, if I'm not mistaken, Presler is a NetBurst core and it will appear at 65 nm. That's one more scale shrink than I thought NetBurst had to live.
The roadmaps are loaded with sacrificial elements. P4 is a Dead Chip Walking...
Sacrificial, or butt-covering and misleading? Intel seems to have
made more out of Prescott than one might have imagined, given how
disappointing the first results were and how quickly they backed away.
One might have imagined a world of Xeon firesales in a market being
swept by Opteron. Xeon has taken a licking, but it's still ticking.

If Intel really can scale Prescott to 65nm, that would be news, I
think, since it seems like they barely got it to work at 90nm. Or
maybe they've made significant progress or maybe they think they'll
make significant progress by the time the chips are to be released.

If they do throw NetBurst overboard, they'll have a lot of explaining
to do, I think. "You remember that architecture you liked so much,
you know, the Pentium IIII, well we've listened to you and we've
decided that the right thing to do is to give the market what it
really wanted all along, only better." I'll bet Pentium M isn't a
superstar on SpecFP. Not for nothing are the guys in marketing at
Intel so important. The guys who tweak icc for the benchmarks
probably earn their money, too.

I wonder if Intel knows what it's really going to release.

RM

Robert Redelmeier
04-05-2005, 07:06 AM
In comp.sys.ibm.pc.hardware.chips Robert Myers <rmyers1400@comcast.net> wrote: There is an interview with an IBM project manager at http://www.techworld.com/opsys/features/index.cfm?FeatureID=1204 <quote> At design time, there was a maniacal focus on latency reduction. When you can cut the time it takes it gets from one point to the next you can increase performance, so chipset latency has been cut by two and half times -- down from 265 nanoseconds to 108 nanoseconds.

This is extremely important. Latency improvements have lagged
horribly (in early PCs, latency cost less than one byte's worth of
bandwidth per fetch; current machines wait 300+ bytes -- 256-byte cachelines, anyone? :)

Latency has been eating a growing share of compute time and matters
more and more for performance. That goes especially for TPMC, which
AFAIK is a relational database benchmark with linked lists that boils
down to a massive pointer-chasing exercise governed by latency.

I'd like to see how some of the AMD K8s with on-CPU memory
controllers do on TPMC.

-- Robert

George Macdonald
04-05-2005, 06:37 PM
On Tue, 05 Apr 2005 15:06:59 GMT, Robert Redelmeier
<redelm@ev1.net.invalid> wrote:
In comp.sys.ibm.pc.hardware.chips Robert Myers <rmyers1400@comcast.net> wrote: There is an interview with an IBM project manager at http://www.techworld.com/opsys/features/index.cfm?FeatureID=1204 <quote> At design time, there was a maniacal focus on latency reduction. When you can cut the time it takes it gets from one point to the next you can increase performance, so chipset latency has been cut by two and half times -- down from 265 nanoseconds to 108 nanoseconds.
This is extremely important. Latency improvement have lagged horribly (early PCs latency was less than 1 byte bandwidth fetch, current machines are waiting 300+ bytes -- 256byte cachelines, anyone? :)

Wasn't a part of Netburst the 128byte L2 cache line?:-) That is two
sectors of 64bytes of course.
Latency elements have been consuming more calc time and are more important for performance improvements. Especially TPMC wich AFAIK is a relational database benchmark with linked-lists that boils down to a massive pointer chasing exercise governed by latency. I'd like to see how some of the AMD K8s with on-CPU memory controllers do on TPMC.

The fastest one listed here:
http://www.tpc.org/tpcc/results/tpcc_results.asp?print=false&orderby=tpm&sortby=desc
AFAICT is the HP ProLiant DL585/2.6GHz - does OK but nothing spectacular
and as already pointed out the Hurricane based IBM eServer xSeries 366
whacks it, though DB2 (vs. SQL Server) contributes a good amount (~half) of
the difference.

OTOH there are only HP and a couple of obsolete Racksaver Opteron systems
listed... and the above IBM system is close to $1M, so ~2.7 times the cost
of the HP Opteron system. As already noted, Sun is notably absent and IBM
apparently does not market Opteron into this market.

--
Rgds, George Macdonald

Robert Redelmeier
04-06-2005, 08:27 AM
In comp.sys.ibm.pc.hardware.chips George Macdonald <fammacd=!SPAM^nothanks@tellurian.com> wrote: The fastest one listed here: http://www.tpc.org/tpcc/results/tpcc_results.asp?print=false&orderby=tpm&sortby=desc AFAICT is the HP ProLiant DL585/2.6GHz - does OK but nothing

Yes, that is a 4-way Opteron. I fear that such a setup would
require a Northbridge and eliminate the single-thread latency
advantage of an on-CPU memory controller. Does anyone know?

I very much like SMP, but I think I like on-CPU memory
controllers even more. Maybe like Tony I should wait for
dual cores before I replace my aging BP6 (dual Celerons)

-- Robert

Rick Jones
04-06-2005, 09:01 AM
In comp.sys.intel Yousuf Khan <bbbl67@ezrs.com> wrote: Rick Jones wrote: I seem to recall Sun going on record (pre-Opteron days) about no longer being terribly impressed with the TPC-C benchmark and stating that as the reason they were no longer publishing TPC-C results. There may be some stuff on that topic on their website. I suspect we would not see TPC-C results for a V40z unless Sun had a change of heart.
Well that was because those Ultrasparcs couldn't compete against anybody either on absolute perf or price/perf. Nowadays, they have a compelling price/perf story, so you'll likely see them publish again.

If we take the premise of the non-competitiveness of the UltraSPARCs as
truth (since I post from .hp.com I have to be a bit circumspect :)
wouldn't Sun still have "issues" with the comparison of Opteron to
UltraSPARC? If the benchmark were suddenly "reformed" and so OK to
publish using Sun, Opteron-based systems, it would seem to continue to
beg the question of why it is not suited for UltraSPARC systems.

rick jones
--
The glass is neither half-empty nor half-full. The glass has a leak.
The real question is "Can it be patched?"
these opinions are mine, all mine; HP might not want them anyway... :)
feel free to post, OR email to raj in cup.hp.com but NOT BOTH...

George Macdonald
04-06-2005, 02:04 PM
On Wed, 06 Apr 2005 16:27:14 GMT, Robert Redelmeier
<redelm@ev1.net.invalid> wrote:
In comp.sys.ibm.pc.hardware.chips George Macdonald <fammacd=!SPAM^nothanks@tellurian.com> wrote: The fastest one listed here: http://www.tpc.org/tpcc/results/tpcc_results.asp?print=false&orderby=tpm&sortby=desc AFAICT is the HP ProLiant DL585/2.6GHz - does OK but nothing
Yes, that is a 4-way Opteron. I fear that such a setup would require a Northbridge and eliminate the single-thread latency advantage of an on-CPU memory controller. Does anyone know?

Eliminate?... Compromise?:-) It's not what you'd call a "turkey", with
performance just 10K points lower than the Hurricane system... and at 1/2.7
the cost, it's certainly a bargain. You mean "require a Northbridge" to
get better performance as opposed to the Hypertransport links with
worst-case 2-hop memory accesses? I don't know how practical it would be
but IMO AMD should look to bumping performance on the local HT links.
I very much like SMP, but I think I like on-CPU memory controllers even more. Maybe like Tony I should wait for dual cores before I replace my aging BP6 (dual Celerons)

Sounds like a plan. It's not clear to me from the roadmaps how long a
socket 939/940 dual core will exist - there seems to be some overlap with
the socket M2 chips and DDR-II memory controllers. Could be there's going
to be a window of err, opportunity.

--
Rgds, George Macdonald

Robert Redelmeier
04-06-2005, 04:01 PM
In comp.sys.ibm.pc.hardware.chips George Macdonald wrote:
On Wed, 06 Apr 2005 16:27:14 GMT, Robert Redelmeier wrote:
Yes, that is a 4-way Opteron. I fear that such a setup would require a Northbridge and eliminate the single-thread latency advantage of an on-CPU memory controller. Does anyone know?
Eliminate?... Compomrise?:-) It's not what you'd call a "turkey", with performance just 10K points lower than the Hurricane system... and at 1/2.7 the cost, it's certainly a bargain. You mean "require a Northbridge" to get better performance

Of course SMP requires a Northbridge for better overall
SMP performance. Mostly by keeping banks open and running
concurrent precharges. But a Northbridge _cannot_ improve a
single random fetch. It's just silicon in the way, and will
want to buffer or queue.
as opposed to the Hypertransport links with worst-case 2-hop memory accesses? I don't know how practical it would be but IMO AMD should look to bumping performance on the local HT links.

2 hop? Sounds ugly. Under what circumstances?

Frankly, I'm a little surprised no-one runs any latency
benchmarks on RAM. A little pointer-chasing exercise isn't
hard to write, and would be very revealing.

Hey, I resemble that remarque :) Maybe I should go write one!

-- Robert

George Macdonald
04-07-2005, 01:47 AM
On Thu, 07 Apr 2005 00:01:31 GMT, Robert Redelmeier
<redelm@ev1.net.invalid> wrote:
In comp.sys.ibm.pc.hardware.chips George Macdonald wrote:
On Wed, 06 Apr 2005 16:27:14 GMT, Robert Redelmeier wrote:
Yes, that is a 4-way Opteron. I fear that such a setup would require a Northbridge and eliminate the single-thread latency advantage of an on-CPU memory controller. Does anyone know?
Eliminate?... Compomrise?:-) It's not what you'd call a "turkey", with performance just 10K points lower than the Hurricane system... and at 1/2.7 the cost, it's certainly a bargain. You mean "require a Northbridge" to get better performance
Of course SMP requires a Northbridge for better overall SMP performance. Mostly by keeping banks open and running concurrent precharges. But a Northbridge _cannot_ improve a single random fetch. It's just silicon in the way, and will want to buffer or queue.
as opposed to the Hypertransport links with worst-case 2-hop memory accesses? I don't know how practical it would be but IMO AMD should look to bumping performance on the local HT links.
2 hop? Sounds ugly. Under what circumstances?

When you have 4 CPUs interconnected with 3 HT-links each and at least one
of those has to be used for I/O, some of the accesses have to involve two
hops.
Frankly, I'm a little surprised no-one runs any latency benchmarks on RAM. A little pointer-chasing exercise isn't hard to write, and would be very revealing.

Dave Wang has discussed it in some detail - one of his pet subjects I
believe. IIRC he was measuring round-trip times the "hard" way with
probes.

--
Rgds, George Macdonald

Tony Hill
04-07-2005, 05:56 AM
On Tue, 05 Apr 2005 07:24:14 -0400, Robert Myers
<rmyers1400@comcast.net> wrote:
On Mon, 04 Apr 2005 22:26:05 -0400, Tony Hill <hilla_nospam_20@yahoo.ca> wrote:
If it's remotely accurate, then yes. Unfortunately they really didn't provide any context for this. Is this in comparison to the previous IBM chipset? And is this just straight latency to memory for a single chip on a single access or some sort of average? If it's just straight latency than the original 265ns number was pretty weak to begin with, Intel's latest desktop chipsets are down under 100ns and their servers should be somewhere around 130-150ns (though I haven't seen many tests for the latter).
Since the quote didn't provide nearly enough information to interpret the latency claims as absolute numbers, I was careful to characterize it as a measure of relative latency. I'm assuming that IBM would have the integrity to do an apples-to-apples comparison with their own hardware, no matter what the absolute numbers may mean. Were the project manager in marketing, he might have been shrewd enough to say that they shaved over a hundred nanoseconds off the chipset latency. I'd be reluctant to say that IBM server chipsets had high latency for server chipsets based on that soundbyte. You can do as you please, but see another comparison to a previous Summit generation below.

I would say that it's safe to assume IBM has sufficient integrity to
do an apples-to-apples comparison. However I just wanted to point out
that their reduction in latency has happened at about the same time
that everyone else in the industry has also been working hard to
reduce latency in chipsets. Given the numbers, it looks like IBM has
been more successful than anyone else; cutting 150ns off memory
latency is very impressive. Other companies (in particular Intel and
nVidia on the desktop side and presumably Intel on the server side as
well) only managed about a 100ns reduction in the same time frame.
64-bit support should offer about a 10% improvement all on it's own. Combine that with a 20% increase in clock speed and a 66% faster system bus... Also, if I understand the whole "dual bus" idea properly (ie 2 buses with 2 CPUs connected to each one in a 4P system vs. 4 processors on a single bus in a traditional Xeon system) I think this could make up for a lot of the difference as well. This is exactly how Intel's new E8500 chipset's "dual bus" design operates as well.
I'd be surprised to learn that server applications are driving frontside bus bandwidth requirements. One of the reasons you can get away with hanging so much hardware off a frontside bus in server applications is that server CPU's spend so much of their time stalled for memory--a latency, not a bandwidth, problem.

There are limits to everything, and remember that the P4/Xeon core
seems to be rather bandwidth-hungry. Keep in mind that the old Xeons
had 4 processors hanging off a single 400MT/s, 64-bit wide bus. That
was only 3.2GB/s of memory bandwidth for 4 cores running at 3.0GHz.
You don't need very high bandwidth requirements before that becomes a
bottleneck.
Predictable, computationally-intensive calculations are typically the most demanding of bandwidth.

Indeed, and if SPEC CFP2000_rate scores are anything to go by, the old
4P XeonMP systems absolutely sucked in such situations.
That may also help to put the stated latencies into some perspective (previous generation Summit compared to zx1 in almost the same way). Notice the disappearance of the L4 cache (and X3 does away with L3, as well). A three-year program from IBM? The timing is just about right.

Yup, sounds reasonable.
Intel can design a chip that will come close in performance? I'm sure they can. Will they? Intel's track record on chipsets has been spotty (to be charitable, at that).

Indeed, particularly for server chipsets. However, one would assume
that they DO have the resources and know-how to design such a chipset
if they felt it was needed. They never seemed to worry much about
latency on their desktop chipsets until the i865/i875, but there they
managed to cut ~100ns off the latency of these chips when compared to
the previous generation.
The only real problem left in computation is getting the data where you want it when you need it. The parts that do the computing are almost afterthoughts compared to the machinery dedicated to getting instructions and data to arrive on time and coping with what happens when they don't. It's about time the memory subsystem got more attention, and I hope this isn't the end of it.

I would say that it's only the beginning. The logical next step is to
integrate the memory controller right onto the CPU itself...
... Of course, you could rid yourself of most of these problems entirely by changing the whole computing paradigm, but that's for another thread.

I'll leave that thread to you! :>

-------------
Tony Hill
hilla <underscore> 20 <at> yahoo <dot> ca

Robert Redelmeier
04-07-2005, 06:33 AM
In comp.sys.ibm.pc.hardware.chips George Macdonald wrote: On Thu, 07 Apr 2005 00:01:31 GMT, Robert Redelmeier wrote: Frankly, I'm a little surprised no-one runs any latency benchmarks on RAM. A little pointer-chasing exercise isn't hard to write, and would be very revealing.
Dave Wang has discussed it in some detail - one of his pet subjects I believe. IIRC he was measuring round-trip times the "hard" way with probes.

Well, the deed is done (code below). Perhaps not as sharp as
bus-snooping, but at least this gives program-visible read latency:

Latency   System CPU@MHz   mem.ctl       RAM
   ns

  144     P3@1000          laptop        SO-PC133?
  148     2*P3@860         Serverworks   ??
  178     P4@1800          i850          RDRAM
  184     K7@1667          SiS735        PC133
  185     P3@600           440BX         PC100
  217     2*Cel@500        440BX         PC90
  234     P2@350           440BX         PC100?
  288     P2@333           440BX         PC66

I do need to find & test some more modern systems, but I'm
underwhelmed by the slowness of latency improvement.



compile: $ gcc -O2 lat10m.c
run: $ time ./a.out [multiply user time by 100 to give ns]

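/* lat10m.c - Measure latency of 10 million fresh memory reads
   (C) Copyright 2005 Robert Redelmeier - GPL v2.0 licence granted */
int p[ 1<<21 ] ;   /* 2M ints (8 MB with 4-byte int) */
int main (void) {
  int i, j ;
  /* Each entry points 5000 ints back, wrapping at 2M entries,
     so successive reads land far apart in memory. */
  for ( i=0 ; i < 1<<21 ; i++ ) p[i] = 0x1FFFFF & (i-5000) ;
  /* Chase the chain: each load depends on the result of the previous one. */
  for ( j=i=0 ; i < 9600000 ; i++ ) j = p[j] ;
  return j ; }   /* using j keeps the loop from being optimized away */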


-- Robert

