Itanium Montecito stuff
Yousuf Khan
11-16-2003, 08:50 AM
Multicore, simultaneous multi-threading, and 24MB of cache. Looks like this one
was designed with help from the Alpha team that Intel just bought out
recently from HPaq.
Yousuf Khan
http://www.theinquirer.net/?article=12686
Yousuf Khan wrote:
Multicore, simultaneous multi-threading, and 24MB of cache. Looks like this one was designed with help from the Alpha team that Intel just bought out recently from HPaq.
Yousuf Khan
http://www.theinquirer.net/?article=12686
24 Megs of high-speed SRAM ???
Think $$$!
--
- Peter Perlsø - web: http://u238.dk
"If you have been voting for politicians who promise to give you goodies
at someone else's expense, then you have no right to complain when they
take your money and give it to someone else, including themselves."
-- Thomas Sowell (1992)
Yousuf Khan
11-16-2003, 09:12 AM
"Peter Perlsø" <nospam@nospam.com> wrote in message
news:3fb7ade3$0$27424$edfadb0f@dread16.news.tele.dk...
Multicore, simultaneous multi-threading, and 24MB of cache. Looks like this one was designed with help from the Alpha team that Intel just bought out recently from HPaq.
24 Megs of high-speed SRAM ??? Think $$$!
Yeah, I'm not even sure why they're dicking around. Just get it over and
done with, put 1GB of SRAM on it, and get rid of that DRAM already. That
would be a feature of the processor, doesn't need any external RAM. :-)
Yousuf Khan
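A rough sense of why 24MB of on-die SRAM reads as expensive, and why 1GB is a joke, comes from counting transistors. The sketch below assumes a conventional six-transistor (6T) SRAM cell and counts only the data array, ignoring tags, ECC and decoders. The function name and the 6T figure are illustrative assumptions, not numbers from the article.

```python
BITS_PER_BYTE = 8
TRANSISTORS_PER_CELL = 6  # assumption: classic 6T SRAM cell

MIB = 1024 ** 2
GIB = 1024 ** 3


def sram_array_transistors(capacity_bytes, per_cell=TRANSISTORS_PER_CELL):
    """Transistors in the SRAM data array alone (no tags, ECC or decoders)."""
    return capacity_bytes * BITS_PER_BYTE * per_cell


cache_24mb = sram_array_transistors(24 * MIB)
sram_1gb = sram_array_transistors(1 * GIB)

print(f"24MB cache: {cache_24mb / 1e9:.2f} billion transistors")
print(f"1GB SRAM:   {sram_1gb / 1e9:.2f} billion transistors")
```

Under these assumptions the 24MB array alone is over a billion transistors, and 1GB would be roughly forty times that.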
Robert Myers
11-16-2003, 10:44 AM
On Sun, 16 Nov 2003 16:50:50 GMT, "Yousuf Khan"
<removethisspam.bjsk90.removethispam@hotmail.com> wrote:
Multicore, simultaneous multi-threading, and 24MB of cache. Looks like this one was designed with help from the Alpha team that Intel just bought out recently from HPaq.
Yousuf Khan
http://www.theinquirer.net/?article=12686
SMT was always aimed at Itanium. You can achieve most of the benefits
of OoO execution without actually going OoO by using SMT helper
threads. If you're supporting two cores with four threads each, the
huge cache is inevitable.
RM
Bill Todd
11-16-2003, 12:00 PM
"Robert Myers" <rmyers@rustuck.com> wrote in message
news:bsgfrvcg4lfs92524p2r2i2tnqe3hhbs36@4ax.com...
On Sun, 16 Nov 2003 16:50:50 GMT, "Yousuf Khan" <removethisspam.bjsk90.removethispam@hotmail.com> wrote:
Multicore, simultaneous multi-threading, and 24MB of cache. Looks like this one was designed with help from the Alpha team that Intel just bought out recently from HPaq.
I kind of doubt that. Those people are reportedly all working on
Tanglewood, and any Itanic SMT effort aimed at shipping in 2005 would have
had to start at least a bit before the first of them settled in at Intel.
While they may have offered comments, I suspect that whatever SMT
mechanism may be incorporated into Itanic (I'm still a bit skeptical of this
report, but it does seem to be pretty widespread) differs sufficiently at a
very basic level from what they were working on for EV8 that their
experience may not have been directly transferable.
SMT was always aimed at Itanium.
Really? My impression is that the Itanic architecture was largely
established somewhat before SMT appeared on the horizon, that most of the
coordination by the University of Washington researchers was with DEC and
Alpha, and that SMT is particularly amenable to leveraging existing
mechanisms for out-of-order execution (e.g., in Alpha) that are
conspicuously absent in Itanic.
Intel may later have investigated ways to make use of SMT in Itanic, but I
think it was definitely a retrofit.
You can achieve most of the benefits of OoO execution without actually going OoO by using SMT helper threads.
Maybe. But without doubt one of the things that you sacrifice is power
efficiency (not that Itanic appears to worry about this much), since without
the OoO hardware facilities you don't have a clue whether the extra work
you're doing will be useful (and even if it is useful in preloading the
caches, when the *real* code path reaches that point the instructions still
get executed a second time anyway).
Such helper threads are also a lot more expensive in use of execution units
than OoO SMT mechanisms are (again, because of the redundant or useless
execution activity noted above), so you need more EUs (and thus more core
area, which starts to limit clock rates unless you go asynchronous) than
you'd need in an OoO SMT implementation to perform as well.
If you're supporting two cores with four threads each,
Do you have a source for the suggestion that each Montecito core supports 4
threads?
the huge cache is inevitable.
Not if you're primarily using the SMT for helper threads (not that I'm
suggesting that this is a great idea).
- bill
Robert Myers
11-16-2003, 01:02 PM
On Sun, 16 Nov 2003 15:00:21 -0500, "Bill Todd"
<billtodd@metrocast.net> wrote:
"Robert Myers" <rmyers@rustuck.com> wrote in message news:bsgfrvcg4lfs92524p2r2i2tnqe3hhbs36@4ax.com...
On Sun, 16 Nov 2003 16:50:50 GMT, "Yousuf Khan" <removethisspam.bjsk90.removethispam@hotmail.com> wrote:
<snip>
SMT was always aimed at Itanium.
Really? My impression is that the Itanic architecture was largely established somewhat before SMT appeared on the horizon, that most of the coordination by the University of Washington researchers was with DEC and Alpha, and that SMT is particularly amenable to leveraging existing mechanisms for out-of-order execution (e.g., in Alpha) that are conspicuously absent in Itanic.
Oh, there I go again.
SMT at _Intel_ was always aimed at Itanium.
Intel may later have investigated ways to make use of SMT in Itanic, but I think it was definitely a retrofit.
I don't think there's much doubt about that.
You can achieve most of the benefits of OoO execution without actually going OoO by using SMT helper threads.
Maybe. But without doubt one of the things that you sacrifice is power efficiency (not that Itanic appears to worry about this much), since without the OoO hardware facilities you don't have a clue whether the extra work you're doing will be useful (and even if it is useful in preloading the caches, when the *real* code path reaches that point the instructions still get executed a second time anyway).
I expect helper threads to find a place even in OoO processors. The
available work on prescheduled speculative slices looks very
promising. A helper thread would also make things like DynamoRIO look
more attractive.
Such helper threads are also a lot more expensive in use of execution units than OoO SMT mechanisms are (again, because of the redundant or useless execution activity noted above), so you need more EUs (and thus more core area, which starts to limit clock rates unless you go asynchronous) than you'd need in an OoO SMT implementation to perform as well.
A paper at SC 2003 suggests that "arithmetic is free, bandwidth is
expensive." If someone else doesn't get there first, I'll post a
thread for discussion. It warrants a separate thread.
If you're supporting two cores with four threads each,
Do you have a source for the suggestion that each Montecito core supports 4 threads?
The paper I cited previously in comp.arch
:
:http://www.cs.ucsd.edu/users/jbrown/papers/sp-cmp.pdf
:
:"Speculative Precomputation on Chip Multiprocessors"
:
:which I gather is from
:
:6th Workshop on Multithreaded Execution, Architecture, and Compilation
:(MTEAC-6) Tuesday, November 19 (2002) Istanbul, Turkey.
:
:"Figure 2 indicates that across the board, SMT consistently
:provides the greatest speedup of the four configurations
:shown, even though it has the fewest overall execution
:resources and the least amount of aggregate cache capacity."
:
:with the four configurations being 4-way SMT, vs 2, 4, and 8 way CMP.
the huge cache is inevitable.
Not if you're primarily using the SMT for helper threads (not that I'm suggesting that this is a great idea).
Scheduling helper threads without a roomy cache is tricky. The whole
purpose is to pull stuff into cache ahead of time, and it would be
annoying to have a helper thread bump something else out of cache that
was needed sooner than what the helper thread just pulled in.
RM
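The pollution worry in the post above can be illustrated with a toy model. This is only a sketch under loud assumptions: a fully associative LRU cache, a main thread that touches a small hot set and then one new streamed line per step, and a helper that prefetches the streamed line a fixed number of steps ahead. The class and function names and all parameters are hypothetical, and no real Itanium behaviour is implied.

```python
from collections import OrderedDict


class LRUCache:
    """Fully associative cache with least-recently-used replacement."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()

    def access(self, addr):
        """Touch addr and return True on a hit; on a miss, fill and evict LRU."""
        hit = addr in self.lines
        if hit:
            self.lines.move_to_end(addr)
        else:
            self.lines[addr] = True
            if len(self.lines) > self.capacity:
                self.lines.popitem(last=False)
        return hit


def main_thread_misses(capacity, hot_lines, stream_len, lookahead=None):
    """Count main-thread misses; lookahead=None means no helper thread."""
    cache = LRUCache(capacity)
    misses = 0
    for i in range(stream_len):
        # Helper thread: pull a future streamed line into the cache.
        if lookahead is not None and i + lookahead < stream_len:
            cache.access(("stream", i + lookahead))
        # Main thread: reuse the hot set, then consume one streamed line.
        for k in range(hot_lines):
            misses += not cache.access(("hot", k))
        misses += not cache.access(("stream", i))
    return misses


baseline = main_thread_misses(capacity=5, hot_lines=4, stream_len=10)
tight = main_thread_misses(capacity=5, hot_lines=4, stream_len=10, lookahead=2)
roomy = main_thread_misses(capacity=9, hot_lines=4, stream_len=10, lookahead=2)
print("no helper:", baseline, "tight cache:", tight, "roomy cache:", roomy)
```

In this model the helper makes things worse when the cache barely holds the hot set, because each prefetch pushes out a line the main thread wants sooner, and it only helps once there is room for the hot lines plus the prefetches in flight.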
Bill Todd
11-16-2003, 04:15 PM
"Robert Myers" <rmyers@rustuck.com> wrote in message
news:aaofrv8o2m2955keltiu8e3vlhiob0n077@4ax.com...
On Sun, 16 Nov 2003 15:00:21 -0500, "Bill Todd" <billtodd@metrocast.net> wrote:
"Robert Myers" <rmyers@rustuck.com> wrote in message news:bsgfrvcg4lfs92524p2r2i2tnqe3hhbs36@4ax.com...
....
You can achieve most of the benefits of OoO execution without actually going OoO by using SMT helper threads.
Maybe. But without doubt one of the things that you sacrifice is power efficiency (not that Itanic appears to worry about this much), since without the OoO hardware facilities you don't have a clue whether the extra work you're doing will be useful (and even if it is useful in preloading the caches, when the *real* code path reaches that point the instructions still get executed a second time anyway).
I expect helper threads to find a place even in OoO processors.
Possibly, but I suspect only in situations where the workload has fewer
threads than the SMT core supports: otherwise, the other core threads will
likely be far more effective servicing real threads and leaving the
individual thread IPC up to the OoO mechanisms. With Itanic, the trade-off
may be less clear (since it has more to gain on an individual thread from SP
than an OoO core does).
The available work on prescheduled speculative slices looks very promising. A helper thread would also make things like DynamoRIO look more attractive.
Such helper threads are also a lot more expensive in use of execution units than OoO SMT mechanisms are (again, because of the redundant or useless execution activity noted above), so you need more EUs (and thus more core area, which starts to limit clock rates unless you go asynchronous) than you'd need in an OoO SMT implementation to perform as well.
A paper at SC 2003 suggests that "arithmetic is free, bandwidth is expensive."
Free in what respect(s)? The specific context above is power and chip area
(and, by extension of the latter, clock rate).
If someone else doesn't get there first, I'll post a thread for discussion. It warrants a separate thread.
If you're supporting two cores with four threads each,
Do you have a source for the suggestion that each Montecito core supports 4 threads?
The paper I cited previously in comp.arch
:
:http://www.cs.ucsd.edu/users/jbrown/papers/sp-cmp.pdf
:
:"Speculative Precomputation on Chip Multiprocessors"
:
:which I gather is from
:
:6th Workshop on Multithreaded Execution, Architecture, and Compilation
:(MTEAC-6) Tuesday, November 19 (2002) Istanbul, Turkey.
:
:"Figure 2 indicates that across the board, SMT consistently
:provides the greatest speedup of the four configurations
:shown, even though it has the fewest overall execution
:resources and the least amount of aggregate cache capacity."
:
:with the four configurations being 4-way SMT, vs 2, 4, and 8 way CMP.
That paper concentrates on SP in CMP-only environments, and uses the
4-thread SMT core only for comparison purposes. There's nothing in it to
suggest that it refers in any way specifically to Montecito.
the huge cache is inevitable.
Not if you're primarily using the SMT for helper threads (not that I'm suggesting that this is a great idea).
Scheduling helper threads without a roomy cache is tricky. The whole purpose is to pull stuff into cache ahead of time, and it would be annoying to have a helper thread bump something else out of cache that was needed sooner than what the helper thread just pulled in.
If that were a serious problem, it would be worst in the extremely small L1
cache and significant in the modest L2 cache. The size of the L3 cache
should be completely insensitive to it by comparison, especially with the
24-way associativity that the current Itanic2 L3 cache has: whatever data
is evicted from the L3 by the helper thread is unlikely to be very
important, whereas the new data that the helper thread is bringing in will
almost certainly be needed almost immediately.
- bill
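The associativity argument can be made concrete with a little cache geometry. The sketch below assumes a 6MB L3 with 128-byte lines purely for illustration; only the 24-way figure comes from the post, and the function names are hypothetical.

```python
def cache_sets(capacity_bytes, ways, line_bytes):
    """Number of sets in a set-associative cache."""
    set_bytes = ways * line_bytes
    if capacity_bytes % set_bytes:
        raise ValueError("capacity must be a whole number of sets")
    return capacity_bytes // set_bytes


def fill_share_of_set(ways):
    """Fraction of one set displaced by a single line fill."""
    return 1 / ways


L3_BYTES = 6 * 1024 ** 2  # assumption: 6MB L3
L3_WAYS = 24              # from the post: 24-way associative
LINE_BYTES = 128          # assumption: 128-byte lines

sets = cache_sets(L3_BYTES, L3_WAYS, LINE_BYTES)
print(f"{sets} sets, each fill displaces {fill_share_of_set(L3_WAYS):.1%} of one set")
```

With 24 candidates per set, an LRU-like policy has plenty of stale lines to pick from before a helper-thread fill displaces anything recently used, which is the point being made about the L3 compared with the much smaller L1 and L2.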
James Boswell
11-28-2003, 03:37 AM
Yousuf Khan <removethisspam.bjsk90.removethispam@hotmail.com> wrote:
"Peter Perlsø" <nospam@nospam.com> wrote in message news:3fb7ade3$0$27424$edfadb0f@dread16.news.tele.dk...
Multicore, simultaneous multi-threading, and 24MB of cache. Looks like this one was designed with help from the Alpha team that Intel just bought out recently from HPaq.
24 Megs of high-speed SRAM ??? Think $$$!
Yeah, I'm not even sure why they're dicking around. Just get it over and done with, put 1GB of SRAM on it, and get rid of that DRAM already. That would be a feature of the processor, doesn't need any external RAM. :-)
Oddly enough, IBM were going on about that. On a 0.045 micron process,
they could probably get a gigabyte of eDRAM in under 200mm^2 of die area,
using the 36MB eDRAM dies they've got alongside the POWER5 as a guide.
-JB
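The 200mm^2 estimate can be sanity-checked with back-of-envelope scaling. The sketch assumes ideal area scaling with the square of the feature size and assumes the 36MB reference die is built on a 130nm process; neither assumption comes from the post, and the function names are hypothetical. Since the post gives no area for the 36MB die, the code solves for how large that die could be while still meeting the budget.

```python
def area_multiplier(ref_capacity_mb, ref_node_nm, target_capacity_mb, target_node_nm):
    """Target area divided by reference area, assuming ideal shrink."""
    capacity_ratio = target_capacity_mb / ref_capacity_mb
    shrink = (target_node_nm / ref_node_nm) ** 2
    return capacity_ratio * shrink


def max_reference_area_mm2(budget_mm2, multiplier):
    """Largest reference die area that still fits the target budget."""
    return budget_mm2 / multiplier


REF_NODE_NM = 130    # assumption: process of the 36MB eDRAM die
TARGET_NODE_NM = 45  # the ".045" process from the post, in nanometres

multiplier = area_multiplier(36, REF_NODE_NM, 1024, TARGET_NODE_NM)
limit = max_reference_area_mm2(200, multiplier)
print(f"1GB needs {multiplier:.2f}x the area of the 36MB die")
print(f"so the 36MB die must be under {limit:.1f}mm^2 to fit in 200mm^2")
```

Real eDRAM rarely shrinks ideally, so the result is best read as an optimistic bound on the claim.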
James Boswell wrote:
Oddly enough, IBM were going on about that. On a 0.045 micron process, they could probably get a gigabyte of eDRAM in under 200mm^2 of die area, using the 36MB eDRAM dies they've got alongside the POWER5 as a guide.
-JB
EDRAM
Enhanced Dynamic Random Access Memory
(E-D-ram)
Another form of DRAM that includes an SRAM cache on the chip. This
allows frequently accessed data to be obtained faster. (Also known as
CDRAM.)
Just FYI.
Keith R. Williams
11-28-2003, 11:09 AM
In article <3fc775dd$0$27419$edfadb0f@dread16.news.tele.dk>,
nospam@nospam.com says...
EDRAM Enhanced Dynamic Random Access Memory (E-D-ram) Another form of DRAM that includes an SRAM cache on the chip. This allows frequently accessed data to be obtained faster. (Also known as CDRAM.)
.... or embedded DRAM.
Just FYI.
Indeed.
--
Keith
James Boswell
11-30-2003, 10:20 AM
Peter Perlsø <nospam@nospam.com> wrote:
EDRAM Enhanced Dynamic Random Access Memory (E-D-ram) Another form of DRAM that includes an SRAM cache on the chip. This allows frequently accessed data to be obtained faster. (Also known as CDRAM.)
Or embedded DRAM, which IBM are using as L3 cache on the POWER series
right now.
-JB