AMD equivalent of Hyperthreading?
Will Dormann
08-18-2003, 06:31 PM
I do a lot of video work on my PC. The encoding process is extremely
CPU intensive and can make the system pretty unresponsive while
it's running. With the filtering and compression that I do, a
1-hour clip can take upwards of 12 hours to compress on my Athlon
XP1800. Doing other CPU-intensive work at the same time, such as
compiling Mozilla, obviously makes it even worse.
According to this article, hyperthreading can really help out with this
kind of work:
http://www.osnews.com/story.php?news_id=2962
AMD has always seemed to be the best bang for the buck, and for this
reason I have always had AMD systems. But a pair of AMD Athlon MP CPUs
isn't exactly cheap, and neither are MP motherboards.
For those reasons, a Hyperthreading-enabled P4 looks pretty enticing for
a power user such as myself (if the osnews article above is correct).
Does AMD have any plans to add hyperthreading-like features to the chips
coming out, such as the Opteron or Athlon64?
-WD
Rob Stow
08-18-2003, 07:21 PM
Will Dormann wrote:
I do a lot of video work on my PC. The encoding process is extremely CPU intensive and can make the system responsiveness pretty slow while that's going on. With the filtering and compression that I do, a 1-hour clip can take upwards of 12 hours to compress on my Athlon XP1800. If I'm doing other CPU-intensive work, such as compiling Mozilla, it makes it even worse. (obviously) According to this article, hyperthreading can really help out with this kind of work: http://www.osnews.com/story.php?news_id=2962 AMD has always seemed to be the best bang for the buck, and for this reason I have always had AMD systems. But a pair of AMD Athlon MP CPUs aren't exactly cheap. (nor MP motherboards). For those reasons, a Hyperthreading-enabled P4 looks pretty enticing for a power-user such as myself. (If the osnews article above is correct) Does AMD have any plans to add hyperthreading-like features to the chips coming out, such as the Opteron or Athlon64? -WD
I had a chance a few weeks ago to play with a HT-enabled machine,
and simultaneous encoding and compiling just happens to be one of
the things I tested. (I used TMPegEnc and MS VC++ 6).
HT seems to make the individual apps more responsive to user input,
but neither compute-intensive task completed any faster.
In other words, while you are encoding your video, HT will make the
IDE for your compiler more responsive to your mouse and keyboard
input. However, the time it takes to do the actual compiling or
encoding will not improve compared to doing the same simultaneous
tasks with HT disabled.
If you want better responsiveness *and* better performance while
doing simultaneous tasks, then SMP is the way to go.
Will Dormann wrote:
I do a lot of video work on my PC. The encoding process is extremely CPU intensive and can make the system responsiveness pretty slow while that's going on. With the filtering and compression that I do, a 1-hour clip can take upwards of 12 hours to compress on my Athlon XP1800. If I'm doing other CPU-intensive work, such as compiling Mozilla, it makes it even worse. (obviously) According to this article, hyperthreading can really help out with this kind of work: http://www.osnews.com/story.php?news_id=2962
Read this article:
http://www.pcworld.com/news/article/0,aid,107492,00.asp
AMD has always seemed to be the best bang for the buck, and for this reason I have always had AMD systems. But a pair of AMD Athlon MP CPUs aren't exactly cheap. (nor MP motherboards).
For those reasons, a Hyperthreading-enabled P4 looks pretty enticing for a power-user such as myself. (If the osnews article above is correct) Does AMD have any plans to add hyperthreading-like features to the chips coming out, such as the Opteron or Athlon64?
-WD
Stacey
08-18-2003, 08:59 PM
JK wrote:
Will Dormann wrote: I do a lot of video work on my PC. The encoding process is extremely CPU intensive and can make the system responsiveness pretty slow while that's going on. With the filtering and compression that I do, a 1-hour clip can take upwards of 12 hours to compress on my Athlon XP1800. If I'm doing other CPU-intensive work, such as compiling Mozilla, it makes it even worse. (obviously) According to this article, hyperthreading can really help out with this kind of work: http://www.osnews.com/story.php?news_id=2962 Read this article: http://www.pcworld.com/news/article/0,aid,107492,00.asp
http://www.anandtech.com/cpu/showdoc.html?i=1746&p=22
--
Stacey
Guest
08-18-2003, 11:18 PM
Stacey <fotocord@yahoo.com> wrote: Will Dormann wrote:
I do a lot of video work on my PC. The encoding process is extremely CPU intensive and can make the system responsiveness pretty slow while that's going on. With the filtering and compression that I do, a 1-hour clip can take upwards of 12 hours to compress on my Athlon XP1800. If I'm doing other CPU-intensive work, such as compiling Mozilla, it makes it even worse. (obviously) I personally compared an XP2400 vs a P4 2.4/533 (before the HT chips were out) using a vegas video 3c render test file. The P4 completed the job in 1/2 the time. For many uses AMD's are fine, for video editing they are the WRONG platform to use.
How about newer games like Battlefield 1942?
--
"I killed an ant, now all my relatives are afraid of me." --unknown
/\___/\
/ /\ /\ \ Ant @ The Ant Farm: http://antfarm.ma.cx
| |o o| | E-mail: philpi@earthlink.netANT or philpi@apu.eduANT
\ _ / Remove ANT if replying by e-mail from a newsgroup.
( )
Never anonymous Bud
08-19-2003, 02:38 AM
Separating himself from Baghdad Bob, Rob Stow <rob.stow@sasktel.net> whined:
> HT seems to make the individual apps more responsive to user input, but neither compute-intensive tasks completed any faster.
The P4 FAR exceeds any AMD (single) CPU in video and audio encoding.
If you have lots of cash laying around, a dual Opteron will be closer
to a single P4 in the same tasks.
http://www6.tomshardware.com/cpu/20030422/opteron-24.html
http://www.anandtech.com/cpu/showdoc.html?i=1818&p=8
To reply by email, remove the XYZ.
Lumber Cartel (tinlc) #2063. Spam this account at your own risk.
It's your SIG, say what you want to say....
Scott Alfter
08-19-2003, 10:12 AM
In article <bhsa2r$2mtng$2@ID-52908.news.uni-berlin.de>,
Stacey <fotocord@yahoo.com> wrote:
> I personally compared an XP2400 vs a P4 2.4/533 (before the HT chips were
> out) using a vegas video 3c render test file. The P4 completed the job in
> 1/2 the time. For many uses AMD's are fine, for video editing they are the
> WRONG platform to use.
Comparing TMPGEnc (encoding from an Avisynth script) on a dual Athlon MP
1900 to a 2.8-GHz P4 didn't quite produce opposite results, but the dually
was significantly faster than the P4 (don't remember the exact numbers
offhand, but it wasn't a small difference...I think the dually did the job
in about a third less time).
I suspect which is faster depends on what software you're running...for what
I use, dual Athlon MPs will usually be the fastest way to go.
_/_ Scott Alfter
/ v \ salfter@salfter.dyndns.org
(IIGS( http://alfter.us Top-posting!
\_^_/ pkill -9 /bin/laden >What is the most annoying thing on Usenet?
Felger Carbon
08-19-2003, 02:42 PM
"Never anonymous Bud" <thekat@san.rxyzr.com> wrote in message
news:b6v3kv45oeu875aavtkhgb1dh8d3o2uouc@4ax.com... The P4 FAR exceeds any AMD (single) CPU in video and audio encoding.
Absolutely correct! Video (and somewhat audio too) is what the new P4
microarchitecture is all about. It is superb at video tasks.
If you want to run legacy applications, the more conventional AMD
microarchitecture is faster than the P4.
You pays yer money and you makes yer choice! ;-)
George Macdonald
08-19-2003, 05:27 PM
On Tue, 19 Aug 2003 22:42:50 GMT, "Felger Carbon" <fmrfne@jps.net> wrote:
> "Never anonymous Bud" <thekat@san.rxyzr.com> wrote in message news:b6v3kv45oeu875aavtkhgb1dh8d3o2uouc@4ax.com...
>> The P4 FAR exceeds any AMD (single) CPU in video and audio encoding.
> Absolutely correct! Video (and somewhat audio too) is what the new P4 microarchitecture is all about. It is superb at video tasks.
So has the underflow "problem" with audio processing been licked? I
figured it had to be software but the software companies seemed to be playing
dumb on it... which left the user frustrated.
Rgds, George Macdonald
"Just because they're paranoid doesn't mean you're not psychotic" - Who, me??
Stacey
08-19-2003, 06:44 PM
Scott Alfter wrote:
> In article <bhsa2r$2mtng$2@ID-52908.news.uni-berlin.de>, Stacey <fotocord@yahoo.com> wrote:
>> I personally compared an XP2400 vs a P4 2.4/533 (before the HT chips were out) using a vegas video 3c render test file. The P4 completed the job in 1/2 the time. For many uses AMD's are fine, for video editing they are the WRONG platform to use.
> Comparing TMPGEnc (encoding from an Avisynth script) on a dual Athlon MP 1900 to a 2.8-GHz P4 didn't quite produce opposite results, but the dually was significantly faster than the P4 (don't remember the exact numbers offhand, but it wasn't a small difference...I think the dually did the job in about a third less time).
Guess I should have said a single AMD system is the wrong choice. Is TMPGEnc
SSE2 optimised? Vegas and its encoder are, and they run much faster on a P4. Never
tried a dual MP system so can't comment on whether it would be faster than a P4, but
I have noticed most video "turn key" systems use dual AMD's on the lower-end
ones and P4's on the higher-end stuff.
Like I said, for most uses AMD's work great but video editing isn't their
strong point.
--
Stacey
Stacey
08-19-2003, 06:45 PM
ANTant@zimage.com wrote:
Stacey <fotocord@yahoo.com> wrote: Will Dormann wrote: I do a lot of video work on my PC. The encoding process is extremely CPU intensive and can make the system responsiveness pretty slow while that's going on. With the filtering and compression that I do, a 1-hour clip can take upwards of 12 hours to compress on my Athlon XP1800. If I'm doing other CPU-intensive work, such as compiling Mozilla, it makes it even worse. (obviously) I personally compared an XP2400 vs a P4 2.4/533 (before the HT chips were out) using a vegas video 3c render test file. The P4 completed the job in 1/2 the time. For many uses AMD's are fine, for video editing they are the WRONG platform to use. How newer games like Battlefield 1942?
For games/general use AMD's are great. They just don't excel at video
editing and a P4 is a better choice for that use.
--
Stacey
Stacey
08-19-2003, 06:46 PM
George Macdonald wrote:
> On Tue, 19 Aug 2003 22:42:50 GMT, "Felger Carbon" <fmrfne@jps.net> wrote:
>> "Never anonymous Bud" <thekat@san.rxyzr.com> wrote in message news:b6v3kv45oeu875aavtkhgb1dh8d3o2uouc@4ax.com...
>>> The P4 FAR exceeds any AMD (single) CPU in video and audio encoding.
>> Absolutely correct! Video (and somewhat audio too) is what the new P4 microarchitecture is all about. It is superb at video tasks.
> So has the underflow "problem" with audio processing been licked?
I think the software people have finally hacked around it.
--
Stacey
David Winter
08-20-2003, 04:46 AM
I read offline, and neither of these posts made any meaningful sense to me.
Yes OK, so I should be like you and have Broadband, well soon, when the $$s
stiffen up a bit. But for now, please think of offliners on pay-by-the-hour
ISPs, and those who pay by the minute for the phone line. Give a synopsis
and an opinion as well as the source.
DW
"Stacey" <fotocord@yahoo.com> wrote in message
news:bhsai5$2j2ke$1@ID-52908.news.uni-berlin.de...
: JK wrote:
:
: >
: >
: > Will Dormann wrote:
: >
: >> I do a lot of video work on my PC. The encoding process is extremely
: >> CPU intensive and can make the system responsiveness pretty slow while
: >> that's going on. With the filtering and compression that I do, a
: >> 1-hour clip can take upwards of 12 hours to compress on my Athlon
: >> XP1800. If I'm doing other CPU-intensive work, such as compiling
: >> Mozilla, it makes it even worse. (obviously)
: >>
: >> According to this article, hyperthreading can really help out with this
: >> kind of work:
: >> http://www.osnews.com/story.php?news_id=2962
: >
: > Read this article:
: >
: > http://www.pcworld.com/news/article/0,aid,107492,00.asp
:
:
: http://www.anandtech.com/cpu/showdoc.html?i=1746&p=22
:
:
: --
:
: Stacey
David Winter
08-20-2003, 04:54 AM
And a dual 1.9 is compared to a single 2.8, and is said to outperform by
30%.
Let's do some CRUDE and RUDE sums.
2 x 1.9 = 3.8
3.8 is 135% of 2.8
So in crude terms, one would expect 35% more crunch with the dual 1.9s
The observation was about 30% more
Give or take, the P4 single has a small edge over dual AMDs of the same
apparent CPU cycles.
DW
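As a rough check on the sums above, here is a minimal Python sketch. It assumes that "about a third less time" in Scott's post means the dual Athlon took two thirds of the P4's wall time, and it takes the 1.9 and 2.8 figures at face value, as the post does. The function name is made up for illustration.

```python
def speedup_from_time_saved(fraction_less_time):
    """Throughput ratio when a job takes `fraction_less_time` less wall time."""
    return 1.0 / (1.0 - fraction_less_time)


# Crude "sum of clocks" ratio used in the post: 2 x 1.9 against 1 x 2.8.
clock_sum_ratio = (2 * 1.9) / 2.8

# Ratio implied by finishing the same job in a third less time.
observed_ratio = speedup_from_time_saved(1 / 3)

print(f"clock-sum ratio: {clock_sum_ratio:.2f}x")
print(f"observed ratio:  {observed_ratio:.2f}x")
```

Under that reading, a third less time works out to roughly 1.5x the throughput rather than 1.3x, so the comparison against the crude 1.36x clock-sum figure depends heavily on how Scott's "a third less time" is interpreted.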
"Stacey" <fotocord@yahoo.com> wrote in message
news:bhun1h$3aegh$5@ID-52908.news.uni-berlin.de...
: Scott Alfter wrote:
:
: > -----BEGIN PGP SIGNED MESSAGE-----
: > Hash: SHA1
: >
: > In article <bhsa2r$2mtng$2@ID-52908.news.uni-berlin.de>,
: > Stacey <fotocord@yahoo.com> wrote:
: >> I personally compared an XP2400 vs a P4 2.4/533 (before the HT chips
: >> were out) using a vegas video 3c render test file. The P4 completed the
: >> job in 1/2 the time. For many uses AMD's are fine, for video editing they
: >> are the WRONG platform to use.
: >
: > Comparing TMPGEnc (encoding from an Avisynth script) on a dual Athlon MP
: > 1900 to a 2.8-GHz P4 didn't quite produce opposite results, but the dually
: > was significantly faster than the P4 (don't remember the exact numbers
: > offhand, but it wasn't a small difference...I think the dually did the job
: > in about a third less time).
: >
:
:
: Guess I should have said a single AMD system is the wrong choice. Is TMPGEnc
: SSE2 optimised? Vegas and it's encoder are and a P4 runs much faster. Never
: tried a dual MP system so can't coment if it would be faster than a P4 but
: have noticed most video "turn key" systems use dual AMD's on the lower end
: one and P4's on the higher end stuff.
:
: Like I said for most uses AMD's work great but video editing isn't their
: strong point.
:
: --
:
: Stacey
Alexander Grigoriev
08-20-2003, 07:41 AM
What is "underflow problem with audio processing"? Just wondering.
"George Macdonald" <fammacd=!SPAM^nothanks@tellurian.com> wrote in message
news:3f42ca86.57318814@news.tellurian.com...
> On Tue, 19 Aug 2003 22:42:50 GMT, "Felger Carbon" <fmrfne@jps.net> wrote:
>> "Never anonymous Bud" <thekat@san.rxyzr.com> wrote in message news:b6v3kv45oeu875aavtkhgb1dh8d3o2uouc@4ax.com...
>>> The P4 FAR exceeds any AMD (single) CPU in video and audio encoding.
>> Absolutely correct! Video (and somewhat audio too) is what the new P4 microarchitecture is all about. It is superb at video tasks.
> So has the underflow "problem" with audio processing been licked? I figured it had to be software but the software companies seemed to playing dumb on it... which left the user frustrated.
> Rgds, George Macdonald
> "Just because they're paranoid doesn't mean you're not psychotic" - Who, me??
Martin Atkinson-Barr
08-20-2003, 10:08 AM
"Stacey" <fotocord@yahoo.com> wrote in message
news:bhun1h$3aegh$5@ID-52908.news.uni-berlin.de...
> Scott Alfter wrote:
>> In article <bhsa2r$2mtng$2@ID-52908.news.uni-berlin.de>, Stacey <fotocord@yahoo.com> wrote:
>>> I personally compared an XP2400 vs a P4 2.4/533 (before the HT chips were out) using a vegas video 3c render test file. The P4 completed the job in 1/2 the time. For many uses AMD's are fine, for video editing they are the WRONG platform to use.
>> Comparing TMPGEnc (encoding from an Avisynth script) on a dual Athlon MP 1900 to a 2.8-GHz P4 didn't quite produce opposite results, but the dually was significantly faster than the P4 (don't remember the exact numbers offhand, but it wasn't a small difference...I think the dually did the job in about a third less time).
> Guess I should have said a single AMD system is the wrong choice. Is TMPGEnc SSE2 optimised? Vegas and it's encoder are and a P4 runs much faster. Never tried a dual MP system so can't coment if it would be faster than a P4 but have noticed most video "turn key" systems use dual AMD's on the lower end one and P4's on the higher end stuff.
> Like I said for most uses AMD's work great but video editing isn't their strong point.
> --
> Stacey
I think that Intel has been very active in getting video editing apps
optimized for the P4 architecture. Especially using SSE2 which is not
supported on the P3s and the Athlons. SSE2 is implemented on the Athlon64
and Opterons.
Once Windows XP64 for AMD64 is out I hope that the domestic video editors
will take advantage of it. Given the short time until the launch of the
Athlon64s I would wait before buying any video editing setup.
Keith R. Williams
08-20-2003, 06:44 PM
In article <oEM0b.926$Ej6.754@newsread4.news.pas.earthlink.net>,
alegr@earthlink.net says... What is "underflow problem with audio processing"? Just wondering.
I believe George is referring to the P4's "denormal" problem.
You might want to do a web search on "P4" and "denormal". The P4
isn't unique here, it's just more pronounced on the P4 than other
processors. The P4 architects made *many* less than optimal
decisions (shift and multiply are the other biggies) that
severely affect performance. This is apparently one that they
didn't expect to bite so hard.
--
Keith
Stacey
08-20-2003, 07:03 PM
Martin Atkinson-Barr wrote:
> I think that Intel has been very active in getting video editing apps optimized for the P4 architecture. Especially using SSE2 which is not supported on the P3s and the Athlons. SSE2 is implemented on the Athlon64 and Opterons. Once Windows XP64 for AMD64 is out I hope that the domestic video editors will take advantage of it. Given the short time until the launch of the Athlon64s I would wait before buying any video editing setup.
I HOPE the AMD64 is a gain for video editing, as I'll be the first to buy one
if it is. So far the reviews I've seen using current apps don't look
promising compared to what Intel is doing. Maybe =if= the popular apps are
written for AMD64's then this might change, but what % of users will have
AMD64's vs Intel's?
--
Stacey
George Macdonald
08-21-2003, 05:11 AM
On Tue, 19 Aug 2003 22:46:47 -0400, Stacey <fotocord@yahoo.com> wrote:
> George Macdonald wrote:
>> On Tue, 19 Aug 2003 22:42:50 GMT, "Felger Carbon" <fmrfne@jps.net> wrote:
>>> "Never anonymous Bud" <thekat@san.rxyzr.com> wrote in message news:b6v3kv45oeu875aavtkhgb1dh8d3o2uouc@4ax.com...
>>>> The P4 FAR exceeds any AMD (single) CPU in video and audio encoding.
>>> Absolutely correct! Video (and somewhat audio too) is what the new P4 microarchitecture is all about. It is superb at video tasks.
>> So has the underflow "problem" with audio processing been licked?
> I think the software people have finally hacked around it..
Uhh, like I inferred!
Rgds, George Macdonald
"Just because they're paranoid doesn't mean you're not psychotic" - Who, me??
Will Dormann
08-21-2003, 09:59 AM
Scott Alfter wrote: Comparing TMPGEnc (encoding from an Avisynth script) on a dual Athlon MP 1900 to a 2.8-GHz P4 didn't quite produce opposite results, but the dually was significantly faster than the P4 (don't remember the exact numbers offhand, but it wasn't a small difference...I think the dually did the job in about a third less time). I suspect which is faster depends on what software you're running...for what I use, dual Athlon MPs will usually be the fastest way to go.
Yeah, I think in my case it will be about the same. When processing a
video clip, I'll commonly be running AviSynth plus 2 instances of
VirtualDub, and then TMPGEnc if the target is MPEG.
It looks like either a dual Athlon or an HT-enabled P4 will improve system
responsiveness over the single Athlon XP I have now. But the dual
Athlon will most likely be faster.
If I had $300 to spend, that would get me a pair of Athlon MP 2400 chips
or a single HT P4 2.8GHz. The Athlon route sure sounds nicer, but the
MB cost looks to be nearly $100 more.
-WD
Felger Carbon
08-21-2003, 03:29 PM
"George Macdonald" <fammacd=!SPAM^nothanks@tellurian.com> wrote in
message news:3f44be29.4121573@news.tellurian.com...
> With some software, doing audio capture, on quiet music passages the
> CPU would churn badly... compared with a PIII or Athlon. The reason I
> saw mentioned was that the P4 was doing underflows to denormal results
> instead of truncation. I never got the details of whether the software had
> been "converted" to use SSE2 but I suspect so.
Any microcomputer architecture is a collection of compromises. One of
the P4 compromises was, how fast do you want to handle the extremely
rare floating point underflows? The P4 architects assigned relatively
few assets to this function, which makes sense for average programs.
As you indicate, crunching digital audio, one frequently encounters
underflows when using single precision floating point.
> Anyway there is a flag to do the cut-off/truncate in the control register so I was puzzled as to why
> anybody would leave it off and allow the denormals.
Using denormals when crunching digital audio reduces the
digital-artifact noise level. For other computers than the P4, which
handle denorms more quickly, the optimum situation is to leave the
denorms "on" to reduce the noise level. Software developers evidently
had not developed software specifically intended for the P4. For the
P4, the best software compromise is to use double precision FP; this
would halve the data crunching rate (not a problem for audio on the
P4) and assure that denorms would never occur.
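To make the single versus double precision point concrete, here is a small Python sketch. It is only an illustration: it emulates 32-bit floats by round-tripping through the standard `struct` module (Python's own floats are 64-bit), and the helper names are invented. It halves a value repeatedly, roughly the way a fading audio tail shrinks, and reports when the single-precision copy first drops below the smallest normal 32-bit float (2^-126).

```python
import struct

FLT_MIN_NORMAL = 2.0 ** -126    # smallest normal 32-bit float
DBL_MIN_NORMAL = 2.0 ** -1022   # smallest normal 64-bit float


def to_f32(x):
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]


def halvings_until_denormal(start=1.0, limit=2000):
    """Halve `start` in emulated single precision.

    Returns (n, value) for the first result that is nonzero but below
    the smallest normal float32, or (None, 0.0) if it reaches zero first.
    """
    y = to_f32(start)
    for n in range(1, limit + 1):
        y = to_f32(y * 0.5)
        if y == 0.0:
            return None, 0.0
        if y < FLT_MIN_NORMAL:
            return n, y
    return None, y


steps, tiny = halvings_until_denormal()
print("float32 goes denormal after", steps, "halvings")
print("same value is still a normal double:", tiny > DBL_MIN_NORMAL)
```

The value that is already denormal as a 32-bit float sits far above the denormal threshold for a 64-bit double, which is the reason the double-precision workaround keeps realistic audio levels out of the slow path.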
The P4's set of compromises is well selected for streaming video,
which Intel believes is the coming thing. It's inevitable that the
compromises will bite you on the leg in some specific situations.
Stacey
08-21-2003, 05:42 PM
David Winter wrote:
I read offline, and neither of these posts made any meaningful sense to me. Yes OK, so I should be like you and have Broadband, well soon, when the $$s stiffen up a bit. But now, please think of offliners on pay-by-the-hour ISPs, and those who pay by the minute for the phone line. Give a sysnopsis and an opinion as well as the source.
Why not read the link later if it interests you? It's an opposing, in-depth
article about how hyperthreading helps, written by people who did extensive
testing. Seems like a waste of my time to condense this for you so you
don't have to be bothered with looking at the article.
BTW how long does it take to download 750B of text anyway for this to be
such a waste to you? I was on dial up until a few months ago and
downloading a whole group of text articles doesn't take very long. What
causes more bandwidth waste is top posters who snip none of the text in
their reply.
--
Stacey
Keith R. Williams
08-21-2003, 06:06 PM
In article <DBc1b.2695$lw4.2108@newsread3.news.pas.earthlink.net>, fmrfne@jps.net says...
> "George Macdonald" <fammacd=!SPAM^nothanks@tellurian.com> wrote in message news:3f44be29.4121573@news.tellurian.com...
>> With some software, doing audio capture, on quiet music passages the CPU would churn badly... compared with a PIII or Athlon. The reason I saw mentioned was that the P4 was doing underflows to denormal results instead of truncation. I never got the details of whether the software had been "converted" to use SSE2 but I suspect so.
> Any microcomputer architecture is a collection of compromises. One of the P4 compromises was, how fast do you want to handle the extremely rare floating point underflows? The P4 architects assigned relatively few assets to this function, which makes sense for average programs.
To be fair Felg, they also assigned little importance to multiply
and shifts. I believe (though could be wrong) this is the reason
for the crappy denorm performance. In any case, the P4 is a
horrid compromise.
As you indicate, crunching digital audio, one frequently encounters underflows when using single precision floating point.
Anyway there is a flag to do the cut-off/truncate in the control register so I was puzzled as to why anybody would leave it off and allow the denormals. Using denormals when crunching digital audio reduces the digital-artifact noise level. For other computers than the P4, which handle denorms more quickly, the optimum situation is to leave the denorms "on" to reduce the noise level. Software developers evidently had not developed software specifically intended for the P4. For the P4, the best software compromise is to use double precision FP; this would halve the data crunching rate (not a problem for audio on the P4) and assure that denorms would never occur.
Sounds good. What were the P4 architects thinking? Actually,
having "listened" to one on c.a. he was similarly stumped at
*management*. ...he's now working for AMD, BTW.
The P4's set of compromises is well selected for streaming video, which Intel believes is the coming thing. It's inevitable that the compromises will bite you on the leg in some specific situations.
I don't buy it. IMO they rushed the P4 out before the process
would allow a decent implementation. Two functional units (and
two cross-chip transfers) to do a simple integer multiply? Wow!
What a perfect example of sub-optimization of *two* execution
units! ...then there is the (lack of a) shifter.
No, I believe the P4 was the result of a pants-down moment for
Intel. The fact that it does well at streaming stuff is simply
because such things don't show the (current implementation of
the) P4's glass jaw.
--
Keith
Felger Carbon
08-21-2003, 07:56 PM
"Keith R. Williams" <krw@attglobal.net> wrote in message
news:MPG.19af44b53b33837798a5eb@enews.newsguy.com... In article <DBc1b.2695$lw4.2108 @newsread3.news.pas.earthlink.net>, fmrfne@jps.net says... The P4's set of compromises is well selected for streaming video, which Intel believes is the coming thing. It's inevitable that
the compromises will bite you on the leg in some specific situations. I don't buy it. IMO they rushed the P4 out before the process would allow a decent implementation. Two functional units (and two cross-chip transfers) to do a simple integer multiply? Wow! What a perfect example of sub-optimiztion of *two* execution units! ...then there is the (lack of a) shifter. No, I believe the P4 was the result of a pants-down moment for Intel. The fact that it does well at streaming stuff is simply because such things don't show the (current implementation of the) P4's glass jaw.
Keith, I'm sure that readers of the .chips NG will be shocked to learn
that you and I do not always agree. ;-)
While we seem to be agreed that the P4 does well on streaming video,
you seem to think that's an accidental benefit of the P4
microarchitecture, while I think it represents deliberate choices by
Intel and is not accidental.
Keith R. Williams
08-22-2003, 04:25 AM
In article <Lvg1b.2983$lw4.258@newsread3.news.pas.earthlink.net>,
fmrfne@jps.net says... "Keith R. Williams" <krw@attglobal.net> wrote in message news:MPG.19af44b53b33837798a5eb@enews.newsguy.com... In article <DBc1b.2695$lw4.2108 @newsread3.news.pas.earthlink.net>, fmrfne@jps.net says... The P4's set of compromises is well selected for streaming video, which Intel believes is the coming thing. It's inevitable that the compromises will bite you on the leg in some specific situations. I don't buy it. IMO they rushed the P4 out before the process would allow a decent implementation. Two functional units (and two cross-chip transfers) to do a simple integer multiply? Wow! What a perfect example of sub-optimiztion of *two* execution units! ...then there is the (lack of a) shifter. No, I believe the P4 was the result of a pants-down moment for Intel. The fact that it does well at streaming stuff is simply because such things don't show the (current implementation of the) P4's glass jaw. Keith, I'm sure that readers of the .chips NG will be shocked to learn that you and I do not always agree. ;-) While we seem to be agreed that the P4 does well on streaming video, you seem to think that's an accidental benefit of the P4 microarchitecture, while I think it represents deliberate choices by Intel and is not accidental.
Oh no! We agree here, but only as far as you go. Where Intel screwed
up the P4 is not in the bandwidth (where streaming video comes into
play), but in its performance in other areas. Deliberate choices? Sure.
They chose to spend so much silicon on making it a bandwidth behemoth
they had nothing left for computations. ;-) Appropriate choices?
Debatable (and at least one P4 architect didn't agree with those
choices). I think they've figured that out and will spend some more
silicon at 90nm (a larger Icache wouldn't hurt either).
--
Keith
Felger Carbon
08-22-2003, 11:23 AM
"Keith R. Williams" <krw@attglobal.net> wrote in message
news:MPG.19afd5b9996c06dd989a34@enews.newsguy.com... In article <Lvg1b.2983$lw4.258@newsread3.news.pas.earthlink.net>, fmrfne@jps.net says... While we seem to be agreed that the P4 does well on streaming
video, you seem to think that's an accidental benefit of the P4 microarchitecture, while I think it represents deliberate choices
by Intel and is not accidental. Oh no! We agree here, but only as far as you go. Where Intel
screwed up the P4 is not in the bandwidth (where streaming video comes into play), but in its performance in other areas. Deliberate choices?
Sure. They chose to spend so much silicon on making it a bandwidth
behemoth they had nothing left for computations. ;-) Appropriate choices? Debatable (and at least one P4 architect didn't agree with those choices). I think they've figured that out and will spend some more silicon at 90nm (a larger Icache wouldn't hurt either).
Keith (feigning horror :)! You _know_ the P4 doesn't even _have_ an
Icache! How can the Icache be larger? It does have a trace cache,
and I think the 90nm Prescott does increase the size of the trace
cache from 12K, perhaps to 16K. The shift problem is fixed, too.
Don't know yet about the integer multiply.
You know this stuff, Keith, so you can stop reading now. But for some
of the other .chips lurkers, who may not be familiar with the P4, let
me explain. Like all modern x86 CPUs, the P4 decodes complex x86
instructions into primitive operations called uops (micro operations).
The decoded uops are then stored in the P4's "trace cache", which
replaces the traditional instruction cache, or Icache.
The next time the instruction is needed, it is already decoded and can
be fetched from the trace cache, bypassing the decode latency. Up to
3 uops can be fetched and executed on one clock cycle. BUT here is
yet another P4 compromise: If the next instruction is not in the
trace cache, it must be decoded. But the P4 only has _one_
instruction decoder! So, for new code or old code that's been
expelled from the trace cache, the P4 is not even a superscalar CPU!
In contrast, the PIII and Athlon _are_ superscalar CPUs with
(necessarily) more than one instruction decoder.
Isn't microarchitecture fun? ;-)
Keith R. Williams
08-22-2003, 12:13 PM
In article <_4u1b.3510$lw4.2614@newsread3.news.pas.earthlink.net>,
fmrfne@jps.net says... "Keith R. Williams" <krw@attglobal.net> wrote in message news:MPG.19afd5b9996c06dd989a34@enews.newsguy.com... In article <Lvg1b.2983$lw4.258@newsread3.news.pas.earthlink.net>, fmrfne@jps.net says... While we seem to be agreed that the P4 does well on streaming video, you seem to think that's an accidental benefit of the P4 microarchitecture, while I think it represents deliberate choices by Intel and is not accidental. Oh no! We agree here, but only as far as you go. Where Intel screwed up the P4 is not in the bandwidth (where streaming video comes into play), but in its performance in other areas. Deliberate choices? Sure. They chose to spend so much silicon on making it a bandwidth behemoth they had nothing left for computations. ;-) Appropriate choices? Debatable (and at least one P4 architect didn't agree with those choices). I think they've figured that out and will spend some more silicon at 90nm (a larger Icache wouldn't hurt either). Keith (feigning horror :)! You _know_ the P4 doesn't even _have_ an Icache! How can the Icache be larger? It does have a trace cache,
Well, they don't put Ds in there, so it's an Icache to me! Just because
they store little ops in there instead of big ones doesn't mean it's
not an Icache. ;-)
and I think the 90nm Prescott does increase the size of the trace cache from 12K, perhaps to 16K. The shift problem is fixed, too. Don't know yet about the integer multiply.
I'd bet it is. It's a biggie too.
You know this stuff, Keith, so you can stop reading now.
Gotcha! It's getting close to beer-thirty on a FRIDAY anyway.
--
Keith
Felger Carbon
08-23-2003, 01:17 PM
"Alexander Grigoriev" <alegr@earthlink.net> wrote in message
news:%oO1b.516$Jh2.193@newsread4.news.pas.earthlink.net... So, if I understand correctly, the problem is that denormals are
slow to load into FPU (since it requires unknown shift for normalization),
and refuse to die out in a regular IIR filter; so once the source sound
is silenced to zero, the IIR filters start giving denormals on the
output and never go down to zero (because of rounding). I think one possible solution could be to set 80 bit internal
precision in FPU (instead of default 64 bits), and rounding toward zero. This
would help to eventually drive IIR output to zero. If the source code is available, though, it's better to just check
for a denormal and turn it to zero.
Alexander, you almost have it right! The problem is that digital
audio underflows using *single precision* (32-bit) floating point.
Switching to double precision (64-bit) is all that's needed, and the
P4's SSE2 unit can handle two such operations per clock. 80-bit is
overkill.
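The underflow behaviour discussed above can be modelled in a few lines. This is only a rough sketch of the arithmetic, not of P4 timing: it assumes a one-pole IIR filter y[n] = feedback * y[n-1] fed with silence, and the names `run_filter` and `to_f32` are made up for illustration. Single precision is emulated by rounding each result through a 32-bit float.

```python
import struct
import sys

F32_MIN_NORMAL = 2.0 ** -126         # smallest normal single-precision value
F64_MIN_NORMAL = sys.float_info.min  # smallest normal double-precision value


def to_f32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def run_filter(feedback, steps, single=True, flush=False):
    """Run y[n] = feedback * y[n-1] from y = 1.0 with silent input.

    single: round every result to single precision (otherwise stay in double).
    flush:  replace any denormal result with zero (the "check for a denormal
            and turn it to zero" approach).
    Returns (final_y, number_of_steps_whose_result_was_left_denormal).
    """
    min_normal = F32_MIN_NORMAL if single else F64_MIN_NORMAL
    y = 1.0
    denormal_steps = 0
    for _ in range(steps):
        y = feedback * y
        if single:
            y = to_f32(y)
        if y != 0.0 and abs(y) < min_normal:
            if flush:
                y = 0.0
            else:
                denormal_steps += 1
    return y, denormal_steps


if __name__ == "__main__":
    for label, kwargs in [
        ("single precision", dict(single=True)),
        ("single + flush", dict(single=True, flush=True)),
        ("double precision", dict(single=False)),
    ]:
        y, n = run_filter(0.999, 120_000, **kwargs)
        print(f"{label:18s} final={y!r} denormal steps={n}")
```

With a feedback of 0.999 the single-precision run drops into the denormal range and stays stuck at a nonzero denormal, because rounding to nearest stops the value from shrinking any further. The flushed run ends at exactly zero, and the double-precision run never gets near its own denormal range over the same number of steps.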
George Macdonald
08-25-2003, 03:34 PM
On Thu, 21 Aug 2003 23:29:39 GMT, "Felger Carbon" <fmrfne@jps.net> wrote:
"George Macdonald" <fammacd=!SPAM^nothanks@tellurian.com> wrote in message news:3f44be29.4121573@news.tellurian.com...
With some software, doing audio capture, on quiet music passages the CPU would churn badly... compared with a PIII or Athlon. The reason I saw mentioned was that the P4 was doing underflows to denormal results instead of truncation. I never got the details of whether the software had been "converted" to use SSE2 but I suspect so.
Any microcomputer architecture is a collection of compromises. One of the P4 compromises was, how fast do you want to handle the extremely rare floating point underflows? The P4 architects assigned relatively few assets to this function, which makes sense for average programs. As you indicate, crunching digital audio, one frequently encounters underflows when using single precision floating point.
So is that lack of "assets" evident in the x87 stack-based FPU too?... IOW
not just the SSE2?
Anyway there is a flag to do the cut-off/truncate in the control register so I was puzzled as to why anybody would leave it off and allow the denormals.
Using denormals when crunching digital audio reduces the digital-artifact noise level. For other computers than the P4, which handle denorms more quickly, the optimum situation is to leave the denorms "on" to reduce the noise level.
Even with only 8 bits for the biased exponent do numbers that small really
matter? The difference between 2^-128, 2^-129, etc. and 0 seems awful
small here. I'm more used to the opposite problem though, where I wish
there was a programmable hardware zero tolerance/floor for FP numbers.
Rgds, George Macdonald
"Just because they're paranoid doesn't mean you're not psychotic" - Who, me??
Alexander Grigoriev
08-26-2003, 06:46 AM
Does it occur with SSE2 instructions or with plain FPU instructions?
"Felger Carbon" <fmrfne@jps.net> wrote in message
news:jRQ1b.822$3E.699@newsread3.news.pas.earthlink.net... "Alexander Grigoriev" <alegr@earthlink.net> wrote in message news:%oO1b.516$Jh2.193@newsread4.news.pas.earthlink.net... So, if I understand correctly, the problem is that denormals are slow to load into FPU (since it requires unknown shift for normalization), and refuse to die out in a regular IIR filter; so once the source sound is silenced to zero, the IIR filters start giving denormals on the output and never go down to zero (because of rounding). I think one possible solution could be to set 80 bit internal precision in FPU (instead of default 64 bits), and rounding toward zero. This would help to eventually drive IIR output to zero. If the source code is available, though, it's better to just check for a denormal and turn it to zero. Alexander, you almost have it right! The problem is that digital audio underflows using *single precision* (32-bit) floating point. Switching to double precision (64-bit) is all that's needed, and the P4's SSE2 unit can handle two such operations per clock. 80-bit is overkill.
George Macdonald
08-27-2003, 09:52 PM
On Tue, 26 Aug 2003 04:39:19 GMT, "Felger Carbon" <fmrfne@jps.net> wrote:
"George Macdonald" <fammacd=!SPAM^nothanks@tellurian.com> wrote in message news:3f49d1b7.124325022@news.tellurian.com...
Any microcomputer architecture is a collection of compromises. One of the P4 compromises was, how fast do you want to handle the extremely rare floating point underflows? The P4 architects assigned relatively few assets to this function, which makes sense for average programs. As you indicate, crunching digital audio, one frequently encounters underflows when using single precision floating point.
So is that lack of "assets" evident in the x87 stack-based FPU too?... IOW not just the SSE2?
A denorm is a denorm. I doubt if the x87 would be handled differently than SSE2. In any event, we are discussing what happens when the SSE2 unit is used for single precision arithmetic in code written for other computers' SSE units, aren't we? Even if the x87 unit _is_ different in this respect, the preferred solution would be to reprogram using the 2x faster SSE2 unit using double precision.
Ah so most of the audio processing software is using SSE anyway - no legacy
x87 stuff left.
Using denormals when crunching digital audio reduces the digital-artifact noise level. For other computers than the P4, which handle denorms more quickly, the optimum situation is to leave the denorms "on" to reduce the noise level.
Even with only 8 bits for the biased exponent do numbers that small really matter? The difference between 2^-128, 2^-129, etc. and 0 seems awful small here. I'm more used to the opposite problem though, where I wish there was a programmable hardware zero tolerance/floor for FP numbers.
Audio, unlike video, has an extremely wide dynamic range. As you have pointed out, audio underflows do in fact occur.
Yes but if you take an example of a denormal in SP FPU as 2^-130, does the
difference between that and cut-off/truncate to 0, by setting the flag,
really introduce significant digital artifact noise? It basically changes
the floor from ~10^-45 to ~10^-38.
Rgds, George Macdonald
"Just because they're paranoid doesn't mean you're not psychotic" - Who, me??
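For reference, the two floors mentioned above can be checked directly from the IEEE 754 single-precision bit patterns. This is a small sketch; the helper names `f32_from_bits` and `flush_to_zero` are made up for illustration and do not correspond to any hardware flag.

```python
import struct


def f32_from_bits(bits):
    """Reinterpret a 32-bit pattern as an IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


smallest_denormal = f32_from_bits(0x00000001)  # 2**-149, floor with denormals on
smallest_normal = f32_from_bits(0x00800000)    # 2**-126, floor with denormals flushed
example = 2.0 ** -130                          # the denormal used as an example above


def flush_to_zero(x):
    """Software stand-in for a flush-to-zero mode: denormals become 0."""
    return 0.0 if abs(x) < smallest_normal else x


print(f"{smallest_denormal:.2e}")  # 1.40e-45
print(f"{smallest_normal:.2e}")    # 1.18e-38
print(f"{example:.2e}")            # 7.35e-40
print(flush_to_zero(example))      # 0.0
```

So 2^-130 sits between the two floors: it is representable only as a denormal, and a flush-to-zero mode would turn it into 0.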
Felger Carbon
08-27-2003, 10:58 PM
"George Macdonald" <fammacd=!SPAM^nothanks@tellurian.com> wrote in
message news:3f4d742e.57613980@news.tellurian.com... Yes but if you take an example of a denormal in SP FPU as 2^-130, does
the difference between that and cut-off/truncate to 0, by setting the
flag, really introduce significant digital artifact noise? It basically
changes the floor from ~10^-45 to ~10^-38.
Dunno. It _would_ raise the floor some, but the result would depend on
whether you're working with a rock band (all loud) or a symphony
orchestra. Keep in mind that digital audio works with lotsa little tiny
snippets of the sound. Having some snippets be _really_ small on quiet
passages should not be surprising.
In any event, the solution is for the commercial software vendors to
modify their program(s) specifically for the P4. They can do this by
using double-precision FP SSE2 with absolutely no adverse effects on the
dynamic range, nor any degradation of performance compared to
SSE-capable computers for which the software in question was doubtless
written.
Felger Carbon
08-28-2003, 12:40 PM
"Alexander Grigoriev" <alegr@earthlink.net> wrote in message
news:jgo3b.7924$3E.7065@newsread3.news.pas.earthlink.net... Are you talking about 130-bits sound or what? Denormals are at -760 dB level. If you have 24 bits source recording, your "quiet passages" won't be
less than some -140 dB. Then it's digital and analog noise, anyway.
Your logic is sound (pun intended :). However, this thread started
because denorms are in fact a problem on the P4. There is either a
problem with your logic or a problem with the real world. ;-)
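The decibel figures quoted above are easy to reproduce. A quick sketch, assuming amplitudes are measured relative to a full scale of 1.0 and taking 2^-24 as a rough stand-in for the bottom of a 24-bit recording (both are assumptions made for illustration; `dbfs` is a made-up helper name):

```python
import math


def dbfs(amplitude):
    """Level of an amplitude in dB relative to a full scale of 1.0."""
    return 20.0 * math.log10(amplitude)


print(round(dbfs(2.0 ** -126), 1))  # -758.6  smallest normal single-precision value
print(round(dbfs(2.0 ** -149), 1))  # -897.1  smallest single-precision denormal
print(round(dbfs(2.0 ** -24), 1))   # -144.5  roughly the bottom of a 24-bit source
```

The first figure matches the "-760 dB" quoted above: single-precision denormals only begin some 600 dB below anything a 24-bit source can contain, which is why they can only come from the processing itself (for example a decaying filter state) and not from the recording.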
Martin Atkinson-Barr wrote:
"Stacey" <fotocord@yahoo.com> wrote in message news:bhun1h$3aegh$5@ID-52908.news.uni-berlin.de...
Scott Alfter wrote:
In article <bhsa2r$2mtng$2@ID-52908.news.uni-berlin.de>, Stacey <fotocord@yahoo.com> wrote:
I personally compared an XP2400 vs a P4 2.4/533 (before the HT chips were out) using a vegas video 3c render test file. The P4 completed the job in 1/2 the time.
Now try comparing comparably priced processors. An Athlon XP2400+ is
around $75. What Intel processor can I get for around $74? A Pentium 4
1.4 GHz, or a Celeron 2.4 GHz?
For many uses AMD's are fine, for video editing they are the WRONG platform to use.
Have you looked at any benchmarks for the Athlon 64? How few people
use their PCs mostly for video editing? Is it around 5% or less of PC users?
Comparing TMPGEnc (encoding from an Avisynth script) on a dual Athlon MP 1900 to a 2.8-GHz P4 didn't quite produce opposite results, but the dually was significantly faster than the P4 (don't remember the exact numbers offhand, but it wasn't a small difference...I think the dually did the job in about a third less time).
Guess I should have said a single AMD system is the wrong choice. Is TMPGEnc SSE2 optimised? Vegas and its encoder are and a P4 runs much faster. Never tried a dual MP system so can't comment if it would be faster than a P4 but have noticed most video "turn key" systems use dual AMD's on the lower end one and P4's on the higher end stuff.
LOL! Do you remember all of those links I provided to Hollywood studios
using AMD based systems? If you search the AMD website you will
find them. The Athlon 64 and Opteron have SSE2 support, so we will
probably see many more AMD processors being used for professional
video.
Like I said for most uses AMD's work great but video editing isn't their strong point.
It depends which processors we are talking about.
-- Stacey
I think that Intel has been very active in getting video editing apps optimized for the P4 architecture. Especially using SSE2 which is not supported on the P3s and the Athlons. SSE2 is implemented on the Athlon64 and Opterons. Once Windows XP64 for AMD64 is out I hope that the domestic video editors will take advantage of it. Given the short time until the launch of the Athlon64s I would wait before buying any video editing setup.