*Suspended*
Joined: Aug 2000 · Location: Seattle, WA
Posts: 6,075
[Repost] What exactly went wrong with NV3X??
I just stumbled on this month-old article while looking for X800 benchmarks on Anandtech... some of you have probably seen it already, but I decided to post it anyway and ask what everyone thinks...
Since the article is a month old, it says nothing at all about DirectX 9.0c.

Article: http://www.anandtech.com/video/showdoc.html?i=2031&p=1
Conclusion: http://www.anandtech.com/video/showdoc.html?i=2031&p=4

Final Words

In talking about pure pixel drawing power, NV35 and NV38 didn't have it too bad, as their clock speeds helped push fill rate up to 1800 and 1900 Mpixels/s at their theoretical peaks. This number is simply a multiplication of how many pixels can be drawn at a time and clock speed. The NV3x architecture could also push twice as many textured pixels (if multitexturing was employed) or twice as many z/stencil operations as pixels.

The problems with NV3x performance didn't come from theoretical maximum limitations, but rather from not being able to come anywhere near those maximums in the real world, due to all the issues we have explored in addition to a couple of other caveats. Here's a brief rundown of the bottlenecks. If a game uses single textures rather than multitextures, texture rate is automatically cut in half. If very complex vertex or pixel shaders are used, multiple clock cycles can be spent per pixel without drawing anything; this is heavily affected both by how many pixels can be worked on at one time and by how well the shaders handle common shader code. Enabling antialiasing incurs a performance hit, as do trilinear and anisotropic filtering. There will always be some overdraw (pixels being drawn on top of other pixels), which also wastes time. This all translates into a good amount of time spent not drawing pixels, on an architecture without a lot of leeway for that.

In moving to NV40 there were lots of evolutionary fixes that really helped bring the performance of the architecture up. The most significant improvements were touched on earlier: the quadrupling of the pixel pipes while doubling the number of texture units (creating a 16x1 architecture), and increasing the number of vertex shader units while adding a second math unit and more registers to the pixel shaders to avoid scheduling issues. Further improvements in NV40 help eliminate hidden pixels earlier in the pipeline, at the vertex shaders (which avoids unnecessary work), and optimizations were made to the anisotropic filtering engine to match ATI's method of using approximated (rather than actual) distances.

In the end, it wasn't the architecture of the NV3x GPU that was flawed, but rather an accumulation of an unfortunate number of smaller issues that held the architecture back from its potential. It is important to take away from this that NV40 is very evolutionary, and that NVIDIA pushed very hard to make this next step an incredible leap in performance. In order to do so, they have had to squeeze 222 million transistors onto something in the neighborhood of a 300 mm^2 die. By its very nature, graphics rendering is infinitely parallelizable, and NVIDIA has taken clear advantage of this, but it has certainly come at a cost. They needed the performance leap, and now they will be in a tough position when it comes to making money on this chip. Yields will be lower than NV3x, but retail prices are not going to move beyond the $500 mark.

On a side note, ATI will most likely not be this aggressive with their new chip. The performance of the R300 has been very good in the current games we have seen, and they just won't need to push as hard as NVIDIA did this time around. Their architecture was already well suited to the current climate, and we can expect small refinements as well as an increase in the width of their pixel pipe (which also looks like it will be 16x1). Of course, ATI's performance this time around will be increased, but just how much will have to remain a mystery for a couple more weeks.
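The 1800/1900 Mpixels/s figures above are just pixels-per-clock times core clock, as the article says. A minimal sketch of that arithmetic; the 450/475 MHz core clocks for NV35/NV38 are the commonly cited shipping speeds, my assumption rather than numbers stated in the article:

```python
# Theoretical peak fill rate = pixels drawn per clock x core clock (MHz).
def fill_rate_mpixels(pixels_per_clock: int, clock_mhz: int) -> int:
    return pixels_per_clock * clock_mhz

# 450/475 MHz are the commonly cited core clocks for these parts
# (my assumption; the article does not state them).
for name, ppc, mhz in [("NV35", 4, 450), ("NV38", 4, 475)]:
    base = fill_rate_mpixels(ppc, mhz)
    # Per the article, multitextured pixels or z/stencil-only
    # operations run at double this rate on NV3x.
    print(f"{name}: {base} Mpixels/s peak ({2 * base} doubled)")
# Prints 1800 and 1900 Mpixels/s, matching the article's figures.
```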
*Suspended*
Joined: Aug 2000 · Location: Seattle, WA
Posts: 6,075
Today I saw another Anand piece reviewing NV3X and NV40, and I came away full of doubts. Is NV3X really that bad? I use an NV38 myself, and friends on NV35/36 don't seem nearly that unhappy either...
CPU magazine, June issue, page 28, Anand's column

Anand's Corner: GeForce 6--Finally

I was talking to some developers a while back about nVidia and, at the time, its flagship chip, known as NV35. I asked the developers what they thought about NV35 and they responded with a blunt: "We're surprised nVidia has managed to get away with what they have for so long." Indeed, nVidia did get away with a lot. Although the NV3X series of chips was clearly inferior to ATI's R3XX family, nVidia honestly got off easy. It caught flak from the community, especially when it came to that dirty word "optimization" and its drivers, but people still bought NV3X parts, and nVidia was still alive and well by the end of the 12-month FX product cycle.

The company was a bit bruised, and any egos that were present have since deflated, and it was almost fitting that nVidia should feel the pain it put ATI through for so many product cycles in a row, where ATI's fastest chips just never seemed able to dethrone its competition. nVidia lost a lot of ground and credibility with the GeForce FX series of parts, and honestly, most nVidia owners in 2003 were still nVidia owners because they had yet to upgrade their GeForce4s, quite possibly the last good nVidia chip... until now.

Talk of NV40 (nVidia's next-generation architecture) started happening early, and it started off the same way as NV38 did before it, and the same way as NV35 did before that: "We've learned from our mistakes," "shame on us," and so forth. The first time you hear it, you believe it, especially from a company with nVidia's once-strong reputation, but by the third time, you're unwilling to believe anything. Those three strikes put nVidia almost out of the credibility ballgame with NV30, NV35, and NV38, so I was dubious about NV40's "we've learned from our mistakes" message. (So NV3X struck out??)

Eventually some of the architecture was revealed, and it became clear that nVidia had fixed what it said it would. NV40, the first chip in the GeForce 6 line (note the dropping of the FX from the name), has a number of improvements that boost performance considerably. nVidia has finally admitted that the NV38 was really designed to render four pixels per clock and, in some cases, was capable of doing eight per clock. (My face went pale reading this...) For a long time nVidia skirted this issue, but now that it is no longer bound by the NV3X architecture, it's easy to point out flaws (not so easy for the poor saps who bought NV3X cards... I don't feel like a victim... is it really that serious??). While NV3X could spit out between four and eight pixels per clock, NV40 is a pure 16-pixel-per-clock chip, and in some rare cases it can render even more.

To be continued...
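The per-clock rates the column describes can be tabulated directly. A tiny sketch; the 32 z/stencil-only figure for NV40 is my guess at what the "rare cases" remark refers to, not a number given in the column:

```python
# Per-clock pixel rates as the column describes them: NV3x draws 4
# color pixels per clock (8 in some cases, e.g. z/stencil-only work);
# NV40 is a pure 16-pixel-per-clock design. The 32 figure for NV40
# is my assumption, not a number from the column.
PIXELS_PER_CLOCK = {
    "NV3x": {"color": 4, "z_stencil_only": 8},
    "NV40": {"color": 16, "z_stencil_only": 32},
}

for chip, rates in PIXELS_PER_CLOCK.items():
    print(f"{chip}: {rates['color']} color pixels/clk, "
          f"{rates['z_stencil_only']} z/stencil ops/clk")
```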
*Suspended*
Joined: Aug 2000 · Location: Seattle, WA
Posts: 6,075
nVidia included more math units on the chip and removed a handful of limitations on instruction sequences, to get maximum performance out of NV40. While the NV3X architecture relied heavily on nVidia's run-time compiler in the driver to run shaders at a reasonable speed, the NV40 architecture is inherently more flexible.
nVidia has also added a dedicated video processor on the NV40 that is about as complex as the original GeForce. (What a terrifying VPU...) What can the dedicated video processor do? Finally offload CPU-intensive encoding from your CPU onto a very fast and expensive GPU. The result? Imagine real-time 1080i MPEG-4 or DivX encoding done on your computer, virtually regardless of your host CPU speed. (So from now on, whether or not the transcoding software sucks up to Intel, Athlon users can encode away happily??) The other benefit? nVidia says that the video processor will make it into all NV4X chips, so eventually there will be a $79 card with this functionality on it. (Even the low-end cards will get a VPU... heh... the image-quality situation should improve, right?)

All this comes at a cost: at 220 million transistors, the NV40 is bigger than Intel's largest desktop chip, and you can estimate an almost Itanium-like die size. It doesn't matter much to end users, but don't plan on overclocking NV40 much. (Not much overclocking headroom... ugh.) Also, the company needs a smaller manufacturing process to bring NV40 to the mainstream. (0.09 or 0.065 micron?? The problem is that even the CPU makers don't have those yet...)

Performance-wise, the first NV40-based card, the $500 GeForce 6800 Ultra, is fast. At resolutions above 1280 x 1024, it is at least twice as fast as the 9800XT. ATI is about to respond with the Radeon X800, but ATI may break a sweat this time around.

P.S. Typing this up was exhausting; don't expect me to translate the whole article...
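The "at least twice as fast as the 9800XT" claim roughly matches the raw fill-rate arithmetic. A quick back-of-the-envelope check; the 400/412 MHz core clocks are the commonly cited shipping speeds, my numbers rather than the column's:

```python
# Back-of-the-envelope peak fill rates (pipes x clock, Mpixels/s).
# The clocks are the commonly cited shipping speeds for these cards,
# not figures from the column.
cards = {
    "GeForce 6800 Ultra (NV40)": (16, 400),  # 16 pixel pipes
    "Radeon 9800 XT (R360)": (8, 412),       # 8 pixel pipes
}
rates = {name: pipes * mhz for name, (pipes, mhz) in cards.items()}
for name, rate in rates.items():
    print(f"{name}: {rate} Mpixels/s")
ratio = rates["GeForce 6800 Ultra (NV40)"] / rates["Radeon 9800 XT (R360)"]
print(f"Peak ratio: {ratio:.2f}x")  # ~1.94x, in line with the "2x" claim
```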
*Suspended*
Joined: Aug 2000 · Location: Seattle, WA
Posts: 6,075
After chatting with a friend, I was suddenly reminded: where did all of 3dfx's technology go??
Or is the only thing nVidia learned from 3dfx the Glide style of bluffing? (16-bit is absolutely enough; it just has to look like 22-bit; broken rendering doesn't matter either.) nVidia's current PR manager, Brian Burke, used to do PR for 3dfx.
*Suspended*
Joined: Aug 2000 · Location: Seattle, WA
Posts: 6,075
How come nobody's replying? Not even the ATI crowd chiming in??
Junior Member
Joined: Jul 2002
Posts: 772
Quote:
Too much English
*Suspended*
Joined: Aug 2000 · Location: Seattle, WA
Posts: 6,075
Quote:
So that's why...
Master Member
Joined: Jun 2002 · Location: A place with rather high power consumption.
Posts: 1,959
How should I put it... AnandTech leans a bit toward ATI...
Simply put, this article doesn't actually say much. There is only one key point: an 8x1 architecture beats a 4x2 architecture, because multi-texturing is used less and less.

The reason multi-texture designs were once considered better is that the other hardwired components of a pipeline are not small, so you couldn't afford to build many more pipelines. NVIDIA's first multi-texture architecture was the NV1x generation (NV10 excepted), and from NV11/NV15 onward everything was an nx2 design...

But nowadays pixel shaders demand a considerable number of transistors, and by comparison the rest of the pipeline's components aren't that large, so the advantages of an nx1 design have resurfaced. Hence NV40 went back to nx1. When a capability goes unused, those transistors are simply wasted: NV35 shouldered higher costs while still failing to deliver the performance, taking the loss without getting any of the benefit. It's really that simple; the articles above honestly don't say much more than that. For that matter, the original Radeon was 2x3, which was even more extreme.

David Kirk mentioned in a recent PC Watch interview that 4-pipe and 8-pipe NV40 products will arrive soon, and that the 4-pipe part will perform on par with the FX 5950. In raw performance a 4-pipe NV4x (4x1) obviously won't truly beat a 4-pipe NV3x (4x2), but given the direction of the engine improvements, and from the viewpoint of that Anandtech article above, a 4x1 design will probably perform barely any differently from a 4x2.

====

As for that VPU... NV10 was roughly 23M transistors, and this piece says NV4x's VPU is supposedly of similar scale? 23M out of 220M... that indeed isn't a large share. For a low-end product, though, 23M is no small number... (er, I recall NV34 was only about 45M...)

This post was edited by Artx1 on 2004-06-24 09:20 PM.
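The key point here, that 8x1 beats 4x2 as multitexturing fades, can be made concrete with a toy throughput model. This is my own simplification, not a model from the post: assume a 4x2 pipe applies up to two textures to 4 pixels per clock, while an 8x1 pipe does 8 single-textured pixels per clock but loops back (two clocks per pixel) whenever a second texture is needed.

```python
# Toy model of effective throughput (pixels/clock) for 4x2 vs 8x1
# designs, as a function of how many pixels need two textures.
# My own simplification of the argument, not a model from the post.
def effective_rate(pipes: int, tmus_per_pipe: int, multi_frac: float) -> float:
    textures_needed = 2
    clocks_single = 1.0  # one texture always fits in one pass
    # Loop back for an extra pass when the pipe lacks TMUs.
    clocks_multi = max(1.0, textures_needed / tmus_per_pipe)
    avg_clocks = (1 - multi_frac) * clocks_single + multi_frac * clocks_multi
    return pipes / avg_clocks

for frac in (0.0, 0.5, 1.0):
    print(f"{frac:.0%} multitextured: "
          f"4x2 -> {effective_rate(4, 2, frac):.1f} px/clk, "
          f"8x1 -> {effective_rate(8, 1, frac):.1f} px/clk")
```

At 100% multitexturing the two designs tie at 4 pixels per clock; as single-texturing comes to dominate, the 8x1 design pulls ahead toward 8, which is exactly the trend described above.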
*Suspended*
Joined: Aug 2000 · Location: Seattle, WA
Posts: 6,075
Many thanks for the pointers, Artx1.