PCDVD Digital Technology Forum - View Single Post - [Repost] What exactly went wrong with NV3X??

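I was looking for X800 benchmarks on Anandtech just now when I came across this article from a month ago... Some of you may have read it already... but I decided to post it anyway and ask what everyone thinks...
Since the article is a month old, it says nothing at all about DirectX 9.0c.
First page: http://www.anandtech.com/video/showdoc.html?i=2031&p=1
Conclusion: http://www.anandtech.com/video/showdoc.html?i=2031&p=4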

Final Words
In talking about pure pixel drawing power, NV35 and NV38 didn't have it too bad, as their clock speed helped push fill rate up to 1800 and 1900 Mpixels/s at their theoretical peaks. This number is simply the number of pixels that can be drawn at a time multiplied by the clock speed. The NV3x architecture could also push twice as many textured pixels (if multitexturing was employed) or twice as many z / stencil operations as pixels. The problems with performance in NV3x didn't come from theoretical maximum limitations, but rather from not being able to come anywhere near theoretical maximums in the real world, due to all the issues we have explored in addition to a couple of other caveats. Here's a brief rundown of the bottlenecks.
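
To illustrate the multiplication described above, here is a minimal Python sketch. The figure of 4 pixels per clock and the 450 MHz / 475 MHz core clocks are assumptions that the article does not state; they are used here because they reproduce the quoted 1800 and 1900 Mpixels/s peaks. The function name is hypothetical.

```python
def peak_fill_rate_mpixels(pixels_per_clock: int, core_clock_mhz: float) -> float:
    """Theoretical peak fill rate in Mpixels/s: pixels drawn per clock times clock speed in MHz."""
    return pixels_per_clock * core_clock_mhz


# Assumed values (not given in the article): 4 pixels per clock,
# 450 MHz for NV35 and 475 MHz for NV38.
nv35_peak = peak_fill_rate_mpixels(4, 450)
nv38_peak = peak_fill_rate_mpixels(4, 475)

# The article says textured pixels (with multitexturing) or z / stencil
# operations can run at twice the pixel rate.
nv35_double_rate = 2 * nv35_peak
nv38_double_rate = 2 * nv38_peak

print(nv35_peak, nv38_peak)
print(nv35_double_rate, nv38_double_rate)
```

These are ceilings only; the rest of the article is about why real workloads fall well short of them.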

If a game uses single textures rather than multitextures, texture rate is automatically cut in half. If very complex vertex or pixel shaders are used, multiple clock cycles can be spent per pixel without drawing anything. This is heavily affected both by how many pixels we can be working on at one time, as well as how able the shaders are to handle common shader code. Enabling antialiasing incurs a performance hit, as does trilinear and anisotropic filtering. There will always be some overdraw (pixels being drawn on top of other pixels), which also wastes time. This all translates into a good amount of time spent not drawing pixels on an architecture without a lot of leeway for this.
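
The way these costs eat into the theoretical peak can be sketched with a toy model in Python. This is not a formula from the article, only a simplified illustration: it assumes that shader and filtering work can be folded into an average number of clock cycles per pixel, and that overdraw is an average count of how many times each visible pixel gets drawn. The function name and the example numbers are made up for illustration.

```python
def effective_fill_rate(peak_mpixels: float,
                        cycles_per_pixel: float = 1.0,
                        overdraw: float = 1.0) -> float:
    """Toy estimate of visible Mpixels/s.

    peak_mpixels     -- theoretical peak fill rate (Mpixels/s)
    cycles_per_pixel -- average clocks spent on each drawn pixel (>= 1);
                        long shaders, trilinear or anisotropic filtering
                        and antialiasing all push this up
    overdraw         -- average number of times each visible pixel is
                        drawn (>= 1)
    """
    if cycles_per_pixel < 1 or overdraw < 1:
        raise ValueError("cycles_per_pixel and overdraw must be at least 1")
    return peak_mpixels / (cycles_per_pixel * overdraw)


# Illustrative numbers only: an 1800 Mpixels/s peak, a shader averaging
# 4 clocks per pixel, and every visible pixel drawn 1.5 times on average.
print(effective_fill_rate(1800, cycles_per_pixel=4, overdraw=1.5))  # 300.0
```

Even with these modest made-up inputs the toy model lands at a sixth of the peak, which is the kind of gap the paragraph above describes.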

In moving to NV40, there were lots of evolutionary fixes that really helped bring the performance of the architecture up. The most significant improvements were touched on earlier: the quadrupling of the pixel pipes while doubling the number of texture units (creating a 16x1 architecture), and increasing the number of vertex shader units while adding a second math unit and more registers to pixel shaders to avoid scheduling issues. Further improvements in NV40 were made to help eliminate hidden pixels earlier in the pipeline at the vertex shaders (which helps keep from doing unnecessary work), and optimizations were made to the anisotropic filtering engine to match ATI's method of doing things with approximated (rather than actual) distances.
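
The effect of the pipe and texture unit change can be sketched in Python. The article only names NV40 as 16x1; the 4x2 layout used for NV3x below is inferred by undoing the "quadrupling" and "doubling", so treat it as an assumption. The class and method names are hypothetical, and the model simply assumes that a pipe needs one pass per group of texture layers its texture units can handle.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PipeConfig:
    pixel_pipes: int        # pixels that can be worked on per clock
    textures_per_pipe: int  # texture units attached to each pipe

    def textured_pixels_per_clock(self, layers: int) -> float:
        """Pixels finished per clock when each pixel needs `layers` textures."""
        passes = -(-layers // self.textures_per_pipe)  # ceiling division
        return self.pixel_pipes / passes

    def texels_used_per_clock(self, layers: int) -> float:
        """Texture samples actually fetched per clock for that workload."""
        return self.textured_pixels_per_clock(layers) * layers


NV3X = PipeConfig(pixel_pipes=4, textures_per_pipe=2)   # assumed 4x2
NV40 = PipeConfig(pixel_pipes=16, textures_per_pipe=1)  # 16x1 per the article

for name, cfg in (("NV3x", NV3X), ("NV40", NV40)):
    for layers in (1, 2):
        print(name, layers,
              cfg.textured_pixels_per_clock(layers),
              cfg.texels_used_per_clock(layers))
```

Under these assumptions the 4x2 layout uses only half of its texture units on single-textured pixels, while the 16x1 layout keeps all sixteen busy in both cases.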

In the end, it wasn't the architecture of the NV3x GPU that was flawed, but rather an accumulation of an unfortunate number of smaller issues that held the architecture back from its potential.

It is important to take away from this that NV40 is very evolutionary, and that NVIDIA was pushing very hard to make this next step an incredible leap in performance. In order to do so, they have had to squeeze 222 million transistors onto something in the neighborhood of a 300mm^2 die. By its very nature, graphics rendering is infinitely parallelizable, and NVIDIA has taken clear advantage of this, but it has certainly come at a cost. They needed the performance leap, and now they will be in a tough position when it comes to making money on this chip. Yields will be lower than NV3x, but retail prices are not going to move beyond the $500 mark.

On a side note, ATI will most likely not be this aggressive with their new chip. The performance of the R300 was very good with what we have seen of current games, and they just won't need to push as hard as NVIDIA did this time around. Their architecture was already well suited to the current climate, and we can expect small refinements as well as an increase in the width of their pixel pipe (which also looks like it will be 16x1). Of course, ATI's performance this time around will be increased, but just how much will have to remain a mystery for a couple more weeks.