Capacity. These occur when pages must be discarded and reloaded because the cache isn't big enough - making the cache bigger helps, but that makes it slower on cache hits.
Conflict. These occur when two or more pages compete for the same slot. They can be reduced by increasing the degree of associativity, but that again makes it slower on cache hits.
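To make the conflict case above concrete, here is a toy sketch of a direct-mapped cache (the slot count and block numbers are made up for illustration): two blocks that happen to map to the same slot evict each other on every access, even though the rest of the cache sits empty.

```python
# Illustrative sketch: conflict misses in a tiny direct-mapped cache.
# Slot count and the access trace below are invented for this example.
NUM_SLOTS = 8  # one block per slot, direct-mapped

def slot_of(block_addr):
    """Direct mapping: the slot is just the block address modulo slot count."""
    return block_addr % NUM_SLOTS

cache = [None] * NUM_SLOTS   # each entry records which block it holds
misses = 0
# Blocks 0 and 8 both map to slot 0, so they fight over the same slot.
for block in [0, 8, 0, 8, 0, 8]:
    s = slot_of(block)
    if cache[s] != block:
        misses += 1          # conflict miss: reload the evicted block
        cache[s] = block
print(misses)  # all 6 accesses miss, while 7 of the 8 slots stay empty
```

A 2-way set-associative design would hold both blocks in the same set and turn all but the first two accesses into hits.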
The "slower" in these two passages does refer to latency, which is fair enough, but the key point is that the cache size in question is the *total* size, not the L1 size; improving the hit rate is one of L2's jobs in the first place. L1 hit rates today are already 90%~98% or better, and usually higher in practice, so how much is there left to gain? And even when you do hit, what about the conflict rate? An oversized L1 does wasted work: the data it fetches may already be stale, so the fetch is effectively thrown away. A high conflict rate slows things down badly, because the miss penalty means refetching the data, and the whole operation stalls there waiting for it to arrive. That is unquestionably slower.
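The latency trade-off here is usually summarized by the average memory access time formula, AMAT = hit_time + miss_rate × miss_penalty. A quick worked example (the cycle counts and rates below are invented, not measurements of any real CPU) shows how a faster hit can beat a better hit rate:

```python
# AMAT = hit_time + miss_rate * miss_penalty.
# All figures below are illustrative, not taken from any real processor.
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time in cycles."""
    return hit_time + miss_rate * miss_penalty

# A small, fast L1 with a slightly worse hit rate...
fast_small = amat(hit_time=1.0, miss_rate=0.05, miss_penalty=20.0)
# ...versus a bigger, slower L1 with a slightly better hit rate.
big_slow = amat(hit_time=2.0, miss_rate=0.03, miss_penalty=20.0)
print(fast_small, big_slow)  # the faster hit wins despite more misses
```

With these (hypothetical) numbers the small cache averages 2.0 cycles per access against 2.6 for the big one: the extra hit latency costs more than the extra misses save.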
Set size or associativity (2^S). Direct mapping (S = 0) is simple and fast; greater associativity leads to more complexity, and thus slower access, but tends to reduce conflict misses. More on this later.
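For a fixed total size, raising the associativity 2^S just trades index bits for tag bits: more ways means fewer sets. A small sketch of the address split (the line size and line count are illustrative parameters, not from the quoted text):

```python
# How an address splits into (tag, set index, offset) as associativity grows.
# Illustrative geometry: 64-byte lines, 512 lines total (a 32 KB cache).
LINES = 512
LINE_BYTES = 64

def decompose(addr, S):
    """Split an address for a 2^S-way set-associative cache of fixed size."""
    ways = 2 ** S
    sets = LINES // ways            # more ways -> fewer sets to index
    offset = addr % LINE_BYTES      # byte within the cache line
    block = addr // LINE_BYTES      # line-sized block number
    index = block % sets            # which set the block maps to
    tag = block // sets             # what must match on lookup
    return tag, index, offset

# S = 0 is direct-mapped (512 sets); S = 3 is 8-way (64 sets).
print(decompose(0x12345, 0))
print(decompose(0x12345, 3))
```

The wider tag comparison across more ways is exactly where the extra complexity, and hence the slower hit, comes from.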
On cost: doesn't Intel's approach waste more of it? Why make the L2 that big while keeping the L1 so tiny? And look at whose cache actually ends up faster.
The passage below says:
The idea behind (b) is to increase flexibility and hence reduce conflict misses. A rule of thumb is that a k-way associative j-Kbyte cache is about as effective as a 2k-way associative cache of size j/2 Kbyte. However, as we've seen, such caches get slower - by reducing the number of misses, we're also reducing the speed of hits.
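That rule of thumb can be checked on a toy trace. The sketch below (an LRU set-associative simulator; the cache sizes and the access trace are deliberately contrived to provoke conflicts) compares a 2-way cache against a 4-way cache of half the size:

```python
# Toy LRU set-associative cache simulator; sizes and trace are contrived
# to show conflict behaviour, not to model any real processor.
from collections import OrderedDict

def count_misses(trace, num_lines, ways):
    """Count misses for an LRU set-associative cache holding num_lines blocks."""
    sets = num_lines // ways
    cache = [OrderedDict() for _ in range(sets)]
    total = 0
    for block in trace:
        s = cache[block % sets]
        if block in s:
            s.move_to_end(block)        # refresh LRU order on a hit
        else:
            total += 1
            if len(s) == ways:
                s.popitem(last=False)   # evict the least recently used block
            s[block] = True
    return total

# Three blocks that all collide in one set, accessed round-robin.
trace = [0, 8, 16, 0, 8, 16] * 50
small_assoc = count_misses(trace, num_lines=16, ways=2)  # 16 lines, 2-way
half_size = count_misses(trace, num_lines=8, ways=4)     # 8 lines, 4-way
print(small_assoc, half_size)
```

On this adversarial pattern the 2-way cache thrashes (every access misses), while the half-sized 4-way cache misses only on the three cold loads; real workloads are far less extreme, which is why the rule is only "as effective as", not better.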
If a bigger L1 really were better, why not repurpose space from the L2? The so-called hit rate can also be compensated for with bandwidth, that is, the clock speed the unit runs at. Within the K8's die size, adding another 128K of L1 would be no problem at all, so why not build a 256K L1 with a 512K L2? Why go with a 128K L1 and a 1024K L2 instead?
As for making the cache bigger: in a multi-level hierarchy, the part that should grow is the L2, but even that buys only limited performance; going from 512K to 1024K certainly doesn't gain you much.