Reducing miss rate in a non-inclusive cache with inclusive directory of a chip multiprocessor
Although the era of exponential performance growth in computer chips has ended, core counts have reached 16 or more even in general-purpose desktop CPUs. Because DRAM throughput cannot keep pace with this growth in computing power, CPU designers need to find ways of lowering memory traffic per instruction. The straightforward way to do this is to reduce the miss rate of the last-level cache. Assuming that the “non-inclusive cache, inclusive directory” (NCID) scheme is already implemented, three ways of reducing the cache miss rate further were studied.
The first is to achieve more uniform usage of cache banks and sets by employing hash-based interleaving and indexing. In experiments with the SPEC CPU2017 refrate tests, even the simplest XOR-based hash functions demonstrated a performance increase of 3.2%, 9.1%, and 8.2% for CPU configurations with 16, 32, and 64 cores and last-level cache banks, respectively, comparable to the results of more complex matrix-, division-, and CRC-based functions.
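To illustrate why even a simple XOR hash can even out bank usage, the following minimal sketch compares conventional low-bit indexing with an XOR-folded index. It is not the hash function evaluated in the paper: the function names, the 64-byte line size, the 16-bank configuration, and the strided access pattern are all assumptions chosen for illustration.

```python
def modulo_index(addr: int, line_bits: int = 6, index_bits: int = 4) -> int:
    """Conventional indexing: the low index bits just above the line offset."""
    return (addr >> line_bits) & ((1 << index_bits) - 1)


def xor_fold_index(addr: int, line_bits: int = 6, index_bits: int = 4) -> int:
    """XOR-fold every index_bits-wide chunk of the line address into one index."""
    line_addr = addr >> line_bits
    mask = (1 << index_bits) - 1
    index = 0
    while line_addr:
        index ^= line_addr & mask
        line_addr >>= index_bits
    return index


# Hypothetical strided workload: 64 accesses spaced 1024 bytes apart.
addresses = [i * 1024 for i in range(64)]
modulo_banks = {modulo_index(a) for a in addresses}
xor_banks = {xor_fold_index(a) for a in addresses}

# Low-bit indexing sends every access to one bank; XOR folding uses all 16.
print(len(modulo_banks), len(xor_banks))  # prints "1 16"
```

With this stride the low index bits are always zero, so plain indexing maps every access to a single bank, while folding the upper address bits into the index spreads the same accesses over all banks.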
The second optimisation is aimed at reducing replication at different cache levels by means of automatically switching to the exclusive scheme when it appears optimal. A known scheme of this type, FLEXclusion, was modified for use in NCID caches and showed an average performance gain of 3.8%, 5.4%, and 7.9% for 16-, 32-, and 64-core configurations, respectively.
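The abstract does not describe FLEXclusion's decision logic or the NCID-specific modification, so neither is reproduced here. The sketch below shows only the generic idea of choosing a mode at run time from sampled miss counts, in the style of set dueling. The class name, mode labels, counter width, and tie-breaking rule are hypothetical.

```python
class ModeSelector:
    """Set-dueling style chooser between two cache modes (illustrative only)."""

    def __init__(self, bits: int = 10):
        self.max_value = (1 << bits) - 1
        self.midpoint = self.max_value // 2
        self.counter = self.midpoint  # start undecided

    def record_miss(self, leader_mode: str) -> None:
        """Count a miss in a sampled set that always runs leader_mode."""
        if leader_mode == "exclusive":
            # Exclusive leaders missing pushes the counter up (saturating).
            self.counter = min(self.counter + 1, self.max_value)
        elif leader_mode == "non-inclusive":
            # Non-inclusive leaders missing pushes the counter down.
            self.counter = max(self.counter - 1, 0)
        else:
            raise ValueError(f"unknown mode: {leader_mode}")

    def follower_mode(self) -> str:
        """Mode for all non-sampled sets; ties keep the non-inclusive baseline."""
        return "exclusive" if self.counter < self.midpoint else "non-inclusive"


selector = ModeSelector()
initial_mode = selector.follower_mode()

# Hypothetical epoch: non-inclusive leader sets miss far more often.
for _ in range(100):
    selector.record_miss("non-inclusive")
for _ in range(20):
    selector.record_miss("exclusive")
chosen_mode = selector.follower_mode()
print(initial_mode, chosen_mode)
```

A real selector would also have to weigh the extra on-chip traffic that exclusive operation generates, which this miss-count-only sketch ignores.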
The third optimisation is to increase the effective cache capacity using compression. The compression rate of the inexpensive and fast BDI*-HL (Base-Delta-Immediate Modified, Half-Line) algorithm, designed for NCID, was measured, and the resulting increase in cache capacity yielded an average performance increase of roughly 1%.
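The details of BDI*-HL are not given in this abstract. As a rough illustration of the underlying base-delta idea, the sketch below checks whether a cache line can be stored as one base word plus small per-word deltas within half of its original size. The function name, the 8-byte word size, the 2-byte delta size, and the sample lines are assumptions, and the "immediate" (zero-base) part of the scheme is omitted.

```python
import struct


def fits_half_line(line: bytes, word_size: int = 8, delta_size: int = 2) -> bool:
    """Return True if base + per-word deltas fit in half of the line.

    The first word is the base; every word must differ from it by a
    signed delta that fits in delta_size bytes.
    """
    fmt = {2: "H", 4: "I", 8: "Q"}[word_size]
    words = struct.unpack(f"<{len(line) // word_size}{fmt}", line)
    base = words[0]
    limit = 1 << (8 * delta_size - 1)
    if any(not -limit <= w - base < limit for w in words):
        return False
    # One full-size base plus one delta per word.
    compressed_size = word_size + delta_size * len(words)
    return compressed_size <= len(line) // 2


# Hypothetical 64-byte lines: nearby pointers versus widely scattered values.
pointer_line = struct.pack("<8Q", *[0x7FFF00001000 + 64 * i for i in range(8)])
noisy_line = struct.pack(
    "<8Q", *[(i * 0x9E3779B97F4A7C15) & (2**64 - 1) for i in range(8)]
)
print(fits_half_line(pointer_line), fits_half_line(noisy_line))  # prints "True False"
```

The pointer-like line compresses to 8 + 8 × 2 = 24 bytes, which fits in a 32-byte half line, whereas the scattered values need full-width words and stay uncompressed.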
All three optimisations can be combined; together they demonstrated a performance gain of 7.7%, 16%, and 19% for CPU configurations with 16, 32, and 64 cores and banks, respectively.
Indexed in Scopus
Full-text version of the journal is also available on the web site of the scientific electronic library eLIBRARY.RU
The journal is included in the Russian Science Citation Index