ISSN: XXXX-XXXX Volume X, Issue X, Month Year
Increasing Memory Performance Using Cache Optimizations in Chip Multiprocessors
Archana K. V.
Dept of Computer Science and Engineering
BTL Institute of Technology
Bangalore, India
archanaglows@gmail.com
Abstract:
Processor-memory bandwidth is a major bottleneck in current-generation processors because multiple processor cores share the same bus or processor-memory interface. Caches also consume a significant fraction of the energy in current microprocessors, so designing an energy-efficient microprocessor requires optimizing cache energy consumption. Effective utilization of this resource is consequently an important aspect of memory hierarchy design for multi-core processors. This is currently an active field of research, and a large number of studies have proposed techniques to address the problem. The main contribution of this paper is an assessment of the effectiveness of some of the techniques implemented in recent chip multiprocessors. Cache optimization techniques that were designed for single-core processors but have not been implemented in multi-core processors are also examined to forecast their effectiveness.
Keywords: On-chip cache hierarchy; cache optimizations
1. INTRODUCTION
On-chip memory and its efficient use in multi-core processors is the primary focus of this paper. With the increasing number of cores on a single chip, this usage determines the overall memory performance and therefore the performance of the applications running on these systems. The workload running on these systems is a mix of multiple programs. Overall performance is consequently measured not only by the throughput of multiple independent programs but also by the performance of programs composed of multiple parallel processes running on multiple cores of the same chip. The on-chip cache hierarchy needs to be designed with the best feasible configuration and optimizations to serve this purpose. Portable computing applications have evolved from conventional low-performance products such as wristwatches and calculators to high-throughput, computation-intensive products such as notebook computers and cellular phones. These portable computing applications demand high speed together with low energy consumption, because for such products longer battery life translates to extended use and better marketability. This paper presents a case study of performance and power trade-offs in designing on-chip caches for the microprocessors used in portable computing applications. Early cache studies primarily focused on improving performance. Studies of cache access times and miss rates for different cache parameters (e.g., cache size, block size, and degree of set associativity) of single-level caches can be found in [5,8]. Corresponding studies focusing on multi-level cache organizations can be found in [6,7]. Studies of instruction set design and its effect on cache performance and power consumption can be found in [1,3]. This paper consists of five sections. Section 2 briefly describes the cache performance and energy models used in this study. Section 3 presents several experimental cache organizations designed either for improving performance or for saving energy. Section 4 shows the experimental results of this study. Finally, concluding remarks are offered in Section 5.
Fig.1 Block diagram of on-chip memory hierarchy in CMPs
2. ANALYTICAL MODELS FOR ON-CHIP CACHES
A typical cache can be divided into three components: the address decoding path, the cell arrays, and the I/O path. The address decoding path includes the address buses and the address decoding logic. The cell arrays include the read/write circuitry, the tag arrays, and the data arrays. The I/O path includes the I/O pads and the buses that connect to the address and data buses. The on-chip cache cycle time is computed from an analytical model presented in [6,14] (which was based on the access time model of Wada et al. [13]). This timing model, based on 0.8 µm CMOS technology, gives the cache cycle time (i.e., the minimum time required between the start of two accesses) and the cache access time (i.e., the minimum time between the start and end of a single access) in terms of cache size, block size, and associativity. A characteristic of this timing model is that it uses SPICE parameters to predict the delays due to the address decoder, word-line driver, pre-charged bit lines, sense amplifiers, data bus driver, and data output drivers.
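The structure of such a timing model can be sketched as follows. This is a toy approximation under assumed coefficients (t_decode, t_wordline, t_bitline, and t_sense are placeholder values, not the SPICE-calibrated parameters of [6,13,14]); it only illustrates how access time can be expressed as a function of cache size, block size, and associativity.

import math

def cache_access_time_ns(size_bytes, block_bytes, assoc,
                         t_decode=0.8, t_wordline=0.4,
                         t_bitline=0.6, t_sense=0.5):
    # Delay terms grow with the number of rows (sets) and the row
    # width, so access time rises with cache size and shrinks as
    # associativity shortens the arrays. Coefficients are in ns.
    n_sets = size_bytes // (block_bytes * assoc)
    row_bits = block_bytes * 8 * assoc                  # bits per word line
    return (t_decode * math.log2(max(n_sets, 2))        # address decoder
            + t_wordline * math.log2(max(row_bits, 2))  # word-line driver
            + t_bitline * math.log2(max(n_sets, 2))     # bit-line swing
            + t_sense)                                  # sense amplifier

print(cache_access_time_ns(8 * 1024, 32, 1))    # small, faster cache
print(cache_access_time_ns(64 * 1024, 32, 1))   # larger, slower cache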
The average time for an off-chip cache access is computed from the average off-chip access and transfer times, rounded up to the next multiple of the on-chip cycle time.
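This rounding rule can be read directly as code; the following one-liner (the function name and example values are ours) is a minimal sketch:

import math

def offchip_access_time(t_offchip_ns, t_cycle_ns):
    # Round the off-chip access/transfer time up to the next
    # multiple of the on-chip cycle time.
    return math.ceil(t_offchip_ns / t_cycle_ns) * t_cycle_ns

# A 23 ns off-chip access with a 5 ns on-chip cycle counts as 25 ns.
assert offchip_access_time(23, 5) == 25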
The on-chip cache energy consumption is estimated with an abstract model that considers only those cache components that dominate overall cache power consumption. In the address decoding path, the capacitance of the decoding logic is generally less than that of the address bus, so the energy consumption of the address buses dominates the total energy consumption of the address decoding path. In the cell arrays, the read/write circuitry generally does not consume much power; most of the energy consumed in the cell arrays is due to the tag and data arrays. The tag and data arrays in conventional cache designs can be implemented in dynamic or static logic. In a dynamic circuit design, the word/bit lines are generally pre-charged before they are accessed, and the energy consumed by the pre-charged word/bit lines normally dominates the overall energy consumption in the cell arrays. In a static circuit design, there are no pre-charges on the word/bit lines, and the energy consumption of the tag and data arrays depends directly on the bit-switching activity of the bit lines. In the I/O path, most energy is consumed during bit switches of the I/O pads.
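The abstract energy model thus reduces to a sum of the dominant switched-capacitance terms. The sketch below assumes hypothetical capacitance values (c_addr, c_bitline, c_io) and a generic 0.5 * C * Vdd^2 energy per charging event; it illustrates the shape of the model, not calibrated data.

def cache_energy_per_access(addr_bus_flips, precharged_bitlines,
                            io_pad_flips,
                            c_addr=1.0e-12,     # F per address line (assumed)
                            c_bitline=0.5e-12,  # F per bit line (assumed)
                            c_io=10.0e-12,      # F per I/O pad (assumed)
                            vdd=3.3):
    # Keep only the dominant terms named above: address-bus switching,
    # pre-charged word/bit lines in the cell arrays, and I/O pad flips.
    e_decode = 0.5 * c_addr * vdd ** 2 * addr_bus_flips
    e_arrays = 0.5 * c_bitline * vdd ** 2 * precharged_bitlines
    e_io = 0.5 * c_io * vdd ** 2 * io_pad_flips
    return e_decode + e_arrays + e_io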
2.1 OPTIMIZATIONS IMPLEMENTED SUCCESSFULLY
A number of cache optimization techniques that were developed for single-core processors have been successfully implemented in multi-core processors. A multi-level cache, in its modern two-level structure, has been implemented since the very first multi-core processors, as visualized in Fig. 1. In this organization, the first-level cache is private to each core, and coherence is maintained among the cores with MESI or MOESI protocols (Villa et al., 2005). The second-level cache has been implemented with different design choices in different architectures. In general, the second-level cache is shared among all cores, with a number of optimizations that are discussed in this section. One of the major innovations in the design of the second-level cache is the NUCA (Non-Uniform Cache Architecture) cache (Kim et al., 2003). The motivation for the NUCA organization is that the second-level cache is made much larger than the first level to satisfy the design requirements of a multi-level cache. The result is an access time that grows with the increasing cache size. This problem is addressed by dividing the cache into banks: the data of a particular core is kept in a bank physically closer to that core, improving access speed. A number of NUCA variants have been developed over the last few years, with many innovations implemented in current-generation processors.
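The essence of NUCA, per-bank access latency that depends on distance, can be shown with a toy model. The static line-to-bank mapping, the one-dimensional layout, and the cycle counts below are illustrative assumptions, not values from (Kim et al., 2003).

def nuca_l2_latency(addr, core_id, n_banks=16, line_bytes=64,
                    bank_latency=4, hop_latency=2):
    # The shared L2 is split into banks; latency grows with the
    # distance (in hops) between the requesting core and the bank
    # that holds the line.
    bank = (addr // line_bytes) % n_banks    # static line-to-bank map
    hops = abs(bank - core_id)               # cores/banks on a 1-D layout
    return bank_latency + hop_latency * hops

# A line mapping to bank 3 is cheap for core 3, costly for core 12.
print(nuca_l2_latency(3 * 64, core_id=3))    # 4 cycles
print(nuca_l2_latency(3 * 64, core_id=12))   # 22 cycles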
3. EXPERIMENTAL CACHE ORGANIZATIONS
3.1 Conventional Designs
Conventional cache designs include direct-mapped and set-associative organizations. A set-associative cache generally has a better hit rate than a direct-mapped cache of equal size, although its access time is commonly higher than that of the direct-mapped cache. The number of bit-line switches in the set-associative cache is normally larger than in the direct-mapped cache, but the energy consumption of each bit line in a set-associative cache is generally less than that in a direct-mapped cache of equal size.
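This trade-off follows from how an address is decomposed. In the sketch below (the cache geometry and the example address are ours), fixing the total size while raising associativity reduces the number of sets: fewer index bits, a wider tag, and more ways probed per access, which accounts for the extra bit-line activity noted above.

import math

def address_fields(addr, size_bytes, block_bytes, assoc):
    # Split an address into (tag, set index, block offset) for a
    # cache of the given geometry.
    n_sets = size_bytes // (block_bytes * assoc)
    offset_bits = int(math.log2(block_bytes))
    index_bits = int(math.log2(n_sets))
    offset = addr & (block_bytes - 1)
    index = (addr >> offset_bits) & (n_sets - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset

# Same 16 KB cache with 32 B blocks: direct-mapped vs. 4-way.
print(address_fields(0x1234ABCD, 16 * 1024, 32, 1))
print(address_fields(0x1234ABCD, 16 * 1024, 32, 4))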
3.2 Cache Designs for Low Power
This paper investigates three cache design approaches to achieving low power: vertical cache partitioning, horizontal cache partitioning, and Gray code addressing.
Vertical Cache Partitioning
The fundamental idea of vertical cache partitioning is to reduce the capacity involved in each cache access by deepening the on-chip cache hierarchy (e.g., two-level caches). Accessing a smaller cache consumes less power because a smaller cache has a lower load capacitance. We use block buffering as an example of this approach. A basic structure of a block buffered cache [1] is presented in Figure 1. The block buffer itself is, in effect, another cache that sits closer to the processor than the on-chip caches. The processor first determines whether there is a block hit (i.e., whether the currently accessed data lies in the same block as the most recently accessed data). If there is a hit, the data is read directly from the block buffer and the cache is not operated; the cache is operated only on a block miss. A block buffered cache thus saves power by reducing the capacity involved in each cache access. The effectiveness of block buffering depends strongly on the spatial locality of applications and on the block size. The higher the spatial locality of the access patterns (e.g., an instruction sequence), the larger the amount of energy that can be saved by block buffering. The block size is also very important: apart from its effect on the cache hit rate, a small block may limit the amount of energy saved by the block buffered cache, while a large block may waste energy on the unused data in the block.
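A minimal sketch of the mechanism follows, assuming an underlying cache object that exposes a read_block() method (the class and method names are ours, not from [1]):

class BlockBufferedCache:
    # Remember the last block read from the cache; if the next access
    # falls in the same block, serve it from the buffer and leave the
    # cache arrays idle.

    def __init__(self, cache, block_bytes=32):
        self.cache = cache
        self.block_bytes = block_bytes
        self.buf_addr = None    # block-aligned address held in the buffer
        self.buf_data = None

    def read(self, addr):
        block_addr = addr - (addr % self.block_bytes)
        if block_addr == self.buf_addr:
            data = self.buf_data                      # block hit: cache idle
        else:
            data = self.cache.read_block(block_addr)  # block miss: access cache
            self.buf_addr, self.buf_data = block_addr, data
        return data[addr % self.block_bytes]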
Horizontal Cache Partitioning
The primary idea of the horizontal cache partitioning approach is to partition the cache data memory into several segments, each of which can be powered individually. Cache sub-banking, proposed in [11], is one horizontal cache partitioning technique; it partitions the data array of a cache into several banks (called cache sub-banks). Each cache sub-bank can be accessed (powered up) separately, so only the sub-bank where the requested data is located consumes power in each cache access. A basic structure for cache sub-banking is presented in the accompanying figure. Cache sub-banking saves power by eliminating unnecessary accesses. The amount of power saved depends on the number of cache sub-banks: more sub-banks save more power. One advantage of cache sub-banking over block buffering is that the effective cache hit time of a sub-banked cache can be as fast as that of a conventional performance-driven cache, since the sub-bank selection logic is generally very simple and can be hidden in the cache index decoding logic. Because it maintains cache performance, cache sub-banking is very attractive to computer architects designing energy-efficient high-performance microprocessors.
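A sketch of a sub-banked read follows; the bank objects and their power_up()/power_down()/read() methods are assumed interfaces standing in for the bank enable signals, not a real API:

def read_subbanked(addr, banks, bank_bytes):
    # Only the sub-bank holding the requested word is powered up;
    # the bank number comes from the index bits of the address.
    bank_id = (addr // bank_bytes) % len(banks)
    bank = banks[bank_id]
    bank.power_up()                       # enable just this sub-bank
    data = bank.read(addr % bank_bytes)
    bank.power_down()
    return data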
Gray Code Addressing
Memory addressing in a traditional processor design usually uses a 2's complement (ordinary binary) representation, so the bit switching on the address buses when accessing consecutive memory locations is not minimal. Since a significant amount of energy is consumed on the address buses, and sequential memory accesses occur frequently in applications with high spatial locality, it is worthwhile to optimize the bit-switching activity of the address buses for low-power caches. Gray code addressing exploits the fact that consecutive Gray code values differ in exactly one bit, so sequential accesses toggle only one address line each.
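The encoding and its effect on bus activity are easy to verify. In this sketch (the function names are ours), counting from 0 to 255 in ordinary binary toggles 502 address bits in total, while the Gray-coded sequence toggles exactly one bit per step, 255 in total:

def to_gray(n):
    # Binary-reflected Gray code: consecutive values differ in one bit.
    return n ^ (n >> 1)

def bus_transitions(seq):
    # Total number of bit flips on a bus that is driven with seq.
    return sum(bin(a ^ b).count("1") for a, b in zip(seq, seq[1:]))

addrs = list(range(256))                             # sequential accesses
print(bus_transitions(addrs))                        # 502 flips (binary)
print(bus_transitions([to_gray(a) for a in addrs]))  # 255 flips (Gray)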
4. RESULTS AND DISCUSSION
4.1. Proposed Cache Optimizations
A number of cache optimization techniques were successfully implemented in single-core processors or in multiprocessors built from single-core chips, but have not yet been attempted in multi-core processors. Some of these techniques are discussed in this section, together with a prediction of their effectiveness in multi-core processors.
4.2 Ineffective Cache Optimizations
The optimization techniques introduced in Section 4.1 need to be implemented in order to determine their effectiveness. A few optimizations have been tried in multi-core processors and were found to be ineffective. As more optimizations are tested, more techniques may turn out to be inefficient for multi-core processors. The following paragraphs give a brief account of the tested techniques that were not successful in CMPs.
Cache affinity is a policy decision taken by the operating system when scheduling processes on particular cores. The decision is based on the observation that a process whose context is in a cache is expected to reuse those contents as a result of temporal locality. After a context switch, when a process is rescheduled, it is allocated to the same processor, on the assumption that its context may still be present in the cache, thereby reducing compulsory (cold-start) misses. This scheme has improved performance in conventional multiprocessors (SMPs). An investigation of this scheme in multi-core processors, summarized in (Kazempour et al., 2008), observed that the performance improvement in multi-core uniprocessors (CMPs) is not significant, whereas the improvement is good in the case of multi-core multiprocessors (SMPs built from CMPs).
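The policy itself is simple to state in code. This sketch assumes hypothetical process.last_core and core.idle attributes and only illustrates the affinity heuristic, not an actual scheduler:

def pick_core(process, cores):
    # Prefer the core the process last ran on, since its cache
    # context may still be resident; otherwise take any idle core.
    if process.last_core is not None and cores[process.last_core].idle:
        return process.last_core
    for cid, core in enumerate(cores):
        if core.idle:
            return cid
    return None       # no idle core; the caller queues the process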
5. CONCLUSION AND FUTURE DIRECTIONS
This paper forms part of a guideline for future work by researchers interested in optimizing the memory hierarchy of scalable multi-core processors, as it surveys such techniques proposed in recent publications. The techniques are presented along with comments on their effectiveness, and a summary of all the optimization techniques discussed in this paper is given in Table 1. The effect of operating-system mechanisms and policies on the memory hierarchy, especially the on-chip cache hierarchy, is another direction of research that can be explored. High coherence traffic gives rise to congestion at the first-level cache. Directory-based coherence protocols may reduce the overall coherence traffic, but this comes at the cost of maintaining the directory and keeping it up to date. These and other research directions will be explored in future work.
6. REFERENCES
[1] Chang and Sohi, (2006), "Cooperative Caching for Chip Multiprocessors", Proceedings of the 33rd Annual International Symposium on Computer Architecture, pp. 264-276.
[2] Chen and Kandemir, (2008), "Code Restructuring for Improving Cache Performance in MPSoCs", IEEE Transactions on Parallel and Distributed Systems, Vol. 19, No. 9, pp. 1201-1214.
[3] Dybdahl, H. and Stenström, P., (2007), "An Adaptive Shared/Private NUCA Cache Partitioning Scheme for Chip Multiprocessors", Proceedings of the IEEE 13th International Symposium on High Performance Computer Architecture, pp. 2-12.
[4] Dybdahl and Stenström, (2006), "Enhancing Last-Level Cache Performance by Block Bypassing and Early Miss Determination", Asia-Pacific Computer Systems Architecture Conference (ACSAC), LNCS 4186, pp. 52-66.
[5] Core Systems, Proceedings of the 42nd International Symposium on Micro-architecture (MICRO), pp. 327-336.
