Reducing impact of cache miss stalls in embedded systems by extracting guaranteed independent instructions

  • Open access
  • Published: 21 July 2010
  • Volume 14, pages 309–326 (2010)

Design Automation for Embedded Systems
  • Garo Bournoutian¹ &
  • Alex Orailoglu¹

Abstract

Today, embedded processors are expected to run algorithmically complex, memory-intensive applications that were originally designed and coded for general-purpose processors. As a result, the impact of memory latencies on execution time is becoming increasingly evident. At the same time, embedded processors are expected to be power-conscious and to occupy minimal area, as they are often used in mobile devices such as smartphones and portable MP3 players. Traditional methods for addressing performance and memory latencies, such as multiple issue, out-of-order execution, and large associative caches, are therefore poorly suited to the mobile embedded domain because of their significant area and power overhead. This paper explores a novel approach to mitigating execution delays caused by memory latencies, one that would otherwise not be possible in a regular in-order, single-issue embedded processor without large, power-hungry constructs such as a Reorder Buffer (ROB). The concept relies on efficiently leveraging both compile-time and run-time information to safely allow non-data-dependent instructions to continue executing in the event of a memory stall. Simulation results show an improvement in overall execution throughput of approximately 11%, with minimal impact on area and power.

Author information

Authors and Affiliations

  1. CSE Department, University of California, San Diego, 9500 Gilman Dr. #0404, La Jolla, CA, 92093-0404, USA

    Garo Bournoutian & Alex Orailoglu

Corresponding author

Correspondence to Garo Bournoutian.

Rights and permissions

Open Access. This is an open access article distributed under the terms of the Creative Commons Attribution Noncommercial License (https://creativecommons.org/licenses/by-nc/2.0), which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.

About this article

Cite this article

Bournoutian, G., Orailoglu, A. Reducing impact of cache miss stalls in embedded systems by extracting guaranteed independent instructions. Des Autom Embed Syst 14, 309–326 (2010). https://doi.org/10.1007/s10617-010-9058-y

  • Received: 11 May 2010

  • Accepted: 24 June 2010

  • Published: 21 July 2010

  • Issue Date: September 2010

  • DOI: https://doi.org/10.1007/s10617-010-9058-y

Keywords

  • Embedded processors
  • Data cache
  • Pipeline stalls
  • Compiler assisted hardware