This document discusses techniques for improving instruction fetch throughput in superscalar processors. It begins by explaining that fetch throughput sets an upper bound on performance, since a processor cannot execute instructions faster than it fetches them, and that a superscalar processor must therefore be supplied with more than one instruction per cycle. It then describes the main challenges to high-bandwidth instruction fetching: misaligned instructions, changes in control flow, and memory latency and bandwidth limitations. The document goes on to discuss specific techniques, namely aligned fetching, split cache line access, predication, collapsing buffers, and trace caches, along with issues of indexing and redundancy in trace caches.