Instruction-level parallelism (ILP) is the potential overlap in the execution of instructions; exploiting it improves performance by executing independent instructions at the same time or in an overlapped fashion. There are two main approaches: 1) relying on hardware to discover and exploit parallelism dynamically, at run time, and 2) relying on software (the compiler) to find parallelism statically, at compile time. Substantial performance gains require exploiting ILP across multiple basic blocks, because the parallelism available within a single basic block is typically small: branches are frequent, so basic blocks are short. Data dependencies between instructions limit how much of this parallelism can be exploited, since true dependencies must be preserved to maintain program correctness. Both hardware and software therefore try to exploit parallelism while preserving program order only where changing it would affect the outcome of the program.
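To make the limiting effect of true dependencies concrete, the following is a minimal sketch that estimates the ILP of a straight-line instruction sequence (a single basic block). It assumes an idealized, hypothetical machine: unlimited functional units, a one-cycle latency for every instruction, and perfect register renaming, so only true (read-after-write) dependencies constrain the schedule. The `Instr` type and the `schedule` and `ilp` functions are illustrative names, not a real API. ILP is taken here as the instruction count divided by the length of the longest dependence chain.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    text: str              # human-readable label, for display only
    dest: str              # register written by the instruction
    srcs: tuple[str, ...]  # registers read by the instruction


def schedule(instrs: list[Instr]) -> list[int]:
    """Earliest issue cycle of each instruction on a hypothetical ideal machine.

    Assumptions (illustrative model, not a real processor):
      - unlimited functional units and a 1-cycle latency for every instruction;
      - register renaming removes WAR/WAW name dependencies, so only true
        (read-after-write) dependencies delay an instruction.
    """
    ready: dict[str, int] = {}  # register -> cycle its latest value is available
    cycles: list[int] = []
    for ins in instrs:
        # An instruction issues as soon as all of its source operands are available.
        start = max((ready.get(r, 0) for r in ins.srcs), default=0)
        cycles.append(start)
        # Later readers (in program order) see this newest version of dest.
        ready[ins.dest] = start + 1
    return cycles


def ilp(instrs: list[Instr]) -> float:
    """Instructions per cycle when the sequence runs along its critical path."""
    cycles = schedule(instrs)
    critical_path = max(cycles) + 1 if cycles else 0
    return len(instrs) / critical_path if critical_path else 0.0


# The same integer sum r1 + r2 + r3 + r4, written two ways.
chain = [  # every add depends on the previous one through r1
    Instr("add r1, r1, r2", "r1", ("r1", "r2")),
    Instr("add r1, r1, r3", "r1", ("r1", "r3")),
    Instr("add r1, r1, r4", "r1", ("r1", "r4")),
]
tree = [  # the first two adds are independent and can issue together
    Instr("add r5, r1, r2", "r5", ("r1", "r2")),
    Instr("add r6, r3, r4", "r6", ("r3", "r4")),
    Instr("add r7, r5, r6", "r7", ("r5", "r6")),
]

print(schedule(chain), ilp(chain))  # [0, 1, 2] 1.0
print(schedule(tree), ilp(tree))    # [0, 0, 1] 1.5
```

The chain version forms a single dependence chain, so even an unlimited machine can issue only one instruction per cycle; the tree version computes the same sum with two independent additions in the first cycle. Reassociating a sum like this is safe for integer addition but can change the result for floating-point values, which is one reason a compiler cannot always apply it.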