The document discusses the cost-based optimizer (CBO) introduced in Apache Spark 2.2. It describes how the CBO works in two key steps:
1. It collects and propagates statistics about tables and columns to estimate the cardinality of operations like filters, joins and aggregates.
2. It estimates the cost of the candidate execution plans and selects the plan with the lowest estimated cost. This allows it to pick more efficient join orders and join algorithms.
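The two steps above can be illustrated with a small, self-contained Python sketch. It is not Spark's implementation. The table names, row counts and distinct-value counts are hypothetical, the estimation formulas are the textbook ones (uniform value distribution for an equality filter, |L| * |R| / max(ndv_L, ndv_R) for an inner equi-join), and the cost of a plan is assumed to be simply the sum of its intermediate row counts.

```python
from itertools import permutations


def filter_equals(rel, column):
    """Step 1: estimate `column = constant`, assuming uniformly distributed values."""
    rows = rel["rows"] / rel["ndv"][column]
    # A column cannot have more distinct values than the relation has rows.
    ndv = {c: min(n, rows) for c, n in rel["ndv"].items()}
    ndv[column] = 1
    return {"rows": rows, "ndv": ndv}


def join(left, right, key):
    """Step 1: estimate an inner equi-join as |L| * |R| / max(ndv_L, ndv_R)."""
    rows = left["rows"] * right["rows"] / max(left["ndv"][key], right["ndv"][key])
    ndv = {**left["ndv"], **right["ndv"]}
    ndv[key] = min(left["ndv"][key], right["ndv"][key])
    ndv = {c: min(n, rows) for c, n in ndv.items()}
    return {"rows": rows, "ndv": ndv}


def plan_cost(fact, dims, order):
    """Step 2: cost of a left-deep plan = sum of intermediate row counts (an assumption)."""
    current, cost = fact, 0.0
    for name in order:
        rel, key = dims[name]
        current = join(current, rel, key)
        cost += current["rows"]
    return cost


def best_order(fact, dims):
    """Step 2: enumerate join orders and keep the one with the lowest estimated cost."""
    return min(permutations(dims), key=lambda order: plan_cost(fact, dims, order))


# Hypothetical statistics: total rows and number of distinct values (ndv) per column.
sales = {"rows": 1_000_000, "ndv": {"item_id": 10_000, "date_id": 3_650}}
item = {"rows": 10_000, "ndv": {"item_id": 10_000}}
date = {"rows": 3_650, "ndv": {"date_id": 3_650, "year": 10}}

# Propagate statistics through a filter such as `year = 2000` before joining.
date_one_year = filter_equals(date, "year")
dims = {"item": (item, "item_id"), "date": (date_one_year, "date_id")}

for order in permutations(dims):
    print(order, plan_cost(sales, dims, order))
print("chosen order:", best_order(sales, dims))
```

In this toy setup, joining the filtered date table first shrinks the intermediate result before the second join, so that order gets the lower estimated cost. A real optimizer also weighs data size and the choice of join algorithm, which this sketch leaves out.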
The document provides examples of how the CBO improves queries from the TPC-DS benchmark, producing smaller intermediate results and faster execution times than the rule-based optimizer in Spark 2.1.