This document discusses performance optimization of Apache Spark on high-performance computing (HPC) systems, with an emphasis on how the storage hierarchy, network latency, and I/O operations affect performance. It highlights two strategies: keeping files open across accesses to improve I/O performance, and using containers for better resource management. The findings indicate that although latency and bandwidth matter, performance variability and network time become more critical at scale.
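The "keep files open" strategy can be illustrated with a minimal Python sketch. It assumes a workload that issues many small reads against the same file. `OpenFileCache`, `read_reopening`, and `demo` are hypothetical names and are not part of Spark or any HPC I/O library. The sketch only counts how often the file is actually opened. On a parallel file system an open can involve a metadata-server round trip, so reusing a handle removes that per-read cost while returning the same bytes.

```python
import os
import tempfile
from typing import BinaryIO, Dict, List, Tuple


class OpenFileCache:
    """Hypothetical helper: keeps file handles open across reads instead of
    reopening the file for every request (illustrative, not a Spark API)."""

    def __init__(self) -> None:
        self._handles: Dict[str, BinaryIO] = {}
        self.open_calls = 0  # how many times open() was really invoked

    def read(self, path: str, offset: int, length: int) -> bytes:
        handle = self._handles.get(path)
        if handle is None:
            # First access to this path: open once and remember the handle.
            handle = open(path, "rb")
            self._handles[path] = handle
            self.open_calls += 1
        handle.seek(offset)
        return handle.read(length)

    def close_all(self) -> None:
        for handle in self._handles.values():
            handle.close()
        self._handles.clear()


def read_reopening(path: str, offset: int, length: int, counter: List[int]) -> bytes:
    """Baseline: open, seek, read, and close the file on every request."""
    counter[0] += 1
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


def demo() -> Tuple[bool, int, int]:
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "block.bin")
        with open(path, "wb") as f:
            f.write(bytes(range(256)) * 1024)  # 256 KiB of test data
        chunk = 4096
        offsets = range(0, 64 * chunk, chunk)  # 64 small reads

        reopen_count = [0]
        reopened = [read_reopening(path, off, chunk, reopen_count) for off in offsets]

        cache = OpenFileCache()
        try:
            cached = [cache.read(path, off, chunk) for off in offsets]
        finally:
            cache.close_all()  # release handles before the directory is removed
        return reopened == cached, reopen_count[0], cache.open_calls


if __name__ == "__main__":
    same_data, baseline_opens, cached_opens = demo()
    print(same_data, baseline_opens, cached_opens)
```

Both approaches return identical data, but the baseline opens the file 64 times while the cache opens it once. A production version would likely also bound the number of cached handles (for example with an LRU policy), since each open handle consumes a file descriptor on the node.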