This document discusses running distributed TensorFlow jobs on the DC/OS platform. It begins with an overview of typical TensorFlow development workflows for single-node and distributed training. It then outlines the challenges of running distributed TensorFlow manually, such as the need to hard-code cluster configuration details into every node (see the sketch below). The document explains how DC/OS addresses these challenges by dynamically generating cluster configurations and handling failures gracefully. Finally, it demonstrates deploying both non-distributed and distributed TensorFlow jobs on a DC/OS cluster to train an image classification model.
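As a minimal sketch of what "hard-coding cluster configuration" typically looks like (assuming the TensorFlow 1.x distributed API; the host/port values below are placeholders, not taken from the document), each process must embed the full list of parameter-server and worker addresses and its own role:

```python
import tensorflow as tf

# Hard-coded cluster layout: every node must carry an identical copy
# of this spec, and it must be edited by hand whenever hosts change.
cluster = tf.train.ClusterSpec({
    "ps": ["localhost:2222"],                    # parameter server task
    "worker": ["localhost:2223", "localhost:2224"],  # worker tasks
})

# Each process also needs its own job name and task index, which must
# line up with the spec above on every machine in the cluster.
server = tf.train.Server(cluster, job_name="worker", task_index=0)
```

Keeping these details in sync across nodes by hand is exactly the kind of brittle, manual step the document argues DC/OS can eliminate by generating the cluster configuration at deployment time.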