This document discusses using R for large-scale data analysis on distributed data clouds. It recommends splitting large datasets into segments using MapReduce or UDFs, then building separate models for each segment in R. PMML can be used to combine the separate models into an ensemble model. The Sawmill framework is proposed to preprocess data in parallel, build models for each segment using R, and combine the models into a PMML file for deployment. Running R on each segment sequentially allows scaling to large datasets, with examples showing processing times for different numbers of segments.