1) StumbleUpon uses open source tools like Kafka, HBase, Hive and Pig to build a scalable big data infrastructure to process large amounts of data from its services in real-time and batch.
2) Data is collected from various services using Kafka and stored in HBase for real-time analytics. Batch processing is done using Pig and data is loaded into Hive for ad-hoc querying.
3) The infrastructure powers various applications like recommendations, ads and business intelligence dashboards.