Building Data Labs in the Cloud
Alex Bordei
Head of Product Management at Bigstep
BDW16 London - Alex Bordei, Bigstep - Building Data Labs in the Cloud
The Data Laboratory
Connecting to on-premises services - VPN
Connecting to on-premises services - Targeted Firewall
Identity Services integration - On-premises realm
Identity Services integration - Cloud realm
Joint single sign-on (for two-factor auth)
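The slides list a targeted firewall as an alternative to a full VPN for reaching on-premises services. As a minimal sketch, assuming a default-deny policy in which each rule opens exactly one source subnet, destination host and port, the check could look like this. The subnets, the Kerberos KDC (port 88) and LDAPS (port 636) targets, and the `Rule`/`is_allowed` names are hypothetical and only illustrate the idea.

```python
import ipaddress
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Rule:
    """Allow traffic from `source` to `dest` on one `port`; everything else is denied."""
    source: ipaddress.IPv4Network
    dest: ipaddress.IPv4Network
    port: int


def is_allowed(rules: Iterable[Rule], src_ip: str, dst_ip: str, port: int) -> bool:
    """Default-deny check: True only if some rule matches source, destination and port."""
    src = ipaddress.ip_address(src_ip)
    dst = ipaddress.ip_address(dst_ip)
    return any(src in r.source and dst in r.dest and port == r.port for r in rules)


# Hypothetical addressing: the cloud data lab subnet may reach two on-premises hosts only.
LAB = ipaddress.ip_network("10.20.0.0/24")
RULES = [
    Rule(LAB, ipaddress.ip_network("192.168.10.5/32"), 88),   # Kerberos KDC (on-premises realm)
    Rule(LAB, ipaddress.ip_network("192.168.10.6/32"), 636),  # LDAP over TLS
]

if __name__ == "__main__":
    print(is_allowed(RULES, "10.20.0.17", "192.168.10.5", 88))  # True
    print(is_allowed(RULES, "10.20.0.17", "192.168.10.5", 22))  # False: port not opened
    print(is_allowed(RULES, "10.99.0.1", "192.168.10.6", 636))  # False: source outside lab subnet
```

Keeping each rule this narrow is what distinguishes a targeted firewall from a VPN, which typically joins whole networks.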
Encryption
• Hadoop’s encryption (with KMS)
• Third-party encryption with an on-premises HSM, e.g. Zettaset
• Cloud providers offer HSMs as well, e.g. CloudHSM
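Hadoop's KMS-based encryption follows an envelope pattern: each encryption zone has a zone key held by the KMS, every file gets its own data encryption key (DEK), and only the DEK encrypted under the zone key (the EDEK) is stored with the file metadata. The toy model below sketches that flow in plain Python. Everything in it is an assumption for illustration: `ToyKMS` and its method names are hypothetical and are not the Hadoop KMS API, and the HMAC-SHA256 keystream stands in for AES-CTR only so the example needs nothing beyond the standard library. It offers no integrity protection and must not be used for real data.

```python
import hashlib
import hmac
import os
from typing import Dict, Tuple


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: HMAC-SHA256 over nonce||counter. NOT a vetted cipher."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out.extend(hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest())
        counter += 1
    return bytes(out[:length])


def toy_encrypt(key: bytes, plaintext: bytes) -> bytes:
    """XOR with a fresh-nonce keystream; the 16-byte nonce is prepended to the output."""
    nonce = os.urandom(16)
    ks = _keystream(key, nonce, len(plaintext))
    return nonce + bytes(p ^ k for p, k in zip(plaintext, ks))


def toy_decrypt(key: bytes, blob: bytes) -> bytes:
    """Split off the nonce, regenerate the same keystream and XOR it back out."""
    nonce, ciphertext = blob[:16], blob[16:]
    ks = _keystream(key, nonce, len(ciphertext))
    return bytes(c ^ k for c, k in zip(ciphertext, ks))


class ToyKMS:
    """Hypothetical KMS: holds zone keys; callers only ever see DEKs and EDEKs."""

    def __init__(self) -> None:
        self._zone_keys: Dict[str, bytes] = {}

    def create_key(self, name: str) -> None:
        self._zone_keys[name] = os.urandom(32)

    def generate_encrypted_key(self, name: str) -> Tuple[bytes, bytes]:
        """Return a new per-file DEK and its EDEK (the DEK wrapped by the zone key)."""
        dek = os.urandom(32)
        return dek, toy_encrypt(self._zone_keys[name], dek)

    def decrypt_encrypted_key(self, name: str, edek: bytes) -> bytes:
        """Unwrap an EDEK; the zone key itself never leaves this object."""
        return toy_decrypt(self._zone_keys[name], edek)


if __name__ == "__main__":
    kms = ToyKMS()
    kms.create_key("datalab-zone")
    dek, edek = kms.generate_encrypted_key("datalab-zone")
    stored = toy_encrypt(dek, b"customer records")  # file data is encrypted with the DEK
    # A later reader holds only `edek` and `stored`, and asks the KMS for the DEK.
    recovered = kms.decrypt_encrypted_key("datalab-zone", edek)
    print(toy_decrypt(recovered, stored))  # b'customer records'
```

Because only wrapped DEKs leave the KMS, protecting the zone keys is the part an on-premises or cloud HSM is meant to harden.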
HDFS+Spark
Spark can do:
1. data science at scale
2. SQL on Hadoop
3. ETL
4. large-scale data processing
5. machine learning
6. graph processing
Multi-Context Architecture
- Rip-and-replace maintenance model
- Multiplexing for resource utilization efficiency
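One way to read "multiplexing for resource utilization efficiency" is statistical multiplexing: contexts whose peaks fall at different times can share one cluster sized for the combined peak, instead of each getting a cluster sized for its own peak. The sketch below compares the two under made-up hourly node demands for three hypothetical contexts (`etl`, `ml`, `sql`); the numbers are illustrative, not measurements.

```python
from typing import Dict, List

# Hypothetical node demand per hour for three contexts whose peaks do not overlap.
DEMAND: Dict[str, List[int]] = {
    "etl": [8, 2, 2, 2],
    "ml":  [2, 8, 2, 2],
    "sql": [2, 2, 8, 2],
}


def dedicated_nodes(demand: Dict[str, List[int]]) -> int:
    """Each context gets its own cluster, sized for that context's own peak."""
    return sum(max(series) for series in demand.values())


def shared_nodes(demand: Dict[str, List[int]]) -> int:
    """One multiplexed cluster, sized for the peak of the combined load."""
    combined = [sum(per_hour) for per_hour in zip(*demand.values())]
    return max(combined)


def utilization(demand: Dict[str, List[int]], capacity: int) -> float:
    """Average fraction of the provisioned nodes that are actually busy."""
    hours = len(next(iter(demand.values())))
    used = sum(sum(series) for series in demand.values())
    return used / (capacity * hours)


if __name__ == "__main__":
    d, s = dedicated_nodes(DEMAND), shared_nodes(DEMAND)
    print(d, s)                                            # 24 12
    print(utilization(DEMAND, d), utilization(DEMAND, s))  # 0.4375 0.875
```

With these invented numbers the shared cluster needs half the nodes and runs at twice the average utilization; the gain shrinks as the contexts' peaks start to coincide.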
Realtime = production
The system will need to provide:
- performance
- stability
- online serviceability
- fault tolerance
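The slide does not say how fault tolerance should be achieved. One common building block, assumed here rather than taken from the talk, is retrying transient failures with capped exponential backoff and jitter. The `retry_with_backoff` helper and its defaults are hypothetical; the `sleep` parameter is injectable so the behaviour can be tested without actually waiting.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    op: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 0.1,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `op`; on an exception, wait and retry, up to `attempts` calls in total.

    After failed attempt k (counting from 0) the wait is drawn uniformly from
    [0, min(max_delay, base_delay * 2**k)] ("full jitter"), so many clients
    recovering from the same outage do not retry in lockstep.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return op()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, cap))
    raise AssertionError("unreachable")


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky_lookup() -> str:
        # Simulated dependency that fails twice before answering.
        calls["n"] += 1
        if calls["n"] < 3:
            raise ConnectionError("transient failure")
        return "ok"

    print(retry_with_backoff(flaky_lookup), calls["n"])  # ok 3
```

Retries only cover transient faults; the performance and stability requirements above still depend on the dependency eventually recovering.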
@alexandrubordei
@bigstepinc
alex@bigstep.com
I’m all ears
