Cloudian®
S3 Cloud Storage Platform
Case Study:
Implementing Hadoop and Elastic Map
Reduce on Scale-out Object Storage
Paul Turner
Cloudian Inc.
June 11th 2014
About Cloudian
• Hybrid cloud storage startup in Silicon Valley
– Strong venture backing: Goldman Sachs, Intel Capital
– Solid management with storage, big data, enterprise software and telco
expertise
– 50 employees, offices in Foster City, Japan and China
• Production hardened product
• Target market: mid-size to large enterprises & regional service providers
• GTM: traditional storage distribution/VARs
CLOUDIAN PARTNERS
The Challenge
• Business problem = Analysis of log data from our
customer systems to improve support (classic
‘Internet of Things’ content)
• Existing system required transformation of the data
into HDFS for analytics (slow and costly)
Goal: Reduce cost and provide faster results
6/16/2014 3
Use Case: Support Analytics
• Abnormal Operations Analysis: compare system statistics and usage
patterns to previous normal results
• End User Analysis to root-cause issues: identify all operations for a
particular user and review patterns and any faults
• Trend Analysis for Capacity Planning and Traffic Patterns: build capacity
and traffic trend lines based on statistical analysis of all traffic
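The end-user analysis described above maps naturally onto a map/reduce pass over the request log. The sketch below is illustrative only: the log line layout (timestamp, user, operation, HTTP status) and all function names are assumptions, not the actual HyperStore log format or Cloudian's job code.

```python
from collections import defaultdict

# Hypothetical log layout (an assumption, not the real HyperStore format):
#   <timestamp> <user_id> <operation> <http_status>
SAMPLE_LOG = """\
2014-06-11T10:00:00 alice PUT 200
2014-06-11T10:00:01 bob GET 200
2014-06-11T10:00:02 alice GET 500
2014-06-11T10:00:03 alice DELETE 204
"""

def map_line(line):
    """Map step: emit (user, (operation, is_fault)) for one log line."""
    _ts, user, op, status = line.split()
    return user, (op, int(status) >= 500)

def reduce_user(records):
    """Reduce step: count operations and server-side faults for one user."""
    ops = defaultdict(int)
    faults = 0
    for op, is_fault in records:
        ops[op] += 1
        faults += is_fault
    return {"ops": dict(ops), "faults": faults}

def analyze(log_text):
    grouped = defaultdict(list)      # shuffle: group mapped records by user
    for line in log_text.splitlines():
        if line.strip():
            user, record = map_line(line)
            grouped[user].append(record)
    return {user: reduce_user(recs) for user, recs in grouped.items()}

report = analyze(SAMPLE_LOG)
print(report["alice"])
```

In a real Hadoop job the map and reduce functions would run in parallel across log objects; the grouping step here stands in for the shuffle phase.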
100 tps S3 server = 83 million info-log lines = 3.5 GB/day
10-server system = 35 GB/day ≈ 1 TB/month
100 customer systems ≈ 1.2 PB annually
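The sizing figures above can be checked with a few lines of arithmetic. This is a sketch under stated assumptions: decimal units (1 TB = 1000 GB), a 30-day month, and the slide's own rounding of 1.05 TB/month down to 1 TB/month.

```python
SECONDS_PER_DAY = 24 * 60 * 60

tps = 100                     # requests per second per S3 server (from the slide)
lines_per_day = 83_000_000    # info-log lines per server per day (from the slide)
gb_per_server_day = 3.5       # log volume per server per day (from the slide)

requests_per_day = tps * SECONDS_PER_DAY                 # requests handled per day
lines_per_request = lines_per_day / requests_per_day     # log lines per request

gb_per_system_day = 10 * gb_per_server_day               # 10-server system
tb_per_system_month = gb_per_system_day * 30 / 1000      # assumes a 30-day month

# The slide rounds to 1 TB/month before scaling to 100 systems over 12 months.
pb_per_year_rounded = 100 * 12 * 1 / 1000
# Without rounding: 35 GB/day for 365 days across 100 systems.
pb_per_year_exact = 100 * gb_per_system_day * 365 / 1_000_000

print(requests_per_day, round(lines_per_request, 1))
print(tb_per_system_month, pb_per_year_rounded, round(pb_per_year_exact, 2))
```

The unrounded annual figure comes out slightly above the slide's 1.2 PB, which is consistent with the slide rounding the monthly volume down.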
Traditional Big Data Flow
Event Processing
Platform
Big Data Storage Platform
Analytics Platform
Content Storage
Consumer Activity
(Events, GPS, WiFi)
Social Media
Device Tracking and Logs
(Event, Configuration, Usage, Performance)
Real Time
Events
Big Data
Result of analysis
Traditional Big Data Flow
Event Processing
Platform
Analytics Platform (HDFS)
Content Storage (Object, NAS)
• Wasted storage = separate storage for content and for analytics
• Transforming data into HDFS can be costly
• High overhead of HDFS (3-copy replication) for content that may
be of poor quality
Logs, Config
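The duplicated-storage point above can be made concrete with a little arithmetic. The sketch below uses the 3-copy HDFS replication quoted on the slide; the content store's replication factor of 2 and the helper name `raw_capacity_tb` are assumptions for illustration only.

```python
HDFS_REPLICAS = 3      # 3-copy replication, as quoted on the slide

def raw_capacity_tb(logical_tb, content_copies, analytics_copies=0):
    """Raw disk needed when the same data lives in a content store and,
    optionally, in a separate analytics store."""
    return logical_tb * (content_copies + analytics_copies)

logical_tb = 12        # roughly one year of logs from one 10-server system
content_copies = 2     # assumption: replication factor of the content store

traditional = raw_capacity_tb(logical_tb, content_copies, HDFS_REPLICAS)
in_place = raw_capacity_tb(logical_tb, content_copies)
print(f"traditional: {traditional} TB raw, analyze in place: {in_place} TB raw")
```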
S3 and Hadoop
• Apache Hadoop has supported S3 since January 2008
– http://wiki.apache.org/hadoop/AmazonS3
• Well-proven by Amazon with Elastic MapReduce
• State-of-the-art and advancing quickly to provide
much easier Hadoop over S3 – e.g. Netflix Genie
– https://github.com/Netflix/genie
Cloudian Approach
Event Processing
Platform
Analytics
Cloudian HyperStore Storage
• No redundant storage of data
• HyperStore scales out with your data – adding nodes for I/O
• Analyze more - allows for efficient bulk data analysis in place
• Take advantage of multi-core CPUs – makes sense for MapReduce
• Can feed smarter data for subsequent analytic systems
• Faster time to decision
Cloudian Hadoop Configuration
• Hadoop 2.2
• Configured for native S3 file system (etc/hadoop/core-site.xml)
– S3N native file system for reading and writing regular files on S3. The
advantage of this file system is that you can access files on S3 that were
written with other tools. Conversely, other tools can access files written using
Hadoop.
• Configure Hadoop to use Cloudian (etc/hadoop/jets3t.properties)
– s3service.s3-endpoint=CLOUDIAN_ENDPOINT
– s3service.s3-endpoint-http-port=CLOUDIAN_PORT
Note: you can also dedicate a bucket for Hadoop analytics and then
Hadoop will chunk the content into blocks for storage – like HDFS
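The two configuration files named above can be generated with a short script. This is a minimal sketch, not Cloudian's tooling: `fs.s3n.awsAccessKeyId` and `fs.s3n.awsSecretAccessKey` are the standard Hadoop S3N credential properties, the two JetS3t keys are the ones listed on the slide, and the host name, port and credentials are placeholder assumptions.

```python
import xml.etree.ElementTree as ET

def core_site_xml(access_key, secret_key):
    """Build the S3N credential section of etc/hadoop/core-site.xml."""
    conf = ET.Element("configuration")
    for name, value in [
        ("fs.s3n.awsAccessKeyId", access_key),
        ("fs.s3n.awsSecretAccessKey", secret_key),
    ]:
        prop = ET.SubElement(conf, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(conf, encoding="unicode")

def jets3t_properties(endpoint, http_port):
    """Build etc/hadoop/jets3t.properties pointing JetS3t at a private endpoint."""
    return (
        f"s3service.s3-endpoint={endpoint}\n"
        f"s3service.s3-endpoint-http-port={http_port}\n"
    )

# Placeholder values (assumptions): substitute your own endpoint and keys.
print(core_site_xml("MY_ACCESS_KEY", "MY_SECRET_KEY"))
print(jets3t_properties("s3.example.internal", 80))
```

With both files in place, paths of the form `s3n://bucket/key` should resolve against the configured endpoint rather than Amazon S3.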
S3
NFS
Cloudian HyperStore® Software
• Scalable peer-to-peer architecture
• Multi-data center replication
• Multi-Tenancy and Chargeback
• Hybrid cloud-ready (any S3 cloud)
• 100s of supported applications
• Optimized for any workload
• Storage for OpenStack & CloudStack
Elastic, Distributed and Reliable
NoSQL database distributes
and replicates data
Logical Ring
Data is
automatically
replicated to
multiple nodes.
Location of data can be
designated, for instance, to
multiple datacenters and
per rack.
DC1
DC2
In theory, the number of nodes in
a logical ring can be up
to 2^127 (almost infinite).
Data load can be
rebalanced when a node is
added or removed.
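The logical ring described above can be illustrated with a small consistent-hashing sketch. It is not Cloudian's implementation: the MD5-based token function, one token per node, and the "next distinct nodes clockwise" replica rule are simplifying assumptions chosen to show the idea of a 2^127 token space with automatic replication.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 127   # token space quoted on the slide

def token(key):
    """Hash a string onto the ring (MD5 is an illustrative choice)."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % RING_SIZE

class Ring:
    def __init__(self, nodes):
        # Each node owns one token; real systems often use many per node.
        self._tokens = sorted((token(n), n) for n in nodes)

    def replicas(self, key, copies=3):
        """Walk clockwise from the key's token and return `copies` distinct nodes."""
        start = bisect.bisect_right(self._tokens, (token(key), ""))
        picked = []
        for i in range(len(self._tokens)):
            node = self._tokens[(start + i) % len(self._tokens)][1]
            if node not in picked:
                picked.append(node)
            if len(picked) == copies:
                break
        return picked

ring = Ring(["node1", "node2", "node3", "node4"])
print(ring.replicas("bucket/2014-06-11.log"))
```

Because placement depends only on token order, adding or removing a node changes ownership only for the keys adjacent to that node's token, which is what makes rebalancing incremental.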
Enhanced HyperStore® Technology
• Policies tailored for different
object types
• Optimized for all data
• Chunking for better
performance
• Erasure Coding for deep
archive efficiency
• Reliable storage across
multi-node failures
HyperStore
Patent Pending
Small Objects
Large Objects
Active Content
File System
NOSQL DB
Erasure Coding
Deep
Archives
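To show why erasure coding is more space-efficient than replication, the sketch below implements the simplest possible erasure code: k data fragments plus one XOR parity fragment, which survives the loss of any single fragment. This is an illustration only; the slides do not describe HyperStore's actual coding scheme, and production systems typically use codes that tolerate several simultaneous failures.

```python
from functools import reduce

def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def encode(data, k):
    """Split data into k equal fragments (zero-padded) plus one XOR parity fragment."""
    size = -(-len(data) // k)                      # ceiling division
    padded = data.ljust(size * k, b"\0")
    frags = [padded[i * size:(i + 1) * size] for i in range(k)]
    return frags + [reduce(xor_bytes, frags)]

def recover(frags, missing):
    """Rebuild the fragment at index `missing` by XOR-ing all surviving fragments."""
    survivors = [f for i, f in enumerate(frags) if i != missing]
    return reduce(xor_bytes, survivors)

fragments = encode(b"hello object storage", k=4)
lost = fragments[2]
fragments[2] = None                                # simulate a failed node
rebuilt = recover(fragments, missing=2)
print(rebuilt == lost)

# Raw-storage overhead: 4 data + 1 parity fragments versus 3 full copies.
ec_overhead = (4 + 1) / 4
replication_overhead = 3.0
```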
Cloudian Complete S3 API
• Core REST API – Get, Put, Post, Head, Delete
• Multi-part uploads: Allows uploading large objects
in multiple parts
• Versioning: Multiple versions of same object
• Bucket Lifecycle: Auto-expiration using rules
• Server side encryption: Managed by Cloudian
• Location Constraint: Assign data to specific region
(e.g. for HIPAA compliance)
• Bucket Website: Create buckets as websites to
host web content
• Access control lists (ACLs) define access rights to
bucket and object
• And more...
Cloudian Complete S3 API
Products S3 API
Cloudian
AmpliData
Basho
Caringo
Cleversafe
EMC Atmos
NetApp Bycast
Scality
OpenStack Swift
Seamless tiering to Amazon S3, Glacier and
other S3 Service Providers
• Cloudian deployed as On-Premises
S3 cloud behind the firewall
• Automatically migrates data to AWS
using Bucket Lifecycle Policies
– Optional migration to Glacier
– Metadata maintained for
search/list of objects
• Configurable to reduce
overhead
• Reads/writes to migrated objects
– restore by default, with an option to
redirect to the AWS/S3 Service
Provider
On-Premises S3
S3
Client/Application
Content migrated
or restored via
Bucket Lifecycle
Policies
Option to redirect
migrated content
Amazon S3
Firewall
Amazon Glacier
Big Data Storage Platform
Event Processing Platform Big Data Storage Platform
Input I/F Recommend
CEP Engine
Filter Judge Aggregate
Real Time Analysis
Big Data Analysis
Analyze Recommend
Data Analysis and Storage Platform
Content Storage
Consumer Activity
(Events, GPS, WiFi)
Social media
Business Tracking
(goods, inventory, campaign, sales)
Smarter
Business
Future Work
• Delivery of Cloudian Hadoop-ready
object storage (second half of calendar year 2014)
• Integration with key Hadoop
distributions
• Locality awareness
• Potentially use new drive technology for
processing (e.g. HGST Ethernet drive)
• Find out more – Booth 139
Cloudian®
S3 Cloud Storage Platform
Thank You!
Questions?
www.cloudian.com
“The Leading Provider of Hybrid Cloud Storage”
