Big Data
Using Public Cloud
Assoc. Prof. Dr. Thanachart Numnonda
Executive Director
IMC Institute
18 August 2015
2
Internet of Things
Cloud Computing
Big Data
3
Data created every minute
Source: Trendwise Analytics
4
Data!
The New York Stock Exchange generates about
1TB of new trade data per day.
A commercial aircraft generates 3GB of flight
sensor data in 1 hour.
Vodafone generates 3TB of Call Detail Records
(CDRs) per day.
Between 2009 and 2014, the total number of U.S.
online banking households will increase from 54
million to 66 million.
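To put the volumes above in perspective, they can be converted to average sustained throughput. This is a minimal sketch, assuming decimal units (1 GB = 10^9 bytes, 1 TB = 10^12 bytes) and a constant rate over the period; the function name is illustrative.

```python
# Convert the data volumes quoted above into average sustained throughput.
# Assumes decimal units (1 GB = 1e9 bytes, 1 TB = 1e12 bytes) and a constant rate.

SECONDS_PER_HOUR = 3600
SECONDS_PER_DAY = 24 * SECONDS_PER_HOUR


def megabytes_per_second(total_bytes: float, period_seconds: float) -> float:
    """Average throughput in MB/s for a volume produced over a period."""
    return total_bytes / period_seconds / 1e6


rates = {
    "NYSE trade data (1 TB/day)": megabytes_per_second(1e12, SECONDS_PER_DAY),
    "Aircraft sensors (3 GB/hour)": megabytes_per_second(3e9, SECONDS_PER_HOUR),
    "Vodafone CDRs (3 TB/day)": megabytes_per_second(3e12, SECONDS_PER_DAY),
}

for source, rate in rates.items():
    print(f"{source}: {rate:.2f} MB/s")
```

Real traffic is bursty, so peak rates would be higher than these averages.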
5
Big Data
Source: http://www.datasciencecentral.com/
6
7
Sample Data vs. Big Data
Can you judge a person's life expectancy?
Given:
– DNA
– Medical records
– Food
– Lifestyle (smoking, drinking, driving, exercise)
8
IT Infrastructure
Analytics
Data Sources
9
“By 2015, 20% of Global 1000 organizations
will have established a strategic focus on
information infrastructure.”
Gartner
10
Big Data Technology !!
11
Big Data Landscape
Source: Big Data in the Enterprise. When to Use What?
12
What is Hadoop?
A scalable, fault-tolerant distributed system
for data storage and processing
Written in Java
Open source, distributed under the Apache license
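Hadoop's classic processing model is MapReduce. The sketch below illustrates the idea with a word count in a single Python process; it is not Hadoop code, and the function names are illustrative. On a real cluster the map and reduce phases run in parallel on many nodes, and the framework handles the grouping step between them.

```python
# Single-process sketch of the MapReduce model (map -> shuffle -> reduce).
# Hadoop runs these phases in parallel across a cluster; here they run in sequence.
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all emitted values by key, as the framework does between phases."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run the three phases in order over the input lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


counts = word_count(["big data on cloud", "big data as a service"])
print(counts)
```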
13
Hadoop Environment
Source: Hadoop in Practice; Alex Holmes
14
Hadoop Distribution
Microsoft Azure
15
Big Data Future Architecture
Social Media, Images, E-mails, Crawlers
ERP, CRM, LOB Apps
Unstructured and Structured Data
Data Warehouse / NewSQL
Hadoop on Cloud
Hadoop on Private Server
Connectors
SSRS
BI Platform
Familiar End User Tools
Spreadsheet Predictive Analytics
Data Market Place
NoSQL
Petabytes of Data (Unstructured)
Hundreds of TB of Data (Structured)
16
Issue with Big Data Infrastructure
Large investment
Scalability
ROI
Business Cases
17
Big Data on Cloud
Using IaaS to leverage cloud VMs
Using Big Data as a Service
18
Big Data using IaaS
19
20
21
Source: http://www.datadansandler.com/
22
Big Data Services on Cloud
Amazon Elastic MapReduce
Microsoft Azure Hadoop
23
Big Data as a Service
24
Database as a Service
Amazon RDS
IBM SQL Database for Bluemix
Microsoft Azure SQL Database
Google Cloud SQL
25
NoSQL as a Service
Amazon DynamoDB
Google Cloud Datastore
Microsoft Azure DocumentDB
Cloudant on IBM Bluemix
MongoDB on Heroku
26
Amazon DynamoDB
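DynamoDB's low-level API represents each attribute with a type descriptor, for example "S" for strings and "N" for numbers. The sketch below shows how a plain record might be encoded for a PutItem request. It only builds the request body and makes no call to AWS; the table name, field names, and sample values are hypothetical, and only three value types are handled.

```python
# Sketch: encode a plain Python dict in DynamoDB's low-level attribute-value format.
# Only strings, numbers, and booleans are handled; table and field names are hypothetical.
import json
from typing import Any, Dict


def to_attribute_value(value: Any) -> Dict[str, Any]:
    """Wrap one Python value in its DynamoDB type descriptor."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return {"BOOL": value}
    if isinstance(value, (int, float)):
        return {"N": str(value)}  # numbers are sent as strings
    if isinstance(value, str):
        return {"S": value}
    raise TypeError(f"unsupported type: {type(value).__name__}")


def to_item(record: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Encode every field of a record."""
    return {name: to_attribute_value(value) for name, value in record.items()}


# Hypothetical call detail record keyed by subscriber and timestamp.
record = {"subscriber_id": "66-8000-0001", "call_start": 1439856000, "roaming": False}
put_request = {"TableName": "CallDetailRecords", "Item": to_item(record)}
print(json.dumps(put_request, indent=2))
```

In practice an SDK would usually perform this encoding and send the request.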
27
Hadoop as a Service
Amazon Elastic MapReduce
Rackspace Cloud Big Data Platform
Qubole
Google Cloud Platform
IBM Bluemix: Analytics for Hadoop
Microsoft Azure HDInsight
28
29
30
Big Data on Amazon EMR
31
Amazon EMR
32
33
34
Hadoop on Google
35
36
Analytics as a Service
Google BigQuery
Amazon Machine Learning
Azure Machine Learning
BIME: BI as a Service
IBM Watson Analytics
37
38
39
Google BigQuery
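BigQuery lets users run SQL queries over large tables without managing servers. The kind of aggregate query involved can be sketched locally, here with an in-memory SQLite database as a stand-in. This does not use the BigQuery API, and the table, column names, and rows are made up for illustration.

```python
# Local stand-in for a SQL analytics service: the same kind of aggregate query,
# run against an in-memory SQLite table. This does not call BigQuery.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trades (symbol TEXT, volume INTEGER)")
# Made-up sample rows for illustration only.
conn.executemany(
    "INSERT INTO trades VALUES (?, ?)",
    [("AAA", 100), ("BBB", 250), ("AAA", 300), ("CCC", 50), ("BBB", 150)],
)

# Total traded volume per symbol, largest first (ties broken alphabetically).
query = """
    SELECT symbol, SUM(volume) AS total_volume
    FROM trades
    GROUP BY symbol
    ORDER BY total_volume DESC, symbol ASC
"""
results = conn.execute(query).fetchall()
conn.close()

for symbol, total in results:
    print(symbol, total)
```

A hosted service runs the same style of query, but distributes the scan across many machines.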
40
Big Data on Cloud Roadmap
Step 1: Build the business case
Step 2: Assess your Big Data application
workloads
Step 3: Develop a technical approach for
deploying and managing Big Data in the cloud
Step 4: Address governance, security, privacy,
and risk
Step 5: Deploy, integrate, and operationalize
your cloud-based Big Data infrastructure
Source: Deploying Big Data Analytics Applications to the Cloud: Roadmap for Success, CSCC
41
Sample applications
Enterprise applications already hosted in the
cloud
High-volume external data sources that
require considerable preprocessing
Tactical applications beyond your on-premises
Big Data capabilities
Elastic provisioning of very large but
short-lived analytic sandboxes
Source: Deploying Big Data Analytics Applications to the Cloud: Roadmap for Success, CSCC
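The case for elastic, short-lived sandboxes comes down to simple arithmetic: a large cluster that exists only while a job runs can cost less than a small one that runs all month. The sketch below compares the two. Every number in it (node counts, run lengths, and the price per node-hour) is a hypothetical placeholder, not a vendor quote.

```python
# Compare an always-on cluster with one provisioned only while a job runs.
# All prices and sizes are hypothetical placeholders, not vendor quotes.

def cluster_cost(nodes: int, hours: float, price_per_node_hour: float) -> float:
    """Cost of running `nodes` machines for `hours` at a flat hourly price."""
    return nodes * hours * price_per_node_hour


HOURS_PER_MONTH = 730  # average month, assumed
PRICE = 0.50           # hypothetical price per node-hour

# Baseline: 20 nodes kept running for the whole month.
always_on = cluster_cost(nodes=20, hours=HOURS_PER_MONTH, price_per_node_hour=PRICE)
# Sandbox: a larger 100-node cluster, but only for four 6-hour runs per month.
sandbox = cluster_cost(nodes=100, hours=4 * 6, price_per_node_hour=PRICE)

print(f"Always-on, 20 nodes: {always_on:,.2f}")
print(f"Elastic, 100 nodes:  {sandbox:,.2f}")
```

The comparison ignores storage, data transfer, and start-up time, which a real estimate would need to include.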
42
www.facebook.com/imcinstitute
43
Thank you
thanachart@imcinstitute.com
www.facebook.com/imcinstitute
www.slideshare.net/imcinstitute
www.thanachart.org
