Proprietary + Confidential
SQL Saturday - Los Angeles
Data Platform on GCP
Patrick Alexander
Google - Customer Engineer
Ex-Microsoft - Principal Cloud Solution Architect
PatrickGCP@Google.com
@PatrickCloudArc
Google Office Spruce Goose
Playa Vista - California
2022
Spruce Goose
Hughes H-4 Hercules
1942 - 1947
https://en.wikipedia.org/wiki/Hughes_H-4_Hercules
November 2, 1947
Long Beach, California
THE STATS
Wingspan: 320′ 11″
Length: 218′ 8″
Height: 79′ 4″
Empty weight: 300,000 lb
Cruise Speed: 135 MPH
Intro to GCP’s Data Platform
Confidential & Proprietary
Agenda
01 The Data Landscape
02 Google Cloud Platform
03 Google Cloud Big Data Portfolio
The Data Landscape
Data is surging every minute. How are you using it?
● $203,579 in Amazon sales generated
● 500 hours of video uploaded to YouTube
● 142,361,111 emails sent and received
● 2,083,333 minutes used on Skype calls
● 347,222 tweets posted
● 50,200 mobile apps downloaded
● 1,389 Uber rides taken
● 2.4 million Google searches made
● 216,000 photos posted to Instagram
*Stats may be out of date!
Unintegrated Marketing Tools
Many companies use 20+ separate tools
It’s difficult to get a holistic view of customers.
Company Data in Silos
CRM / ERP / Billing / Inventory / POS
If you want to unlock the power of your data, you need a CDP (customer data platform), not just new tools.
Google Cloud Platform
Fully managed storage & database services
Category | Service | Good for | Such as
--- | --- | --- | ---
Object | Cloud Storage | Binary or object data | Images, media serving, backups
Key-value | Memcache | Web/mobile applications, gaming | Game state, user sessions
Non-relational | Cloud Datastore | Hierarchical, mobile, web | User profiles, game state
Non-relational | Cloud Bigtable | Heavy read + write, events | AdTech, financial, IoT
Relational | Cloud SQL | Web frameworks | CMS, eCommerce
Relational | Cloud Spanner | RDBMS + scale, HA, HTAP | Transactions, Ad/Fin/MarTech
Warehouse | BigQuery | Enterprise data warehouse | Analytics, dashboards
A modern data warehouse on a comprehensive platform
● Capture (data ingestion at any scale): Cloud Pub/Sub, Data Transfer Service, Storage Transfer Service, Cloud IoT Core
● Process (reliable streaming data pipeline): Cloud Dataflow (Apache Beam), Cloud Dataproc, Cloud Dataprep, Cloud Data Fusion
● Store (data lake and data warehousing): Cloud Storage, BigQuery storage
● Analyze (advanced analytics): BigQuery analysis engine, Cloud AI Services, TensorFlow
● Use: Google Data Studio, Sheets
● Across all stages: Cloud Composer, Data Catalog
A Leader in Cloud Data Warehouse
Google receives 5 of 5 in 19 different criteria, such as:
● Data Ingestion
● Data Lake Integration
● ML / Data Science
● Performance
● Scalability
● Solution Roadmap
● Strategy Execution
● Customer Adoption
● Use Cases
● Partners
The Forrester Wave™: Cloud Data Warehouse, Q1 2021, Noel Yuhanna
The Forrester Wave™ is a graphical representation of Forrester's call on a market and is plotted using a detailed spreadsheet with exposed scores, weightings, and comments.
Forrester does not endorse any vendor, product, or service depicted in the Forrester Wave. Information is based on best available resources. Opinions reflect judgment at the time and are subject to change.
Google Cloud Big Data Portfolio
GCP provides a full suite of storage service options
● Cost-effective
● Varied choices based on your:
○ Application
○ Workload
Cloud Storage | Cloud Bigtable | Cloud Datastore | Cloud SQL | Cloud Spanner | BigQuery | Cloud Firestore
Cloud Storage
Overview
● Fully managed, highly reliable
● Cost-efficient, scalable object/blob store
● Object access via HTTP requests
● Object name is the only key
Ideal for
● Images and videos
● Objects and blobs
● Unstructured data
● Static website hosting
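Because the object name is the only key, a name such as logs/2022/app.log is one flat key, not a path through folders. A minimal sketch, assuming the Cloud Storage JSON API object endpoint (https://storage.googleapis.com/storage/v1/b/BUCKET/o/OBJECT), shows how an object is addressed over HTTP; the helper name, bucket, and object names are hypothetical, and authentication is omitted.

```python
from urllib.parse import quote

JSON_API_BASE = "https://storage.googleapis.com/storage/v1"


def object_url(bucket: str, object_name: str, download: bool = False) -> str:
    """Build the JSON API URL for a single object (hypothetical helper).

    The whole object name is the key, so every "/" in it is percent-encoded
    instead of being treated as a directory separator.
    """
    url = f"{JSON_API_BASE}/b/{quote(bucket, safe='')}/o/{quote(object_name, safe='')}"
    # Without alt=media the API returns the object's metadata;
    # with alt=media it returns the object's bytes.
    return url + "?alt=media" if download else url


if __name__ == "__main__":
    print(object_url("example-bucket", "logs/2022/app.log"))
    print(object_url("example-bucket", "logs/2022/app.log", download=True))
```

Encoding the slashes is what makes the "only key" property visible: the service never walks directories, it looks up one name.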
Cloud Datastore
Overview
● Fully managed NoSQL
● Scalable
Ideal for
● Semi-structured application data
● Durable key-value data
● Hierarchical data
● Managing multiple indexes
● Transactions
Cloud Firestore
Overview
● Fully managed, serverless, NoSQL
● Scalable
● Native mobile and web client libraries
● Real-time updates
Ideal for
● Document-oriented data
● Large collections of small documents
● Native mobile and web clients
● Durable key-value data
● Hierarchical data
● Managing multiple indexes
● Transactions
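Hierarchical data in Firestore is addressed by slash-separated paths that alternate collection IDs and document IDs, so a collection path has an odd number of segments and a document path an even number. The sketch below classifies a path by that rule only; the function name and sample paths are hypothetical, and it does not call the Firestore client library.

```python
def classify_path(path: str) -> str:
    """Return "collection" or "document" for a Firestore-style path.

    Paths alternate collection and document IDs, for example
    users (collection) / alice (document) / orders (collection).
    """
    segments = path.split("/")
    # "".split("/") gives [""], so an empty path is rejected here too.
    if any(segment == "" for segment in segments):
        raise ValueError(f"empty segment in path: {path!r}")
    return "collection" if len(segments) % 2 == 1 else "document"


if __name__ == "__main__":
    for p in ["users", "users/alice", "users/alice/orders", "users/alice/orders/42"]:
        print(p, "->", classify_path(p))
```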
Cloud Bigtable
Overview
● High-performance wide-column NoSQL database service
● Sparsely populated tables
● Can scale to billions of rows and thousands of columns
● Can store terabytes to petabytes of data
Ideal for
● Operational applications
● Analytical applications
● Storing large amounts of single-keyed data
● MapReduce operations
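Bigtable keeps rows sorted lexicographically by a single row key, so working with single-keyed data mostly comes down to choosing that key. A common pattern, sketched here under stated assumptions, starts the key with an entity ID and appends a reversed, zero-padded timestamp so that a prefix scan returns the newest events first. The "#" separator, the 13-digit millisecond ceiling, the helper name, and the sensor IDs are hypothetical choices, and no Bigtable client library is used.

```python
# Hypothetical ceiling: millisecond timestamps stay below 10**13 until roughly 2286.
MAX_TS_MS = 10**13 - 1


def row_key(sensor_id: str, ts_ms: int) -> bytes:
    """Build a row key of the form b"<sensor_id>#<reversed timestamp>".

    Zero-padding to 13 digits keeps lexicographic order equal to numeric
    order, and reversing the timestamp makes newer events sort first.
    """
    if not 0 <= ts_ms <= MAX_TS_MS:
        raise ValueError("timestamp out of range")
    return f"{sensor_id}#{MAX_TS_MS - ts_ms:013d}".encode("utf-8")


if __name__ == "__main__":
    keys = sorted(row_key("sensor-7", t) for t in (1_000, 3_000, 2_000))
    for key in keys:
        print(key)
```

Leading with the entity ID rather than the timestamp also spreads writes across the key space; keys that begin with a timestamp tend to send all new writes to one node.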
Cloud SQL
Overview
● Managed service
○ Replication
○ Failover
○ Backups
● MySQL, PostgreSQL, and SQL Server
● Relational database service
● Cloud SQL Proxy allows secure access to your Cloud SQL Second Generation instances without IP allowlisting
Ideal for
● Web frameworks
● Structured data
● OLTP workloads
● Applications using MySQL/PostgreSQL
Cloud Spanner
Overview
● Mission-critical relational database service
● Transactional consistency
● Global scale
● High availability
● Multi-region replication
● 99.999% SLA (multi-region configurations; 99.99% for regional)
Ideal for
● Mission-critical applications
● High transaction volumes
● Scale and consistency requirements
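An availability percentage translates directly into a downtime budget. The arithmetic below is only a sketch: it assumes a 365-day year and a 30-day month, and it ignores how each product's SLA terms actually measure and credit downtime.

```python
MINUTES_PER_YEAR = 365 * 24 * 60     # 525,600
MINUTES_PER_30_DAYS = 30 * 24 * 60   # 43,200


def downtime_budget_minutes(sla_percent: float, period_minutes: float) -> float:
    """Minutes of unavailability allowed by an availability percentage over a period."""
    return period_minutes * (1 - sla_percent / 100)


if __name__ == "__main__":
    for sla in (99.99, 99.999):
        year = downtime_budget_minutes(sla, MINUTES_PER_YEAR)
        month = downtime_budget_minutes(sla, MINUTES_PER_30_DAYS)
        print(f"{sla}%: {year:.2f} min/year, {month * 60:.1f} s per 30 days")
```

At 99.999% the budget is about 5.26 minutes per year (roughly 26 seconds per 30-day month); at 99.99% it is about 52.6 minutes per year.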
BigQuery
Overview
● Low-cost enterprise data warehouse for analytics
● Fully managed
● Petabyte scale
● Fast response times
● Serverless
Ideal for
● Online Analytical Processing (OLAP) workloads
● Big data exploration and processing
● Reporting via Business Intelligence (BI) tools
Storage at a glance
Product | Simple description | Ideal for | Not ideal for
--- | --- | --- | ---
Cloud Storage | Binary/object store | Large or rarely accessed unstructured data | Structured data, building fast apps
Datastore | Scalable store for structured serving | App Engine (GAE) apps, structured pure-serve use cases | Relational or analytic data
Firestore | Cloud-native app data at global scale; real-time NoSQL database to store and sync data | Mobile, web, multi-user, IoT & real-time applications | (not listed)
Bigtable | High-volume, low-latency database | "Flat," heavy read/write, or analytical data | Highly structured or transactional data
Cloud SQL | Well-understood VM-based RDBMS | Web frameworks, existing applications | Scaling, analytics, heavy writes
Spanner | Relational DB service | Low-latency transactional systems | Analytic data
BigQuery | Auto-scaling analytic data warehouse | Interactive analysis of static datasets | Building fast apps
Which Google Cloud database is right for me?
Ask in order: Is your data structured? If so, is your workload analytics? If not, is your data relational? Then:
● Unstructured data: Do you need mobile SDKs? Yes → Firebase Storage; No → Cloud Storage
● Structured data, analytics workload: Do you need updates or low latency? Yes → Cloud Bigtable; No → BigQuery
● Structured, non-analytics, relational data: Do you need horizontal scalability? Yes → Cloud Spanner; No → Cloud SQL
● Structured, non-analytics, non-relational data: Do you need mobile SDKs? Yes → Cloud Firestore on Firebase; No → Cloud Datastore
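The "Which Google Cloud database is right for me?" chart can be read as a small decision function. This is a sketch of the chart's logic only, assuming every question has a plain yes/no answer; the function and parameter names are hypothetical, and a real database choice weighs many more factors.

```python
def choose_database(
    structured: bool,
    analytics: bool = False,
    relational: bool = False,
    needs_low_latency_updates: bool = False,
    needs_horizontal_scale: bool = False,
    needs_mobile_sdks: bool = False,
) -> str:
    """Walk the 'Which Google Cloud database is right for me?' flowchart."""
    if not structured:
        return "Firebase Storage" if needs_mobile_sdks else "Cloud Storage"
    if analytics:
        return "Cloud Bigtable" if needs_low_latency_updates else "BigQuery"
    if relational:
        return "Cloud Spanner" if needs_horizontal_scale else "Cloud SQL"
    return "Cloud Firestore on Firebase" if needs_mobile_sdks else "Cloud Datastore"


if __name__ == "__main__":
    print(choose_database(structured=True, analytics=True))
    print(choose_database(structured=True, relational=True, needs_horizontal_scale=True))
```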
Data lifecycle - ingest
● Stackdriver Logging (applications). Good for: centralized log management solution. Such as: log data from applications.
● Cloud Pub/Sub (streaming). Good for: global, scalable, durable message queue that decouples apps. Such as: IoT, user events, system metrics.
● Cloud Transfer (batch). Good for: managed bulk (arbitrary) data transfer. Such as: cloud migration, backup, legacy data.
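Pub/Sub decouples apps because publishers write to a topic without knowing who consumes it, and each subscription receives every message published after the subscription was created. The toy model below illustrates only that fan-out behavior; it is an in-memory sketch with hypothetical class and subscription names, not the Google Cloud Pub/Sub client library, and it omits acknowledgements, retries, and ordering.

```python
from collections import deque
from typing import Any, Deque, Dict


class Topic:
    """Toy in-memory topic illustrating Pub/Sub-style fan-out."""

    def __init__(self) -> None:
        self._subscriptions: Dict[str, Deque[Any]] = {}

    def create_subscription(self, name: str) -> Deque[Any]:
        # A new subscription only sees messages published from now on.
        queue: Deque[Any] = deque()
        self._subscriptions[name] = queue
        return queue

    def publish(self, message: Any) -> None:
        # The publisher does not know which consumers exist;
        # the message is appended to every subscription's queue.
        for queue in self._subscriptions.values():
            queue.append(message)


if __name__ == "__main__":
    events = Topic()
    billing = events.create_subscription("billing")
    analytics = events.create_subscription("analytics")
    events.publish({"type": "signup", "user": "u1"})
    print(billing.popleft(), analytics.popleft())
```

Adding a new consumer means creating another subscription; the publishing code does not change.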
Data lifecycle - store
● Cloud Storage. Good for: binary, object data. Such as: images, media serving, backup.
● Cloud SQL. Good for: structured data, web frameworks. Such as: metadata, fintech, AdTech.
● Cloud Spanner. Good for: RDBMS, SQL, horizontal scaling. Such as: metadata, fintech, AdTech.
● Cloud Datastore. Good for: hierarchical, mobile, web. Such as: user profiles, game states.
● Cloud Firestore. Good for: hierarchical, mobile, web. Such as: user profiles, game states.
● Cloud Bigtable. Good for: heavy read/write, events. Such as: IoT, user/system events, low-latency systems.
Data lifecycle - process and analyze
Large-scale data processing
● Cloud Dataproc. Good for: managed Hadoop ecosystems. Such as: batch and streaming analytics over big data, machine learning.
● Cloud Dataflow. Good for: unified abstraction for batch and streaming data. Such as: new pipelines, windowing operations, watermarking.
● Cloud Dataprep. Good for: UI-driven data preparation. Such as: pre-step to big data jobs (Dataproc/Dataflow), machine learning.
Data analysis
● BigQuery. Good for: enterprise data warehouse. Such as: analytics, dashboards, business intelligence, basic machine learning.
Custom ML
● Vertex AI Platform. Good for: general-purpose ML platform. Such as: data scientists, ML on the data warehouse.
● Cloud Dataproc. Good for: managed Hadoop ecosystems. Such as: ML jobs using Mahout/Spark MLlib.
Task-specific machine learning
● AutoML Vision. Good for: object/face detection, emotional facial attributes, SafeSearch, real-time or batch, OCR.
● AutoML Video Intelligence. Good for: video metadata, entity analysis at a granularity of 1 frame per second, video catalog (timestamped) entity search.
● AutoML NLP. Good for: structure and meaning of text, sentiment analysis.
● AutoML Translation. Good for: automatic translation of 90 languages, language detection, both real-time and batch.
● AutoML Tables. Good for: analyzing structured data, finding data traits, data labeling, and target feature selection.
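Dataflow runs Apache Beam pipelines, where windowing groups an unbounded stream into finite chunks by event time. The pure-Python sketch below shows the idea behind fixed (tumbling) windows, which Beam expresses with beam.WindowInto(window.FixedWindows(size)); it is not Beam code, the function name and sample events are hypothetical, and it ignores watermarks and late data.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def fixed_window_counts(
    events: Iterable[Tuple[int, str]], size_s: int
) -> Dict[Tuple[int, str], int]:
    """Count events per (window start, key) using fixed windows of size_s seconds.

    Each event is (event_time_seconds, key). The window comes from the
    event's own timestamp, so out-of-order arrival does not change the result.
    """
    counts: Dict[Tuple[int, str], int] = defaultdict(int)
    for event_time, key in events:
        window_start = event_time - event_time % size_s
        counts[(window_start, key)] += 1
    return dict(counts)


if __name__ == "__main__":
    # Arrival order differs from event-time order on purpose.
    stream = [(5, "a"), (65, "a"), (59, "a"), (61, "b")]
    print(fixed_window_counts(stream, size_s=60))
```

In a real streaming pipeline the watermark decides when a window such as [0, 60) is considered complete and its result can be emitted; this sketch sidesteps that by processing a finite list.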
Data lifecycle - explore and visualize
Data science
● Cloud Datalab. Good for: Jupyter notebooks for general-purpose data visualization.
● Cloud Dataprep. Good for: UI-driven data preparation and visualization; also used as a pre-step to big data jobs (Dataproc/Dataflow) and machine learning.
Spreadsheet
● Connected Sheets. Good for: using Google Apps Script's ability to run BigQuery queries, usually for quick, short analysis on smaller datasets.
Business intelligence
● Google Data Studio. Good for: drag-and-drop report builder over Google Sheets, BigQuery, Cloud Storage files, and SQL.
● Looker. Good for: custom applications, embedded visualizations, data science workflows; integrates with BigQuery.
Big Data Reference Architecture
Data Science Reference Architecture
Data Platform on GCP (High Performance Computing)
BigQuery | Architecture
Decoupled storage and compute for maximum flexibility
● Interfaces: REST API; web UI and CLI; client libraries for C#, Go, Java, Node.js, PHP, Python, Ruby
● Compute: BigQuery highly available cluster compute (Dremel), SQL:2011 compliant
● Distributed memory shuffle tier
● Petabit network
● Storage: replicated, distributed storage (99.9999999999% durability)
● Ingest: streaming ingest and free bulk loading
Economic value - Data warehouse migration lowers your TCO massively
ESG 2019: The Economic Advantage of Migrating Data Warehouse Workloads to BigQuery
● 52% lower TCO (versus legacy Teradata on-premises)
● 41% lower TCO (versus Teradata on AWS)
[Chart: TCO calculator, expected 3-year total cost of ownership for Teradata on-premises, Teradata on AWS, and Google BigQuery ($0 to $16,000,000); cost components: up-front capital investment, monthly cloud spend, administrative costs, planning/deployment/migration, power/cooling/floorspace. Source: ESG 2019]
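A percentage reduction in TCO becomes a dollar figure only once a baseline is fixed. The arithmetic below is a sketch with a hypothetical $10,000,000 three-year on-premises baseline (not a figure from the ESG study) and applies the slide's 52% reduction; each other comparison (41%, 26-34%) would need its own baseline. The function names are hypothetical.

```python
def cost_after_reduction(baseline: float, percent_lower: float) -> float:
    """Cost that is percent_lower percent below the baseline."""
    return baseline * (1 - percent_lower / 100)


def percent_lower(baseline: float, new_cost: float) -> float:
    """Inverse: how many percent new_cost is below the baseline."""
    return (1 - new_cost / baseline) * 100


if __name__ == "__main__":
    baseline = 10_000_000  # hypothetical 3-year on-premises TCO
    new = cost_after_reduction(baseline, 52)
    print(f"${new:,.0f} (saves ${baseline - new:,.0f})")
```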
Google Cloud provides the most modern data warehouse
Scale
● Google Cloud BigQuery: ✓ Fully managed and serverless; ✓ Petabyte-scale; ✓ No warm-up or maintenance
● Teradata on-prem: ✕ Tied to cluster; ✕ Significant performance bottlenecks
● AWS Redshift: ✕ Tied to cluster (Redshift Spectrum is serverless); ✕ Considerable amount of tuning needed; ✕ Huge performance bottlenecks
● Snowflake: ✕ Reclustering, shuffles, and loads hurt performance; ✕ SSDs tied to VMs
● Azure Synapse Analytics: ✕ Significant performance bottlenecks on large data; ✕ Compute has to be scaled up manually; ✕ Capacity limits based on instance size
Real-time
● Google Cloud BigQuery: ✓ Streaming data; ✓ BI Engine; ✓ Streaming SQL
● Teradata on-prem: ✓ Streaming data, dashboards, SQL
● AWS Redshift: ✓ Streaming data, dashboards, SQL
● Snowflake: ✕ Poor streaming performance
● Azure Synapse Analytics: ✓ Streaming data, dashboards; ✕ Requires Databricks for streaming scenarios
AI support
● Google Cloud BigQuery: ✓ Built-in BigQuery ML; ✓ Two-way connections to AI Platform; ✓ Storage API for Spark/Dataproc
● Teradata on-prem: ✓ Some built-in ML; ✕ Only basic techniques
● AWS Redshift: ✕ No SQL-based ML; ✓ Integration with SageMaker
● Snowflake: ✕ No SQL-based AI/ML; ✕ No high-performance support for Spark
● Azure Synapse Analytics: ✕ No SQL-based ML; ✕ Just a rebrand of three separate products; no deep integration
Data security
● Google Cloud BigQuery: ✓ Encrypted at rest and in transit; ✓ Immutable audit logs; ✓ Data Catalog; ✓ DLP API for redaction
● Teradata on-prem: ✓ Integrated security; ✕ Partner tool (DgSecure) needed for redaction
● AWS Redshift: ✓ Integrated security; ✕ No VPC-SC means no guards against data exfiltration
● Snowflake: ✕ Standalone authentication system; ✕ No native redaction capability
● Azure Synapse Analytics: ✓ Integrated security; ✕ Patches applied during maintenance windows, with downtime
BigQuery Hands-on Lab
https://google.qwiklabs.com/focuses/1145?parent=catalog
Qwiklabs
Any questions?
Thank you!
Performance at Scale
Petabyte scale, automated, and intelligent: lets your enterprise focus on delivering insights, not infrastructure
[Chart: workloads and analytics (basic reporting; ad-hoc reporting, operational insight; built-in advanced analytics capabilities) against degree of automation (manual configuration to completely automated and serverless), with Legacy DW at the manual, basic-reporting end and BigQuery at the automated, advanced-analytics end.]
Economic Value - BigQuery lowers your data warehouse TCO massively
Expected 3-year total cost of ownership
● 52% lower TCO¹ (versus on-premises)
● 26-34% lower TCO² (versus other cloud DWs)
● Flat-rate and variable pricing options give customers control over TCO
1) Migrating Enterprise Data Warehouse Workloads - ESG 2019
2) Google BigQuery vs. Alternative Cloud-based EDW Solutions - ESG 2019
How Google’s Smart Analytics Platform is Unique in the Industry
Ratings: Google Cloud | other vendors | Legacy Solutions
Scale: ✓ | Partial | ✕
BigQuery is fully managed, serverless, and architected for petabyte scale. While others are tied to clusters or require manual reclustering, BigQuery manages the infrastructure for you so your teams can focus on delivering insights.
Total Cost of Ownership: ✓ | ✕ | ✕
BigQuery eliminates the need for up-front investment and planning for your EDW and reduces operational and administrative expenses, all while delivering business agility. Enterprise Strategy Group (ESG) estimated savings of 26-34% over cloud-based EDW alternatives and >40% over legacy on-premises solutions.
Interoperability: ✓ | ✕ | ✕
BigQuery provides a unified, interoperable, best-of-breed platform across your data warehouse and data lakes, with data integration across on-premises and cloud sources. BigQuery was made to tear down data silos and help you avoid creating new ones.
Democratized ML/AI: ✓ | ✕ | ✕
BigQuery democratizes machine learning for enterprise users (not just data scientists) with capabilities accessible through SQL, while giving more sophisticated data science teams access to Google’s leading-edge AI technologies via Cloud AI. More than 80% of our BigQuery customers have incorporated ML into their business analysis.
Reliable & Secure: ✓ | Partial | Partial
BigQuery offers robust security, governance, and reliability that is unmatched in the industry: high availability with a 99.99% SLA, plus automatic data replication, restore, and backup to ensure business continuity. It can classify and redact sensitive data and provides fine-grained identity and access management, including access transparency, so you can log each view. Data is encrypted at rest and in transit by default, and customer-managed encryption keys provide control over your data.
Real-Time: ✓ | ✕ | ✕
BigQuery is designed to excel in IoT and other scenarios where your analysis depends on real-time streaming data, and it includes a BI acceleration engine for high-concurrency, low-latency use cases. Both are unique differentiators for Google Cloud and essential for businesses that need to make real-time decisions.
Usefully Multi-Cloud: ✓ | ✕ | ✕
BigQuery breaks down silos to provide a single pane of glass for all your data across multiple clouds (AWS, Azure). Most other vendors focus on providing the same service running in three clouds, but these are three silos. BigQuery breaks the silo and enables customers to analyze data across datasets.
Industry Leadership: ✓ | ? | ?
Recognized as an industry leader by both Gartner and Forrester in data management and analytics. With 9 Google products that each have more than a billion users running on our platform, big data is in our DNA, and we are ready to help your business build a future-ready data platform.
Appendix

More Related Content

What's hot (20)

PDF
Microsoft Azure Overview
David J Rosenthal
 
PDF
Choosing Between Microsoft Fabric, Azure Synapse Analytics and Azure Data Fac...
Cathrine Wilhelmsen
 
PDF
Tom Grey - Google Cloud Platform
Fondazione CUOA
 
PPTX
Migrating on premises workload to azure sql database
PARIKSHIT SAVJANI
 
PPTX
Databricks on AWS.pptx
Wasm1953
 
PDF
Building an open data platform with apache iceberg
Alluxio, Inc.
 
PPTX
How to migrate workloads to the google cloud platform
actualtechmedia
 
PDF
Cloud Native Application
VMUG IT
 
PPTX
Azure storage
Adam Skibicki
 
PDF
Azure stack all you need to know
Susantha Silva
 
PDF
Kafka for Real-Time Replication between Edge and Hybrid Cloud
Kai Wähner
 
PDF
Set your Data in Motion with Confluent & Apache Kafka Tech Talk Series LME
confluent
 
PDF
ETL Made Easy with Azure Data Factory and Azure Databricks
Databricks
 
PPTX
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
huguk
 
PPTX
Data Lakehouse, Data Mesh, and Data Fabric (r1)
James Serra
 
PDF
Microsoft Azure Cloud Services
David J Rosenthal
 
PPTX
Cloud computing by Google Cloud Platform - Presentation
TinarivosoaAbaniaina
 
PDF
Google Cloud Platform Training | Introduction To GCP | Google Cloud Platform ...
Edureka!
 
PDF
Introduction to Azure Data Lake
Antonios Chatzipavlis
 
PPTX
Databricks Fundamentals
Dalibor Wijas
 
Microsoft Azure Overview
David J Rosenthal
 
Choosing Between Microsoft Fabric, Azure Synapse Analytics and Azure Data Fac...
Cathrine Wilhelmsen
 
Tom Grey - Google Cloud Platform
Fondazione CUOA
 
Migrating on premises workload to azure sql database
PARIKSHIT SAVJANI
 
Databricks on AWS.pptx
Wasm1953
 
Building an open data platform with apache iceberg
Alluxio, Inc.
 
How to migrate workloads to the google cloud platform
actualtechmedia
 
Cloud Native Application
VMUG IT
 
Azure storage
Adam Skibicki
 
Azure stack all you need to know
Susantha Silva
 
Kafka for Real-Time Replication between Edge and Hybrid Cloud
Kai Wähner
 
Set your Data in Motion with Confluent & Apache Kafka Tech Talk Series LME
confluent
 
ETL Made Easy with Azure Data Factory and Azure Databricks
Databricks
 
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
huguk
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
James Serra
 
Microsoft Azure Cloud Services
David J Rosenthal
 
Cloud computing by Google Cloud Platform - Presentation
TinarivosoaAbaniaina
 
Google Cloud Platform Training | Introduction To GCP | Google Cloud Platform ...
Edureka!
 
Introduction to Azure Data Lake
Antonios Chatzipavlis
 
Databricks Fundamentals
Dalibor Wijas
 

Similar to Data Platform on GCP (20)

PDF
GDSC Google Cloud Study jam Web Bootcamp - Day-4 Session 4
SahithiGurlinka
 
PDF
GCSJ Session 4.pdf
SahithiGurlinka
 
PDF
Beyond Relational
Lynn Langit
 
PDF
[Cloud OnAir] Talks by DevRel Vol.4 データ管理とデータ ベース 2020年8月27日 放送
Google Cloud Platform - Japan
 
PDF
Google Cloud Platform & rockPlace Big Data Event-Mar.31.2016
Chris Jang
 
PDF
Getting more into GCP.pdf
Knoldus Inc.
 
PPTX
Google Cloud Spanner Preview
DoiT International
 
PPTX
Eric Andersen Keynote
Data Con LA
 
DOCX
GOOGLE CLOUD DATA AND STORAGE Foundations.docx
GCP Masters
 
PPTX
Google Cloud and Data Pipeline Patterns
Lynn Langit
 
PPTX
Introduction to Google Cloud Platform
dhruv_chaudhari
 
PDF
Google cloud big data summit master gcp big data summit la - 10-20-2015
Raj Babu
 
PDF
Prague data management meetup 2018-03-27
Martin Bém
 
PDF
IBM Cloud Day January 2021 - A well architected data lake
Torsten Steinbach
 
PDF
GCP Data Engineer cheatsheet
Guang Xu
 
PPTX
An Overview of All The Different Databases in Google Cloud
Fibonalabs
 
PDF
The Marriage of the Data Lake and the Data Warehouse and Why You Need Both
Adaryl "Bob" Wakefield, MBA
 
PPTX
Architecting Analytic Pipelines on GCP - Chicago Cloud Conference 2020
Mariano Gonzalez
 
PDF
Connecta Event: Big Query och dataanalys med Google Cloud Platform
ConnectaDigital
 
PDF
Gcp data engineer
Narendranath Reddy T
 
GDSC Google Cloud Study jam Web Bootcamp - Day-4 Session 4
SahithiGurlinka
 
GCSJ Session 4.pdf
SahithiGurlinka
 
Beyond Relational
Lynn Langit
 
[Cloud OnAir] Talks by DevRel Vol.4 データ管理とデータ ベース 2020年8月27日 放送
Google Cloud Platform - Japan
 
Google Cloud Platform & rockPlace Big Data Event-Mar.31.2016
Chris Jang
 
Getting more into GCP.pdf
Knoldus Inc.
 
Google Cloud Spanner Preview
DoiT International
 
Eric Andersen Keynote
Data Con LA
 
GOOGLE CLOUD DATA AND STORAGE Foundations.docx
GCP Masters
 
Google Cloud and Data Pipeline Patterns
Lynn Langit
 
Introduction to Google Cloud Platform
dhruv_chaudhari
 
Google cloud big data summit master gcp big data summit la - 10-20-2015
Raj Babu
 
Prague data management meetup 2018-03-27
Martin Bém
 
IBM Cloud Day January 2021 - A well architected data lake
Torsten Steinbach
 
GCP Data Engineer cheatsheet
Guang Xu
 
An Overview of All The Different Databases in Google Cloud
Fibonalabs
 
The Marriage of the Data Lake and the Data Warehouse and Why You Need Both
Adaryl "Bob" Wakefield, MBA
 
Architecting Analytic Pipelines on GCP - Chicago Cloud Conference 2020
Mariano Gonzalez
 
Connecta Event: Big Query och dataanalys med Google Cloud Platform
ConnectaDigital
 
Gcp data engineer
Narendranath Reddy T
 
Ad

Recently uploaded (20)

PDF
Context Engineering for AI Agents, approaches, memories.pdf
Tamanna
 
PDF
MusicVideoProjectRubric Animation production music video.pdf
ALBERTIANCASUGA
 
PPTX
AI Presentation Tool Pitch Deck Presentation.pptx
ShyamPanthavoor1
 
PPTX
ER_Model_with_Diagrams_Presentation.pptx
dharaadhvaryu1992
 
PPT
Data base management system Transactions.ppt
gandhamcharan2006
 
PPTX
apidays Helsinki & North 2025 - Running a Successful API Program: Best Practi...
apidays
 
PDF
Choosing the Right Database for Indexing.pdf
Tamanna
 
PDF
Building Production-Ready AI Agents with LangGraph.pdf
Tamanna
 
PDF
JavaScript - Good or Bad? Tips for Google Tag Manager
📊 Markus Baersch
 
PPT
Growth of Public Expendituuure_55423.ppt
NavyaDeora
 
PPTX
Numbers of a nation: how we estimate population statistics | Accessible slides
Office for National Statistics
 
PPTX
apidays Singapore 2025 - Designing for Change, Julie Schiller (Google)
apidays
 
PPTX
Climate Action.pptx action plan for climate
justfortalabat
 
PPTX
apidays Munich 2025 - Building Telco-Aware Apps with Open Gateway APIs, Subhr...
apidays
 
PDF
Web Scraping with Google Gemini 2.0 .pdf
Tamanna
 
PPTX
apidays Helsinki & North 2025 - Vero APIs - Experiences of API development in...
apidays
 
PPTX
apidays Helsinki & North 2025 - Agentic AI: A Friend or Foe?, Merja Kajava (A...
apidays
 
PDF
OPPOTUS - Malaysias on Malaysia 1Q2025.pdf
Oppotus
 
PDF
Product Management in HealthTech (Case Studies from SnappDoctor)
Hamed Shams
 
PDF
Data Chunking Strategies for RAG in 2025.pdf
Tamanna
 
Context Engineering for AI Agents, approaches, memories.pdf
Tamanna
 
MusicVideoProjectRubric Animation production music video.pdf
ALBERTIANCASUGA
 
AI Presentation Tool Pitch Deck Presentation.pptx
ShyamPanthavoor1
 
ER_Model_with_Diagrams_Presentation.pptx
dharaadhvaryu1992
 
Data base management system Transactions.ppt
gandhamcharan2006
 
apidays Helsinki & North 2025 - Running a Successful API Program: Best Practi...
apidays
 
Choosing the Right Database for Indexing.pdf
Tamanna
 
Building Production-Ready AI Agents with LangGraph.pdf
Tamanna
 
JavaScript - Good or Bad? Tips for Google Tag Manager
📊 Markus Baersch
 
Growth of Public Expendituuure_55423.ppt
NavyaDeora
 
Numbers of a nation: how we estimate population statistics | Accessible slides
Office for National Statistics
 
apidays Singapore 2025 - Designing for Change, Julie Schiller (Google)
apidays
 
Climate Action.pptx action plan for climate
justfortalabat
 
apidays Munich 2025 - Building Telco-Aware Apps with Open Gateway APIs, Subhr...
apidays
 
Web Scraping with Google Gemini 2.0 .pdf
Tamanna
 
apidays Helsinki & North 2025 - Vero APIs - Experiences of API development in...
apidays
 
apidays Helsinki & North 2025 - Agentic AI: A Friend or Foe?, Merja Kajava (A...
apidays
 
OPPOTUS - Malaysias on Malaysia 1Q2025.pdf
Oppotus
 
Product Management in HealthTech (Case Studies from SnappDoctor)
Hamed Shams
 
Data Chunking Strategies for RAG in 2025.pdf
Tamanna
 
Ad

Data Platform on GCP

  • 1. Proprietary + Confidential SQL Saturday - Los Angeles Data Platform on GCP Patrick Alexander Google - Customer Engineer Ex-Microsoft - Principal Cloud Solution Architect [email protected] @PatrickCloudArc
  • 2. Google Office Spruce Goose Playa Vista - California 2022
  • 3. Spruce Goose Hughes H-4 Hercules 1942 - 1947
  • 4. https://en.wikipedia.org/wiki/Hughes_H-4_Hercules November 2, 1947 Long Beach, California THE STATS Wingspan: 320′ 11″ Length: 218′ 8″ Height: 79′ 4″ Pounds, Empty Weight: 300,000 Cruise Speed: 135 MPH
  • 6. Intro to GCP’s Data Platform
  • 7. Confidential & Proprietary Agenda 01 The Data Landscape 02 Google Cloud Platform 03 Google Cloud Big Data Portfolio
  • 9. Confidential & Proprietary $203,579 In Amazon sales generated Data is surging every minute. How are you using it? 500 Hours of video uploaded on YouTube 142,361,111 Emails sent and received 2,083,333 Minutes used on Skype calls 347,222 Tweets posted 50,200 Mobile apps downloaded 1,389 Uber rides taken 2.4 Million Google searches made 216,000 Photos posted to Instagram *Stats may be out of date!
  • 11. Confidential & Proprietary Unintegrated Marketing Tools Many companies use 20+ separate tools It’s difficult to get a holistic view of customers. Company Data in Silos CRM / ERP / Billing / Inventory / POS
  • 12. Confidential & Proprietary If you want to unlock the power of your data, you need a CDP (customer data platform), not just new tools.
  • 14. Warehouse Cloud Storage Object Binary or object data Images, media serving, backups Memcache Key-value Web/mobile applications, gaming Game state, user sessions Non-relational Cloud Datastore Hierarchical, mobile, web User profiles, Game State Cloud Bigtable Heavy read + write, events AdTech, financial, IoT Relational Cloud SQL Web frameworks CMS, eCommerce Cloud Spanner RDBMS+scale, HA, HTAP Transactions, Ad/Fin/MarTech BigQuery Enterprise Data Warehouse Analytics, Dashboards Fully managed storage & database services
  • 15. A modern data warehouse on a comprehensive platform Data ingestion at any scale Reliable streaming data pipeline Advanced analytics Data lake and data warehousing Cloud Pub/Sub Cloud Dataflow Cloud Dataproc Cloud Storage Data Transfer Service Cloud Composer Cloud IoT Core Cloud Dataprep Cloud AI Services Google Data Studio Tensorflow Sheets Storage Transfer Service Data Catalog Cloud Data Fusion Process Capture Store Data warehousing Analyze BigQuery storage BigQuery analysis engine Use Apache Beam
  • 16. 16 A Leader in Cloud Data Warehouse Data Ingestion Data Lake Integration ML / Data Science Performance Scalability Google receives 5 of 5 in 19 different criteria, such as: Solution Roadmap Strategy Execution Customer Adoption Use Cases Partners The Forrester Wave™: Cloud Data Warehouse, Q1 2021, Noel Yuhanna The Forrester Wave™ is a graphical representation of Forrester's call on a market and is plotted using a detailed spreadsheet with exposed scores, weightings, and comments. Forrester does not endorse any vendor, product, or service depicted in the Forrester Wave. Information is based on best available resources. Opinions reflect judgment at the time and are subject to change.
  • 17. Google Cloud Big Data Portfolio
  • 18. GCP provides a full suite of storage service options ● Cost-effective ● Varied choices based on your: ○ Application ○ Workload Highlight rows using blue and white, just like the agenda Cloud Storage Cloud Bigtable Cloud Datastore Cloud SQL Cloud Spanner BigQuery Cloud Firestore
  • 19. Overview Ideal for ● Fully managed, highly reliable ● Images and videos ● Cost-efficient, scalable object/blob store ● Objects and blobs ● Objects access via HTTP requests ● Unstructured data ● Object name is the only key ● Static website hosting Cloud Storage Cloud Bigtable Cloud Datastore Cloud SQL Cloud Spanner BigQuery Cloud Firestore Cloud Storage
  • 20. Cloud Datastore Overview Ideal for ● Fully managed NoSQL ● Semi-structured application data ● Scalable ● Durable key-value data ● Hierarchical data ● Managing multiple indexes ● Transactions Cloud Storage Cloud Bigtable Cloud SQL Cloud Spanner BigQuery Cloud Firestore Cloud Datastore
  • 21. Cloud Firestore Overview Ideal for ● Fully managed, serverless, NoSQL ● Scalable ● Native mobile and web client libraries ● Real-time updates ● Document-oriented data ● Large collections of small documents ● Native mobile and web clients ● Durable key-value data ● Hierarchical data ● Managing multiple indexes ● Transactions Cloud Storage Cloud Bigtable Cloud Datastore Cloud SQL Cloud Spanner BigQuery Cloud Firestore
  • 22. Cloud Bigtable Overview Ideal for ● High performance wide column NoSQL database service ● Operational applications ● Sparsely populated table ● Analytical applications ● Can scale to billions of rows and thousands of columns ● Storing large amounts of single-keyed data ● Can store TB to PB of data ● MapReduce operations Cloud Storage Cloud Datastore Cloud SQL Cloud Spanner BigQuery Cloud Firestore Cloud Bigtable
  • 23. Cloud SQL Overview Ideal for ● Managed service ○ Replication ○ Failover ○ Backups ● Web frameworks ● MySQL, PostgreSQL, and SQL Server ● Structured data ● Relational database service ● OLTP workloads ● Proxy allows for secure access to your Cloud SQL Second Generation instances without whitelisting ● Applications using MySQL/PGS Cloud Storage Cloud Bigtable Cloud Datastore Cloud Spanner BigQuery Cloud Firestore Cloud SQL
  • 24. Cloud Spanner Overview Ideal for ● Mission-critical relational database service ● Mission-critical applications ● Transactional conspiracy ● High transactions ● Global scale ● Scale and consistency requirements ● High availability ● Multi-region replication ● 99.999% SLA Cloud Storage Cloud Bigtable Cloud Datastore Cloud SQL BigQuery Cloud Firestore Cloud Spanner
  • 25. BigQuery
    Overview: ● Low-cost enterprise data warehouse for analytics ● Fully managed ● Petabyte scale ● Fast response times ● Serverless
    Ideal for: ● Online Analytical Processing (OLAP) workloads ● Big data exploration and processing ● Reporting via business intelligence (BI) tools
  • 26. Storage at a glance (product: simple description; ideal for; not ideal for)
    ● Cloud Storage: Binary/object store. Ideal for: large or rarely accessed unstructured data. Not ideal for: structured data, building fast apps.
    ● Datastore: Scalable store for serving structured data. Ideal for: App Engine (GAE) apps, structured pure-serve use cases. Not ideal for: relational or analytic data.
    ● Firestore: Cloud-native app data at global scale; real-time NoSQL database to store and sync data. Ideal for: mobile, web, multi-user, IoT, and real-time applications.
    ● Bigtable: High-volume, low-latency database. Ideal for: "flat," heavy read/write, or analytical data. Not ideal for: highly structured or transactional data.
    ● Cloud SQL: Well-understood VM-based RDBMS. Ideal for: web frameworks, existing applications. Not ideal for: scaling, analytics, heavy writes.
    ● Spanner: Relational DB service. Ideal for: low-latency transactional systems. Not ideal for: analytic data.
    ● BigQuery: Auto-scaling analytic data warehouse. Ideal for: interactive analysis of static datasets. Not ideal for: building fast apps.
  • 28. Which Google Cloud database is right for me?
    ● Is your data structured?
      ○ No: Do you need mobile SDKs? Yes: Firebase Storage. No: Cloud Storage.
      ○ Yes: Is your workload analytics?
        ○ Yes: Do you need updates or low latency? Yes: Cloud Bigtable. No: BigQuery.
        ○ No: Is your data relational?
          ○ Yes: Do you need horizontal scalability? Yes: Cloud Spanner. No: Cloud SQL.
          ○ No: Do you need mobile SDKs? Yes: Cloud Firestore on Firebase. No: Cloud Datastore.
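The "Which Google Cloud database is right for me?" flowchart can be written as a plain decision function. This is a sketch of one reading of the flattened slide, with the question order assumed to be structured → analytics → relational. `choose_storage` and its parameter names are hypothetical, and real choices weigh more factors than six yes/no answers.

```python
def choose_storage(
    structured: bool,
    analytics: bool = False,
    low_latency_updates: bool = False,
    relational: bool = False,
    horizontal_scale: bool = False,
    mobile_sdks: bool = False,
) -> str:
    """Walk the slide's yes/no questions and return the suggested product name."""
    if not structured:
        # Unstructured data: object storage, with or without Firebase mobile SDKs.
        return "Firebase Storage" if mobile_sdks else "Cloud Storage"
    if analytics:
        # Analytics that needs updates or low latency goes to Bigtable; otherwise BigQuery.
        return "Cloud Bigtable" if low_latency_updates else "BigQuery"
    if relational:
        # Relational OLTP: Spanner when you need horizontal scalability, else Cloud SQL.
        return "Cloud Spanner" if horizontal_scale else "Cloud SQL"
    # Structured, non-relational, non-analytic data: document/NoSQL stores.
    return "Cloud Firestore on Firebase" if mobile_sdks else "Cloud Datastore"


if __name__ == "__main__":
    print(choose_storage(structured=True, relational=True, horizontal_scale=True))
```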
  • 29. A modern data warehouse on a comprehensive platform
    Stages: Capture → Process → Store → Analyze → Use
    Capabilities: ● Data ingestion at any scale ● Reliable streaming data pipeline ● Data lake and data warehousing ● Advanced analytics
    Products: Apache Beam, Cloud Pub/Sub, Cloud Dataflow, Cloud Dataproc, Cloud Storage, Data Transfer Service, Storage Transfer Service, Cloud Composer, Cloud IoT Core, Cloud Dataprep, Cloud Data Fusion, Data Catalog, BigQuery storage, BigQuery analysis engine, Cloud AI Services, TensorFlow, Google Data Studio, Sheets
  • 30. Data lifecycle: ingest (streaming, batch, applications)
    ● Cloud Pub/Sub. Good for: global, scalable message queue; durable; decoupling apps. Such as: IoT, user events, system metrics.
    ● Cloud Storage. Good for: binary, object data. Such as: images, media serving, backup.
    ● Cloud Transfer. Good for: managed bulk (arbitrary) data transfer. Such as: cloud migration, backup, legacy data.
    ● Stackdriver Logging (now Cloud Logging). Good for: centralized log management. Such as: log data from applications.
    ● Cloud SQL. Good for: structured data, web frameworks. Such as: metadata, fintech, adtech.
    ● Cloud Spanner. Good for: RDBMS, SQL, horizontal scaling. Such as: metadata, fintech, adtech.
    ● Cloud Datastore. Good for: hierarchical, mobile, web. Such as: user profiles, game states.
    ● Cloud Firestore. Good for: hierarchical, mobile, web. Such as: user profiles, game states.
    ● Cloud Bigtable. Good for: heavy read/write, events. Such as: IoT, user/system events, low-latency systems.
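Pub/Sub's "durable, decouple apps" point rests on fan-out. Publishers write to a topic without knowing who consumes it, and every subscription receives its own copy of each message published after that subscription was created. The following is a toy in-memory model of that behavior. `InMemoryTopic` is hypothetical and deliberately leaves out acknowledgements, redelivery, ordering, and durability, all of which the real service handles. Here, pulling removes a message immediately.

```python
from collections import deque
from typing import Deque, Dict, List


class InMemoryTopic:
    """Toy topic: each subscription keeps its own backlog and receives every new message."""

    def __init__(self) -> None:
        self._subscriptions: Dict[str, Deque[str]] = {}

    def subscribe(self, name: str) -> None:
        # A new subscription only sees messages published from now on.
        self._subscriptions.setdefault(name, deque())

    def publish(self, message: str) -> None:
        # The publisher is decoupled from consumers: it just appends to every backlog.
        for backlog in self._subscriptions.values():
            backlog.append(message)

    def pull(self, name: str, max_messages: int = 10) -> List[str]:
        # Simplification: messages are removed on pull (no ack deadline or redelivery).
        backlog = self._subscriptions[name]
        out: List[str] = []
        while backlog and len(out) < max_messages:
            out.append(backlog.popleft())
        return out


if __name__ == "__main__":
    topic = InMemoryTopic()
    topic.subscribe("billing")
    topic.subscribe("analytics")
    topic.publish("order-1")
    print(topic.pull("billing"), topic.pull("analytics"))
```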
  • 31. Data lifecycle: process and analyze
    Large-scale data processing:
    ● Cloud Dataproc. Good for: managed Hadoop ecosystem. Such as: batch and streaming analytics over big data, machine learning.
    ● Cloud Dataflow. Good for: unified abstraction for batch and streaming data. Such as: new pipelines, windowing operations, watermarking.
    ● Cloud Dataprep. Good for: UI-driven data preparation. Such as: a pre-step to big data jobs (Dataproc/Dataflow), machine learning.
    Data analysis:
    ● BigQuery. Good for: enterprise data warehouse. Such as: analytics, dashboards, business intelligence, basic machine learning.
    Custom ML:
    ● Vertex AI Platform. Good for: general-purpose ML platform. Such as: data scientists, ML on the data warehouse.
    ● Cloud Dataproc. Good for: managed Hadoop ecosystem. Such as: ML jobs using Mahout/Spark MLlib.
    Task-specific machine learning:
    ● AutoML Vision. Good for: object/face detection, emotional facial attributes, SafeSearch, real-time or batch, OCR.
    ● AutoML Video Intelligence. Good for: video metadata, entity analysis at a granularity of 1 frame per second, timestamped entity search across a video catalog.
    ● AutoML NLP. Good for: structure and meaning of text, sentiment analysis.
    ● AutoML Translation. Good for: automatic translation of 90 languages, language detection, both real-time and batch.
    ● AutoML Tables. Good for: analyzing structured data, finding data traits, data labeling, and target feature selection.
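The "windowing operations" listed for Dataflow come from Apache Beam, whose fixed windows group elements by their event timestamps. Watermarks then decide when a window is considered complete. The stdlib sketch below shows only the window-assignment step (in Beam itself this is expressed with `beam.WindowInto(window.FixedWindows(60))`). It ignores watermarks, triggers, and late data. `fixed_window_counts` is a hypothetical helper, not a Beam API.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def fixed_window_counts(
    events: Iterable[Tuple[int, str]], size_s: int
) -> Dict[Tuple[int, str], int]:
    """Count events per (window_start, key) using fixed, non-overlapping windows.

    Each event is (event_time_seconds, key). Windows are assigned by event time,
    so out-of-order arrival does not change the result.
    """
    if size_s <= 0:
        raise ValueError("size_s must be positive")
    counts: Dict[Tuple[int, str], int] = defaultdict(int)
    for ts, key in events:
        window_start = ts - (ts % size_s)  # start of the window containing ts
        counts[(window_start, key)] += 1
    return dict(counts)


if __name__ == "__main__":
    events = [(5, "a"), (65, "a"), (59, "b"), (3, "a"), (61, "b")]  # arrives out of order
    print(fixed_window_counts(events, 60))
```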
  • 32. Data lifecycle: explore and visualize
    Business intelligence:
    ● Google Data Studio. Good for: drag-and-drop report building over Google Sheets, BigQuery, Cloud Storage files, and SQL sources.
    ● Looker. Good for: custom applications, embedded visualizations, data science workflows; integrates with BigQuery.
    Spreadsheet:
    ● Connected Sheets. Good for: using Google Apps Script's ability to run BigQuery queries from a spreadsheet; usually for quick, short analyses on smaller datasets.
    Data science:
    ● Cloud Datalab. Good for: Jupyter notebooks for general-purpose data visualization.
    ● Cloud Dataprep. Good for: UI-driven data preparation and visualization; also used as a pre-step to big data jobs (Dataproc/Dataflow) and machine learning.
  • 33. Big Data Reference Architecture
  • 34. Data Science Reference Architecture
  • 39. BigQuery | Architecture: decoupled storage and compute for maximum flexibility
    ● SQL:2011 compliant ● Highly available cluster compute (Dremel) ● Distributed memory shuffle tier ● Petabit network ● Replicated, distributed storage (99.9999999999% durability) ● Streaming ingest ● Free bulk loading ● REST API ● Client libraries for C#, Go, Java, Node.js, PHP, Python, Ruby ● Web UI, CLI
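The REST API listed above is the layer the client libraries wrap. A standard-SQL query can be submitted with a `jobs.query` call, and setting `dryRun` asks BigQuery to validate the query and report the bytes it would scan without running it. The sketch below only builds the request URL and JSON body. The project ID is hypothetical, and the OAuth 2.0 bearer token and the HTTP POST itself are deliberately left out.

```python
import json
from typing import Tuple

BIGQUERY_API = "https://bigquery.googleapis.com/bigquery/v2"


def build_query_request(project_id: str, sql: str, dry_run: bool = False) -> Tuple[str, str]:
    """Return (url, json_body) for a jobs.query POST using standard SQL.

    Authentication and sending the request are intentionally not handled here.
    """
    url = f"{BIGQUERY_API}/projects/{project_id}/queries"
    body = {"query": sql, "useLegacySql": False, "dryRun": dry_run}
    return url, json.dumps(body)


if __name__ == "__main__":
    url, body = build_query_request("my-project", "SELECT 1", dry_run=True)
    print("POST", url)
    print(body)
```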
  • 40. Economic value: data warehouse migration lowers your TCO massively
    Source: ESG 2019, "The Economic Advantage of Migrating Data Warehouse Workloads to BigQuery"
    ● 52% lower TCO (versus legacy Teradata on-premises)
    ● 41% lower TCO (versus Teradata on AWS)
    [Chart: TCO calculator, expected 3-year total cost of ownership (USD, $0 to $16,000,000) for Teradata on-premises, Teradata on AWS, and Google BigQuery, broken down by up-front capital investment, monthly cloud spend, administrative costs, planning/deployment/migration, and power/cooling/floorspace]
  • 41. Google Cloud provides the most modern data warehouse (compared with Teradata on-prem, AWS Redshift, Snowflake, and Azure Synapse Analytics)
    Scale:
    ● Google Cloud BigQuery: ✓ Fully managed and serverless ✓ Petabyte-scale ✓ No warm-up or maintenance
    ● Teradata on-prem: ✕ Tied to cluster ✕ Significant performance bottlenecks
    ● AWS Redshift: ✕ Tied to cluster (Redshift Spectrum is serverless) ✕ Considerable amount of tuning needed ✕ Huge performance bottlenecks
    ● Snowflake: ✕ Reclustering, shuffles, and loads hurt performance ✕ SSDs tied to VMs ✕ Significant performance bottlenecks on large data
    ● Azure Synapse Analytics: ✕ Compute has to be scaled up manually ✕ Capacity limits based on instance size
    Real-time:
    ● Google Cloud BigQuery: ✓ Streaming data ✓ BI Engine ✓ Streaming SQL
    ● Teradata on-prem: ✓ Streaming data, dashboards, SQL
    ● AWS Redshift: ✓ Streaming data, dashboards, SQL
    ● Snowflake: ✕ Poor streaming performance
    ● Azure Synapse Analytics: ✓ Streaming data, dashboards ✕ Requires Databricks for streaming scenarios
    AI support:
    ● Google Cloud BigQuery: ✓ Built-in BigQuery ML ✓ Two-way connections to AI Platform ✓ Storage API for Spark/Dataproc
    ● Teradata on-prem: ✓ Some built-in ML ✕ Only basic techniques
    ● AWS Redshift: ✕ No SQL-based ML ✓ Integration with SageMaker
    ● Snowflake: ✕ No SQL-based AI/ML ✕ No high-performance support for Spark
    ● Azure Synapse Analytics: ✕ No SQL-based ML ✕ Just a rebrand of three separate products; no deep integration
    Data security:
    ● Google Cloud BigQuery: ✓ Encrypted at rest and in transit ✓ Immutable audit logs ✓ Data Catalog ✓ DLP API for redaction
    ● Teradata on-prem: ✓ Integrated security
    ● AWS Redshift: ✓ Integrated security ✕ Partner tool (DgSecure) needed for redaction
    ● Snowflake: ✕ No VPC Service Controls (VPC-SC), so no guards against data exfiltration ✕ Standalone authentication system ✕ No native redaction capability
    ● Azure Synapse Analytics: ✓ Integrated security ✕ Patches applied during maintenance windows, with downtime
  • 42. BigQuery hands-on lab (Qwiklabs): https://google.qwiklabs.com/focuses/1145?parent=catalog
  • 44. Performance at scale: petabyte-scale, automated, and intelligent, so your enterprise can focus on delivering insights, not infrastructure
    [Chart: workloads and analytics versus degree of automation, progressing from Legacy DW (manual configuration; basic reporting) through ad-hoc reporting and operational insight to BigQuery (completely automated and serverless; built-in advanced analytics capabilities)]
  • 45. Economic value: BigQuery lowers your data warehouse TCO massively
    Expected 3-year total cost of ownership: ● 52% lower TCO [1] (versus on-premises) ● 26-34% lower TCO [2] (versus other cloud data warehouses) ● Flat-rate and variable pricing options give customers control over TCO
    [1] ESG 2019, "Migrating Enterprise Data Warehouse Workloads" [2] ESG 2019, "Google BigQuery vs. Alternative Cloud-based EDW Solutions"
  • 46. How Google's Smart Analytics Platform is unique in the industry (each row rates Google first, then two competitor columns, the last labeled "Legacy Solutions")
    ● Scale (✓ / Partial / ✕): BigQuery is fully managed, serverless, and architected for petabyte scale. While others are tied to clusters or require manual reclustering, BigQuery manages the infrastructure for you so your teams can focus on delivering insights.
    ● Total cost of ownership (✓ / ✕ / ✕): BigQuery eliminates the need for up-front investment and planning for your EDW and reduces operational and administrative expenses, all while improving business agility. Enterprise Strategy Group (ESG) estimated savings of 26-34% over cloud-based EDW alternatives and more than 40% over legacy on-premises solutions.
    ● Interoperability (✓ / ✕ / ✕): BigQuery provides a unified, interoperable, best-of-breed platform across your data warehouse and data lakes, with data integration across on-premises and cloud sources. BigQuery was built to tear down data silos and help you avoid creating new ones.
    ● Democratized ML/AI (✓ / ✕ / ✕): BigQuery democratizes machine learning for enterprise users (not just data scientists) through capabilities accessible with SQL, while giving more sophisticated data science teams access to Google's leading-edge AI technologies via Cloud AI. More than 80% of our BigQuery customers have incorporated ML into their business analysis.
    ● Reliable & secure (✓ / Partial / Partial): BigQuery offers security, governance, and reliability unmatched in the industry: high availability with a 99.99% SLA, automatic data replication, and restore and backup to ensure business continuity; the ability to classify and redact sensitive data; and fine-grained identity and access management, including Access Transparency so you can log each view. Data is encrypted at rest and in transit by default, and customer-managed encryption keys give you control over your data.
    ● Real-time (✓ / ✕ / ✕): Designed to excel in IoT and other scenarios where analysis depends on real-time streaming data, plus a BI acceleration engine for high-concurrency, low-latency use cases. Both are unique differentiators for Google Cloud and essential for businesses that need to make real-time decisions.
    ● Usefully multi-cloud (✓ / ✕ / ✕): BigQuery breaks down silos to provide a single pane of glass for all your data across multiple clouds (AWS, Azure). Most other vendors focus on running the same service in three clouds, but that yields three silos; BigQuery breaks the silo and lets customers analyze data across datasets.
    ● Industry leadership (✓ / ? / ?): Recognized as an industry leader by both Gartner and Forrester in data management and analytics. With 9 Google products, each with more than a billion users, running on our platform, big data is in our DNA, and we are ready to help your business build a future-ready data platform.