1
Google Cloud & Data Pipeline Patterns
@LynnLangit
2
Google Cloud in Australia
Data center here in 2017
3
GCP and Patterns
Developer-first
Servers ➡ Containers ➡ Functions
1. Modern Cloud by Example
• Fast, flexible and cheap
• Virtual Machines / GCE
• Storage / GCS
2. GCP Data Pipeline Patterns
• Data Warehouse
• Internet of Things (IoT)
• Bioinformatics
**And also, something new…
4
Google Cloud Platform (Confidential & Proprietary)
Demo – Storage / GCS
5
6
Demo – Virtual Machines / GCE
7
Virtual Machines / GCE
• Fast
  • Spin up in seconds
  • Tools: SSH, gcloud, console
• Flexible
  • Custom sizing (slider)
  • OS variety: Linux or Windows
• Cheap and Simple
  • Automatic discount for sustained use
  • Preemptible VMs
Storage / GCS
• Fast
  • Very fast within region
  • Tools included
• Flexible
  • 4 storage options
  • Simple to use / understand
• Cheap
  • Pricing by storage type
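
The automatic usage discount listed for GCE above can be made concrete with a little arithmetic. The sketch below is not from the talk: the tier rates (100%, 80%, 60% and 40% of the base rate for each successive quarter of the month) are an assumption based on the sustained-use schedule Google published for standard machine types around that time, and the function name is illustrative. Check current pricing before relying on the numbers.

```python
# Assumed incremental rates for each quarter of the month (not from the slides).
TIER_RATES = [1.0, 0.8, 0.6, 0.4]
TIER_WIDTH = 0.25  # each tier covers a quarter of the month


def effective_rate(usage_fraction: float) -> float:
    """Average price multiplier for a VM that runs for `usage_fraction`
    of the month, where 0 < usage_fraction <= 1."""
    if not 0 < usage_fraction <= 1:
        raise ValueError("usage_fraction must be in (0, 1]")
    billed = 0.0
    remaining = usage_fraction
    for rate in TIER_RATES:
        in_tier = min(remaining, TIER_WIDTH)  # usage that falls in this tier
        billed += in_tier * rate
        remaining -= in_tier
        if remaining <= 0:
            break
    return billed / usage_fraction
```

Under these assumed tiers, a VM that runs all month pays about 70% of the base rate, and one that runs for half the month pays about 90%.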
8
9
Pipeline Architectures
10
Data Warehousing
11
Big Data > Data Warehouse
• DoubleClick Campaign Manager
• Google Analytics
• Customer Lists / Reference Data
• Export Ad Data: Cloud Storage
• Id matching: Cloud Dataflow
• Reference table, Query / Compute: BigQuery
• Marketing List, Relevant Users: Cloud Storage
• Dashboards: Data Studio 360
• Analysts
12
Demo – BigQuery
13
Big Data > Log Processing
• Log Storage (batch): Cloud Storage
• Log Streaming (streaming): Cloud Pub/Sub
• Log Processing: Cloud Dataflow
• Log Analytics: BigQuery
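
The log processing step can be pictured as parsing each line and counting events per fixed time window, whether the lines arrive as a batch from Cloud Storage or as a stream from Cloud Pub/Sub. The sketch below uses plain Python rather than the Dataflow / Beam SDK. The log line format, the 60-second window and the function names are all assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime, timezone

WINDOW_SECONDS = 60  # assumed window size


def parse_line(line: str):
    """Parse '<UTC timestamp> <LEVEL> <message>' (a hypothetical log format)."""
    stamp, level, _message = line.split(" ", 2)
    ts = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    return ts, level


def count_by_window(lines):
    """Count log lines per (window start in epoch seconds, level)."""
    counts = Counter()
    for line in lines:
        ts, level = parse_line(line)
        epoch = int(ts.timestamp())
        window_start = epoch - epoch % WINDOW_SECONDS  # floor to the window boundary
        counts[(window_start, level)] += 1
    return counts


logs = [
    "2017-03-01T10:00:05Z INFO started",
    "2017-03-01T10:00:40Z ERROR disk full",
    "2017-03-01T10:01:10Z ERROR disk full",
]
result = count_by_window(logs)
```

The per-window counts are the kind of small, aggregated rows that would then be written to the analytics store for querying.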
14
Cloud Dataflow / Apache Beam
15
Big Data > Time Series Analysis
• Time Series Files (batch): Cloud Storage
• Time Series Streaming (streaming): Cloud Pub/Sub
• Time Series Processing: Cloud Dataflow
• Processing: Cloud Dataproc
• Storage: BigQuery
• Storage: Cloud Storage
• Storage: Cloud Bigtable*
• Analysis: Cloud Datalab
• ML: Cloud ML
*Note: Use Bigtable with NoSQL workloads of 1 TB or more
16
Big Data > Complex Event Processing
• Paths: Streaming and Batch
• Cloud Apps: Compute Engine
• On-Premises Databases
• On-Premises Applications
• Streaming: Cloud Pub/Sub (transactions)
• Processing: Cloud Dataflow (transaction streams)
• ETL: Cloud Dataflow (transform data)
• Cloud Data: Cloud Storage
• Rules Engine: Cloud Dataflow
• Rules Engine: Cloud Dataproc
• Messaging: Cloud Pub/Sub (rules actions)
• Processed Events: Cloud Bigtable (events time series)
• Data Warehouse: BigQuery (execution results)
• Data Analysis: Cloud Datalab
• Push to Devices: App Engine
• Mobile Devices: push notifications
• Business Analysis: report & share
• Cloud Apps: Compute Engine
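
The rules engine boxes in this diagram evaluate incoming events against a set of conditions and emit actions for downstream messaging. A minimal plain-Python sketch of that idea follows. The event fields, the two example rules and the action names are hypothetical and not taken from the talk.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Event = Dict[str, object]


@dataclass
class Rule:
    name: str
    condition: Callable[[Event], bool]  # returns True when the rule fires
    action: str                         # label of the action to publish


# Hypothetical rules, for illustration only.
RULES: List[Rule] = [
    Rule("large_transaction", lambda e: e.get("amount", 0) > 10_000, "notify_analyst"),
    Rule("foreign_card", lambda e: e.get("country") != e.get("home_country"), "push_to_device"),
]


def evaluate(event: Event, rules: List[Rule] = RULES) -> List[str]:
    """Return the actions of every rule whose condition matches the event."""
    return [rule.action for rule in rules if rule.condition(event)]
```

In the architecture above, the returned action labels would be published to a messaging topic so that other services can act on them, instead of being returned to a caller.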
17
Core Products for Data Warehousing
Files
• Cloud Storage
Compute
• BigQuery
• Cloud Dataflow
Other
• 3rd party ETL
• 3rd party dashboards
More on BigQuery…
• Interactive or batch queries
• ANSI SQL compliant
• Cost control: purchase 'slots'
• NoOps data warehouse
18
Big Relational
19
What is Spanner?
20
Demo – Cloud Spanner
21
Internet of Things
22
Internet of Things > MQTT
• MQTT Devices
• Cloud Load Balancing
• Auto-scaled Broker Tier (custom MQTT broker)
• MQTT Broker: Compute Engine (RabbitMQ)
• IoT Topic: Cloud Pub/Sub
• Stream Analytics: Cloud Dataflow
• IoT Warehouse: BigQuery
• IoT Application: App Engine
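
A broker tier like the one in this diagram decides which device messages to pass on by matching topic names against subscription filters. The sketch below shows MQTT-style filter matching in plain Python: `+` matches exactly one topic level and a trailing `#` matches all remaining levels. The topic names are hypothetical, and the sketch leaves out the special handling of topics that begin with `$`.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if `topic` matches the MQTT-style `topic_filter`."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(filter_levels):
        if level == "#":
            # '#' is only valid as the last level; it matches the parent
            # level and everything below it.
            return i == len(filter_levels) - 1
        if i >= len(topic_levels):
            return False  # filter is longer than the topic
        if level != "+" and level != topic_levels[i]:
            return False  # literal level does not match
    return len(filter_levels) == len(topic_levels)
```

For example, a filter of `sensors/+/temperature` would select the temperature readings of every device while ignoring other measurements.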
23
Internet of Things > Sensor stream ingest and processing
• Standard Devices: HTTPS
• Constrained Devices: non-TCP (e.g. BLE), via Gateway
• Stages: Ingest, Pipelines, Storage, Analytics, Application & Presentation
• Products shown: App Engine, Container Engine, Compute Engine, Cloud Storage, Cloud Pub/Sub, Cloud Dataflow, Monitoring, Logging, Cloud Datastore, Cloud Bigtable, BigQuery, Cloud Dataproc, Cloud Datalab
24
Retail > Beacons and Targeted Marketing
• Beacons: proximity notifications
• Messaging: Cloud Pub/Sub (proximity streams)
• Processing: Cloud Dataflow (stream processing)
• Events: Cloud Bigtable (proximity events)
• Analytics: BigQuery (data warehouse)
• Messaging: Cloud Pub/Sub (queued notifications)
• Notifications: App Engine (push to devices)
• Mobile push notifications
• Office business systems
25
Core Products for IoT
Files & Storage
• Cloud Storage
• Cloud Bigtable
Compute & Ingest
• Cloud Pub/Sub
• BigQuery
• Cloud Dataflow
26
Demo – Machine Learning
27
Bioinformatics
28
Life Sciences > Patient Monitoring
• Patient Monitors (pulse, blood sugar, exercise)
• Ingest: Cloud Pub/Sub
• Storage: Cloud Bigtable
• Analytics, Process Data: Prediction API
• Alerts, Notifications: Cloud Pub/Sub
• Patient Analytics
• Health Care Professional
29
Life Sciences > Variant Analysis
• High Throughput Genome Sequencers
• Data Ingest: Genomics (BAM, FASTQ)
• Private and public datasets, each in Cloud Storage: MSSNG Autism, 1000 Genomes, Patient Data, Illumina Platform, Ref Genomes, TCGA
• Online Analytics: BigQuery
• Batch Analytics: Cloud Dataflow
• Lab Notebooks: Cloud Datalab
• Scientist
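
FASTQ, one of the two ingest formats named on this slide, is a simple text format, which makes the ingest step easy to picture. The sketch below reads four-line FASTQ records in plain Python. It assumes unwrapped sequences (exactly four lines per read) and Phred+33 quality encoding; the type and function names are illustrative and do not refer to any particular genomics library.

```python
from typing import Iterable, Iterator, List, NamedTuple


class FastqRecord(NamedTuple):
    read_id: str
    sequence: str
    quality: str


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield one record per four-line block: @id, sequence, +, quality."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # skip blank lines between records
        rest = [next(it, None) for _ in range(3)]
        if None in rest:
            raise ValueError("truncated FASTQ record")
        seq, plus, qual = (s.rstrip("\n") for s in rest)
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record")
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ")
        yield FastqRecord(header[1:], seq, qual)


def phred_scores(quality: str) -> List[int]:
    """Decode a quality string, assuming Phred+33 encoding."""
    return [ord(ch) - 33 for ch in quality]
```

Parsed records like these could then be loaded into a table for online analytics or passed to a batch job.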
30
Life Sciences > Genomics, Secondary Analysis
• Stages: Ingest, Elastic Cluster, Storage, Analytics
• High Throughput Genome Sequencers
• Carrier Interconnect
• Cloud Network
• Cloud Load Balancing
• Ingest Server: Compute Engine
• HPC Cluster: Compute Engine (10 nodes)
• Raw Datafiles: Cloud Storage
• Processed Data: Cloud Storage
• Metadata: Cloud SQL
• Online Analytics: BigQuery
• Lab notebooks: Cloud Datalab
• Scientist
31
Core Products for Bioinformatics
• Cloud Storage
• BigQuery
• Compute Engine
• Cloud Dataflow
• Public datasets on GCP
33
“The Future is Functional”
@LynnLangit

Editor's Notes

  • #20: https://cloud.google.com/spanner/ https://research.google.com/pubs/pub45855.html https://static.googleusercontent.com/media/research.google.com/en//archive/spanner-osdi2012.pdf
  • #34: Icon and sample diagrams landing page https://cloud.google.com/icons