Powering Interactive Data Analysis with Google BigQuery
Márton KODOK
@martonkodok
Software Architect @ REEA.net
Every company,
no matter how far from tech it is,
is evolving into a software company,
and by extension a data company.
Turning everything into “data” drives innovation
Powering Interactive Data Analysis with Google BigQuery @martonkodok #dfua
For a small company it’s important
to have access to modern BigData tools
without running a dedicated team for it.
Small companies should do BigData - but how?
Making analytics accessible to more companies
❏ Need backend/database to STORE, QUERY, EXTRACT data
❏ Deep analytics - large, multi-source, complex, unstructured
❏ Be real time
❏ Terabyte scale - cost effective
❏ Run ad-hoc reports - without a developer - interactive
❏ Minimal engineering effort - no dedicated BigData team
❏ Simple query language (preferred: SQL / JavaScript)
Legacy Business Reporting System
[Architecture diagram. Components shown: Web, Mobile, Web Server, Database (SQL, cached), Platform Services, CMS/Framework, Scheduled Tasks, Batch Processing, Compute Engine (multiple instances), Report & Share, Business Analysis]
Behind the scenes: days to insights
● Minutes to kick in
● Hours to run batch processing
● Hours to clean and aggregate
● DAYS TO INSIGHTS
What is BigQuery?
● Analytics-as-a-Service - Data Warehouse in the Cloud
● Fully managed by Google (US or EU zone)
● Scales into petabytes
● Ridiculously fast
● SQL 2011 standard + JavaScript UDF (User Defined Functions)
● Familiar DB structure (tables, views, records, nested fields, JSON)
● Open interfaces (REST, ODBC, Web UI, bq command-line tool)
● Integrates with Google Sheets + Google Cloud Storage + Pub/Sub connectors
● Decent pricing (queries: $5/TB, storage: $20/TB, cold storage: $10/TB) *as of Oct 2017
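The list prices quoted for BigQuery lend themselves to a quick back-of-the-envelope estimate. A minimal sketch in Python, assuming the storage prices are per TB per month (the slide does not state the period) and ignoring any free tier; the function name and the example volumes are hypothetical:

```python
# Prices quoted on the slide (Oct 2017). The per-month period for storage
# is an assumption; the slide only says "$20/TB" and "$10/TB".
QUERY_USD_PER_TB = 5.0
STORAGE_USD_PER_TB = 20.0
COLD_STORAGE_USD_PER_TB = 10.0


def estimate_monthly_cost(scanned_tb, active_tb, cold_tb):
    """Return a rough monthly bill in USD for the given volumes in TB."""
    query_cost = scanned_tb * QUERY_USD_PER_TB
    storage_cost = active_tb * STORAGE_USD_PER_TB
    cold_cost = cold_tb * COLD_STORAGE_USD_PER_TB
    return query_cost + storage_cost + cold_cost


if __name__ == "__main__":
    # Hypothetical workload: 10 TB scanned, 2 TB active, 4 TB cold storage.
    print(estimate_monthly_cost(scanned_tb=10, active_tb=2, cold_tb=4))
```

Because queries are billed by data scanned, the first term is usually the one worth watching when analysts run many ad-hoc reports.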
Architecting for The Cloud
[Architecture diagram. Components shown: Frontend, Platform Services, On-Premises Servers, Event Sourcing, Metrics / Logs / Streaming, Pipelines, ETL Engine, BigQuery]
Data Pipeline Integration at REEA.net
[Architecture diagram. Components shown: Standard Devices (HTTPS), Frontend, Platform Services, Application Servers, Database (SQL), On-Premises Servers, Event Sourcing, Metrics / Logs / Streaming, Pipelines, FluentD, Cloud Storage archive, BigQuery (Analytics Backend), Development Team, Data Analysts, Report & Share, Business Analysis, Tools: Tableau, QlikView, Data Studio, Internal Dashboard. Flow labels: Load, Export, Replay]
The following slides will present a sample Fluentd configuration to:
1. Transform a record
2. Copy event to multiple outputs
3. Store event data in File (for backup/log purposes)
4. Stream to BigQuery (for immediate analyses)
<filter frontend.user.*>
  @type record_transformer
</filter>
<match frontend.user.*>
  @type copy
  <store>
    @type forest
    subtype file
  </store>
  <store>
    @type bigquery
  </store>
  …
</match>

● The filter plugin mutates incoming data: add, modify, or delete event data and transform attributes without a code deploy.
● The copy output plugin copies events to multiple outputs: file(s), multiple databases, DB engines. Great for shipping the same event to multiple subsystems.
● The BigQuery output plugin streams events on the fly to the BigQuery warehouse. No need to write an integration. Data is available immediately for querying.
● Whenever needed, other output plugins can be wired in, e.g. the Kafka or Google Cloud Storage output plugins.
1. record_transformer
<filter frontend.user.*>
  @type record_transformer
  enable_ruby
  remove_keys host
  <record>
    bq {"insert_id":"${uid}","host":"${host}","created":"${time.to_i}"}
    avg ${record["total"] / record["count"]}
  </record>
</filter>
Syntax: Ruby, easy to use.
Great for:
- date transformation,
- quick normalizations,
- calculating something on the fly and storing it in a clear log/analytics db,
- renaming without a code deploy.
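To make the effect of the record_transformer filter concrete, here is a rough Python sketch of what it does to one event. It is an emulation for illustration, not Fluentd code: the function name `transform` and the sample field values are hypothetical, and it assumes that `${uid}` and `${host}` refer to fields of the incoming record and that `total` and `count` are integers (Ruby's integer division is mirrored with `//`).

```python
def transform(record, event_time):
    """Emulate the filter: add `bq` and `avg`, then drop `host`."""
    out = dict(record)  # leave the caller's record untouched
    out["bq"] = {
        "insert_id": str(record["uid"]),
        "host": str(record["host"]),
        "created": str(int(event_time)),  # mirrors ${time.to_i}
    }
    # Integer division, as Ruby does for two integers.
    out["avg"] = record["total"] // record["count"]
    out.pop("host", None)  # mirrors `remove_keys host`
    return out


if __name__ == "__main__":
    event = {"uid": "u-42", "host": "web-1", "total": 90, "count": 4}
    print(transform(event, event_time=1507900000.7))
```

Note that the host value survives inside the nested `bq` field even though the top-level `host` key is removed. Whether Fluentd emits `avg` as a number or as a string depends on the plugin's typecast settings; the sketch simply keeps it numeric.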
2. copy and 3. file
<match frontend.user.*>
  @type copy
  <store>
    @type forest
    subtype file
    <template>
      path /tank/storage/${tag}.*.log
      time_slice_format %Y%m%d
    </template>
  </store>
</match>
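The copy pattern is easy to picture outside Fluentd as well. A minimal Python sketch, in which every name (`copy_event`, the two in-memory sinks) is hypothetical and the lists merely stand in for the file and BigQuery outputs:

```python
import copy


def copy_event(event, outputs):
    """Hand an independent copy of the event to every output callable."""
    for emit in outputs:
        emit(copy.deepcopy(event))


if __name__ == "__main__":
    file_sink, bigquery_sink = [], []
    copy_event({"uid": "u-42", "action": "login"},
               [file_sink.append, bigquery_sink.append])
    print(file_sink, bigquery_sink)
```

Deep-copying is a choice of this sketch, so that one sink cannot mutate what another receives; it says nothing about how the Fluentd plugin shares events internally.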
4. BigQuery
<match frontend.user.*>
  @type bigquery
  method insert
  auth_method json_key
  json_key /etc/td-agent/keys/key-31da042be48c.json
  project project_id
  dataset dataset_name
  time_field timestamp
  time_slice_format %Y%m%d
  table user$%{time_slice}
  ignore_unknown_values
  schema_path /etc/td-agent/schema/user_login.json
</match>
Connector uses:
- JSON key auth file
- JSON table schema
Pro features:
- streaming to partitioned tables
- ignoring unknown values (fields not reflected in the schema)
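Two details of the BigQuery output config are worth a sketch: the `table user$%{time_slice}` template, which targets a daily partition, and `ignore_unknown_values`, which lets rows carry fields the schema does not know about. The Python below only imitates the visible effect. The helper names and the sample schema are hypothetical, UTC is assumed for the time slice, and dropping the extra fields locally is a simplification of this sketch (where the real connector and BigQuery discard them is not shown on the slide).

```python
from datetime import datetime, timezone


def partitioned_table(base, epoch_seconds, fmt="%Y%m%d"):
    """Build a `table$YYYYMMDD` name from an event timestamp (UTC assumed)."""
    day = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return base + "$" + day.strftime(fmt)


def drop_unknown_fields(row, schema_fields):
    """Keep only the keys that appear in the (hypothetical) table schema."""
    return {key: value for key, value in row.items() if key in schema_fields}


if __name__ == "__main__":
    schema = {"uid", "timestamp", "action"}  # hypothetical schema fields
    row = {"uid": "u-42", "timestamp": 1507896000, "action": "login",
           "debug": "x"}
    print(partitioned_table("user", row["timestamp"]))
    print(drop_unknown_fields(row, schema))
```

Streaming into a dated partition keeps each day's events together, which in turn lets queries that filter by day scan less data.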
Where to use BigQuery?
● On data that is difficult to process/analyze using traditional databases
● On exploring unstructured data
● Not a replacement for traditional DBs, but it complements the system
● Applying JavaScript UDFs on columnar storage to resolve complex tasks (e.g. JavaScript for natural language processing)
● On streams (forms, Kafka, IoT streams)
● Major strength is handling large datasets
Achievements - goal reached by measuring everything
➢ Optimize product pages
Find, store, and analyse time-consuming user actions in BQ, using 25x more custom events/hits than Google Analytics
➢ Email engagement
With the raw data of every open/click stored, improve subject lines, layout, follow-up action emails, and an assistant-like experience through heavy A/B split tests on email marketing campaigns (interactive feedback loop)
➢ Funnel analysis
Wrangle all the data to discover a small improvement, an AI-driven, personal-like upsell experience, or pre-sell products configured on the go: not yet in the catalog, but easily tweaked/customized
Funnel analysis: Time on upsell pages
Attribute credit to first article visited on purchase
Example HITS chain:
● article1 -> page2 -> page3 -> page4 -> orderpage1 -> thankyoupage1
● page1 -> article2 -> page3 -> orderpage2 -> ...
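The first-article attribution rule can be sketched in a few lines of Python. This is an illustration, not the query used in the talk: it assumes a visit's hits arrive as an ordered list of page names, that article pages start with `article`, and that a completed purchase is marked by a page starting with `thankyoupage`; the function name is hypothetical.

```python
def first_article_credit(hits):
    """Return the first article seen before a purchase, or None.

    `hits` is an ordered list of page names for one visit. No credit is
    given when the visit has no purchase or no article before it.
    """
    first_article = None
    for page in hits:
        if first_article is None and page.startswith("article"):
            first_article = page
        if page.startswith("thankyoupage"):
            return first_article
    return None


if __name__ == "__main__":
    chain = ["article1", "page2", "page3", "page4",
             "orderpage1", "thankyoupage1"]
    print(first_article_credit(chain))  # article1
```

A chain that stops at an order page without reaching a thank-you page yields no credit under these assumptions.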
BigQuery: Serverless Data Warehouse
● No manual sharding
● No capacity guessing
● No idle resources
● No maintenance windows
● No manual scaling
● No file management
Our benefits
● no provisioning/deploy
● no running out of resources
● no more focus on large-scale execution plans
● no more throwing away, expiring, or aggregating old data
● run raw ad-hoc queries (by analysts, sales, or devs)
● use JavaScript in SQL to have an awesome BigData experience wrangling “unstructured” data like a nerd
Easily Build Custom Reports and Dashboards
Thank you.
Slides available on: slideshare.net/martonkodok
Reea.net - Integrated web solutions driven by creativity to deliver projects.

GDG DevFest Ukraine - Powering Interactive Data Analysis with Google BigQuery
