Building a Marketing Data
Warehouse from Scratch
Christopher Gutknecht | @chrisgutknecht | Bergzeit
Our Plan: The Three Phases of a Data Platform
1. Greenfield
2. Dashboards
3. Operational Analytics
What You Will Take Away from this Session
1. When and why you should invest in a Marketing DWH
2. Learn the data ecosystem and the benefits of BigQuery
3. Interesting use cases by combining data sources
4. Many tactical tips for daily use
5. Design outcome-oriented questions for analytics projects
About Chris: Head of Acquisition & Optimization
Digital Marketer
Tech nerd
Climber
1997 2008 2013 2020
Dad of 2
Big Thanks to Steffi,
our Data Scientist
Bergzeit: Combining Love for Mountains & Data
Online Store for Mountain Gear
122 M Revenue in Financial Year 20/21
14 Countries & 5 Languages
Commerce. Content. Guided Tours
Let’s Set Clear Expectations for this Session
Technical deep dive
Intro to data ecosystem
What this session IS What it’s NOT
Machine learning
Customer data focused
Google Cloud focused
Practical tips & mistakes
Ecommerce use cases
Why Is Data Knowledge Important for You?
Behavioural data -> digital success
Operational analytics on the rise
Requirement: Privacy by design
Most Desirable Digital Marketing Skills 20/21
source: marketingcharts.com
The Components of a Modern Data Platform
Not so fast… Why All This Complexity?
Why Do I Need a Marketing Data Warehouse?
Connectors
If your Frustration Grows with Reporting Tools
Transformations
Volume
Operational Use Cases
Complexity
More Data Sources
Don’t Be this Guy - Know When To Scale Up
But Don’t Overengineer - Gradually Scale
Infrastructure Maintenance
Long upfront planning
Past: The Old On-Premise Data Veterans
Manual Scaling
Strict Dimensional Modeling
Today: The Cool Cloud Kids on the Block
100% cloud, no Ops
Seamless scaling
Instantly ready
The Components of a Modern Data Platform
Data Warehouse
Data Ingestion
Data Catalog & Governance
Activation
Data Quality
Job Orchestration
Visualization
Transformation
Analysts are Turning into Analytics Engineers
Data Warehouse
Data Ingestion
Data Catalog & Governance
Activation
Data Quality
Job Orchestration
Visualization
Transformation
Phase 1: Greenfield
Let’s Start from The Beginning: Data Sources
Data Ingestion
Google Ads Data Transfer
GA4 Export
Google Merchant Center
Paid Connectors, e.g. Fivetran
Custom Ingestion Scripts
Google Sheets
Cost Connectors, e.g. Funnel.io
Set up a BigQuery Ads Transfer in One Minute
1. Navigate to Data transfers
2. Configure the Transfer details
Get Your GA4 Data into BQ in Two Minutes
1. BQ Linking in Admin UI
2. Storage & Transfer
Easily Connect Your Sheets as BigQuery Tables
Data Loader Vendors: Easy Setup, Instant Results
Suggestions for Interesting Data Sources
Domain | Data Source | Available in Data Loaders?
SEO | Google Search Console |
SEO | Pagespeed Insights & Lighthouse |
SEO | Google Bot Logfiles |
Ecom | Inventory Data & Attributes |
Ecom | Trusted Shops Reviews |
Ecom | Awin Open Orders |
Social | Instagram |
Social | Facebook |
... | ... |
The Best Custom Way to Ingest Data into BQ
Data Source
1. Cloud Function
2. Storage & Transfer
3. BigQuery
Data Fetch with Python and Pandas
Data Transfer handles ingest job
Observability via alerts
How Do We Batch-Ingest New Data?
Change Data Capture
Snapshots Copies
Full History Daily State
very easy
For <300k rows
rather easy
duplicate rows
storage efficient
complex architecture
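The trade-off between daily snapshots and change data capture can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the pipeline used at Bergzeit: the `sku`/`price` rows and the function names `append_snapshot` and `capture_changes` are hypothetical.

```python
from datetime import date


def append_snapshot(history, rows, day):
    """Snapshot load: append a full copy of today's rows, stamped with the load date."""
    return history + [{**row, "snapshot_date": day} for row in rows]


def capture_changes(previous, current, key="sku"):
    """Change data capture: emit only inserted, updated, or deleted rows."""
    prev_by_key = {row[key]: row for row in previous}
    curr_by_key = {row[key]: row for row in current}
    changes = []
    for k, row in curr_by_key.items():
        if k not in prev_by_key:
            changes.append({"op": "insert", **row})
        elif row != prev_by_key[k]:
            changes.append({"op": "update", **row})
    for k, row in prev_by_key.items():
        if k not in curr_by_key:
            changes.append({"op": "delete", **row})
    return changes


# Hypothetical product rows on two consecutive days.
day1 = [{"sku": "A1", "price": 100}, {"sku": "B2", "price": 50}]
day2 = [{"sku": "A1", "price": 90}, {"sku": "B2", "price": 50}, {"sku": "C3", "price": 20}]

history = append_snapshot([], day1, date(2021, 6, 1))
history = append_snapshot(history, day2, date(2021, 6, 2))
changes = capture_changes(day1, day2)

print(len(history))                 # 5 rows: the unchanged B2 row is stored twice
print([c["op"] for c in changes])   # ['update', 'insert']
```

The snapshot path is the easier one to build but stores unchanged rows again every day; the change-capture path stores less but needs the previous state and extra logic for deletes.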
Phase 2: Dashboards
We’ve Got Data: Show Me Shiny Reports!
Data Warehouse
Data Ingestion Visualization
How do Price Discounts affect Sales? Sources:
Price Discounts Data Model Sources:
date
sku
detail_views
product order value
ga sessions
date
sku
price
sale_price
diff_price_to_sale
diff_price_to_sale_grouped
products
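A rough sketch of how the two sources in this model could be combined, written in plain Python rather than SQL. The field names come from the slide; everything else is an assumption: `diff_price_to_sale` is taken to be the discount in percent of the list price, the bucket boundaries in `discount_bucket` are invented for illustration, and `price` is assumed to be greater than zero.

```python
def discount_bucket(diff_pct):
    """Group a discount percentage into a coarse bucket (hypothetical boundaries)."""
    if diff_pct <= 0:
        return "no discount"
    if diff_pct < 10:
        return "0-10%"
    if diff_pct < 30:
        return "10-30%"
    return "30%+"


def build_price_discount_model(ga_rows, product_rows):
    """Join GA rows to product rows on (date, sku) and add the discount fields."""
    products = {(p["date"], p["sku"]): p for p in product_rows}
    model = []
    for ga in ga_rows:
        product = products.get((ga["date"], ga["sku"]))
        if product is None:
            continue  # inner join: skip GA rows without a product record
        diff = round((product["price"] - product["sale_price"]) / product["price"] * 100, 1)
        model.append({
            **ga,
            "price": product["price"],
            "sale_price": product["sale_price"],
            "diff_price_to_sale": diff,
            "diff_price_to_sale_grouped": discount_bucket(diff),
        })
    return model


# Hypothetical sample rows.
ga_rows = [
    {"date": "2021-06-01", "sku": "A1", "detail_views": 120},
    {"date": "2021-06-01", "sku": "B2", "detail_views": 40},
]
product_rows = [
    {"date": "2021-06-01", "sku": "A1", "price": 200.0, "sale_price": 150.0},
    {"date": "2021-06-01", "sku": "B2", "price": 80.0, "sale_price": 80.0},
]

model = build_price_discount_model(ga_rows, product_rows)
print(model[0]["diff_price_to_sale"], model[0]["diff_price_to_sale_grouped"])  # 25.0 10-30%
```

Grouping the discount into buckets is what makes the result usable as a dashboard dimension: detail views and order value can then be compared per discount band.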
How do Season Categories Affect Sales?
Season Category Data Model Sources:
date
sku
detail_views
product order value
ga sessions
date
sku
season_category
products
Category Revenue Share without Ratings
Category Rev & Ratings Data Model Sources:
date
sku
top_category
product order value
ga sessions
date
sku
rating_count
rating_count_grouped
products ratings
Which Category Has More Selection Orders?
Selection Orders Data Model Sources:
date
transaction_id
sub_category
is_multi_same_category
is_multi_same_color
is_multi_same_size
ga sessions
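The three flags in this model could be derived per transaction roughly as follows. The slide does not define them, so this sketch assumes a flag is true when at least two line items in the same order share that attribute; the `color` and `size` item fields and the function names are hypothetical.

```python
from collections import Counter


def has_repeat(items, field):
    """True if at least two line items share the same value for `field`."""
    counts = Counter(item[field] for item in items)
    return any(n >= 2 for n in counts.values())


def flag_selection_order(transaction_id, items):
    """Build one row of the selection-orders model for a single transaction."""
    return {
        "transaction_id": transaction_id,
        "is_multi_same_category": has_repeat(items, "sub_category"),
        "is_multi_same_color": has_repeat(items, "color"),
        "is_multi_same_size": has_repeat(items, "size"),
    }


# Hypothetical order: the same boot in two sizes.
items = [
    {"sub_category": "hiking boots", "color": "grey", "size": "42"},
    {"sub_category": "hiking boots", "color": "grey", "size": "43"},
]
flags = flag_selection_order("T-1001", items)
print(flags["is_multi_same_category"], flags["is_multi_same_size"])  # True False
```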
Wait… Don’t You Need to Know SQL for This?
You Need to Learn SQL: Take a BigQuery Course
Self-paced
45€ / User / Month
Takes 4-8 weeks
Certificate
WAIT! Before you build 100s of Dashboards...
Avoid code repetition
Style Guidelines
Test Coverage
Apply DEV Best Practices
SQL Version Control
Warning Signs You’re Doing Analytics Wrong
Don’t Create Datasets in the US, use EU
Always keep in EU
Can’t join with EU datasets
Avoid Saved Queries - They Don’t Scale
This Report is Broken. Who’s the Owner?
No Idea,
Help Yourself!
Don’t Use Custom Queries as Data Sources
Each Analyst uses a different SQL Code & Naming
Instead, Apply “Data Product” Thinking
Data Product Owner
Scrum Process
Data SLAs
Treat Data as a Product
#1 Pick And Implement A SQL Styleguide
https://github.com/mattm/sql-style-guide
#2 Define A Dataset & Table Naming Convention
Fields
Domains
Datasets
Tables
product
seo
seo_google_search_console
query_by_page_daily
ga_product_order_value
page
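A naming convention is easier to keep when it is checked automatically. The sketch below is one possible check, assuming two rules that the slide's examples suggest but do not state: all names are snake_case, and a dataset name starts with its domain (as in `seo` and `seo_google_search_console`). The function `validate_table_path` is hypothetical.

```python
import re

# Lowercase words separated by single underscores, e.g. query_by_page_daily.
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")


def validate_table_path(domain, dataset, table):
    """Return a list of convention violations (empty if the names comply)."""
    problems = []
    for label, name in (("domain", domain), ("dataset", dataset), ("table", table)):
        if not SNAKE_CASE.match(name):
            problems.append(f"{label} '{name}' is not snake_case")
    if not dataset.startswith(domain + "_"):
        problems.append(f"dataset '{dataset}' does not start with domain '{domain}_'")
    return problems


ok = validate_table_path("seo", "seo_google_search_console", "query_by_page_daily")
bad = validate_table_path("seo", "GSC_Export", "QueryByPage")
print(ok)        # []
print(len(bad))  # 3
```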
#3 Use dbt for all SQL Transformations
Build Your Data Product like a German House
Phase 3: Operational Analytics in Production
You Need Clean Data for Operational Analytics
What is Operational Analytics for Bergzeit?
ML
Products
Profit Bidding
Rule-based
Products
Data Uploads Attribution Model
Updating Affiliate Sales
Case: Upload Your Shopping Feed Every 10 Min
1. Get GCS Bucket Name
2. Cloud Function with 15 lines of code
3. Schedule Cronjob
Code samples: https://gist.github.com/ChrisGutknecht/fde93092e21039299ab76715596eac01
Case: Profit Bidding & Report
More Details: https://www.slideshare.net/ChristopherGutknecht/gross-profit-bidding-for-ecommerce-smx-virtual-2021
Operational Data Errors Can be Really Costly
We Need To Prepare for Data Pipeline Errors
1. Source Data: Test Coverage
2. Execution: Retry Policies
3. Target Table: Test Coverage
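A retry policy for the execution step can be as small as the sketch below. It is a generic illustration with exponential backoff, not a Google Cloud setting; `run_with_retries`, `flaky_load`, and the default values are hypothetical.

```python
import time


def run_with_retries(task, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call `task`; on failure wait base_delay * 2**(attempt - 1) seconds and retry."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error so an alert can fire
            sleep(base_delay * 2 ** (attempt - 1))


# Hypothetical load job that fails twice before succeeding.
calls = {"count": 0}


def flaky_load():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("source API unavailable")
    return "loaded"


delays = []  # record the waits instead of actually sleeping
result = run_with_retries(flaky_load, sleep=delays.append)
print(result, delays)  # loaded [1.0, 2.0]
```

Re-raising after the last attempt matters: a job that swallows its final error leaves the target table silently stale.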
How Can We Solve Our Missing Data Problem?
Solution: Define a Table Freshness Alert with dbt
Define All Data Quality Tests in dbt
Non Null Values
Restricted Values
Uniqueness
Data Freshness
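In dbt these checks are declared in configuration rather than written by hand. The plain-Python sketch below only illustrates what each of the four checks asserts about a table; it is not dbt code, and the function names, sample rows, and thresholds are hypothetical.

```python
from datetime import datetime, timedelta


def check_not_null(rows, column):
    """Non-null values: every row has a value in `column`."""
    return all(row.get(column) is not None for row in rows)


def check_accepted_values(rows, column, allowed):
    """Restricted values: every value in `column` is in the allowed set."""
    return all(row[column] in allowed for row in rows)


def check_unique(rows, column):
    """Uniqueness: no value in `column` appears twice."""
    values = [row[column] for row in rows]
    return len(values) == len(set(values))


def check_freshness(rows, column, now, max_age):
    """Data freshness: the newest timestamp is no older than `max_age`."""
    newest = max(row[column] for row in rows)
    return now - newest <= max_age


# Hypothetical rows from a feed table.
rows = [
    {"sku": "A1", "availability": "in_stock", "loaded_at": datetime(2021, 6, 2, 6, 0)},
    {"sku": "B2", "availability": "out_of_stock", "loaded_at": datetime(2021, 6, 1, 6, 0)},
]
now = datetime(2021, 6, 2, 8, 0)

print(check_not_null(rows, "sku"))                                                # True
print(check_accepted_values(rows, "availability", {"in_stock", "out_of_stock"}))  # True
print(check_unique(rows, "sku"))                                                  # True
print(check_freshness(rows, "loaded_at", now, timedelta(hours=1)))                # False
```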
Simple: Cloud Scheduler for Retry Execution
Advanced: Use Airflow For Data Task Graphs
Retries on Every Task
Alerting & Monitoring
Data Tasks in DAGs
Snapshots & Backfills
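The idea behind "data tasks in DAGs" is that every task declares what it depends on and the scheduler derives a valid run order. The sketch below shows that idea with Python's standard-library `graphlib` (Python 3.9+); it is not Airflow code, and the task names are hypothetical.

```python
from graphlib import TopologicalSorter

# Map each task to the set of tasks it depends on (hypothetical task names).
dag = {
    "ingest_ga4": set(),
    "ingest_products": set(),
    "transform_price_discounts": {"ingest_ga4", "ingest_products"},
    "test_freshness": {"transform_price_discounts"},
    "upload_feed": {"test_freshness"},
}

# static_order() yields every task after all of its dependencies.
order = list(TopologicalSorter(dag).static_order())
print(order[-1])  # upload_feed
```

An orchestrator adds retries, alerting, and backfills on top of this ordering, but the dependency graph is the core of it.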
Let’s Finish With a Strategic View From the Peak
How Can We Generate Value? Focus on Actions
1. Define Actions: What Will You Do Differently If You Have the Data?
2. Success Metrics: What Would Success in Metrics Look Like?
3. Factors: Which Factors Influence Success?
4. Tests: How Can We Test Actions on These Factors?
Who Should Be Your Data Hire?
1. Focus: SQL & Warehouse
2. Focus: Data Pipelines
3. Focus: ML Models
Your Takeaways from this Session
1. When and why you should invest in a Marketing DWH
2. Learn the data ecosystem and the benefits of BigQuery
3. How to explore use cases by combining data sources
4. Many tactical tips for daily use
5. Design outcome-oriented questions for analytics projects
Thanks for Your Time.
Looking Forward To Questions!
Chris Gutknecht | Teamlead A&O | Hiring a PPC!
ANNEX: The Three Phases of a Data Platform
1. Greenfield
2. Dashboards
3. Operational Analytics
Data Warehouse vs Data Lake
Structured Data
Table Schemas
Transactions
Sharded Files
Unstructured Data
Lakehouse
Why Focus On Google Cloud & BigQuery?
Market Leader in Data Analytics*
Free Google Data Connectors
Seamless low-tech scaling
*Source: Forrester Research 2021
The Best Cloud Data Warehouse? It Depends
Source: https://medium.com/pocket-gems/a-comparative-analysis-between-bigquery-redshift-and-snowflake-8d194fdf5693
Google Data Sources BigQuery = Google Cloud
How often do we Ingest Data?
Real-Time
Stream Processing
Batch Processing
or
Connected Sheets = ‘NoCode’ Analysis on BQ
Dom Woodman’s Search Console Downloader
https://www.pipedout.com/resources/tools/download-search-console/
Data Modeling: Star Schema & 3rd Normal Form
Third Normal Form
Data Modeling Choices: Denormalized
Third Normal Form
Denormalized
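The difference between the two modeling choices can be shown with a few rows. In this minimal sketch the tables, keys, and the `denormalize` function are hypothetical: the normalized form keeps each fact in one table linked by keys, while the denormalized form repeats product and category fields on every order line so that reports need no joins.

```python
# Third normal form: each fact lives in exactly one table, linked by keys.
categories = {1: {"top_category": "Footwear"}}
products = {"A1": {"name": "Hiking Boot", "category_id": 1}}
order_lines = [
    {"transaction_id": "T-1", "sku": "A1", "quantity": 2},
    {"transaction_id": "T-2", "sku": "A1", "quantity": 1},
]


def denormalize(order_lines, products, categories):
    """Resolve the keys once and store product and category fields on every row."""
    flat = []
    for line in order_lines:
        product = products[line["sku"]]
        category = categories[product["category_id"]]
        flat.append({
            **line,
            "product_name": product["name"],
            "top_category": category["top_category"],
        })
    return flat


flat = denormalize(order_lines, products, categories)
print(flat[0]["top_category"], flat[1]["product_name"])  # Footwear Hiking Boot
```

The flat table repeats "Hiking Boot" and "Footwear" on every row, which costs storage but keeps dashboard queries simple.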
ANNEX: The Three Phases of a Data Platform
1. Greenfield
2. Dashboards
3. Operational Analytics
Or Pick the Official “Data Analytics” Certificate
8 Courses
4-6 Months (longer)
Intro to R
Certificate
Export Your GCP Costs to BigQuery
Monitor Your BigQuery Costs in a Dashboard
Monitor Expensive Queries: https://www.pascallandau.com/bigquery-snippets/monitor-query-costs/
ANNEX: The Three Phases of a Data Platform
1. Greenfield
2. Dashboards
3. Operational Analytics
How To Sync Segments: CDP vs Reverse ETL?
Customer Data Platform
Reverse ETL
(DWH = Central)

Building a Marketing Data Warehouse from Scratch - SMX Advanced 202