Gabriel PREDA
@eRadical
(Almost) Serverless Analytics System
with BigQuery & AppEngine
Agenda
Going Serverless with
AppEngine & Tasks
Pub/Sub, DataStore
BigQuery
Load
Batch
Streaming Inserts
Query
UDF
Export
...some BigQueries...
Some years ago...
~ 500,000 - 2,000,000 events / day
(on average)
Some time ago...
~2,000,000 - 22,000,000 events / day
Dec 2014: 57,430,000 events / day
1 day to recompute » 12 hours
NOW()
22,000,000 - 70,000,000 events / day
AVG » 40,000,000 events / day
Processing ~30GB-70GB / day
Recompute 1 day » 10-20 minutes
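To put these daily volumes in perspective, the arithmetic below converts events per day into an average per-second rate and works out how much faster a one-day recompute became (12 hours down to 10-20 minutes). This is only a back-of-the-envelope sketch: the helper names are made up for illustration, and real traffic is unlikely to be spread evenly over a day.

```javascript
// Back-of-the-envelope arithmetic for the volumes quoted on the slides.
// Helper names are illustrative, not part of any library.
const SECONDS_PER_DAY = 24 * 60 * 60;

// Average events per second, assuming traffic is spread evenly over the day.
function eventsPerSecond(eventsPerDay) {
  return eventsPerDay / SECONDS_PER_DAY;
}

// How many times faster a recompute became (both durations in minutes).
function speedup(beforeMinutes, afterMinutes) {
  return beforeMinutes / afterMinutes;
}

console.log(Math.round(eventsPerSecond(40000000))); // average day
console.log(Math.round(eventsPerSecond(70000000))); // busiest days
console.log(speedup(12 * 60, 20), speedup(12 * 60, 10)); // 12 hours vs 20 or 10 minutes
```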
serverless?
Desired for: https://www.innertrends.com
other... (almost) serverless products
Cloud Functions (alpha - Node.JS)
Cloud DataFlow (Java, Python - beta)
BigQuery
https://cloud.google.com/bigquery/docs/
BigQuery - data types
● STRING - UTF-8 (2 bytes + encoded string size)
● BYTES - base64 encoded (except in Avro)
● INTEGER - 64-bit signed (8 bytes)
● FLOAT (8 bytes)
● BOOLEAN - true/false, 1/0 only in CSV (1 byte)
● TIMESTAMP - e.g. "2014-08-19 12:41:35.220 UTC" (8 bytes)
● DATE, TIME, DATETIME - limited support in Legacy SQL
● RECORD - a collection of fields (size of fields)
https://cloud.google.com/bigquery/data-types
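The per-type sizes above are enough to estimate how many bytes a row contributes. The sketch below applies them to a simplified, hypothetical schema format (`{name, type, fields}`), with made-up field names. It treats NULL values as 0 bytes and does not handle BYTES, for which the slide gives no size. It assumes Node.js, since it uses `Buffer` to measure UTF-8 length.

```javascript
// Rough per-row size estimate using the per-type sizes listed on the slide.
// The schema format ({name, type, fields}) is a simplified, hypothetical one.
function estimateRowBytes(schema, row) {
  let total = 0;
  for (const field of schema) {
    const value = row[field.name];
    if (value === null || value === undefined) continue; // NULL counted as 0 bytes here
    switch (field.type) {
      case 'STRING':
        total += 2 + Buffer.byteLength(value, 'utf8'); // 2 bytes + encoded size
        break;
      case 'INTEGER':
      case 'FLOAT':
      case 'TIMESTAMP':
        total += 8;
        break;
      case 'BOOLEAN':
        total += 1;
        break;
      case 'RECORD':
        total += estimateRowBytes(field.fields, value); // sum of the nested fields
        break;
      default:
        throw new Error('Unhandled type: ' + field.type);
    }
  }
  return total;
}

// Illustrative schema and row (field names are invented for this example).
const exampleSchema = [
  { name: 'user_id', type: 'INTEGER' },
  { name: 'name', type: 'STRING' },
  { name: 'active', type: 'BOOLEAN' },
  { name: 'geo', type: 'RECORD', fields: [
    { name: 'country', type: 'STRING' },
    { name: 'lat', type: 'FLOAT' },
  ] },
];
const exampleRow = { user_id: 7, name: 'Zoë', active: true, geo: { country: 'RO', lat: 44.4 } };
console.log(estimateRowBytes(exampleSchema, exampleRow));
```

Multiplying such an estimate by the number of rows scanned gives a rough idea of how much data a query over those columns would process.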
BigQuery -> loadData()
Formats: CSV, JSON (newline delimited), Avro, Parquet (experimental)
Tools: Web UI, bq, API
Source:
local files,
Cloud Storage, [demo]
Cloud Datastore (backup files),
POST requests,
SQL DML*
Google Sheets
- Federated Data Sources
- Streaming Inserts
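Of the formats listed, newline-delimited JSON is easy to produce by hand: each row is one JSON object on its own line, with no enclosing array and no separating commas. The helper below is a minimal sketch of that serialization. The function name and the sample fields are invented for illustration, and it assumes Node.js for `process.stdout`.

```javascript
// Serialize rows as newline-delimited JSON: one object per line, no enclosing array.
// JSON.stringify escapes any newline inside a string value, so each row stays on one line.
function toNdjson(rows) {
  return rows.map((row) => JSON.stringify(row) + '\n').join('');
}

// Illustrative events (field names are hypothetical).
const sampleEvents = [
  { event: 'signup', user_id: 1, ts: '2014-08-19 12:41:35.220 UTC' },
  { event: 'login', user_id: 1, ts: '2014-08-19 12:45:02.000 UTC' },
];
process.stdout.write(toNdjson(sampleEvents));
```

The resulting text can be saved to a file and used as the input of a load job.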
BigQuery -> loadData()
bq load ...
BigQuery -> loadData()
Got some rows?
BigQuery -> SELECT … FROM surprise…
query:
    SELECT { * | field_path.* | expression } [ [ AS ] alias ] [ , ... ]
    [ FROM from_body
        [ WHERE bool_expression ]
        [ OMIT RECORD IF bool_expression]
        [ GROUP [ EACH ] BY [ ROLLUP ] { field_name_or_alias } [ , ... ] ]
        [ HAVING bool_expression ]
        [ ORDER BY field_name_or_alias [ { DESC | ASC } ] [, ... ] ]
        [ LIMIT n ]
    ];
from_body:
    from_item [, ...] |  # Warning: Comma means UNION ALL here
    from_item [ join_type ] JOIN [ EACH ] from_item [ ON join_predicate ] |
    (FLATTEN({ table_name | (query) }, field_name_or_alias)) |
    table_wildcard_function
from_item:
    { table_name | (query) } [ [ AS ] alias ]
join_type:
    { INNER | [ FULL ] [ OUTER ] | RIGHT [ OUTER ] | LEFT [ OUTER ] | CROSS }
BigQuery -> SELECT … FROM surprise…
Date-Partitioned Tables [demo]
Table Decorators - See the past w/ @
Table Wildcard Functions - TABLE_DATE_RANGE() & TABLE_QUERY()
Interesting functions
- DateTime » UTC_USEC_TO_DAY/HOUR/MONTH/WEEK/YEAR()
» Shifts a UNIX timestamp in microseconds to the beginning of the period it occurs in.
- JSON_EXTRACT[_SCALAR]()
- URL functions » HOST(), DOMAIN(), TLD()
- REGEXP_MATCH(), REGEXP_EXTRACT()
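The UTC_USEC_TO_* functions can be reproduced locally to see what "the beginning of the period" means. The sketch below emulates the DAY and HOUR variants for non-negative timestamps. The function names are local stand-ins, not BigQuery APIs, and the sample value is an arbitrary timestamp from May 2010.

```javascript
// Local emulation of UTC_USEC_TO_HOUR / UTC_USEC_TO_DAY for non-negative
// UNIX timestamps in microseconds. Names are illustrative stand-ins.
const USEC_PER_HOUR = 60 * 60 * 1000000;
const USEC_PER_DAY = 24 * USEC_PER_HOUR;

// Drop everything after the start of the hour the timestamp falls in.
function utcUsecToHour(usec) {
  return usec - (usec % USEC_PER_HOUR);
}

// Drop everything after the start of the UTC day the timestamp falls in.
function utcUsecToDay(usec) {
  return usec - (usec % USEC_PER_DAY);
}

const sampleUsec = 1274259481071200;
console.log(utcUsecToHour(sampleUsec));
console.log(utcUsecToDay(sampleUsec));
// Date takes milliseconds, so divide by 1000 to inspect the result.
console.log(new Date(utcUsecToDay(sampleUsec) / 1000).toISOString());
```

Truncating timestamps like this is what makes them usable as GROUP BY keys for per-hour or per-day aggregates.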
BigQuery -> User Defined Functions
bigquery.defineFunction(
  'expandAssetLibrary',                        // Name of the function exported to SQL
  ['user_id', 'video_id', 'stage_settings'],   // Names of input columns
  [ {'name': 'user_id', 'type': 'integer'},    // Output schema
    {'name': 'video_id', 'type': 'string'},
    {'name': 'asset', 'type': 'string'} ],
  expandAssetLibrary                           // Reference to JavaScript UDF
);
function expandAssetLibrary(row, emit) {
  …………………………
  emit({ user_id: row.user_id, video_id: row.video_id, asset: ss.url.replace('http://', '') });
}
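A UDF in this (row, emit) style is plain JavaScript, so it can be exercised locally before it is registered. The sketch below is a hypothetical stand-in, not the function from the slide: it assumes `stage_settings` is a JSON string holding an `assets` array of `{url}` objects, which the slide does not show, and `runUdf` is a made-up harness that plays the role of BigQuery by calling the UDF once per row and collecting whatever it emits.

```javascript
// Hypothetical UDF in the legacy (row, emit) style.
// Assumption: stage_settings is a JSON string with an `assets` array of {url}
// objects; the real parsing logic is elided on the slide.
function expandAssets(row, emit) {
  const settings = JSON.parse(row.stage_settings);
  for (const asset of settings.assets || []) {
    // One output row per asset: a single input row can emit many rows.
    emit({
      user_id: row.user_id,
      video_id: row.video_id,
      asset: asset.url.replace('http://', ''),
    });
  }
}

// Minimal local harness: call the UDF once per input row, collect emitted rows.
function runUdf(udf, rows) {
  const out = [];
  for (const row of rows) {
    udf(row, (emitted) => out.push(emitted));
  }
  return out;
}

// Illustrative input (URLs and ids are invented).
const sampleRows = [{
  user_id: 1,
  video_id: 'v1',
  stage_settings: JSON.stringify({
    assets: [{ url: 'http://cdn.example.com/a.png' }, { url: 'http://cdn.example.com/b.png' }],
  }),
}];
console.log(runUdf(expandAssets, sampleRows));
```

The point of the design is the fan-out: one input row can call `emit` zero, one, or many times, and each call produces a row with the declared output schema.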
BigQuery -> DML
Standard SQL only
Maximum UPDATE/DELETE statements per day per table: 48
Maximum UPDATE/DELETE statements per day per project: 500
Maximum INSERT statements per day per table: 1,000
Maximum INSERT statements per day per project: 10,000
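These limits translate directly into how often a job may run DML against one table. The sketch below derives the minimum average spacing between statements from a daily limit and checks whether a planned batch still fits. The numbers are the ones quoted on this slide and may have changed since. The function and constant names are made up for illustration.

```javascript
// Per-table daily DML limits as quoted on the slide (they may have changed since).
const DML_LIMITS_PER_TABLE = { updateDelete: 48, insert: 1000 };

// Minimum average spacing, in seconds, between statements to stay within a daily limit.
function minSpacingSeconds(statementsPerDay) {
  return (24 * 60 * 60) / statementsPerDay;
}

// True if running `planned` more statements today stays within the limit.
function fitsQuota(usedToday, planned, limitPerDay) {
  return usedToday + planned <= limitPerDay;
}

// 48 UPDATE/DELETE statements per day works out to one every 30 minutes.
console.log(minSpacingSeconds(DML_LIMITS_PER_TABLE.updateDelete) / 60);
console.log(fitsQuota(47, 2, DML_LIMITS_PER_TABLE.updateDelete));
```

With a budget this small, frequent corrections are better batched into a single statement or handled by reloading the affected data.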
BigQuery -> export()
To: Google Cloud Storage
Format: CSV, JSON [.gz], Avro
…in files of up to 1 GB each
BigQuery -> some (Big)Queries
SELECT year, count(1)
FROM [bigquery-public-data:samples.natality]
WHERE father_age < 18
GROUP BY year
ORDER BY year

SELECT year, count(1)
FROM [bigquery-public-data:samples.natality]
WHERE mother_age < 18
GROUP BY year
ORDER BY year

SELECT table_id, row_count, CEIL(size_bytes/POW(1024, 3)) AS gb
FROM [bigquery-public-data:ghcn_m.__TABLES__] ORDER BY gb DESC
BigQuery -> some (Big)Queries
SELECT REGEXP_EXTRACT(path, r'.*\.(.*)$') AS file_extension,
COUNT(1) AS k
FROM [bigquery-public-data:github_repos.files]
GROUP BY file_extension
ORDER BY k DESC
LIMIT 20

SELECT table_id, row_count,
CEIL(size_bytes/POW(1024, 3)) AS gb
FROM [bigquery-public-data:github_repos.__TABLES__]
ORDER BY gb DESC
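The two expressions in these queries are easy to check locally. The sketch below uses JavaScript's regex engine as a stand-in for BigQuery's (the two agree on a pattern this simple) and mirrors the CEIL(size_bytes/POW(1024, 3)) rounding. The function names are made up for illustration. The dot before the capture group has to be escaped: with a bare `.` the greedy `.*` swallows the whole path and the captured extension is always empty.

```javascript
// Extract the text after the last dot, like REGEXP_EXTRACT with '.*\.(.*)$'.
function fileExtension(path) {
  const m = /.*\.(.*)$/.exec(path);
  return m ? m[1] : null; // REGEXP_EXTRACT yields NULL when nothing matches
}

// Same pattern with the dot left unescaped, to show why that version is useless.
function fileExtensionUnescaped(path) {
  const m = /.*.(.*)$/.exec(path);
  return m ? m[1] : null;
}

// Table size rounded up to whole GiB, mirroring CEIL(size_bytes/POW(1024, 3)).
function sizeInGb(sizeBytes) {
  return Math.ceil(sizeBytes / Math.pow(1024, 3));
}

console.log(fileExtension('src/main.js'));
console.log(fileExtension('Makefile'));
console.log(JSON.stringify(fileExtensionUnescaped('src/main.js')));
console.log(sizeInGb(5 * Math.pow(1024, 3) + 1));
```

Because the pattern anchors on the last dot, a path such as `archive.tar.gz` is counted under `gz`, not `tar.gz`.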
