Big Data Analytics 
with 
Google BigQuery 
by javier ramirez 
@supercoco9 
https://teowaki.com 
https://datawaki.com
Big Data Analytics with Google BigQuery, by Javier Ramirez, datawaki, at Span Conf
INPUT 
/ 
OUTPUT 
Big Data's 
#1 Enemy
Read one 
terabyte of 
data in 
one second 
javier ramirez @supercoco9 https://teowaki.com
"Data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or doesn’t fit the structures of your database architectures."
Edd Dumbill, program chair for the O’Reilly Strata Conference
Big data is doing a full scan of 330 million rows, matching them against a regexp, and getting the result (223 million rows) in just 5 seconds
Javier Ramirez 
impressionable teowaki founder
REST API 
+ 
AngularJS web as 
an API client 
javier ramirez @supercoco9 https://teowaki.com nosqlmatters 2013
1. non intrusive metrics 
2. keep the history 
3. interactive queries 
4. cheap 
5. extra ball: real time 
Apache Hadoop 
Apache Cassandra 
Apache Spark 
Apache Storm 
Amazon Redshift 
bigdata is cool but... 
expensive cluster 
hard to set up and monitor 
not interactive enough
Our choice: 
Google BigQuery 
Data analysis as a service 
http://developers.google.com/bigquery 
Based on Dremel 
Specifically designed for 
interactive queries over 
petabytes of real-time 
data 
What Dremel is used for in Google 
• Analysis of crawled web documents. 
• Tracking install data for applications on Android Market. 
• Crash reporting for Google products. 
• OCR results from Google Books. 
• Spam analysis. 
• Debugging of map tiles on Google Maps. 
• Tablet migrations in managed Bigtable instances. 
• Results of tests run on Google’s distributed build system. 
• Disk I/O statistics for hundreds of thousands of disks. 
• Resource monitoring for jobs run in Google’s data centers. 
• Symbols and dependencies in Google’s codebase.
INDEXES 
Data 
Scientists'
#1 Enemy
in BigQuery 
everything is 
a full-scan* 
*Over a ridiculously fast distributed filesystem. 
Dremel design goal: 1 TB/sec. It was exceeded.
BigQuery delivers ~50 Gb/sec.
Columnar 
storage 
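A minimal sketch of why columnar storage helps when every query is a full scan, assuming a toy in-memory table (the field names and values are made up for illustration and do not reflect BigQuery's actual storage format): an aggregate over one column only touches that column's values instead of every field of every row.

```python
# Toy comparison of row-oriented vs column-oriented layouts.
# The records and field names are hypothetical.
rows = [
    {"uri": "/users/a/shouts", "method": "POST", "ms": 12},
    {"uri": "/users/b/shouts", "method": "GET", "ms": 7},
    {"uri": "/teams/c/links", "method": "POST", "ms": 30},
]

# Column-oriented layout: one list per field.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def values_touched_row_store(rows):
    """A scan over a row store reads every field of every row."""
    return sum(len(row) for row in rows)


def values_touched_column_store(columns, needed):
    """A scan over a column store reads only the requested columns."""
    return sum(len(columns[name]) for name in needed)


total_ms = sum(columns["ms"])  # e.g. SELECT SUM(ms) needs a single column
row_reads = values_touched_row_store(rows)
col_reads = values_touched_column_store(columns, ["ms"])
print(total_ms, row_reads, col_reads)
```

The gap between the two read counts grows with the number of columns in the table, which is why wide tables benefit most.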
Colossus filesystem 
Distributed/redundant 
Parallel reads 
Ultra fast network 
highly distributed 
execution using a tree 
javier ramirez @supercoco9 https://teowaki.com rubyc kiev 14
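The execution tree can be pictured with a small sketch, assuming a simplified model in which leaf workers scan their own shard and the nodes above them only merge partial results. The shard data and function names are hypothetical; this illustrates the idea rather than Dremel's real implementation.

```python
from collections import Counter

# Hypothetical shards, each held by a different leaf worker.
shards = [
    ["POST", "GET", "GET"],
    ["GET", "POST"],
    ["DELETE", "GET", "POST", "POST"],
    ["GET"],
]


def leaf_scan(shard):
    """Leaf node: full scan of its own shard, producing a partial count."""
    return Counter(shard)


def merge(partials):
    """Intermediate or root node: combine partial counts from its children."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def tree_aggregate(partials, fanout=2):
    """Merge partial results level by level until one root result remains."""
    while len(partials) > 1:
        partials = [
            merge(partials[i:i + fanout])
            for i in range(0, len(partials), fanout)
        ]
    return partials[0]


result = tree_aggregate([leaf_scan(s) for s in shards])
print(dict(result))
```

In this model no single node ever sees the raw rows of more than one shard; only small partial counts travel up the tree.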
Getting started...
create dataset and tables
loading data 
You can feed flat CSV-like 
files or nested JSON 
objects 
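A hedged sketch of the two input shapes, assuming a hypothetical API log record (the field names are invented for illustration): the flat form is one delimited line per record, while the nested form keeps the structure and writes one JSON object per line, the newline-delimited layout used for JSON loads.

```python
import csv
import io
import json

# Hypothetical API log records; field names are for illustration only.
records = [
    {"uri": "/users/javier/shouts", "method": "POST",
     "client": {"ip": "10.0.0.1", "agent": "curl"}},
    {"uri": "/teams/javier-community/links", "method": "GET",
     "client": {"ip": "10.0.0.2", "agent": "browser"}},
]


def to_flat_tsv(records):
    """Flatten nested fields into columns and emit tab-separated lines."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for r in records:
        writer.writerow([r["uri"], r["method"],
                         r["client"]["ip"], r["client"]["agent"]])
    return out.getvalue()


def to_ndjson(records):
    """Keep the nesting and emit one JSON object per line."""
    return "".join(json.dumps(r) + "\n" for r in records)


flat = to_flat_tsv(records)
nested = to_ndjson(records)
print(flat)
print(nested)
```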
bq cli
bq load --nosynchronous_mode \
  --encoding UTF-8 \
  --field_delimiter 'tab' \
  --max_bad_records 100 \
  --source_format CSV \
  api.stats 20131014T11-42-05Z.gz
web console screenshot 
it's just sql, plus... 
analytical SQL functions. 
correlations. 
window functions. 
views. 
JSON fields. 
timestamped tables. 
Things you always wanted to 
try but were too scared to 
select count(*) from 
publicdata:samples.wikipedia 
where REGEXP_MATCH(title, "[0-9]*") 
AND wp_namespace = 0; 
223,163,387 
Query complete (5.6s elapsed, 9.13 GB processed, Cost: 32¢) 
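As a back-of-the-envelope check on the figures reported above, the effective scan rate can be derived from the elapsed time and the data processed. This sketch takes the 5.6 s, 9.13 GB and 223,163,387 matching rows from the query output, plus the 330 million scanned rows quoted earlier in the talk; the variable names are illustrative.

```python
# Figures taken from the query output and the talk.
elapsed_s = 5.6
gb_processed = 9.13
rows_scanned = 330_000_000   # the 330 million rows quoted earlier in the talk
rows_matched = 223_163_387

gb_per_second = gb_processed / elapsed_s
rows_per_second = rows_scanned / elapsed_s
match_ratio = rows_matched / rows_scanned

print(f"{gb_per_second:.2f} GB/s")
print(f"{rows_per_second / 1e6:.1f} million rows/s")
print(f"{match_ratio:.1%} of rows matched")
```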
Global Database of Events, 
Language and Tone 
quarter billion rows 
30 years 
updated daily 
http://gdeltproject.org/data.html#googlebigquery
SELECT Year, Actor1Name, Actor2Name, Count FROM (
  SELECT Actor1Name, Actor2Name, Year,
    COUNT(*) Count,
    RANK() OVER(PARTITION BY YEAR ORDER BY Count DESC) rank
  FROM
    (SELECT Actor1Name, Actor2Name, Year
     FROM [gdelt-bq:full.events]
     WHERE Actor1Name < Actor2Name
       AND Actor1CountryCode != '' AND Actor2CountryCode != ''
       AND Actor1CountryCode != Actor2CountryCode),
    (SELECT Actor2Name Actor1Name, Actor1Name Actor2Name, Year
     FROM [gdelt-bq:full.events]
     WHERE Actor1Name > Actor2Name
       AND Actor1CountryCode != '' AND Actor2CountryCode != ''
       AND Actor1CountryCode != Actor2CountryCode),
  WHERE Actor1Name IS NOT NULL
    AND Actor2Name IS NOT NULL
  GROUP EACH BY 1, 2, 3
  HAVING Count > 100
)
WHERE rank = 1
ORDER BY Year
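To make the query's logic easier to follow, here is a small sketch of the same steps on made-up data, in plain Python rather than BigQuery SQL: order each actor pair so that (A, B) and (B, A) count as one, count events per pair and year, and keep the top pair for each year. The sample events and the lowered threshold are hypothetical, and the country-code filters from the query are left out for brevity.

```python
from collections import Counter

# Hypothetical (year, actor1, actor2) events standing in for the GDELT table.
events = [
    (1979, "USA", "IRAN"), (1979, "IRAN", "USA"), (1979, "USA", "CHINA"),
    (1980, "IRAQ", "IRAN"), (1980, "IRAN", "IRAQ"), (1980, "IRAN", "IRAQ"),
    (1980, "USA", "IRAN"),
]


def top_pair_per_year(events, min_count=1):
    """Return {year: (actor1, actor2, count)} for the most frequent pair."""
    counts = Counter()
    for year, a, b in events:
        if not a or not b or a == b:
            continue
        # Same effect as the two sub-selects: always store the pair sorted.
        first, second = sorted((a, b))
        counts[(year, first, second)] += 1

    best = {}
    for (year, first, second), n in counts.items():
        if n <= min_count:  # mirrors HAVING Count > 100, scaled down
            continue
        # Unlike RANK(), ties keep only the first pair seen.
        if year not in best or n > best[year][2]:
            best[year] = (first, second, n)
    return best


winners = top_pair_per_year(events)
for year in sorted(winners):
    print(year, *winners[year])
```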
our most active user 
10 requests we should be caching
5 most created resources 
select uri, count(*) total from
stats where method = 'POST'
group by uri
order by total desc limit 5;
...but 
/users/javier/shouts 
/users/rgo/shouts 
/teams/javier-community/links 
/teams/nosqlmatters-cgn/links 
5 most created resources 
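The slides do not show how the per-user and per-team URIs were grouped. One possible approach, sketched here under the assumption that the second path segment is always an identifier, is to replace that segment with a placeholder before counting. The sample requests and the `normalize` helper are hypothetical.

```python
import re
from collections import Counter

# Hypothetical POST request log; the URIs follow the shapes shown above.
post_uris = [
    "/users/javier/shouts",
    "/users/rgo/shouts",
    "/teams/javier-community/links",
    "/teams/nosqlmatters-cgn/links",
    "/users/rgo/shouts",
]

# Assumption: the second path segment is a user or team identifier.
ID_SEGMENT = re.compile(r"^(/[^/]+)/[^/]+")


def normalize(uri):
    """Replace the identifier segment so equivalent resources group together."""
    return ID_SEGMENT.sub(r"\1/:id", uri)


def most_created(uris, limit=5):
    """Count normalized URIs and return the most frequent ones."""
    return Counter(normalize(u) for u in uris).most_common(limit)


top = most_created(post_uris)
print(top)
```

The same idea could be applied inside the query itself with a regexp replacement on the URI column before grouping.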
what is it 
being used for?
Analysing weather information 
Finding patterns in e-commerce 
Match online/offline behaviour 
Log analysis
Analysing inventory/booking data 
...
warning: BigQuery is neither
open source nor
free
$26 per stored TB 
$5 per processed TB 
*the 1st TB processed every month is free of charge 
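The pricing above translates into a simple estimate. This sketch uses the per-TB prices and the free first processed terabyte from the slide; treating the storage price as a monthly charge is an assumption, and the function name is illustrative.

```python
STORAGE_USD_PER_TB = 26.0      # from the slide; assumed to be per month
PROCESSING_USD_PER_TB = 5.0    # from the slide
FREE_PROCESSED_TB = 1.0        # first TB processed each month is free


def monthly_cost_usd(stored_tb, processed_tb):
    """Estimate one month's bill from stored and processed terabytes."""
    billable_tb = max(0.0, processed_tb - FREE_PROCESSED_TB)
    storage = stored_tb * STORAGE_USD_PER_TB
    processing = billable_tb * PROCESSING_USD_PER_TB
    return storage + processing


# Example: 0.5 TB stored and 3 TB scanned by queries in a month.
print(monthly_cost_usd(0.5, 3.0))
```

Because every query is a full scan of the columns it references, the processed terabytes depend on which columns are selected, not on how many rows come back.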
Find related links at 
https://teowaki.com/teams/javier-community/link-categories/bigquery-talk 
Thanks 
javier ramirez 
@supercoco9 
https://teowaki.com 
https://datawaki.com





