Google Storage, BigQuery and Prediction APIs
Patrick Chanezon, Developer Advocate, Cloud
@chanezon, chanezon@google.com
Sao Paulo, October 29th 2010
Google Developer Day 2010
Friday, October 29, 2010
Mobile Agenda for GDD
http://bit.ly/mgddbr
Agenda
• Google Storage for Developers
• Prediction API
• BigQuery
What is cloud computing?
Infrastructure, Platform, Software … as a Service
• IaaS: Infrastructure as a Service
• PaaS: Platform as a Service
• SaaS: Software as a Service
Google's Cloud Offerings
• IaaS: Google Storage, Prediction API, BigQuery
• PaaS: Google App Engine
• SaaS:
1. Google Apps
2. Third party Apps: Google Apps Marketplace
3. ________
Google's Cloud Offerings
• IaaS: Google Storage, Prediction API, BigQuery
• PaaS: Google App Engine
• SaaS:
1. Google Apps
2. Third party Apps: Google Apps Marketplace
3. Your Apps
Google Storage for Developers
Store your data in Google's cloud
What Is Google Storage?
• Store your data in Google's cloud
o any format, any amount, any time
• You control access to your data
o private, shared, or public
• Access via Google APIs or 3rd party tools/libraries
Sample Use Cases
• Static content hosting
o e.g. static HTML, images, music, video
• Backup and recovery
o e.g. personal data, business records
• Sharing
o e.g. share data with your customers
• Data storage for applications
o e.g. used as storage backend for Android, App Engine, Cloud based apps
• Storage for Computation
o e.g. BigQuery, Prediction API
Google Storage Benefits
• High Performance and Scalability
o Backed by Google infrastructure
• Strong Security and Privacy
o Control access to your data
• Easy to Use
o Get started fast with Google & 3rd party tools
Google Storage Technical Details
• RESTful API
o Verbs: GET, PUT, POST, HEAD, DELETE
o Resources: identified by URI
o Compatible with S3
• Buckets
o Flat containers, i.e. no bucket hierarchy
• Objects
o Any type
o Size: up to 100 GB per object
• Access Control for Google Accounts
o For individuals and groups
• Two Ways to Authenticate Requests
o Sign request using access keys
o Web browser login
Performance and Scalability
• Objects of any type, up to 100 GB per object
• Unlimited number of objects, 1000s of buckets
• All data replicated to multiple US data centers
• Leveraging Google's worldwide network for data delivery
• Only you can use bucket names with your domain names
• “Read-your-writes” data consistency
• Range GET
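A range GET asks the server for only part of an object by sending the standard HTTP Range header. The sketch below only builds that header and splits an object into byte spans. The helper names are illustrative and not part of any Google library, and no request is sent.

```python
def range_header(first_byte, last_byte):
    """Build an HTTP Range header for an inclusive byte span."""
    if first_byte < 0 or last_byte < first_byte:
        raise ValueError("expected 0 <= first_byte <= last_byte")
    return {"Range": "bytes=%d-%d" % (first_byte, last_byte)}


def chunk_ranges(total_size, chunk_size):
    """Yield (first, last) inclusive byte spans that cover total_size bytes."""
    for first in range(0, total_size, chunk_size):
        yield first, min(first + chunk_size, total_size) - 1


# Headers for fetching a 2500-byte object in pieces of at most 1000 bytes.
headers = [range_header(first, last) for first, last in chunk_ranges(2500, 1000)]
```

Each header could then be attached to a separate GET for the same object, which is one way to resume or parallelise a large download.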
Security and Privacy Features
• Key-based authentication
• Authenticated downloads from a web browser
• Sharing with individuals
• Group sharing via Google Groups
• Access control for buckets and objects
• Set Read/Write/List permissions
Tools
Google Storage Manager
gsutil
Google Storage usage within Google
• Haiti Relief Imagery
• USPTO data
• Partner Reporting
• Google BigQuery
• Google Prediction API
Some Early Google Storage Adopters
Google Storage - Pricing
• Storage
o $0.17/GB/month
• Network
o Upload: $0.10/GB
o Download: $0.30/GB APAC, $0.15/GB Americas / EMEA
• Requests
o PUT, POST, LIST: $0.01 per 1,000 requests
o GET, HEAD: $0.01 per 10,000 requests
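As a worked example of the price list, the sketch below adds up a month of usage. The prices are the ones on the slide. The usage figures are invented for illustration, and the free preview quota is ignored.

```python
from decimal import Decimal

# Preview prices quoted on the slide, in USD.
STORAGE_PER_GB_MONTH = Decimal("0.17")
UPLOAD_PER_GB = Decimal("0.10")
DOWNLOAD_PER_GB = {"APAC": Decimal("0.30"), "Americas/EMEA": Decimal("0.15")}
PER_1000_WRITES = Decimal("0.01")   # PUT, POST, LIST
PER_10000_READS = Decimal("0.01")   # GET, HEAD


def monthly_cost(stored_gb, upload_gb, download_gb, region,
                 write_requests, read_requests):
    """Return the monthly bill in USD for the given (hypothetical) usage."""
    storage = STORAGE_PER_GB_MONTH * stored_gb
    network = UPLOAD_PER_GB * upload_gb + DOWNLOAD_PER_GB[region] * download_gb
    requests = (PER_1000_WRITES * write_requests / 1000
                + PER_10000_READS * read_requests / 10000)
    return storage + network + requests


# 100 GB stored, 10 GB uploaded, 50 GB downloaded in the Americas,
# 100,000 write requests and 1,000,000 read requests:
# 17.00 + 1.00 + 7.50 + 1.00 + 1.00 = 27.50
example_bill = monthly_cost(100, 10, 50, "Americas/EMEA", 100000, 1000000)
```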
Google Storage - Availability
• Limited preview in US* currently
o 100GB free storage and network per account
o Sign up for the wait list at http://code.google.com/apis/storage/
* Non-US preview available on case-by-case basis
Google Storage Summary
• Store any kind of data using Google's cloud infrastructure
• Easy to Use APIs
• Many available tools and libraries
o gsutil, Google Storage Manager
o 3rd party: Boto, CloudBerry, CyberDuck, JetS3t, …
Google Prediction API
Google's prediction engine in the cloud
Introducing the Google Prediction API
• Google's sophisticated machine learning technology
• Available as an on-demand RESTful HTTP web service
How does it work?
"english" The quick brown fox jumped over the lazy dog.
"english" To err is human, but to really foul things up you need a computer.
"spanish" No hay mal que por bien no venga.
"spanish" La tercera es la vencida.
? To be or not to be, that is the question.
? La fe mueve montañas.
1. TRAIN: The Prediction API finds relevant features in the sample data during training.
2. PREDICT: The Prediction API later searches for those features during prediction.
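To make the train and predict idea concrete, here is a deliberately tiny stand-in that treats each word as a feature and picks the label whose training vocabulary overlaps most with the new text. This is an illustrative assumption, not the technique the Prediction API uses internally (the slides do not say), and the function names are made up.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and return its set of words."""
    return set(re.findall(r"\w+", text.lower()))


def train(samples):
    """Collect the vocabulary seen for each label in (label, text) pairs."""
    vocabulary = defaultdict(set)
    for label, text in samples:
        vocabulary[label] |= tokenize(text)
    return dict(vocabulary)


def predict(model, text):
    """Return the label whose vocabulary shares the most words with text."""
    words = tokenize(text)
    return max(model, key=lambda label: len(words & model[label]))


# The four labelled sentences from the slide.
samples = [
    ("english", "The quick brown fox jumped over the lazy dog."),
    ("english", "To err is human, but to really foul things up you need a computer."),
    ("spanish", "No hay mal que por bien no venga."),
    ("spanish", "La tercera es la vencida."),
]
model = train(samples)
guess_1 = predict(model, "To be or not to be, that is the question.")
guess_2 = predict(model, "La fe mueve montañas.")
```

With only four training sentences the overlap is thin ("to", "is", "the" for the first query, "la" for the second), which is why real training sets need many more examples.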
A virtually endless number of applications...
• Customer Sentiment
• Transaction Risk
• Species Identification
• Message Routing
• Legal Docket Classification
• Suspicious Activity
• Work Roster Assignment
• Recommend Products
• Political Bias
• Uplift Marketing
• Email Filtering
• Diagnostics
• Inappropriate Content
• Career Counseling
• Churn Prediction
... and many more ...
A Prediction API Example
Automatically categorize and respond to emails by language
• Customer: ACME Corp, a multinational organization
• Goal: Respond to customer emails in their language
• Data: Many emails, tagged with their languages
• Outcome: Predict language and respond accordingly
Using the Prediction API
A simple three step process...
1. Upload: Upload your training data to Google Storage
2. Train: Build a model from your data
3. Predict: Make new predictions
Step 1: Upload
Upload your training data to Google Storage
• Training data: outputs and input features
• Data format: comma-separated values (CSV), result in first column
"english","To err is human, but to really ..."
"spanish","No hay mal que por bien no venga."
...
Upload to Google Storage
gsutil cp ${data} gs://yourbucket/${data}
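A minimal sketch of producing a training file in the layout shown on the slide, with the label in the first column and every field quoted. The helper name and the sample rows are illustrative. The resulting text would still need to be saved to a file and copied to a bucket with gsutil.

```python
import csv
import io


def to_training_csv(rows):
    """Serialise (label, text) pairs as CSV with the label in column one."""
    buffer = io.StringIO()
    # QUOTE_ALL matches the slide, where both columns are double-quoted.
    writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for label, text in rows:
        writer.writerow([label, text])
    return buffer.getvalue()


training_csv = to_training_csv([
    ("english", "To err is human"),
    ("spanish", "No hay mal que por bien no venga."),
])
```

Using the csv module rather than string concatenation means commas and quotes inside the text are escaped correctly.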
Step 2: Train
Create a new model by training on data
To train a model:
POST prediction/v1.1/training?data=mybucket%2Fmydata
Training runs asynchronously. To see if it has finished:
GET prediction/v1.1/training/mybucket%2Fmydata
{"data":{
"data":"mybucket/mydata",
"modelinfo":"estimated accuracy: 0.xx"}}
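The model is addressed by its bucket and object name with the slash percent-encoded (mybucket%2Fmydata). The sketch below shows one way to build the two request paths from the slide. The helper names are illustrative, and no request is sent.

```python
from urllib.parse import quote

API_PREFIX = "prediction/v1.1"  # path prefix as written on the slide


def model_id(bucket, obj):
    """Percent-encode "bucket/object", including the slash."""
    return quote("%s/%s" % (bucket, obj), safe="")


def training_request_path(bucket, obj):
    """Path to POST to in order to start training."""
    return "%s/training?data=%s" % (API_PREFIX, model_id(bucket, obj))


def training_status_path(bucket, obj):
    """Path to GET in order to check whether training has finished."""
    return "%s/training/%s" % (API_PREFIX, model_id(bucket, obj))


start_path = training_request_path("mybucket", "mydata")
status_path = training_status_path("mybucket", "mydata")
```

Passing safe="" matters here: by default quote leaves "/" unescaped, which would split the model name into two path segments.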
Step 3: Predict
Apply the trained model to make predictions on new data
POST prediction/v1.1/query/mybucket%2Fmydata/predict
{ "data":{
"input": { "text" : [
"J'aime X! C'est le meilleur" ]}}}
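The request body is a small nested JSON document. A sketch of building it with the standard json module instead of by hand, using only the field names shown on the slide (the helper name is illustrative):

```python
import json


def prediction_request_body(texts):
    """Wrap one or more input strings in the JSON layout shown on the slide."""
    return json.dumps({"data": {"input": {"text": list(texts)}}},
                      ensure_ascii=False)


body = prediction_request_body(["J'aime X! C'est le meilleur"])
```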
Step 3: Predict
Apply the trained model to make predictions on new data
Response:
{ "data" : {
"kind" : "prediction#output",
"outputLabel":"French",
"outputMulti" :[
{"label":"French", "score": x.xx},
{"label":"English", "score": x.xx},
{"label":"Spanish", "score": x.xx}]}}
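A sketch of reading such a response: parse the JSON and rank the labels by score. The scores in the sample are placeholders invented for this example (the slide shows x.xx), and the helper name is illustrative.

```python
import json

# Same shape as the slide's response; the numeric scores are made up.
sample_response = """
{"data": {
  "kind": "prediction#output",
  "outputLabel": "French",
  "outputMulti": [
    {"label": "French", "score": 0.7},
    {"label": "English", "score": 0.2},
    {"label": "Spanish", "score": 0.1}]}}
"""


def ranked_labels(response_text):
    """Return the outputMulti entries sorted from highest to lowest score."""
    output = json.loads(response_text)["data"]
    return sorted(output["outputMulti"],
                  key=lambda entry: entry["score"], reverse=True)


ranking = ranked_labels(sample_response)
best = ranking[0]["label"]
```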
Step 3: Predict
Apply the trained model to make predictions on new data
import http.client
# put new data in JSON format (request body elided on the slide)
params = { ... }
header = {"Content-Type" : "application/json"}
# open a connection to the API host and send the POST request
conn = http.client.HTTPConnection("www.googleapis.com")
conn.request("POST",
    "/prediction/v1.1/query/mybucket%2Fmydata/predict",
    params, header)
# read and print the response body
print(conn.getresponse().read())
Prediction API Capabilities
Data
• Input Features: numeric or unstructured text
• Output: up to hundreds of discrete categories
Training
• Many machine learning techniques
• Automatically selected
• Performed asynchronously
Access from many platforms:
• Web app from Google App Engine
• Apps Script (e.g. from Google Spreadsheet)
• Desktop app
Prediction API v1.1 - new features
• Updated Syntax
• Multi-category prediction
o Tag entry with multiple labels
• Continuous Output
o Finer grained prediction rankings based on multiple labels
• Mixed Inputs
o Both numeric and text inputs are now supported
Can combine continuous output with mixed inputs
Google BigQuery
Interactive analysis of large datasets in Google's cloud
Introducing Google BigQuery
• Google's large data ad hoc analysis technology
o Analyze massive amounts of data in seconds
• Simple SQL-like query language
• Flexible access
o REST APIs, JSON-RPC, Google Apps Script
Why BigQuery?
Working with large data is a challenge
Many Use Cases ...
• Spam
• Trends Detection
• Web Dashboards
• Network Optimization
• Interactive Tools
Key Capabilities of BigQuery
• Scalable: Billions of rows
• Fast: Response in seconds
• Simple: Queries in SQL
• Web Service
o REST
o JSON-RPC
o Google Apps Script
Using BigQuery
Another simple three step process...
1. Upload: Upload your raw data to Google Storage
2. Import: Import raw data into BigQuery table
3. Query: Perform SQL queries on table
Writing Queries
• Compact subset of SQL
o SELECT ... FROM ... WHERE ... GROUP BY ... ORDER BY ... LIMIT ...;
• Common functions
o Math, String, Time, ...
• Additional statistical approximations
o TOP
o COUNT DISTINCT
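To show the clause order in a query that actually runs, the sketch below uses Python's built-in sqlite3 as a stand-in for BigQuery. The table, its columns and its rows are invented for the example, and SQLite's dialect is only assumed to match BigQuery's subset for these basic clauses.

```python
import sqlite3

# In-memory table standing in for a BigQuery table (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE revisions (title TEXT, contributor TEXT, size INTEGER)")
conn.executemany("INSERT INTO revisions VALUES (?, ?, ?)", [
    ("Brazil", "alice", 120), ("Brazil", "bob", 300), ("Brazil", "alice", 80),
    ("Google", "carol", 500), ("Google", "alice", 40), ("Python", "bob", 10),
])

# SELECT ... FROM ... WHERE ... GROUP BY ... ORDER BY ... LIMIT ...;
query = """
SELECT title, COUNT(*) AS edits
FROM revisions
WHERE size >= 40
GROUP BY title
ORDER BY edits DESC
LIMIT 2;
"""
top_titles = conn.execute(query).fetchall()
```

The filter drops the single small "Python" revision, the grouping counts edits per title, and the limit keeps the two most edited titles.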
BigQuery via REST
GET /bigquery/v1/tables/{table name}
GET /bigquery/v1/query?q={query}
Sample JSON Reply:
{
"results": {
"fields": [
{"id":"COUNT(*)","type":"uint64"}, ... ],
"rows": [
{"f":[{"v":"2949"}, ...]},
{"f":[{"v":"5387"}, ...]}, ... ]
}
}
Also supports JSON-RPC
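A sketch of the two halves of a REST query: percent-encoding the SQL into the q parameter, and turning the reply's fields and rows into dictionaries. The sample reply keeps only the entries visible on the slide (the elided "..." parts are left out), and the table name and helper names are hypothetical.

```python
import json
from urllib.parse import quote


def query_path(sql):
    """Build the GET path from the slide with the SQL percent-encoded."""
    return "/bigquery/v1/query?q=" + quote(sql, safe="")


def rows_as_dicts(reply):
    """Pair each row's values with the field ids declared in the reply."""
    ids = [field["id"] for field in reply["results"]["fields"]]
    return [dict(zip(ids, (cell["v"] for cell in row["f"])))
            for row in reply["results"]["rows"]]


sample_reply = json.loads("""
{"results": {
  "fields": [{"id": "COUNT(*)", "type": "uint64"}],
  "rows": [{"f": [{"v": "2949"}]}, {"f": [{"v": "5387"}]}]}}
""")

path = query_path("SELECT COUNT(*) FROM mytable;")
records = rows_as_dicts(sample_reply)
```

Note that the values arrive as strings even for a uint64 field, so callers would convert them according to the declared type.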
Security and Privacy
Standard Google Authentication
• Client Login
• OAuth
• AuthSub
HTTPS support
• protects your credentials
• protects your data
Relies on Google Storage to manage access
Large Data Analysis Example
Wikimedia Revision History
Wikimedia Revision history data from:
http://download.wikimedia.org/enwiki/latest/enwiki-latest-pages-meta-history.xml.7z
Using BigQuery Shell
Python DB API 2.0 + B. Clapper's sqlcmd
http://www.clapper.org/software/python/sqlcmd/
BigQuery from a Spreadsheet
Prediction API and BigQuery Demo: Tagger
• Input Data: http://delic.io.us/chanezon
o 6000 urls, 14000 tags in 6 years
• Analyze my delicious tags
o use delicious API to get all tagged urls
o clean up data, resize (100 MB limit)
o PUT data in Google Storage
o Define table
o analyze
• Predict how I would tag a technology article
o input is tag,url,text
o send new url and text
o get predicted tag
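One plausible way to get from bookmarks to tag,url,text training rows is to emit one row per tag, so a bookmark with several tags contributes several examples. This is an assumption about the demo, not its actual code, and the bookmark dictionary layout below is hypothetical.

```python
def bookmark_rows(bookmarks):
    """Yield one (tag, url, text) row for every tag on every bookmark."""
    for bookmark in bookmarks:
        for tag in bookmark["tags"]:
            yield tag, bookmark["url"], bookmark["text"]


# Invented sample data in an assumed shape.
bookmarks = [
    {"url": "http://example.com/a", "tags": ["cloud", "python"],
     "text": "App Engine tips"},
    {"url": "http://example.com/b", "tags": ["java"],
     "text": "JVM tuning notes"},
]
rows = list(bookmark_rows(bookmarks))
```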
Guessing Subreddits with Prediction API
• Nick Johnson’s blog
o http://blog.notdot.net/2010/06/Trying-out-the-new-Prediction-API
o 42,753 submissions, for a week
o 63% accuracy, to categorize new submissions
Recap
• Google Storage
o High speed data storage on Google Cloud
• Prediction API
o Google's machine learning technology able to predict outcomes based on sample data
• BigQuery
o Interactive analysis of very large data sets
o Simple SQL query language access
More information
• Google Storage for Developers
o http://code.google.com/apis/storage
• Prediction API
o http://code.google.com/apis/prediction
• BigQuery
o http://code.google.com/apis/bigquery

More Related Content

PDF
GDD Brazil 2010 - What's new in Google App Engine and Google App Engine For B...
PDF
An indepth look at Google BigQuery Architecture by Felipe Hoffa of Google
PDF
Intro to new Google cloud technologies: Google Storage, Prediction API, BigQuery
PPTX
30 days of google cloud event
PDF
Complex realtime event analytics using BigQuery @Crunch Warmup
PDF
Google Big Query UDFs
PDF
How Google Does Big Data - DevNexus 2014
PDF
Exploring BigData with Google BigQuery
GDD Brazil 2010 - What's new in Google App Engine and Google App Engine For B...
An indepth look at Google BigQuery Architecture by Felipe Hoffa of Google
Intro to new Google cloud technologies: Google Storage, Prediction API, BigQuery
30 days of google cloud event
Complex realtime event analytics using BigQuery @Crunch Warmup
Google Big Query UDFs
How Google Does Big Data - DevNexus 2014
Exploring BigData with Google BigQuery

What's hot (20)

PDF
Google BigQuery
PDF
Google Cloud Platform at Vente-Exclusive.com
PDF
Big Query Basics
PDF
Big query the first step - (MOSG)
PDF
VoxxedDays Bucharest 2017 - Powering interactive data analysis with Google Bi...
PDF
Google BigQuery for Everyday Developer
PDF
How BigQuery broke my heart
ODP
Big Data Analytics with Google BigQuery. By Javier Ramirez. All your base Co...
PDF
Big query
PDF
Crunching Data with Google BigQuery. JORDAN TIGANI at Big Data Spain 2012
PDF
Google and big query
PDF
BigQuery JavaScript User-Defined Functions by THOMAS PARK and FELIPE HOFFA at...
PDF
Google BigQuery - Features & Benefits
PDF
BigQuery for Beginners
PDF
Redshift VS BigQuery
PDF
Connecta Event: Big Query och dataanalys med Google Cloud Platform
PDF
node.js on Google Compute Engine
PPTX
Google BigQuery 101 & What’s New
PPTX
Big Data Best Practices on GCP
PDF
TDC2016SP - Trilha BigData
Google BigQuery
Google Cloud Platform at Vente-Exclusive.com
Big Query Basics
Big query the first step - (MOSG)
VoxxedDays Bucharest 2017 - Powering interactive data analysis with Google Bi...
Google BigQuery for Everyday Developer
How BigQuery broke my heart
Big Data Analytics with Google BigQuery. By Javier Ramirez. All your base Co...
Big query
Crunching Data with Google BigQuery. JORDAN TIGANI at Big Data Spain 2012
Google and big query
BigQuery JavaScript User-Defined Functions by THOMAS PARK and FELIPE HOFFA at...
Google BigQuery - Features & Benefits
BigQuery for Beginners
Redshift VS BigQuery
Connecta Event: Big Query och dataanalys med Google Cloud Platform
node.js on Google Compute Engine
Google BigQuery 101 & What’s New
Big Data Best Practices on GCP
TDC2016SP - Trilha BigData
Ad

Viewers also liked (20)

PDF
JavaScript: agora é sério
PPTX
Contruindo Aplicações móveis com o Cordova e o Visual Studio
PPTX
Curso: Desenvolvimento de aplicativos híbridos (dia 2)
PDF
Mobile Dev - Aplicativos
PDF
Google Spanner - Synchronously-Replicated, Globally-Distributed, Multi-Versio...
PPTX
HockeyApp: A plataforma para seus apps
PDF
Google_case_study_TIL_2014
PDF
Axiodis infographic es
PPT
Ecuaciones2
ZIP
Solving Problems with YUI3: AutoComplete
PPT
Windows 7 (modified to fit training)
PPTX
Curso: Desenvolvimento de aplicativos híbridos (dia 1)
PDF
Seo and the Inner Workings of Google Autocomplete
PDF
DoubleClick - Rich Media Fundamentals
PDF
#Stackoverflow útravaló haladóknak
PPTX
Press & Blogger Outreach for Link Building
PDF
Why UI developers love GraphQL
PPTX
Spanner
PDF
DoubleClick Dynamic Remarketing Case Study
PPTX
Curso de Desenvolvimento de Aplicativos Híbridos com PhoneGap/Cordova, e Ionic
JavaScript: agora é sério
Contruindo Aplicações móveis com o Cordova e o Visual Studio
Curso: Desenvolvimento de aplicativos híbridos (dia 2)
Mobile Dev - Aplicativos
Google Spanner - Synchronously-Replicated, Globally-Distributed, Multi-Versio...
HockeyApp: A plataforma para seus apps
Google_case_study_TIL_2014
Axiodis infographic es
Ecuaciones2
Solving Problems with YUI3: AutoComplete
Windows 7 (modified to fit training)
Curso: Desenvolvimento de aplicativos híbridos (dia 1)
Seo and the Inner Workings of Google Autocomplete
DoubleClick - Rich Media Fundamentals
#Stackoverflow útravaló haladóknak
Press & Blogger Outreach for Link Building
Why UI developers love GraphQL
Spanner
DoubleClick Dynamic Remarketing Case Study
Curso de Desenvolvimento de Aplicativos Híbridos com PhoneGap/Cordova, e Ionic
Ad

Similar to GDD Brazil 2010 - Google Storage, Bigquery and Prediction APIs (20)

PDF
Google Developer Day 2010 Japan: Part 1: Google App Engine for Business の概要 P...
PDF
Introduction to Google Cloud platform technologies
PDF
Building Apps on Google Cloud Technologies
PDF
Introduction to Google's Cloud Technologies
PDF
Intro to Google's Cloud Technologies
PDF
Quick Intro to Google Cloud Technologies
PDF
Building Integrated Applications on Google's Cloud Technologies
PPT
Google cloud platform
PDF
Google Cloud for Data Crunchers - Strata Conf 2011
PDF
Building Integrated Applications on Google's Cloud Technologies
PPT
Computing at scale
KEY
CloudOps evening presentation from Google
PDF
Google Cloud Technologies Overview
PDF
Google Platform Overview (April 2014)
PDF
Introduction to Google Cloud Platform Technologies
PDF
Google Technical Webinar - Building Mashups with Google Apps and SAP, using S...
PPTX
Google Developers Overview Deck 2015
PPTX
Google Cloud Platform: Prototype ->Production-> Planet scale
PDF
From zero to Google APIs: Beyond search & AI... leverage all of Google
PDF
Powerful Google developer tools for immediate impact! (2023-24 A)
Google Developer Day 2010 Japan: Part 1: Google App Engine for Business の概要 P...
Introduction to Google Cloud platform technologies
Building Apps on Google Cloud Technologies
Introduction to Google's Cloud Technologies
Intro to Google's Cloud Technologies
Quick Intro to Google Cloud Technologies
Building Integrated Applications on Google's Cloud Technologies
Google cloud platform
Google Cloud for Data Crunchers - Strata Conf 2011
Building Integrated Applications on Google's Cloud Technologies
Computing at scale
CloudOps evening presentation from Google
Google Cloud Technologies Overview
Google Platform Overview (April 2014)
Introduction to Google Cloud Platform Technologies
Google Technical Webinar - Building Mashups with Google Apps and SAP, using S...
Google Developers Overview Deck 2015
Google Cloud Platform: Prototype ->Production-> Planet scale
From zero to Google APIs: Beyond search & AI... leverage all of Google
Powerful Google developer tools for immediate impact! (2023-24 A)

More from Patrick Chanezon (20)

PPTX
KubeCon 2019 - Scaling your cluster (both ways)
PPTX
KubeCon China 2019 - Building Apps with Containers, Functions and Managed Ser...
PPTX
Dockercon 2019 Developing Apps with Containers, Functions and Cloud Services
PPTX
GIDS 2019: Developing Apps with Containers, Functions and Cloud Services
PPTX
Docker Enterprise Workshop - Intro
PPTX
Docker Enterprise Workshop - Technical
PPTX
The Tao of Docker - ITES 2018
PPTX
Moby KubeCon 2017
PPTX
Microsoft Techsummit Zurich Docker and Microsoft
PPTX
Develop and deploy Kubernetes applications with Docker - IBM Index 2018
PPTX
Docker Meetup Feb 2018 Develop and deploy Kubernetes Apps with Docker
PPTX
DockerCon EU 2017 Recap
PPTX
Docker Innovation Culture
PPTX
The Tao of Docker - Devfest Nantes 2017
PPTX
Docker 之道 Modernize Traditional Applications with 无为 Create New Cloud Native ...
PPTX
Moby Open Source Summit North America 2017
PPTX
Moby Introduction - June 2017
PPTX
Docker Cap Gemini CloudXperience 2017 - la revolution des conteneurs logiciels
PPTX
Weave User Group Talk - DockerCon 2017 Recap
PPTX
Oscon 2017: Build your own container-based system with the Moby project
KubeCon 2019 - Scaling your cluster (both ways)
KubeCon China 2019 - Building Apps with Containers, Functions and Managed Ser...
Dockercon 2019 Developing Apps with Containers, Functions and Cloud Services
GIDS 2019: Developing Apps with Containers, Functions and Cloud Services
Docker Enterprise Workshop - Intro
Docker Enterprise Workshop - Technical
The Tao of Docker - ITES 2018
Moby KubeCon 2017
Microsoft Techsummit Zurich Docker and Microsoft
Develop and deploy Kubernetes applications with Docker - IBM Index 2018
Docker Meetup Feb 2018 Develop and deploy Kubernetes Apps with Docker
DockerCon EU 2017 Recap
Docker Innovation Culture
The Tao of Docker - Devfest Nantes 2017
Docker 之道 Modernize Traditional Applications with 无为 Create New Cloud Native ...
Moby Open Source Summit North America 2017
Moby Introduction - June 2017
Docker Cap Gemini CloudXperience 2017 - la revolution des conteneurs logiciels
Weave User Group Talk - DockerCon 2017 Recap
Oscon 2017: Build your own container-based system with the Moby project

Recently uploaded (20)

PPTX
future_of_ai_comprehensive_20250822032121.pptx
PDF
5-Ways-AI-is-Revolutionizing-Telecom-Quality-Engineering.pdf
PDF
AI.gov: A Trojan Horse in the Age of Artificial Intelligence
PDF
Transform-Your-Streaming-Platform-with-AI-Driven-Quality-Engineering.pdf
PDF
Improvisation in detection of pomegranate leaf disease using transfer learni...
PDF
Flame analysis and combustion estimation using large language and vision assi...
PDF
Dell Pro Micro: Speed customer interactions, patient processing, and learning...
PDF
Lung cancer patients survival prediction using outlier detection and optimize...
PDF
Taming the Chaos: How to Turn Unstructured Data into Decisions
PDF
The influence of sentiment analysis in enhancing early warning system model f...
PPTX
MuleSoft-Compete-Deck for midddleware integrations
PDF
Accessing-Finance-in-Jordan-MENA 2024 2025.pdf
PDF
NewMind AI Weekly Chronicles – August ’25 Week IV
PDF
“A New Era of 3D Sensing: Transforming Industries and Creating Opportunities,...
PDF
CXOs-Are-you-still-doing-manual-DevOps-in-the-age-of-AI.pdf
PDF
Consumable AI The What, Why & How for Small Teams.pdf
PDF
The-Future-of-Automotive-Quality-is-Here-AI-Driven-Engineering.pdf
PDF
Comparative analysis of machine learning models for fake news detection in so...
DOCX
Basics of Cloud Computing - Cloud Ecosystem
PDF
INTERSPEECH 2025 「Recent Advances and Future Directions in Voice Conversion」
future_of_ai_comprehensive_20250822032121.pptx
5-Ways-AI-is-Revolutionizing-Telecom-Quality-Engineering.pdf
AI.gov: A Trojan Horse in the Age of Artificial Intelligence
Transform-Your-Streaming-Platform-with-AI-Driven-Quality-Engineering.pdf
Improvisation in detection of pomegranate leaf disease using transfer learni...
Flame analysis and combustion estimation using large language and vision assi...
Dell Pro Micro: Speed customer interactions, patient processing, and learning...
Lung cancer patients survival prediction using outlier detection and optimize...
Taming the Chaos: How to Turn Unstructured Data into Decisions
The influence of sentiment analysis in enhancing early warning system model f...
MuleSoft-Compete-Deck for midddleware integrations
Accessing-Finance-in-Jordan-MENA 2024 2025.pdf
NewMind AI Weekly Chronicles – August ’25 Week IV
“A New Era of 3D Sensing: Transforming Industries and Creating Opportunities,...
CXOs-Are-you-still-doing-manual-DevOps-in-the-age-of-AI.pdf
Consumable AI The What, Why & How for Small Teams.pdf
The-Future-of-Automotive-Quality-is-Here-AI-Driven-Engineering.pdf
Comparative analysis of machine learning models for fake news detection in so...
Basics of Cloud Computing - Cloud Ecosystem
INTERSPEECH 2025 「Recent Advances and Future Directions in Voice Conversion」

GDD Brazil 2010 - Google Storage, Bigquery and Prediction APIs

  • 1. Google Storage, Bigquery and Prediction APIs Patrick Chanezon, Developer Advocate, Cloud @chanezon, [email protected] Sao Paulo, October 29th 2010 Developer DayGoogle 2010 Friday, October 29, 2010
  • 2. Mobile Agenda for GDD http://bit.ly/mgddbr Developer DayGoogle 2010 Friday, October 29, 2010
  • 3. Developer DayGoogle 2010 Agenda • Google Storage for Developers • Prediction API • BigQuery Friday, October 29, 2010
  • 6. Developer DayGoogle 2010 Google Storage Prediction API BigQuery 1. Google Apps 2. Third party Apps: Google Apps Marketplace 3. ________ 5 Google App Engine IaaS PaaS SaaS Google's Cloud Offerings Friday, October 29, 2010
  • 7. Developer DayGoogle 2010 Google Storage Prediction API BigQuery Your Apps 1. Google Apps 2. Third party Apps: Google Apps Marketplace 3. ________ 5 Google App Engine IaaS PaaS SaaS Google's Cloud Offerings Friday, October 29, 2010
  • 8. Developer DayGoogle 2010 Google Storage for Developers Store your data in Google's cloud Friday, October 29, 2010
  • 9. Developer DayGoogle 2010 What Is Google Storage? • Store your data in Google's cloud o any format, any amount, any time • You control access to your data o private, shared, or public • Access via Google APIs or 3rd party tools/libraries Friday, October 29, 2010
  • 10. Developer DayGoogle 2010 Sample Use Cases Static content hosting e.g. static html, images, music, video Backup and recovery e.g. personal data, business records Sharing e.g. share data with your customers Data storage for applications e.g. used as storage backend for Android, App Engine, Cloud based apps Storage for Computation e.g. BigQuery, Prediction API Friday, October 29, 2010
  • 11. Developer DayGoogle 2010 Google Storage Benefits High Performance and Scalability Backed by Google infrastructure Strong Security and Privacy Control access to your data Easy to Use Get started fast with Google & 3rd party tools Friday, October 29, 2010
  • 12. Developer DayGoogle 2010 Google Storage Technical Details • RESTful API  o Verbs: GET, PUT, POST, HEAD, DELETE  o Resources: identified by URI o Compatible with S3  • Buckets  o Flat containers, i.e. no bucket hierarchy   • Objects  o Any type o Size: 100 GB / object • Access Control for Google Accounts  o For individuals and groups • Two Ways to Authenticate Requests  o Sign request using access keys  o Web browser login Friday, October 29, 2010
  • 13. Developer DayGoogle 2010 Performance and Scalability • Objects of any type and 100 GB / Object • Unlimited numbers of objects, 1000s of buckets • All data replicated to multiple US data centers • Leveraging Google's worldwide network for data delivery • Only you can use bucket names with your domain names • “Read-your-writes” data consistency • Range Get Friday, October 29, 2010
  • 14. Developer DayGoogle 2010 Security and Privacy Features • Key-based authentication • Authenticated downloads from a web browser • Sharing with individuals • Group sharing via Google Groups • Access control for buckets and objects • Set Read/Write/List permissions Friday, October 29, 2010
  • 15. Developer DayGoogle 2010 Tools Google Storage Manager gsutil Friday, October 29, 2010
  • 16. Developer DayGoogle 2010 Google Storage usage within Google Haiti Relief Imagery USPTO data Partner Reporting Google BigQuery Google Prediction API Partner Reporting Friday, October 29, 2010
  • 17. Developer DayGoogle 2010 Some Early Google Storage Adopters Friday, October 29, 2010
  • 18. Developer DayGoogle 2010 Google Storage - Pricing o Storage $0.17/GB/Month o Network Upload - $0.10/GB Download $0.30/GB APAC $0.15/GB Americas / EMEA o Requests PUT, POST, LIST - $0.01 / 1,000 Requests GET, HEAD - $0.01 / 10,000 Requests Friday, October 29, 2010
  • 19. Developer DayGoogle 2010 Google Storage - Availability • Limited preview in US* currently o 100GB free storage and network per account o Sign up for wait list at o http://code.google.com/apis/storage/ * Non-US preview available on case-by-case basis Friday, October 29, 2010
  • 20. Developer DayGoogle 2010 Google Storage Summary • Store any kind of data using Google's cloud infrastructure • Easy to Use APIs • Many available tools and libraries o gsutil, Google Storage Manager o 3rd party: Boto, CloudBerry, CyberDuck, JetS3t, … Friday, October 29, 2010
  • 21. Developer DayGoogle 2010 Google Prediction API Google's prediction engine in the cloud Friday, October 29, 2010
  • 22. Developer DayGoogle 2010 Introducing the Google Prediction API • Google's sophisticated machine learning technology • Available as an on-demand RESTful HTTP web service Friday, October 29, 2010
  • 23. Developer DayGoogle 2010 "english" The quick brown fox jumped over the lazy dog. "english" To err is human, but to really foul things up you need a computer. "spanish" No hay mal que por bien no venga. "spanish" La tercera es la vencida. ? To be or not to be, that is the question. ? La fe mueve montañas. 2. PREDICT The Prediction API later searches for those features during prediction. How does it work? 1. TRAIN The Prediction API finds relevant features in the sample data during training. Friday, October 29, 2010
  • 24. Developer DayGoogle 2010 Customer Sentiment Transaction Risk Species Identification Message Routing Legal Docket Classification Suspicious Activity Work Roster Assignment Recommend Products Political Bias Uplift Marketing Email Filtering Diagnostics Inappropriate Content Career Counseling Churn Prediction ... and many more ... A virtually endless number of applications... Friday, October 29, 2010
  • 25. Developer DayGoogle 2010 Automatically categorize and respond to emails by language • Customer: ACME Corp, a multinational organization • Goal: Respond to customer emails in their language • Data: Many emails, tagged with their languages • Outcome: Predict language and respond accordingly A Prediction API Example Friday, October 29, 2010
  • 26. Developer DayGoogle 2010 Using the Prediction API 1. Upload 2. Train Upload your training data to Google Storage Build a model from your data Make new predictions3. Predict A simple three step process... Friday, October 29, 2010
  • 27. Developer DayGoogle 2010 Upload your training data to Google Storage • Training data: outputs and input features • Data format: comma separated value format (CSV), result in first column "english","To err is human, but to really ..." "spanish","No hay mal que por bien no venga." ... Upload to Google Storage gsutil cp ${data} gs://yourbucket/${data} Step 1: Upload Friday, October 29, 2010
  • 28. Developer DayGoogle 2010 Create a new model by training on data To train a model: POST prediction/v1.1/training?data=mybucket%2Fmydata Training runs asynchronously. To see if it has finished: GET prediction/v1.1/training/mybucket%2Fmydata {"data":{ "data":"mybucket/mydata", "modelinfo":"estimated accuracy: 0.xx"}}} Step 2: Train Friday, October 29, 2010
  • 29. Developer DayGoogle 2010 Apply the trained model to make predictions on new data POST prediction/v1.1/query/mybucket%2Fmydata/predict { "data":{ "input": { "text" : [ "J'aime X! C'est le meilleur" ]}}} Step 3: Predict Friday, October 29, 2010
  • 30. Step 3: Predict
    Apply the trained model to make predictions on new data
      POST prediction/v1.1/query/mybucket%2Fmydata/predict
      { "data":{ "input": { "text" : [ "J'aime X! C'est le meilleur" ]}}}
    Reply:
      { "data" : { "kind" : "prediction#output", "outputLabel":"French",
        "outputMulti" :[ {"label":"French", "score": x.xx},
                         {"label":"English", "score": x.xx},
                         {"label":"Spanish", "score": x.xx}]}}
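A reply of this shape can be reduced to the winning label plus a ranked list of alternatives. The following Python 3 sketch assumes the field names shown on the slide (`outputLabel`, `outputMulti`, `label`, `score`); the numeric scores in `sample_reply` are made up for illustration, since the slide only shows `x.xx`.

```python
import json


def ranked_labels(reply_text):
    """Return the predicted label and all (label, score) pairs, best first."""
    output = json.loads(reply_text)["data"]
    ranked = sorted(output["outputMulti"], key=lambda e: e["score"], reverse=True)
    return output["outputLabel"], [(e["label"], e["score"]) for e in ranked]


# Hypothetical reply: structure from the slide, scores invented for the example.
sample_reply = json.dumps({
    "data": {
        "kind": "prediction#output",
        "outputLabel": "French",
        "outputMulti": [
            {"label": "English", "score": 0.25},
            {"label": "French", "score": 0.5},
            {"label": "Spanish", "score": 0.125},
        ],
    }
})

best, ranking = ranked_labels(sample_reply)
print(best)
print(ranking)
```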
  • 31. Step 3: Predict
    Apply the trained model to make predictions on new data
      import httplib
      # put new data in JSON format
      params = { ... }
      header = {"Content-Type" : "application/json"}
      conn = httplib.HTTPConnection("www.googleapis.com")
      conn.request("POST", "/prediction/v1.1/query/mybucket%2Fmydata/predict", params, header)
      # read() returns the reply body; printing the response object alone would not
      print conn.getresponse().read()
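The slide's snippet targets Python 2 (`httplib`, `print` statement). A rough Python 3 equivalent might look like the sketch below. Several things here are assumptions rather than facts from the talk: the use of `HTTPSConnection`, the omission of any authentication header, and the helper names `build_predict_request` and `predict`. Only the host, path, and JSON body layout come from the slides.

```python
import http.client
import json
from urllib.parse import quote

HOST = "www.googleapis.com"  # host as given on the slide


def build_predict_request(bucket, obj, texts):
    """Build the path, JSON body, and headers for a predict call."""
    path = "/prediction/v1.1/query/" + quote(f"{bucket}/{obj}", safe="") + "/predict"
    # json.dumps handles quoting and non-ASCII text in the inputs.
    body = json.dumps({"data": {"input": {"text": texts}}})
    headers = {"Content-Type": "application/json"}
    return path, body, headers


def predict(bucket, obj, texts):
    """Send the request and return (status, reply text). Not called here."""
    path, body, headers = build_predict_request(bucket, obj, texts)
    conn = http.client.HTTPSConnection(HOST)  # HTTPS is an assumption
    try:
        conn.request("POST", path, body, headers)
        response = conn.getresponse()
        return response.status, response.read().decode("utf-8")
    finally:
        conn.close()


path, body, headers = build_predict_request(
    "mybucket", "mydata", ["J'aime X! C'est le meilleur"]
)
print(path)
print(body)
```

Separating request construction from sending keeps the part that can be checked offline apart from the network call.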
  • 32. Prediction API Capabilities
    Data
    • Input Features: numeric or unstructured text
    • Output: up to hundreds of discrete categories
    Training
    • Many machine learning techniques
    • Automatically selected
    • Performed asynchronously
    Access from many platforms:
    • Web app from Google App Engine
    • Apps Script (e.g. from Google Spreadsheet)
    • Desktop app
  • 33. Prediction API v1.1 - new features
    • Updated Syntax
    • Multi-category prediction
      o Tag entry with multiple labels
    • Continuous Output
      o Finer grained prediction rankings based on multiple labels
    • Mixed Inputs
      o Both numeric and text inputs are now supported
    Can combine continuous output with mixed inputs
  • 34. Google BigQuery
    Interactive analysis of large datasets in Google's cloud
  • 35. Introducing Google BigQuery
    – Google's large data ad hoc analysis technology
      • Analyze massive amounts of data in seconds
    – Simple SQL-like query language
    – Flexible access
      • REST APIs, JSON-RPC, Google Apps Script
  • 36. Why BigQuery?
    Working with large data is a challenge
  • 38. Key Capabilities of BigQuery
    • Scalable: Billions of rows
    • Fast: Response in seconds
    • Simple: Queries in SQL
    • Web Service
      o REST
      o JSON-RPC
      o Google Apps Script
  • 39. Using BigQuery
    Another simple three step process...
    1. Upload: Upload your raw data to Google Storage
    2. Import: Import raw data into BigQuery table
    3. Query: Perform SQL queries on table
  • 40. Writing Queries
    Compact subset of SQL
      o SELECT ... FROM ... WHERE ... GROUP BY ... ORDER BY ... LIMIT ...;
    Common functions
      o Math, String, Time, ...
    Additional statistical approximations
      o TOP
      o COUNT DISTINCT
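The clause order listed above can be illustrated with a small query builder. This Python 3 sketch only assembles a string in the SELECT ... FROM ... WHERE ... GROUP BY ... ORDER BY ... LIMIT order from the slide; the table name `mytable` and the columns `title` and `length` are hypothetical, and the exact dialect accepted by the service is not specified in the talk.

```python
def build_query(select, table, where=None, group_by=None, order_by=None, limit=None):
    """Assemble a query string using the clause order shown on the slide."""
    parts = [f"SELECT {', '.join(select)}", f"FROM {table}"]
    if where:
        parts.append(f"WHERE {where}")
    if group_by:
        parts.append(f"GROUP BY {group_by}")
    if order_by:
        parts.append(f"ORDER BY {order_by}")
    if limit is not None:
        parts.append(f"LIMIT {limit}")
    return " ".join(parts) + ";"


# Hypothetical table and column names, for illustration only.
query = build_query(
    ["title", "COUNT(*)"],
    "mytable",
    where="length > 1000",
    group_by="title",
    order_by="title",
    limit=10,
)
print(query)
```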
  • 41. BigQuery via REST
      GET /bigquery/v1/tables/{table name}
      GET /bigquery/v1/query?q={query}
    Sample JSON Reply:
      { "results": {
          "fields": { [ {"id":"COUNT(*)","type":"uint64"}, ... ] },
          "rows": [ {"f":[{"v":"2949"}, ...]}, {"f":[{"v":"5387"}, ...]}, ... ] } }
    Also supports JSON-RPC
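Two practical details follow from the REST shape above: the query text has to be URL-encoded before it goes into `q=`, and each result row arrives as a list of `{"v": ...}` cells under `"f"`. The Python 3 sketch below covers both without touching the network. It assumes `"fields"` is a plain JSON list, since the slide's `{ [ ... ] }` form is not valid JSON; the table name and the helper names are hypothetical.

```python
import json
from urllib.parse import quote


def query_path(sql):
    """Build the GET path for a query, URL-encoding the SQL text."""
    return "/bigquery/v1/query?q=" + quote(sql, safe="")


def rows_as_lists(reply_text):
    """Flatten each row's {"f": [{"v": ...}, ...]} cells into a list of values."""
    reply = json.loads(reply_text)
    return [[cell["v"] for cell in row["f"]] for row in reply["results"]["rows"]]


# Reply modelled on the slide's sample, with "fields" written as a list.
sample_reply = json.dumps({
    "results": {
        "fields": [{"id": "COUNT(*)", "type": "uint64"}],
        "rows": [{"f": [{"v": "2949"}]}, {"f": [{"v": "5387"}]}],
    }
})

print(query_path("SELECT COUNT(*) FROM mytable;"))
print(rows_as_lists(sample_reply))
```

Values come back as strings in the sample reply, so numeric columns would still need converting on the client side.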
  • 42. Security and Privacy
    Standard Google Authentication
    • Client Login
    • OAuth
    • AuthSub
    HTTPS support
    • protects your credentials
    • protects your data
    Relies on Google Storage to manage access
  • 43. Large Data Analysis Example: Wikimedia Revision History
    Wikimedia Revision history data from: http://download.wikimedia.org/enwiki/latest/enwiki-latest-pages-meta-history.xml.7z
  • 45. Using BigQuery Shell
    Python DB API 2.0 + B. Clapper's sqlcmd
    http://www.clapper.org/software/python/sqlcmd/
  • 46. BigQuery from a Spreadsheet
  • 48. Prediction API and BigQuery Demo: Tagger
    Input Data: http://delic.io.us/chanezon
    – 6000 urls, 14000 tags in 6 years
    Analyze my delicious tags
    – use delicious API to get all tagged urls
    – clean up data, resize (100Mb limit)
    – PUT data in Google Storage
    – Define table
    – analyze
    Predict how I would tag a technology article
    – input is tag,url,text
    – send new url and text
    – get predicted tag
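The "resize (100Mb limit)" step can be sketched as trimming the tag,url,text rows until their CSV encoding fits a byte budget. This is one possible approach in Python 3, not necessarily what the demo did. The interpretation of "100Mb" as 100 * 1024 * 1024 bytes is an assumption, and the function name and sample rows are hypothetical.

```python
import csv
import io

# Assumption: the slide's "100Mb limit" is read as 100 mebibytes.
SIZE_LIMIT_BYTES = 100 * 1024 * 1024


def rows_within_limit(rows, max_bytes=SIZE_LIMIT_BYTES):
    """Keep leading (tag, url, text) rows whose CSV encoding fits in max_bytes."""
    kept, size = [], 0
    for row in rows:
        buffer = io.StringIO()
        csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n").writerow(row)
        row_bytes = len(buffer.getvalue().encode("utf-8"))
        if size + row_bytes > max_bytes:
            break  # stop at the first row that would exceed the budget
        kept.append(row)
        size += row_bytes
    return kept, size


# Hypothetical rows in the tag,url,text layout from the slide.
rows = [
    ("python", "http://example.com/a", "text a"),
    ("python", "http://example.com/b", "text b"),
    ("python", "http://example.com/c", "text c"),
]

kept, size = rows_within_limit(rows, max_bytes=100)
print(len(kept), size)
```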
  • 49. Guessing Subreddits with Prediction API
    Nick Johnson's blog
    – http://blog.notdot.net/2010/06/Trying-out-the-new-Prediction-API
    – 42,753 submissions, for a week
    – 63% accuracy, to categorize new submissions
  • 50. Recap
    • Google Storage
      o High speed data storage on Google Cloud
    • Prediction API
      o Google's machine learning technology able to predict outcomes based on sample data
    • BigQuery
      o Interactive analysis of very large data sets
      o Simple SQL query language access
  • 51. More information
    • Google Storage for Developers
      o http://code.google.com/apis/storage
    • Prediction API
      o http://code.google.com/apis/prediction
    • BigQuery
      o http://code.google.com/apis/bigquery
  • 52. Mobile Agenda for GDD http://bit.ly/mgddbr