Introduction to Google's
Big Data Solutions
Ismael Yuste
Strategic Cloud Engineer, Google Cloud
MSMK Madrid, September 21, 2017
Agenda
● Google Cloud Platform
● Big Data
● Machine Learning
● Use Cases
Google Cloud Platform
Google Data Centers
Google's data centers are the foundation of the entire Google Cloud platform. They provide compute power, storage, memory, and GPUs for our applications. They also host the core of applications such as Gmail, YouTube, and Search.
● Speed
● Low latency
● Operational efficiency
● Energy efficiency
● Use of renewable energy
● Proximity to the user
● Information security
Google Datacenters - Cloud Regions
Big Data
End-to-end Big Data solutions that let you capture, process, and store data on a single integrated platform. They combine cloud-native services with managed open source tools, for both real-time and batch workloads.
Big Data
BigQuery
Cloud Dataflow
Cloud Dataproc
Cloud Datalab
Cloud Pub/Sub
Genomics
Big Data - BigQuery
Your enterprise data warehouse: fast, economical, and fully managed, for analyzing large datasets
● Flexible data ingestion.
● Global availability.
● Built-in security and permissions.
● Cost control.
● Highly available.
● Fully integrated.
● Connects with other Google products.
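As a rough illustration of the kind of analytical query a data warehouse is built for, the sketch below runs a SQL aggregation. It uses Python's built-in `sqlite3` module as a local stand-in, not BigQuery itself; the `sales` table, its columns, and the sample rows are made up for the example.

```python
import sqlite3

# Local stand-in for a warehouse table; BigQuery itself is not used here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (country TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (country, amount) VALUES (?, ?)",
    [("ES", 120.0), ("ES", 80.0), ("FR", 50.0), ("DE", 200.0)],
)

# Typical analytical query: scan the table and aggregate by one dimension.
query = """
    SELECT country, COUNT(*) AS orders, SUM(amount) AS revenue
    FROM sales
    GROUP BY country
    ORDER BY revenue DESC, country
"""
rows = conn.execute(query).fetchall()
for country, orders, revenue in rows:
    print(f"{country}: {orders} orders, {revenue:.2f} revenue")
```

In a managed warehouse the same SQL shape applies, but storage, scaling, and permissions are handled by the service rather than by the caller.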
Big Data - Cloud Dataflow
Fully managed service and programming model for Big Data processing
● Integrated resource management.
● On demand.
● Intelligent job execution.
● Autoscaling.
● Unified programming model.
● Open source.
● Monitoring.
● Integration.
● Reliable and consistent processing.
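The idea behind a unified programming model can be sketched in plain Python: the same transform runs over a bounded list (batch) and over a generator that yields events one at a time (streaming). This is not the Dataflow or Apache Beam API; `count_per_window`, the event tuples, and the 10-second window are assumptions made for illustration, and events are assumed to arrive ordered by timestamp.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple

Event = Tuple[int, str]  # (timestamp in seconds, key)


def count_per_window(
    events: Iterable[Event], window_size: int
) -> Iterator[Tuple[int, Counter]]:
    """Count keys per fixed window, emitting a window once a later one starts.

    Simplification: events must arrive ordered by timestamp.
    """
    current_start = None
    counts: Counter = Counter()
    for timestamp, key in events:
        start = (timestamp // window_size) * window_size
        if current_start is not None and start != current_start:
            yield current_start, counts
            counts = Counter()
        current_start = start
        counts[key] += 1
    if current_start is not None:
        yield current_start, counts  # flush the last open window


# Batch: a bounded, in-memory collection.
batch = [(0, "a"), (5, "b"), (12, "a"), (14, "a"), (21, "b")]
batch_result = list(count_per_window(batch, window_size=10))


# Streaming: the same events delivered one by one from a generator.
def stream() -> Iterator[Event]:
    for event in batch:
        yield event


stream_result = list(count_per_window(stream(), window_size=10))
print(batch_result == stream_result)
```

The transform never asks whether its input is finite, which is the property the "unified" bullet refers to; a real service also has to deal with late and out-of-order data, which this sketch ignores.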
Big Data - Cloud Dataproc
Managed Spark and Hadoop service
● Integrated cluster management.
● Resizable clusters.
● Integration.
● Versioning.
● Management tools.
● Initialization actions.
● Manual or automatic management.
● Flexible virtual machines.
Big Data
Datalab. Tool for exploring, analyzing, and visualizing Big Data.
Pub/Sub. Global real-time service for messaging and data streaming.
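The publish/subscribe pattern behind a messaging service can be sketched with a small in-memory broker. `ToyBroker` and its methods are hypothetical and are not the Cloud Pub/Sub client library; the sketch only shows that publishers write to a topic and each subscription receives its own copy of every message.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List


class ToyBroker:
    """In-memory illustration of topics and subscriptions (hypothetical class)."""

    def __init__(self) -> None:
        # topic name -> names of the subscriptions attached to it
        self._subscriptions: Dict[str, List[str]] = defaultdict(list)
        # subscription name -> queue of messages not yet pulled
        self._queues: Dict[str, Deque[str]] = {}

    def subscribe(self, topic: str, subscription: str) -> None:
        self._subscriptions[topic].append(subscription)
        self._queues[subscription] = deque()

    def publish(self, topic: str, message: str) -> None:
        # Every subscription on the topic gets its own copy of the message.
        for subscription in self._subscriptions[topic]:
            self._queues[subscription].append(message)

    def pull(self, subscription: str) -> List[str]:
        # Return all pending messages and remove them from the queue.
        queue = self._queues[subscription]
        messages = list(queue)
        queue.clear()
        return messages


broker = ToyBroker()
broker.subscribe("clicks", "analytics")
broker.subscribe("clicks", "fraud-check")
broker.publish("clicks", "user=1 page=/home")
broker.publish("clicks", "user=2 page=/cart")

analytics_messages = broker.pull("analytics")
fraud_messages = broker.pull("fraud-check")
print(analytics_messages)
print(fraud_messages)
```

Because publishers only know the topic name, new consumers can be added without changing the producing application.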
Big Data
Dataprep. Intelligent data service for exploring, cleaning, and preparing structured or unstructured data for later analysis.
Data Studio. Turns your data into reports and dashboards that are easy to create, easy to share, and fully customizable, from data sources such as BigQuery, Analytics, or YouTube.
Data Lifecycle Steps
Ingest
The first stage is to pull in the raw data, such as streaming data from devices, on-premises batch data, application logs, or mobile-app user events and analytics.
Store
After the data has been retrieved, it needs to be stored in a format that is durable and can be easily accessed.
Process & Analyze
In this stage, the data is transformed from raw form into actionable information.
Explore & Visualize
The final stage is to convert the results of the analysis into a format that is easy to draw insights from and to share with colleagues and peers.
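The four stages can be sketched end to end in a few lines of plain Python. Everything here is illustrative: the JSON log lines, the `service` and `latency_ms` fields, and the use of a Python list as the "durable store" are assumptions, and no Google Cloud service is involved.

```python
import json
from collections import defaultdict
from typing import Dict, List


def ingest(raw_lines: List[str]) -> List[dict]:
    """Ingest: parse raw JSON log lines into records."""
    return [json.loads(line) for line in raw_lines]


def store(records: List[dict], storage: List[dict]) -> None:
    """Store: append records to the store (a plain list stands in for it)."""
    storage.extend(records)


def process(storage: List[dict]) -> Dict[str, float]:
    """Process & analyze: compute the average latency per service."""
    totals = defaultdict(lambda: [0.0, 0])  # service -> [sum, count]
    for record in storage:
        entry = totals[record["service"]]
        entry[0] += record["latency_ms"]
        entry[1] += 1
    return {service: total / count for service, (total, count) in totals.items()}


def visualize(averages: Dict[str, float]) -> str:
    """Explore & visualize: render a text bar chart, one '#' per 10 ms."""
    lines = []
    for service, avg in sorted(averages.items()):
        lines.append(f"{service:<8} {'#' * round(avg / 10)} {avg:.1f} ms")
    return "\n".join(lines)


raw = [
    '{"service": "search", "latency_ms": 40}',
    '{"service": "search", "latency_ms": 60}',
    '{"service": "mail", "latency_ms": 120}',
]
storage: List[dict] = []
store(ingest(raw), storage)
averages = process(storage)
report = visualize(averages)
print(report)
```

Keeping each stage behind its own function mirrors the lifecycle: any stage can be swapped for a managed service without touching the others.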
© 2017 Google Inc. All rights reserved.
Products to Support Data Lifecycle
Ingestion: Cloud Pub/Sub, Stackdriver Logging, Cloud Transfer Service
Storage: Cloud Storage, Cloud SQL, Cloud Datastore, Cloud Bigtable, BigQuery, Cloud Spanner
Process & Analyze: Cloud Dataflow, Cloud Dataproc, BigQuery
Explore & Visualize: Cloud Console, Google Data Studio, Google Sheets, Cloud Datalab, BI/Analytics Partners
Typical Big Data Jobs
Programming
Resource provisioning
Performance tuning
Monitoring
Reliability
Deployment & configuration
Handling growing scale
Utilization improvements
Big Data with Google
Focus on insights.
Not infrastructure.
From batch to real-time.
Programming
Understanding
Data & Analytics
Cloud Dataproc
Fully managed Hadoop and Spark with industry-leading performance
BigQuery
Fully managed data warehouse for large-scale analytics
Cloud Dataflow
Real-time data pipelines, with open source SDK via Apache Beam
Separation of Storage and Compute
● Access any storage system from any processing tool
● Keep as much data as you want, economically
● Share data in place, no more FTP and copying
Storage: BigQuery Storage (tables), Cloud Bigtable (NoSQL), Cloud Storage (files)
Processing: BigQuery Analytics, Cloud Dataproc, Cloud Dataflow
10+ years of Big Data innovation - Open Source
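Timeline figure with rows for Google Papers, Open Source, and Google Cloud Products (years marked: 2002, 2004, 2005, 2006, 2008, 2010, 2012, 2014, 2015)
Names shown on the timeline: GFS, MapReduce, BigTable, Dremel, Flume Java, Millwheel, PubSub, Dataflow, Tensorflow, Apache Beam (Incubating)
Google Cloud Products: BigQuery, Pub/Sub, Dataflow, Bigtable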
Product Mapping
BigQuery
Cloud Dataflow
Cloud Dataproc
Cloud Datalab
Cloud Pub/Sub
Machine Learning
Google Cloud ML Platform provides modern machine learning services, with pre-trained models and a service for building your own models.
Machine Learning
Cloud Machine Learning
Vision API
Speech API
Natural Language API
Translation API
Jobs API
Machine Learning - Cloud ML
Machine learning on any type and volume of data
● Prediction at scale.
● Simple model building.
● Deep learning capabilities.
● Integration.
● HyperTune.
● Managed and scalable service.
● Portable models.
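HyperTune refers to hyperparameter tuning. The slides do not describe how the service searches, so the sketch below swaps in plain random search to show the general idea: try several parameter combinations and keep the one with the lowest loss. `random_search`, `toy_loss`, and the parameter ranges are hypothetical, and no model is actually trained.

```python
import random
from typing import Callable, Dict, Tuple

Params = Dict[str, float]


def random_search(
    objective: Callable[[Params], float],
    space: Dict[str, Tuple[float, float]],
    trials: int,
    seed: int = 0,
) -> Tuple[Params, float]:
    """Sample parameters uniformly from `space` and keep the lowest-loss trial."""
    rng = random.Random(seed)
    best_params: Params = {}
    best_score = float("inf")
    for _ in range(trials):
        params = {name: rng.uniform(low, high) for name, (low, high) in space.items()}
        score = objective(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_loss(params: Params) -> float:
    # Hypothetical stand-in for a validation loss; its minimum (0.0) is at
    # learning_rate = 0.1 and dropout = 0.3.
    return (params["learning_rate"] - 0.1) ** 2 + (params["dropout"] - 0.3) ** 2


space = {"learning_rate": (0.0, 1.0), "dropout": (0.0, 0.9)}
best_params, best_score = random_search(toy_loss, space, trials=200)
print(best_params, best_score)
```

A managed tuning service runs the trials as real training jobs in parallel, which is where the cost of searching by hand usually lies.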
Machine Learning - APIs
Vision API. Analyze images with the power of Google.
Speech API. Convert speech to text with the power of the cloud.
Machine Learning - APIs
Natural Language API. Derive insights from unstructured text with Cloud ML.
Translation API. Translate on the fly between thousands of language pairs.
Machine Learning - APIs
Jobs API. Power your job portal with Cloud ML.
Cloud Video Intelligence API. Analyze and extract information from your videos.
References for staying up to date
Google Cloud Platform Blog
Google Cloud Platform Web
GCP Twitter
Google+ GCP Community
GCP Podcast
Google Cloud Platform YouTube Channel
Use cases
When art meets big data: Analyzing 200,000 items from The Met collection in BigQuery
Today we’re adding a new public dataset to Google BigQuery: over 200,000 items from The Metropolitan Museum of Art (aka “The Met”), representing all its public domain art from a total of 1.5 million art objects. The Met Museum Public Domain dataset includes metadata about each piece of art, along with an image or images of the artifact. Google and The Met Museum have been close collaborators for years through Google Arts & Culture and we’re incredibly excited to bring the museum's public dataset to BigQuery.
Use cases
Traveloka’s journey to stream analytics on Google Cloud Platform
Traveloka is a travel technology company based in Jakarta, Indonesia, currently operating in six countries. Founded in 2012 by former Silicon Valley engineers, its goal is to revolutionize human mobility.
One of the most strategic parts of our business is a streaming data processing pipeline that powers a number of use cases, including fraud detection, personalization, ads optimization, cross selling, A/B testing, and promotion eligibility. That pipeline is also used by our business analysts for monitoring and understanding business metrics, both for historical analysis and in real time.
Use cases
Getting Your Feet Wet in the Data Lake: Analytics 360 in BigQuery
Benefits for Data Engineers, Analysts and Marketers
As a Big Data platform, BigQuery offers benefits for multiple stages and roles in the Big Data process:
For marketers and analysts, you can run ad hoc queries and get the results within minutes or seconds. The elusive quest for understanding online and offline attribution, user funnels, and long-term customer value comes within reach.
For data engineers, BigQuery offers a tremendous operational benefit, as outlined in the next section.
Use cases
How WePay uses stream analytics for real-time fraud detection using GCP and Apache Kafka
When payments platform WePay was founded in 2008, MySQL was our only backend storage. It served its purpose well when data volume and traffic throughput were relatively low, but by 2016, our business was growing rapidly and they were growing along with it. Consequently, we started to see performance degradation to the point where we could no longer run concurrent queries without a negative impact on latency.
Clearly, we needed a new stream analytics pipeline for fraud detection that would give us answers to queries in near-real time without affecting our main transactional business system. In this post, I’ll explain how we built and deployed such a pipeline to production using Apache Kafka and Google Cloud Platform (GCP) services like Google Cloud Dataflow and Cloud Bigtable.
Questions?
Ismael Yuste
linkedin.com/in/ismaelyuste/
@IsmaelYuste

Modern Thinking área digital MSKM 21/09/2017
