Big Data at Globant
Success Cases in AWS
Sabina A. Schneider
What is Big Data?
What is Data Science?
Data Architecture
• Enterprise Information Strategy
• High Availability and Performance
• NoSQL Distributed Solutions
• Mission Critical

• Product Positioning in the Market
• Deeper insight about your Customers
• Analytics and Alerts on KPIs
• Cross-reference data with different sources
Core Technologies
BigData Ecosystem
Scalable Architecture in the Cloud
[Architecture diagram; labels recovered from the slide]
• Clients: Mobile Devices in the cars, Mobile Devices, Web Client
• Front end: Elastic Load Balancer, Web App instances (auto scaling), Third Party Integration
• BigData – storage and processing: NoSQL DB, S3 Bucket, Cloudfront, EMR Cluster, Hadoop, Pig, Storm (Real Time processing)
• Outputs: Trends, Analytics Dashboard, Web Client
Metamarkets has developed a web-based analytics console that supports drill-downs and roll-ups of high-dimensional data sets (real-time bidding), comprising billions of events, in real time.

The data store collects 10 GB of information every day and holds over 15 TB.

Reports use Hadoop and Hive on AWS infrastructure.

The 40-instance cluster can scan, filter, and aggregate 1 billion rows in 950 milliseconds.
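
As a back-of-the-envelope check on the figures above, the sketch below derives the implied scan throughput and the number of days of ingest that the stated store size corresponds to. It assumes decimal units (1 TB = 1000 GB), work spread evenly across instances, and a constant ingest rate; none of these assumptions is stated on the slide.

```python
# Figures quoted on the slide.
ROWS_SCANNED = 1_000_000_000   # 1 billion rows
SCAN_SECONDS = 0.950           # 950 milliseconds
INSTANCES = 40
DAILY_INGEST_GB = 10
STORE_SIZE_TB = 15


def rows_per_second(rows: int, seconds: float) -> float:
    """Aggregate scan rate of the whole cluster."""
    return rows / seconds


def per_instance_rate(total_rate: float, instances: int) -> float:
    """Scan rate per instance, assuming an even split of the work."""
    return total_rate / instances


def days_to_accumulate(store_tb: float, daily_gb: float, gb_per_tb: int = 1000) -> float:
    """Days of ingest needed to reach the store size at a constant daily rate."""
    return store_tb * gb_per_tb / daily_gb


cluster_rate = rows_per_second(ROWS_SCANNED, SCAN_SECONDS)
instance_rate = per_instance_rate(cluster_rate, INSTANCES)
days = days_to_accumulate(STORE_SIZE_TB, DAILY_INGEST_GB)

print(f"cluster:  {cluster_rate:,.0f} rows/s")
print(f"instance: {instance_rate:,.0f} rows/s")
print(f"ingest:   {days:,.0f} days at {DAILY_INGEST_GB} GB/day")
```

Under these assumptions the cluster scans a little over one billion rows per second, roughly 26 million rows per second per instance, and 15 TB corresponds to about 1,500 days of ingest at 10 GB per day.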
Gree is a leading casual game development company. Globant developed a Hadoop-based architecture to store gaming events and generate telemetry information. These metrics are used to analyze and segment gamer profiles, estimate revenue, and perform predictive analysis on game performance.
Product Positioning in the Market
• Collection of tweets on specific events (e.g., elections), integrated with a set of MapReduce-based queries
• Data stored in a 20-node Hadoop cluster
• Google Visualization tools for a widget-based dashboard
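
To illustrate what a MapReduce-based query over collected tweets might look like, here is a minimal in-memory sketch that counts hashtags with explicit map, shuffle, and reduce steps. It is not the project's code and does not use the Hadoop API; the function names and the sample tweets are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_tweet(tweet: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit (hashtag, 1) for every hashtag in one tweet."""
    for token in tweet.lower().split():
        tag = token.rstrip(".,!?:;")  # drop trailing punctuation
        if tag.startswith("#") and len(tag) > 1:
            yield tag, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: sum the counts for one hashtag."""
    return key, sum(values)


def count_hashtags(tweets: Iterable[str]) -> Dict[str, int]:
    pairs = (pair for tweet in tweets for pair in map_tweet(tweet))
    grouped = shuffle(pairs)
    return dict(reduce_counts(key, values) for key, values in grouped.items())


# Hypothetical sample data, for illustration only.
SAMPLE_TWEETS = [
    "Polls open at 8am #elections",
    "Long queues downtown #Elections #vote",
    "Results expected tonight #elections!",
]

counts = count_hashtags(SAMPLE_TWEETS)
print(counts)
```

On a real cluster the map and reduce functions would run in parallel across nodes and the framework would handle the shuffle; the single-process version only shows the shape of the computation.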
What?
• Innovation for the financial market
• Sentiment analytics on what’s happening now and what can happen next in the market
• Predictions one week in advance, based on comments on Twitter


Challenges
• Aggressive real-time analysis on social networks
• Dashboards comparing against real values from Yahoo Finance
• Sentiment analysis and language filtering
• Analytics predictions
Data Science
• Recommendation
• Classification
• Clustering
• Sophisticated Mathematical algorithm
• Statistical Algorithm

• Predictions on KPIs
• Predictions on Metrics
MoneyGram Transaction Scoring
Analysis of MoneyGram historical transactional data labeled as Fraudulent/Non-Fraudulent

     • 8 years of transactional data to analyze

Training Support Vector Machines on historical data

     • Classification achieved using only a subset of the data, with soft margins (slack
     variables) to construct the dividing hyperplane
     • Possible use of kernel principal components to preprocess the data and reduce the
     dimensionality of the training dataset
     • Avoids high computation times (sparse solution)

Benefits
    • Detect fraudulent transactions with a higher level of accuracy
    • Increase in customer service satisfaction (fewer false positives)
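
The soft-margin idea above can be sketched in a few lines. The example below trains a linear SVM by batch subgradient descent on the regularized hinge loss, where each point's hinge term max(0, 1 - y(w·x + b)) plays the role of its slack variable. This is a swapped-in training method chosen for brevity, not the approach used in the project, and it omits kernel principal components entirely. The two standardized features, the toy data, and the hyperparameters are all hypothetical.

```python
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def train_soft_margin_svm(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    lam: float = 0.01,   # regularization strength (assumed value)
    lr: float = 0.1,     # learning rate (assumed value)
    epochs: int = 500,
) -> Tuple[List[float], float]:
    """Minimize (lam/2)*||w||^2 + mean(max(0, 1 - y*(w.x + b)))."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Only points inside or beyond the margin (nonzero slack)
            # contribute to the hinge-loss subgradient.
            if yi * (dot(w, xi) + b) < 1:
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Return +1 (flag as fraudulent) or -1 (non-fraudulent)."""
    return 1 if dot(w, x) + b >= 0 else -1


# Hypothetical standardized features per transaction, e.g. (amount, frequency).
X_TRAIN = [(-2.0, -1.0), (-1.0, -2.0), (-1.5, -1.5),
           (2.0, 1.0), (1.0, 2.0), (1.5, 1.5)]
Y_TRAIN = [-1, -1, -1, 1, 1, 1]   # -1 = non-fraudulent, +1 = fraudulent

w, b = train_soft_margin_svm(X_TRAIN, Y_TRAIN)
predictions = [predict(w, b, x) for x in X_TRAIN]
print(predictions)
```

Points that end up well outside the margin stop contributing to the gradient, which is the intuition behind the sparse solution mentioned above: only the points near the boundary determine the hyperplane.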
Shopping cart suggestion engine
Generate suggestions based on client shopping history

• Cluster a large dataset representing clients' shopping history using
unsupervised learning algorithms.

• Use information from a new or existing client to classify that client into the
clustered shopping history of ALL clients.

• Generate suggestions based on the cluster's shopping preferences.

• Use of Hadoop and Mahout for clustering and subsequent classification.
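
The three steps above (cluster, classify, suggest) can be sketched with a small in-memory k-means. This is an illustration only: it does not use Hadoop or Mahout, the item names and purchase counts are hypothetical, and the deterministic choice of initial centroids is an assumption made to keep the example reproducible.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(centroids: Sequence[Vector], point: Vector) -> int:
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(centroids[i], point))


def kmeans(points: Sequence[Vector], k: int, max_iter: int = 100) -> List[List[float]]:
    """Plain k-means; initial centroids are evenly spaced input points."""
    step = max(1, len(points) // k)
    centroids = [list(points[i]) for i in range(0, len(points), step)][:k]
    for _ in range(max_iter):
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(centroids, p)].append(p)
        updated = [
            [sum(col) / len(members) for col in zip(*members)] if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if updated == centroids:   # assignments no longer change the centroids
            break
        centroids = updated
    return centroids


def suggest(centroids: Sequence[Vector], items: Sequence[str],
            basket: Vector, top_n: int = 1) -> List[str]:
    """Classify the client into a cluster, then suggest the cluster's
    most-purchased items that the client has not bought yet."""
    centroid = centroids[nearest(centroids, basket)]
    candidates = [i for i, qty in enumerate(basket) if qty == 0]
    ranked = sorted(candidates, key=lambda i: centroid[i], reverse=True)
    return [items[i] for i in ranked[:top_n]]


# Hypothetical purchase counts per client, one column per item.
ITEMS = ["diapers", "wipes", "beer", "chips"]
HISTORY = [
    [5, 4, 0, 0], [4, 5, 0, 1], [6, 3, 1, 0],
    [0, 0, 5, 4], [1, 0, 4, 5], [0, 1, 6, 3],
]

centroids = kmeans(HISTORY, k=2)
print(suggest(centroids, ITEMS, [3, 0, 0, 0]))
print(suggest(centroids, ITEMS, [0, 0, 2, 0]))
```

A client who has only bought diapers lands in the first cluster and is offered wipes; a client who has only bought beer lands in the second and is offered chips. At the scale described above the clustering would run as a distributed job rather than in a single process.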
Metadata word clustering using Solr

•   Content management and information sorting/categorization, classified by location.
    Enhances performance at the view level.

•   Indexing of jwt content coming from different sources (internal and external), developed
    with Solr on Lucene. Integration with myJwt.com, an internal social network.

      •   Organize the content storage: a service running in the Cloud that receives content,
          generates different assets (snapshots, thumbnails), and extracts metadata to be
          centralized in one place
      •   myIdeas: collect ideas from creative designers in different locations
          and share a bonus among the brightest ideas
Data Visualization
Our data visualization practice allows our customers to understand the evolution of key business drivers and trends, and to drill down into the root causes of deviations.

Our HTML5 data visualization solution allows us to combine the flexibility of a custom-made solution with a fast time to market. It’s based on standard widgets, allowing each user to customize the dashboard as required and view it on any device.
Big Data Visualization Framework
[Diagram: Cloud server and Browser, connected by User input and Video streaming]
Globant and Big Data on AWS
Kantar Media manages the TV advertisements displayed on DirecTV US.
We developed the addressable advertisement reporting solution, used by advertisers to plan and analyze the performance of addressable advertisements.
Advertisements displayed on TV are customized to each user profile. The solution obtains reliable measurements from TV, analyzes the structure of the audience that has watched each advertisement, and allows the ROI of the marketing campaign to be evaluated.
Globant and Big Data on AWS
Touch-screen-based scorecard, used by top management to analyze and compare results from different countries and products.
Thank you!
