Framework for Real time Analytics
By Mohsin Hakim
Real Time Analytics
Index
Introduction
Evolving BI and Analytics for Big Data
Impacts to Traditional BI Databases
Challenges
MongoDB with Hadoop
Case Studies
Current Scenario
Introduction
 Analytics falls along a spectrum. On one end of the spectrum sit batch analytical applications, which are
used for complex, long-running analyses. They tend to have slower response times (minutes, hours, or even
days) and lower availability requirements. Examples of batch analytics include Hadoop-based workloads.
 On the other end of the spectrum sit real-time analytical applications, which provide lighter-weight
analytics very quickly. Latency is low (sub-second) and availability requirements are high (e.g., 99.99%).
MongoDB is typically used for real-time analytics.
Business Intelligence (BI) and analytics provide an essential set of technologies and processes
that organizations have relied upon for many years to guide strategic business decisions.
Introduction
Traditional BI and analytics platforms have typically been characterized by:
1. Predictable Frequency. Data is extracted from source systems at regular intervals -
typically measured in days, months and quarters
2. Static Sources. Data is sourced from controlled, internal systems supporting established
and well-defined back-office processes
3. Fixed Models. Data structures are known and modeled in advance of analysis. This
enables the development of a single schema to accommodate data from all of the source
systems, but adds significant time to the upfront design
4. Defined Queries. Questions to be asked of the data (i.e., the reporting queries) are
pre-defined. If not all of the query requirements are known upfront, or requirements
change, then the schema has to be modified to accommodate changes
5. Slow-changing requirements. Rigorous change-control is enforced before the
introduction of new data sources or reporting requirements
6. Limited users. The consumers of BI reports are typically business managers and senior
executives
Evolving BI and Analytics for Big Data
Higher Uptime Requirements
The immediacy of real-time analytics accessed
from multiple fixed and mobile devices places
additional demands on the continuous availability
of BI systems.
Batch-based systems can often tolerate a certain
level of downtime, for example for scheduled
maintenance. Online systems, on the other hand,
need to maintain operations during both failures
and planned upgrades.
The Need for Speed & Scale
Time to value is everything. For example, having
access to real-time customer sentiment or
logistics tracking is of little benefit unless the data
can be analyzed and reported in real-time. As a
consequence, the frequency of data acquisition,
integration and analysis must increase from days
to minutes or less, placing significant operational
overhead on BI systems.
Agile Analytics and Reporting
With such a diversity of new data sources,
business analysts cannot know all of the
questions they need to ask in advance.
Therefore an essential requirement is that
the data can be stored before knowing how
it will be processed and queried.
The Changing Face of Data
Data generated by workloads such as social,
mobile, sensor and logging is much more
complex and variably structured than
traditional transaction data from back-office
systems such as ERP, CRM, PoS (Point of Sale)
and Accounts Receivable.
Taking BI to the Cloud
The drive to embrace cloud computing to
reduce costs and improve agility means BI
components that have traditionally relied on
databases deployed on monolithic, scale-up
systems have to be re-designed for the
elastic scale-out, service-oriented
architectures of the cloud.
Impacts to Traditional BI Databases
The relational databases underpinning many of today’s traditional BI platforms are not well suited to the requirements of big
data:
• Semi-structured and unstructured data typical in mobile, social and sensor-driven applications cannot be efficiently
represented as rows and columns in a relational database table
• Rapid evolution of database schema to support new data sources and rapidly changing data structures is not
possible in relational databases, which rely on costly ALTER TABLE operations to add or modify table attributes
• Performance overhead of JOINs and transaction semantics prevents relational databases from keeping pace with the
ingestion of high-velocity data sources
• Quickly growing data volumes require scaling databases out across commodity hardware, rather than the scale-up
approach typical of most relational databases
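The schema-rigidity point above can be seen in miniature: a relational table needs a DDL migration before it can hold a new attribute, while a document model simply accepts records carrying the extra field. A minimal sketch using Python's built-in sqlite3 as the relational stand-in and a plain list of dicts as the document stand-in (table and field names are hypothetical):

```python
import sqlite3

# Relational side: a new attribute requires a schema migration first.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, kind TEXT)")
conn.execute("INSERT INTO events (kind) VALUES ('click')")

# A new data source introduces a 'device' attribute: the table must be
# altered before any row can carry it.
conn.execute("ALTER TABLE events ADD COLUMN device TEXT")
conn.execute("INSERT INTO events (kind, device) VALUES ('click', 'mobile')")

# Document side: heterogeneous records coexist with no migration step.
doc_store = []
doc_store.append({"kind": "click"})
doc_store.append({"kind": "click", "device": "mobile"})  # new field, no DDL

rows = conn.execute("SELECT kind, device FROM events ORDER BY id").fetchall()
print(rows)  # [('click', None), ('click', 'mobile')]
```

The ALTER TABLE here is instantaneous only because the table is tiny; on large production tables the same operation is what makes relational schema evolution costly.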
Relational databases’ inability to
handle the speed, size and diversity
of rapidly changing data generated
by modern applications is already
driving the enterprise adoption of
NoSQL and Big Data technologies in
both operational and analytical
roles.
The purpose
• Flume in Hadoop for batch processing, keeping the data relevant time-wise; it can approach real
time because the data arrives only several minutes to even a second late.
• A Flume engine, used on the server side to make decisions regarding the current state of
affairs.
• Decisions are made based on data from the customer's current condition, without the full
history in their user profiles (which would enable a much more informed decision).
• State-of-the-art auto-updating charts and report creation with a dashboard UI.
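The "decide on the customer's current condition" idea above can be sketched without any particular engine: a fixed-size sliding window keeps only the freshest readings, and each decision looks at that window rather than the full history. A toy Python sketch (the window size, threshold, and values are hypothetical):

```python
from collections import deque

# Sliding window of the most recent readings; older history is discarded,
# mirroring decisions made on current condition rather than full profiles.
WINDOW = 5
recent = deque(maxlen=WINDOW)

def on_reading(value: float) -> str:
    """Ingest one reading and decide based on the current window average."""
    recent.append(value)
    avg = sum(recent) / len(recent)
    return "alert" if avg > 100 else "ok"

decisions = [on_reading(v) for v in [90, 95, 120, 130, 140]]
print(decisions)  # ['ok', 'ok', 'alert', 'alert', 'alert']
```

In a real pipeline the readings would arrive from a stream (e.g. a Flume sink) instead of a list, but the per-event decision logic is the same shape.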
Increase the scalability and performance of organizations using a real-time
analytics platform, with a focus on storing, processing and
analyzing the exponentially growing data using big data
technologies.
Challenges
1. Getting data metrics to the right people
Often, social media is treated like the ugly stepchild within the marketing department and real-time
social media analytics are either absent or ignored.
2. Visualization
Visualizing real-time social media analytics is another key element involved in developing insights
that matter.
Simply displaying values graphically helps in making the kinds of fast interpretations necessary for
making decisions with real-time data, but adding more complex algorithms and using models
provides deeper insights, especially when visualized.
3. Unstructured data is challenging
Unlike the survey data firms are used to dealing with, most social media data (IBM estimates 80%)
is unstructured, meaning it consists of words rather than numbers. And text analytics lags
seriously behind numeric analysis.
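As a toy illustration of turning words into numbers, here is a naive lexicon-based scorer in Python. The word lists are hypothetical and real text analytics is far more involved, but it shows the basic step of converting unstructured text into a numeric signal:

```python
# Hypothetical keyword lexicons -- a real system would use trained models.
POSITIVE = {"great", "love", "fast"}
NEGATIVE = {"slow", "broken", "hate"}

def score(text: str) -> int:
    """Crude sentiment score: +1 per positive token, -1 per negative."""
    tokens = text.lower().split()
    return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)

posts = [
    "love the new app, so fast",
    "checkout is broken and slow",
]
scores = [score(p) for p in posts]
print(scores)  # [2, -2]
```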
4. Increasing signal to noise
Social media data is inherently noisy. Reducing the noise enough to even detect a signal is
challenging, especially in real time. Sure, with enough time, new analytics tools can ferret
out the few meaningful signals.
Top 10 Priorities
1 Enable new fast-paced business practices
2 Don’t expect the new stuff to replace the old stuff
3 Do not assume that all the data needs to be in real time, all the time
4 Correlate real-time data with data from other sources and latencies
5 Start with a proof of value with measurable outcomes
6 As a safe starter project, accelerate successful latent processes into near real time
7 Think about operationalizing analytics
8 Think about the skills you need
9 Examine application business rules to ensure they are ready for real-time data flows
10 Evaluate technology platforms and expertise for availability and reliability
Challenges
Real-Time Analytics is Hard
Can’t Stay Ahead. You need to account for
many types of data, including unstructured
and semi-structured data. And new sources
present themselves unpredictably.
Relational databases aren’t capable of
handling this, which leaves you hamstrung.
Can’t Scale. You need to analyze terabytes
or petabytes of data. You need sub-second
response times. That’s a lot more than a
single server can handle. Relational
databases weren’t designed for this.
Batch. Batch processes are the right
approach for some jobs. But in many cases,
you need to analyze rapidly changing,
multi-structured data in real time. You
don’t have the luxury of lengthy ETL
processes to cleanse data for later.
MongoDB Makes it Easy
Do the Impossible. MongoDB can incorporate any
kind of data – any structure, any format, any
source – no matter how often it changes. Your
analytical engines can be comprehensive and real-time.
Scale Big. MongoDB is built to scale out on
commodity hardware, in your data center or in the
cloud. And without complex hardware or extra
software. This shouldn’t be hard, and with
MongoDB, it isn’t.
Real Time. MongoDB can analyze data of any
structure directly within the database, giving you
results in real time, and without expensive data
warehouse loads.
Why Other Databases Fall Short and MongoDB Doesn’t
Most databases make you choose between a flexible data
model, low latency at scale, and powerful access. But
increasingly you need all three at the same time.
 Rigid Schemas. You should be able to analyze unstructured, semi-structured, and
polymorphic data. And it should be easy to add new data. But this data doesn’t
belong in relational rows and columns. Plus, relational schemas are hard to
change incrementally, especially without impacting performance or taking the
database offline.
 Scaling Problems. Relational databases were designed for single-server
configurations, not for horizontal scale-out. They were meant to serve 100s of ops
per second, not 100,000s of ops per second. Even with a lot of engineering hours,
custom sharding layers, and caches, scaling an RDBMS is hard at best and
impossible at worst.
 Takes Too Long. Analyzing data in real time requires a break from the familiar
ETL and data warehouse approach. You don’t have time for lengthy load
schedules, or to build new query models. You need to run aggregation queries
against variably structured data. And you should be able to do so in place, in real
time.
Organizations are using MongoDB for analytics because it
lets them store any kind of data, analyze it in real time,
and change the schema as they go.
New Data. MongoDB’s document model enables you to store and process data
of any structure: events, time series data, geospatial coordinates, text and
binary data, and anything else. You can adapt the structure of a document’s
schema just by adding new fields, making it simple to bring in new data as it
becomes available.
Horizontal Scalability. MongoDB’s automatic sharding distributes data across
fleets of commodity servers, with complete application transparency. With
multiple options for scaling – including range-based, hash-based and
location-aware sharding – MongoDB can support thousands of nodes, petabytes of
data, and hundreds of thousands of ops per second without requiring you to
build custom partitioning and caching layers.
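As a concrete sketch, the sharding strategies above are chosen per collection via MongoDB's shardCollection admin command. Below, the command documents are built as Python dicts in the shape pymongo would send them; the database, collection, and field names are hypothetical:

```python
# Hedged sketch: admin command documents selecting MongoDB's sharding
# strategy per collection. Names here are hypothetical.
hashed_shard = {
    "shardCollection": "analytics.events",
    "key": {"userId": "hashed"},   # hash-based: even write distribution
}
range_shard = {
    "shardCollection": "analytics.metrics",
    "key": {"ts": 1},              # range-based: time-ordered chunks
}

# Against a live sharded cluster these would be issued as, e.g.:
#   from pymongo import MongoClient
#   MongoClient("mongodb://mongos:27017").admin.command(hashed_shard)
print(hashed_shard["key"], range_shard["key"])
```

The choice matters: a hashed key spreads high-velocity inserts across shards, while a range key on a timestamp keeps recent data together for time-bounded queries.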
Powerful Analytics, In Place, In Real Time. With rich index and query
support – including secondary, geospatial and text search indexes – as well as
the aggregation framework and native MapReduce, MongoDB can run complex
ad-hoc analytics and reporting in place.
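A hedged sketch of what "analytics in place" looks like in practice: an aggregation pipeline that filters, groups, and sorts inside the database, with no ETL step. The collection and field names are hypothetical:

```python
# Aggregation pipeline: a top-10 pageview rollup computed in the database.
pipeline = [
    {"$match": {"type": "pageview"}},       # filter raw events
    {"$group": {"_id": "$url",              # roll up per page
                "views": {"$sum": 1}}},
    {"$sort": {"views": -1}},               # most-viewed first
    {"$limit": 10},                         # feed a top-10 dashboard widget
]

# Against a live server this would run in place, in real time:
#   from pymongo import MongoClient
#   top_pages = list(MongoClient().analytics.events.aggregate(pipeline))
stages = [next(iter(stage)) for stage in pipeline]
print(stages)  # ['$match', '$group', '$sort', '$limit']
```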
MongoDB with Hadoop
The following table provides examples of customers using MongoDB together with Hadoop to power big
data applications.

Company | MongoDB | Hadoop
Ebay | User data and metadata management for product catalog | User analysis for personalized search & recommendations
Orbitz | Management of hotel data and pricing | Hotel segmentation to support building search facets
Pearson | Student identity and access control; content management of course materials | Student analytics to create adaptive learning programs
Foursquare | User data, check-ins, reviews, venue content management | User analysis, segmentation and personalization
Tier 1 Investment Bank | Tick data, quants analysis, reference data distribution | Risk modeling, security and fraud detection
Industrial Machinery Manufacturer | Storage and real-time analytics of sensor data collected from connected vehicles | Preventive maintenance programs for fleet optimization; in-field monitoring of vehicle components for design enhancements
SFR | Customer service applications accessed via online portals and call centers | Analysis of customer usage, devices & pricing to optimize plans
Whether improving customer service, supporting cross-sell and upsell, enhancing business efficiency or
reducing risk, MongoDB and Hadoop provide the foundation to operationalize big data.
Future Trends in Real-Time Data, BI, and
Analytics
Data types handled in real time today. Numerous TDWI surveys have shown that structured
data (which
includes relational data) is by far the most common class of data types handled for BI and
analytic purposes, as well as many operational and transactional ones. It’s no surprise that
structured data bubbled to the top of Figure 16. Other data types and sources commonly
handled in real time today include application logs (33%), event data (26%), semi-structured
data (26%), and hierarchical and raw data (24% each).
Data types to be handled in real time within three years. Looking ahead, a number of data
types are poised for greater real-time usage. Some are in limited use today but will
experience aggressive adoption within three years, namely social media data (38%), Web logs
and clickstreams (34%), and unstructured data (34%). Others are handled in real time today
and will become even more so, namely event (36%), semi-structured (33%), structured (31%),
and hierarchical (30%) data.
Case Studies
MongoDB Integration with BI and Analytics
Tools
To make online big data actionable through dashboards, reports,
visualizations and integration with other data sources, it must be
accessible to established BI and analytics tools. MongoDB offers integration
with more of the leading BI tools than any other NoSQL or online big data
technology, including:
Actuate, Alteryx, Informatica, Jaspersoft, Logi Analytics, MicroStrategy,
Pentaho, Qliktech and SAP Lumira.
WindyGrid
One person, one laptop, and MongoDB’s technology jumpstarted a project that, with
other people joining in, went from prototype to one of the nation’s pioneering projects
to analyze and act on municipal data in real time. In just four months.
WindyGrid put Chicago on the path of revolutionizing how it operates not by replacing
the administrative systems already in place, but by using MongoDB to bring that data
together into a new application. With MongoDB’s flexible data model, WindyGrid doesn’t
have to go back and redo the schema for each new piece of data. Instead, it can evolve
schemas in real time, which is crucial as WindyGrid expands and adds predictive
analytics, growing by millions of pieces of structured and unstructured data each day.
Crittercism Is a Mobile Pioneer
Crittercism doesn’t just monitor apps or gather information. Using MongoDB’s powerful built in
query functions, it analyzes avalanches of unstructured and non-uniform data in real time. It
recognizes patterns, identifies trends, and diagnoses problems. That means that Crittercism’s
customers immediately understand the root cause of problems and the impact they’re having on
business. So they know how to prioritize and correct the problems they’re facing and improve
performance.
The kind of real-time analysis that Crittercism provides customers would be impossible
with traditional databases. Crittercism is using MongoDB’s powerful query functions to
analyze the broad variety of data it collects, in real time, within the database. A more
traditional data warehouse approach, with ETLs and long loading times, can’t match this
type of speed.
At the same time, MongoDB lets Crittercism efficiently handle the tons of data it’s
collecting. During the past two years, the number of requests that Crittercism gathers and
analyzes has jumped from 700 to 45,000 per second. Relational databases have a hard time
scaling to meet these kinds of demands, typically requiring expensive add-on software, or
additional layers of proprietary code, to keep up. With MongoDB, horizontal scalability
across multiple data centers is a native function.
McAfee - Global Cybersecurity
McAfee Global Threat Intelligence (GTI) analyzes cyberthreats from all angles, identifying threat relationships, such as malware used in
network intrusions, websites hosting malware, botnet associations, and more. Threat information is
extremely time sensitive; knowing about a threat from weeks ago is useless.
In order to provide up-to-date, comprehensive threat information, GTI needs to quickly process terabytes of
different data types (such as IP addresses or domains) into meaningful relationships:
e.g. Is this web site good or bad? What other sites have been interacting with it? The success of the cloud-based system also
depends on a bidirectional data flow: GTI gathers data from millions of client sensors and provides real-time intelligence
back to these end products, at a rate of 100 billion queries per month.
McAfee was unable to address these needs and effectively scale out to millions of records with its existing solutions. For example,
the HBase/Hadoop setup made it difficult to run interesting, complex queries, and the team experienced bugs with the Java
garbage collector running out of memory. Another issue was with sharding and syncing:
Lucene was able to index in interesting ways, but required too much customization. The team
compensated for all the rebuilding and redeploying of Katta shards with “the usual scripting duct tape,” but what they really
needed was a solution that could seamlessly handle the sharding and updating on its own.
McAfee selected MongoDB, which had excellent documentation and a growing community that was “on fire.”
Powering Journalism
BuzzFeed, the social news and entertainment company, relies on MongoDB to analyze all performance data
for its content across the social web. A core part of BuzzFeed’s publishing platform, MongoDB exposes
metrics to editors and writers in real time, to help them understand how its content is performing and to
optimize for the social web. The company has been using MongoDB since 2010. Here’s why.
1. Analytics provide more insight, more quickly. BuzzFeed relies on MongoDB for its strategic analytics platform. With apps and
dashboards built on MongoDB, it can pinpoint when content is viewed and how it is shared. With this approach, BuzzFeed is able to
quickly gain insight into how its content performs, nimbly optimize the user experience for posts that are performing best, and
deliver critical feedback to its writers and editors.
2. BuzzFeed is data-driven. At BuzzFeed, data drives decision-making and powers the company. MongoDB enables BuzzFeed to
effectively analyze, track and expose a range of metrics to writers and employees. This includes: the number of clicks; how
often and where posts are being shared; which views on different social media properties lead to the most shares; and how
views differ across mobile and desktop.
3. Successful web journalism demands scale. BuzzFeed processes large volumes of data, and this is increasing each year as the site’s
traffic continues to grow. Originally built on a relational data store, BuzzFeed decided to use MongoDB, a more scalable solution, to
collect and track the data it needs with richer functionality than a standard key-value store.
4. Editors gain an edge with access to data in minutes. Fast, easy access to data is critical to helping editors determine what
content will be most shareable in the social media world. With MongoDB, BuzzFeed is able to expose performance data shortly after
publication, enabling editors to quickly respond by tweaking headlines and determining the best way to promote content.
5. Setting the infrastructure for new applications. As BuzzFeed continues its efforts to leverage stats and optimization, MongoDB will
feature prominently in the new infrastructure. MongoDB makes it easy to build apps quickly – a requirement as BuzzFeed rolls out
additional products.
Current Scenario
Current Offerings

More Related Content

PPTX
Real Time Analytics
Mohsin Hakim
 
PDF
The technology of the business data lake
Capgemini
 
PPTX
Pervasive analytics through data & analytic centricity
Cloudera, Inc.
 
PPTX
THE FUTURE OF DATA: PROVISIONING ANALYTICS-READY DATA AT SPEED
webwinkelvakdag
 
PDF
Taming Big Data With Modern Software Architecture
Big Data User Group Karlsruhe/Stuttgart
 
PDF
Achieve data democracy in data lake with data integration
Saurabh K. Gupta
 
PDF
Building a Big Data Analytics Platform- Impetus White Paper
Impetus Technologies
 
PDF
Reconciling your Enterprise Data Warehouse to Source Systems
Method360
 
Real Time Analytics
Mohsin Hakim
 
The technology of the business data lake
Capgemini
 
Pervasive analytics through data & analytic centricity
Cloudera, Inc.
 
THE FUTURE OF DATA: PROVISIONING ANALYTICS-READY DATA AT SPEED
webwinkelvakdag
 
Taming Big Data With Modern Software Architecture
Big Data User Group Karlsruhe/Stuttgart
 
Achieve data democracy in data lake with data integration
Saurabh K. Gupta
 
Building a Big Data Analytics Platform- Impetus White Paper
Impetus Technologies
 
Reconciling your Enterprise Data Warehouse to Source Systems
Method360
 

What's hot (18)

PDF
Modern Integrated Data Environment - Whitepaper | Qubole
Vasu S
 
PDF
Struggling with data management
David Walker
 
PPTX
001 More introduction to big data analytics
Dendej Sawarnkatat
 
PPT
Lecture 04 - Granularity in the Data Warehouse
phanleson
 
PDF
The principles of the business data lake
Capgemini
 
PDF
Seagate: Sensor Overload! Taming The Raging Manufacturing Big Data Torrent
Seeling Cheung
 
PPTX
Data Lakehouse, Data Mesh, and Data Fabric (r1)
James Serra
 
PDF
Accenture hana-in-memory-pov
K Thomas
 
PDF
Analyst View of Data Virtualization: Conversations with Boulder Business Inte...
Denodo
 
PDF
Enabling Cloud Data Integration (EMEA)
Denodo
 
PDF
IBM 2016 - Six reasons to upgrade your database
Francisco González Jiménez
 
PPTX
3 Ways Tableau Improves Predictive Analytics
Nandita Nityanandam
 
DOCX
Microsoft SQL Azure - Scaling Out with SQL Azure Whitepaper
Microsoft Private Cloud
 
PDF
How Real TIme Data Changes the Data Warehouse
mark madsen
 
PDF
Redefining Data Analytics Through Search
Connexica
 
PPT
Best practices and trends in people soft
Hazelknight Media & Entertainment Pvt Ltd
 
PPT
Dw & etl concepts
jeshocarme
 
PDF
Use Big Data Technologies to Modernize Your Enterprise Data Warehouse
EMC
 
Modern Integrated Data Environment - Whitepaper | Qubole
Vasu S
 
Struggling with data management
David Walker
 
001 More introduction to big data analytics
Dendej Sawarnkatat
 
Lecture 04 - Granularity in the Data Warehouse
phanleson
 
The principles of the business data lake
Capgemini
 
Seagate: Sensor Overload! Taming The Raging Manufacturing Big Data Torrent
Seeling Cheung
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
James Serra
 
Accenture hana-in-memory-pov
K Thomas
 
Analyst View of Data Virtualization: Conversations with Boulder Business Inte...
Denodo
 
Enabling Cloud Data Integration (EMEA)
Denodo
 
IBM 2016 - Six reasons to upgrade your database
Francisco González Jiménez
 
3 Ways Tableau Improves Predictive Analytics
Nandita Nityanandam
 
Microsoft SQL Azure - Scaling Out with SQL Azure Whitepaper
Microsoft Private Cloud
 
How Real TIme Data Changes the Data Warehouse
mark madsen
 
Redefining Data Analytics Through Search
Connexica
 
Best practices and trends in people soft
Hazelknight Media & Entertainment Pvt Ltd
 
Dw & etl concepts
jeshocarme
 
Use Big Data Technologies to Modernize Your Enterprise Data Warehouse
EMC
 
Ad

Viewers also liked (8)

PDF
Building Big Data Streaming Architectures
David Martínez Rego
 
PPTX
KDD 2016 Streaming Analytics Tutorial
Neera Agarwal
 
PPTX
High-Volume Data Collection and Real Time Analytics Using Redis
cacois
 
PPT
Big Data Real Time Analytics - A Facebook Case Study
Nati Shalom
 
PPTX
Real-Time Analytics with MemSQL and Spark
SingleStore
 
PDF
Real Time Analytics: Algorithms and Systems
Arun Kejariwal
 
PDF
QCon São Paulo: Real-Time Analytics with Spark Streaming
Paco Nathan
 
PPTX
Real time Analytics with Apache Kafka and Apache Spark
Rahul Jain
 
Building Big Data Streaming Architectures
David Martínez Rego
 
KDD 2016 Streaming Analytics Tutorial
Neera Agarwal
 
High-Volume Data Collection and Real Time Analytics Using Redis
cacois
 
Big Data Real Time Analytics - A Facebook Case Study
Nati Shalom
 
Real-Time Analytics with MemSQL and Spark
SingleStore
 
Real Time Analytics: Algorithms and Systems
Arun Kejariwal
 
QCon São Paulo: Real-Time Analytics with Spark Streaming
Paco Nathan
 
Real time Analytics with Apache Kafka and Apache Spark
Rahul Jain
 
Ad

Similar to Real Time Analytics (20)

PDF
Business_Analytics_Presentation_Luke_Caratan
Luke Caratan
 
PPTX
Big data unit 2
RojaT4
 
PDF
BIG DATA ANALYTICS: CHALLENGES AND APPLICATIONS FOR TEXT, AUDIO, VIDEO, AND S...
ijscai
 
PDF
Big Data Analytics: Challenges And Applications For Text, Audio, Video, And S...
IJSCAI Journal
 
PDF
BIG DATA ANALYTICS: CHALLENGES AND APPLICATIONS FOR TEXT, AUDIO, VIDEO, AND S...
gerogepatton
 
PDF
BIG DATA ANALYTICS: CHALLENGES AND APPLICATIONS FOR TEXT, AUDIO, VIDEO, AND S...
gerogepatton
 
PDF
BIG DATA ANALYTICS: CHALLENGES AND APPLICATIONS FOR TEXT, AUDIO, VIDEO, AND S...
gerogepatton
 
PDF
BIG DATA ANALYTICS: CHALLENGES AND APPLICATIONS FOR TEXT, AUDIO, VIDEO, AND S...
gerogepatton
 
PDF
BIG DATA ANALYTICS: CHALLENGES AND APPLICATIONS FOR TEXT, AUDIO, VIDEO, AND S...
ijscai
 
PPTX
Bangalore Executive Seminar 2015: MongoDB - Your database of choice for real ...
MongoDB
 
PPT
Big Data Analytics Materials, Chapter: 1
RUHULAMINHAZARIKA
 
PDF
Analytics big data ibm
Accenture
 
PDF
IBM-Infoworld Big Data deep dive
Kun Le
 
PDF
INF2190_W1_2016_public
Attila Barta
 
PDF
BigData Analytics_1.7
Rohit Mittal
 
PPTX
Big data
Mani Gandan
 
PDF
Big Data Analytics.pdfbgfjgjgghfhhffhdfyf
VijayKaran7
 
PPTX
Data Modeling and Scale Out - ScaleBase + 451-Group webinar 30.4.2015
Vladi Vexler
 
PPTX
Big Data Analytics for BI, BA and QA
Dmitry Tolpeko
 
Business_Analytics_Presentation_Luke_Caratan
Luke Caratan
 
Big data unit 2
RojaT4
 
BIG DATA ANALYTICS: CHALLENGES AND APPLICATIONS FOR TEXT, AUDIO, VIDEO, AND S...
ijscai
 
Big Data Analytics: Challenges And Applications For Text, Audio, Video, And S...
IJSCAI Journal
 
BIG DATA ANALYTICS: CHALLENGES AND APPLICATIONS FOR TEXT, AUDIO, VIDEO, AND S...
gerogepatton
 
BIG DATA ANALYTICS: CHALLENGES AND APPLICATIONS FOR TEXT, AUDIO, VIDEO, AND S...
gerogepatton
 
BIG DATA ANALYTICS: CHALLENGES AND APPLICATIONS FOR TEXT, AUDIO, VIDEO, AND S...
gerogepatton
 
BIG DATA ANALYTICS: CHALLENGES AND APPLICATIONS FOR TEXT, AUDIO, VIDEO, AND S...
gerogepatton
 
BIG DATA ANALYTICS: CHALLENGES AND APPLICATIONS FOR TEXT, AUDIO, VIDEO, AND S...
ijscai
 
Bangalore Executive Seminar 2015: MongoDB - Your database of choice for real ...
MongoDB
 
Big Data Analytics Materials, Chapter: 1
RUHULAMINHAZARIKA
 
Analytics big data ibm
Accenture
 
IBM-Infoworld Big Data deep dive
Kun Le
 
INF2190_W1_2016_public
Attila Barta
 
BigData Analytics_1.7
Rohit Mittal
 
Big data
Mani Gandan
 
Big Data Analytics.pdfbgfjgjgghfhhffhdfyf
VijayKaran7
 
Data Modeling and Scale Out - ScaleBase + 451-Group webinar 30.4.2015
Vladi Vexler
 
Big Data Analytics for BI, BA and QA
Dmitry Tolpeko
 

More from Mohsin Hakim (8)

DOCX
MohsinHakim
Mohsin Hakim
 
DOCX
Mohsin hakim
Mohsin Hakim
 
PPTX
Iphone
Mohsin Hakim
 
PDF
Mohsin Hakim summery
Mohsin Hakim
 
PDF
History and Kings in India
Mohsin Hakim
 
PDF
For freshers presentation
Mohsin Hakim
 
PPTX
Engineering - Iinformation for teenagers
Mohsin Hakim
 
PDF
Job help
Mohsin Hakim
 
MohsinHakim
Mohsin Hakim
 
Mohsin hakim
Mohsin Hakim
 
Iphone
Mohsin Hakim
 
Mohsin Hakim summery
Mohsin Hakim
 
History and Kings in India
Mohsin Hakim
 
For freshers presentation
Mohsin Hakim
 
Engineering - Iinformation for teenagers
Mohsin Hakim
 
Job help
Mohsin Hakim
 

Recently uploaded (20)

PDF
49784907924775488180_LRN2959_Data_Pump_23ai.pdf
Abilash868456
 
PPTX
slidesgo-unlocking-the-code-the-dynamic-dance-of-variables-and-constants-2024...
kr2589474
 
PDF
Download iTop VPN Free 6.1.0.5882 Crack Full Activated Pre Latest 2025
imang66g
 
PDF
On Software Engineers' Productivity - Beyond Misleading Metrics
Romén Rodríguez-Gil
 
PDF
lesson-2-rules-of-netiquette.pdf.bshhsjdj
jasmenrojas249
 
PPTX
oapresentation.pptx
mehatdhavalrajubhai
 
PPTX
Presentation about Database and Database Administrator
abhishekchauhan86963
 
PDF
What to consider before purchasing Microsoft 365 Business Premium_PDF.pdf
Q-Advise
 
PPTX
Visualising Data with Scatterplots in IBM SPSS Statistics.pptx
Version 1 Analytics
 
PDF
Balancing Resource Capacity and Workloads with OnePlan – Avoid Overloading Te...
OnePlan Solutions
 
PDF
WatchTraderHub - Watch Dealer software with inventory management and multi-ch...
WatchDealer Pavel
 
PPTX
Explanation about Structures in C language.pptx
Veeral Rathod
 
PDF
Protecting the Digital World Cyber Securit
dnthakkar16
 
PPTX
Can You Build Dashboards Using Open Source Visualization Tool.pptx
Varsha Nayak
 
PDF
Jenkins: An open-source automation server powering CI/CD Automation
SaikatBasu37
 
PDF
Enhancing Healthcare RPM Platforms with Contextual AI Integration
Cadabra Studio
 
PDF
Adobe Illustrator Crack Full Download (Latest Version 2025) Pre-Activated
imang66g
 
PDF
Applitools Platform Pulse: What's New and What's Coming - July 2025
Applitools
 
PPT
Why Reliable Server Maintenance Service in New York is Crucial for Your Business
Sam Vohra
 
PPTX
AI-Ready Handoff: Auto-Summaries & Draft Emails from MQL to Slack in One Flow
bbedford2
 
49784907924775488180_LRN2959_Data_Pump_23ai.pdf
Abilash868456
 
slidesgo-unlocking-the-code-the-dynamic-dance-of-variables-and-constants-2024...
kr2589474
 
Download iTop VPN Free 6.1.0.5882 Crack Full Activated Pre Latest 2025
imang66g
 
On Software Engineers' Productivity - Beyond Misleading Metrics
Romén Rodríguez-Gil
 
lesson-2-rules-of-netiquette.pdf.bshhsjdj
jasmenrojas249
 
oapresentation.pptx
mehatdhavalrajubhai
 
Presentation about Database and Database Administrator
abhishekchauhan86963
 
What to consider before purchasing Microsoft 365 Business Premium_PDF.pdf
Q-Advise
 
Visualising Data with Scatterplots in IBM SPSS Statistics.pptx
Version 1 Analytics
 
Balancing Resource Capacity and Workloads with OnePlan – Avoid Overloading Te...
OnePlan Solutions
 
WatchTraderHub - Watch Dealer software with inventory management and multi-ch...
WatchDealer Pavel
 
Explanation about Structures in C language.pptx
Veeral Rathod
 
Protecting the Digital World Cyber Securit
dnthakkar16
 
Can You Build Dashboards Using Open Source Visualization Tool.pptx
Varsha Nayak
 
Jenkins: An open-source automation server powering CI/CD Automation
SaikatBasu37
 
Enhancing Healthcare RPM Platforms with Contextual AI Integration
Cadabra Studio
 
Adobe Illustrator Crack Full Download (Latest Version 2025) Pre-Activated
imang66g
 
Applitools Platform Pulse: What's New and What's Coming - July 2025
Applitools
 
Why Reliable Server Maintenance Service in New York is Crucial for Your Business
Sam Vohra
 
AI-Ready Handoff: Auto-Summaries & Draft Emails from MQL to Slack in One Flow
bbedford2
 

Real Time Analytics

  • 1. Framework for Real time Analytics By Mohsin Hakim Real Time Analytics
  • 2. Index Introduction Evolving BI and Analytics for Big Data Impacts to Traditional BI Databases Challenges MongoDB with Hadoop Case Studies Current Scenario
  • 3. Introduction  Analytics falls along a spectrum. On one end of the spectrum sit batch analytical applications, which are used for complex, long-running analyses. They tend to have slower response times (up to minutes, hours, or days) and lower requirements for availability. Examples of batch analytics include Hadoop-based workloads  On the other end of the spectrum sit real-time analytical applications, which provide lighter-weight analytics very quickly. Latency is low (sub-second) and availability requirements are high (e.g., 99.99%). MongoDB is typically used for real-time analytics. Example applications include: Business Intelligence (BI) and analytics provides an essential set of technologies and processes that organizations have relied upon over many years to guide strategic business decisions.
  • 4. Introduction
Traditional BI is characterized by:
1. Predictable Frequency. Data is extracted from source systems at regular intervals, typically measured in days, months and quarters.
2. Static Sources. Data is sourced from controlled, internal systems supporting established and well-defined back-office processes.
3. Fixed Models. Data structures are known and modeled in advance of analysis. This enables the development of a single schema to accommodate data from all of the source systems, but adds significant time to the upfront design.
4. Defined Queries. The questions to be asked of the data (i.e., the reporting queries) are pre-defined. If not all of the query requirements are known upfront, or requirements change, the schema has to be modified to accommodate the changes.
5. Slow-Changing Requirements. Rigorous change control is enforced before new data sources or reporting requirements are introduced.
6. Limited Users. The consumers of BI reports are typically business managers and senior executives.
  • 5. Evolving BI and Analytics for Big Data
Higher Uptime Requirements. The immediacy of real-time analytics, accessed from multiple fixed and mobile devices, places additional demands on the continuous availability of BI systems. Batch-based systems can often tolerate a certain level of downtime, for example for scheduled maintenance. Online systems, on the other hand, need to maintain operations during both failures and planned upgrades.
The Need for Speed & Scale. Time to value is everything. For example, having access to real-time customer sentiment or logistics tracking is of little benefit unless the data can be analyzed and reported in real time. As a consequence, the frequency of data acquisition, integration and analysis must increase from days to minutes or less, placing significant operational overhead on BI systems.
Agile Analytics and Reporting. With such a diversity of new data sources, business analysts cannot know all of the questions they need to ask in advance. An essential requirement, therefore, is that data can be stored before knowing how it will be processed and queried.
The Changing Face of Data. Data generated by workloads such as social, mobile, sensor and logging is much more complex and variably structured than traditional transaction data from back-office systems such as ERP, CRM, PoS (Point of Sale) and Accounts Receivable.
Taking BI to the Cloud. The drive to embrace cloud computing to reduce costs and improve agility means BI components that have traditionally relied on databases deployed on monolithic, scale-up systems have to be re-designed for the elastic, scale-out, service-oriented architectures of the cloud.
  • 6. Impacts to Traditional BI Databases
The relational databases underpinning many of today’s traditional BI platforms are not well suited to the requirements of big data:
• Semi-structured and unstructured data, typical in mobile, social and sensor-driven applications, cannot be efficiently represented as rows and columns in a relational database table.
• Rapid evolution of the database schema to support new data sources and rapidly changing data structures is not possible in relational databases, which rely on costly ALTER TABLE operations to add or modify table attributes.
• The performance overhead of JOINs and transaction semantics prevents relational databases from keeping pace with the ingestion of high-velocity data sources.
• Quickly growing data volumes require scaling databases out across commodity hardware, rather than the scale-up approach typical of most relational databases.
Relational databases’ inability to handle the speed, size and diversity of rapidly changing data generated by modern applications is already driving enterprise adoption of NoSQL and big data technologies in both operational and analytical roles.
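The schema-rigidity point above can be illustrated without a database at hand. The sketch below is plain Python (no MongoDB driver; the collection and field names are invented for illustration) showing how variably structured event documents can live side by side in one collection, whereas a relational table would need an ALTER TABLE for each new attribute:

```python
# Three events from different sources, each with a different shape.
# In MongoDB these could be inserted into one collection as-is
# (e.g. db.events.insert_many(docs) with pymongo); no schema migration needed.
docs = [
    {"type": "pageview", "url": "/home", "ts": 1700000000},
    {"type": "checkin", "venue": "Cafe X", "geo": [41.88, -87.63], "ts": 1700000060},
    # A brand-new sensor source simply brings its own fields:
    {"type": "sensor", "device_id": "t-42", "temp_c": 21.5, "ts": 1700000120},
]

# Adding an attribute to future documents is just adding the field;
# older documents are untouched and remain valid.
docs.append({"type": "pageview", "url": "/blog", "referrer": "news", "ts": 1700000180})

# Each document carries its own structure; consumers read what is present.
for d in docs:
    extra_fields = sorted(set(d) - {"type", "ts"})
    print(d["type"], extra_fields)
```

The point is not the Python itself but the contrast: the fourth document introduces a `referrer` attribute with no migration step, which is the behavior the document model gives you server-side.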
  • 7. The Purpose
• Flume in Hadoop is suited to batch processing, which limits how timely the data can be; it cannot serve real-time needs, where data must be at most a few minutes, or even a second, old.
• A Flume engine runs server-side in order to make decisions regarding the current state of affairs.
• Decisions are made based on whatever data is received about customers’ current condition, without the history in their user profiles that would enable a much more informed decision.
• State-of-the-art auto-updating charting and report creation with a dashboard UI.
The goal: increase the scalability and performance of organizations using a real-time analysis platform, with a focus on storing, processing and analyzing exponentially growing data using big data technologies.
  • 8. Challenges
1. Getting data metrics to the right people. Often, social media is treated like the ugly stepchild within the marketing department, and real-time social media analytics are either absent or ignored.
2. Visualization. Visualizing real-time social media analytics is another key element in developing insights that matter. Simply displaying values graphically helps in making the kinds of fast interpretations necessary for decisions with real-time data, but adding more complex algorithms and models provides deeper insights, especially when visualized.
3. Unstructured data is challenging. Unlike the survey data firms are used to dealing with, most social media data (IBM estimates 80%) is unstructured, meaning it consists of words rather than numbers. And text analytics lags seriously behind numeric analysis.
4. Increasing signal to noise. Social media data is inherently noisy. Reducing noise enough to even detect a signal is challenging, especially in real time. Sure, with enough time, new analytics tools can ferret out the few meaningful signals.
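Point 4 (signal to noise) can be made concrete with a toy example. The sketch below is plain Python (the window size and mention counts are made up for illustration): it smooths a noisy per-minute mentions stream with a simple moving average, so a lone outlier is damped while a sustained spike survives.

```python
from collections import deque

def moving_average(stream, window=5):
    """Yield a running mean over the last `window` samples."""
    buf = deque(maxlen=window)
    for x in stream:
        buf.append(x)
        yield sum(buf) / len(buf)

# Noisy per-minute mention counts: one lone outlier (30),
# then a real, sustained spike at the end (50, 55, 60).
raw = [10, 12, 9, 11, 30, 8, 10, 11, 9, 50, 55, 60]
smooth = list(moving_average(raw, window=3))

# The lone outlier is averaged down; the sustained spike still stands out.
print([round(s, 1) for s in smooth])
```

Real deployments use far more sophisticated models, as the slide notes, but even this minimal filter shows why some smoothing is a prerequisite for acting on real-time social streams.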
  • 9. Top 10 Priorities
1. Enable new fast-paced business practices.
2. Don’t expect the new stuff to replace the old stuff.
3. Do not assume that all the data needs to be in real time, all the time.
4. Correlate real-time data with data from other sources and latencies.
5. Start with a proof of value with measurable outcomes.
6. As a safe starter project, accelerate successful latent processes into near real time.
7. Think about operationalizing analytics.
8. Think about the skills you need.
9. Examine application business rules to ensure they are ready for real-time data flows.
10. Evaluate technology platforms and expertise for availability and reliability.
  • 10. Challenges
Real-Time Analytics is Hard
Can’t Stay Ahead. You need to account for many types of data, including unstructured and semi-structured data, and new sources present themselves unpredictably. Relational databases aren’t capable of handling this, which leaves you hamstrung.
Can’t Scale. You need to analyze terabytes or petabytes of data with sub-second response times. That’s a lot more than a single server can handle, and relational databases weren’t designed for it.
Batch. Batch processes are the right approach for some jobs. But in many cases, you need to analyze rapidly changing, multi-structured data in real time. You don’t have the luxury of lengthy ETL processes to cleanse data for later.
MongoDB Makes it Easy
Do the Impossible. MongoDB can incorporate any kind of data – any structure, any format, any source – no matter how often it changes. Your analytical engines can be comprehensive and real-time.
Scale Big. MongoDB is built to scale out on commodity hardware, in your data center or in the cloud, without complex hardware or extra software. This shouldn’t be hard, and with MongoDB, it isn’t.
Real Time. MongoDB can analyze data of any structure directly within the database, giving you results in real time and without expensive data warehouse loads.
  • 11. Why Other Databases Fall Short and MongoDB Doesn’t
Most databases make you choose between a flexible data model, low latency at scale, and powerful access. But increasingly you need all three at the same time.
 Rigid Schemas. You should be able to analyze unstructured, semi-structured, and polymorphic data, and it should be easy to add new data. But this data doesn’t belong in relational rows and columns. Plus, relational schemas are hard to change incrementally, especially without impacting performance or taking the database offline.
 Scaling Problems. Relational databases were designed for single-server configurations, not for horizontal scale-out. They were meant to serve hundreds of ops per second, not hundreds of thousands. Even with a lot of engineering hours, custom sharding layers, and caches, scaling an RDBMS is hard at best and impossible at worst.
 Takes Too Long. Analyzing data in real time requires a break from the familiar ETL and data warehouse approach. You don’t have time for lengthy load schedules, or to build new query models. You need to run aggregation queries against variably structured data, and you should be able to do so in place, in real time.
Organizations are using MongoDB for analytics because it lets them store any kind of data, analyze it in real time, and change the schema as they go.
New Data. MongoDB’s document model enables you to store and process data of any structure: events, time series data, geospatial coordinates, text and binary data, and anything else. You can adapt a document’s schema just by adding new fields, making it simple to bring in new data as it becomes available.
Horizontal Scalability. MongoDB’s automatic sharding distributes data across fleets of commodity servers, with complete application transparency. With multiple options for scaling – including range-based, hash-based and location-aware sharding – MongoDB can support thousands of nodes, petabytes of data, and hundreds of thousands of ops per second without requiring you to build custom partitioning and caching layers.
Powerful Analytics, In Place, In Real Time. With rich index and query support – including secondary, geospatial and text search indexes – as well as the aggregation framework and native MapReduce, MongoDB can run complex ad-hoc analytics and reporting in place.
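The in-place analytics claim above refers to MongoDB's aggregation pipeline. As a rough illustration of the $match/$group semantics it provides (a pure-Python stand-in, not the server implementation; the event data and field names are invented), the snippet below computes per-page view counts and average load times the way a two-stage pipeline would:

```python
from collections import defaultdict

# In-memory stand-in for a collection of variably structured event documents.
events = [
    {"type": "pageview", "url": "/home", "ms": 120},
    {"type": "pageview", "url": "/home", "ms": 90},
    {"type": "pageview", "url": "/blog", "ms": 200},
    {"type": "click",    "url": "/home", "ms": 5},
]

# The equivalent MongoDB pipeline, run server-side with db.events.aggregate():
# [{"$match": {"type": "pageview"}},
#  {"$group": {"_id": "$url", "views": {"$sum": 1}, "avg_ms": {"$avg": "$ms"}}}]
def match(docs, criteria):
    """Stand-in for the $match stage: keep docs whose fields equal the criteria."""
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]

def group_by_url(docs):
    """Stand-in for the $group stage: count views and average ms per url."""
    acc = defaultdict(lambda: {"views": 0, "total_ms": 0})
    for d in docs:
        g = acc[d["url"]]
        g["views"] += 1
        g["total_ms"] += d["ms"]
    return {url: {"views": g["views"], "avg_ms": g["total_ms"] / g["views"]}
            for url, g in acc.items()}

result = group_by_url(match(events, {"type": "pageview"}))
print(result)
```

The difference in practice is where this runs: MongoDB evaluates the pipeline inside the database, next to the data and its indexes, which is what makes "in place, in real time" possible without an ETL hop.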
  • 12. MongoDB with Hadoop
The following table provides examples of customers using MongoDB together with Hadoop to power big data applications.
Ebay – MongoDB: user data and metadata management for product catalog. Hadoop: user analysis for personalized search & recommendations.
Orbitz – MongoDB: management of hotel data and pricing. Hadoop: hotel segmentation to support building search facets.
Pearson – MongoDB: student identity and access control; content management of course materials. Hadoop: student analytics to create adaptive learning programs.
Foursquare – MongoDB: user data, check-ins, reviews, venue content management. Hadoop: user analysis, segmentation and personalization.
Tier 1 Investment Bank – MongoDB: tick data, quants analysis, reference data distribution. Hadoop: risk modeling, security and fraud detection.
Industrial Machinery Manufacturer – MongoDB: storage and real-time analytics of sensor data collected from connected vehicles. Hadoop: preventive maintenance programs for fleet optimization; in-field monitoring of vehicle components for design enhancements.
SFR – MongoDB: customer service applications accessed via online portals and call centers. Hadoop: analysis of customer usage, devices & pricing to optimize plans.
Whether improving customer service, supporting cross-sell and upsell, enhancing business efficiency or reducing risk, MongoDB and Hadoop provide the foundation to operationalize big data.
  • 13. Future Trends in Real-Time Data, BI, and Analytics
Data types handled in real time today. Numerous TDWI surveys have shown that structured data (which includes relational data) is by far the most common class of data types handled for BI and analytic purposes, as well as for many operational and transactional ones. It’s no surprise that structured data bubbled to the top of Figure 16. Other data types and sources commonly handled in real time today include application logs (33%), event data (26%), semi-structured data (26%), and hierarchical and raw data (24% each).
Data types to be handled in real time within three years. Looking ahead, a number of data types are poised for greater real-time usage. Some are in limited use today but will experience aggressive adoption within three years, namely social media data (38%), Web logs and clickstreams (34%), and unstructured data (34%). Others are handled in real time today and will become even more so, namely event (36%), semi-structured (33%), structured (31%), and hierarchical (30%) data.
  • 15. MongoDB Integration with BI and Analytics Tools
To make online big data actionable through dashboards, reports, visualizations and integration with other data sources, it must be accessible to established BI and analytics tools. MongoDB offers integration with more of the leading BI tools than any other NoSQL or online big data technology, including: Actuate, Alteryx, Informatica, Jaspersoft, Logi Analytics, MicroStrategy, Pentaho, Qliktech, and SAP Lumira.
  • 16. WindyGrid
One person, one laptop, and MongoDB’s technology jumpstarted a project that, with other people joining in, went from prototype to one of the nation’s pioneering projects to analyze and act on municipal data in real time – in just four months.
WindyGrid put Chicago on the path to revolutionizing how it operates, not by replacing the administrative systems already in place, but by using MongoDB to bring that data together into a new application. With MongoDB’s flexible data model, WindyGrid doesn’t have to go back and redo the schema for each new piece of data. Instead, it can evolve schemas in real time, which is crucial as WindyGrid expands and adds predictive analytics, growing by millions of pieces of structured and unstructured data each day.
  • 17. Crittercism: A Mobile Pioneer
Crittercism doesn’t just monitor apps or gather information. Using MongoDB’s powerful built-in query functions, it analyzes avalanches of unstructured and non-uniform data in real time. It recognizes patterns, identifies trends, and diagnoses problems. That means Crittercism’s customers immediately understand the root cause of problems and the impact they’re having on business, so they know how to prioritize and correct the problems they’re facing and improve performance.
The kind of real-time analysis that Crittercism provides customers would be impossible with traditional databases. Crittercism uses MongoDB’s powerful query functions to analyze the broad variety of data it collects, in real time, within the database. A more traditional data warehouse approach, with ETLs and long loading times, can’t match this speed. At the same time, MongoDB lets Crittercism efficiently handle the tons of data it’s collecting. During the past two years, the number of requests that Crittercism gathers and analyzes has jumped from 700 to 45,000 per second. Relational databases have a hard time scaling to meet these kinds of demands, typically requiring expensive add-on software, or additional layers of proprietary code, to keep up. With MongoDB, horizontal scalability across multiple data centers is a native function.
  • 18. McAfee – Global Cybersecurity
GTI analyzes cyberthreats from all angles, identifying threat relationships such as malware used in network intrusions, websites hosting malware, botnet associations, and more. Threat information is extremely time sensitive; knowing about a threat from weeks ago is useless. In order to provide up-to-date, comprehensive threat information, McAfee needs to quickly process terabytes of different data types (such as IP address or domain) into meaningful relationships: e.g., is this web site good or bad? What other sites have been interacting with it? The success of the cloud-based system also depends on a bidirectional data flow: GTI gathers data from millions of client sensors and provides real-time intelligence back to these end products, at a rate of 100 billion queries per month.
McAfee was unable to address these needs and effectively scale out to millions of records with its existing solutions. For example, the HBase / Hadoop setup made it difficult to run interesting, complex queries, and the team experienced bugs with the Java garbage collector running out of memory. Another issue was with sharding and syncing; Lucene was able to index in interesting ways, but required too much customization. The team compensated for all the rebuilding and redeploying of Katta shards with “the usual scripting duct tape,” but what they really needed was a solution that could seamlessly handle the sharding and updating on its own. McAfee selected MongoDB, which had excellent documentation and a growing community that was “on fire.”
  • 19. Power Journalism
BuzzFeed, the social news and entertainment company, relies on MongoDB to analyze all performance data for its content across the social web. A core part of BuzzFeed’s publishing platform, MongoDB exposes metrics to editors and writers in real time, to help them understand how content is performing and to optimize it for the social web. The company has been using MongoDB since 2010. Here’s why.
1. Analytics provide more insight, more quickly. BuzzFeed relies on MongoDB for its strategic analytics platform. With apps and dashboards built on MongoDB, BuzzFeed can pinpoint when content is viewed and how it is shared. With this approach, BuzzFeed is able to quickly gain insight into how its content performs, nimbly optimize the user experience for posts that are performing best, and deliver critical feedback to its writers and editors.
2. BuzzFeed is data-driven. At BuzzFeed, data drives decision-making and powers the company. MongoDB enables BuzzFeed to effectively analyze, track and expose a range of metrics to writers and employees. This includes: the number of clicks; how often and where posts are being shared; which views on different social media properties lead to the most shares; and how views differ across mobile and desktop.
3. Successful web journalism demands scale. BuzzFeed processes large volumes of data, and this is increasing each year as the site’s traffic continues to grow. Originally built on a relational data store, BuzzFeed decided to use MongoDB, a more scalable solution, to collect and track the data it needs with richer functionality than a standard key-value store.
4. Editors gain an edge with access to data in minutes. Fast, easy access to data is critical to helping editors determine what content will be most shareable in the social media world. With MongoDB, BuzzFeed is able to expose performance data shortly after publication, enabling editors to quickly respond by tweaking headlines and determining the best way to promote.
5. Setting the infrastructure for new applications. As BuzzFeed continues its efforts to leverage stats and optimization, MongoDB will feature prominently in the new infrastructure. MongoDB makes it easy to build apps quickly – a requirement as BuzzFeed rolls out additional products.
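The real-time metrics exposed to editors above are a classic use of atomic counter updates. A minimal stand-in (plain Python dicts; in MongoDB this pattern would be an update_one with upsert=True and the $inc operator, and the collection and field names here are hypothetical) looks like:

```python
# In-memory stand-in for a per-post metrics collection.
metrics = {}

def record(post_id, field):
    """Mimic db.metrics.update_one({'_id': post_id},
    {'$inc': {field: 1}}, upsert=True): create the document if it is
    absent, then bump the named counter."""
    doc = metrics.setdefault(post_id, {"_id": post_id})
    doc[field] = doc.get(field, 0) + 1

# Events stream in as they happen; dashboards read the documents directly,
# so counts are current moments after publication.
for _ in range(3):
    record("post-1", "clicks")
record("post-1", "shares")
record("post-2", "clicks")

print(metrics["post-1"])
```

In a real deployment the server applies each $inc atomically, which is what lets many writers update the same post's counters concurrently without read-modify-write races.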