BI/Analytics for NoSQL:
Review of Architectures
What we'll answer in 50 minutes
•   Who is this guy?
•   How do I enable ad hoc, self-service
    reporting on NoSQL?
•   How do I improve the
    performance of dashboards
    on top of NoSQL?
•   How do I integrate NoSQL
    data with my other data
    outside NoSQL?
•   How do I enable easy-to-build
    simple reports while still
    preserving the ability to run rich
    NoSQL queries?
Nicholas Goodman

•    Open Source BI thought leader
       –    50+ Open Source BI customer projects
       –    Blogger, whitepapers, etc.
•    Entrepreneur
       –    DynamoBI Corporation
       –    Bayon Technologies, Inc.
•    Data Geek, hacker, tinkerer, committer



    GOAL: Share perspectives,
    research, opinions.
    DISCLAIMER: Your Mileage ...
How do we answer those Q's?
Promise of “Big Data”
•   NoSQL/Hadoop/MapReduce Systems
     –   Keep more of it
     –   Cost effective analysis
     –   “Massive scale” data, now accessible to everyone (elastic)
     –   Not just SQL queries, more complex analysis




     ACCOMPLISHED: WEB-SCALE, MASSIVE,
     NEVER-BEFORE-SEEN SCALE OF DATA
     STORAGE AND PROCESSING
Reality Check!


•   Petabytes? Y
•   Cheap Storage? Y
•   Raw Processing? Y
•   Rich Query Languages? Y
•   Flexible data structures? Y
•   Reliable, Fault Tolerant? Y

•   Fast Queries? N
•   Ad Hoc access? N
•   Accessibility to commodity BI tools? N
•   Easy report authoring? N
•   Levels of Aggregation? N
•   Integrated Data? N

     Big Data has solved the INFRASTRUCTURE of
     raw/core data storage but has delivered less
     of what BUSINESS users want for analytics.
Data Gaps too!



•   Code, Developers             •   Analysts w/ Excel, Dashboards
•   MR, Rich Graph/Access        •   Simple 2D (tables, charts)
•   Hierarchical, Unstructured   •   Filtering and easy analytics
Levels of Aggregation

SAME DATA AT VARIOUS LEVELS OF AGGREGATION:
HUGELY IMPORTANT IN REAL-LIFE IMPLEMENTATIONS!

1 ROW TO 1 BILLION ROWS

[Figure: the same data at 10K, 1 million, 100 million, and 100 billion rows]
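To make "same data at various levels of aggregation" concrete, here is a minimal Python sketch that rolls hypothetical page-view events up to hourly, daily, and yearly levels. The event data and the `rollup` helper are illustrative assumptions; in practice these rollups would be computed inside the NoSQL store, in Hadoop, or in a database, not in memory.

```python
from collections import Counter
from datetime import datetime

# Hypothetical raw events: (ISO timestamp, page) pairs standing in for the
# billion-row end of the spectrum.
RAW_EVENTS = [
    ("2011-08-23T09:15:00", "/home"),
    ("2011-08-23T09:47:00", "/home"),
    ("2011-08-23T10:05:00", "/pricing"),
    ("2011-08-23T14:00:00", "/home"),
    ("2011-08-24T11:30:00", "/home"),
]


def rollup(events, bucket_format):
    """Count events per (time bucket, page); bucket_format sets the granularity."""
    counts = Counter()
    for timestamp, page in events:
        bucket = datetime.fromisoformat(timestamp).strftime(bucket_format)
        counts[(bucket, page)] += 1
    return dict(counts)


hourly = rollup(RAW_EVENTS, "%Y-%m-%d %H:00")
daily = rollup(RAW_EVENTS, "%Y-%m-%d")
yearly = rollup(RAW_EVENTS, "%Y")

# Each level has fewer rows, but every level still accounts for all events.
for name, level in [("hourly", hourly), ("daily", daily), ("yearly", yearly)]:
    print(name, len(level), "rows,", sum(level.values()), "events")
```

With the five sample events, the hourly level has 4 rows, the daily level 3, and the yearly level 2, and each level sums to 5 events. Dashboards usually read the small, coarse levels; drill-down reads the finer ones.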
Architectures

•   NoSQL   reports
•   NoSQL   thru and thru
•   NoSQL   + MySQL
•   NoSQL   as ETL Source
•   NoSQL   programs in BI Tools
•   NoSQL   via BI Database (SQL)
NoSQL reports
•   Pay developers to build applications for reports

[Diagram: Apps]

Pros:
•   100% richness of NoSQL
•   Up to date, current
•   Excellent performance on large datasets
•   Custom-built, beautiful reports/dashboards
•   Single system to manage

Cons:
•   $$, developer-driven process
•   No commodity BI tools
•   Rollups/summaries must be managed by hand
•   Schema-less = harder!
•   Hard to integrate other reporting information
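A minimal sketch of the kind of code a developer writes in this architecture, assuming the document store returns records as Python dicts. The documents and field names are hypothetical. It shows why "Schema-less = harder!": every report has to handle missing or renamed fields on its own.

```python
# Hypothetical schema-less order documents as a document store might return them.
ORDERS = [
    {"_id": "o1", "customer": "acme", "total": 120.0, "region": "west"},
    {"_id": "o2", "customer": "globex", "total": 80.0},                   # no region
    {"_id": "o3", "customer": "acme", "amount": 40.0, "region": "west"},  # older field name
]


def order_total(doc):
    """Return the order value, tolerating a (hypothetical) renamed field."""
    # Schema-less data: the report code must know every historical variant.
    if "total" in doc:
        return float(doc["total"])
    return float(doc.get("amount", 0.0))


def revenue_by_region(docs):
    """Hand-built report: sum order value per region, bucketing unknowns."""
    report = {}
    for doc in docs:
        region = doc.get("region", "unknown")
        report[region] = report.get(region, 0.0) + order_total(doc)
    return report


print(revenue_by_region(ORDERS))
```

Every new question (revenue by customer, by month, and so on) means another function like `revenue_by_region`, which is where the "$$, developer-driven process" cost comes from.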
NoSQL thru and thru
•   Pay developers to build FLEXIBLE applications for reports

[Diagram: Indices/Aggs, Advanced Apps]

Pros:
•   All of the NoSQL reports advantages
•   Managed aggregations, rollups
•   "Guided ad hoc" available inside the application
•   Higher performance for dashboards/summaries

Cons:
•   $$, developer-driven process
•   $$, app required for aggs
•   No commodity BI tools
•   Hard to integrate other reporting information
•   Limited ad hoc (only developer-built combinations)
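The "managed aggregations" idea can be sketched as a store that updates developer-defined rollups on every write. This is a toy in-memory model under stated assumptions, not the API of CouchDB, BigCouch, or any other real product; `AggregatingStore` and its methods are hypothetical. Note how querying a combination nobody pre-built fails, which is the "limited ad hoc" drawback.

```python
from collections import defaultdict


class AggregatingStore:
    """Toy document store that updates developer-defined rollups on every write.

    Only the field combinations listed in `rollups` can be queried, mirroring
    the "only developer-built combinations" limitation.
    """

    def __init__(self, rollups):
        self.docs = []
        # One counter per pre-defined combination of fields.
        self.rollups = {tuple(dims): defaultdict(int) for dims in rollups}

    def insert(self, doc):
        self.docs.append(doc)
        for dims, counts in self.rollups.items():
            # Missing fields aggregate under None (schema-less documents).
            key = tuple(doc.get(field) for field in dims)
            counts[key] += 1

    def count_by(self, *dims):
        if dims not in self.rollups:
            raise KeyError(f"no pre-built aggregation for {dims}")
        return dict(self.rollups[dims])


store = AggregatingStore(rollups=[("site",), ("site", "channel")])
store.insert({"site": "a.com", "channel": "twitter"})
store.insert({"site": "a.com", "channel": "facebook"})
store.insert({"site": "b.com", "channel": "twitter"})

print(store.count_by("site"))  # served from the pre-built rollup
```

Reads are cheap because the counts already exist; the trade-off is that `store.count_by("channel")` raises `KeyError` until a developer adds that rollup.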
NoSQL + MySQL
•   Pay developers to build an ETL app that loads NoSQL data into MySQL


[Diagram: ETL App, MySQL]

Pros:
•   Less IT $$, since developers aren't "building reports"
•   Rich NoSQL analysis left in place (ETL + NoSQL)
•   Easy ad hoc reporting via commodity BI tools
•   Easier-to-understand data for self-service reports

Cons:
•   Data freshness (24 hrs old)
•   Once in MySQL, no rich NoSQL application use (M/R)
•   BI tool can connect ONLY to data in MySQL, not NoSQL
•   Aggregations still self-managed in MySQL
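A minimal sketch of the nightly "ETL App" step: summarize raw records and load the aggregate into a SQL table that commodity BI tools can query. Python's built-in `sqlite3` stands in for MySQL here, and the records and table name are hypothetical. `INSERT OR REPLACE` is SQLite syntax; MySQL would use `REPLACE INTO` or `INSERT ... ON DUPLICATE KEY UPDATE`.

```python
import sqlite3
from collections import Counter

# Hypothetical output of a nightly Hadoop/NoSQL job: one record per page view.
PAGE_VIEWS = [
    {"day": "2011-08-22", "page": "/home"},
    {"day": "2011-08-22", "page": "/home"},
    {"day": "2011-08-22", "page": "/signup"},
    {"day": "2011-08-23", "page": "/home"},
]


def nightly_etl(records, conn):
    """Summarize raw records and load the aggregate into a SQL table."""
    summary = Counter((r["day"], r["page"]) for r in records)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS daily_page_views ("
        " day TEXT, page TEXT, views INTEGER, PRIMARY KEY (day, page))"
    )
    # Re-running the job for the same day overwrites rather than duplicates.
    conn.executemany(
        "INSERT OR REPLACE INTO daily_page_views (day, page, views) VALUES (?, ?, ?)",
        [(day, page, n) for (day, page), n in summary.items()],
    )
    conn.commit()


# sqlite3 stands in for MySQL; a BI tool would issue plain SQL like this query.
conn = sqlite3.connect(":memory:")
nightly_etl(PAGE_VIEWS, conn)
rows = conn.execute(
    "SELECT page, SUM(views) FROM daily_page_views GROUP BY page ORDER BY page"
).fetchall()
print(rows)
```

The BI tool only ever sees `daily_page_views`, which is why the cons list notes it cannot reach data left in NoSQL and that freshness depends on the ETL schedule.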
NoSQL as ETL Data Source
•   NoSQL treated like any other data source


[Diagram: Informatica, Teradata]

Pros:
•   Allows use of a consolidated BI tool for ad hoc
•   Enables integrated (combined) datasets for reporting
•   Aggregations often "managed"
•   Best-of-breed tools

Cons:
•   ETL development expense
•   Data latency
•   Loss of NoSQL language richness
•   Traditional DW tools are $$
•   Scaling issues with DW database
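The "integrated (combined) datasets" benefit can be sketched by loading a NoSQL extract and rows from another system into one warehouse and joining them with SQL. Python's `sqlite3` plays the warehouse role and a list comprehension plays the ETL tool; the events, CRM rows, and table names are all hypothetical.

```python
import sqlite3

# Hypothetical extract from a NoSQL store (event documents keyed by customer id).
NOSQL_EVENTS = [
    {"customer_id": 1, "event": "login"},
    {"customer_id": 1, "event": "purchase"},
    {"customer_id": 2, "event": "login"},
]
# Hypothetical rows from a separate relational system, e.g. a CRM.
CRM_CUSTOMERS = [(1, "Acme Corp", "enterprise"), (2, "Globex", "smb")]


def load_warehouse(conn):
    """ETL step: land both sources in the same database."""
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, segment TEXT)")
    conn.execute("CREATE TABLE events (customer_id INTEGER, event TEXT)")
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", CRM_CUSTOMERS)
    # Documents become flat rows on the way in.
    conn.executemany(
        "INSERT INTO events VALUES (?, ?)",
        [(e["customer_id"], e["event"]) for e in NOSQL_EVENTS],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
load_warehouse(conn)
# Once both sources sit in one warehouse, a single SQL join combines them.
by_segment = conn.execute(
    "SELECT c.segment, COUNT(*) FROM events e "
    "JOIN customers c ON c.id = e.customer_id "
    "GROUP BY c.segment ORDER BY c.segment"
).fetchall()
print(by_segment)
```

The join is easy only after the ETL has run, which is the data-latency cost listed among the cons.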
NoSQL programs in BI Tools
•   Write a program in the BI tool that flattens the data and outputs it into a report

Pros:
•   Rich use of NoSQL native language
•   Direct, up-to-date access
•   Access to 100% of dataset
•   Leverage "guided" report parameter pages
•   Less expensive than apps

Cons:
•   Developer required to write program ($$)
•   Slow-er (aggs, summaries)
•   Lacks integration with other datasets
•   Still (usually) no ad hoc access
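The "flattens data" step can be sketched as turning nested documents into 2D rows a report engine can render. This is a minimal Python sketch with hypothetical documents and helper names (`flatten`, `to_rows`); real BI tools expose their own scripting hooks for this, which are not shown here.

```python
def flatten(doc, prefix=""):
    """Flatten nested dicts into dotted column names: {"a": {"b": 1}} -> {"a.b": 1}."""
    flat = {}
    for key, value in doc.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=name + "."))
        else:
            flat[name] = value
    return flat


def to_rows(docs, list_field):
    """Emit one flat row per element of a nested list, repeating parent fields.

    Documents without `list_field` produce no rows in this simple sketch.
    """
    rows = []
    for doc in docs:
        parent = flatten({k: v for k, v in doc.items() if k != list_field})
        for item in doc.get(list_field, []):
            rows.append({**parent, **flatten(item, prefix=list_field + ".")})
    return rows


# Hypothetical order document with a nested customer and a list of line items.
ORDERS = [
    {"id": "o1", "customer": {"name": "Acme", "city": "Seattle"},
     "lines": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}]},
]
rows = to_rows(ORDERS, "lines")
for row in rows:
    print(row)
```

Each order line becomes a row with columns such as `customer.name` and `lines.sku`, the simple 2D shape that tables and charts expect.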
NoSQL via BI Database (SQL)
•   Enable NoSQL data access via SQL (gasp!)

[Diagram: Live Query; Cached, 24hr data]

Pros:
•   Easy reports, easy (SQL)
•   Integration with other data
•   ETL is simple INSERT/MERGEs
•   Live, up-to-date access
•   High performance, cached data
•   Ad hoc access to live + cached
•   Aggregations/summaries

Cons:
•   Another system in between
•   Still needs to be refreshed nightly
•   Not all NoSQL richness is available via SQL
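"ETL is simple INSERT/MERGEs" can be sketched as an upsert of each nightly snapshot into a cached SQL table. SQLite's `INSERT ... ON CONFLICT ... DO UPDATE` (SQLite 3.24 or later) stands in for the SQL `MERGE` statement offered by many warehouse databases; the `users` snapshots and the `cached_users` table are hypothetical.

```python
import sqlite3


def merge_snapshot(conn, docs):
    """Upsert documents into the cached SQL table (a stand-in for SQL MERGE)."""
    conn.executemany(
        "INSERT INTO cached_users (user_id, plan, logins) "
        "VALUES (:user_id, :plan, :logins) "
        "ON CONFLICT(user_id) DO UPDATE SET "
        "plan = excluded.plan, logins = excluded.logins",
        docs,
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cached_users (user_id TEXT PRIMARY KEY, plan TEXT, logins INTEGER)"
)

# Night 1 and night 2 snapshots from a hypothetical NoSQL "users" collection.
merge_snapshot(conn, [{"user_id": "u1", "plan": "free", "logins": 3}])
merge_snapshot(conn, [{"user_id": "u1", "plan": "pro", "logins": 5},
                      {"user_id": "u2", "plan": "free", "logins": 1}])

cached = conn.execute(
    "SELECT user_id, plan, logins FROM cached_users ORDER BY user_id"
).fetchall()
print(cached)
```

Existing rows are updated in place and new ones inserted, so the refresh stays a single statement per batch; the cached copy is still only as fresh as the last run.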
Mozilla: NoSQL thru and thru (DB)
•   Socorro Project: Crash reports, optionally sent to Mozilla
•   https://crash-stats.mozilla.com
X: NoSQL via SQL
•   Using “Splunk” (i.e., a commercial, NoSQL-ish data aggregator)
•   Desire to use Tableau for advanced analytics/visualization
Meteor Solutions: NoSQL thru and thru
•   Using Cloudant BigCouch solution (SaaS)
•   High-performance set of multipurpose indices on predefined
    aggregations
•   Up-to-date aggregations/reports
•   Better fit than a relational DB for social media graph structures
•   Custom-built BI applications (dashboards/reports) providing a
    flexible, guided view through the data


[Diagram: Advanced Apps]
A, B, C: NoSQL + MySQL
•   Many, many companies (3 we've worked with)
•   All “web-related” companies (some semi-structured data, but
    mostly volume)
•   Heavy lifting, storage, and “ETL/data preparation” done inside
    Hadoop
•   Push summarized, aggregated data into MySQL for analysis with
    easy dashboarding/BI tools




[Diagram: ETL App, MySQL]

NoSQL Now 2011: Review of Ad Hoc Architectures
