The Briefing Room
As You Seek—How Search Enables Big Data Analytics
Twitter Tag: #briefr The Briefing Room
Welcome
Host: Eric Kavanagh
eric.kavanagh@bloorgroup.com
Mission
• Reveal the essential characteristics of enterprise software, good and bad
• Provide a forum for detailed analysis of today's innovative technologies
• Give vendors a chance to explain their product to savvy analysts
• Allow audience members to pose serious questions... and get answers!
JUNE: Database
July: CLOUD
August: HIGH PERFORMANCE ANALYTICS
September: ANALYTICS
Database
Better SEARCH
Faster INSIGHT
Analyst: Robin Bloor
Robin Bloor is Chief Analyst at The Bloor Group
robin.bloor@bloorgroup.com
MarkLogic
• MarkLogic is an enterprise-class NoSQL database company
• Key features of its database include ACID transactions, horizontal scaling, real-time indexing, high availability, disaster recovery, and government-grade security
• Its platform provides full-text query and search capabilities, application services and big data analytics
David Gorbet
David Gorbet is Vice President of Engineering for MarkLogic, where he also runs the Support organization. Gorbet brings two decades of experience delivering some of the highest-volume applications and enterprise software in the world. Prior to MarkLogic, Gorbet helped pioneer Microsoft’s business online services strategy by founding and leading the SharePoint Online team. Gorbet holds a Bachelor of Applied Science degree in Systems Design Engineering with an additional major in Psychology from the University of Waterloo, and an MBA from the University of Washington Foster School of Business.
MarkLogic: What it is, how it works
David Gorbet, VP Engineering
Slide 2 Copyright © 2013 MarkLogic® Corporation. All rights reserved.
WE ARE THE NEW GENERATION DATABASE
Any Structure Era
“For all your data!”
• Schema-agnostic
• Massive scale
• Query and search
• Analytics
• Application services
• Faster time-to-results
Relational Era
“For all your structured data!”
• Normalized, tabular model
• Application-independent query
• User control
Hierarchical Era
“For your application data!”
• Application- and hardware-specific
Slide 3
Real Value From Big Data
Make The World More Secure
Provide Access To Valuable Information
Create New Revenue Streams
Gain Insights to Increase Market Share
Reduce Bottom Line Expense
Slide 4
The MarkLogic Advantage
Only Enterprise NoSQL Database
• ACID compliant
• Big data search
• High availability
• Replication
• Point-in-time recovery
• Government-grade security
• Real-time your Hadoop
• Proven customer success
Slide 5
How Does It Work?
Schema-agnostic design
Real-time indexing and query
Event processing and alerting
Scale-out shared-nothing cluster topology
Analytics and Visualization
High availability and disaster recovery
Slide 6
Hierarchical Data Model
• MarkLogic Server is a document-centric database
• Supports any-structured data via hierarchical data model
[Figure: two example document trees. A "Document" tree with Title, Author (First, Last), Metadata, and repeated Section nodes; and a "Trade" tree with nodes labeled Cashflows, Party Identifier, Net Payment, Payment Date, Party Reference, Payer Party, trade ID, Payment Amount, and Receiver Party.]
Slide 7
MarkLogic is Schema Agnostic
JSON and XML are self-describing
<article>
  <title>MarkLogic Server:… </title>
  <author>
    <first-name>John</first-name>
    <last-name>Doe</last-name>
  </author>
  <abstract>
    . . . .<company>MarkLogic</company>. . . .
  </abstract>
  <body>
    <section>
      <section>. . . .</section>
    </section>
    <section>…index…</section>
  </body>
  <copyright>Copyright © … </copyright>
</article>
Slide 8
MarkLogic is Schema Agnostic
JSON and XML are self-describing
<article>
  <title>
    MarkLogic Server:…
  <author>
    <first-name>
      John
    <last-name>
      Doe
  <abstract>
    . . . .
    <company>
      MarkLogic
    . . . .
  <body>
    <section>
      <section>
        . . . .
    <section>
      …index…
  <copyright>
    Copyright © …
Slide 9
Universal Index
Which draft articles contain the phrase brown mice?
Term                         Term List
“brown”                      123, 125, 129, 152, 344, 491, …
“mice”                       123, 125, 126, 129, 130, 152, …
“brown mice”                 125, 152, 516, 522, 765, 890, …
STEM “mouse”                 123, 125, 126, 129, 130, 152, …
STEM “brown mouse”           125, 152, 516, 522, 765, 890, …
<article>                    …
<article>/<abstract>         …
<section>/<paragraph>        …
<animal>mouse</animal>       …
<year>1950</year>            …
Collection:Draft             …
Role:Editor + Action:Read    …
…                            …
Document References: 125, 516, 890, …
MarkLogic indexes…
• Words
• Phrases
• Stemming
• Structure
• Values
• Collections
• Security Permissions
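The lookup on this slide can be sketched as an intersection of term lists. This is a conceptual illustration only, not MarkLogic's implementation. The "brown mice" list is the visible part of the slide's list; the slide elides the Collection:Draft list, so the IDs used for it here are hypothetical, chosen only so that the result matches the document references shown on the slide. The names `UNIVERSAL_INDEX`, `intersect_sorted`, and `resolve` are made up for this sketch.

```python
from functools import reduce

# Term lists keyed by index term. The Collection:Draft IDs are hypothetical:
# the slide shows only "…" for that row.
UNIVERSAL_INDEX = {
    'phrase:"brown mice"': [125, 152, 516, 522, 765, 890],
    "collection:Draft": [89, 125, 516, 890, 901],  # hypothetical values
}


def intersect_sorted(a, b):
    """Merge-intersect two ascending lists of document IDs."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def resolve(index, terms):
    """AND together the term lists of every term in the query."""
    lists = [index.get(term, []) for term in terms]
    return reduce(intersect_sorted, lists) if lists else []


draft_brown_mice = resolve(
    UNIVERSAL_INDEX, ['phrase:"brown mice"', "collection:Draft"]
)
```

Because every kind of term (word, phrase, structure, collection, permission) resolves to the same kind of list, adding another constraint to the query is just one more list in the intersection.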
Slide 10
Scalar Queries
Which draft articles that contain the phrase brown mice were written before 2010?
Term                         Term List
“brown”                      123, 125, 129, 152, 344, 491, …
“mice”                       123, 125, 126, 129, 130, 152, …
“brown mice”                 125, 152, 516, 522, 765, 890, …
STEM “mouse”                 123, 125, 126, 129, 130, 152, …
STEM “brown mouse”           125, 152, 516, 522, 765, 890, …
<article>                    …
<article>/<abstract>         …
<section>/<paragraph>        …
<animal>mouse</animal>       …
<year>1950</year>            …
Collection:Draft             …
Role:Editor + Action:Read    …
…                            …
Document References: 125, 516, 890, …
Slide 11
Range Indexes
Map document IDs to values, and vice-versa in a compact in-memory representation
Value   ID
2002    3
2003    10
2004    5
2004    11
2007    4
2007    17
2009    1
2011    8
…       …
ID      Value
1       2009
3       2002
4       2007
5       2004
8       2011
10      2003
11      2004
17      2007
…       …
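A minimal sketch of the two mappings, using the rows visible on the slide. It assumes the value-ordered side is kept sorted so that a range predicate such as "before 2010" becomes a binary search; the actual in-memory layout MarkLogic uses is not described here, and the names `VALUE_TO_ID`, `ID_TO_VALUE`, and `ids_with_value_below` are made up for illustration.

```python
from bisect import bisect_left

# (value, document ID) pairs sorted by value, as in the slide's first table.
VALUE_TO_ID = [
    (2002, 3), (2003, 10), (2004, 5), (2004, 11),
    (2007, 4), (2007, 17), (2009, 1), (2011, 8),
]

# The inverse mapping, as in the slide's second table.
ID_TO_VALUE = {doc_id: value for value, doc_id in VALUE_TO_ID}


def ids_with_value_below(limit):
    """Return the IDs of documents whose value is strictly less than limit."""
    values = [value for value, _ in VALUE_TO_ID]
    cut = bisect_left(values, limit)  # index of the first value >= limit
    return sorted(doc_id for _, doc_id in VALUE_TO_ID[:cut])


before_2010 = ids_with_value_below(2010)
```

The value-to-ID direction answers range predicates; the ID-to-value direction lets a query that already has a set of document IDs fetch their values without opening the documents.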
Slide 12
Geospatial Index: A 2-Dimensional Range Index
Fully composable with all other indexes!
• Built-in support for:
  • Point
  • Box
  • Circle
  • Polygon
  • Complex Polygon
  • Polygon Intersection
  • Polygon Containment
Slide 13
Reverse Indexes (Alerting)
1. Load serialized queries as query documents
2. For a given data document, find all queries that match
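The two steps above can be sketched as follows. This is an illustration of the idea under simplifying assumptions, not MarkLogic's alerting API: each stored query is reduced to a set of required words, and the queries themselves are indexed by term so that an incoming document only touches the queries that share a word with it. The class and method names are hypothetical.

```python
from collections import defaultdict


class ReverseIndex:
    """Stores queries and finds the ones that match a given document."""

    def __init__(self):
        self._queries_by_term = defaultdict(set)  # term -> query IDs
        self._term_count = {}                     # query ID -> number of terms

    def register(self, query_id, required_terms):
        """Step 1: store a query (here, a set of words that must all occur)."""
        terms = {term.lower() for term in required_terms}
        self._term_count[query_id] = len(terms)
        for term in terms:
            self._queries_by_term[term].add(query_id)

    def matching_queries(self, document_text):
        """Step 2: for one document, return every stored query it satisfies."""
        doc_terms = set(document_text.lower().split())
        hits = defaultdict(int)
        for term in doc_terms:
            for query_id in self._queries_by_term.get(term, ()):
                hits[query_id] += 1
        # A query matches when all of its required terms were seen.
        return sorted(q for q, n in hits.items() if n == self._term_count[q])


alerts = ReverseIndex()
alerts.register("rodent-watch", ["brown", "mice"])
alerts.register("cat-watch", ["cats"])
matched = alerts.matching_queries("Three brown mice ran past")
```

The cost of matching depends on the words in the document rather than on the total number of stored queries, which is the property that makes alerting during loads plausible.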
• Can provide real-time alerts during loads
  • With no significant performance impact!
• Can let documents store values as "ranges"
  • Documents about cities self-defining their geo boundaries
  • Person documents defining birthdays as ranges, sequences
• Can power classifiers and "matchmaker" queries
Slide 14
Range Indexes
Map document IDs to values, and vice-versa in a compact in-memory representation
Range Indexes work like a built-in in-memory column store
Value   ID
2002    3
2003    10
2004    5
2004    11
2007    4
2007    17
2009    1
2011    8
…       …
ID      Value
1       2009
3       2002
4       2007
5       2004
8       2011
10      2003
11      2004
17      2007
…       …
Slide 15
Facets and Aggregation
Slide 16
Interactive Visualization
Slide 17
In-database Analytic Functions
Leverage ready-made analytic built-ins for commonly-used numeric applications
• Variance
• Covariance
• Correlation
• Standard deviation
• Linear model
• Median
• Mode
• Percentile
• Rank
• Percent-rank
Benefits
• Faster analytics-based application development
• Supports more users & more data
• Eliminates costs associated with writing custom code
Slide 18
User-defined Functions
class InfluenceRank : public AggregateUDF
{
public:
  // Accumulator state: sum, sum of squares, and count
  struct Value {
    double sum, sum_sq, count;
    Value() : sum(0), sum_sq(0), count(0) {}
  } value;
public:
  AggregateUDF* clone() const { return new InfluenceRank(*this); }
  void close() { delete this; }
  void start(Sequence&, Reporter&) {}
  // Emit the final result from the accumulated state
  void finish(OutputSequence& os, Reporter& reporter);
  // Fold a batch of index values into this object's state
  void map(TupleIterator& values, Reporter& reporter);
  // Merge another instance's state into this one
  void reduce(const AggregateUDF* _o, Reporter& reporter);
  // Serialize and deserialize the state
  void encode(Encoder& e, Reporter& reporter);
  void decode(Decoder& d, Reporter& reporter);
};
Slide 19
In-database MapReduce
[Diagram: flow through the UDF methods start, map, reduce, encode, decode, reduce, finish]
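The call sequence in the diagram can be sketched in a few lines. This is a conceptual illustration, not the MarkLogic UDF interface: the method names mirror those on the previous slide, JSON stands in for the encoder and decoder, and population variance is an assumed example result, since the deck does not show what InfluenceRank actually computes. The `Partial` class and `run` function are hypothetical.

```python
import json
from dataclasses import dataclass


@dataclass
class Partial:
    """Accumulator holding the same three fields as the slide's Value struct."""
    total: float = 0.0
    total_sq: float = 0.0
    count: int = 0

    def map(self, values):
        """Fold one partition's values into this accumulator."""
        for v in values:
            self.total += v
            self.total_sq += v * v
            self.count += 1

    def reduce(self, other):
        """Merge another accumulator into this one."""
        self.total += other.total
        self.total_sq += other.total_sq
        self.count += other.count

    def encode(self):
        """Serialize the state so it can be shipped to another process."""
        return json.dumps([self.total, self.total_sq, self.count])

    @classmethod
    def decode(cls, payload):
        """Rebuild an accumulator from its serialized form."""
        total, total_sq, count = json.loads(payload)
        return cls(total, total_sq, count)

    def finish(self):
        """Turn the merged state into a result (assumed: population variance)."""
        if self.count == 0:
            return None
        mean = self.total / self.count
        return self.total_sq / self.count - mean * mean


def run(partitions):
    """map on each partition, encode/decode the partials, reduce, then finish."""
    coordinator = Partial()
    for values in partitions:
        worker = Partial()
        worker.map(values)
        coordinator.reduce(Partial.decode(worker.encode()))
    return coordinator.finish()


population_variance = run([[1.0, 2.0], [3.0, 4.0]])
```

The accumulator carries only values that can be merged by addition, so partial results can be combined in any order before the final step.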
Slide 20
SQL and BI Tools
ODBC
SQL
Range Indexes
Slide 21
SQL and BI Tools
Slide 22
HA/DR Features of MarkLogic
Slide 23
MarkLogic 6
Powerful: Everything you need to deliver business value
• Flexible Indexes
• Full Text Search
• Schema-Agnostic
• Scalable
• Analytic Functions
• Hadoop Distribution
• Alerting & Event Processing
• Geospatial Query
• In-database MapReduce
• Visualization Widgets
Trusted: Enterprise-ready for mission-critical apps
• Transactions
• Role-based Security
• Automated Failover
• Replication
• Journal Archiving
• Point-in-time Recovery
• Database Rollback
• Backup/Restore
• Distributed Transactions
• Super-clusters
Accessible: Leverage existing tools, knowledge, skills
• REST & Java APIs
• JSON Storage
• Application Builder
• Information Studio
• Hadoop Connector
• Content Pump
• BI Integration
• SQL Support
• Monitoring & Management
• OS Support
Slide 24
Any Questions?
Slide 25
What is Semantics Technology?
Slide 26
Elasticity
Aligning infrastructure + demand, continually
• New tools to characterize and monitor the resource requirements of your applications and loads.
• Dynamic provisioning system that can add or subtract resources on-the-fly to match the loads.
• Distributed & virtualized environments including VMware, Amazon AWS and Hadoop are supported to scale-out.
• Make the cloud a first-class citizen: Use Hadoop HDFS or Amazon S3 for backup
Slide 27
Tiered storage
[Diagram labels: ML, SSD, local, HDFS, amzn s3]
Benefits
• Keep data on tiers appropriate to access needs = lower costs
• Detach and reattach storage when needed. Fewer compute nodes required = lower costs
• Leverage Hadoop HDFS investment
Choose infrastructure based on value of data stored.
• 100% online with different tiers at different SLAs/topologies
• On-line/near-line mix utilizing mount on-demand and dynamic node spin-up.
Tiered Storage New Constructs
• Range partitions by Date/Scalar: manage group of forests by range (“Q1” or “1990-1995”)
• Super Databases federate queries across multiple databases
Slide 28
Tiered Storage
[Chart: Total Size (TB), Total Cost ($000), and Effective Unit Cost ($/GB) by tier. Tier labels: Operational, Compliance, Analytic. Values as extracted: 96, 504, 1,044; 592, 2,066, 2,080; $25, $4, $1.50]
Perceptions & Questions
Analyst: Robin Bloor, The Bloor Group
The Bloor Group
Database Innovation
Database used to be a “zero-innovation market.” Now it is the opposite.
• Traditional (relational) database is now seen (rightly) as inadequate in many respects
• Big Data is, mainly, new data posing new problems
• New products are emerging and some older products are being given a make-over (and gaining popularity)
• Hadoop has changed perceptions and thinking about database
Multiple Database Roles
HAVE INCREASED SIGNIFICANTLY…
The Analytics Issue
The Origin of Big Data
NoSQL Confusion
As the graph indicates, NoSQL is a very confusing descriptor.
The important question is: WHAT CAN A GIVEN DATABASE ACTUALLY DO?
The Joys and Sorrows of SQL
SQL:
• Very good for set manipulation
• Works for OLTP and many query environments
• Not good for nested data structures (documents, web pages, etc.)
• Not good for ordered data sets
• Not good for data graphs (networks of values)
• In my view we have reached a situation where there will be multiple “data engines.” Is that MarkLogic’s view?
• Specifically, are there data structures or database contexts for which MarkLogic is inappropriate?
• What new features or capabilities are on the MarkLogic roadmap?
• In your view, is the “age of the data warehouse” over?
• Which sectors/businesses are currently in MarkLogic’s “sweet spot”?
• Data analytics involves much more than having analytical functions in the database. It is more than 50% data prep (merging, cleansing, joining, transformation, etc.). How does MarkLogic accommodate that?
• What is MarkLogic’s attitude to the cloud? Specifically, where would it recommend cloud deployment?
Upcoming Topics
July: CLOUD
August: HIGH PERFORMANCE ANALYTICS
September: ANALYTICS
www.insideanalysis.com
Thank You for Your Attention
