Enterprise Architect, MongoDB
Buzz Moschetti
buzz.moschetti@mongodb.com
#ConferenceHashTag
Creating a Single View Part 2:
Data Design & Loading
Strategies
Who Is Talking To You?
• Yes, I use “Buzz” on my business cards
• Former Investment Bank Chief Architect at
JPMorganChase and Bear Stearns before that
• Over 27 years of designing and building systems
• Big and small
• Super-specialized to broadly useful in any vertical
• “Traditional” to completely disruptive
• Advocate of language leverage and strong factoring
• Inventor of perl DBI/DBD
• Still programming – using emacs, of course
What Is He Going To Talk About?
Historic Challenges
New Strategy for Success
Technical examples and tips
Overview &
Data Analysis
Data Design &
Loading
Strategies
Securing Your
Deployment
Creating A Single View
Part
1
Part
2
Part
3
Historic Challenges
It’s 2014: Why is this still hard to
do?
• Business / Technical / Information Challenges
• Missteps in evolution of data transfer technology
A X
We wish this “just worked”
A
Query objects from A
with great performance
Query objects from B
with great performance
X
Query objects from
merged A and B with
great performance
B
…but Beware The Blue Arrow!
A X
• Extracting many tables into many files
• Some tables require more than one file to capture representation
• Encoding/formatting clever tricks
• Reconciliation
• Different extracts for different consumers
• Different extracts for different versions of data to same consumer
Loss of fidelity exposed
class Product {
String productName;
List<Features> ff;
Date introDate;
List<Date> versDates;
int[] unitBundles;
//…
}
widget1,,3,,good texture,retains value,,,20142304,102.3,201401
widget2,XS,6,,,,not fragile,,,20132304,73,87653
widget3,XT,,,4,,dense,shiny,mysterious,,,19990304,73,87653,,
widget4,,,3,4,,,,,,20040101,,999999,,
A
ORM
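To make the loss of fidelity concrete, here is a minimal Python sketch that parses the first two CSV rows from this slide with the standard `csv` module. Only the two rows come from the deck; the variable names are illustrative.

```python
import csv
import io

# The first two rows from the slide, verbatim.
raw = (
    "widget1,,3,,good texture,retains value,,,20142304,102.3,201401\n"
    "widget2,XS,6,,,,not fragile,,,20132304,73,87653\n"
)

rows = list(csv.reader(io.StringIO(raw)))

# Every field comes back as a string: dates, numbers, and list members alike.
all_strings = all(isinstance(field, str) for row in rows for field in row)

# The two rows do not even agree on how many columns there are, so the
# consumer has to guess where the feature list ends and the dates begin.
field_counts = [len(row) for row in rows]
```

The types and the list structure that `Product` had in the source system are gone by the time the consumer sees the file.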
What happened to XML?
class Product {
String productName;
List<Features> ff;
Date introDate;
List<Date> versDates;
int[] unitBundles;
//…
}
<product>
<name>widget1</name>
<features>
<feature>
<text>good texture</text>
<type>A</type>
</feature>
</features>
<introDate>20140204</introDate>
<versDates>
<versDate>20100103</versDate>
<versDate>20100601</versDate>
</versDates>
<unitBundles>1,3,9</unitBun…
XML: Created More Issues Than
Solved
<product>
<name>widget1</name>
<features>
<feature>
<text>good texture</text>
<type>A</type>
</feature>
</features>
<introDate>20140204</introDate>
<versDates>
<versDate>20100103</versDate>
<versDate>20100601</versDate>
</versDates>
<unitBundles>1,3,9</unitBun…
• No native handling of
arrays
• Attribute vs. nested tag
rules/conventions widely
variable
• Generic parsing (DOM)
yields a tree of Nodes of
Strings – not very friendly
• SAX is fast but too low
level
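The first and third bullets can be seen in a few lines of Python. This is a sketch only: the XML document is a completed, shortened version of the slide's truncated example, written for illustration, and it uses the standard library's `xml.etree.ElementTree` parser.

```python
import xml.etree.ElementTree as ET

# A well-formed document modeled on the slide's (truncated) example.
doc = """
<product>
  <name>widget1</name>
  <versDates>
    <versDate>20100103</versDate>
    <versDate>20100601</versDate>
  </versDates>
  <unitBundles>1,3,9</unitBundles>
</product>
"""

root = ET.fromstring(doc)

# Repeated child tags are one convention for arrays...
vers_dates = [e.text for e in root.find("versDates").findall("versDate")]

# ...and a delimited string is another. The parser knows about neither,
# so the consumer must split and convert by hand.
unit_bundles = [int(s) for s in root.find("unitBundles").text.split(",")]
```

Both "arrays" arrive as text, and each needs its own hand-written convention to recover.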
… and it eventually became this
<p name="widget1" ftxt1="good texture" ftyp1="A" idt="20140203" …
<p name="widget2" ftxt1="not fragile" ftyp1="A" idt="20110117" …
<p name="widget3" ftxt1="dense" idt="20140203" …
<p name="widget4" idt="20140203" versD="20130403,20130104,20100605" …
• Short, cryptic, conflated tag names
• Everything is a string attribute
• Mix of flattened arrays and delimited strings
• Irony: org.xml.sax.Attributes easier to deal with than rest of
DOM
Schema Change Challenges:
Multiplied & Concentrated!
X
Alter table(s)
split() more data
A
Alter table(s)
Extract more data
LOE = x1
Alter table(s)
split() more data
Alter table(s)
split() more data
B
Alter table(s)
Extract more
data
LOE = x2
C
Alter table(s)
Extract more
data
LOE = x3
LOE = \sum_{i=1}^{n} x_i + f(n)
where f() is nonlinear wrt n
SLAs & Security: Tough to
Combine
A
B
User 1 entitled to see X
User 2 entitled to see Y
User 1 entitled to see Z
User 2 entitled to see V
X
Entitlements managed per-
system/per-application here….
…are lost in the
low-fidelity transfer
of data….
…and have to be
reconstituted here
…somehow…
Solving The Problem with
mongoDB
What We Are Building Today
Overall Strategy For Success
• Let the source systems’ entities drive the
data design, not the physical database
• Capture data in full fidelity
• Perform cross-ref and additional logic at the
single point of view
Don’t forget the power of the API
class Product {
String productName;
List<Features> ff;
Date introDate;
List<Date> versDates;
int[] unitBundles;
//…
}
If you can, avoid files altogether!
Haskell
But if you are creating files: emit
JSON
class Product {
String productName;
List<Features> ff;
Date introDate;
List<Date> versDates;
int[] unitBundles;
//…
}
{
  "name": "widget1",
  "features": [
    { "text": "good texture",
      "type": "A" }
  ],
  "introDate": "20140204",
  "versDates": [
    "20100103", "20100601"
  ],
  "unitBundles": [1,3,7,9]
  // …
}
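A feeder system can produce this shape directly from its own objects. The sketch below is one possible way to do it in Python, assuming the JSON keys shown on the slide; the `Feature` class and the `to_json_line` helper are hypothetical names introduced for illustration.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class Feature:
    text: str
    type: str


@dataclass
class Product:
    name: str
    features: List[Feature] = field(default_factory=list)
    introDate: str = ""
    versDates: List[str] = field(default_factory=list)
    unitBundles: List[int] = field(default_factory=list)


def to_json_line(product: Product) -> str:
    """Serialize one product as a single line of JSON."""
    # asdict() recurses into nested dataclasses and lists.
    return json.dumps(asdict(product))


widget = Product(
    name="widget1",
    features=[Feature(text="good texture", type="A")],
    introDate="20140204",
    versDates=["20100103", "20100601"],
    unitBundles=[1, 3, 7, 9],
)
line = to_json_line(widget)
```

Lists stay lists and numbers stay numbers; nothing has to be flattened to fit a column layout.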
Let The Feeding System Express
itself
A
B
C
{ "name": "widget1",
  "features": [
    { "text": "good texture",
      "type": "A" }
  ]
}
{ "myColors": ["red","blue"],
  "myFloats": [ 3.14159, 2.71828 ],
  "nest": { "as": { "deep": true }}
}
{ "myBlob": { "$binary": "aGVsbG8K"},
  "myDate": { "$date": "20130405" }
}
What if you forgot something?
{
  "name": "widget1",
  "features": [
    { "text": "good texture",
      "type": "A" }
  ],
  "introDate": "20140204",
  "versDates": [
    "20100103", "20100601"
  ],
  "versMinorNum": [1,3,7,9]
  // …
}
{
  "name": "widget1",
  "features": [
    { "text": "good texture",
      "type": "A" }
  ],
  "coverage": [ "NY", "NJ" ],
  "introDate": "20140204",
  "versDates": [
    "20100103", "20100601"
  ],
  "versMinorNum": [1,3,7,9]
  // …
}
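On the consuming side, a field that was added later can simply be treated as optional. A minimal sketch, assuming the `coverage` field from the second document above; the `summarize` function is hypothetical.

```python
def summarize(doc: dict) -> str:
    """Build a one-line summary; `coverage` is treated as optional."""
    coverage = doc.get("coverage", [])  # absent in older documents
    where = ", ".join(coverage) if coverage else "unspecified"
    return f"{doc['name']} (coverage: {where})"


old_doc = {"name": "widget1", "versMinorNum": [1, 3, 7, 9]}
new_doc = {"name": "widget1", "coverage": ["NY", "NJ"], "versMinorNum": [1, 3, 7, 9]}
```

Documents written before and after the change can sit side by side, and the same code reads both.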
The Joy (and value) of mongoDB
A
Alter table(s)
Extract more
data
LOE = .25x1
B
Alter table(s)
Extract more data
LOE = .25x2
C
Alter table(s)
Extract more data
LOE = .25x3
LOE = O(1)
Helpful Hint: Use the APIs
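# Assumes `curs` is an open DB-API cursor and `coll` is a PyMongo collection.
curs.execute("select A.did, A.fullname, B.number from contact A "
             "left outer join phones B on A.did = B.did order by A.did")
lastDID = None
contact = None
for q in curs.fetchall():
    if q[0] != lastDID:
        # New contact: flush the previous one before starting the next.
        if lastDID is not None:
            coll.insert(contact)  # insert_one() in PyMongo 3 and later
        contact = {"did": q[0], "name": q[1]}
        lastDID = q[0]
    if q[2] is not None:
        if 'phones' not in contact:
            contact['phones'] = []
        contact['phones'].append({"number": q[2]})
# Flush the final contact.
if lastDID is not None:
    coll.insert(contact)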
{
  "did" : "D159308",
  "phones" : [
    {"number": "1-666-444-3333"},
    {"number": "1-999-444-3333"},
    {"number": "1-999-444-9999"}
  ],
  "name" : "Buzz"
}
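The same row-to-document grouping can be tried without a database. The sketch below is an alternative formulation using `itertools.groupby`, not the code from the slide; `rows_to_contacts` is a hypothetical helper and the second contact is made-up sample data.

```python
from itertools import groupby
from operator import itemgetter


def rows_to_contacts(rows):
    """Yield one contact document per `did` from rows sorted by `did`.

    Each row is a (did, fullname, number) tuple, as the join on the slide
    would return; number is None when a contact has no phones.
    """
    for did, group in groupby(rows, key=itemgetter(0)):
        group = list(group)
        contact = {"did": did, "name": group[0][1]}
        phones = [{"number": number} for _, _, number in group if number is not None]
        if phones:
            contact["phones"] = phones
        yield contact


rows = [
    ("D159308", "Buzz", "1-666-444-3333"),
    ("D159308", "Buzz", "1-999-444-3333"),
    ("D159309", "Steve", None),  # made-up row: a contact with no phones
]
contacts = list(rows_to_contacts(rows))
```

Each yielded dictionary is ready to hand to the collection's insert call. Like the loop on the slide, this relies on the `order by A.did` in the query, because `groupby` only merges adjacent rows.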
Helpful Hint: Declare Types
Use mongoDB conventions for dates and binary data:
{"dateA": {"$date": "2014-05-16T09:42:57.112-0000"}}
{"dateB": {"$date": 1400617865438}}
{"someBlob": {"$binary": "YmxhIGJsYSBibGE=", "$type": "00"}}
Helpful Hint: Keep the file flexible
Use CR-delimited JSON (one document per line):
{ "name": "buzz", "locale": "NY"}
{ "name": "steve", "locale": "UK"}
{ "name": "john", "locale": "NY"}
…instead of a giant array:
records = [
  { "name": "buzz", "locale": "NY"},
  { "name": "steve", "locale": "UK"},
  { "name": "john", "locale": "NY"}
]
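One reason the line-per-document form is easier on the loader: it can be read one record at a time. A minimal Python sketch, with `read_documents` as a hypothetical helper and an in-memory stream standing in for the file:

```python
import io
import json


def read_documents(stream):
    """Yield one parsed document per non-blank line of `stream`."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


feed = io.StringIO(
    '{"name": "buzz", "locale": "NY"}\n'
    '{"name": "steve", "locale": "UK"}\n'
    '{"name": "john", "locale": "NY"}\n'
)
ny_names = [d["name"] for d in read_documents(feed) if d["locale"] == "NY"]
```

Only one line is held in memory at a time, whereas the giant-array form usually has to be parsed whole before the first record is available.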
Helpful Hint: Don’t be afraid of metadata
Use a version number in each document:
{ "v": 1, "name": "buzz", "locale": "NY"}
{ "v": 1, "name": "steve", "locale": "UK"}
{ "v": 2, "name": "john", "region": "NY"}
…or get fancier and use a header record:
{ "vers": 1, "creator": "ID", "createDate": …}
{ "name": "buzz", "locale": "NY"}
{ "name": "steve", "locale": "UK"}
{ "name": "john", "locale": "NY"}
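A per-document version number lets the consumer branch on shape. The sketch below assumes, purely for illustration, that version 2 renamed `locale` to `region` with the same meaning (the slide shows the rename but does not say what it means); `normalize` is a hypothetical helper.

```python
def normalize(doc: dict) -> dict:
    """Return a copy of `doc` in the v1 shape, whatever version it arrived in."""
    version = doc.get("v", 1)
    out = {k: v for k, v in doc.items() if k != "v"}
    if version == 2:
        # Assumption: v2 renamed "locale" to "region" with the same meaning.
        out["locale"] = out.pop("region")
    elif version != 1:
        raise ValueError(f"unknown document version: {version}")
    return out


docs = [
    {"v": 1, "name": "buzz", "locale": "NY"},
    {"v": 2, "name": "john", "region": "NY"},
]
normalized = [normalize(d) for d in docs]
```

Unknown versions fail loudly rather than being loaded in a shape nobody expects.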
Helpful Hints: Use batch ID
{ "vers": 1, "batchID": "B213W", "createDate":…}
{ "name": "buzz", "locale": "NY"}
{ "name": "steve", "locale": "UK"}
{ "name": "john", "locale": "NY"}
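One way a loader might use the header is to stamp every record with the batch ID as it is read. This is a sketch under that assumption; `read_batch` is a hypothetical helper, and the header omits `createDate` because the slide elides its value.

```python
import io
import json


def read_batch(stream):
    """Read a header record, then yield each data record tagged with its batchID."""
    header = json.loads(next(stream))
    batch_id = header["batchID"]
    for line in stream:
        if line.strip():
            record = json.loads(line)
            record["batchID"] = batch_id
            yield record


feed = io.StringIO(
    '{"vers": 1, "batchID": "B213W"}\n'
    '{"name": "buzz", "locale": "NY"}\n'
    '{"name": "steve", "locale": "UK"}\n'
)
records = list(read_batch(feed))
```

With the tag stored on every document, a failed or partial load could be backed out by deleting everything that carries that batch ID, which is one way to manage without multi-document transactions.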
Now that we have the data…
You’re well on your way to a single view
consolidation…but first:
– Data Work
• Cross-reference important keys
• Potential scrubbing/cleansing
– Software Stack Work
You’ve Built a Great Data Asset;
leverage it!
DON’T Build This!
Giant
Glom
Of
GUI-biased
code
http://yourcompany/yourapp
Build THIS!
http://yourcompany/yourapp
Data Access Layer
Object Construction Layer
Basic Functional Layer
Portal Functional Layer
GUI adapter Layer
Web Service Layer
Other Regular
Performance
Applications
Higher Performance
Applications
Special
Generic Applications
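The bottom three layers of the stack might look something like the sketch below. Everything here is an assumption made for illustration: the class names echo the layer names on the slide, a plain dictionary stands in for the database, and `phone_count` is a made-up business function.

```python
from dataclasses import dataclass, field
from typing import Dict, List


class DataAccessLayer:
    """Knows how to fetch raw documents; here a dict stands in for the database."""

    def __init__(self, store: Dict[str, dict]):
        self._store = store

    def fetch(self, did: str) -> dict:
        return self._store[did]


@dataclass
class Contact:
    did: str
    name: str
    phones: List[str] = field(default_factory=list)


class ObjectConstructionLayer:
    """Turns raw documents into domain objects."""

    def __init__(self, dal: DataAccessLayer):
        self._dal = dal

    def contact(self, did: str) -> Contact:
        doc = self._dal.fetch(did)
        phones = [p["number"] for p in doc.get("phones", [])]
        return Contact(did=doc["did"], name=doc["name"], phones=phones)


class BasicFunctionalLayer:
    """Business functions shared by every application, GUI or not."""

    def __init__(self, objects: ObjectConstructionLayer):
        self._objects = objects

    def phone_count(self, did: str) -> int:
        return len(self._objects.contact(did).phones)


store = {"D159308": {"did": "D159308", "name": "Buzz",
                     "phones": [{"number": "1-666-444-3333"}]}}
functions = BasicFunctionalLayer(ObjectConstructionLayer(DataAccessLayer(store)))
```

A GUI adapter or web service layer would call `functions` rather than the database, and a higher-performance application could enter the stack at a lower layer without going through GUI-biased code.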
What Is Happening Next?
Access Control
Data Protection
Auditing
Overview &
Data Analysis
Data Design &
Loading
Strategies
Creating A Single View
Part
1
Part
2
Securing Your
Deployment
Part
3
Enterprise Architect, MongoDB
Buzz Moschetti
buzz.moschetti@mongodb.com
#ConferenceHashTag
Thank You

Creating a Single View Part 2: Loading Disparate Source Data and Creating a Single Enterprise-Wide View


Editor's Notes

  • #6: AND WHY ARE WE DOING IT AT ALL? Federation? Managed QoS? Because traditional RDBMS dynamics make it difficult to serve a number of access patterns well. The single most important part of this that will make you successful is the simplest, and it is part of the mongoDB data environment.
  • #8: The fidelity of data in the ETL fabric is typically the lowest common denominator (LCD). CSV still carries the day because it is easy to make and technically easy to parse (but difficult to change or to express things in). XML / XSD is “too hard” to technically make and parse/consume, and harder still to build consistent list/array conventions around. Anecdote about getting skewered by the arrow. The arrow is disingenuous! This is LOSS OF FIDELITY.
  • #9: Most people use an ORM to get from DB to good objects – and mongoDB has a story around that too! But for the moment, assume we use it.
  • #10: XML was supposed to be The Thing.
  • #11: XML / XSD is “too hard” to technically make and parse/consume, and harder still to build consistent list/array conventions around. No one runs schema validation in production because of performance. Schemas became too complicated anyway. JAXB and JAXP are compile-time bound.
  • #12: XML set us back about 10 years. It leads to this: “Can you please just send me a CSV again?”
  • #13: Changes to data in a source system imply a DB schema upgrade in the data hub; with X source systems, this starts to become unscalable. Hub data storage scalability. In summary: traditionally, common data hubs are harder to manage than the sum of their source systems, which themselves are not so easy to manage! Remember this formula; we’ll see how we improve upon this in just a bit.
  • #14: Data entitlement is implicit to system access. Fast-moving businesses cannot be held up by naturally slower-moving ones. (Andreas will cover this in greater detail later.)
  • #17: How did we get here; examples from the past? Anecdotal reinforcement. Knowing the legacy problems and experience, here are the 3 things that work. Don’t think about transferring tables; think about transferring products, logs, trades, customers.
  • #18: A zillion APIs. This does not necessarily mean REALTIME. We can do realtime with “microbatching”. We can do EOD batch with a file-free API. It’s all about how producer and consumer agree to capture the data; we’ll see more about this context later in the presentation. Our most successful customers do this or use microbatching. The Green Arrow.
  • #19: JSON is the new leader among highly interoperable, ASCII structured data formats. ASCII interop is critical, so GPB, Avro, and other formats are out. Better than XML because: strings, numbers, maps, and arrays are natively supported; the data model is simpler (no attributes or unnested content); it is easier to construct programmatically. (Much!) better than CSV because: rich detail is preserved; content can be expanded later without struggling with “comma hell”. Warning: JSON does NOT have Date or binary (BLOB) types! We’ll come back to a strategy on that….
  • #20: The Basic Rules: Let feeder systems drive the data design. Do not dilute, format, or otherwise mess with the data.
  • #21: JUST ADD IT. Not talking about doubles turning into lists of dates – but there’s a hint coming up that could help there too.
  • #22: MUCH easier to update a JSON feed handler for new data. Essentially constant time to ingest new or changed data!
  • #23: Build the rich structure! You have to do this anyway to produce a JSON file, so if you can, go the extra distance and just directly insert the content. Don’t worry about transactions; you should be using batchID, which we’ll get to in a moment.
  • #24: mongoDB does not extend JSON per se. Rather, within the JSON spec, we have a structural-naming convention that allows us to clearly hint at the true intended type of the string value.
  • #25: Easy to grep, and to use jq too. Std unix utils work nicely too. Same format as mongoimport and mongoexport. Does not force a large memory footprint on the loader.
  • #26: Don’t be afraid to make mistakes – for the same reason we explored on slide 21.
  • #27: Context is an identifier for a set of data: ABC123. Dates are dangerous: for global systems, two (or more!) local dates are possible, and the system processing date can be misleading. Context has additional benefits: it is easy to associate other information, like a functional ID, with the context ID.
  • #28: Single View of Customer does not mean Single Technical visualization of Customer thru GUI!!
  • #31: Examples: Fin svc who uses this stack and how.
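The type-hinting convention from the “Declare Types” slide (note #24 above) can be read back with a small decoder. The sketch below uses only the Python standard library and handles just the two wrappers shown in the deck; `decode_types` is a hypothetical helper. In practice the `bson.json_util` module that ships with PyMongo is the more likely tool for this.

```python
import base64
import json
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def decode_types(obj: dict):
    """object_hook: turn {"$date": ...} and {"$binary": ...} into Python values."""
    if "$date" in obj:
        value = obj["$date"]
        if isinstance(value, int):
            # Integer form: milliseconds since the Unix epoch.
            return EPOCH + timedelta(milliseconds=value)
        # String form, e.g. 2014-05-16T09:42:57.112-0000
        return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%f%z")
    if "$binary" in obj:
        return base64.b64decode(obj["$binary"])
    return obj


text = """
{"dateA": {"$date": "2014-05-16T09:42:57.112-0000"},
 "dateB": {"$date": 1400617865438},
 "someBlob": {"$binary": "YmxhIGJsYSBibGE=", "$type": "00"}}
"""
doc = json.loads(text, object_hook=decode_types)
```

The file is still plain JSON that any parser accepts; the wrapper keys only tell a cooperating consumer which strings and numbers are really dates or bytes.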