Copyright © 2015, SAS Institute Inc. All rights reserved.
DATA MANAGEMENT FOR
HIGH-PERFORMANCE ANALYTICS
DAN SOCEANU
SENIOR SOLUTIONS ARCHITECT
DATA MANAGEMENT
BEFORE WE BEGIN SAS ACKNOWLEDGEMENTS
Ron Agresta, Product Director, Data Management
Lisa Dodson, Global Technology Practice Manager, Data Management
David Pope, Pre-Sales Manager, Energy & Manufacturing
DATA MANAGEMENT WHY ARE WE HERE?
• Data is rarely fit for analytic purposes
• End-users are overwhelmed
o What data do I use?
o How do I load data?
o How can I find only the data I need?
• Real-time needs
• The rise of “self-service analytics”
CAN YOU LEVERAGE OPEN SOURCE ANALYTICS?
CAN YOU SCALE YOUR DATA AND YOUR ANALYTICS?
DO YOU GROW A CULTURE OF INNOVATION?
CAN YOU ANALYZE ALL OF YOUR DATA?
CAN YOU MODERNIZE YOUR LEGACY BI STRATEGY?
DATA MANAGEMENT BRIDGING THE GAP
[Figure: Data Management for High Performance Analytics shown between Data Sources (IoT, Operational, Unstructured, Web, Text) and High Performance Analytics (Optimization, Forecasting, Mining).]
DATA MANAGEMENT CONVERGENCE OF DATA PREP, ANALYTICAL PROCESSING AND PROVISIONING
[Figure: Data Access Tier, Data Preparation Tier, Analytical Tier and Visualization Tier, shown alongside Access, Preparation, Analytics and Visualization.]
DATA MANAGEMENT DATA FLOW FOR HIGH PERFORMANCE ANALYTICS
[Figure: data flow diagram. Labels: SOURCES (Operational, MQ, XML, Cloud); ACCESS (Read, ETL); Data Management (Integration, Quality, Exploration, Monitoring, MDM, Repository); Data Warehouse; Analytical Data Warehouse; Data Marts; Model Development; High Performance Analytics; Dynamic Reporting; Dynamic Visualization.]
ANALYTICS HISTORICAL VS. ADVANCED
Descriptive
o What happened?
o When?
o Why?
• Frequency Distributions
• Correlation Measures
• Event Study
• Association Rules
Predictive
o What will happen?
o When?
o Why?
o How does that affect us?
o What actions should I take?
• Estimation & Forecasting
• Segmentation
• Optimization
ANALYTICS
HIGH-PERFORMANCE ANALYTICS SAS SOLUTIONS
SAS High-Performance Data Mining
Predictive models using thousands of variables to produce more accurate and timely insights
SAS High-Performance Econometrics
Analytical models using complete data, not just a subset
SAS High-Performance Optimization
Model and solve optimization problems that are very large or cumbersome to solve
SAS High-Performance Statistics
Statistical models using big data to produce more accurate and timely insights
SAS High-Performance Text Mining
Better understand communications and create new value from big text data
HIGH-PERFORMANCE ANALYTICS SAS ANALYTIC PROCESSING APPROACHES
Traditional
Move data from the source to the SAS server, process it and write back the results (single server or SAS Grid Manager).
In-Database
Move SAS processing to the data source and let it run under the control of the source environment (e.g. a relational database or Hadoop). The analytic code executes in the database process.
In-memory “Alongside” the Database
Move SAS processing to the data source but allow a SAS process to run “alongside”. The analytic processes and the database processes are co-located and share resources.
In-memory “Next to” the Database
Move data from the source to a dedicated SAS environment for processing. This does not require making a physical copy of the data before processing and, once processing is complete, the data need not be kept in the dedicated SAS environment. This separates the resources used for data storage and processing from those used for SAS advanced analytical processing.
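The contrast between the traditional and in-database approaches can be sketched in a few lines of Python. This is only an illustration, not how SAS implements either approach: the built-in `sqlite3` module stands in for the data source, and the `sales` table, its columns and both function names are hypothetical.

```python
import sqlite3

# Hypothetical in-memory database standing in for the data source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 10.0), ("east", 5.0), ("west", 7.5)],
)


def totals_traditional(conn):
    """Pull every row to the client, then aggregate in the client process."""
    totals = {}
    for region, amount in conn.execute("SELECT region, amount FROM sales"):
        totals[region] = totals.get(region, 0.0) + amount
    return totals


def totals_in_database(conn):
    """Push the aggregation to the database; only summary rows move."""
    query = "SELECT region, SUM(amount) FROM sales GROUP BY region"
    return dict(conn.execute(query).fetchall())


traditional = totals_traditional(conn)
pushed_down = totals_in_database(conn)
```

Both functions return the same totals; the difference is how much data crosses the boundary between the source and the analytic process.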
DATA MANAGEMENT VS. DATA PREPARATION?
Business Need
• Support analytical methods for decision making, use cases and required actions
Data Governance
• Gap assessment; people, process and technology
• Auditability, traceability, automated rules, monitoring, collaboration
Productivity
• Data preparation, provisioning, reporting
DATA MANAGEMENT
DATA MANAGEMENT VS. DATA PREPARATION?
Business Need
• Support analytical methods for decision making, use cases and required actions
Data Governance
• Gap assessment; people, process and technology
• Auditability, traceability, automated rules, monitoring, collaboration
Productivity
• Data preparation, provisioning, reporting
DATA MANAGEMENT DATA PREPARATION
Identify
• Profile
• Data types
• Numeric
• Character
• Contextual
• Cardinality
Access
• ETL
• Batch
• Real-time
• Latency
• Data Movement
• Connectivity
• Data Sources
Data Quality
• De-duplicate
• Standardize
• Missing values
• Imputation
• Enrich
• Binning
• Matching
• Identify anomalies
Reshape
• Wide & flat
• Long & lean
• Transformation logic
• Transpositions
• Frequency analysis
• Appending data
• Partitioning data
• Summarization
Metadata
• Lineage
• Semantic glossary
• Data relationships
• Impact analysis
• Hierarchy management
• Collaboration
• Repeatability
• Entity management
DATA MANAGEMENT THE ROLE OF DATA GOVERNANCE
DATA MANAGEMENT: Data Lifecycle, Reference and Master Data, Data Security, Data Architecture, Metadata, Data Quality, Data Administration, Data Warehousing & BI/Analytics
DATA GOVERNANCE: Data Stewardship, Roles & Tasks, Decision-making Bodies, Guiding Principles, Program Objectives, Decision Rights
DG without DM = only an academic exercise
DM without DG = the continued culture of “I know a guy”
DATA MANAGEMENT THE IMPORTANCE OF DATA GOVERNANCE
POSITIONS ENTERPRISE DATA ISSUES AS CROSS-FUNCTIONAL
• Establishes guiding principles for data sharing
• Eliminates data ownership issues and “turf wars”
• Ensures appropriate stakeholders have a say in decision making
ESTABLISHES BUSINESS STAKEHOLDERS AS INFORMATION OWNERS
• Aligns data policy with business strategies and priorities
• Aligns data quality with business measures and acceptance
• Helps to identify ROI for data-related activity
FORMALIZES DATA STEWARDSHIP
• Clarifies accountability for data definitions, rules, and quality
• Ensures data is managed separately from applications
• Formalizes monitoring and measurement of critical data
FOSTERS IMPROVED ALIGNMENT BETWEEN BUSINESS AND IT
• Links IT-driven data management activities with business unit activity
PARADIGM SHIFT DATA PREPARATION IS ABOUT THE BUSINESS NEED & USE CASE
80% 20%
Identify Access Data Quality Reshape Metadata Business Use
DATA PREPARATION FIVE KEY FOCUS AREAS
DATA PREPARATION
Identify
• Profile
• Data types
• Numeric
• Character
• Contextual
• Cardinality
Access
• ETL
• Batch
• Real-time
• Latency
• Data Movement
• Connectivity
• Data Sources
Data Quality
• De-duplicate
• Standardize
• Missing values
• Imputation
• Enrich
• Binning
• Matching
• Identify anomalies
Reshape
• Wide & flat
• Long & lean
• Transformation logic
• Transpositions
• Frequency analysis
• Appending data
• Partitioning data
• Summarization
Metadata
• Lineage
• Semantic glossary
• Data relationships
• Impact analysis
• Hierarchy management
• Collaboration
• Repeatability
• Entity management
IDENTIFY WHAT DO I HAVE AND HOW USEFUL IS IT?
Is my data consistent?
Is my data complete?
Is my data highly unique?
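The three profiling questions above can be turned into simple per-column measures. The sketch below is a minimal illustration in plain Python, not the profiling logic of any SAS product; the function name `profile_column`, the sample `customer_ids` column and the choice to treat `None` and the empty string as missing are all assumptions.

```python
from collections import Counter


def profile_column(values):
    """Return simple completeness, uniqueness and type-consistency measures."""
    total = len(values)
    # Assumption: None and the empty string count as missing.
    present = [v for v in values if v not in (None, "")]
    type_counts = Counter(type(v).__name__ for v in present)
    return {
        # Share of rows that carry a value ("Is my data complete?").
        "completeness": len(present) / total if total else 0.0,
        # Share of distinct values among present ones ("highly unique?").
        "uniqueness": len(set(present)) / len(present) if present else 0.0,
        # More than one type hints at inconsistency ("consistent?").
        "types": dict(type_counts),
    }


# Hypothetical column with a gap, a duplicate and a stray string.
customer_ids = [101, 102, None, 102, "103"]
report = profile_column(customer_ids)
```

A column meant to be a key would be expected to score 1.0 on both completeness and uniqueness and to show a single type.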
IDENTIFY WHAT DO I HAVE AND HOW USEFUL IS IT?
Is my data normal?
Is my data linear?
What are the associations in the data?
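One common first check for a linear association between two numeric columns is the Pearson correlation coefficient. The sketch below computes it in plain Python; the `ad_spend` and `revenue` columns are made-up sample data, and the slides do not say which association measures were actually used.

```python
import math


def pearson(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Assumes neither column is constant, otherwise the divisor is zero.
    return cov / math.sqrt(var_x * var_y)


# Hypothetical columns that rise together almost linearly.
ad_spend = [1.0, 2.0, 3.0, 4.0]
revenue = [2.1, 3.9, 6.2, 7.8]
r = pearson(ad_spend, revenue)
```

Values near +1 or -1 suggest a strong linear relationship; values near 0 do not rule out a non-linear one.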
ACCESS SO MANY DATA TYPES AND SOURCES
Access Excel SQLServer Oracle MySQL
Boolean | Yes/No | Bit | Byte | N/A | Boolean
integer | Number | Int | Number | Int | Int
float | Number (single) | Float | Number | Float | Numeric
currency | Currency | Money | NA | NA | Money
string | NA | Char | Char | Char | Char
string | Text | VarChar | VarChar | VarChar | VarChar
binary | OLE Obj, Memo | Binary, Varbinary, Image | Long, Raw | Blob, Text | Binary, Varbinary
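When the same logical type carries a different name in each source, one option is a lookup that maps source type names to a canonical type before data is combined. The sketch below is a hypothetical illustration: the `CANONICAL_TYPES` dictionary uses only type names that appear in the table above, and it deliberately leaves out names the table shows to be ambiguous (for example `Number`, which covers both integer and float, and `Text`, which appears under both string and binary).

```python
# Hypothetical lookup from source-specific type names to a canonical type.
CANONICAL_TYPES = {
    "yes/no": "boolean", "bit": "boolean", "boolean": "boolean",
    "int": "integer",
    "float": "float", "numeric": "float",
    "currency": "currency", "money": "currency",
    "char": "string", "varchar": "string",
    "varbinary": "binary", "blob": "binary", "image": "binary",
}


def canonical_type(source_type):
    """Map a source type name to a canonical type, or 'unknown' if unmapped."""
    return CANONICAL_TYPES.get(source_type.strip().lower(), "unknown")
```

Returning `"unknown"` for ambiguous or unmapped names keeps the decision visible instead of silently guessing a type.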
Copyright © 2012, SAS Institute Inc. All rights reserved.
DATA QUALITY THE FOUNDATION
• Standardization
• Parsing
• Casing
• Identification
• De-duplication
• “Fuzzy” matching
• Clustering
• Entity resolution
• Survivorship
• Gender Analysis
• Locale Guessing
• Address Verification
• Address Enrichment (geocoding)
Business
Logic &
Rules
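Two of the techniques listed above, casing standardization and “fuzzy” matching, can be sketched with Python's standard library. This is an illustrative stand-in, not the SAS matching logic: it uses `difflib.SequenceMatcher` as the similarity measure, and the function names and the 0.85 threshold are assumptions.

```python
import difflib


def standardize(name):
    """Casing and whitespace standardization: trim, collapse spaces, upper-case."""
    return " ".join(name.split()).upper()


def is_fuzzy_match(a, b, threshold=0.85):
    """Treat two strings as the same entity above a similarity threshold."""
    matcher = difflib.SequenceMatcher(None, standardize(a), standardize(b))
    return matcher.ratio() >= threshold
```

For example, `is_fuzzy_match("Acme Corporation", "ACME Corporaton")` is true despite the casing difference and the dropped letter, while two unrelated names fall well below the threshold. In practice the threshold is a business rule that needs tuning against known duplicates.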
DATA QUALITY FILLING IN THE GAPS AND STANDARDIZING
Standardizing
Text
De-duplication
Standardizing Numeric
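Filling gaps and standardizing a numeric column can be sketched as mean imputation followed by z-score scaling. These are two common choices among many, used here as an assumption; the slides do not name a specific method, and the `ages` column is made-up sample data.

```python
import statistics


def impute_mean(values):
    """Replace missing values (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = statistics.fmean(observed)
    return [mean if v is None else v for v in values]


def z_scores(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)  # assumes the values are not all equal
    return [(v - mean) / stdev for v in values]


# Hypothetical column with one missing value.
ages = [20.0, None, 40.0]
filled = impute_mean(ages)
scaled = z_scores(filled)
```

Mean imputation keeps the column mean unchanged but shrinks its variance, so it is worth recording which values were imputed.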
DATA QUALITY FILLING IN THE GAPS AND STANDARDIZING
Dropping outliers
Grouping or binning data
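Dropping outliers and binning can each be done in several ways. The sketch below assumes the interquartile-range (IQR) rule for outliers and equal-width bins for grouping; both are illustrative choices rather than the methods shown on the original slide, and the function names are hypothetical.

```python
import statistics


def drop_outliers(values, k=1.5):
    """Keep values within k * IQR of the first and third quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


def bin_equal_width(values, bins):
    """Assign each value a bin index from 0 to bins - 1 using equal widths."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    if width == 0:
        # All values are identical, so everything falls into the first bin.
        return [0] * len(values)
    # The maximum value would land in bin `bins`, so clamp it to the last bin.
    return [min(int((v - lo) / width), bins - 1) for v in values]
```

Dropping outliers before binning matters for equal-width bins: a single extreme value stretches the range and pushes most rows into one bin.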
RESHAPE FIT FOR PURPOSE?
Schema/view or flat table?
Format of data
Data quality dimensions?
RESHAPE FLATTENING THE DATA
• Efficient storage
• Fast retrieval
• Defined schema
• WIDE tables /Time series data
• Iteration (build, test, repeat)
• Schema-less
RESHAPE SUMMARIZATION
Each product category will become its own row, with each product purchased its own distinct category column.
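The reshaping described above, one row per product category and one column per product, can be sketched in plain Python. The input layout of `(category, product, quantity)` rows, the sample `purchases` data and the function name `pivot_wide` are assumptions made for illustration.

```python
def pivot_wide(rows):
    """Turn long (category, product, quantity) rows into one wide row per category."""
    # Every distinct product becomes a column, in a stable sorted order.
    columns = sorted({product for _, product, _ in rows})
    table = {}
    for category, product, quantity in rows:
        # Start each category row with a zero in every product column.
        row = table.setdefault(category, dict.fromkeys(columns, 0))
        row[product] += quantity
    return columns, table


# Hypothetical long-format purchase records.
purchases = [
    ("dairy", "milk", 2),
    ("dairy", "cheese", 1),
    ("bakery", "bread", 3),
]
columns, table = pivot_wide(purchases)
```

The result is wide and sparse: most cells are zero because a category only contains its own products, which is typical of tables prepared for data mining.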
RESHAPE TRANSPOSITION FOR DATA MINING
Add up the quantities for each product purchased, in each product category.
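Adding up quantities per product within each category is a group-and-sum. A minimal sketch, assuming transaction rows shaped as `(category, product, quantity)` tuples; the sample `transactions` data and the function name `summarize` are hypothetical.

```python
from collections import defaultdict


def summarize(rows):
    """Sum quantities for each (category, product) pair."""
    totals = defaultdict(int)
    for category, product, quantity in rows:
        totals[(category, product)] += quantity
    return dict(totals)


# Hypothetical transactions; milk is bought twice.
transactions = [
    ("dairy", "milk", 2),
    ("dairy", "cheese", 1),
    ("bakery", "bread", 3),
    ("dairy", "milk", 1),
]
summary = summarize(transactions)
```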
Copyright © 2013, SAS Institute Inc. All rights reserved.
METADATA MANAGE DATA HIERARCHIES AND RELATIONSHIPS
[Figure: data hierarchy and relationship diagram with the labels Customer, Types, Hierarchy, Coverage, Products, Financial, Accounts, Address, Inquiries, Product, Party, Accounts, Transactions, Authorizations, Individual, Organization, Inquiries, Loans, Terms, Collaterals, Ratings, External, Assets.]
Copyright © 2012, SAS Institute Inc. All rights reserved.
METADATA ENTITY RESOLUTION
EMPLOYER_NAME_GRPID 28296
EMPLOYER_NAME = Name of the client employer (SOL0003n_Employer_Name) cnt
ČESKOSLOVENSKÁ OBCHODNÍ BANKA, A. S 6
ČESKOSLOVENSKÁ OBCHODNÍ BANKA A.S. 182
ČSKOSLOVENSKÁ OBCHODNÍ BANKA A.S. 1
ČESKOSLOVENSKÁ OBCHODNÍ BANKA. A.S. 2
ČESKOSLOVENSKÁ OBCHODNÍ BANKA,A.S. 78
ČESKOSLOVENSKÁ OBCHODNÍ BANKA A. S. 9
ČESKOSLOVENSKÁ OBCHODNÍ BANKA ,A.S. 2
ČESKOSLOVENSKÁ OBCHODNÍ BANKA, A.S 6
ČESKOSLOVENSKÁ OBCHODNÍ BANKA, A.S . 1
ČESKOSLOVENSKÁ OBCHODNÍ BANKA, A.S. 717
ČESKOSLOVENSKUÁ OBCHODNÍ BANKA A.S. 1
ČESKOSLOVENSKÁ OBCHODNÍ BANKA A.S 1
ČESKOSLOVENSKÁ OBCHODNÍ BANKA, S.R.O. 3
ČESKOSLOVENSKÁ OBCHODNÍ BAŃKA, A.S. 1
ČESKOSLOVENSKÁ OBCHODNÍ BANKA, A. S. 587
ČESKOSLOVENSKÁOBCHODNÍBANKA, A.S. 1
ČESKOSLOVENSKÁ OBCHODNÍBANKA A.S. 1
ČESKOSLOVENSKÁ OBCHODNÍ BANLA 1
ČESKOSLOVENSKÁ OBCHODNÍ BÁNKA, A.S. 1
ČESKOSLOVENSKÁ OBCHODNÍ BANKA,A.S 2
ČESKOSLOVENSKÁ OBCHODNÍ BANKA 27
ČESKOSLOVENSKÁOBCHODNÍBANKA,A.S. 1
Example: Entity Resolution (Employer Name)
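Many of the variants listed above differ only in spacing, punctuation or accents, so a first pass at grouping them can use a normalized match key. The sketch below is a simplified illustration, not the SAS entity resolution logic; the function names are hypothetical, and the sample `variants` are taken from the list above.

```python
import re
import unicodedata
from collections import defaultdict


def match_key(name):
    """Build a crude match key: strip accents, punctuation and spaces."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return re.sub(r"[^A-Z0-9]", "", no_accents.upper())


def cluster(names):
    """Group names that share the same match key."""
    groups = defaultdict(list)
    for name in names:
        groups[match_key(name)].append(name)
    return dict(groups)


variants = [
    "ČESKOSLOVENSKÁ OBCHODNÍ BANKA, A.S.",
    "ČESKOSLOVENSKÁ OBCHODNÍ BANKA A. S.",
    "ČESKOSLOVENSKÁOBCHODNÍBANKA,A.S.",
    "ČESKOSLOVENSKÁ OBCHODNÍ BAŃKA, A.S.",
]
groups = cluster(variants)
```

All four variants collapse into one group. True misspellings such as “BANLA”, or a different legal form such as “S.R.O.”, produce a different key and would need a fuzzy comparison or a business rule on top of this step.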
METADATA SEMANTIC RECONCILIATION AND BUSINESS GLOSSARY
Business Glossary and Terms
Technical Architecture Diagram
METADATA LINEAGE & TRACEABILITY
A view into existing data sources/targets, jobs and the associated ‘owners’
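Lineage is often stored as a directed graph, and impact analysis then becomes a walk over everything downstream of a changed object. The sketch below assumes such a graph held in a plain dictionary; the `LINEAGE` contents and all object names are hypothetical and do not reflect any SAS metadata model.

```python
from collections import deque

# Hypothetical lineage: each source, job or target maps to what it feeds.
LINEAGE = {
    "crm_source": ["load_customers"],
    "load_customers": ["customer_dim"],
    "customer_dim": ["churn_model", "sales_report"],
    "churn_model": [],
    "sales_report": [],
}


def downstream(node, lineage):
    """Return every object reachable from node, in breadth-first order."""
    seen, queue, order = {node}, deque([node]), []
    while queue:
        for target in lineage.get(queue.popleft(), []):
            if target not in seen:
                seen.add(target)
                order.append(target)
                queue.append(target)
    return order


impacted = downstream("crm_source", LINEAGE)
```

Attaching an owner to each node would let the same traversal answer who needs to be told before a source changes.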
METADATA COLLABORATION AND REPEATABILITY
Collaboration & Role-based Dashboarding
Workflow & Data Remediation
Process Orchestration
Unified Lineage
Job Monitoring
Copyright © 2016, SAS Institute Inc. All rights reserved.
SAS DATA MANAGEMENT FRAMEWORK FOR SUCCESS
CORPORATE DRIVERS: Decision Making, Customer Focus, Compliance Mandates, Mergers & Acquisitions, At-Risk Projects, Operational Efficiencies
DATA GOVERNANCE: Data Stewardship, Roles & Tasks, Decision-making Bodies, Guiding Principles, Program Objectives, Decision Rights
DATA MANAGEMENT: Data Lifecycle, Reference and Master Data, Data Security, Data Architecture, Metadata, Data Quality, Data Administration, Data Warehousing & BI/Analytics
METHODS: People, Process, Technology
SOLUTIONS: Data Quality, Data Integration, Reference Data Management, Master Data Management, Data Visualization, Data Monitoring, Metadata Management, Business Glossary, Data Virtualization, Data Profiling & Exploration
Copyright © 2013, SAS Institute Inc. All rights reserved.
QUESTIONS & ANSWERS THANK YOU!
DAN.SOCEANU@SAS.COM

Data Management for High Performance Analytics

  • 1. Copyr ight © 2015, SAS Institute Inc. All rights reser ved. DATA MANAGEMENT FOR HIGH-PERFORMANCE ANALYTICS DAN SOCEANU SENIOR SOLUTIONS ARCHITECT DATA MANAGEMENT
  • 2. Copyr ight © 2015, SAS Institute Inc. All rights reser ved. BEFORE WE BEGIN SAS ACKNOWLEDGEMENTS Ron Agresta, Product Director, Data Management Lisa Dodson, Global Technology Practice Manager, Data Management David Pope, Pre-Sales Manager, Energy & Manufacturing
  • 3. Copyr ight © 2015, SAS Institute Inc. All rights reser ved. DATA MANAGEMENT WHY ARE WE HERE? • Data is rarely fit for analytic purposes • End-users are overwhelmed o What data do I use? o How do I load data? o How can I find only the data I need? • Real-time needs • The rise of “self-service analytics”
  • 4. Copyr ight © 2015, SAS Institute Inc. All rights reser ved. CAN YOU LEVERAGE OPEN SOURCE ANALYTICS? CAN YOU SCALE YOUR DATA AND YOUR ANALYTICS? DO YOU GROW A CULTURE OF INNOVATION? CAN YOU ANALYZE ALL OF YOUR DATA? CAN YOU MODERNIZE YOUR LEGACY BI STRATEGY?
  • 5. Copyr ight © 2015, SAS Institute Inc. All rights reser ved. Data Management for High Performance Analytics 0 IoT Operational Unstructured Web Text Optimization Forecasting Mining High Performance Analytics Data Sources DATA MANAGEMENT BRIDGING THE GAP
  • 6. Copyr ight © 2015, SAS Institute Inc. All rights reser ved. Data Access Tier Analytical Tier Visualization Tier Data Preparation Tier Visualization Analytics Preparation Access DATA MANAGEMENT CONVERGENCE OF DATA PREP, ANALYTICAL PROCESSING AND PROVISIONING
  • 7. Copyr ight © 2015, SAS Institute Inc. All rights reser ved. DATA MANAGEMENT DATA FLOW FOR HIGH PERFORMANCE ANALYTICS Data Management Data Warehouse Dynamic ReportingRead ETL Dynamic Visualization ACCESS DataManagement Analytical Data Warehouse DataMonitoring ExplorationQualityIntegration MDM Data Marts Model Development Operational MQ XML Cloud SOURCES Repository High Performance Analytics
  • 8. Copyr ight © 2015, SAS Institute Inc. All rights reser ved. ANALYTICS HISTORICAL VS. ADVANCED Descriptive  What happened?  When?  Why? • Frequency Distributions • Correlation Measures • Event Study • Association Rules Predictive  What will happen?  When?  Why?  How does that effect us?  What actions should I take? • Estimation & Forecasting • Segmentation • Optimization ANALYTICS
  • 9. Copyr ight © 2015, SAS Institute Inc. All rights reser ved. HIGH-PERFORMANCE ANALYTICS SAS SOLUTIONS SAS High-Performance Data Mining Predictive models using thousands of variables to produce more accurate and timely insights SAS High-Performance Econometrics Analytical models using complete data, not just a subset SAS High-Performance Optimization Model and solve optimization problems that are very large or cumbersome to solve SAS High-Performance Statistics Statistical models using big data to produce more accurate and timely insights SAS High-Performance Text Mining Better understand communications and create new value from big text data
  • 10. Copyr ight © 2015, SAS Institute Inc. All rights reser ved. HIGH-PERFORMANCE ANALYTICS SAS ANALYTIC PROCESSING APPROACHES Traditional Move data from source to the SAS server, process it and write back results (single server or SAS Grid Manager) In-Database Move SAS processing to the data source and allow SAS processing to occur under the control of the source environment (e.g. relational database or Hadoop). The analytic code executes in the database process. In-memory “Alongside” the Database Move SAS processing to the data source but allow a SAS process to run "along-side”. The analytic processes and the database processes are co-located and share resources. In-memory “Next to” the Database Move data from source to a dedicated SAS environment for processing. Does not require making a physical copy of the data before processing and, once the processing is complete, the data is not required to be kept in the dedicated SAS environment. This separates the resources associated with data storage & processing and the SAS advanced analytical processing.
  • 11. Copyr ight © 2015, SAS Institute Inc. All rights reser ved. DATA MANAGEMENT VS. DATA PREPARATION? Business Need • Support analytical methods for decision making, use cases and required actions Data Governance • Gap assessment; people, process and technology • Auditability, traceability, automated rules, monitoring, collaboration Productivity • Data preparation, provisioning, reporting DATA MANAGEMENT
  • 12. Copyr ight © 2015, SAS Institute Inc. All rights reser ved. DATA MANAGEMENT VS. DATA PREPARATION? Business Need • Support analytical methods for decision making, use cases and required actions Data Governance • Gap assessment; people, process and technology • Auditability, traceability, automated rules, monitoring, collaboration Productivity • Data preparation, provisioning, reporting DATA MANAGEMENT DATA PREPARATION Identify • Profile • Data types • Numeric • Character • Contextual • Cardinality Access • ETL • Batch • Real-time • Latency • Data Movement • Connectivity • Data Sources Data Quality • De-duplicate • Standardize • Missing values • Imputation • Enrich • Binning • Matching • Identify anomalies Reshape • Wide & flat • Long & lean • Transformation logic • Transpositions • Frequency analysis • Appending data • Partitioning data • Summarization Metadata • Lineage • Semantic glossary • Data relationships • Impact analysis • Hierarchy management • Collaboration • Repeatability • Entity management
  • 13. Copyr ight © 2015, SAS Institute Inc. All rights reser ved. DATA MANAGEMENT THE ROLE OF DATA GOVERNANCE Data Lifecycle Reference and Master Data Data Security Data Architecture Metadata Data Quality Data Administration Data Warehousing & BI/Analytics DATA MANAGEMENT DataStewardship Roles&Tasks Decision-making Bodies Guiding Principles Program Objectives Decision Rights DATA GOVERNANCE DG without DM = only an academic exercise DM without DG = the continued culture of “I know a guy”
  • 14. Copyr ight © 2015, SAS Institute Inc. All rights reser ved. DATA MANAGEMENT THE IMPORTANCE OF DATA GOVERNANCE POSITIONS ENTERPRISE DATA ISSUES AS CROSS-FUNCTIONAL • Establishes guiding principles for data sharing • Eliminates data ownership issues and “turf wars” • Ensures appropriate stakeholders have a say in decision making ESTABLISHES BUSINESS STAKEHOLDERS AS INFORMATION OWNERS • Aligns data policy with business strategies and priorities • Aligns data quality with business measures and acceptance • Helps to Identify ROI for data related activity FORMALIZES DATA STEWARDSHIP • Clarifies accountability for data definitions, rules, and quality • Ensures data is managed separately from applications • Formalizes monitoring and measurement of critical data FOSTERS IMPROVED ALIGNMENT BETWEEN BUSINESS AND IT • Links IT-driven data management activities with business unit activity
  • 15. Copyr ight © 2015, SAS Institute Inc. All rights reser ved. PARADIGM SHIFT DATA PREPARATION IS ABOUT THE BUSINESS NEED & USE CASE 80% 20% Identify Access Data Quality Reshape Metadata Business Use
  • 16. Copyr ight © 2015, SAS Institute Inc. All rights reser ved. DATA PREPARATION FIVE KEY FOCUS AREAS DATA PREPARATION Identify •Profile •Data types •Numeric •Character •Contextual •Cardinality Access •ETL •Batch •Real-time •Latency •Data Movement •Connectivity •Data Sources Data Quality •De-duplicate •Standardize •Missing values •Imputation •Enrich •Binning •Matching •Identify anomalies Reshape •Wide & flat •Long & lean •Transformation logic •Transpositions •Frequency analysis •Appending data •Partitioning data •Summarization Metadata •Lineage •Semantic glossary •Data relationships •Impact analysis •Hierarchy management •Collaboration •Repeatability •Entity management
  • 17. IDENTIFY: WHAT DO I HAVE AND HOW USEFUL IS IT?
    o Is my data consistent?
    o Is my data complete?
    o Is my data highly unique?
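The three questions on this slide map onto simple column-level profiling metrics. The sketch below is one way to compute them in plain Python; the function name `profile_column`, the metric names, and the sample column are illustrative assumptions, not part of the presentation or of any SAS product.

```python
from collections import Counter

def profile_column(values):
    """Return simple completeness, uniqueness, and type-consistency metrics."""
    total = len(values)
    # Assumption: None and blank strings both count as missing.
    present = [v for v in values if v is not None and str(v).strip() != ""]
    distinct = set(present)
    return {
        "count": total,
        "missing": total - len(present),
        "completeness": len(present) / total if total else 0.0,
        "cardinality": len(distinct),
        "uniqueness": len(distinct) / len(present) if present else 0.0,
        # More than one type name here hints at inconsistent data.
        "types": Counter(type(v).__name__ for v in present),
    }

# Hypothetical sample column, not taken from the slides.
states = ["NC", "nc", "N.C.", None, "VA", "", "NC"]
report = profile_column(states)
print(report)
```

A high cardinality on a column that should hold a handful of codes (here, four spellings for two states) is usually the first sign that standardization is needed.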
  • 18. IDENTIFY: WHAT DO I HAVE AND HOW USEFUL IS IT?
    o Is my data normal?
    o Is my data linear?
    o What are the associations in the data?
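Two quick numeric checks can give a first answer to these questions: skewness as a rough symmetry indicator, and the Pearson correlation coefficient as a measure of linear association. This is a minimal sketch with made-up data; a skewness near zero does not by itself establish normality, and the slides do not say which tests the presenter used.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def skewness(xs):
    """Population skewness; values near 0 suggest a symmetric distribution."""
    m = mean(xs)
    sd = math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))
    return sum(((x - m) / sd) ** 3 for x in xs) / len(xs)

def pearson_r(xs, ys):
    """Pearson correlation: strength of the linear association of xs and ys."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical sample data, not taken from the slides.
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
r = pearson_r(x, y)                         # perfectly linear relationship
sk_symmetric = skewness(x)                  # symmetric sample
sk_right_tail = skewness([1, 1, 1, 1, 10])  # long right tail
print(r, sk_symmetric, sk_right_tail)
```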
  • 19. ACCESS: SO MANY DATA TYPES AND SOURCES
    o Sources: Access, Excel, SQLServer, Oracle, MySQL
    o Boolean: Yes/No, Bit, Byte, N/A, Boolean
    o integer: Number, Int, Number, Int, Int
    o float: Number (single), Float, Number, Float, Numeric
    o currency: Currency, Money, NA, NA, Money
    o string: NA, Char, Char, Char, Char
    o string: Text, VarChar, VarChar, VarChar, VarChar
    o binary: OLE Obj Memo, Binary Varbinary Image, Long Raw, Blob Text, Binary Varbinary
  • 20. DATA QUALITY: THE FOUNDATION
    o Standardization
    o Parsing
    o Casing
    o Identification
    o De-duplication
    o “Fuzzy” matching
    o Clustering
    o Entity resolution
    o Survivorship
    o Gender Analysis
    o Locale Guessing
    o Address Verification
    o Address Enrichment (geocoding)
    o Business Logic & Rules
  • 21. DATA QUALITY: FILLING IN THE GAPS AND STANDARDIZING
    o Standardizing Text
    o De-duplication
    o Standardizing Numeric
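The three operations named on this slide can be sketched in a few lines of standard-library Python. The abbreviation map, the function names, and the sample addresses below are hypothetical; production data quality tools rely on curated, locale-aware standardization schemes rather than a hand-written dictionary.

```python
import re

# Hypothetical abbreviation map; a real one would come from a curated scheme.
ABBREVIATIONS = {"st": "street", "st.": "street", "ave": "avenue", "ave.": "avenue"}

def standardize_text(value):
    """Trim, collapse whitespace, lowercase, and expand known abbreviations."""
    tokens = re.sub(r"\s+", " ", value.strip()).lower().split(" ")
    return " ".join(ABBREVIATIONS.get(t, t) for t in tokens)

def standardize_numeric(value):
    """Parse strings such as '1,200.50' or ' 42 ' into floats."""
    return float(str(value).replace(",", "").strip())

def deduplicate(records, key):
    """Keep the first record seen for each standardized key value."""
    seen, unique = set(), []
    for rec in records:
        k = standardize_text(rec[key])
        if k not in seen:
            seen.add(k)
            unique.append(rec)
    return unique

# Hypothetical sample rows, not taken from the slides.
rows = [
    {"address": "100 Main St."},
    {"address": " 100  main street"},
    {"address": "22 Oak Ave"},
]
unique_rows = deduplicate(rows, "address")
amount = standardize_numeric("1,200.50")
print(unique_rows, amount)
```

De-duplication here depends entirely on standardizing first: the first two addresses only collide once casing, spacing, and the abbreviation are normalized.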
  • 22. DATA QUALITY: FILLING IN THE GAPS AND STANDARDIZING
    o Dropping outliers
    o Grouping or binning data
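As an illustration of both steps, the sketch below drops outliers with Tukey's interquartile-range rule and then groups the remaining values into labeled bins. The slides do not state which outlier rule or bin edges were used, so the 1.5 × IQR multiplier, the edges, the labels, and the sample ages are all assumptions.

```python
import statistics

def drop_outliers(values, k=1.5):
    """Drop values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]

def bin_value(value, edges, labels):
    """Return the label of the first bin whose upper edge exceeds value.

    Values at or above the last edge fall into the final label.
    """
    for edge, label in zip(edges, labels):
        if value < edge:
            return label
    return labels[-1]

# Hypothetical ages with one obvious data-entry error (240).
ages = [23, 25, 31, 35, 38, 41, 44, 52, 58, 240]
clean = drop_outliers(ages)

edges = [30, 45]                          # upper edges of the first two bins
labels = ["under 30", "30-44", "45+"]
binned = [bin_value(a, edges, labels) for a in clean]
print(clean, binned)
```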
  • 23. RESHAPE: FIT FOR PURPOSE?
    o Schema/view or flat table?
    o Format of data
    o Data quality dimensions?
  • 24. RESHAPE: FLATTENING THE DATA
    o Efficient storage
    o Fast retrieval
    o Defined schema
    o WIDE tables / Time series data
    o Iteration (build, test, repeat)
    o Schema-less
  • 25. RESHAPE: SUMMARIZATION
    Each product category will become its own row, with each product purchased its own distinct category column.
  • 26. RESHAPE: TRANSPOSITION FOR DATA MINING
    Add up the quantities for each product purchased, in each product category.
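The two reshape steps on slides 25 and 26 can be sketched as a group-and-sum followed by a pivot from long to wide. The transaction layout, the choice of customer as the row key, and the function names are assumptions made for illustration; the slides do not show the underlying table.

```python
from collections import defaultdict

# Hypothetical transaction rows: (customer, category, product, quantity).
transactions = [
    ("c1", "dairy", "milk", 2),
    ("c1", "dairy", "milk", 1),
    ("c1", "bakery", "bread", 1),
    ("c2", "dairy", "cheese", 3),
]

def summarize(rows):
    """Sum quantities per (customer, category, product): long and lean."""
    totals = defaultdict(int)
    for customer, category, product, qty in rows:
        totals[(customer, category, product)] += qty
    return dict(totals)

def transpose(totals):
    """Pivot to one row per customer with one column per product: wide and flat."""
    products = sorted({product for _, _, product in totals})
    wide = defaultdict(lambda: dict.fromkeys(products, 0))
    for (customer, _, product), qty in totals.items():
        wide[customer][product] += qty
    return dict(wide)

totals = summarize(transactions)
wide = transpose(totals)
print(totals)
print(wide)
```

Filling absent products with zero keeps every row the same width, which is the shape most data mining procedures expect.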
  • 27. METADATA: MANAGE DATA HIERARCHIES AND RELATIONSHIPS
    [Figure labels: Customer Types Hierarchy Coverage Products Financial Accounts Address Inquiries Product Party Accounts Transactions Authorizations Individual Organization Inquiries Loans Terms Collaterals Ratings External Assets]
  • 28. METADATA: ENTITY RESOLUTION
    Example: Entity Resolution, Employer Name
    EMPLOYER_NAME_GRPID: 28296
    EMPLOYER_NAME = Name of the client employer (SOL0003n_Employer_Name) | cnt
    ČESKOSLOVENSKÁ OBCHODNÍ BANKA, A. S | 6
    ČESKOSLOVENSKÁ OBCHODNÍ BANKA A.S. | 182
    ČSKOSLOVENSKÁ OBCHODNÍ BANKA A.S. | 1
    ČESKOSLOVENSKÁ OBCHODNÍ BANKA. A.S. | 2
    ČESKOSLOVENSKÁ OBCHODNÍ BANKA,A.S. | 78
    ČESKOSLOVENSKÁ OBCHODNÍ BANKA A. S. | 9
    ČESKOSLOVENSKÁ OBCHODNÍ BANKA ,A.S. | 2
    ČESKOSLOVENSKÁ OBCHODNÍ BANKA, A.S | 6
    ČESKOSLOVENSKÁ OBCHODNÍ BANKA, A.S . | 1
    ČESKOSLOVENSKÁ OBCHODNÍ BANKA, A.S. | 717
    ČESKOSLOVENSKUÁ OBCHODNÍ BANKA A.S. | 1
    ČESKOSLOVENSKÁ OBCHODNÍ BANKA A.S | 1
    ČESKOSLOVENSKÁ OBCHODNÍ BANKA, S.R.O. | 3
    ČESKOSLOVENSKÁ OBCHODNÍ BAŃKA, A.S. | 1
    ČESKOSLOVENSKÁ OBCHODNÍ BANKA, A. S. | 587
    ČESKOSLOVENSKÁOBCHODNÍBANKA, A.S. | 1
    ČESKOSLOVENSKÁ OBCHODNÍBANKA A.S. | 1
    ČESKOSLOVENSKÁ OBCHODNÍ BANLA | 1
    ČESKOSLOVENSKÁ OBCHODNÍ BÁNKA, A.S. | 1
    ČESKOSLOVENSKÁ OBCHODNÍ BANKA,A.S | 2
    ČESKOSLOVENSKÁ OBCHODNÍ BANKA | 27
    ČESKOSLOVENSKÁOBCHODNÍBANKA,A.S. | 1
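To show how name variants like these can be collapsed into one group, here is a minimal sketch that normalizes each name and then compares normalized forms with `difflib.SequenceMatcher` from the Python standard library. This is a stand-in technique chosen for illustration: the slides do not describe the matching algorithm behind the example, and the 0.9 threshold and greedy clustering are assumptions.

```python
import re
import unicodedata
from difflib import SequenceMatcher

def normalize(name):
    """Uppercase, strip diacritics, and drop punctuation and whitespace."""
    decomposed = unicodedata.normalize("NFKD", name.upper())
    ascii_only = "".join(c for c in decomposed if not unicodedata.combining(c))
    return re.sub(r"[^A-Z0-9]", "", ascii_only)

def same_entity(a, b, threshold=0.9):
    """Treat two names as one entity when their normalized forms are similar."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= threshold

def cluster(names, threshold=0.9):
    """Greedy clustering: attach each name to the first cluster it matches."""
    clusters = []
    for name in names:
        for group in clusters:
            if same_entity(name, group[0], threshold):
                group.append(name)
                break
        else:
            clusters.append([name])
    return clusters

# A few of the variants listed on the slide.
variants = [
    "ČESKOSLOVENSKÁ OBCHODNÍ BANKA, A.S.",
    "ČESKOSLOVENSKÁ OBCHODNÍ BANKA A. S.",
    "ČSKOSLOVENSKÁ OBCHODNÍ BANKA A.S.",
    "ČESKOSLOVENSKÁOBCHODNÍBANKA,A.S.",
    "ČESKOSLOVENSKÁ OBCHODNÍ BANLA",
]
groups = cluster(variants)
print(len(groups), groups)
```

Normalization alone resolves the spacing and punctuation variants; the similarity threshold is what catches the misspellings such as the missing letter or BANLA.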
  • 29. METADATA: SEMANTIC RECONCILIATION AND BUSINESS GLOSSARY
    o Business Glossary and Terms
    o Technical Architecture Diagram
  • 30. METADATA: LINEAGE & TRACEABILITY
    A view into existing data sources/targets, jobs and the associated ‘owners’
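One way to picture lineage is as a directed graph in which each job reads a source and writes a target, so that impact analysis becomes a walk downstream from a given table. The sketch below assumes that model; the job names, owners, and table names are invented for illustration and do not come from the slides or from any SAS metadata API.

```python
from collections import deque

# Hypothetical lineage edges: each job reads a source and writes a target.
LINEAGE = [
    {"job": "load_customers", "owner": "etl_team",
     "source": "crm.customers", "target": "stage.customers"},
    {"job": "cleanse_customers", "owner": "dq_team",
     "source": "stage.customers", "target": "dw.customers"},
    {"job": "build_mart", "owner": "bi_team",
     "source": "dw.customers", "target": "mart.churn"},
]

def downstream(table, edges):
    """Breadth-first walk returning (target, job, owner) reachable from table."""
    impacted, seen, queue = [], {table}, deque([table])
    while queue:
        current = queue.popleft()
        for e in edges:
            if e["source"] == current and e["target"] not in seen:
                seen.add(e["target"])
                impacted.append((e["target"], e["job"], e["owner"]))
                queue.append(e["target"])
    return impacted

impact = downstream("crm.customers", LINEAGE)
for target, job, owner in impact:
    print(f"{target} is written by {job} (owner: {owner})")
```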
  • 31. METADATA: COLLABORATION AND REPEATABILITY
    o Collaboration & Role-based Dashboarding
    o Workflow & Data Remediation
    o Process Orchestration
    o Unified Lineage
    o Job Monitoring
  • 32. SAS DATA MANAGEMENT FRAMEWORK FOR SUCCESS
    o CORPORATE DRIVERS: Decision Making, Customer Focus, Compliance Mandates, Mergers & Acquisitions, At-Risk Projects, Operational Efficiencies
    o SOLUTIONS: Data Quality, Data Integration, Reference Data Management, Master Data Management, Data Visualization, Data Monitoring, Metadata Management, Business Glossary, Data Virtualization, Data Profiling & Exploration
    o DATA MANAGEMENT: Data Lifecycle, Reference and Master Data, Data Security, Data Architecture, Metadata, Data Quality, Data Administration, Data Warehousing & BI/Analytics
    o DATA GOVERNANCE: Data Stewardship, Roles & Tasks, Decision-making Bodies, Guiding Principles, Program Objectives, Decision Rights
    o METHODS: People, Process, Technology
  • 33. QUESTIONS & ANSWERS THANK YOU! [email protected]