Leading Practices and Insights for Managing Data Integration Initiatives
May 7, 2010
Agenda
- Introductions
- Overview
- Key Drivers
- Approaches / Strategy
- Tools
- Case Studies
- Success Factors / Lessons Learned
- Q&A
About Allin
- Wakefield, MA-based service provider of enterprise-quality solutions and services to small and large companies
- Consulting practice provides technical and business expertise that helps businesses implement strategic solutions, integrate key functions, and extend technical capability
- Services include:
  - Integrative Application Development and Performance Tuning
  - Systems Integration and Management
  - Data Migration
  - Comprehensive Design Services
- Solutions focused in two key areas:
  - Microsoft® SharePoint®
  - Virtualization software from Microsoft® and VMware®

Presenter: Mark Bramhall, CTO
- Over 35 years of experience working with small and large firms
- Designed and implemented numerous data integration solutions
- Experienced technologist and integrator
About Optima
- Middleton, MA-based technology consulting firm providing IT leadership for both strategic and tactical projects
- Consulting practice specializes in advising small- to mid-sized firms on optimizing their technology investments and improving their business processes
- Key services include:
  - Application Implementations
  - Data Integrations
  - Technology Optimization
  - Business Process Transformation

Presenter: Irving Burday, President
- Experienced technology leader (CIO at several companies)
- Led several data integration and data warehouse projects
- Directed / managed many complex integration efforts
Data Integration Overview
Definition: combining data residing in different sources and providing users with a unified view of these data.
[Diagrams: Mediated Schema example; Data Warehouse example]
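To make the definition concrete, here is a minimal sketch of the mediated-schema idea in Python: two sources describe the same customer in different shapes, and a mediator maps both into one unified view. All record shapes and field names are invented for illustration.

```python
# Two sources hold overlapping customer data under different schemas.
crm_record = {"cust_name": "Acme Corp", "cust_phone": "555-0100"}
billing_record = {"customer": "Acme Corp", "balance_due": 1250.00}

def to_unified(source, record):
    """Map a source-specific record into the mediated (unified) schema."""
    if source == "crm":
        return {"name": record["cust_name"], "phone": record["cust_phone"]}
    if source == "billing":
        return {"name": record["customer"], "balance": record["balance_due"]}
    raise ValueError(f"unknown source: {source}")

# The user queries one schema; the mediator hides the source differences.
unified = [to_unified("crm", crm_record), to_unified("billing", billing_record)]
print(unified)
```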
Business / Technical Drivers
Application Integrations
- Legacy-to-new-system migration and conversion
- Legacy system feeds to new system
- New application / Web site processing
- Retirement of legacy applications
Business Intelligence / Analytics
- Data aggregation for data mining
- Supporting predictive models and analytics
- Feeding decision support and other business intelligence needs
Business / Technical Drivers (continued)
Data Architecture Improvements
- Managing feeds to/from the data warehouse
- Reorganization of operational data stores and marts
- Coordinating feeds to/from legacy systems
- Coordinating feeds to/from external entities
External Factors
- Supporting acquisitions / divestitures
- Managing mergers
- Facilitating new channels and/or sales growth
Key Variables / Considerations
Business Related
- Scale of effort
- Availability / skills of resources
- Funding
- Timing
Technical Factors
- Platforms and operating systems
- Data management software
- Data models, schemas, and data semantics
- Middleware
- User interfaces
- Frequency of integrations
- Business rules and integrity constraints
Approaches / Strategies
The integration approach depends on the architectural level at which integration occurs.
Approaches
Several strategies exist for integrating data:
- Manual Integration – users directly interact with all relevant information systems and manually integrate the data. Requires users to have detailed knowledge of logical data representation and data semantics, and to deal with different interfaces and query languages.
- Application Specific – modifying applications, or layering application-specific code around an application, to enable it to take data from or give data to external data stores.
- Data Propagation – replicating data in different locations from different sources. Technologies include replication, database log scrapers, and change-data-capture software.
- Data Federation – enables a single unified virtual view over one or more source data files. A data federation technique normally employs a metadata reference file to connect related customer information based on a common key.
Application Specific Solutions
Application-specific tools and utilities are frequently provided by vendors to integrate and manage data. Key considerations for this approach:
- Developing and using them requires deep system knowledge
- Best results come with special-purpose applications
- Each new data source requires new code to be written
- Usually optimal for one-time conversions / migrations
- Data cleanup frequently requires multiple human interventions
- Fragile if the underlying data sources are updated or changed (may affect the application)
- Can be expensive in terms of time and skills
Data Propagation
Data propagation is the distribution of data from one or more source data warehouses to one or more local access databases. Data propagation methods include:
- Bulk Extract – uses copy management tools or unload utilities to extract all or a subset of the operational relational database. The extracted data may be transformed to the target's format on the host or target server; DBMS load tools then refresh the target database.
- File Compare – compares the newly extracted operational data to the previous version, creates a set of incremental change records, and applies them as updates to the target within the scheduled process (see the sketch below).
- Change Data Propagation – captures and records changes to the file as part of the application change process, using techniques such as triggers, log exits, log post-processing, or DBMS extensions. A file of incremental changes contains the captured changes.
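A minimal sketch of the File Compare method, assuming in-memory extracts keyed by an "id" column (both the key name and the sample records are illustrative):

```python
def compare_extracts(previous, current, key="id"):
    """Diff two extracts and return incremental change records."""
    prev = {row[key]: row for row in previous}
    curr = {row[key]: row for row in current}
    changes = []
    for k, row in curr.items():
        if k not in prev:
            changes.append(("INSERT", row))   # new since last extract
        elif row != prev[k]:
            changes.append(("UPDATE", row))   # changed since last extract
    for k, row in prev.items():
        if k not in curr:
            changes.append(("DELETE", row))   # gone since last extract
    return changes

old = [{"id": 1, "status": "open"}, {"id": 2, "status": "open"}]
new = [{"id": 1, "status": "closed"}, {"id": 3, "status": "open"}]
for op, rec in compare_extracts(old, new):
    print(op, rec)   # UPDATE id=1, DELETE id=2, INSERT id=3
```

The resulting change records would then be applied as updates to the target within the scheduled process.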
Data Federation
Links data from two or more physically different locations, making access appear transparent, as if the data were co-located (versus a data warehouse, which houses data in one location). Key elements of a federated approach:
- Middleware consisting of a database management system
- Uniform access to a number of heterogeneous data sources
- A query language used to combine, contrast, analyze, and manipulate the data
- Data integration can be done through database integration: combine data from multiple sources with a single SQL statement (see the sketch below), or create a master system that relates data elements from all line-of-business systems
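A minimal sketch of the single-SQL-statement idea, using SQLite's ATTACH as a stand-in for federation middleware; the table and column names are invented, and a real federation product would virtualize genuinely remote sources:

```python
import sqlite3

# Build a small "orders" database on disk to stand in for a remote source.
src = sqlite3.connect("orders.db")
src.execute("CREATE TABLE IF NOT EXISTS orders (cust_id INTEGER, total REAL)")
src.execute("DELETE FROM orders")
src.execute("INSERT INTO orders VALUES (1, 1250.0)")
src.commit()
src.close()

# The "local" CRM data lives in a second, separate database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
conn.execute("INSERT INTO customers VALUES (1, 'Acme Corp')")

# ATTACH makes the second source queryable as if co-located: one SQL
# statement joins across both, with cust_id playing the role of the
# common key that links related customer information together.
conn.execute("ATTACH DATABASE 'orders.db' AS ext")
rows = conn.execute(
    "SELECT c.name, o.total FROM customers c "
    "JOIN ext.orders o ON o.cust_id = c.id"
).fetchall()
print(rows)  # [('Acme Corp', 1250.0)]
```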
Data Integration Tools
ETL
ETL: Extract, Transform and Load
- ETL tools extract data from one or more chosen sources, transform it into new formats according to business rules, and then load it into the target data structure(s)
- Enables rules and processes for managing diverse data sources and processing high volumes of data
- Provides direct insight into source data before loading, with data profiling and quality control capabilities
- Provides the ability to map physical data items to a unique metadata description, or to create an abstraction layer of common business definitions that maps all similar data items to the same definition
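A minimal extract/transform/load sketch in plain Python, assuming a local contacts.csv with first, last, and email columns and a SQLite target; every name here is illustrative rather than a specific tool's API:

```python
import csv
import sqlite3

def extract(path):
    """Extract: stream rows from the chosen source (a CSV file here)."""
    with open(path, newline="") as f:
        yield from csv.DictReader(f)

def transform(rows):
    """Transform: apply business rules to reshape each row."""
    for row in rows:
        row["email"] = row["email"].strip().lower()         # normalize format
        row["full_name"] = f"{row['first']} {row['last']}"  # derive a field
        yield row

def load(rows, conn):
    """Load: write the transformed rows into the target structure."""
    conn.executemany(
        "INSERT INTO contacts (full_name, email) VALUES (:full_name, :email)",
        rows,
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (full_name TEXT, email TEXT)")
load(transform(extract("contacts.csv")), conn)
```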
ETL Integrated Architecture
[Architecture diagram for "XYZ Corp Systems": core business systems (financial, CRM, business-line, other) and external sources feed an Extract, Transformation, & Load (ETL) layer with a message broker, metadata (structured data) management, security management, and administration; a reporting data storage layer organizes conformed facts and shared dimensions into subject areas (the "Unified Business Model" and information management); the presentation layer, accessible throughout the organization, delivers portals/dashboards, production and ad-hoc reports, query extracts, OLAP cubes and predictive models (pre-aggregated data supporting variance analysis, exception reporting, drill-down, and real-time drill-through into relational storage), exception notifications, mining, and other analysis and reporting tools over structured and unstructured content.]
EAI
EAI: Enterprise Application Integration, an integration framework composed of a collection of technologies and services:
- A centralized broker that handles security, access, and communication
- An independent data model based on a standard data structure (e.g., XML)
- A connector or agent model in which each vendor, application, or interface builds a single component that speaks natively to that application and communicates with the centralized broker
- A system model that defines the APIs, data flow, and rules of engagement so that components can be built to interface with it in a standardized way
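A minimal sketch of the connector-plus-broker pattern just described: each adapter speaks natively to its own application but exchanges a standard XML message through one central broker. Topic names and the message shape are assumptions for illustration; a production broker would also enforce the security and access controls mentioned above.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

class Broker:
    """Centralized broker: routes standard XML messages to subscribers."""
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self.subscribers[topic].append(handler)

    def publish(self, topic, xml_payload):
        message = ET.fromstring(xml_payload)  # the independent data model
        for handler in self.subscribers[topic]:
            handler(message)

broker = Broker()
# A CRM-side connector translates the canonical message into native calls.
broker.subscribe(
    "customer.updated",
    lambda msg: print("CRM adapter saw:", msg.findtext("name")),
)
broker.publish("customer.updated",
               "<customer><name>Acme Corp</name></customer>")
```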
EAI Architecture
[EAI architecture diagram]
Tool Comparison
ETL versus EAI – what’s the difference?
ETL Tools – Key Features
Architecture
- Parallel processing
- Scalability: job distribution, pipelining, partitioning
- Common Warehouse Model (CWM) compliant
- Version control
ETL Functionality
- Managing data streams (multiple targets, splitting)
- Pivoting, de-pivoting, unions
- Lookups
- Scheduling
- Error handling
ETL Tools – Key Features (continued)
Reusability
- Reuse of components
- Decomposition
Debugging
- Step by step, row by row, breakpoints
- Compiler / validator
Connectivity
- Native connection support (ODBC, OLE DB, flat files)
- Integration with package / application metadata
- Data quality, data validation
Ease of Use
- WYSIWYG design
- Documentation
Challenges
Challenges
Data Preparation / Quality
- Completeness / accuracy of data records
- Duplicates
- Half-matchable data
- Freshness of data
Technology Issues
- Multiple and mixed data formats
- Disparate operating systems and processing platforms
- Source system constraints
Organizational
- Business and IT politics
- Ownership / stewardship of source data
- Dedication of IT resources to manage daily functions
Challenges (continued)
Level of Automation
- Time based
- Event based
- Frequency
Error Handling (see the sketch below)
- Reporting requirements
- Ownership of error remediation
- Technical failures
- Data failures
- Auto-correction versus manual updating
- Batch integrity
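One way these error-handling choices play out in code, as a hedged sketch: safe defects are auto-corrected, rows needing manual remediation go to a reject list, and the batch aborts if rejects exceed a threshold. The 5% figure and field names are assumptions, not a standard:

```python
REJECT_THRESHOLD = 0.05  # assumed batch-integrity limit

def process_batch(rows):
    """Split rows into loadable records and rejects; guard batch integrity."""
    loaded, rejects = [], []
    for row in rows:
        if not row.get("id"):
            rejects.append((row, "missing key"))          # manual remediation
        else:
            row["country"] = row.get("country") or "US"   # safe auto-correction
            loaded.append(row)
    if rows and len(rejects) / len(rows) > REJECT_THRESHOLD:
        raise RuntimeError(f"batch aborted: {len(rejects)} of {len(rows)} rejected")
    return loaded, rejects

ok, bad = process_batch([{"id": 7}, {"id": 8, "country": "CA"}])
print(len(ok), len(bad))  # 2 0
```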
Integration Challenges (continued)
Data Handling
- Transformations
- Manipulations
- Transmission
Magnitude of Effort
- Number of systems
- Volume of data
- Number of runs
Case Studies
Business Case 1
Service Company – Revenue: $200M, Size: 300 FTEs
Client’s Business Challenge:
- Integrating data from customer Web sites / CRM systems into operational and financial systems
- Client’s objective was to build a one-time solution to manage data migrations
Solution:
- Use SSIS to develop a data migration framework that allows transformation of the data
- Build custom stored-procedure scripts to extract data from legacy applications (see the sketch below)
Lessons Learned:
- Data rules and manipulations required extensive analysis and documentation in order to streamline the future-state process
- Created a cross-tabular map of legacy application tables to facilitate data mappings and data-handling procedures during conversion and testing
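A hedged sketch of the stored-procedure extract side of that solution, using pyodbc; the DSN, procedure name (usp_ExtractCustomers), parameter, and column names are hypothetical stand-ins, not the client's actual objects:

```python
import pyodbc

# Connect to the legacy application's SQL Server via an ODBC DSN (assumed).
conn = pyodbc.connect("DSN=LegacyApp;Trusted_Connection=yes")
cursor = conn.cursor()

# The stored procedure encapsulates the legacy schema, so the SSIS migration
# framework only ever sees a stable, documented result set.
cursor.execute("EXEC dbo.usp_ExtractCustomers @SinceDate = ?", "2010-01-01")
for row in cursor.fetchall():
    print(row.CustomerID, row.Name)  # columns assumed from the procedure
```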
Business Case 2
Education Company – Revenue: $500M, Size: 300 FTEs
Client’s Business Challenge:
- Integrating data from a set of outsourced-function partners
- Integration needed to be real time as clients moved through the Web sites, but could not fail in the face of network outages, system failures, etc.
Solution:
- Understand, for each type of data, which system is the data master and which systems keep only shadow copies
- Design a way to uniquely identify data, even if multiple sources can create it
- Deploy a publish / subscribe solution using reliable, persistent message queuing (see the sketch below)
Lessons Learned:
- You cannot know your data too well; subtle relationships must become explicit
- Multi-partner integration requires extremely simple interfaces and definitions
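One way to realize that publish/subscribe design, as a sketch: durable queues and persistent messages are what let the integration survive the outages the client could not tolerate. RabbitMQ via pika is my assumption (the deck names no product), and the queue name, event shape, and globally unique identifier are illustrative:

```python
import json
import pika

conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = conn.channel()
# durable=True: the queue itself survives a broker restart.
channel.queue_declare(queue="enrollment.events", durable=True)

# The event carries a globally unique ID so any of the multiple sources can
# create data without colliding with the others.
event = {"source": "partner-a", "global_id": "STU-00042", "action": "enroll"}
channel.basic_publish(
    exchange="",
    routing_key="enrollment.events",
    body=json.dumps(event),
    properties=pika.BasicProperties(delivery_mode=2),  # persist message to disk
)
conn.close()
```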
Business Case 3
Healthcare Company – Revenue: $150M, Size: 200 FTEs
Client’s Business Challenge:
- Integrating data feeds from source systems into a new data warehouse
- Implementing a data hub to manage data feeds from external entities (e.g., customers, banks) into financial and customer support systems
Solution:
- Select and implement a full-featured ETL tool to manage and handle data warehouse and miscellaneous data feeds
- Create data extracts from sources to manage extract requirements and file formats
- Deploy a data quality program that cleanses incoming and transferred data prior to loading into the destination system
Lessons Learned:
- Error handling required additional time and effort to define error cases and remediation actions
- Data ownership required executive intervention to staff and manage the data management process
Conclusions
The Value of a Data Quality Effort
Data Remediation
- Data management processes cannot allow junk data to be loaded, migrated, or transported into a target system
- Data remediation procedures should be designed into every solution
Key Performance Indicators: Data Quality Compliance
- Data quality indicators should be defined and monitored at all times
- The KPIs should be used by the data management operations team to manage data processing and testing
- The KPIs for management must be business focused and should show how poor data quality is financially affecting the business
The Value of a Data Quality Effort (continued)
Check Twice, Load Once
- Data should be checked for validity prior to being loaded into the target (see the sketch below)
- Designers and developers MUST log the exact data quality errors and issues present in the data being processed
- The data quality errors and issues must be summarized and reported on; reports can be used by operations and source data owners to remediate the data and drive data compliance
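A minimal sketch of "check twice, load once": validate each record against named rules before loading, and log the exact issues per record so operations and source data owners can remediate. The rules, log file name, and field names are illustrative assumptions:

```python
import logging

logging.basicConfig(filename="dq_errors.log", level=logging.WARNING)

RULES = [
    ("missing email",   lambda r: not r.get("email")),
    ("negative amount", lambda r: r.get("amount", 0) < 0),
]

def validate(rows):
    """Return only rows that pass every rule; log the rest with reasons."""
    clean = []
    for row in rows:
        issues = [name for name, broken in RULES if broken(row)]
        if issues:
            # Record the exact data quality errors for the summary report.
            logging.warning("rejected %s: %s", row.get("id"), ", ".join(issues))
        else:
            clean.append(row)
    return clean

good = validate([{"id": 1, "email": "a@b.com", "amount": 10},
                 {"id": 2, "email": "", "amount": -5}])
print(len(good))  # 1
```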
Importance of Governance
Poor governance and lack of communication account for over 85% of the issues in a data integration project. Sources of project issues:
- Inadequate project management: 32%
- Lack of communication: 20%
- Failure to define objectives: 17%
- Unfamiliarity with scope and complexity: 17%
- Incorrect hardware or software: 7%
- Other: 5%
- (Unlabeled segment in source chart): 2%
Data Stewardship
Data stewards act as the conduit between IT and the business and accept accountability for the data management process, taking responsibility for data elements across their end-to-end usage in the enterprise. Data stewards become the “public face” for data, with responsibility for:
- Domain values
- Data standards
- Business rule specifications
- Data ownership rules
- Data quality rules
- Security requirements
- Data retention criteria
Data stewards play the central role in the management of data across the organization and in assuring its usefulness for the business.
[Diagram: business data stewards positioned between IT and the business]
Success Factors
- Establish and agree upon scope, high-level requirements, expected benefits, and architecture
- Benefits need to be emphasized from the top down and understood from the bottom up
- Data integrity and data cleansing cannot be over-emphasized; even well-documented systems are usually prone to poor data quality
- Common definitions and mappings are crucial
- A complex business is not made any less complex by documenting the data and putting it in an operational store; knowledgeable use of the data will still require knowledgeable users
Success Factors (continued)
- Technology is only part of the answer: no matter how sophisticated the implementation, significant process change will be required. But technology is key to success
- Having a partner who has done this before will minimize risk; much can be learned from similar efforts
- This effort requires a full-time, dedicated set of highly skilled resources with both technical and business knowledge
Appendices
ETL Vendors
Vendor | ETL Tool(s)
Microsoft | SQL Server Integration Services
Oracle | Oracle Warehouse Builder (OWB)
SAP | Business Objects Data Integrator & Data Services
IBM | IBM Information Server (DataStage); IBM Data Manager/Decision Stream (Cognos)
SAS Institute | SAS Data Integration Studio
Informatica | PowerCenter
Ab Initio | Co>Operating System
Information Builders | Data Migrator
Adeptia | Adeptia Integration Server
Cast Iron Systems | OmniConnect Platform
Pitney Bowes Business Insight | DataFlow Manager
Pervasive | Data Integrator
Elixir | Elixir Repertoire
Javlin | CloverETL
Pentaho | Pentaho Data Integration
Talend | Talend Open Studio
ETL / EAI – Tool Strengths
ETL | EAI
Excels at bulk data movement | Limited data movement capabilities
Provides complex transformations, aggregation from multiple sources, and sophisticated business rules | Offers less sophisticated transformation and extraction functions
Assumes data delays | Operates in real time
Batch-oriented, making it fast and simple for one-time projects and testing | Works better with continuously interacting systems
Offers little in the way of workflow | Workflow-oriented at the core
Works primarily at the session layer | Works primarily at the transport layer

