Exploiting data quality tools to meet the expectations of strategic business users
Key Purpose To evaluate the significance of Data Quality. To understand Data Quality and its relationship with business functionalities. To ensure the required level of data reliability.
What is Data Quality It is the state of completeness, validity, consistency, timeliness, and accuracy that makes data appropriate for a specific use. Data quality should be a concern throughout the design, development, and operational phases of creating a data integration / data-driven project.
Key Issues Do the enterprise vision and mission include the concept of information as a strategic business resource that adds value to its products and services for customer satisfaction? Do Information Systems include success criteria measured by knowledge workers and end customers?
Benefits of Data Quality

| Business User Benefit | Quality Characteristic |
| I can accomplish my objectives | For the right purpose |
| Where I need it | At the right place |
| When I need it | At the right time |
| I can use it easily | In the right format |
| I can trust and rely on it | With the right accuracy |
| Whose meaning I know | In the right context |
| All the data I need | With the right completeness |
| The data I need | The right data |
BI Model
Business Goal To provide the Right Data to the Right People at the Right Time Improved quality of Key Performance Indicators (KPIs) Greater confidence in analytical systems
Information Management (IM) - Framework Component Control over the IM process of Data Quality that satisfies the business requirement: to ensure the availability of complete, accurate and timely data, and to improve customer interaction, revenue and profits, day-to-day decision making, and the effectiveness of strategy and operations.
Goals & Metrics Process Goals Track and report the measures through scorecards / reports Resolve (identify, investigate / analyse, fix) the issues Implement the action plans provided by stakeholders to improve quality
Business Data
Business Data – Graphical Presentation
Exploiting Data Quality Tools
Data Profiling: Scoping the benefits Data profiling is the process of systematically scanning and analyzing the contents of all the columns in tables of interest, to identify data defects through table column analysis: frequency distribution, min / max / outlier detection, data type analysis, unique / null analysis, and metadata analysis.
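As a rough sketch of what a profiling tool computes per column (plain Python rather than the SAS tooling shown on the next slide; the sample values are hypothetical):

```python
from collections import Counter

def profile_column(values):
    """Minimal column profile: null count, distinct values, frequency
    distribution, and min/max for numeric data. Real profiling tools
    add data-type inference and outlier detection on top of this."""
    non_null = [v for v in values if v not in (None, "")]
    freq = Counter(non_null)
    numeric = [v for v in non_null if isinstance(v, (int, float))]
    return {
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(freq),
        "most_common": freq.most_common(3),
        "min": min(numeric) if numeric else None,
        "max": max(numeric) if numeric else None,
    }

# Hypothetical credit-limit column with one missing value
credit_limits = [900, 800, 300, 400, 500, 300, 100, None, 2000, 1000]
report = profile_column(credit_limits)
```

Running the same function over every column of a table gives the frequency, null and min/max analyses the slide lists.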
Data Profiling – Example SAS Report
Data Parsing
Loan data: "20 yr. Fixed, 14.5 percent, 2% orig + 1 point" parses into Term: 20 years; Rate: 14.5%; Points: 1%; Origination Fee: 2%; Type: Fixed
Name data: "Mr Khadim A Bukhari Shah" parses into Name Prefix: Mr; First Name: Khadim; Middle Name: A; Last Name: Bukhari; Name Suffix: Shah
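The loan-string parse can be sketched with a few regular expressions; the patterns assume exactly the slide's phrasing and are illustrative only, not a general-purpose parser:

```python
import re

def parse_loan(text):
    """Extract term, type, rate, origination fee and points from a
    free-text loan description like the slide's example. Each regex
    assumes the slide's exact phrasing."""
    out = {}
    m = re.search(r"(\d+)\s*yr", text)
    out["term_years"] = int(m.group(1)) if m else None
    m = re.search(r"(Fixed|Variable)", text, re.I)
    out["type"] = m.group(1) if m else None
    m = re.search(r"([\d.]+)\s*(?:%|percent)", text)      # first rate-like figure
    out["rate_pct"] = float(m.group(1)) if m else None
    m = re.search(r"([\d.]+)%\s*orig", text)              # origination fee
    out["origination_fee_pct"] = float(m.group(1)) if m else None
    m = re.search(r"([\d.]+)\s*point", text)              # discount points
    out["points"] = float(m.group(1)) if m else None
    return out

loan = parse_loan("20 yr. Fixed, 14.5 percent, 2% orig + 1 point")
```

Name parsing works the same way in principle, but needs lookup tables of prefixes and suffixes rather than bare patterns.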
Data Standardization
Company, co., co, comp -> Company
VP Sales, Sales Director, V.P. Sales -> VP Sales
Avenue, Ave, AVENUE, ave. -> AVE
Pakistan, PK, PAK, pk, pakistan -> Pakistan
Government, Govt., government, govt -> Government
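A minimal lookup-table standardizer over the slide's variants; the mapping table itself is the assumption here, and production tools ship far larger rule sets:

```python
# Each observed variant maps to one standard form, as on the slide.
STANDARD = {
    "company": "Company", "co.": "Company", "co": "Company", "comp": "Company",
    "vp sales": "VP Sales", "sales director": "VP Sales", "v.p. sales": "VP Sales",
    "avenue": "AVE", "ave": "AVE", "ave.": "AVE",
    "pakistan": "Pakistan", "pk": "Pakistan", "pak": "Pakistan",
    "government": "Government", "govt.": "Government", "govt": "Government",
}

def standardize(token):
    """Return the standard form of a token, or the token unchanged
    if no rule exists (case-insensitive lookup)."""
    return STANDARD.get(token.strip().lower(), token)
```

Unmapped values pass through untouched, so the table can be grown incrementally as profiling uncovers new variants.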
Data Clustering and De-duplication

| Count | Cluster | Customer |
| 4 | 1 | Standard Chartered Bank |
| 4 | 1 | SCB |
| 4 | 1 | Standard Chartered |
| 3 | 3 | Imperial Chemical Industries |
| 3 | 3 | I C I |
| 3 | 3 | ICI. |
| 3 | 3 | I. C. I. |
| 3 | 3 | ICI |
Householding – Family Clustering

| Family | Customer | Address | Account Type |
| Standard Chartered | SCB Main | I. I. Chundrigarh, Karachi | Bus |
| Standard Chartered | Standard Chartered | I. I. Chundrigarh Rd, Khi | Bus |
| Khan | Khan, Arif | Defence, Karachi | Ind |
| Khan | Farah Khan | Defence, Karachi | Ind |
| Khan | Khan, Shehzad | Defence, Karachi | Ind |
| Kashif | Kashif, Shazia | Clifton – 5 | Ind |
| Kashif | Saeed, Kashif | Clifton – 5 | Ind |
| Shah | Nadir Shah | Hse # 5, St # 3, KDA Scheme 1, Lahore | Ind |
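Householding can be sketched as grouping on surname plus a normalized address. The tuples below echo the slide's data (with deliberately inconsistent address spellings); real systems additionally link on phone numbers and identity documents:

```python
import re
from collections import defaultdict

def household_key(surname, address):
    """Key a customer into a household: surname plus an address
    normalized to lowercase alphanumerics, so spelling variants of
    the same address collapse together."""
    addr = re.sub(r"[^a-z0-9]", "", address.lower())
    return (surname.lower(), addr)

customers = [  # (name, surname, address) -- values echo the slide
    ("Khan, Arif",     "Khan",   "Defence, Karachi"),
    ("Farah Khan",     "Khan",   "Defence Karachi"),
    ("Khan, Shehzad",  "Khan",   "Defence, Karachi"),
    ("Kashif, Shazia", "Kashif", "Clifton - 5"),
    ("Saeed, Kashif",  "Kashif", "Clifton-5"),
    ("Nadir Shah",     "Shah",   "Hse # 5, St # 3, KDA Scheme 1, Lahore"),
]

households = defaultdict(list)
for name, surname, addr in customers:
    households[household_key(surname, addr)].append(name)
# Three households: Khan (3 members), Kashif (2) and Shah (1).
```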
The Data Quality Metrics Record Completeness Measures the percentage of required data fields which are sourced from the Credit Data Marts (CDM) Computed as (Number of Required Data Fields in CDM / Total Number of Required Data Fields)*100

Example Suppose 9 fields are required for a specific model.

Fields required for Past Due (PD) calculation: 1) Product Code 2) Customer Segment 3) Date of Account open 4) No of Times A/c delinquent 5) O/s bal dpd 30 days 6) O/s bal dpd 60 days 7) O/s bal dpd 90 days 8) Total O/s 9) Behavior Score

Fields in Warehouse: 1) Product Code 2) Customer Segment 3) Date of Account open 4) No of Times A/c delinquent 5) O/s bal dpd 30 days 6) O/s bal dpd 60 days 7) Total O/s

Record Completeness = Fields in Warehouse / Fields Required = (7/9)*100 = 78%
Note: No source of data is available for required fields 7 (O/s bal dpd 90 days) and 9 (Behavior Score).
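The completeness arithmetic above as a small helper (field names copied from the slide; the function itself is an illustrative sketch, not the bank's implementation):

```python
def record_completeness(required, available):
    """Percentage of required model fields actually sourced in the
    warehouse, per the slide's formula, plus the list of gaps."""
    present = [f for f in required if f in available]
    pct = len(present) / len(required) * 100
    return pct, sorted(set(required) - set(available))

required = ["Product Code", "Customer Segment", "Date of Account open",
            "No of Times A/c delinquent", "O/s bal dpd 30 days",
            "O/s bal dpd 60 days", "O/s bal dpd 90 days",
            "Total O/s", "Behavior Score"]
# Fields 7 and 9 have no source, so only 7 of 9 reach the warehouse.
available = set(required[:6] + ["Total O/s"])
pct, missing = record_completeness(required, available)
# pct rounds to 78%; missing holds the 90-day bucket and Behavior Score
```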
The Data Quality Metrics ……Contd. Field Population Measures the level of population for each data field Metrics are in Defects per Million Observations (DPMO), calculated for each data field as (Number of Records with Missing Values / Total Number of Records)*1,000,000

Example Suppose there exist 10 records with information as given below

| Name | Credit Limit | Gender | Address1 | Address2 |
| Siraj | 900 | M | Road 2 | City 1 |
| Ayesha | 800 | F | Road 1 | City 2 |
| Maria | 300 | F | Road 2 | City 1 |
| Tabrez | 400 | M | Road 2 | City 1 |
| Asif | 500 | | Road 2 | City 1 |
| Halani | 300 | | Road 2 | City 1 |
| Pervaiz | 100 | | Road 2 | City 1 |
| Farah | | F | Road 1 | City 2 |
| Sadia | 2000 | F | Road 2 | City 1 |
| Naheed | 1000 | M | Road 1 | City 1 |

Field Population:
Name: (0/10)*1,000,000 = 0 DPMO
Credit Limit: (1/10)*1,000,000 = 100,000 DPMO
Gender: (3/10)*1,000,000 = 300,000 DPMO
Address1: (0/10)*1,000,000 = 0 DPMO
Address2: (0/10)*1,000,000 = 0 DPMO
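The DPMO calculation is a one-liner; the two sample columns below mirror the slide's table, with `None` standing in for blank cells:

```python
def dpmo_missing(values):
    """Defects per million observations for a field, counting blanks
    (None or empty string) as defects -- the Field Population metric."""
    missing = sum(1 for v in values if v in (None, ""))
    return missing / len(values) * 1_000_000

# Gender and Credit Limit columns from the slide's 10-record example
genders = ["M", "F", "F", "M", None, None, None, "F", "F", "M"]
credit  = [900, 800, 300, 400, 500, 300, 100, None, 2000, 1000]
# dpmo_missing(genders) -> 300000.0, dpmo_missing(credit) -> 100000.0
```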
The Data Quality Metrics ……Contd. Validity Measures the level of validity for each data field based on specific business rules and reference lookup tables Metrics are in Defects per Million Observations (DPMO), calculated for each data field as (Number of Records with Invalid Values / Total Number of Records)*1,000,000

Example Suppose there exist 10 records with information as given below

Business rules used:
Customer Name: should not contain a number
Credit Limit: should not be blank or less than 0
Gender: should contain only F or M

| Customer Name | Credit Limit | Gender | Address1 | Country |
| Siraj | 900 | M | Road3 | Country3 |
| Nadia | 800 | F | Road3 | Country4 |
| Behnaz | 300 | F | Road2 | Country4 |
| Chishti | | M | Road1 | Country4 |
| 8CV8 | 500 | G | Road2 | Country5 |
| Marvi | 300 | G | Road2 | Country5 |
| Khayyam | -100 | G | Road3 | Country4 |
| Sadia | -200 | F | Road3 | Country3 |
| Ishaq | 2000 | G | Road3 | Country2 |
| Shahid | 1000 | M | Road1 | Country1 |

Validity:
Customer Name = (1/10)*1,000,000 = 100,000 DPMO
Credit Limit = (3/10)*1,000,000 = 300,000 DPMO
Gender = (4/10)*1,000,000 = 400,000 DPMO
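The three business rules translate directly into predicates; the sample columns mirror the slide's table, and the rule set is only a sketch of how such checks are usually wired up:

```python
import re

# Business rules from the slide, expressed as one predicate per field.
RULES = {
    "name":   lambda v: v is not None and not re.search(r"\d", v),
    "credit": lambda v: v is not None and v >= 0,
    "gender": lambda v: v in ("F", "M"),
}

def dpmo_invalid(values, rule):
    """DPMO for a field: records failing the rule, per million."""
    invalid = sum(1 for v in values if not rule(v))
    return invalid / len(values) * 1_000_000

names  = ["Shahid", "Ishaq", "Sadia", "Khayyam", "Marvi",
          "8CV8", "Chishti", "Behnaz", "Nadia", "Siraj"]
credit = [1000, 2000, -200, -100, 300, 500, None, 300, 800, 900]
gender = ["M", "G", "F", "G", "G", "G", "M", "F", "F", "M"]
# One name with a digit, three bad credit limits, four bad genders.
```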
The Data Quality Metrics ……Contd. Consistency Measures whether the stored data can be trusted as true information Computed as (Sum of Outstanding Amount and Provisions in CDM / Sum of Balance as per GL)*100

Example

Data Warehouse Report:
| Customer | Product | GL Code | Amount |
| Shahid | Current A/c | 001001200 | 100 |
| Siraj | Credit Card | 00200300 | 500 |
| Ali | Saving A/c with Cheque book | 001001200 | 200 |

GL Trial Balance:
| Account ID | Account Description | Amount |
| 001001200 | Wealth Product-NFI | 310 |
| 00200300 | Unsecured Product-NFI | 500 |

Consistency = Sum of Balances (CDM) / Sum of Balances (GL) = (300+500) / (310+500) = 99%
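The reconciliation ratio as code, with balances aggregated by GL code as in the example (figures from the slide; the helper is an illustrative sketch):

```python
def consistency_pct(cdm_balances, gl_balances):
    """Warehouse-vs-General-Ledger reconciliation ratio: sum of CDM
    balances over sum of GL trial-balance amounts, as a percentage."""
    return sum(cdm_balances.values()) / sum(gl_balances.values()) * 100

# Balances aggregated by GL code, matching the slide's two tables
cdm = {"001001200": 300, "00200300": 500}
gl  = {"001001200": 310, "00200300": 500}
# consistency_pct(cdm, gl) is 98.77..., reported on the slide as ~99%
```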
The Data Quality Metrics ……Contd. Relevance Measures the degree to which the data has a definition and an explanation Calculated as (Number of Data Fields which have available metadata / Total Number of Data Fields)*100

Example

Data Warehouse Fields: id_acct, a_crlmt, a_highest_paymt, a_curr_bal, f_bankrupt

Data Dictionary Fields:
| DW Field Name | Table Name | Business Definition | Attribute Definition | TP Business Rule | CDM Business Rule |
| id_acct | B_BCDM_CARD_ACCT_HIST_M01 | A unique value assigned to each account for masking purposes | Account Common identifier | Direct Extract | Direct Extract |
| a_highest_paymt | B_BCDM_CARD_SMRY_HIST_M01 | Maximum payment since the account opened | Highest Payment Amount | Direct Extract from PDW | PDW Business Rule: For CCMS, scan ACARDDET for the current month for CTXN = Payments. Compare ATXN with previous A_HIGHEST_PAYMT. If A_HIGHEST_PAYMT is greater then retain value, else set ATXN. |

Relevance: Data Warehouse = (2/5)*100 = 40%
The Data Quality Metrics ……Contd. Availability Measures the percentage of tables which were completely loaded into the warehouse and available to users on the scheduled time Computed as (Number of tables loaded on time / Number of all the tables)*100

Example Detail of receipt of input files for the warehouse

| Expected Date | Actual Date | Delay |
| 5-Mar | 4-Mar | -1 day |
| 5-Mar | 8-Mar | 3 days |
| 5-Mar | 5-Mar | 0 days |
| 5-Mar | 4-Mar | -1 day |
| 5-Mar | 5-Mar | 0 days |
| 5-Mar | 7-Mar | 2 days |

Availability = Tables loaded on time / All tables = (4/6)*100 = 67%

Latency Measures the number of days that the tables were delayed from the month end date. From the same example, latency is 8 days (the last table arrived on 8-Mar, i.e. 8 days after month end).
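Both measures as a helper. The calendar year is an assumption (the slide gives only day-month), and reading latency as "month end to last actual load" is inferred from the example:

```python
from datetime import date

def availability_and_latency(loads, month_end):
    """Availability = share of tables loaded on or before the expected
    date; latency = days from month end to the last actual load."""
    on_time = sum(1 for expected, actual in loads if actual <= expected)
    availability = on_time / len(loads) * 100
    latency = max(actual for _, actual in loads) - month_end
    return availability, latency.days

# Six tables, all expected on 5-Mar (year 2025 assumed for illustration)
expected = date(2025, 3, 5)
loads = [(expected, date(2025, 3, 4)), (expected, date(2025, 3, 8)),
         (expected, date(2025, 3, 5)), (expected, date(2025, 3, 4)),
         (expected, date(2025, 3, 5)), (expected, date(2025, 3, 7))]
avail, latency = availability_and_latency(loads, date(2025, 2, 28))
# avail rounds to 67% (4 of 6 on time); latency is 8 days
```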
Audit Components Data Quality scorecards Action Plan documentation Country Escalation Process Operational data entry process Audit Guidelines
Audit Guidelines ……Contd. Audit Population IM Governance Committee members General Managers (GMs) / Group Product Heads Systems Development & Application Support Team Inputs / Outputs BRD forms for the data quality targets Data Quality Scorecards / Reports Operational Data Entry Processes
Data Quality Reporting Process Categorization of Data Every data element is categorized into: Regulatory Attribute, a flag indicating whether the data is used for regulatory reporting; Analytical Attribute, a flag indicating whether the data is required for scorecard and behavior analytics; Performance Measures (KPIs), a flag indicating whether or not the data is used for MIS purposes.
Conclusion Key benefits: Faster and more reliable financial and management reporting Higher Return on Investment (ROI) on marketing automation campaigns Faster migration of legacy data to new ERP systems
Thank You


Editor's Notes

  • #2: Good morning, ladies and gentlemen. It is a pleasure to be here among all of you, and an excellent opportunity to share views regarding Enterprise Data Management. We are all here today to face the upcoming challenges of BI in managing enterprise data. In today's competitive world, business and information systems must balance two sets of problems: those related to today and those related to tomorrow. Why do businesses normally fail? Business failure occurs when management focuses too much attention on today's immediate needs, such as quarterly profits and high costs, at the expense of solving tomorrow's problems, such as satisfying customers' emerging requirements. What good is clean data in the data warehouse if it is the wrong data? Strategic business users are distinguishable from other users because they serve a defined external market in which they play a key role in strategic planning and product development. Their success depends mainly on the quality of the data produced. When companies become large, they are usually composed of a number of business segments, comprising revenue and cost centres; in the banking industry this typically runs to around 100 business segments.
  • #3: We need to understand the key purpose of this presentation: to evaluate the significance of Data Quality Management, as it is the basic ingredient in controlling and monitoring business information. It ensures that important data stored within an enterprise is reliable, accurate and complete. Organizing data is a critical and mandatory task, as the available information is meant to be shared by different people to make strategic business decisions within a company. Generally, Finance and the Business continue to struggle with differential amounts created by poor reconciliation between the two. The Business Information Management department must ensure that financial data quality is maintained at an agreed frequency, so that authentic and reliable information is provided to strategic business users.
  • #4: Many organizations are stacked with volumes of data containing off-colour information, and unhealthy information can do more harm to the health of a company than having no information at all. In order to achieve operational efficiency and better performance, it becomes essential to deploy transactional data intelligence. Unfortunately, most data management solutions fail to provide need-driven analytics. The reason behind the failure is that the requisite data does not have the required quality at the capture stage, yet great expectations are built on the analytics to do wonders with that data. Centralization of processes is also one of the key factors that enables successful organizations to maintain quality, e.g. centralized account opening forms and centralized clearings.
  • #5: What are the issues that we face? Is top management interested in having quality data? Do they depend on the management information provided, or do they still have preset minds? Is quality data a part of new and existing product development? Is customer satisfaction valuable to the corporation? Is management willing to spend on quality systems? Does the company have the right people to deploy those systems?
  • #6: It is the duty of IT managers to determine the data quality characteristics, because they are the ones with true insight into what strategic business users require to fulfil their objectives. The right data is the data the end user requires. All of these data quality characteristics (right completeness, right context, right accuracy, right format, time, place and purpose) assist in accomplishing the objectives of the strategic business user.
  • #7: This is the BI model which I made for Standard Chartered Bank in Pakistan. It explains how data is gathered from the different data sources of Consumer Banking. Sources such as the core banking application, secured assets, unsecured assets, credit cards, merchant acquiring and alternate distribution channels were to be integrated into one data warehouse. There were massive data quality management issues, as the bank went through a merger at that time: duplicated records, missing data fields and similar defects were the main hindrances to completing the task. Major discrepancies were found in the demographic data due to the poor data quality standards in the major databases. It is a simple garbage-in, garbage-out concept. In every organization the issue arises at the input level, where the key-punch operator is not aware of the consequences; awareness at this level is mandatory to capture the required information to the desired standard and maintain data quality. The systems should also be able to mark such fields as mandatory, so that restrictions can be imposed to avoid issues. Several data types can be derived from each core banking application: day-end data (period-end balances, average balances and number of accounts), transactional data, demographic data, historical data, financial data, operational data and service quality data. It becomes very challenging for the BI department to maintain a high level of integrity in producing analytics or creating cubes with multi-dimensional analytical capability, and to create a single view of the customer for cross-sell / up-sell, campaign management, loyalty management, and attrition and retention management. Most of these data quality issues are to be addressed in a centralized data warehouse, where business rules are defined at the LDM level to avoid any conflicts.
  • #8: To understand the business goal, we need to be clear on what Right Data, Right People and Right Time mean. Right data is data that is correct, accurate, usable, consistent and reliable. Right people are those authorized to access the information at the proper hierarchical level; these levels should be clearly defined so that proper security controls can be applied. Right time implies a quick turnaround, and frequencies must be determined to assess the TAT required to publish such information. Defining the right KPIs is another milestone that needs to be set out very clearly: strategic business users need to state their needs precisely so that the quality of data can be assessed against them. We need to build great confidence in management information systems in order to maximize profitability and reduce costly operational inefficiencies.
  • #9: Control over the IM process means that data resource management must continually ask: how are its information products going to capture and deliver the knowledge resources that enable the business to achieve its mission, and how will they improve end-customer satisfaction? Top management must be committed to quality and productivity. To give an example of customer interaction driven by data quality: when a Citi Gold customer walks into the priority area and provides his account number, the RM sees his relationship with Citi, the room temperature he prefers, the sugar he likes in his coffee, the lighting he prefers, his travel details, and so on, to make him feel privileged. That sort of customer interaction is only possible when data is captured with the intent of creating superior service quality. A sales tracker needs to be in place to track day-to-day activities in order to boost sales and highlight areas where potential growth is not being achieved, facilitating day-to-day decision making. The quality of work of operational staff can be improved overall by monitoring the transaction load per teller in the branch against transaction-processing standards.
  • #10: All data quality issues are to be identified, and proper measures must be taken to prevent their recurrence. There are many instances where information is not captured even on the application form, which is filled in manually. All of this has to be enforced in the Customer Relationship Management (CRM) system to maintain quality. Action plans need to be very clearly defined to help improve data quality, and these processes need to be reviewed on a regular basis to avoid conflicts at the initial input stage.
  • #11: I wanted to share with you the actual template that we use in our internal portal for BIU tracking. As you can see, quality output always results in quick decision making: the more clearly the data is defined, the more the chances of quality decisions increase. Confidence in this data is the key to success, as all strategic business users take actions based on these results. Qualitative data can only be created once the business rules are clear, transparent and practical to implement. This part of the portal is built around the concept of the right data (daily deposit sales tracking), for the right people (the Retail Banking population), at the right time (daily, before the start of banking hours).
  • #12: This is another extract from our BIU portal, a complete graphical presentation giving a bird's-eye view of the deposits tracker. You can clearly see that it presents a clear, concise and accurate picture, generated with enough clarity that anyone involved can understand the detail it conveys; further breakdowns of the image are available for closer scrutiny. The tracking graph shows a complete time series for the entire year, with variance monitoring against budgeted period-end and average balances. To stay on top, all strategic business users make the most of this tracker for detailed daily scrutiny, aggressive follow-ups, and to help boost sales.
  • #13: Now we come to the other part of the presentation, where we discuss in detail how to exploit data quality tools to achieve the desired results and help strategic business users in strategic planning, product development and marketing-related activities.
  • #14: We need to define data profiling techniques that help identify data defects in detail: a detailed table column analysis to evaluate the frequency distribution of occurrences, data type analysis, and checks on unique records and null records.
  • #15: Let's take an example of a SAS report to review how data profiling takes place. Looking at the table, we have a total of 11130 records; calculating the missing-row population shows that 0.3% of the transaction ID population is available. Similarly, distinct records need to be identified: a distinct-field density of 1 indicates that the field satisfies the uniqueness business rule.
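The profiling checks described above (null percentage and distinct density) can be sketched in a few lines of Python; the `profile_column` helper and the sample values below are illustrative, not taken from the original SAS report:

```python
def profile_column(values):
    """Basic column profile: row count, null %, distinct count and density."""
    total = len(values)
    nulls = sum(1 for v in values if v is None or v == "")
    non_null = total - nulls
    distinct = len({v for v in values if v not in (None, "")})
    return {
        "rows": total,
        "null_pct": round(100.0 * nulls / total, 1) if total else 0.0,
        "distinct": distinct,
        # a density of 1.0 means every non-null value is unique (a key candidate)
        "distinct_density": round(distinct / non_null, 3) if non_null else 0.0,
    }

# Hypothetical transaction-ID column with one missing and one duplicated value
print(profile_column(["T001", "T002", None, "T003", "T003"]))
```

A density well below 1.0 on a column the business rule declares unique is exactly the kind of defect profiling is meant to surface.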
  • #16: One more thing to learn is how to do data parsing. As the example shows, the address data is parsed so that name prefix, first name, middle name, last name, and name suffix are clearly defined. Similarly, loan data is parsed into term, rate, points, origination fee, and type. These rules must be defined very clearly so the fields are captured properly.
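A minimal sketch of the name-parsing rule described above; the prefix and suffix lists are illustrative assumptions, and a production parser would carry a far richer rule set:

```python
PREFIXES = {"mr", "mrs", "ms", "dr"}   # assumed sample prefixes
SUFFIXES = {"jr", "sr", "ii", "iii"}   # assumed sample suffixes

def parse_name(raw):
    """Split a free-text name into prefix, first, middle, last, suffix."""
    tokens = raw.replace(".", "").split()
    parts = {"prefix": "", "first": "", "middle": "", "last": "", "suffix": ""}
    if tokens and tokens[0].lower() in PREFIXES:
        parts["prefix"] = tokens.pop(0)
    if tokens and tokens[-1].lower() in SUFFIXES:
        parts["suffix"] = tokens.pop()
    if tokens:
        parts["first"] = tokens.pop(0)
    if tokens:
        parts["last"] = tokens.pop()
    parts["middle"] = " ".join(tokens)  # whatever remains between first and last
    return parts

print(parse_name("Dr. John Q Smith Jr"))
```

The same pattern (tokenize, then assign tokens to named fields by positional rules) applies to parsing loan data into term, rate, points, and fees.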
  • #17: Data standardization is also one of the key pillars of data quality tools. As the example shows, we need to define a standardized approach where every occurrence is captured in the same manner: a company name stated in various forms has to be captured in the warehouse as "Company". What needs to be defined is what is required from the population of what is available, since analytics depend mainly on the quality standards in place.
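One way to sketch this standardization rule is a simple synonym map; the variant spellings below are assumed examples, not the deck's actual rule table:

```python
# Assumed variant spellings that should all standardize to "Company"
COMPANY_VARIANTS = {"co", "comp", "company", "coy"}

def standardize_company(name):
    """Rewrite any recognised 'company' variant to the single standard form."""
    out = []
    for word in name.split():
        if word.lower().rstrip(".") in COMPANY_VARIANTS:
            out.append("Company")
        else:
            out.append(word)
    return " ".join(out)

for raw in ["ICI Co.", "ICI comp", "ICI Company"]:
    print(standardize_company(raw))  # every variant prints as "ICI Company"
```

Standardizing before loading the warehouse means downstream analytics never have to guess which spelling of a value they will meet.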
  • #18: Data clustering and de-duplication are used to cluster the data effectively. Several accounts of the ICI company are maintained, and clustering is used to link them. This clustering mainly helps in producing customer profitability reporting, where we need to club together all relationships a single company holds with the organization to check the profitability of that specific customer.
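The clustering step above can be sketched as grouping accounts by a normalized customer key; the account records, balances, and the `normalize` rule here are hypothetical illustrations:

```python
from collections import defaultdict

def cluster_accounts(accounts, key_fn):
    """Group account records that share the same normalized customer key."""
    clusters = defaultdict(list)
    for acct in accounts:
        clusters[key_fn(acct)].append(acct)
    return dict(clusters)

# Hypothetical accounts: two spellings of the same ICI relationship
accounts = [
    {"id": "A1", "name": "ICI Co.",     "balance": 500},
    {"id": "A2", "name": "ICI Company", "balance": 300},
    {"id": "A3", "name": "Acme Ltd",    "balance": 200},
]

def normalize(acct):
    # crude illustrative key: lowercase, unify "Co." to "Company", then drop it
    return acct["name"].lower().replace("co.", "company").replace(" company", "")

clusters = cluster_accounts(accounts, normalize)
# total relationship value per customer, for profitability reporting
profitability = {k: sum(a["balance"] for a in v) for k, v in clusters.items()}
print(profitability)
```

Once both ICI accounts land in one cluster, the profitability report sees a single customer worth 800, not two smaller ones.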
  • #19: Householding, or family clustering, is used to identify members of the same family. A common example is finding the top 50 depositors of the bank, where it is very interesting to observe that more than one member of a family may hold large deposits. Combining these depositors gives a true picture of the top 50 customers. The address field in particular can vary widely, so the house number is held fixed and the rest of the address is clustered into the form it should take.
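A minimal sketch of that householding key, assuming addresses lead with a house number; the street-type normalization rule shown is an illustrative assumption:

```python
import re

def household_key(address):
    """Household key: fixed house number plus a loosely normalized street."""
    addr = address.lower().strip()
    match = re.match(r"(\d+[a-z]?)\s+(.*)", addr)
    if not match:
        return addr  # no leading house number: fall back to the raw address
    house_no, street = match.groups()
    # collapse common street-type variants so spelling noise clusters together
    street = re.sub(r"\b(st|str|street)\b\.?", "street", street)
    street = re.sub(r"[^a-z0-9 ]", "", street)
    return f"{house_no}|{' '.join(street.split())}"

# two spellings of the same family home resolve to one household key
print(household_key("12 Main St."))
print(household_key("12 Main Street"))
```

Summing deposits per household key, instead of per account holder, is what surfaces the true top-50 list.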
  • #26: Audit has a key role in maintaining data quality in a consistent way. For this purpose, audit guidelines are established under which data quality scorecards are reviewed, all action plans are documented, and, wherever necessary, a proper escalation process is followed. All operational data entry processes are reviewed on a regular basis to avoid discrepancies in the system.
  • #27: The audit population consists of Information Governance Committee members, generally senior members involved in proper escalation and decision making. All GMs and GPHs are also included in the audit population, and system development and application support members are involved as well, to form a complete forum for handling issues. Business requirement detail forms are very clearly defined to set the data quality targets, data quality scorecards and reports are generated, and all operational data entry processes are reviewed to avoid day-to-day problems.
  • #28: Data has to be categorized into three attribute types. Regulatory attributes are flagged to highlight that they are to be used in regulatory reporting requirements. Analytical attributes are flagged to highlight that they are to be used in preparing scorecards and behavioral analytics. Performance measures are also flagged to highlight that the data is to be used for MIS purposes.
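The three-way flagging scheme above can be sketched as a small attribute catalogue; the column names and flag assignments here are purely illustrative, not taken from the deck:

```python
from dataclasses import dataclass, field

@dataclass
class Attribute:
    name: str
    # any subset of {"regulatory", "analytical", "performance"}
    flags: set = field(default_factory=set)

# Hypothetical warehouse columns with their reporting flags
catalogue = [
    Attribute("capital_ratio", {"regulatory"}),
    Attribute("avg_balance",   {"analytical", "performance"}),
    Attribute("txn_count",     {"performance"}),
]

def attributes_for(flag):
    """All catalogue attributes carrying a given reporting flag."""
    return [a.name for a in catalogue if flag in a.flags]

print(attributes_for("performance"))
```

Flagging each attribute once, centrally, lets regulatory, scorecard, and MIS extracts each pull exactly the columns they are entitled to use.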
  • #29: Now we can conclude with the key benefits that can be derived from this superior data quality: a faster and more reliable financial and management reporting mechanism, higher return on investment on marketing automation campaigns, and much faster migration of legacy data to new ERP systems.
  • #30: Thank You very much.