Guest Presentation
HOW CLEAN IS YOUR DATABASE?
DATA SCRUBBING FOR ALL SKILL SETS (2020 EDITION)
CHAD PETROVAY, TMS ADMINISTRATOR
THE MORGAN LIBRARY & MUSEUM
Data Quality
Data quality is a measure of the condition of data
based on factors such as accuracy, completeness,
consistency, reliability and whether it’s up to date.
What fields do you have that contain
data quality issues?
What resources do you have to scrub
your data?
What is your personal skill level?
Power User
Uses the TMS UI;
has expanded rights,
but not full rights.
SQL Expert
Wait? TMS has a UI?
Nah, I’ll just script it in the
database.
Administrator
Full rights in TMS and
access to DB Config.
What is your data worth?
The costs of poor data quality are
between 20% and 35% of the operating
revenue of the average organization
LARRY ENGLISH
PREVENTION
“An ounce of prevention is worth a pound of cure”
-Benjamin Franklin
Institution
Value your Data
• Understand the costs
• Make data-driven
decisions
• Be a champion for
accurate data
Institution
Standards
• Establish the rules for data entry
• Conceptualize terms and
authority values
Prevents
• Data entry errors
• Formatting errors
• Inconsistency
• Creativity
Institution
Training
• Makes the system approachable
• Improves user efficiency
Prevents
• Data entry errors
• Unmanaged data silos (Excel)
Power User
Spell Check
Uses the Spelling and Grammar
engine in Microsoft Office.
Prevents:
• Typographical errors
• Misspellings
• Punctuation errors
• Grammatical errors
Power User
Function Keys
Reduces keystrokes when entering
repeated text.
Prevents:
• Typographical errors
• Misspellings
• Punctuation errors
• Grammatical errors
• Formatting errors
System Admin
Customize Field Labels
• Clarify field usage
• Makes system more intuitive
• Align field labels with your
institutional lingo
Prevents
• Confusion
System Admin
Customize Field Labels
In Database Configuration
1. Manage » Tables/Columns
2. Find the table
3. Find the column (i.e. field)
4. Right-click » Edit
5. Change Local Column Name
System Admin
Security Groups
• If your institution does not use a
field, then restrict access
• Restrict control of authority values
to select power users
• Text Types & Term Types
Prevents
• Populating obsolete fields
• Creativity
DATA PROFILING
“Mistakes are the portals of discovery.”
-James Joyce
System Admin
Usage Report
In TMS Module
1. Maintenance » Authorities » Others
2. Usage Report
3. After report generates:
• Browse
• Print
• Edit as RTF
• Save As RTF
System Admin
Frequency Report
In Database Configuration
1. Manage » Tables/Columns
2. Find the table
3. Find the column (i.e. field)
4. Right-click » Frequency
5. Save TXT file
Power User
Crystal Reports
In TMS Module
1. Report » Reports
2. Find report by name
3. Click Run
When creating the report:
1. Add a formula named “reporttype”
2. Set the formula text to "NOTLINKED"
SQL
Distinct Values: SQL
A SQL query will return all
records, including:
• Departments you cannot see
• Template records
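-- List each distinct value once
SELECT DISTINCT ObjectName
FROM Objects;

-- Count how often each value occurs
SELECT ObjectName, COUNT(*) AS Occurrences
FROM Objects
GROUP BY ObjectName
-- Optionally add one of these filters here:
--   HAVING COUNT(*) = 1   (values that occur once)
--   HAVING COUNT(*) > 1   (values that occur more than once)
ORDER BY Occurrences DESC;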
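The two queries above can be tried safely outside TMS. This is a minimal sketch, assuming an in-memory SQLite database as a stand-in for SQL Server and a toy Objects table with made-up rows. Only the table and column names come from the slide; the function name and sample values are hypothetical.

```python
import sqlite3

def value_frequencies(conn, min_count=1):
    """Return (ObjectName, count) pairs, most frequent first."""
    sql = """
        SELECT ObjectName, COUNT(*) AS n
        FROM Objects
        GROUP BY ObjectName
        HAVING COUNT(*) >= ?
        ORDER BY n DESC, ObjectName
    """
    return conn.execute(sql, (min_count,)).fetchall()

# Toy stand-in for the TMS Objects table (hypothetical sample rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Objects (ObjectID INTEGER PRIMARY KEY, ObjectName TEXT)")
conn.executemany(
    "INSERT INTO Objects (ObjectName) VALUES (?)",
    [("Drawing",), ("drawing",), ("Drawing",), ("Print",), ("Manuscript",), ("Drawng",)],
)

frequencies = value_frequencies(conn)            # every distinct value with its count
repeated = value_frequencies(conn, min_count=2)  # only values used more than once
```

Values that occur only once, such as the misspelled "Drawng", are often the first candidates for review. SQLite groups text case-sensitively by default, while many SQL Server installations use a case-insensitive collation, so "Drawing" and "drawing" may be counted together on a real TMS database.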
Power User
Distinct Values: Excel Pivot Table
Is the field in a List View?
Can you export your result set?
1. Export into Excel
2. Copy column into a new sheet
3. Create column “Count”
4. Fill “Count” with 1
5. Create a Pivot Table
Tutorial
• bit.ly/3d4M8Ou
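The pivot-table steps above amount to counting how often each distinct value occurs. The same count can be sketched with Python's standard library, assuming the result set was exported as CSV with a header row. The column names and sample rows here are hypothetical.

```python
import csv
import io
from collections import Counter

def count_values(csv_text, column):
    """Count how often each value occurs in one column of CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[column] for row in reader)

# Hypothetical export; a real one would be read from the file saved from Excel.
export = "ObjectNumber,Medium\n1,Oil on canvas\n2,oil on canvas\n3,Oil on canvas\n4,\n"
counts = count_values(export, "Medium")

for value, n in counts.most_common():
    print(repr(value), n)
```

Printing with `repr` makes empty strings and stray whitespace visible, which a pivot table tends to hide.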
Power User
OpenRefine
Install OpenRefine
• Download at www.openrefine.org
• Extract archive
• Execute openrefine.exe
• Opens in your web browser
Requires Java
• www.java.com/en/download/
Power User
OpenRefine: Facets
• A Facet shows a value
distribution
• Filter records
• Batch change
• Facets
• Word Facet
• Text-Length Facet
• Null / Empty String / Blank
Facets
Power User
OpenRefine: Duplicates
• Facets
• Duplicates Facet
• Facet by Star
• Facet by Flag
• Export to Excel
Power User
DataCleaner
Install Community Edition
• Download at www.datacleaner.org
• Extract archive
• Execute DataCleaner.exe
Requires Java
• www.java.com/en/download/
Power User
DataCleaner: DataStore
If your server uses NT Authentication:
• Add SQL user to the database
Create a datastore in DataCleaner:
1. Select Microsoft SQL Server
2. Supply details
• Hostname = Server name
• Database = TMS
• Username & Password
Power User
DataCleaner: New Job
• Navigation pane
• Datastore elements
• Library of actions
• Canvas
Power User
DataCleaner: Building Job
• Drag database elements and
components onto the canvas
• Best to drag columns instead
of full tables/views
• Use filter to exclude NULLs
and empty strings
Power User
DataCleaner: Results
• String Analysis
• Row Count
• Null/Blank Count
• All upper/lower count
• Char/Word count
• Max/Min/Avg char count
• Max/Min/Avg space count
• Max/Min word count
• Click arrow for details
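To make the String Analysis metrics concrete, here is a rough sketch that computes several of them for a list of values. The definitions used (for example, what counts as blank) are assumptions for illustration and may not match DataCleaner's own rules.

```python
from statistics import mean

def string_analysis(values):
    """Summarize a column of text values; None stands for NULL."""
    blank = [v for v in values if v is None or v.strip() == ""]
    filled = [v for v in values if v is not None and v.strip() != ""]
    lengths = [len(v) for v in filled]
    words = [len(v.split()) for v in filled]
    return {
        "row_count": len(values),
        "null_or_blank_count": len(blank),
        "all_upper_count": sum(v.isupper() for v in filled),
        "all_lower_count": sum(v.islower() for v in filled),
        "max_chars": max(lengths, default=0),
        "min_chars": min(lengths, default=0),
        "avg_chars": mean(lengths) if lengths else 0,
        "max_words": max(words, default=0),
        "min_words": min(words, default=0),
    }

# Hypothetical sample column.
sample = ["Gift of John Doe", "PURCHASE", None, "  ", "bequest"]
report = string_analysis(sample)
```

Outliers in these numbers, such as an unusually long maximum or a non-zero all-uppercase count, point to records worth opening.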
Power User
DataCleaner: Results
• Value Distribution
• Total count
• Distinct count
• List of distinct values
(except uniques)
• Graphical rank-size of distinct
values
• Click arrow for details
Power User
DataCleaner: Results
• Pattern Finder
• A = Uppercase letter
• a = Lowercase letter
• # = Number
• ? = AlphaNumeric
• Graphical rank-size of distinct
patterns
• Click arrow for details
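The idea behind a pattern finder can be sketched in a few lines: replace each character with a token and count the resulting patterns. This is a simplified illustration, not DataCleaner's algorithm. It maps characters one by one and does not reproduce the "?" token or any grouping DataCleaner may apply.

```python
from collections import Counter

def to_pattern(value):
    """Map each character to a pattern token; other characters stay as-is."""
    tokens = []
    for ch in value:
        if ch.isdigit():
            tokens.append("#")
        elif ch.isupper():
            tokens.append("A")
        elif ch.islower():
            tokens.append("a")
        else:
            tokens.append(ch)
    return "".join(tokens)

phones = ["(646) 733-2239", "646.733.2239", "(510) 652-8950", "+44 (0)207379 8188"]
patterns = Counter(to_pattern(p) for p in phones)
```

The most common pattern usually reflects the house standard; the rare ones are the records to scrub.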
Power User
Data Quality Services (DQS)
• Knowledge Base
• Projects
• Cleansing
• Matching
• Bundled with SQL Server
• Enterprise Edition
• Developer Edition
• Only works with local
databases
PLANNING
“To achieve great things, two things are needed:
a plan, and not quite enough time.” –Leonard Bernstein
Institution
Human Capital
Human capital is essential
for any data scrubbing
project.
• Colleagues
• Interns
• Volunteers
Power User
Project Management
• Record projects
• Plan future projects
• Track progress
• Provide metrics for
administration
Power User
Cheat Sheets
• Training tool
• Project specific
• Simplifies access to
standards
DATA SCRUBBING
“Cleaning and organizing is a practice not a project.”
-Meagan Francis
The Three Modes
Human Middleware
Direct human contact with UI
Usually Record-by-Record
Labor intensive
Automation
Requires additional
tools/services/platforms
Steeper learning curve
Artificial Intelligence
SQL Script
Change one or more records
through the back-end
Requires intimate knowledge of
database structure
Human Middleware
Finding Records by Pattern
• Query using wildcards:
• single character (?)
• multi-character (*)
• Wrap sequences with
double quotes
Format                    TMS Search
(646) 733-2239            "(???) ???-????"
646.733.2239              *???.???.????*
+44 (0)207379 8188        +*
(510) 652-8950 ext 223    "* ext*"
Chad M.                   "* ?."
Cheryl & Edward           *&*
Cheryl and Edward         "* and *"
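Python's `fnmatch` module happens to use the same two wildcard characters, so the patterns in the table can be tried on sample values before searching in TMS. This is only an approximation under that assumption: it is not the TMS search engine, the surrounding double quotes are dropped, and TMS may treat case differently.

```python
from fnmatch import fnmatchcase

def find_matches(values, pattern):
    """Return the values that match a ?/* wildcard pattern in full."""
    return [v for v in values if fnmatchcase(v, pattern)]

values = [
    "(646) 733-2239", "646.733.2239", "+44 (0)207379 8188",
    "(510) 652-8950 ext 223", "Chad M.", "Cheryl & Edward", "Cheryl and Edward",
]

us_format = find_matches(values, "(???) ???-????")
dotted = find_matches(values, "*???.???.????*")
international = find_matches(values, "+*")
with_extension = find_matches(values, "* ext*")
initials = find_matches(values, "* ?.")
ampersand = find_matches(values, "*&*")
```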
Human Middleware
Search and Replace (String)
In Objects Module
1. Maintenance » Database »
Search and Replace
2. Select Module/Table/Column
3. Provide search and replace terms
4. Review results
• Replace All
• Replace
• Skip
System Admin
Search and Replace (Thesaurus)
In Database Configuration
1. Edit » Search and Replace »
Linked Thesaurus Terms
2. Click Zoom button (…) to find source term
3. Click Zoom button (…) to find target term
4. Click OK and confirm
Human Middleware
Merge Constituents Utility
In Plugins folder
1. Search for duplicate constituents
2. Select candidates from the
suggestions
3. Click Next
Feature Idea: Constituent Packages!!
Human Middleware
Merge Constituents Utility
4. Set Target record
• Right-click » Merge to this
5. Edit data in the columns of
the grid
6. Go section by section and
select the data to keep
7. Ready to merge?
• File » Merge
8. Save an XML file
SQL
Updating with SQL
• Know the system
• Test the SQL script in a sandbox
environment first
• Back up your database
before running the SQL script
• Consider converting frequently used
scripts into Stored Procedures
• Gallery Systems may not be able to
provide support
SQL
Finding Records with Patterns
• Use a LIKE Statement
• Query using wildcards:
• single character (_)
• multi-character (%)
Format                     LIKE Pattern
(646) 733-2239             '(___) ___-____'
646.733.2239               '%___.___.____%'
+44 (0)207379 8188         '+%'
(510) 652-8950 ext. 223    '% ext%'
Chad M.                    '% _.'
Cheryl & Edward            '%&%'
Cheryl and Edward          '% and %'
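The LIKE patterns can be rehearsed on sample data before touching a production database. This is a minimal sketch, assuming an in-memory SQLite database as a stand-in for SQL Server (both support `%` and `_` in LIKE) and a hypothetical Phones table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Phones (PhoneNumber TEXT)")  # hypothetical table
conn.executemany(
    "INSERT INTO Phones VALUES (?)",
    [("(646) 733-2239",), ("646.733.2239",), ("+44 (0)207379 8188",),
     ("(510) 652-8950 ext. 223",)],
)

def like(pattern):
    """Return the phone numbers matching a LIKE pattern."""
    rows = conn.execute(
        "SELECT PhoneNumber FROM Phones WHERE PhoneNumber LIKE ? ORDER BY PhoneNumber",
        (pattern,),
    )
    return [r[0] for r in rows]

us_format = like("(___) ___-____")
dotted = like("%___.___.____%")
international = like("+%")
with_extension = like("% ext%")
```

A pattern without a leading or trailing `%` must match the whole value, which is why the number with an extension is not returned for the first pattern.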
SQL
Excel Trick
If you have data in Excel
1. Create a SQL script using a
CONCATENATE formula
2. Copy the formula down the
column
3. Select and copy the column
4. Paste the content in SSMS
5. Execute
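The same script-generation idea can be sketched outside Excel. The example below builds one statement per row, using the call format of the MLM_UpdateFieldValue procedure that appears in this deck. The helper names and the second sample row are hypothetical. Doubling embedded single quotes is the step a plain CONCATENATE formula is most likely to miss.

```python
def sql_literal(text):
    """Quote a string for T-SQL by doubling embedded single quotes."""
    return "'" + text.replace("'", "''") + "'"

def build_script(rows, login_id):
    """Build one EXECUTE statement per (column_id, pk, new_value) row."""
    statements = []
    for column_id, pk, new_value in rows:
        statements.append(
            "EXECUTE [dbo].[MLM_UpdateFieldValue] "
            f"@ColumnID = {int(column_id)}, @PK = {int(pk)}, "
            f"@NewValue = {sql_literal(new_value)}, "
            f"@LoginID = {sql_literal(login_id)};"
        )
    return "\n".join(statements)

# The second row is made up to show quote escaping.
rows = [(1243, 273469, "Gift of John Doe"), (1243, 273470, "Gift of Mary O'Neil")]
script = build_script(rows, "cpetrovay")
print(script)
```

The printed text can then be reviewed and pasted into SSMS in the same way as the Excel column.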
SQL
My Stored Procedure
Stored Procedure
• @ColumnID = Identifies the field
• Get the ColumnID from the
Data Dictionary
• @PK = Primary key for the record
• @NewValue = the value you want
• @LoginID = your username
EXECUTE [dbo].[MLM_UpdateFieldValue]
@ColumnID = 1243, @PK = 273469,
@NewValue = 'Gift of John Doe',
@LoginID = 'cpetrovay';
EXECUTE [dbo].[MLM_UpdateFieldValue]
@ColumnID = 1228, @PK = 273469,
@NewValue = 'Loaned Object',
@LoginID = 'cpetrovay';
SQL
My Stored Procedure
Process:
• Truncates new value if too long
• Looks up authority key values
• Updates only when value changes
• Tracks change in Audit Trail
Available:
• github.com/cpetrovay/TMS_UpdateField_SP/
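The process listed above can be illustrated with a small sketch. This is not the author's stored procedure (see the repository for that). It is a simplified Python version against an in-memory SQLite database, with a hypothetical CreditLine column and AuditLog table, and it leaves out the authority key lookup.

```python
import sqlite3

def update_field(conn, table, column, pk_column, pk, new_value, max_length, login_id):
    """Update one field; return True only if the stored value changed."""
    # Identifiers are interpolated, so pass only trusted table and column names.
    new_value = new_value[:max_length]          # truncate if too long
    row = conn.execute(
        f"SELECT {column} FROM {table} WHERE {pk_column} = ?", (pk,)
    ).fetchone()
    if row is None or row[0] == new_value:      # missing record or no change
        return False
    conn.execute(
        f"UPDATE {table} SET {column} = ? WHERE {pk_column} = ?", (new_value, pk)
    )
    conn.execute(                               # record the change
        "INSERT INTO AuditLog (TableName, ColumnName, PK, OldValue, NewValue, LoginID) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (table, column, pk, row[0], new_value, login_id),
    )
    return True

# Hypothetical tables standing in for TMS and its audit trail.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Objects (ObjectID INTEGER PRIMARY KEY, CreditLine TEXT)")
conn.execute("CREATE TABLE AuditLog (TableName, ColumnName, PK, OldValue, NewValue, LoginID)")
conn.execute("INSERT INTO Objects VALUES (273469, 'Gift of J. Doe')")

first = update_field(conn, "Objects", "CreditLine", "ObjectID", 273469,
                     "Gift of John Doe", 255, "cpetrovay")
second = update_field(conn, "Objects", "CreditLine", "ObjectID", 273469,
                      "Gift of John Doe", 255, "cpetrovay")
audit_rows = conn.execute("SELECT COUNT(*) FROM AuditLog").fetchone()[0]
```

Skipping the write when the value is unchanged keeps the audit trail free of no-op entries, so a script can be re-run without adding noise.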
MONITORING
“Without a systematic way to start and keep data clean,
bad data will happen.” -Donato Diorio
Human Middleware
Saved Queries
• Save time by saving your
periodic review queries.
• User has to initiate query
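For those working at the SQL level, the same habit can be sketched as a small set of named review queries that are re-run on a schedule. The queries, table and sample rows below are hypothetical, and SQLite stands in for the TMS database. Each query is written so that it returns no rows when the data is clean.

```python
import sqlite3

# Hypothetical review queries; each should return zero rows when data is clean.
REVIEW_QUERIES = {
    "blank object names":
        "SELECT ObjectID FROM Objects WHERE TRIM(COALESCE(ObjectName, '')) = ''",
    "leading or trailing spaces":
        "SELECT ObjectID FROM Objects WHERE ObjectName <> TRIM(ObjectName)",
}

def run_review(conn):
    """Return {query name: number of offending rows}."""
    return {
        name: len(conn.execute(sql).fetchall())
        for name, sql in REVIEW_QUERIES.items()
    }

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Objects (ObjectID INTEGER PRIMARY KEY, ObjectName TEXT)")
conn.executemany(
    "INSERT INTO Objects (ObjectName) VALUES (?)",
    [("Drawing",), (" Print",), (None,), ("",)],
)
results = run_review(conn)
```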
Human Middleware
Audit Trail Report
• Review changes to the
database
• Identify where to provide
additional training
• User has to run the report
Automation
TMS Alerts
• Notifies user when
predefined criteria are met
• User has to regularly
access TMS
• Set up by SQL Expert
Automation
Database Mail
• Sends email to user when
predefined criteria are met
• Requires configuration in
SQL Server
• Set up by SQL Expert
Automation
SSRS Subscription
• Sends email to user when
predefined criteria are met
• Requires configuration in
SSRS Server
• Set up by Report Writer
FINAL THOUGHTS
“Data that is loved tends to survive.”
–Kurt Bollacker
“While few things in life are guaranteed,
it is safe to say that not addressing data quality
issues this year means you’ll be facing the
same issues next year,
likely on a larger scale.”
-BO CRADER (sgENGAGE)
Data scrubbing
goes on as long as it has to.
THE 7TH RULE OF THE DATA SCRUB
CHAD PETROVAY
TMS ADMINISTRATOR, THE MORGAN LIBRARY & MUSEUM
cpetrovay@themorgan.org
Q&A

