ALPSP Seminar: Data, the universe and everything
Institutional Identifiers internally and throughout the supply chain
22nd January 2014
Laura Cox
Contents
1. Healthy data
2. Why now?
3. Institutional Identifiers – what and why?
4. Common data problems
5. Data governance
6. Data integration (linking your data together)
7. Institutional Identifiers in the supply chain
8. Institutional Identifiers – which?
9. ISNI IDs
10. What can you do now?

Why is healthy data important?
Good-quality, healthy data can be used to gain insight into customers and business relationships, and to support strategic planning, decision making and ongoing business operations.
But when it's unhealthy…

Poor data has real consequences
• Hard to get a true picture of relationships with institutions
• Lack of quality author (and affiliation) data
• Inability to see overlap between authors, members and customers
• Inaccurate holdings and revenue reports
• Protracted time and effort taken to analyse data
Everything becomes more difficult, and less accurate.

Healthy records are:
• Complete
• Accurate
• Free of duplicates
• Current
• Consistent
• Compliant with standards

Unique Identifiers
What are they? How can they help?
• Numeric or alphanumeric designations, each associated with a single entity
• Entities can be an institution, person, or piece of content
• Enable the disambiguation of each entity:
  • Proper understanding of the customer, author, reader or institution
  • Proper identification of the content object, article, product, or package
• Can be used internally or in conjunction with external partners

Why should we worry about data now?
• Number of researchers increasing by 3% per annum*
• Number of articles increasing by 3% per annum; current output is 1.8–1.9 million per year*
• Number of journals increasing by 3.5% per annum*
• Growth in China has been in double digits for over 15 years*
• Increased demand for anytime/anywhere access
• Library budgets are frozen or being cut; less money for more content means we have to work smarter
* Ware, M. and Mabe, M., The STM Report, 2012
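
A quick compound-growth calculation shows why steady percentage growth adds up. The sketch below assumes the rates quoted above stay constant, which is a simplification, and takes the midpoint of the 1.8–1.9 million figure; `project` and `doubling_time` are illustrative names.

```python
import math


def project(current: float, annual_rate: float, years: int) -> float:
    """Value after `years` of constant compound growth (annual_rate=0.03 means 3%)."""
    return current * (1 + annual_rate) ** years


def doubling_time(annual_rate: float) -> float:
    """Years for a quantity to double at a constant compound growth rate."""
    return math.log(2) / math.log(1 + annual_rate)


# Assumption: midpoint of the quoted 1.8-1.9 million articles per year.
articles_now = 1.85e6

print(f"Articles per year after 10 years at 3%: {project(articles_now, 0.03, 10):,.0f}")
print(f"Doubling time at 3% (articles, researchers): {doubling_time(0.03):.1f} years")
print(f"Doubling time at 3.5% (journals): {doubling_time(0.035):.1f} years")
```

Under these assumptions, annual output approaches 2.5 million articles within a decade and doubles in a little over 23 years.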

What are Institutional Identifiers for?
Disambiguating:
• UCL:
  • University College London (UK)
  • Université Catholique de Louvain (Belgium)
  • Universidad Cristiana Latinoamericana (Ecuador)
  • University College Lillebælt (Denmark)
  • Centro Universitario Celso Lisboa (Brazil)
  • Union County Library (USA)
• NPL:
  • National Physical Laboratory (UK)
  • National Physical Laboratory (India)
• York University:
  • University of York (UK)
  • York University (Canada)
• Northeastern University:
  • Northeastern University (Boston, USA)
  • Northeastern University (Shenyang, China)
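
In data terms, each of these abbreviations is a lookup that returns more than one candidate. The following minimal sketch shows that; the registry, the INST- identifiers and the alias table are all hypothetical, not values from any real identifier service.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Institution:
    inst_id: str  # hypothetical identifier, not a real registry value
    name: str
    country: str


# Hypothetical registry entries for some of the institutions named on the slide.
REGISTRY = {
    inst.inst_id: inst
    for inst in [
        Institution("INST-001", "University College London", "UK"),
        Institution("INST-002", "Université Catholique de Louvain", "Belgium"),
        Institution("INST-003", "National Physical Laboratory", "UK"),
        Institution("INST-004", "National Physical Laboratory", "India"),
        Institution("INST-005", "Northeastern University", "USA"),
        Institution("INST-006", "Northeastern University", "China"),
    ]
}

# Strings as they might be typed into a record, mapped to every candidate they could mean.
ALIASES = {
    "UCL": ["INST-001", "INST-002"],
    "NPL": ["INST-003", "INST-004"],
    "Northeastern University": ["INST-005", "INST-006"],
}


def resolve(name: str, country: Optional[str] = None) -> list[Institution]:
    """Return all institutions a name could refer to, optionally narrowed by country."""
    candidates = [REGISTRY[i] for i in ALIASES.get(name, [])]
    if country is not None:
        candidates = [c for c in candidates if c.country == country]
    return candidates


print(len(resolve("NPL")))                 # the name alone is ambiguous
print(resolve("NPL", country="India")[0])  # a second attribute narrows it to one record
```

Once a record stores the identifier (here INST-004) rather than the string "NPL", the question never has to be asked again.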

What are Institutional Identifiers for?
Consolidating:
• University of Oxford
• Univ. Oxford
• Oxford University
• Library, Oxford Univ.
• Radcliffe Science Library
• Bodleian Library
• Bodleian, Oxford
• Oxford, University of
Hierarchy view:
• University of Northampton
  • Northampton Business School
  • School of Education
  • School of Health
  • School of Science and Technology
    • Division of Computing
    • Division of Engineering
    • Environmental & Geographical Sciences
    • Institute for Creative Leather Technologies
  • School of Social Sciences
  • School of The Arts
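
Consolidation and hierarchy can both be expressed as simple mappings: name variants point to one identifier, and each identifier may point to a parent. The sketch below assumes such tables already exist; the ORG- identifiers and the exact parent links are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict

# Hypothetical mapping from name variants (as keyed into different systems) to one ID.
VARIANT_TO_ID = {
    "University of Oxford": "ORG-OX",
    "Univ. Oxford": "ORG-OX",
    "Oxford University": "ORG-OX",
    "Oxford, University of": "ORG-OX",
    "Bodleian Library": "ORG-OX-BOD",
    "Bodleian, Oxford": "ORG-OX-BOD",
    "University of Northampton": "ORG-NN",
    "School of Science and Technology": "ORG-NN-SST",
    "Division of Computing": "ORG-NN-SST-COMP",
}

# Hypothetical child -> parent links; an ID absent from the keys is a top-level organisation.
PARENT = {
    "ORG-OX-BOD": "ORG-OX",
    "ORG-NN-SST": "ORG-NN",
    "ORG-NN-SST-COMP": "ORG-NN-SST",
}


def top_level(org_id: str) -> str:
    """Follow parent links upwards (assumes the links contain no cycles)."""
    while org_id in PARENT:
        org_id = PARENT[org_id]
    return org_id


def consolidate(names: list[str]) -> dict[str, list[str]]:
    """Group raw name strings under the top-level organisation they belong to.

    Names with no known variant mapping are collected under "UNMATCHED" for manual review.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for name in names:
        org_id = VARIANT_TO_ID.get(name)
        key = top_level(org_id) if org_id is not None else "UNMATCHED"
        groups[key].append(name)
    return dict(groups)


print(consolidate(["Univ. Oxford", "Bodleian, Oxford", "Division of Computing", "Oxford Uni"]))
```

The unmatched bucket matters: a consolidation that silently drops unknown strings would hide exactly the records that most need cleaning.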

Use cases – the why
Identifiers enforce uniqueness:
• Disambiguate institutional records
• Eradicate duplication of data
• Ensure correct delivery, entitlements and access rights
• Better understand your customer base and relationships with institutions
• Improve “trust” in data
• Map institutions into their hierarchy

Common data problems
Most publishers have problems with data:
• Multiple accounts for each customer
• Multiple internal IT systems for different purposes
• Data entry without standard names or ID numbers
• Lack of hierarchy information
• No formal way to track customers across systems
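
Several of these problems become easy to detect once records carry an institutional identifier, because accounts can be grouped by ID however the name was typed. A minimal sketch, assuming each account already holds an ID; the system names, account numbers and IDs are made up.

```python
from __future__ import annotations

from collections import defaultdict
from typing import NamedTuple


class Account(NamedTuple):
    system: str      # which internal system holds the account (illustrative names)
    account_no: str  # that system's own account number
    name: str        # institution name as typed at data entry
    inst_id: str     # institutional identifier stored on the record (hypothetical values)


def duplicate_accounts(accounts: list[Account]) -> dict[str, list[Account]]:
    """Return only those institutions that hold more than one account."""
    by_inst: dict[str, list[Account]] = defaultdict(list)
    for acc in accounts:
        by_inst[acc.inst_id].append(acc)
    return {inst: accs for inst, accs in by_inst.items() if len(accs) > 1}


accounts = [
    Account("CRM", "C-1001", "Univ. Oxford", "ORG-OX"),
    Account("Finance", "F-2040", "Oxford University", "ORG-OX"),
    Account("Fulfilment", "S-0007", "University of Oxford", "ORG-OX"),
    Account("CRM", "C-1002", "University of York", "ORG-YORK-UK"),
]

for inst, accs in duplicate_accounts(accounts).items():
    print(inst, [f"{a.system}:{a.account_no}" for a in accs])
```

Three differently spelled Oxford accounts in three systems surface as one customer; without the shared ID, matching them would rely on fuzzy name comparison.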

The challenge: Data Sources
• Multiple data sources – ‘system’ data silos
• Multiple locations – ‘geographic’ data silos
• Data entered by different people for different purposes
• Data from third parties in the supply chain
• Data from bought-in sources

The challenge: Data Sources
Typical publisher systems:
• Financial system
• CRM/Sales database
• Authentication system
• Fulfilment
• Usage statistics
• Submissions system
• Author database
• Document storage (contracts and licences)
• …
Data can be entered by:
• Organisation staff
• Authors
• Society members
• Agents in the supply chain
• Third-party organisations
• …

Implementing a data governance plan
Important considerations:
• What data is held, where is it held and how is it accessed?
• How can the data be used to further benefit different departments, processes or activities?
• Could the use of current or planned systems be expanded for further benefit?
• Is the data accurate and consolidated, or in need of cleansing?
• Are there applications of the data that have not been explored?
• What requirements are there for additional data?

Improve data capture
• If you can, use web forms
• Implement required fields
• Data validation – at a minimum, use naming conventions
• Address validation – postcode lookup
• Institution validation – institution lookup
• Web form consistency across systems
• Avoid free-text fields
• Make institutional identifiers a requirement
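
The capture rules above can be sketched as a server-side check on a submitted form. This is an illustration under stated assumptions: the field names and the in-memory lookup table are hypothetical, and postcode lookup is left out because it would need an external address service.

```python
from __future__ import annotations

# Hypothetical institution lookup table; in practice it would be populated
# from an institutional identifier source rather than hard-coded.
KNOWN_INSTITUTIONS = {
    "ORG-OX": "University of Oxford",
    "ORG-YORK-UK": "University of York",
}

# Hypothetical required fields for a subscription or registration form.
REQUIRED_FIELDS = ("contact_name", "email", "postcode", "institution_id")


def validate_submission(form: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the submission can be accepted."""
    errors = []
    for field in REQUIRED_FIELDS:
        if not form.get(field, "").strip():
            errors.append(f"missing required field: {field}")
    inst_id = form.get("institution_id", "").strip()
    # The institution must be chosen from the lookup by ID, never accepted as free text.
    if inst_id and inst_id not in KNOWN_INSTITUTIONS:
        errors.append(f"unknown institution identifier: {inst_id}")
    return errors


ok = {"contact_name": "A. Librarian", "email": "a.librarian@example.ac.uk",
      "postcode": "OX1 3BG", "institution_id": "ORG-OX"}
print(validate_submission(ok))
print(validate_submission({**ok, "institution_id": "Oxford Uni", "email": ""}))
```

In a web form the institution field would typically be a search-as-you-type picker that submits the ID, so the free-text "Oxford Uni" case is rejected rather than stored.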

Implementing Institutional IDs
Turn your records from this…

…into this.

Data integration
Using Institutional Identifiers to link internal systems:
• Prevent duplicate account creation
• Break down silos
• Keep data up-to-date and systems synchronised
• Enable staff to use data more effectively
• Simplify data transmission
• Improve overall data quality
[Diagram: Institutional Identifiers at the centre, linking CRM, electronic document storage, the financial system, authentication, the membership system, usage statistics, the author database and the fulfilment system.]
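
Linking systems through a shared identifier amounts to joining their records on that key. The sketch below merges three hypothetical system extracts into one view per institution; field names and figures are invented for illustration.

```python
from __future__ import annotations

# Hypothetical extracts from three internal systems. Field names and figures are
# invented; what matters is that every system keys its records by the same ID.
crm = {"ORG-OX": {"account_manager": "J. Smith"}}
finance = {"ORG-OX": {"revenue_2013": 12500.0}}
usage = {
    "ORG-OX": {"downloads_2013": 48210},
    "ORG-YORK-UK": {"downloads_2013": 9120},
}


def single_view(*systems: dict[str, dict]) -> dict[str, dict]:
    """Merge per-system records into one record per institutional ID.

    If two systems use the same field name, the later system's value wins.
    """
    merged: dict[str, dict] = {}
    for system in systems:
        for inst_id, fields in system.items():
            merged.setdefault(inst_id, {}).update(fields)
    return merged


view = single_view(crm, finance, usage)
for inst_id, record in view.items():
    print(inst_id, record)
```

Here ORG-YORK-UK appears only in the usage data, with no CRM or finance record: the kind of gap (usage without an account) that is hard to spot when each system keys institutions by its own name strings.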

Linking author and institution IDs
• When authors and their affiliations are linked correctly, publishers gain:
  • Market intelligence about authors and institutions
  • Author and subscriber information mapped together
  • Knowledge of where research funding is concentrated
  • Reduction in the time taken to calculate open access article processing charges (APCs)
• Institutions gain information about their overall research output
• Funders gain information about where authors reside and publish
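
A sketch of how author-to-institution links might feed an APC decision and a per-institution view. The author IDs, institutional IDs and the idea of a set of institutions that cover APCs are assumptions for illustration, not a description of any particular publisher's workflow.

```python
from __future__ import annotations

from typing import Optional

# Hypothetical links from author identifiers to the institutional ID of their affiliation.
AUTHOR_AFFILIATION = {"AUTH-1": "ORG-OX", "AUTH-2": "ORG-YORK-UK", "AUTH-3": "ORG-OX"}

# Hypothetical set of institutions with an agreement to pay APCs on their authors' behalf.
APC_COVERED = {"ORG-OX"}


def apc_payer(author_id: str) -> str:
    """Decide who is invoiced for an APC using only the linked affiliation."""
    inst_id: Optional[str] = AUTHOR_AFFILIATION.get(author_id)
    if inst_id is None:
        return "manual check"  # no linked affiliation: back to checking by hand
    return "institution" if inst_id in APC_COVERED else "author"


def authors_by_institution() -> dict[str, list[str]]:
    """Invert the links to list each institution's linked authors."""
    out: dict[str, list[str]] = {}
    for author_id, inst_id in AUTHOR_AFFILIATION.items():
        out.setdefault(inst_id, []).append(author_id)
    return out


print([apc_payer(a) for a in ("AUTH-1", "AUTH-2", "AUTH-9")])
print(authors_by_institution())
```

Only the unlinked author falls through to a manual check; the inverted view is the raw material for the institution-level reporting described above.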

The scholarly supply chain
Purpose:
• Serve the author and reader
• Disseminate content as widely as possible
• Ensure content is easily discoverable
• Provide information in an efficient and trouble-free manner, regardless of:
  • Content type
  • User requirements
  • Desired methods of access

The supply chain (simple version)
[Diagram: the parties in the scholarly supply chain: Author, Funders, Submission and Peer Review System, Publisher, Societies, Online Host or Technology Partner, Fulfilment House or System, Subscription Agent or Sales Agent, Consortium, Library, Discovery Service, Data Providers and Systems (multiple), and End User.]

Supply-chain spaghetti
[Diagram: the same supply-chain parties, connected by a tangle of criss-crossing links.]

What could possibly go wrong?
• Records are unconnected through the supply chain; links fail:
  • Between entities
  • Between internal systems
  • Between external systems
• Renewals are mishandled
• Journal transfers are mishandled
• Access and authentication are mishandled
• Authors and individuals are not linked to their institution
• Open access fees have to be checked manually
• Authors are not linked to their research
• Funders are not linked to the research they fund

Where stronger links are needed
Finding a path to using standardised data, which:
• Eradicates duplicate records within and between systems
• Enables seamless communication between organisations
• Smooths the supply chain, removing ambiguity or lack of information for any party
• Enables a higher quality of service
• Increases understanding of the customer base and enables better decision making for everyone involved

Supply-chain spaghetti…
…becomes organised, with accurate data and information flow
[Diagram: the supply-chain parties, including Consortium, with their links reorganised into an orderly flow.]

The vision
In an ideal world we would be able to utilise, provide and obtain data that is accurate, complete and easily joined together:
• Reducing problems and errors
• Providing better overall service
• Creating seamless processes
• Providing a better understanding of customers and our own businesses

External linking – in the supply chain
Using identifiers will:
• Ensure accuracy of information
• Speed up data transactions
• Reduce queries
• Reduce costs
• Open data up to new uses
• Ensure that authors receive credit for the work they produce
• Ensure that end users receive uninterrupted access to the content they need

A truly linked supply chain
[Diagram: identifiers linking the parties across the supply chain.]

Institutional Identifiers – which ones?
• JISC and CASRAI (Consortia Advancing Standards in Research Administration Information) report on Organisation IDs:
  http://repository.jisc.ac.uk/5381/1/CC549D0011.0_org_ID_landscape_study.pdf
• Examined the landscape of organisational identifiers in the UK and identified 23 different IDs
• Based on interviews with key individuals
• Lots of detail on use cases for publishing, funders, and institutions

CASRAI report
• Disambiguating organisational information from multiple sources is typically described as “a nightmare”
• Benefits from effective unique identifiers are truly realised when data is shared
• Key aspects of identifiers that support the widest range of uses:
  • Governance
  • Trust
  • Transparency
  • Temporal
  • Appropriate metadata

Identifiers identified
[Table: organisational identifiers covered by the JISC/CASRAI report, marked by user group (Publishers, Funders, Companies, HEIs) and by characteristic (Curated, Regulated, Mainly used for linking, Historic).]
Coverage:
• Global: Dun & Bradstreet, FundRef, ISNI, ORCID, Ringgold's Identify, MACE & UK Federation, VIAF, Research Analytics
• UK: Companies House, Gateway to Research, Government bodies, HESA, IDBR, Janet, Je-S/CDR OrgID, Research Fish, RCUK, ROS, UCAS, UKPRN
• England: HEFCE
• EU: PIC
Please note that ORCID had not released the institutional affiliation at the point at which this report was published.

Global Identifiers
[Table: the same comparison, restricted to the global identifiers.]

ISNI
[Diagram: the same ISNI number held on two parties' records (Party ID 1 and Party ID 2), each carrying its own proprietary information and/or metadata, with the ISNI bridging the two.]
• ISO Standard 27729
• ISNI is designed to be a “bridge identifier”
• Covers any type of party, people as well as organisations
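
ISO 27729 defines an ISNI as 16 characters: 15 digits followed by a check character calculated with the ISO 7064 MOD 11-2 algorithm, which may be 0–9 or X. Checking it lets a system reject most mistyped IDs before they are stored or sent on. A minimal sketch; the function names are illustrative, and stripping spaces and hyphens is an assumption about how IDs typically arrive.

```python
import re


def isni_check_char(base15: str) -> str:
    """ISO 7064 MOD 11-2 check character for the first 15 digits of an ISNI."""
    total = 0
    for ch in base15:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def normalise_isni(raw: str) -> str:
    """Drop spaces and hyphens (ISNIs are usually shown in groups of four); uppercase X."""
    return re.sub(r"[\s-]", "", raw).upper()


def is_valid_isni(raw: str) -> bool:
    """True if the value is 15 digits plus a correct check character (0-9 or X)."""
    isni = normalise_isni(raw)
    if not re.fullmatch(r"[0-9]{15}[0-9X]", isni):
        return False
    return isni_check_char(isni[:15]) == isni[15]


# ORCID iDs use the same 16-character format and check character.
print(is_valid_isni("0000-0002-1825-0097"))
print(is_valid_isni("0000 0002 1825 0098"))  # last character altered, so the check fails
```

The check catches single-digit typos and most transpositions, but it cannot tell whether a well-formed ISNI belongs to the intended institution; that still needs a lookup.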

ISNI IDs
• Ringgold is an ISNI Registration Agency for institutions
• A unique ISNI institutional ID can connect any data and any systems
• ISNI IDs should be used by publishers and across the scholarly supply chain to:
  • Link systems using the ID numbers
  • Link data sets that contain proprietary metadata
  • Provide clean data transmission

ISNI spans all industries, market segments, and regions
• Academia
• Medical
• Corporate
• Government
• Not-for-profit
• Public libraries
• Schools
• Publishers
• Funding bodies
• Intermediaries
• Distributors
http://isni.org/

What can YOU do now?
• Engage with the problems you have with data
• Find some resources – think about time, not just money
• Consider how data could better serve your organisation
• Appoint a data champion and document everything
• Create a data governance policy
• Set some basic rules for data entry
• Utilise universal identifiers to clean and link your data
• Work with suppliers and customers to utilise institutional identifiers to strengthen the supply chain

Laura Cox
President, Chief Financial and Operating Officer
Ringgold Inc.
laura.cox@ringgold.com
