The Future of the
Scholarly Record
Todd A. Carpenter
Executive Director, NISO
@TAC_NISO
APE Annual Conference
January 11, 2022
“You can’t connect the dots looking forward;
you can only connect them looking backwards.”
- Steve Jobs
Scholars Have Always Shared Their Findings
Image: Walters Art Museum
This Moved from Books to Letters to Articles
Image: Academy of Natural Sciences of Drexel University
Scientists Also Shared Data
Ireland’s pre-1940 daily rainfall records, Geoscience Data Journal, Volume: 8, Issue: 1, Pages: 11-23, 16 August 2020, DOI: 10.1002/gdj3.103
The Early Internet was Designed By
and For Researchers
Image: James L. Green, NASA
Realizing the 1980s Vision of HyperText
Screenshot of the HyperTIES authoring tool
Screenshot of an Intermedia educational course
Skeuomorph
What Makes Someone a Digital Native?
Image: Terese Bird, University of Leicester, https://tbirdseyeview.wordpress.com/2015/01/18/who-just-said-digital-natives/
What about a
digital-native
scholarly record?
Realistically, It Needs to be Much More Than This
Multi-format and multimedia
Interoperable
Machine-readable formats
Adaptive design
Accessible
Transformable
Atomize-able
High quality metadata
Preservable
Linkable
Trackable
← Standards (the slide marks ten of these properties this way: nearly all of them depend on shared technical standards)
What is NISO and
what’s our role?
Photo: Minneapolis College of Art and Design Library
About
● Non-profit industry trade association accredited by the American National Standards Institute (ANSI)
● Mission of developing and maintaining technical standards related to information, documentation, discovery, and distribution of published materials and media
● Volunteer-driven organization: 500+ contributors spread out across the world, roughly 25% based outside the US
● Responsible (directly and indirectly) for standards like ISSN, DOI, Dublin Core metadata, DAISY digital talking books, OpenURL, MARC records, and ISBN
Previously, the Scholarly
Record Didn’t Include
Much Data
Large-Scale Data-Driven Science
“Increasingly, scientific
breakthroughs will be powered by
advanced computing capabilities
that help researchers manipulate
and explore massive datasets.”
NSF DataNet Program
The DataNet program, launched by the US National Science Foundation (NSF) in 2007 in support of NSF’s Cyberinfrastructure Vision for 21st Century Discovery, invested $100 million in five data networks over 10 years:
● DataONE
● Data Conservancy
● SEAD (Sustainable Environment - Actionable Data)
● DataNet Federation Consortium
● Terra Populus
Other data repository investments made outside the US around this time:
● Australian National Data Service (now ARDC)
● EU Open Data Portal
● UK JISC Repositories Support Project (RSP)
● DRIVER (Digital Repository Infrastructure Vision for European Research)
CODATA-ICSTI Data Citation Standards
and Practices
● Launched by the International Council for Scientific and Technical Information (ICSTI) and CODATA during the 27th General Assembly in Cape Town in October 2010
TASKS
● Survey existing literature and existing data citation initiatives.
● Obtain input from stakeholders in the library, academic, publishing, and research communities.
● Hold at least one meeting and a workshop to help establish a solid foundation of the state of the art and practices in this area.
● Work with ISO and major regional and national standards organizations to develop formal data citation standards and good practices.
OUTCOMES
● Hosted a dozen meetings worldwide and produced two reports: “Out of Cite, Out of Mind: The Current State of Practice, Policy, and Technology for the Citation of Data” (2013) and the NAS report “For Attribution”
Beyond the PDF, San Diego, 2011
The Amsterdam Manifesto
on Data Citation
Drafted during a reception at the Future of
Research Communications (to become FORCE11)
conference in Amsterdam in March 2013
This led to the formation of a working group
within FORCE11 to develop the
JOINT DECLARATION OF DATA CITATION
PRINCIPLES published in 2014.
Endorsed by 125 organizations and nearly 300
individuals.
Slide: Sarah Jones
https://www.slideshare.net/sjDCC/fair-data
Releasing FAIR
FAIR into the Future
Since 2016, FAIR has caught on.
It has also spawned a variety of
related work on implementing
these ideas, much of it
centered on standards.
Transforming
Discovery
Photo: Minneapolis College of Art and Design Library
Photo: “OSU Thompson Library Stacks” by Kristin Six
Today the Stacks of the Library Look Like This
This is how you browse bookBot “stacks”
A key problem today is this:
You can’t walk the stacks
in a digital library
or browse the shelves
in an online bookstore
Digitally,
all you can
browse or search
is metadata
How much are you
investing in the quality
of your metadata?
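To make the point concrete, here is a minimal Python sketch of search over a catalogue in which only metadata is indexed. The `Record` class, the `search` function, and both sample records are hypothetical illustrations, not any real discovery system: the two records describe similar material, but only the one with subject terms can be found by a subject query.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """Hypothetical catalogue record: only these fields are searchable."""
    title: str
    subjects: list[str] = field(default_factory=list)


def search(records: list[Record], term: str) -> list[str]:
    """Return titles of records whose metadata contains `term` (case-insensitive).

    The full text of the work is never consulted: if a concept is missing
    from the metadata, the record cannot be found by that concept.
    """
    term = term.lower()
    hits = []
    for rec in records:
        fields = [rec.title, *rec.subjects]
        if any(term in value.lower() for value in fields):
            hits.append(rec.title)
    return hits


catalog = [
    Record("Daily rainfall observations, 1850-1940", ["climatology", "precipitation"]),
    # Similar material, but catalogued without any subject terms.
    Record("Digitized weather ledgers, 1850-1940"),
]

print(search(catalog, "climatology"))  # prints ['Daily rainfall observations, 1850-1940']
```

The second record is not wrong, only thinly described; in a metadata-only index that is enough to make it invisible to a subject search.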
The Algorithms Decide What You Discover
Machines are Reading More Than Humans Are
Machines Don’t Communicate Like People Do
Image: Betsy Streeter. Available from CartoonStock. https://bit.ly/391dVyh
We Need to Create Content They Can Consume
Building the infrastructure
to support a transformed
scholarly record
“All disciplines, whether or not data-intensive, operate in a digital world where all the elements of the research process are connected or connectable in ways that permit them to be linked together as parts of a research workstream, with the possibility of digital interoperability across the ‘research cycle’.”
● Launched in 2009
● Since grown to include:
  ● 879 repositories
  ● 29 million DOIs registered
  ● +5.5 million in 2021 alone
● Just in September 2021:
  ● 13.9 million successful resolutions
  ● on 3,769,169 objects
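A DOI is resolved by appending it to the doi.org proxy, which redirects to the location currently registered for the object; each such lookup is a resolution. The Python sketch below pulls a DOI out of free text, such as the rainfall-records citation earlier in this deck, and builds the resolver URL. The regular expression is a simplified assumption (a `10.` prefix, a 4 to 9 digit registrant code, then a suffix without whitespace), not the full DOI syntax, and `doi_to_url` is a hypothetical helper name.

```python
import re
from urllib.parse import quote

# Simplified DOI pattern (an assumption, not the complete DOI syntax):
# "10." + 4-9 digit registrant code + "/" + a suffix without whitespace.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/\S+")


def doi_to_url(text: str) -> str:
    """Find the first DOI in `text` and return its https://doi.org/ resolver URL."""
    match = DOI_PATTERN.search(text)
    if match is None:
        raise ValueError(f"no DOI found in {text!r}")
    doi = match.group(0)
    # Trim trailing punctuation that belongs to the surrounding prose, including
    # a ")" that has no matching "(" inside the DOI itself.
    while doi and (
        doi[-1] in ".,;" or (doi[-1] == ")" and doi.count(")") > doi.count("("))
    ):
        doi = doi[:-1]
    # Percent-encode characters that are unsafe in a URL path (e.g. "#", "?", "<").
    return "https://doi.org/" + quote(doi, safe="/:;()")


print(doi_to_url("DOI: (10.1002/gdj3.103)"))  # prints https://doi.org/10.1002/gdj3.103
```

The parenthesis check is deliberate: some DOIs contain balanced parentheses, so stripping every trailing `)` would corrupt them.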
● Launched in 2012
● Since grown to more than 11 million iDs assigned
● 2,639,296 researchers registered for new ORCID identifiers in 2020 alone
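An ORCID iD is 16 characters long, and its last character is a check digit computed with the ISO/IEC 7064 MOD 11-2 algorithm, so many typing errors can be caught before an iD is stored in a manuscript system. The Python sketch below implements that check; the helper names and the handling of an optional `https://orcid.org/` prefix are illustrative assumptions, and a valid checksum does not prove that the iD is actually registered.

```python
import re

# Four hyphen-separated groups of four; only the final character may be "X".
ORCID_FORMAT = re.compile(r"[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]")


def orcid_check_char(base_digits: str) -> str:
    """ISO/IEC 7064 MOD 11-2 check character for the first 15 digits of an iD."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid_id: str) -> bool:
    """True if `orcid_id` is well formed and its check character matches."""
    orcid_id = orcid_id.removeprefix("https://orcid.org/")
    if not ORCID_FORMAT.fullmatch(orcid_id):
        return False
    digits = orcid_id.replace("-", "")
    return orcid_check_char(digits[:15]) == digits[15]


print(is_valid_orcid("0000-0002-1825-0097"))  # prints True
print(is_valid_orcid("0000-0002-1825-0079"))  # prints False (last two digits transposed)
```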
● Managed by Crossref
● Registry was donated to Crossref by Elsevier after funding information was added to its Content Registration schema in 2012
● Initially included about 4,000 funders
● Now contains more than 21,500 funders
● Connects more than 2.5 million published works with funding data
Source: Crossref, https://www.crossref.org/pdfs/about-funder-registry.pdf
● Research Organization Registry (ROR)
● Launched in 2019
● ROR launched with data from Digital Science’s GRID database
● Now more than 97,000 institutions assigned identifiers
● ISO standard in development
● Launched by the Australian Research Data Commons (ARDC)
● Connects the various elements of a research project, from funding to researchers, protocols, papers, data sets, and other outputs.
RAiD Explained video
The Developing
Research Graph
PID Graph KPI: Number of resources and
links in the PID Graph available via
GraphQL API as of May 4, 2020.
Generated using Fenner (2019a).
RAiD
Projects
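The PID Graph counts above are served through a GraphQL API; the Python sketch below does not use that API. It only models the underlying idea with an in-memory list of hypothetical links between PIDs (every identifier and relation name here is a placeholder) and walks outward from a dataset with a breadth-first search, collecting the article, project, people, funder, and institution connected to it.

```python
from collections import defaultdict, deque

# Hypothetical PID links as (subject, relation, object); placeholders only.
EDGES = [
    ("raid:project-1", "fundedBy", "funder:agency-A"),
    ("raid:project-1", "hasParticipant", "orcid:researcher-1"),
    ("raid:project-1", "hasOutput", "doi:protocol-1"),
    ("raid:project-1", "hasOutput", "doi:article-1"),
    ("doi:article-1", "references", "doi:dataset-1"),
    ("doi:article-1", "hasAuthor", "orcid:researcher-1"),
    ("orcid:researcher-1", "affiliatedWith", "ror:institution-1"),
]


def build_index(edges):
    """Map each PID to its neighbours, ignoring link direction."""
    index = defaultdict(list)
    for subject, _relation, obj in edges:
        index[subject].append(obj)
        index[obj].append(subject)
    return index


def connected_pids(edges, start, max_hops):
    """Breadth-first search: return {pid: hops from start} within max_hops."""
    index = build_index(edges)
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in index[node]:
            if neighbour not in hops:
                hops[neighbour] = hops[node] + 1
                queue.append(neighbour)
    return hops


reachable = connected_pids(EDGES, "doi:dataset-1", 3)
for pid, distance in sorted(reachable.items(), key=lambda kv: (kv[1], kv[0])):
    print(distance, pid)
```

Starting from the dataset, one hop reaches the article, two hops reach the project and the researcher, and three hops reach the funder, the protocol, and the institution. That chain only exists if every link was recorded with a PID in the first place.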
Where can this graph take us?
Investments in the
infrastructure allow
for tremendous
downstream applications
Yet all this infrastructure
is expensive.
Is there a
long-term commitment
from key players
to support this infrastructure?
How scholarly
communications
culture (slowly)
changes
Data sharing and discovery are
just two examples.
There are many, many more.
Format Transformations To Date
Style | 1992 | 2002 | 2012 | 2022
Articles | Print | PDF | PDF/HTML | HTML5
Monographs | Print | Print | Print/EPUB? | EPUB
Research Data | Astronomy, maybe | Astronomy, Chemistry | Explosion of Data | Focus on Data Management
Preprints | arXiv | DSpace | Ubiquitous Repositories | Subject repositories
Annotation | Not implemented | Coming soon | Prototypes (Annotea) | Hypothes.is
Discovery | A/I Services | Metasearch | Google | Google!
Authoring | Word/WordPerfect | Word | Word | G Docs
Video | TV | Adobe Flash Video Player | YouTube | Video Platforms
Presentations | In-person | In-person | In-person | Zoom
Distribution | Agents | Online one-off | Big Deal | Open Access
Identifiers | ISBN/ISSN | DOI | ORCID | ROR
The Scholarly
Record of
the Future
Technology drives change, but
it doesn’t determine the
direction or eventual
destination.
High quality metadata is
the key to discovery and
interoperability online.
Whose responsibility is
the metadata creation
necessary for a
digitally-native scholarly record?
The open-science paradigm
allows for content to be
replicated, moved, analyzed
and republished
in different forms
In a distributed data ecosystem,
how do we build a notification system
to connect the disparate pieces?
The economics of creating
the scholarly record and
distributing it are being
upended by openness
and the shift away from
subscriptions.
Can publishers effectively support
dual infrastructures (i.e., both
subscription and open) without
more efficient workflows?
Scholarly publishers are
investing heavily in automated
processes to support the
increased pace of content
creation and to
maintain profitability
Can machine learning
and automated metadata creation
support the process of creating
a new scholarly record?
The pandemic has transformed
the nature of scholarly (all?)
interactions and virtuality will
remain a core component of
work moving forward
Can we create an
access control system that is
truly viable and adoptable
worldwide across all institutions?
People will only change the
systems they use if they are
motivated to do so, either because
the change benefits them or
because they are forced to.
Is the recognition system adapting to
incorporate these new scholarly-record
creation tools in a way that will
motivate their use in practice?
Format Transformations in the Future
Style | 1992 | 2002 | 2012 | 2022 | 2032
Articles | Print | PDF | PDF/HTML | HTML5 | Distributed
Monographs | Print | Print | Print/EPUB? | EPUB | EPUB
Research Data | Astronomy, maybe | Astronomy, Chemistry | Explosion of Data | Focus on Data Management | FAIR-REST ready
Preprints | arXiv | DSpace | Ubiquitous Repositories | Subject repositories | Distributed Web
Annotation | Not implemented | Coming soon | Prototypes (Annotea) | Hypothes.is | Native web annotation
Discovery | A/I Services | Metasearch | Google | Google! | AI-driven
Authoring | Word/WordPerfect | Word | Word | G Docs | Cloud/AI supported
Video | TV | Adobe Flash Video Player | YouTube | Video Platforms | Embedded everywhere
Presentations | In-person | In-person | In-person | Zoom | AR/VR
Distribution | Agents | Online one-off | Big Deal | Open Access | Open Science
Identifiers | ISBN/ISSN | DOI | ORCID | ROR | RAiD
A parting thought
WE’RE IN THE VERY
EARLY DAYS OF THIS
TRANSFORMATION
Less Than 40 Years Into the Era of Digital Content
● It has been estimated that in the 1450s fewer than 10% of books
included page numbers.
● It wasn’t until the first decade of the 16th century that scholars
started to use page numbers *
Technology changes far faster than the cultural
changes that are required to drive widespread
adoption
* Source: Words Onscreen: The Fate of Reading in a Digital World by Naomi S. Baron
THANK YOU
Todd Carpenter, Executive Director, NISO
@TAC_NISO
tcarpenter@niso.org
www.niso.org
