Herbert Van de Sompel
OCLC ESR, Washington, DC, December 10 2014
Archiving the Evolving Scholarly Record: A Perspective
Herbert Van de Sompel
@hvdsomp
Los Alamos National Laboratory
Acknowledgments: Andrew Treloar (@atreloar), ANDS
In This Talk
1. Functions of scholarly communication
2. Pointers to the future
3. Characterizing the future
4. Archiving the future
Functions of Scholarly Communication
• Registration: Allows claims of precedence for a scholarly finding
• Certification: Establishes validity of the claim
• Awareness: Allows actors in the system to remain aware of new
claims
• Archiving: Preserves the scholarly record over time
Roosendaal, H., Geurts, C. (1997) Forces and functions in scientific communication
http://www.physik.uni-oldenburg.de/conferences/crisp97/roosendaal.html
System of Journals, Paper Version
• Registration: Manuscript submission
• Certification: Peer review
• Awareness: Alerts, library shelf surfing
• Archiving: Journals in library stacks
System of Journals, Digital Version
• Registration: Manuscript submission
• Certification: Peer review
• Awareness: Various web discovery services
• Archiving: Special-purpose archives (e.g., Portico), publishers
In This Talk
1. Functions of scholarly communication
2. Pointers to the future
3. Characterizing the future
4. Archiving the future
Pointers to the Future
“The future is already here – it’s just not
very evenly distributed”
William Gibson
Gibson, W. (1999) The Science in Science Fiction, NPR Interview
http://www.npr.org/templates/story/story.php?storyId=1067220
Registration - bioRxiv
http://biorxiv.org
Registration - GitHub
http://github.com
Registration – SlideShare
http://www.slideshare.net/hvdsomp/presentations
Registration - WikiPathways
http://wikipathways.org/index.php/WikiPathways
Registration - Neurolex
http://neurolex.org/wiki/Category:Olfactory_cortex_horizontal_cell
Registration – Research Objects
http://researchobject.org/
Registration - Observations
• Registration of a wide variety of objects
• Dynamic, compound, inter-related, distributed across the web
• Decoupling registration from certification
• Time stamping, versioning
Certification – PubMed Commons
http://www.ncbi.nlm.nih.gov/pubmedcommons/
Certification – The Open Journal
http://theoj.org
Certification – SlideShare
http://www.slideshare.net/hvdsomp/presentations
Certification – Project FeederWatch
http://feederwatch.org
Certification - Observations
• Certification decoupled from registration
• Certification of various types of objects
• Social interactions validating
• Machines validating
Awareness – Twitter
http://twitter.com
Awareness – myExperiment
http://myexperiment.org/
Awareness – NARCIS
http://narcis.nl/
Awareness – eLabNoteBook RSS Feeds
http://malaria.ourexperiment.org/feeds
Awareness - Observations
• Awareness for various types of objects
• Real time awareness
• Awareness through social media
Archiving – CLOCKSS
http://www.clockss.org/
Archiving – DANS EASY
http://easy.dans.knaw.nl/
Archiving – Australian Antarctic Data Centre
http://data.aad.gov.au/
Archiving – perma.cc
http://perma.cc
Archiving – EU Trusted Digital Repositories
http://trusteddigitalrepository.eu/Site/Welcome.html
Archiving - Observations
• Archiving/Archives for various types of objects
• Distributed archives
• Archival consortia
• Audit for trustworthiness
In This Talk
1. Functions of scholarly communication
2. Pointers to the future
3. Characterizing the future
4. Archiving the future
The Future
• Registration
• Wide variety of objects
• Versions of objects
• Interrelated, interdependent objects
• Certification
• Variety of certification mechanisms
• Decoupled from / Overlaid upon Registration
• Awareness
• Real-time
• Social
• Variety of objects
• Archiving …
Characterizing the Future – Scholarly Communication
Characterizing the Future – Communicated Objects
In This Talk
1. Functions of scholarly communication
2. Pointers to the future
3. Characterizing the future
4. Archiving the future
The Future – Core Observations
• The research process, not just its outcome, is becoming visible … on the web
• Massive extension of the scholarly record with an enormous variety
of novel objects
• The objects are heterogeneous, dynamic, compound, inter-related
and distributed across the web
• The objects are often hosted on common web platforms that are not
dedicated to scholarship
The archival paradigm must take these characteristics into account
Considerations about Archiving
• On the right track?
• Capturing paradigms
• Pockets of persistence
• Recording versus Archiving
• A perspective on scholarly infrastructure
Web-Based Journal System – Links to Articles
• Special-purpose archival solutions
for articles
• Rosenthal finds that what is archived
is too few, too healthy, too easy
• Attempts with the Keepers Registry
to map out what is archived
• Based on [ISSN, volume, issue],
not on DOI, HTTP URI
David Rosenthal (2013) Patio Perspectives at ANADP II: Preserving the Other Half
http://blog.dshr.org/2013/11/patio-perspectives-at-anadp-ii.html
Web-Based Journal System – Links to Articles
Peter Burnhill (2014) Ensuring access to digital back copy
http://www.cni.org/topics/digital-preservation/ensuring-access-to-digital-back-copy/
Web-Based Journal System – Links to Web at Large Resources
• Web archives contain snapshots, the
result of incidental archiving
• The Hiberlink project finds that for the
large majority of these “Web at Large”
resources, no temporally appropriate
archived versions exist
• Memento infrastructure allows auditing
what is globally archived based on
HTTP URI
http://hiberlink.org
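Auditing by HTTP URI works because Memento (RFC 7089) lets an archive, or an aggregator across archives, expose a TimeMap: a list of every archived version (memento) of an original URI together with its archival datetime. The sketch below parses a TimeMap in the application/link-format serialization and picks the memento closest to a target datetime. The sample TimeMap, the archive host archive.example.org, and the function names are hypothetical; a real audit would fetch the TimeMap over HTTP from an archive or aggregator.

```python
import re
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# One entry of an application/link-format TimeMap: <URI>; name="value"; ...
LINK_RE = re.compile(r'<([^>]*)>((?:\s*;\s*[^;,<]+="[^"]*")*)')
PARAM_RE = re.compile(r'([\w-]+)="([^"]*)"')

# Hypothetical TimeMap from a hypothetical archive at archive.example.org.
SAMPLE_TIMEMAP = """<http://example.org/page>; rel="original",
<http://archive.example.org/timemap/link/http://example.org/page>; rel="self"; type="application/link-format",
<http://archive.example.org/20140101000000/http://example.org/page>; rel="first memento"; datetime="Wed, 01 Jan 2014 00:00:00 GMT",
<http://archive.example.org/20140610120000/http://example.org/page>; rel="memento"; datetime="Tue, 10 Jun 2014 12:00:00 GMT",
<http://archive.example.org/20141201000000/http://example.org/page>; rel="last memento"; datetime="Mon, 01 Dec 2014 00:00:00 GMT"
"""


def parse_timemap(text):
    """Return (memento_uri, datetime) pairs from a link-format TimeMap."""
    mementos = []
    for uri, params in LINK_RE.findall(text):
        attrs = dict(PARAM_RE.findall(params))
        rels = attrs.get("rel", "").split()  # e.g. "first memento"
        if "memento" in rels and "datetime" in attrs:
            # Memento datetimes use the HTTP-date format, always in GMT
            mementos.append((uri, parsedate_to_datetime(attrs["datetime"])))
    return mementos


def closest_memento(mementos, target):
    """Pick the memento whose archival datetime is nearest to target."""
    if not mementos:
        return None
    return min(mementos, key=lambda m: abs(m[1] - target))
```

For example, `closest_memento(parse_timemap(SAMPLE_TIMEMAP), datetime(2014, 6, 1, tzinfo=timezone.utc))` selects the June 10, 2014 memento. An empty result from `parse_timemap` is exactly the "no archived version exists" case the Hiberlink findings describe.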
Links Abstracted to Top-Level Domain Targets
Martin Klein, Herbert Van de Sompel et al. (2014) Scholarly context not found
To appear in PLoS ONE on December 26 2014
Loss of Current Context – Link Rot
Martin Klein, Herbert Van de Sompel et al. (2014) Scholarly context not found
To appear in PLoS ONE on December 26 2014
Loss of Past Context – Archival Status (14-day window)
Martin Klein, Herbert Van de Sompel et al. (2014) Scholarly context not found
To appear in PLoS ONE on December 26 2014
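The archival-status measure asks whether a cited web resource was archived close to the time it was cited. A minimal sketch of that test follows, assuming the window means a memento at most 14 days before or after the citing article's publication date; the study's exact window definition may differ, and the function names are hypothetical.

```python
from datetime import date, timedelta
from typing import Dict, List


def archived_within_window(publication_date: date, memento_dates: List[date],
                           window_days: int = 14) -> bool:
    """True if any memento lies within +/- window_days of publication.

    Assumption: the window is symmetric around the publication date.
    """
    window = timedelta(days=window_days)
    return any(abs(d - publication_date) <= window for d in memento_dates)


def share_archived(publication_date: date, references: Dict[str, List[date]],
                   window_days: int = 14) -> float:
    """Fraction of referenced URIs with a memento inside the window."""
    if not references:
        return 0.0
    hits = sum(archived_within_window(publication_date, dates, window_days)
               for dates in references.values())
    return hits / len(references)
```

Keeping the window as a parameter makes it easy to see how sensitive the "archived" share is to what counts as temporally appropriate.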
Considerations about Archiving
• On the right track?
• Capturing paradigms
• Pockets of persistence
• Recording versus Archiving
• A perspective on scholarly infrastructure
Perspective on “Repository” Capture Paradigm
• Atomic object
• Finalized object
• Removal of context
• Perspective on object: file in a file
system
• Capture request by owner of object
• Capture time decided by owner of
object
Perspective on “Web” Capture Paradigm
• Compound object (context essential)
• Constituents of compound object in
flux
• Perspective on constituents:
resources with URIs on the web
• Capture request by user of the
constituents, whether owned by self or
by 3rd parties
• Capture time decided by user of the
constituents
Considerations about Archiving
• On the right track?
• Capturing paradigms
• Pockets of persistence
• Recording versus Archiving
• A perspective on scholarly infrastructure
Creating Pockets of Persistence
How to achieve the ability to:
• Persistently
• Precisely
• Seamlessly
revisit the Scholarly Web of
the Past and of the Now at
some point in the Future
This challenge exists for the entire web,
but some communities actually care
about addressing it:
• scholarly communication,
• legal publications,
• journalism,
• Wikipedia,
• …
Pro-Active Capture for a Seed Collection
• Seed Collection - Starting point for capture is a seed collection of
interest to communities that care, e.g.
o Scholarly literature
o Legal documents
o Online journalism
o Wikipedia articles
• Lifecycle Events – Intervene at critical moments in the lifecycle of
items in these collections to pro-actively capture
o Collection items – some solutions in place
o Web resources referenced in collection items
Pro-Active Capture for a Seed Collection
• Request by user of A to capture A,
B, C, D, E
• Request for capture may result in
• In-situ or remote capture
• Creation of snapshot or creation
of trace
• Archival URI, capture datetime
• Interoperability for on-demand
capture
• Orchestration of capture process
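The capture flow on this slide (a user of resource A asks for A and the resources it references, B to E, to be captured, and each capture yields an archival URI and a capture datetime) can be sketched as a small orchestration step. Everything here is hypothetical: the `CaptureResult` record, `orchestrate_capture`, and the pluggable `capture` callable, which stands in for whatever in-situ or remote capture service is actually used.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class CaptureResult:
    original_uri: str
    archival_uri: str          # where the snapshot or trace can be retrieved
    capture_datetime: datetime
    kind: str                  # "snapshot" or "trace"
    location: str              # "in-situ" or "remote"


def orchestrate_capture(
    seed_uri: str,
    referenced: List[str],
    capture: Callable[[str], CaptureResult],
) -> Tuple[Dict[str, CaptureResult], List[str]]:
    """Capture the seed resource (A) and the resources it references (B..E).

    Returns successful captures keyed by original URI, plus the URIs that
    failed, so one unreachable third-party resource does not abort the batch.
    """
    results: Dict[str, CaptureResult] = {}
    failed: List[str] = []
    # dict.fromkeys de-duplicates the URIs while preserving their order
    for uri in dict.fromkeys([seed_uri, *referenced]):
        try:
            results[uri] = capture(uri)
        except OSError:  # network failures surface as OSError subclasses
            failed.append(uri)
    return results, failed


# Hypothetical stand-in for a real capture service, for illustration only.
def fake_capture(uri: str) -> CaptureResult:
    if "unreachable" in uri:
        raise OSError(f"cannot reach {uri}")
    now = datetime(2014, 12, 10, tzinfo=timezone.utc)
    stamp = now.strftime("%Y%m%d%H%M%S")
    return CaptureResult(uri, f"http://archive.example.org/{stamp}/{uri}",
                         now, "snapshot", "remote")
```

Passing the capture function in, rather than hard-wiring one archive, reflects the slide's point that interoperability for on-demand capture lets the same request be served by different archives.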
Pro-Active Capture for a Seed Collection
• What those crucial lifecycle events are may depend on the
collection type
Wikipedia
• Creation of new article
• Creation of new version of
article
• Creation of substantially
new version of article
• Addition of external
reference to article
• References to article
exceed a certain threshold
Scholarly Literature
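The Wikipedia triggers listed on this slide can be read as rules applied to a stream of article lifecycle events. The sketch below assumes hypothetical event names and two hypothetical thresholds: the fraction of changed characters that makes a revision a "substantially new version", and the number of incoming references that counts as exceeding the threshold. Neither value comes from the talk; both would be curatorial choices.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical thresholds; the talk does not specify values.
SUBSTANTIAL_CHANGE_RATIO = 0.3
REFERENCE_THRESHOLD = 50


@dataclass
class ArticleEvent:
    kind: str               # "created", "revised", "external_ref_added", "referenced"
    changed_chars: int = 0  # size of the revision diff
    article_chars: int = 1  # size of the article after the revision
    incoming_refs: int = 0  # running count of references to the article


def capture_reason(event: ArticleEvent) -> Optional[str]:
    """Return why this lifecycle event should trigger capture, or None."""
    if event.kind == "created":
        return "new article"
    if event.kind == "external_ref_added":
        return "new external reference"
    if event.kind == "revised":
        ratio = event.changed_chars / max(event.article_chars, 1)
        if ratio >= SUBSTANTIAL_CHANGE_RATIO:
            return "substantially new version"
        return "new version"
    if event.kind == "referenced" and event.incoming_refs > REFERENCE_THRESHOLD:
        return "reference threshold exceeded"
    return None
```

Returning a reason rather than a bare boolean keeps a record of why each capture happened, which is useful provenance for the archive.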
Scholarly Literature: Experimental Zotero Extension
Richard Wincewicz (2014) Prototype Hiberlink plugin for Zotero
https://www.youtube.com/v/ZYmi_Ydr65M%26vq
Scholarly Literature: Experimental HiberActive Service
Martin Klein et al. (2014) HiberActive: Pro-Active Archiving of web references
Open Repositories 2014 http://www.slideshare.net/martinklein0815/hiberactive
Considerations about Archiving
• On the right track?
• Capturing paradigms
• Pockets of persistence
• Recording versus Archiving
• A perspective on scholarly infrastructure
Web Platforms for Scholarship
• Increasingly, common web platforms are used for scholarship
• GitHub, wikis, WordPress, etc.
• Many of these platforms have desirable characteristics
• Versioning
• Time stamping
• Social embedding
• But these platforms record rather than archive
Recording is not Archiving
“GitHub reserves the right at any time and from time to time to
modify or discontinue, temporarily or permanently, the Service (or
any part thereof) with or without notice.”
“GitHub does not warrant that (i) the service will meet your specific
requirements, (ii) the service will be uninterrupted, timely, secure, or
error-free, (iii) the results that may be obtained from the use of the
service will be accurate or reliable, (iv) the quality of any products,
services, information, or other material purchased or obtained by
you through the service will meet your expectations, and (v) any
errors in the Service will be corrected.”
GitHub Terms of Service
http://help.github.com/articles/github-terms-of-service
Recording versus Archiving
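• Recording is short-term; Archiving is longer-term
• Recording provides no guarantees; Archiving attempts to provide guarantees
• Recording is write many/read many; Archiving is write once/read many
• Recording holds the scholarly process; Archiving holds the scholarly record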
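The write many/read many versus write once/read many distinction is the most mechanical difference between the two. A toy illustration, assuming in-memory storage and hypothetical class names: a recording platform lets the current state of a resource be overwritten, while an archive only accepts new (URI, capture datetime) entries and refuses to replace an existing one.

```python
from datetime import datetime
from typing import Dict, Tuple


class AlreadyArchivedError(Exception):
    """Raised when a write would replace an existing archived state."""


class RecordingStore:
    """Write many/read many: each write replaces the previous state."""

    def __init__(self) -> None:
        self._current: Dict[str, bytes] = {}

    def put(self, uri: str, content: bytes) -> None:
        self._current[uri] = content  # the earlier state is lost

    def get(self, uri: str) -> bytes:
        return self._current[uri]


class WriteOnceArchive:
    """Write once/read many: a (URI, capture datetime) pair is fixed once stored."""

    def __init__(self) -> None:
        self._mementos: Dict[Tuple[str, datetime], bytes] = {}

    def put(self, uri: str, captured: datetime, content: bytes) -> None:
        key = (uri, captured)
        if key in self._mementos:
            raise AlreadyArchivedError(f"{uri} at {captured.isoformat()}")
        self._mementos[key] = content

    def get(self, uri: str, captured: datetime) -> bytes:
        return self._mementos[(uri, captured)]
```

New states of a resource still enter the archive, but as additional captures at later datetimes rather than as overwrites, which is what makes earlier states of the scholarly record revisitable.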
Considerations about Archiving
• On the right track?
• Capturing paradigms
• Pockets of persistence
• Recording versus Archiving
• A perspective on scholarly infrastructure
Infrastructure Considerations
• Various incentives to move objects from Private to Recording:
• Share with self or team; comply with funder requirements
• Objects in Recording are network-accessible and in the global (HTTP)
namespace
• Within reach of web-scale processes aimed at selectively
moving them from Recording to Archiving
• Core aspects of these processes include
• Ability to snapshot the state of interlinked objects at specific
moments in their lifecycle
• Transfer of snapshots from Recording platforms to appropriate,
distributed Archive platforms (interoperability)
• Curatorial decisions regarding what should be captured
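One way to make "snapshot the state of interlinked objects" and "transfer of snapshots" concrete is a manifest: a single capture datetime plus a content digest for each constituent, so the receiving archive can verify that what arrived is what was captured. The sketch below uses SHA-256 from Python's standard library; the manifest layout and function names are hypothetical and not a standard packaging format (a real transfer might use an established format such as BagIt or WARC).

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Dict


def make_manifest(constituents: Dict[str, bytes], captured: datetime) -> str:
    """Record one moment in the lifecycle of a set of interlinked objects."""
    digests = {
        uri: hashlib.sha256(content).hexdigest()
        for uri, content in sorted(constituents.items())
    }
    return json.dumps({"captured": captured.isoformat(), "sha256": digests},
                      sort_keys=True)


def verify_transfer(manifest: str, received: Dict[str, bytes]) -> Dict[str, bool]:
    """Check each received constituent against the digest in the manifest."""
    expected = json.loads(manifest)["sha256"]
    return {
        uri: uri in received
        and hashlib.sha256(received[uri]).hexdigest() == digest
        for uri, digest in expected.items()
    }
```

Because the manifest covers all constituents at one capture datetime, a missing or altered constituent shows up as `False` for that URI instead of silently breaking the compound object.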
Curatorial Considerations
• What are the criteria involved in deciding (which states of) which
objects get captured/archived?
• What triggers transition from Recording to Archiving?
• On-demand in lifecycle, social status of the object, reference
made to object, deliberate randomness for serendipity, …
• What to archive?
• Snapshot of object or trace of object (metadata, provenance, …)?
Final Considerations
• Need organizational, technical, and curatorial interfaces between
Recording and Archiving platforms
• Need organizational and technical interfaces across Archiving
platforms