@hvdsomp, @phonedude_mln, @mart1nkle1n
CNI Spring 2017, Albuquerque, NM, 3 Apr 2017
Herbert Van de Sompel
Los Alamos National Laboratory @hvdsomp
http://orcid.org/0000-0002-0715-6126
Michael L. Nelson
Old Dominion University @phonedude_mln
http://orcid.org/0000-0003-3749-8116
Martin Klein
Los Alamos National Laboratory @mart1nkle1n
http://orcid.org/0000-0003-0130-2097
To the Rescue of the
Orphans of Scholarly Communication
The project is funded by the Andrew W. Mellon Foundation
Outline
• Problem statement
Scholarly objects are everywhere on the web, and are not systematically archived
• Project perspective
Capturing objects using an institutional & web archiving paradigm
• Object capture flow:
• Step 1: Discovering a researcher’s web identities
• Step 2: Discovering artifacts per web identity
• Step 3: Determining the web boundary per artifact
• Step 4: Capturing resources in the artifacts’ web boundary
Scholarship is Evolving
• The research process, not just its outcome, is becoming visible …
on the web
• Massive extension of the scholarly record with an enormous variety
of novel objects
• The objects are heterogeneous, dynamic, compound, inter-related
and distributed across the web
• The objects are often hosted on common web platforms that are not
dedicated to scholarship
101 Innovations in Scholarly Communication
Bianca Kramer & Jeroen Bosman. 101 Innovations in Scholarly Communication
https://innoscholcomm.silk.co/
The Evolving Scholarly Record
Brian Lavoie et al. (2014) The Evolving Scholarly Record
http://www.oclc.org/content/dam/research/publications/library/2014/oclcresearch-evolving-scholarly-
Web Platforms Record Scholarship
• Increasingly, common web platforms are used for scholarship
• GitHub, Wikis, Wordpress, etc.
• Many of these platforms have desirable characteristics
• Versioning
• Time stamping
• Social embedding
• But, these platforms record rather than archive
Herbert Van de Sompel & Andrew Treloar (2014) A Perspective on Archiving the Scholarly Web
http://public.lanl.gov/herbertv/papers/Papers/2014/iPres2014_Sompel_Treloar.pdf
Recording is not Archiving
“GitHub reserves the right at any time and from time to time to
modify or discontinue, temporarily or permanently, the Service (or
any part thereof) with or without notice.”
GitHub Terms of Service
http://help.github.com/articles/github-terms-of-service
https://help.github.com/articles/github-terms-of-service/
Recording is not Archiving
GitHub Terms of Service
http://help.github.com/articles/github-terms-of-service
https://opensource.googleblog.com/2015/03/farewell-to-google-code.html
Recording versus Archiving
Recording                 Archiving
Short-term                Longer-term
No guarantees provided    Attempt to provide guarantees
Write many/read many      Write once/read many
Scholarly process         Scholarly record
Herbert Van de Sompel & Andrew Treloar (2014) A Perspective on Archiving the Scholarly Web
http://public.lanl.gov/herbertv/papers/Papers/2014/iPres2014_Sompel_Treloar.pdf
Meet Some New School Researchers
Ian Milligan Mark Matienzo
Meet Some New School Researchers
Ian Milligan
https://ianmilligan.ca/
https://www.slideshare.net/IanMilligan1
https://github.com/ianmilligan1
Meet Some New School Researchers
Mark Matienzo
http://matienzo.org/
https://www.slideshare.net/anarchivist/presentations
https://github.com/anarchivist
https://osf.io/tgr4k/
https://www.drupal.org/user/380762
SlideShare Artifact: 0 Mementos
https://www.slideshare.net/IanMilligan1/resaw-geo-cities
http://timetravel.mementoweb.org/list/20140513211653/https://www.slideshare.net/IanMilligan1/resa
GitHub Artifact: 1 Memento
https://github.com/ianmilligan1/Historian-WARC-1
http://web.archive.org/web/*/https://github.com/ianmilligan1/Historian-WARC-1
The Scholarly Orphans Project
• Funded by the Andrew W. Mellon Foundation
• Los Alamos National Laboratory & New Mexico Consortium
• Old Dominion University
• 04/2016 - 03/2019
• How to capture scholarly orphans for long-term archiving?
• Project explores a paradigm inspired by web archiving
• Scale of the problem
• Bilateral agreements with most web portals unlikely
• Project explores an institution-driven paradigm
• Institution should be interested in capturing the artifacts its
scholars deposit on the web
An Institutional & Web Archiving Perspective
Related Work
• LOCKSS
• Web crawling approach
• Focused on journal literature
• Archive-It
• On-demand, subscription-based web archiving
• Not focused on scholarly orphans
• Institutional repository
• Capture an institution’s output
• Focused on manual upload (of journal literature)
• The Locker Project
• Capture an individual’s web presence
• Not focused on scholarly orphans
Capture Flow
Capture Flow – Step 1
Algorithmic Discovery of Web Identities
James Powell et al. (2014) EgoSystem: Where are our alumni?
In: code4lib http://journal.code4lib.org/articles/9519
Discovery of Web Identities via a Registry: ORCID
Martin Klein and Herbert Van de Sompel (2017) Discovering scholarly orphans using ORCID
In: JCDL2017 https://arxiv.org/abs/1703.09343
Ian Milligan
http://orcid.org/0000-0002-1470-7723
Mark Matienzo
http://orcid.org/0000-0003-3270-1306
Ian Milligan’s ORCID
• Web Identities: 0
http://orcid.org/0000-0002-1470-7723
Mark Matienzo’s ORCID
• Web Identities: 3
(homepage, ScopusID,
ResearcherID)
http://orcid.org/0000-0003-3270-1306
Mark Matienzo’s Home Page
• URI to GitHub
repository, Twitter
• Could be included in
ORCID profile
http://matienzo.org/
Discovery of Web Identities via a Registry: ORCID
• Evaluation of ORCID for automatic discovery of Web Identities
• How well does ORCID represent the global community of active researchers?
• Adoption rate
• Subject coverage
• Geo-location coverage
• How well does ORCID score when it comes to listing Web Identities?
ORCID - Adoption Rate
[Figure: ORCID record counts per year, 2013 to 2016; y-axis from 0 to 2,500,000. Series: ORCIDs total, ORCIDs with given names, ORCIDs with first names, ORCIDs with works, ORCIDs with affiliations, ORCIDs with web identities]
ORCID - Subject Coverage
[Figure: coverage per subject area, axis from 0 to 60. Subject areas: Other, Life Sciences, Physical Sciences, Mathematics and Computer Sciences, Education, Psychology and Social Sciences, Engineering, Humanities and Arts. Series: ORCID Subjects, Ph.D. Researchers, Publications]
ORCID - Geo-Location Coverage
ORCID - Web Identities
Discovery of Web Identities via a Registry: ORCID
• Adoption rate is increasing
• Subject coverage is focused, does not cover all disciplines equally
• Geo-Location coverage is good but not quite representative
• Web Identity coverage is poor; not usable for our purpose in its
current state
Capture Flow – Step 2
Discovery of Artifacts per Web Identity
• Algorithmic approach
• Scrape artifacts from
pages
http://matienzo.org/publications/
Discovery of Artifacts per Web Identity
• Notifications
• Subscribe to portal
notifications about a
researcher’s new
artifacts
https://www.slideshare.net/anarchivist/presentations
Discovery of Artifacts per Web Identity
• Artifact Registry
• 5 artifacts of interest
(standards document,
reports, book reviews)
http://orcid.org/0000-0003-3270-1306
Capture Flow – Step 3
Determination of Web Boundary per Artifact
http://signposting.org
HTTP Links
Mark Nottingham (2010) RFC5988: Web Linking.
http://tools.ietf.org/rfc/rfc5988.txt
Signposting - Publication Boundary Pattern
http://signposting.org/publication_boundary/oxford/
Signposting - Bibliographic Metadata Pattern
http://signposting.org/bibliographic_metadata/springer/
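The Signposting patterns above rely on typed HTTP links (RFC 5988) carried in the `Link` response header. The following is a minimal sketch of how a capture tool might read such a header to find the resources inside an artifact's web boundary. The sample header, the example.org URIs, and the use of the `item` and `describedby` relation types are assumptions for illustration; the hand-rolled parser only handles simple headers (no commas inside quoted parameter values) and is not a complete RFC 5988 implementation.

```python
import re

# One link-value: <target> followed by zero or more ";param" parts.
LINK_VALUE = re.compile(r'<([^>]*)>((?:\s*;\s*[^;,]+)*)')
# One parameter: name=value, where value is quoted or a bare token.
PARAM = re.compile(r'\s*([\w*-]+)\s*=\s*(?:"([^"]*)"|([^\s";,]+))')


def parse_link_header(value):
    """Return a list of (target_uri, params) pairs from a Link header value."""
    links = []
    for target, raw_params in LINK_VALUE.findall(value):
        params = {}
        for part in raw_params.split(";"):
            match = PARAM.match(part)
            if match:
                quoted, bare = match.group(2), match.group(3)
                params[match.group(1).lower()] = quoted if quoted is not None else bare
        links.append((target, params))
    return links


def targets_with_rel(links, rel):
    """Return target URIs whose rel parameter contains the given relation type."""
    return [uri for uri, params in links if rel in params.get("rel", "").split()]


# Hypothetical header as a landing page might return it.
SAMPLE_LINK_HEADER = (
    '<https://example.org/paper.pdf>; rel="item"; type="application/pdf", '
    '<https://example.org/data.csv>; rel="item"; type="text/csv", '
    '<https://example.org/meta.bib>; rel="describedby"; type="application/x-bibtex"'
)

links = parse_link_header(SAMPLE_LINK_HEADER)
boundary = targets_with_rel(links, "item")
metadata = targets_with_rel(links, "describedby")
```

A crawler following this sketch would capture only the URIs in `boundary` (plus the landing page) instead of following every hyperlink on the page.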
Capture Flow – Step 4
Challenges Regarding Capturing Web Artifacts
• Legal
• robots.txt
• Licenses
• Technical
• Capture tools
• Capture quality
• Capture authenticity
Legal Challenges re Capturing Artifacts – A Wake-Up Call
SlideShare
• robots.txt unclear, some pages disallowed
• License seems to prohibit archiving
GitHub
• robots.txt unclear, some pages disallowed
• License seems to allow archiving
Drupal
• robots.txt allows relevant URIs
• License seems to prohibit archiving
Open Science Framework
• robots.txt does not disallow crawlers
• License does not mention archiving, individual content may have
specific license
Capture Tools Challenges: Mark’s SlideShare
Live: https://www.slideshare.net/anarchivist/to-hell-with-good-intentions-linked-data-and-the-power-to-name
Internet Archive: http://web.archive.org/web/20161229053246/http://www.slideshare.net/anarchivist/to-hell-with-good-intentions-linked-data-and-the-power-to-name
Webrecorder.io: https://webrecorder.io/martinklein/cni_test/20170330014029/https://www.slideshare.net/anarchivist/to-hell-with-good-intentions-linked-data-and-the-power-to-name
Capture Tools Challenges: Mark’s GitHub
Live: https://github.com/rightsstatements/rightsstatements.github.io
Internet Archive: https://web.archive.org/web/20170328040646/https://github.com/rightsstatements/rightsstatements.github.io
Webrecorder.io: https://webrecorder.io/martinklein/cni_test/20170330014135/https://github.com/rightsstatements/rightsstatements.github.io
Capture Tools Challenges: Mark’s OSF
Live: https://osf.io/h4ru8/wiki/home/
Internet Archive: http://web.archive.org/web/20170328042647/https://osf.io/h4ru8/wiki/home/
Webrecorder.io: https://webrecorder.io/martinklein/cni_test/20170330014219/https://osf.io/h4ru8/wiki/home/
Capture Quality - How well was this page archived?
• Continuing research on memento damage, first published at JCDL 2014
• Premise: simply reporting “9/10 embedded images were archived” is insufficient to describe how well the archive / replay system performed
• Use heuristics from Mechanical Turk testing to approximate human conception of damage, e.g.:
o increase weight of missing images that are large, or centered in the viewport
o stylesheets can be important! check for “ugly” results
J.F. Brunelle, M. Kelly, H. SalahEldeen, M. C. Weigle, and M. L. Nelson (2014) Not all mementos are created equal: Measuring the impact of missing resources. In: JCDL 2014
http://dx.doi.org/10.1109/JCDL.2014.6970187 http://dx.doi.org/10.1007/s00799-015-0150-6
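To make the premise concrete, the sketch below contrasts a naive "fraction of images missing" count with a weighted score in which large images and images covering the viewport center count more. The weighting scheme (viewport area share, doubled when the image covers the center), the `EmbeddedImage` type, and all numbers are assumptions chosen for illustration; this is not the damage formula from the cited paper.

```python
from dataclasses import dataclass


@dataclass
class EmbeddedImage:
    x: float        # left edge in viewport pixels
    y: float        # top edge in viewport pixels
    width: float
    height: float
    archived: bool  # True if the archive holds a copy of this image


def image_weight(img, viewport_w, viewport_h, center_bonus=2.0):
    """Weight an image by its share of the viewport; boost it if it covers the center."""
    area_share = (img.width * img.height) / (viewport_w * viewport_h)
    cx, cy = viewport_w / 2, viewport_h / 2
    covers_center = (img.x <= cx <= img.x + img.width
                     and img.y <= cy <= img.y + img.height)
    return area_share * (center_bonus if covers_center else 1.0)


def weighted_damage(images, viewport_w, viewport_h):
    """Return the share of total image weight that belongs to missing images."""
    weights = [(image_weight(i, viewport_w, viewport_h), i.archived) for i in images]
    total = sum(w for w, _ in weights)
    if total == 0:
        return 0.0
    return sum(w for w, archived in weights if not archived) / total


# One large, centered image is missing; nine small icons were archived.
hero = EmbeddedImage(200, 200, 600, 400, archived=False)
icons = [EmbeddedImage(10 * i, 0, 20, 20, archived=True) for i in range(9)]
page = [hero] + icons

naive = sum(1 for i in page if not i.archived) / len(page)
weighted = weighted_damage(page, viewport_w=1000, viewport_h=800)
```

Here "9/10 embedded images were archived" yields a naive damage of 0.1, while the weighted score is close to 1 because the one missing image dominates what a viewer sees.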
Triptych CSS
“Regular” web pages have nearly equal distribution of content over each third of a page.
If a CSS is missing AND > 75% of the non-background color is in the left 2/3s of the page, then users consider this damaged.
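The rule above can be sketched as a check over a rendered screenshot. The sketch assumes the screenshot has already been reduced to a grid of booleans (True for a non-background pixel); that preprocessing step, the function names, and the tiny sample grids are hypothetical.

```python
def left_two_thirds_share(mask):
    """Return the share of non-background pixels that fall in the left 2/3 of the page.

    mask is a list of rows; each row is a list of booleans, True = non-background.
    """
    total = sum(sum(1 for px in row if px) for row in mask)
    if total == 0:
        return 0.0
    left = sum(sum(1 for px in row[: (2 * len(row)) // 3] if px) for row in mask)
    return left / total


def looks_damaged(mask, css_missing, threshold=0.75):
    """Apply the rule: a stylesheet is missing AND content is piled up on the left."""
    return css_missing and left_two_thirds_share(mask) > threshold


# Unstyled page: all content collapsed into the left third.
unstyled = [[True, True, True, False, False, False, False, False, False]] * 3
# Styled page: content spread evenly over the full width.
balanced = [[True] * 9] * 3
```

With these grids, `looks_damaged(unstyled, css_missing=True)` is True, while the evenly spread page stays below the 75% threshold because only two thirds of its content lies in the left 2/3.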
A Memento Damage Service, Python Library, and Docker Image
Erika Siregar
http://memento-damage.cs.odu.edu/
https://github.com/erikaris/web-memento-damage
Just a Little Bit of Damage…
Moderate Damage…
Significant Damage…
Ian’s GitHub Memento…
https://github.com/ianmilligan1/Historian-WARC-1
http://web.archive.org/web/20130922192416/https://github.com/ianmilligan1/Historian-WARC-1
… Has Slight Damage
does not appear to violate the “75% / left-2/3s” rule
Capture Authenticity - Has this page been tampered with?
• The days of implicitly trusting Brewster & IA are over
o the people who brought you fake news will eventually bring you fake archives
o “mo’ archives mo’ problems”
• Premise: use multiple, independent archives to record fixity information from dated observations of mementos
• Plans:
o blockchain
o provenance (i.e., a memento of a memento != 2 independent mementos)
https://climate.nasa.gov/vital-signs/carbon-dioxide/
http://web.archive.org/web/20170312201332/https://climate.nasa.gov/vital-signs/carbon-dioxide/
http://localhost:8282/michael/wayback/20170313023607/https://climate.nasa.gov/vital-signs/carbon-dioxide/
Push a Web Page into Multiple Archives
Mohamed Aturban (2017) Archive Now (archivenow): A Python Library to Integrate On-Demand
Archives http://ws-dl.blogspot.com/2017/02/2017-02-22-archive-now-archivenow.html
Record Fixity in a Manifest File
Shawn Jones (2016) Mementos In the Raw, Take Two
http://ws-dl.blogspot.com/2016/08/2016-08-15-mementos-in-raw-take-two.html
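As an illustration of what recording fixity could look like, the sketch below hashes the bytes of each memento and writes the result to a JSON manifest. The manifest layout, the field names (`uri_r`, `uri_m`, `observed`, `algorithm`, `digest`), the choice of SHA-256, and the archive-a.example and archive-b.example hosts are all assumptions; the slides do not specify a format. The bodies are assumed to be raw mementos, that is, bytes as originally captured without replay rewriting.

```python
import hashlib
import json
from datetime import datetime, timezone


def fixity_entry(uri_m, body, observed_at):
    """Describe one dated observation of a memento, including its content hash."""
    return {
        "uri_m": uri_m,
        "observed": observed_at.isoformat(),
        "algorithm": "sha256",
        "digest": hashlib.sha256(body).hexdigest(),
    }


def build_manifest(uri_r, observations, observed_at):
    """Serialize fixity entries for all observed mementos of one original resource."""
    entries = [fixity_entry(uri_m, body, observed_at) for uri_m, body in observations]
    return json.dumps({"uri_r": uri_r, "mementos": entries}, indent=2, sort_keys=True)


# Hypothetical mementos of the same page held by two independent archives.
observations = [
    ("https://archive-a.example/20170312201332/https://example.org/page", b"<html>page</html>"),
    ("https://archive-b.example/20170313023607/https://example.org/page", b"<html>page</html>"),
]
manifest_text = build_manifest(
    "https://example.org/page",
    observations,
    datetime(2017, 3, 13, 2, 36, 7, tzinfo=timezone.utc),
)
```

The resulting text file is what would then be published to the web and itself pushed into multiple archives.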
Publish Manifest to the Web
Archive the Manifest
“You can’t tell the players without a scorecard” – Harry M. Stevens
Verifying the Authenticity of a Memento
• Given a Memento, URI-M, that we wish to verify
• Look up the URI-M at a manifest server
o e.g., captureproject.org/{URI-M}
• Discover all the mementos of the manifest, and verify their integrity with “trusty URIs”
• For each URI-M listed in the manifest, repeat the fixity calculation as described in the manifest
• Vote if fixity matches (not tampered with) or if fixity doesn’t match (tampered with)
o Majority vote wins (assuming independent archives)
Mohamed Aturban (2017) Summary of “Trusty URIs: Verifiable, Immutable, and Permanent Digital Artifacts for Linked Data” http://ws-dl.blogspot.com/2017/01/2017-01-15-summary-of-trusty-uris.html
Video at https://www.youtube.com/watch?v=EY15lj-7_lc
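The voting step of this procedure can be sketched as follows. This is a simplified illustration: it skips the manifest lookup and the trusty URI check, assumes SHA-256 digests stored under hypothetical `uri_m` and `digest` keys, and takes a caller-supplied `fetch` function in place of real HTTP requests to the archives.

```python
import hashlib


def verify_memento(entries, fetch):
    """Recompute fixity for every URI-M in a manifest and take a majority vote.

    entries: list of dicts with "uri_m" and the recorded "digest" (hex SHA-256).
    fetch:   callable that returns the current bytes for a URI-M.
    """
    matches = 0
    mismatches = 0
    for entry in entries:
        digest = hashlib.sha256(fetch(entry["uri_m"])).hexdigest()
        if digest == entry["digest"]:
            matches += 1
        else:
            mismatches += 1
    if matches > mismatches:
        return "not tampered with"
    if mismatches > matches:
        return "tampered with"
    return "undetermined"  # tie or empty manifest


# Three hypothetical archives; one of them now serves altered content.
recorded = hashlib.sha256(b"original").hexdigest()
entries = [{"uri_m": name, "digest": recorded} for name in ("a", "b", "c")]
current_content = {"a": b"original", "b": b"original", "c": b"altered"}

verdict = verify_memento(entries, current_content.__getitem__)
```

The vote is only meaningful if the archives are independent; a memento copied from another archive would contribute a correlated vote, which is why the slides list provenance among the plans.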
Discussion

To the Rescue of the Orphans of Scholarly Communication

  • 1.
    @hvdsomp, @phonedude_mln, @mart1nkle1n CNISpring 2017, Albuquerque, NM, 3 Apr 2017 Herbert Van de Sompel Los Alamos National Laboratory @hvdsomp http://orcid.org/0000-0002-0715-6126 Michael L. Nelson Old Dominion University @phonedude_mln http://orcid.org/0000-0003-3749-8116 Martin Klein Los Alamos National Laboratory @mart1nkle1n http://orcid.org/0000-0003-0130-2097 To the Rescue of the Orphans of Scholarly Communication The project is funded by the Andrew W. Mellon Foundation
  • 2.
    @hvdsomp, @phonedude_mln, @mart1nkle1n CNISpring 2017, Albuquerque, NM, 3 Apr 2017 • Problem statement Scholarly objects are everywhere on the web, and are not systematically archived • Project perspective Capturing objects using an institutional & web archiving paradigm • Object capture flow: • Step 1: Discovering a researcher’s web identities • Step 2: Discovering artifacts per web identity • Step 3: Determining the web boundary per artifact • Step 4: Capturing resources in the artifacts’ web boundary Outline
  • 3.
    @hvdsomp, @phonedude_mln, @mart1nkle1n CNISpring 2017, Albuquerque, NM, 3 Apr 2017 Scholarship is Evolving • The research process, not just its outcome, is becoming visible … on the web • Massive extension of the scholarly record with an enormous variety of novel objects • The objects are heterogeneous, dynamic, compound, inter-related and distributed across the web • The objects are often hosted on common web platforms that are not dedicated to scholarship
  • 4.
    @hvdsomp, @phonedude_mln, @mart1nkle1n CNISpring 2017, Albuquerque, NM, 3 Apr 2017 101 Innovations in Scholarly Communication Bianca Kramer & Jeroen Bosman. 101 Innovations in Scholarly Communication https://innoscholcomm.silk.co/
  • 5.
    @hvdsomp, @phonedude_mln, @mart1nkle1n CNISpring 2017, Albuquerque, NM, 3 Apr 2017 The Evolving Scholarly Record Brian Lavoie et al. (2014) The Evolving Scholarly Record http://www.oclc.org/content/dam/research/publications/library/2014/oclcresearch-evolving-scholarly-
  • 6.
    @hvdsomp, @phonedude_mln, @mart1nkle1n CNISpring 2017, Albuquerque, NM, 3 Apr 2017 Web Platforms Record Scholarship • Increasingly, common web platforms are used for scholarship • GitHub, Wikis, Wordpress, etc. • Many of these platforms have desirable characteristics • Versioning • Time stamping • Social embedding • But, these platforms record rather than archive Herbert Van de Sompel & Andrew Treloar (2014) A Perspective on Archiving the Scholarly Web http://public.lanl.gov/herbertv/papers/Papers/2014/iPres2014_Sompel_Treloar.pdf
  • 7.
    @hvdsomp, @phonedude_mln, @mart1nkle1n CNISpring 2017, Albuquerque, NM, 3 Apr 2017 Recording is not Archiving “GitHub reserves the right at any time and from time to time to modify or discontinue, temporarily or permanently, the Service (or any part thereof) with or without notice.” GitHub Terms of Service http://help.github.com/articles/github-terms-of-service https://help.github.com/articles/github-terms-of-service/
  • 8.
    @hvdsomp, @phonedude_mln, @mart1nkle1n CNISpring 2017, Albuquerque, NM, 3 Apr 2017 Recording is not Archiving GitHub Terms of Service http://help.github.com/articles/github-terms-of-service https://opensource.googleblog.com/2015/03/farewell-to-google-code.html
  • 9.
    @hvdsomp, @phonedude_mln, @mart1nkle1n CNISpring 2017, Albuquerque, NM, 3 Apr 2017 Recording versus Archiving Recording Archiving Short-term Longer-term No guarantees provided Attempt to provide guarantees Write many/read many Write once/Read many Scholarly process Scholarly record Herbert Van de Sompel & Andrew Treloar (2014) A Perspective on Archiving the Scholarly Web http://public.lanl.gov/herbertv/papers/Papers/2014/iPres2014_Sompel_Treloar.pdf
  • 10.
    @hvdsomp, @phonedude_mln, @mart1nkle1n CNISpring 2017, Albuquerque, NM, 3 Apr 2017 Meet Some New School Researchers Ian Milligan Mark Matienzo
  • 11.
    @hvdsomp, @phonedude_mln, @mart1nkle1n CNISpring 2017, Albuquerque, NM, 3 Apr 2017 Meet Some New School Researchers Ian Milligan https://ianmilligan.ca/ https://www.slideshare.net/IanMilligan1 https://github.com/ianmilligan1
  • 12.
    @hvdsomp, @phonedude_mln, @mart1nkle1n CNISpring 2017, Albuquerque, NM, 3 Apr 2017 Meet Some New School Researchers Mark Matienzo http://matienzo.org/ https://www.slideshare.net/anarchivist/presentations https://github.com/anarchivist https://osf.io/tgr4k/ https://www.drupal.org/user/380762
  • 13.
    @hvdsomp, @phonedude_mln, @mart1nkle1n CNISpring 2017, Albuquerque, NM, 3 Apr 2017 SlideShare Artifact: 0 Mementos https://www.slideshare.net/IanMilligan1/resaw-geo-cities http://timetravel.mementoweb.org/list/20140513211653/https://www.slideshare.net/IanMilligan1/resa
  • 14.
    @hvdsomp, @phonedude_mln, @mart1nkle1n CNISpring 2017, Albuquerque, NM, 3 Apr 2017 GitHub Artifact: 1 Memento https://github.com/ianmilligan1/Historian-WARC-1 http://web.archive.org/web/*/https://github.com/ianmilligan1/Historian-WARC-1
  • 15.
    @hvdsomp, @phonedude_mln, @mart1nkle1n CNISpring 2017, Albuquerque, NM, 3 Apr 2017 The Scholarly Orphans Project • Funded by the Andrew W, Mellon Foundation • Los Alamos National Laboratory & New Mexico Consortium • Old Dominion University • 04/2016 - 03/2019 • How to capture scholarly orphans for long-term archiving? • Project explores a paradigm inspired by web archiving • Scale of the problem • Bilateral agreements with most web portals unlikely • Project explores an institution driven paradigm • Institution should be interested in capturing the artifacts its scholars deposit on the web
  • 16.
    @hvdsomp, @phonedude_mln, @mart1nkle1n CNISpring 2017, Albuquerque, NM, 3 Apr 2017 An Institutional & Web Archiving Perspective
  • 17.
    @hvdsomp, @phonedude_mln, @mart1nkle1n CNISpring 2017, Albuquerque, NM, 3 Apr 2017 Related Work • LOCKSS • Web crawling approach • Focused on journal literature • Archive-It • On-demand, subscription-based web archiving • Not focused on scholarly orphans • Institutional repository • Capture an institution’s output • Focused on manual upload (of journal literature) • The Locker Project • Capture an individual’s web presence • Not focused on scholarly orphans
  • 18.
    @hvdsomp, @phonedude_mln, @mart1nkle1n CNISpring 2017, Albuquerque, NM, 3 Apr 2017 Capture Flow
  • 19.
    @hvdsomp, @phonedude_mln, @mart1nkle1n CNISpring 2017, Albuquerque, NM, 3 Apr 2017 Capture Flow – Step 1
  • 20.
    @hvdsomp, @phonedude_mln, @mart1nkle1n CNISpring 2017, Albuquerque, NM, 3 Apr 2017 Algorithmic Discovery of Web Identities James Powell et al. (2014) EgoSystem: Where are our alumni? In: code4lib http://journal.code4lib.org/articles/9519
  • 21.
    @hvdsomp, @phonedude_mln, @mart1nkle1n CNISpring 2017, Albuquerque, NM, 3 Apr 2017 Discovery of Web Identities via a Registry: ORCID Martin Klein and Herbert Van de Sompel (2017) Discovering scholarly orphans using ORCID In: JCDL2017 https://arxiv.org/abs/1703.09343 Ian Milligan http://orcid.org/0000-0002-1470-7723 Mark Matienzo http://orcid.org/0000-0003-3270-1306
  • 22.
    @hvdsomp, @phonedude_mln, @mart1nkle1n CNISpring 2017, Albuquerque, NM, 3 Apr 2017 Ian Milligan’s ORCID • Web Identities: 0 http://orcid.org/0000-0002-1470-7723
  • 23.
    @hvdsomp, @phonedude_mln, @mart1nkle1n CNISpring 2017, Albuquerque, NM, 3 Apr 2017 Mark Matienzo’s ORCID • Web Identities: 3 (homepage, ScopusID, ResearcherID) http://orcid.org/0000-0003-3270-1306
  • 24.
    @hvdsomp, @phonedude_mln, @mart1nkle1n CNISpring 2017, Albuquerque, NM, 3 Apr 2017 Mark Matienzo’s Home Page • URI to GitHub repository, Twitter • Could be included in ORCID profile http://matienzo.org/
  • 25.
    @hvdsomp, @phonedude_mln, @mart1nkle1n CNISpring 2017, Albuquerque, NM, 3 Apr 2017 • Evaluation of ORCID for automatic discovery of Web Identities • How well does ORCID represent the global community of active researchers? • Adoption rate • Subject coverage • Geo-location coverage • How well does ORCID score when it comes to listing Web Identities? Discovery of Web Identities via a Registry: ORCID
  • 26.
    @hvdsomp, @phonedude_mln, @mart1nkle1n CNISpring 2017, Albuquerque, NM, 3 Apr 2017 ORCID - Adoption Rate 2013 2014 2015 2016 05000001000000150000020000002500000 ORCIDs total ORCIDs with given names ORCIDs with first names ORCIDs with works ORCIDs with affiliations ORCIDs with web identities
  • 27.
    ORCID - Subject Coverage
    [Chart: values on a scale from 0 to 60 for the subject categories Other; Life Sciences; Physical Sciences; Mathematics and Computer Sciences; Education; Psychology and Social Sciences; Engineering; Humanities and Arts. Legend as extracted: “ORCID Subjects Ph.D. Researchers Publications”]
  • 28.
    ORCID - Geo-Location Coverage
  • 29.
    ORCID - Geo-Location Coverage
  • 30.
    ORCID - Geo-Location Coverage
  • 31.
    ORCID - Web Identities
  • 32.
    ORCID - Web Identities
  • 33.
    Discovery of Web Identities via a Registry: ORCID
    • Adoption rate is increasing
    • Subject coverage is focused, does not cover all disciplines equally
    • Geo-Location coverage is good but not quite representative
    • Web Identity coverage is poor; not usable for our purpose in its current state
  • 34.
    Capture Flow – Step 2
  • 35.
    Discovery of Artifacts per Web Identity
    • Algorithmic approach
      o Scrape artifacts from pages
    http://matienzo.org/publications/
  • 36.
    Discovery of Artifacts per Web Identity
    • Notifications
      o Subscribe to portal notifications about a researcher’s new artifacts
    https://www.slideshare.net/anarchivist/presentations
  • 37.
    Discovery of Artifacts per Web Identity
    • Artifact Registry
      o 5 artifacts of interest (standards document, reports, book reviews)
    http://orcid.org/0000-0003-3270-1306
  • 38.
    Capture Flow – Step 3
  • 39.
    Determination of Web Boundary per Artifact
    http://signposting.org
  • 40.
    HTTP Links
    Mark Nottingham (2010) RFC5988: Web Linking. http://tools.ietf.org/rfc/rfc5988.txt
  • 41.
    HTTP Links
  • 42.
    HTTP Links
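As an illustration of what a client does with HTTP links, the sketch below parses the value of an HTTP `Link` response header into targets and relation types. It is a simplified reading of the Web Linking syntax, not a complete parser: it assumes that no quoted parameter value contains `<`, and the function name and example URIs are hypothetical.

```python
import re

# "<target>" followed by everything up to the next "<" (its parameters).
LINK_RE = re.compile(r'<([^>]*)>([^<]*)')
# ";name=value" where the value is either quoted or a bare token.
PARAM_RE = re.compile(r';\s*([\w*-]+)\s*=\s*(?:"([^"]*)"|([^\s;,]+))')


def parse_link_header(value):
    """Return a list of dicts with 'target', 'rel' (list), and 'params'."""
    links = []
    for target, params_text in LINK_RE.findall(value):
        params = {}
        for name, quoted, bare in PARAM_RE.findall(params_text):
            # Keep the first occurrence of a repeated parameter.
            params.setdefault(name.lower(), quoted or bare)
        links.append({
            "target": target,
            # rel may carry several space-separated relation types.
            "rel": params.get("rel", "").split(),
            "params": params,
        })
    return links


EXAMPLE_HEADER = (
    '<https://example.org/paper.pdf>; rel="item"; type="application/pdf", '
    '<https://example.org/paper.bib>; rel="describedby"'
)
```

Calling `parse_link_header(EXAMPLE_HEADER)` yields one entry per linked resource, each with its relation types and remaining parameters such as `type`.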
  • 43.
    Signposting - Publication Boundary Pattern
    http://signposting.org/publication_boundary/oxford/
  • 44.
    Signposting - Bibliographic Metadata Pattern
    http://signposting.org/bibliographic_metadata/springer/
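A rough sketch of how a crawler might use the two Signposting patterns once a landing page's links have been parsed. It assumes that the landing page points to the resources inside the publication boundary with the `item` relation type and to bibliographic metadata with `describedby`; those relation types are not spelled out on the slides and should be checked against signposting.org. The function name and URIs are hypothetical.

```python
def web_boundary(landing_page, links):
    """Split a landing page's typed links into capture targets and metadata.

    links: iterable of (target_uri, relation_type) pairs.
    """
    capture = [landing_page]   # the landing page itself is always captured
    metadata = []
    for target, rel in links:
        if rel == "item" and target not in capture:
            capture.append(target)
        elif rel == "describedby" and target not in metadata:
            metadata.append(target)
        # Links with any other relation type are treated as outside the boundary.
    return {"capture": capture, "metadata": metadata}
```

For example, a landing page with two `item` links and one `describedby` link yields three URIs to capture and one metadata URI, while a `cite-as` or `author` link is ignored by this sketch.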
  • 45.
    Capture Flow – Step 4
  • 46.
    Challenges Regarding Capturing Web Artifacts
    • Legal
      o robots.txt
      o Licenses
    • Technical
      o Capture tools
      o Capture quality
      o Capture authenticity
  • 47.
    Legal Challenges re Capturing Artifacts – A Wake-Up Call
    SlideShare
    • robots.txt unclear, some pages disallowed
    • License seems to prohibit archiving
    GitHub
    • robots.txt unclear, some pages disallowed
    • License seems to allow archiving
    Drupal
    • robots.txt allows relevant URIs
    • License seems to prohibit archiving
    Open Science Framework
    • robots.txt does not disallow crawlers
    • License does not mention archiving, individual content may have specific license
  • 48.
    Capture Tools Challenges: Mark’s SlideShare
    Live: https://www.slideshare.net/anarchivist/to-hell-with-good-intentions-linked-data-and-the-power-to-name
    Internet Archive: http://web.archive.org/web/20161229053246/http://www.slideshare.net/anarchivist/to-hell-with-good-intentions-linked-data-and-the-power-to-name
    Webrecorder.io: https://webrecorder.io/martinklein/cni_test/20170330014029/https://www.slideshare.net/anarchivist/to-hell-with-good-intentions-linked-data-and-the-power-to-name
  • 49.
    Capture Tools Challenges: Mark’s GitHub
    Live: https://github.com/rightsstatements/rightsstatements.github.io
    Internet Archive: https://web.archive.org/web/20170328040646/https://github.com/rightsstatements/rightsstatements.github.io
    Webrecorder.io: https://webrecorder.io/martinklein/cni_test/20170330014135/https://github.com/rightsstatements/rightsstatements.github.io
  • 50.
    Capture Tools Challenges: Mark’s OSF
    Live: https://osf.io/h4ru8/wiki/home/
    Internet Archive: http://web.archive.org/web/20170328042647/https://osf.io/h4ru8/wiki/home/
    Webrecorder.io: https://webrecorder.io/martinklein/cni_test/20170330014219/https://osf.io/h4ru8/wiki/home/
  • 51.
    Capture Quality - How well was this page archived?
    • Continuing research on memento damage, first published at JCDL 2014
    • Premise: simply reporting “9/10 embedded images were archived” is insufficient to describe how well the archive / replay system performed
    • Use heuristics from Mechanical Turk testing to approximate human conception of damage, e.g.:
      o increase weight of missing images that are large, or centered in the viewport
      o stylesheets can be important! check for “ugly” results
    J.F. Brunelle, M. Kelly, H. SalahEldeen, M. C. Weigle, and M. L. Nelson (2014) Not all mementos are created equal: Measuring the impact of missing resources. In: JCDL 2014 http://dx.doi.org/10.1109/JCDL.2014.6970187 http://dx.doi.org/10.1007/s00799-015-0150-6
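To make the premise concrete, the toy calculation below contrasts the naive "fraction of images missing" with a weighted score in which large or centered images count for more. The weighting (pixel area, doubled for centered images) is a hypothetical stand-in chosen for illustration; it is not the formula from Brunelle et al.

```python
from dataclasses import dataclass

# Hypothetical multiplier for images centered in the viewport.
CENTER_BONUS = 2.0


@dataclass
class Image:
    width: int
    height: int
    centered: bool
    archived: bool


def weight(image):
    """Importance of one image: its area, boosted if it is centered."""
    area = image.width * image.height
    return area * CENTER_BONUS if image.centered else area


def naive_missing_ratio(images):
    """Fraction of images that were not archived, all counted equally."""
    return sum(not image.archived for image in images) / len(images)


def weighted_damage(images):
    """Share of total image importance that is missing from the capture."""
    total = sum(weight(image) for image in images)
    missing = sum(weight(image) for image in images if not image.archived)
    return missing / total if total else 0.0
```

With nine small archived icons and one large, centered, missing image, the naive ratio reports that 10% of images are missing while the weighted score is far higher, which is the gap the slide points at.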
  • 52.
    Triptych CSS
    • “Regular” web pages have a nearly equal distribution of content over each third of a page
    • If a CSS is missing AND > 75% of the non-background color is in the left 2/3s of the page, then users consider the page damaged
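The rule on this slide is simple enough to sketch directly. The code assumes a rendered page has already been reduced to a grid of booleans, where `True` marks a non-background pixel; how that grid is produced (screenshot, background detection) is outside the sketch, and the function names are hypothetical.

```python
def left_two_thirds_share(pixels):
    """Fraction of non-background pixels that lie in the left 2/3 of the page.

    pixels: list of rows, each a list of booleans (True = non-background).
    """
    width = len(pixels[0])
    cutoff = (2 * width) // 3          # columns [0, cutoff) are the left 2/3
    total = sum(sum(row) for row in pixels)
    if total == 0:
        return 0.0                     # blank page: nothing to measure
    left = sum(sum(row[:cutoff]) for row in pixels)
    return left / total


def looks_damaged(pixels, css_missing, threshold=0.75):
    """Apply the rule: stylesheet missing AND more than 75% of content on the left."""
    return css_missing and left_two_thirds_share(pixels) > threshold
```

A page whose content is spread evenly across the width scores about 2/3 and passes, while a page collapsed against the left edge scores close to 1.0 and is flagged when its stylesheet is missing.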
  • 53.
    A Memento Damage Service, Python Library, and Docker Image
    Erika Siregar
    http://memento-damage.cs.odu.edu/
    https://github.com/erikaris/web-memento-damage
  • 54.
    Just a Little Bit of Damage…
  • 55.
    Moderate Damage…
  • 56.
    Significant Damage…
  • 57.
    Ian’s GitHub Memento…
    https://github.com/ianmilligan1/Historian-WARC-1
    http://web.archive.org/web/20130922192416/https://github.com/ianmilligan1/Historian-WARC-1
  • 58.
    … Has Slight Damage
    • Does not appear to violate the “75% / left-2/3s” rule
  • 59.
    Capture Authenticity - Has this page been tampered with?
    • The days of implicitly trusting Brewster & IA are over
      o the people who brought you fake news will eventually bring you fake archives
      o “mo’ archives mo’ problems”
    • Premise: use multiple, independent archives to record fixity information from dated observations of mementos
    • Plans:
      o blockchain
      o provenance (i.e., a memento of a memento != 2 independent mementos)
    https://climate.nasa.gov/vital-signs/carbon-dioxide/
    http://web.archive.org/web/20170312201332/https://climate.nasa.gov/vital-signs/carbon-dioxide/
    http://localhost:8282/michael/wayback/20170313023607/https://climate.nasa.gov/vital-signs/carbon-dioxide/
  • 60.
    Push a Web Page into Multiple Archives
    Mohamed Aturban (2017) Archive Now (archivenow): A Python Library to Integrate On-Demand Archives http://ws-dl.blogspot.com/2017/02/2017-02-22-archive-now-archivenow.html
  • 61.
    Record Fixity in a Manifest File
    Shawn Jones (2016) Mementos In the Raw, Take Two http://ws-dl.blogspot.com/2016/08/2016-08-15-mementos-in-raw-take-two.html
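One way to picture a fixity manifest is as a small JSON document with one entry per memento, each holding the memento's URI (URI-M), the time it was observed, and a cryptographic hash of its content. The sketch below assumes SHA-256 and a made-up field layout; the slides do not specify the project's actual manifest format.

```python
import hashlib
import json


def manifest_entry(uri_m, content, observed_at):
    """Describe one memento: where it lives, when it was seen, and its hash.

    content is the memento's body as bytes; fetching it is out of scope here.
    """
    return {
        "uri_m": uri_m,
        "observed_at": observed_at,
        "hash_algorithm": "sha256",
        "hash": hashlib.sha256(content).hexdigest(),
    }


def build_manifest(entries):
    """Serialize the entries as JSON (hypothetical layout)."""
    return json.dumps({"mementos": entries}, indent=2, sort_keys=True)
```

The entry records which hash algorithm was used, so that a later verifier can repeat the same calculation.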
  • 62.
    Publish Manifest to the Web
  • 63.
    Archive the Manifest
  • 64.
    “You can’t tell the players without a scorecard” – Harry M. Stevens
  • 65.
    Verifying the Authenticity of a Memento
    • Given a Memento, URI-M, that we wish to verify
    • Look up the URI-M at a manifest server
      o e.g., captureproject.org/{URI-M}
    • Discover all the mementos of the manifest, and verify their integrity with “trusty URIs”
    • For each URI-M listed in the manifest, repeat the fixity calculation as described in the manifest
    • Vote if fixity matches (not tampered with) or if fixity doesn’t match (tampered with)
      o Majority vote wins (assuming independent archives)
    Mohamed Aturban (2017) Summary of “Trusty URIs: Verifiable, Immutable, and Permanent Digital Artifacts for Linked Data” http://ws-dl.blogspot.com/2017/01/2017-01-15-summary-of-trusty-uris.html
    Video at https://www.youtube.com/watch?v=EY15lj-7_lc
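The last two steps (repeat the fixity calculation, then vote) can be sketched as follows. The sketch assumes manifest entries with `uri_m` and `hash` fields holding SHA-256 hex digests, and it takes the current content of each memento as bytes instead of fetching it. The lookup at the manifest server and the trusty-URI check of the manifest itself are left out, and all names are hypothetical.

```python
import hashlib
from collections import Counter


def verify_by_majority(manifest_entries, current_content):
    """Recompute each memento's hash and let the archives vote.

    manifest_entries: list of dicts with 'uri_m' and 'hash' (SHA-256 hex).
    current_content: dict mapping each URI-M to the bytes observed now.
    """
    votes = Counter()
    for entry in manifest_entries:
        recomputed = hashlib.sha256(current_content[entry["uri_m"]]).hexdigest()
        votes["match" if recomputed == entry["hash"] else "mismatch"] += 1
    if votes["match"] > votes["mismatch"]:
        return "not tampered with"
    if votes["mismatch"] > votes["match"]:
        return "tampered with"
    return "undecided"   # a tie, or an empty manifest
```

The vote is only meaningful if the archives are independent, as the slide notes; copies of one archive's memento would all fail or pass together.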
  • 66.
    Discussion
  • 67.
    To the Rescue of the Orphans of Scholarly Communication
    Herbert Van de Sompel, Los Alamos National Laboratory, @hvdsomp, http://orcid.org/0000-0002-0715-6126
    Michael L. Nelson, Old Dominion University, @phonedude_mln, http://orcid.org/0000-0003-3749-8116
    Martin Klein, Los Alamos National Laboratory, @mart1nkle1n, http://orcid.org/0000-0003-0130-2097
    The project is funded by the Andrew W. Mellon Foundation