Digital preservation and the
web: challenges for libraries
Corey Davis, Council of Prairie and Pacific University Libraries (COPPUL)
Digital Preservation Coordinator
The big challenge for all of us
“Much of our global
cultural heritage, and
our own individual and
social imprint, is at
serious risk of
disappearing.”
Richard S. Whitt, Corporate Director for
Strategic Initiatives at Google
Keepers “…represents
only about 20% of the
‘continuing resources’
and ‘integrated
resources’ having an
ISSN.”
http://library.ifla.org/121/1/098-burnhill-en.pdf
Traditional library collections…
…and the early web
The web now…
1. With AJAX and HTML5, the web is transitioning from a
document-centric information space to an application-based information
space
2. Content is tailored to people, locations, and devices. There is often
no “canonical version” of a webpage anymore
Davis Digital Preservation and the Web: Challenges for Libraries
Amnesiac civilization
• “HTML5, in effect, changes the language
of the Web from HTML to Javascript,
from a static document description
language to a programming language.”
• “I've been warning for some time that
one of the fundamental problems facing
digital preservation is the evolution of
content from static to dynamic.”
• http://blog.dshr.org/2011/08/moonalice-plays-palo-alto.html
Current preservation services…
• Tend to focus on discrete objects or packages (PDFs, images, XML)
• And the creation of Archival Information Packages (AIPs)
• “I have always thought of the ‘autonomous AIP’ zipped up and held on a
storage device as a residue of paper-thinking.” Jon Tilbury, Preservica
(PASIG-discuss listserv)
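As a rough illustration of the discrete-package model described on this slide, the sketch below zips a few files together with a SHA-256 manifest and then re-verifies them. The function names `build_package` and `verify_package`, the `data/` layout, and the manifest file name are assumptions made for illustration; this is not an OAIS-conformant AIP and not the format of any particular preservation service.

```python
import hashlib
import io
import zipfile


def build_package(files: dict[str, bytes]) -> bytes:
    """Zip the given files together with a SHA-256 manifest (hypothetical layout)."""
    manifest_lines = []
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, data in sorted(files.items()):
            digest = hashlib.sha256(data).hexdigest()
            manifest_lines.append(f"{digest}  data/{name}")
            archive.writestr(f"data/{name}", data)
        # The manifest records one "checksum  path" line per payload file.
        archive.writestr("manifest-sha256.txt", "\n".join(manifest_lines) + "\n")
    return buffer.getvalue()


def verify_package(package: bytes) -> bool:
    """Recompute each checksum and compare it with the manifest entry."""
    with zipfile.ZipFile(io.BytesIO(package)) as archive:
        manifest = archive.read("manifest-sha256.txt").decode("utf-8")
        for line in manifest.splitlines():
            digest, path = line.split("  ", 1)
            if hashlib.sha256(archive.read(path)).hexdigest() != digest:
                return False
    return True


package = build_package({"letter.xml": b"<doc/>", "scan.pdf": b"%PDF-1.4"})
print(verify_package(package))
```

A package like this fixes a set of files in place, but it records nothing about behaviour that a page produces at runtime, which is the limitation the quote points to.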
Some examples of the challenges
of preserving dynamic web
content
The short tail and long tail
1. CNN http://cnn.com
2. Colonial Despatches https://bcgenesis.uvic.ca/
The short tail: CNN
• “CNN.com has been unarchivable since
2016-11-01T15:01:31”
• http://ws-dl.blogspot.ca/2017/01/2017-01-20-cnncom-has-been-unarchivable.html
January 20th, 2017, Inauguration Day
• “In short, the archival failure is caused
by changes CNN made to their CDN
(content delivery network); these
changes are reflected in the JavaScript
used to render the homepage.”
• John Berlin http://ws-dl.blogspot.ca/2017/01/2017-01-20-cnncom-has-been-unarchivable.html
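To make the general problem concrete, here is a minimal sketch of why resources assembled by JavaScript are hard for tools that read only the markup. It is not a reconstruction of the CNN case; the page content, the `cdn.example.org` host, and the `StaticLinkExtractor` class are all hypothetical.

```python
from html.parser import HTMLParser


class StaticLinkExtractor(HTMLParser):
    """Collect URLs that appear literally in src/href attributes."""

    def __init__(self) -> None:
        super().__init__()
        self.urls: list[str] = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("src", "href") and value:
                self.urls.append(value)


# Hypothetical page: one image is in the markup, another is built by a script.
PAGE = """
<html><body>
  <img src="/static/logo.png">
  <script>
    var host = "https://" + "cdn.example.org";
    var img = document.createElement("img");
    img.src = host + "/hero.jpg";
    document.body.appendChild(img);
  </script>
</body></html>
"""

extractor = StaticLinkExtractor()
extractor.feed(PAGE)
print(extractor.urls)  # ['/static/logo.png']
```

The second image never appears in the list because its URL exists only after the script runs in a browser, so a capture tool has to execute the page, not just parse it, to discover that resource.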
The long tail:
Colonial Despatches
• “This digital archive contains the
original correspondence
between the British Colonial
Office and the colonies of
Vancouver Island and British
Columbia.”
• https://bcgenesis.uvic.ca/
How can we address these
challenges together?
Working with the long tail
• Major project at University of Victoria to explore the archiving of
dynamic, interactive websites in the digital humanities
• Working with information producers and developers to create
preservation-friendly applications
Selecting technologies for long-term survival
• “We have settled on building web applications which have
virtually no server-side requirements beyond response to
HTTP requests, but instead are based on client-side HTML5,
JavaScript and Cascading Style Sheets.”
• “Using these core standards, we are building completely
‘static’ websites which can actually function locally in any
current web browser, with no server at all, but which still
preserve virtually all of the appearance and functionality of
the original web applications they replace.”
• Martin Holmes, Programmer/Consultant, University of Victoria
Humanities Computing and Media Centre
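One way to test the "no server at all" property is to check that every resource a page references can be found among the local files. The sketch below is an assumption-laden illustration, not the University of Victoria team's tooling: the function name `unresolved_references`, the simple regular expression, and the sample site are all hypothetical, and root-relative paths or schemes such as `mailto:` would need extra handling.

```python
import re
import tempfile
from pathlib import Path

# Matches src="..." or href="..." values; deliberately simple for a sketch.
REFERENCE = re.compile(r'(?:src|href)="([^"]+)"')


def unresolved_references(site_root: Path) -> list[str]:
    """List references that a browser could not load from local files alone."""
    problems = []
    for page in sorted(site_root.rglob("*.html")):
        for ref in REFERENCE.findall(page.read_text(encoding="utf-8")):
            if ref.startswith(("http://", "https://", "//")):
                problems.append(f"{page.name}: external {ref}")
            elif ref.startswith("#"):
                continue  # in-page anchor, nothing to load
            elif not (page.parent / ref.split("#")[0].split("?")[0]).exists():
                problems.append(f"{page.name}: missing {ref}")
    return problems


# Hypothetical two-file site with one dependency on an outside host.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "app.js").write_text("console.log('ok');", encoding="utf-8")
    (root / "index.html").write_text(
        '<script src="app.js"></script><img src="https://cdn.example.org/x.png">',
        encoding="utf-8",
    )
    report = unresolved_references(root)

print(report)
```

An empty report would suggest the site can be copied to any storage medium and opened directly in a browser; any entry marks a dependency that must be localized before the site is truly static.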
Best practices for content creators: Distill.pub
• “A Distill article (at least
in its ideal, aspirational
form) isn’t just a paper.
It’s an interactive
medium that lets users
– ‘readers’ is no longer
sufficient – work
directly with machine
learning models.”
• http://distill.pub/about/
Distill.pub
Interactivity and preservation
• “Distill does an excellent job of publishing articles that use
interactivity to provide high-quality explanations … without sacrificing
preservability.”
• David Rosenthal http://blog.dshr.org/2017/05/distill-is-this-what-journals-should.html
Capturing the dynamic
web: Webrecorder.io
• Developed by Rhizome for
preservation of interactive
online art
• Focus on dynamic web content
Academic publications and CLOCKSS
• Digital preservation
collaboration
between research
libraries and
publishers
• Working to develop
functionality to
harvest dynamic
content from
publishers’ websites
To sum up…
Significant issues
• Costs
• Dynamic content
  • Presents significant technical and policy issues for preservation
• Scale
  • A technical and financial issue
• Incentives
  • Public policy could address some of this
• Proprietary information and DRM
  • Copyright legislation for preservation not likely forthcoming
Collaboration is key
• Libraries need to work together
• Libraries and publishers and other content creators need to work
together
• Publishers can practice “preservation in place”
Thanks
• corey@coppul.ca
• @coreyleedavis
