Digital Preservation: From Projects to Infrastructure Margaret Hedstrom University of Michigan
Outline of the Presentation Recent Developments in Digital Preservation Current Approaches and Solutions Infrastructure Requirements Bridging the Gaps Conclusion
Digital Preservation Challenges Keeping information alive and accessible in spite of changing technology Ensuring that information is credible and understandable so that it is not used inappropriately Sustaining information with an adequate flow of revenue over many decades
Emerging Standards  and Best Practices Framework and Models for Trusted Repositories Standards for Metadata and Data Formats Some Tools Managing Technology Dependencies
New Challenges Need for digital preservation repositories and services in new environments Scientific Data Entertainment and New Media Personal Archives Need for interoperability across repositories  Need for integration of data and publications
New Challenges Scalability of current methods Diversity of data, formats, production environments Quantity of ubiquitous data Appraisal and Selection Costs of digital preservation Need for approaches that generalize and scale gracefully
Moment of opportunity The pieces of a global network are falling into place Computation Communication Content Or are they? Diversity of content? Content exploitation? Comprehension? New knowledge generation?
What is missing? Comprehensive content Across disciplines, language, location Tools for analysis Sharing and exchange of content, data, results Acceleration in the generation of new knowledge Fundamental, not incremental, new discoveries Infrastructure to enable all of the above
Moving from Projects to Infrastructure Digital Preservation Projects have produced useful models, tools, and practices for specific types of content in specific environments How can we build on these projects and shift toward building digital preservation infrastructure?
What is infrastructure? Structures, systems, and social agreements that all allow disparate components of a system to work together on a grand scale. Effective  infrastructure allows people to interact with systems easily. Useful  infrastructure allows people to accomplish goals that would be impossible to achieve without it.
Digital Preservation  Infrastructure  Components Technical Aspects Interoperable hardware, software, and networking components Intellectual Components Interoperable metadata schema, ontologies, and knowledge representation Social Components Agreement on roles and responsibilities, incentives and rewards
Characteristics of Infrastructure Embeddedness Transparency Reach or scope Linked with conventions of practice  Embodiment of standards  Built on an installed base  Becomes visible upon breakdown Is fixed in modular increments, not all at once or globally Karen Ruhleder and Susan Leigh Star
Infrastructure Requirements [Diagram: characteristics of infrastructure placed along two axes, Technical to Social and Local to Global: Embodiment of standards; Reach/scope; Links with conventions of practice; Learned as part of membership; Embeddedness; Build on an installed base; Visible on breakdown; Transparency.] Source: Florence Millerand, Cyberinfrastructure along social and technical dimensions
Infrastructure: Some Concrete Examples The power system The transportation system
Cyber-infrastructure Initiatives Digital Projects and Digital Libraries [US] National Science Foundation (NSF) Blue Ribbon Panel on Cyberinfrastructure for Science and Engineering E-Science and Information Society Initiatives ACLS Commission on Cyberinfrastructure for Humanities and Social Science CASPAR Project
Identifying Gaps  Most digital preservation research and development is centered on repositories Architecture Metadata Tools Developments focus on the technical axis Many digital preservation efforts focus on activities within repositories Outreach to producers is limited to a subset of producer communities
Gaps in Infrastructure [Diagram: the same two-axis chart, Technical to Social and Local to Global, with the characteristics Embodiment of standards; Reach/scope; Links with conventions of practice; Learned as part of membership; Embeddedness; Build on an installed base; Visible on breakdown; Transparency.]
Scope of OAIS Activities SIP = Submission Information Package; AIP = Archival Information Package; DIP = Dissemination Information Package [Diagram: the OAIS functional entities (Ingest, Data Management, Archival Storage, Access, Administration, Preservation Planning) sit between Producer, Management, and Consumer. SIPs enter from the Producer; AIPs and descriptive information move within the archive; the Consumer sends queries and orders and receives result sets and DIPs.]
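The package flow named on this slide can be illustrated with a minimal sketch. It is not part of the original presentation or of the OAIS standard: the class fields, the `ingest` and `access` functions, and the use of a SHA-256 checksum as a stand-in for preservation metadata are all assumptions made for illustration.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class SIP:
    """Submission Information Package: what the producer hands over."""
    content: bytes
    descriptive_info: dict


@dataclass
class AIP:
    """Archival Information Package: content plus what the archive records."""
    content: bytes
    descriptive_info: dict
    fixity_sha256: str  # assumption: a checksum stands in for preservation metadata


@dataclass
class DIP:
    """Dissemination Information Package: what the consumer receives."""
    content: bytes
    descriptive_info: dict


def ingest(sip: SIP) -> AIP:
    """Hypothetical Ingest step: wrap a SIP as an AIP and record a checksum."""
    digest = hashlib.sha256(sip.content).hexdigest()
    return AIP(sip.content, dict(sip.descriptive_info), digest)


def access(aip: AIP) -> DIP:
    """Hypothetical Access step: re-check the checksum, then build a DIP."""
    if hashlib.sha256(aip.content).hexdigest() != aip.fixity_sha256:
        raise ValueError("fixity check failed")
    return DIP(aip.content, dict(aip.descriptive_info))
```

In this sketch the producer only ever sees the SIP and the consumer only ever sees the DIP; everything the archive adds lives in the AIP, which is the repository-centered scope the slide describes.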
Repository-Centered View of Metadata Creation [Diagram: the OAIS archive, with Submission, Archival, and Dissemination Information Packages, sits between Producer and Consumer (queries, result sets, orders) and is labeled "Primary Concern of Repository Developers".]
Identifying Gaps Interoperability between tools, standards and practices in producer communities and repository standards, tools and practices Two different workflows Data production Digital preservation
Identifying Gaps Social side of infrastructure Reaching into more producer communities Reaching more deeply into the data production process Provision for preservation becomes part of normal workflow Awareness and skill needed for preservation is learned as a part of collecting data, doing research, etc.
Bridging the Gaps How can we build infrastructure that unites the production of scientific data with long-term preservation? Technical Issues Tools that interoperate between production and preservation environments Workflows that begin in the production environment
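One way to picture a workflow that begins in the production environment is a producer-side helper that describes a dataset at the moment it is created, so a repository does not have to reconstruct that information later. The sketch below is an illustration only; the function name `describe_at_creation` and the record fields are hypothetical and do not come from the talk or from any metadata standard.

```python
import hashlib
import json
from datetime import datetime, timezone


def describe_at_creation(data: bytes, creator: str, file_format: str) -> str:
    """Return a JSON metadata record for data at the point of creation.

    All field names are illustrative assumptions, not a standard schema.
    """
    record = {
        "creator": creator,
        "format": file_format,
        "created": datetime.now(timezone.utc).isoformat(),
        "size_bytes": len(data),
        # Checksum recorded by the producer so a repository can verify it later.
        "sha256": hashlib.sha256(data).hexdigest(),
    }
    return json.dumps(record, sort_keys=True)
```

A lab script could call this helper whenever it writes a result file and store the returned record alongside the data, which is the kind of embedding in normal workflow that the social-side slides ask about.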
Bridging the Gaps Social Issues Can we embed preservation awareness in the scientific production environment? Can we teach/learn good data practices as part of learning good research practice? Can we extend models of good practice from one lab to the next? One discipline to the next?
Conclusion Building digital preservation infrastructure will require: A long view of the information life cycle beginning at the point of creation (or before) Embedding digital preservation requirements into systems and tools for producing information Close attention to the fit between conventions of practice and preservation requirements


Editor's Notes

  • #3: During my talk today I will discuss some of the new challenges in digital preservation. In particular, I will focus on new demands for long-term archiving of scientific data coming from the scientific community. The late Jim Gray, computer scientist and researcher at Microsoft, coined the phrase “the Fourth Paradigm” to describe the new revolution in the scientific method, where data-intensive science allows researchers to mine existing data, find patterns, and discover emergent behaviors. This shift, along with other pressures, has created new demands to preserve scientific data for sharing and reuse. Government funding agencies increasingly require researchers to make their data available to other scientists, and an increasing number of scientific journals insist that authors submit the raw data on which their findings are based as a condition of publication. In this talk, I will argue that many of our approaches to digital preservation, which were designed to preserve the scientific and scholarly journal literature or were aimed at cultural heritage resources, need to be brought together into a deeper infrastructure for digital preservation in order to scale up to the demands of preserving scientific data. I will identify some of the gaps between current practice and infrastructure development and then suggest some ways in which our community might collaborate with the producers and consumers of scientific data to develop such an infrastructure for digital preservation.
  • #4: I am going to assume that the audience is somewhat familiar with the challenges of digital preservation generally, as these have not changed significantly for several decades. The primary difference between preservation of traditional material and digital material is that we cannot separate the information we are preserving from the large technological and interpretive environment on which digital information depends. This requires different strategies from those that have worked in the physical environment and, at least with the current state of practice, adds significantly to the costs. Because technology changes, digital information has to be kept in live systems that require a continuous stream of resources.
  • #5: During the last two decades, the preservation community has made significant advances in digital preservation. The most significant achievement was the development and adoption of the Open Archival Information System (OAIS) Reference Model as an international standard. In addition to the Reference Model itself, there are models and tools for trusted repositories, a variety of metadata standards and standard data formats, and new software tools to manage digital archival repositories. The preservation community deserves a great deal of credit for drawing attention to these challenges, mobilizing resources for research and development, and deploying tools, typically in an open manner.