Challenges for the New Era
Diane I. Hillmann
Metadata Management Associates
Oslo, February 8, 2013
Big Challenges/Big Ideas
     O Changing our thinking from records to
        statements
            O Will RDA help?
     O Where you start affects where you end up
     O Shifting our ways from ROI to Potlatch
     O Recognizing that our human resources
       are limited
     O So how do we manage this data-that-isn’t
       records?

Oslo 2013                                      2/7/13   2
Statements and Records
     O Records are still important but not as
        we’ve used them in the past
            O We might want to think about records as
              the instantiation of a point of view
            O News: traditional library data has a point of
              view
     O MARC required consensus because of
        limitations built into the technology
            O Now we need provenance, so we know
              “Who said?”
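
One way to picture a statement that carries its own provenance is a subject-property-value triple plus a source. The sketch below is illustrative only: the `Statement` type, the `ex:` identifiers, and the source names are hypothetical and do not come from RDA or any registry. It also shows a "record" as a point of view, that is, the statements made by one source.

```python
from collections import defaultdict
from typing import NamedTuple


class Statement(NamedTuple):
    """One assertion about a resource, plus who asserted it."""
    subject: str
    prop: str
    value: str
    source: str  # provenance: answers "Who said?"


# Hypothetical statements about one resource from two providers.
statements = [
    Statement("ex:book1", "ex:title", "Moby Dick", "ex:LibraryA"),
    Statement("ex:book1", "ex:title", "Moby-Dick; or, The Whale", "ex:LibraryB"),
    Statement("ex:book1", "ex:creator", "Melville, Herman", "ex:LibraryA"),
]


def who_said(stmts, subject, prop):
    """Group the values asserted for (subject, prop) by their source."""
    by_source = defaultdict(list)
    for s in stmts:
        if s.subject == subject and s.prop == prop:
            by_source[s.source].append(s.value)
    return dict(by_source)


def record_from(stmts, source):
    """A 'record' as a point of view: only the statements one source made."""
    return [s for s in stmts if s.source == source]


titles = who_said(statements, "ex:book1", "ex:title")
library_a_view = record_from(statements, "ex:LibraryA")
```

Because the two titles disagree, neither needs to be discarded: each stays attached to the source that asserted it.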
Building RDVocab: Goals

     O Bridge the XML and RDF worlds
     O Ensure ability to map between RDA and
       other element sets
     O Provide a sound platform for extension of
       RDA Vocabularies into new and
       specialized domains
     O Consider methods for expressing AACR2
       structures in technical ways to ease the
       pain of transition to RDA

RDVocab Structure, Simplified
     O RDA Properties declared in two separate
        hierarchies:
            O An ‘unconstrained’ vocabulary, with no explicit
              relationship to FRBR entities
            O A subset of classes, properties and
              subproperties with FRBR entities as ‘domains’
     O Pros: retained usability in or out of
       libraries; better mapping to/from
       non-FRBR vocabularies
     O Cons: still seems too complex to many
       SemWeb implementers (many using
       BIBO)
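
The two hierarchies can be sketched as a small table of property declarations. This is a minimal illustration, not the registered RDA vocabulary: the property names, the `ex:` and `frbr:` prefixes, and the dictionary layout are all assumptions made for the example.

```python
# Hypothetical declarations; names and prefixes are illustrative only.
properties = {
    # Unconstrained: no domain, so usable without committing to FRBR.
    "ex:title": {"subPropertyOf": None, "domain": None},
    # Constrained: subproperties whose domain is a single FRBR entity.
    "ex:titleOfTheWork": {"subPropertyOf": "ex:title", "domain": "frbr:Work"},
    "ex:titleOfTheManifestation": {
        "subPropertyOf": "ex:title",
        "domain": "frbr:Manifestation",
    },
}


def is_unconstrained(name):
    """A property is unconstrained when it declares no domain."""
    return properties[name]["domain"] is None


def unconstrained_parent(name):
    """Follow subPropertyOf links up to the first property without a domain."""
    while not is_unconstrained(name):
        parent = properties[name]["subPropertyOf"]
        if parent is None:
            raise ValueError(f"{name} has a domain but no more general property")
        name = parent
    return name


def implied_class(name):
    """Using a constrained property implies its subject is of this class."""
    return properties[name]["domain"]
```

A mapping from a non-FRBR vocabulary would target `ex:title`; a library application would use one of the constrained subproperties.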
Why Unconstrained Properties?
     O The ‘bounded’ properties should be seen as
        the official JSC-defined RDA Application
        Profile for libraries
            O What’s still lacking is the addition of the necessary
              constraints: datatypes, cardinality, associated
              value vocabularies
     O Extensions and mapping should be built from
        the unconstrained properties
            O Unconstrained vocabularies are necessary for use in
              domains where FRBR is not assumed or is
              inappropriate
            O Mapping from vocabularies not using the FRBR
              model directly to ones that do (and back) creates
              serious problems for the ‘Web of Data’
The Simple Case: One Property, One FRBR Entity
     [Diagram labels: Property (generalized, no FRBR relationship); Semantic Web; Subproperty (with relationship to one FRBR entity); FRBR Entity; Library Applications]
The Not-So-Simple Case: One Property, More Than One FRBR Entity
     [Diagram labels: Property (generalized, no FRBR relationship); Semantic Web; two Subproperties (each with relationship to one FRBR entity); two FRBR Entities; Library Applications]
Roles: Attributes or Properties?
     O In 2005, the DC Usage Board worked with LC
       to build a formal representation of the MARC
       Relators so that these terms could be used
       with DC
        O This work provided a template for the
          registration of the role terms in RDA (in
          Appendix I) and, by extension, the other
          RDA relationships
     O Role and relationship properties are
       registered at the same level as elements,
       rather than as attributes (as MARC does with
       relators, and RDA does in its XML schemas)
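
The difference between a role as an attribute and a role as a property can be sketched roughly as follows. The field names, the `ex:` prefix, and the identifiers are hypothetical; this is not the MARC or RDA XML representation, only a picture of the two shapes.

```python
# Role as an attribute: one generic relationship, qualified by a role value.
as_attribute = {
    "subject": "ex:film1",
    "relation": "contributor",
    "agent": "ex:person9",
    "role": "director",
}

# Role as a property: the role itself is the property of the statement.
as_property = ("ex:film1", "ex:director", "ex:person9")


def attribute_to_property(entry, prefix="ex:"):
    """Turn an attribute-style role into a statement whose property names the role."""
    return (entry["subject"], prefix + entry["role"], entry["agent"])


converted = attribute_to_property(as_attribute)
```

Registering the role as a property gives it its own identity, so it can take part in property hierarchies like any other element.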

Vocabulary Extension
     O The inclusion of unconstrained properties
        provides a path for extension of RDA into
        specialized library communities and
        non-library communities
         O They may have a different notion of how
            FRBR ‘aggregates’ (for example, a
            colorized version of a film may be viewed
            as a separate work)
         O They may not wish to use FRBR at all
         O They may have additional, domain-specific
            properties to add, that could benefit from a
            relationship to the RDA properties
[Diagram: RDA:adaptedAs, with RDA:adaptedAsARadioScript beneath it]
Extension using Unconstrained Properties
[Diagram: RDA:adaptedAs, with RDA:adaptedAsARadioScript and KidLit:adaptedAsAPictureBook beneath it]
Extension using Unconstrained Properties
[Diagram: RDA:adaptedAs, with RDA:adaptedAsARadioScript, KidLit:adaptedAsAPictureBook and KidLit:adaptedAsAChapterBook beneath it]
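
The extension shown in the diagrams can be sketched as a table of subproperty links. The property names are the ones on the slides; the assumption that each one is a direct subproperty of RDA:adaptedAs is a reading of the diagram, and the `ex:` resource identifiers are hypothetical.

```python
# Subproperty links as read from the diagram; the exact hierarchy is an assumption.
SUBPROPERTY_OF = {
    "RDA:adaptedAsARadioScript": "RDA:adaptedAs",
    "KidLit:adaptedAsAPictureBook": "RDA:adaptedAs",
    "KidLit:adaptedAsAChapterBook": "RDA:adaptedAs",
}


def generalize(prop, known_namespaces):
    """Walk up the hierarchy until the property is in a namespace the consumer knows."""
    while prop.split(":")[0] not in known_namespaces:
        if prop not in SUBPROPERTY_OF:
            return None  # nothing more general is available
        prop = SUBPROPERTY_OF[prop]
    return prop


# Hypothetical statement from a children's literature community.
statement = ("ex:novel1", "KidLit:adaptedAsAPictureBook", "ex:pictureBook7")

# A consumer that only understands RDA properties can still use the statement.
subject, prop, obj = statement
dumbed_down = (subject, generalize(prop, {"RDA"}), obj)
```

The specialized statement loses detail when generalized, but it is never unusable; going the other way (from RDA:adaptedAs to a picture book adaptation) would need new information.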
Where you start affects where you end up
     O Simple metadata is more useful as output
        than input
            O The ‘long tail’ of MARC’s lesser-used
              properties was built up over decades and
              shouldn’t be discarded
     O Easier to dumb down than smarten up
     O Dublin Core and MARC are examples of
       starting simple and trying to add on
     O Distribution models are important
Values vs. Costs
     O Machines cost less than people, but they
        can’t replace people
            O Computers tend to require instructions
              from people to work well
            O But they are more consistent than people!
     O ROI culture vs. Potlatch culture
            O Is ‘who pays for this?’ the right question?




The Management Conundrum
     O Traditional ILSs haven’t worked for us for
        a long time
            O They were built to create and manage
              catalog data
            O We can no longer invest in the catalog
              paradigm
     O Libraries are data builders, data
        managers, data distributors
            O The centralized, master record model is as
              dead as MARC encoding
Challenges for a new era
Linked Data is Inherently Chaotic
     O Requires creating and aggregating data in
        a broader context
            O There is no one ‘correct’ record to be made
              from this, no objective ‘truth’
     O This approach is different from the
        cataloging tradition
            O BUT, the focus on vocabularies is familiar
     O In the SemWeb world vocabularies are
        more complex than the thesauri we know
Model of ‘the World’ /XML
     O XML assumes a ‘closed’ world (domain),
        usually defined by a schema:
         O “We know all of the data describing this
           resource. The single description must
           be a valid document according to our
           schema. The data must be valid.”
         O XML’s document model provides
            a neat equivalence to a
            metadata ‘record’ (and most of
            us are fairly comfortable with it)
Model of ‘the World’ /RDF
     O RDF assumes an ‘open’ world:
         O “There’s an infinite amount of unknown
           data describing this resource yet to be
           discovered. It will come from an infinite
           number of providers. There will be an
           infinite number of descriptions. Those
           descriptions must be consistent.”
         O RDF’s statement-oriented data
           model has no notion of ‘record’
           (rather, statements can be
           aggregated for a fuller description of
           a resource)
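
The contrast between the two models can be sketched in a few lines. Everything here is a simplified stand-in: the required-field "schema" is a hypothetical substitute for real XML validation, and plain tuples stand in for RDF statements.

```python
REQUIRED_FIELDS = {"title", "creator"}  # hypothetical schema for the closed-world case


def validate_record(record):
    """Closed world: a single description either satisfies the schema or is rejected."""
    return REQUIRED_FIELDS.issubset(record)


def aggregate(*statement_sets):
    """Open world: merge statements from any number of providers; gaps are not errors."""
    merged = set()
    for statements in statement_sets:
        merged |= set(statements)
    return merged


def describe(statements, subject):
    """Collect whatever is currently known about one resource."""
    description = {}
    for s, p, o in statements:
        if s == subject:
            description.setdefault(p, set()).add(o)
    return description


provider_a = {("ex:film1", "title", "Casablanca")}
provider_b = {("ex:film1", "date", "1942"), ("ex:film1", "title", "Casablanca")}

incomplete_record = {"title": "Casablanca"}
record_ok = validate_record(incomplete_record)  # no creator, so the record is invalid
known_so_far = describe(aggregate(provider_a, provider_b), "ex:film1")
```

The record fails because something is missing; the aggregated description simply reports what has been said so far and can grow as more statements arrive.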
The New Management Strategy
     O Statement level rather than record level
         management
     O   Emphasis on evaluation coming in and
         provenance going out
     O   Shift in human effort from creating standard
         cataloging to knowledgeable human
         intervention in machine-based processes
     O   Extensive use of data created outside libraries
     O   Intelligent re-use of our legacy data

Is MARC Dead?
O The communication format is very dead (based on
  standards no longer updated)
O The semantics are not dead
   O They represent the distillation of decades of
     descriptive experience
   O As we move into a more machine-assisted world,
     our old concerns about the size of our legacy can
     be addressed
   O Taking the legacy records with us should be based
     on solutions developed using open and transparent
     strategies
What’s our Distribution Model?
     O “We don’t know what you want, so choose!”
     O “We know more about what you want than you do. Here it is!”
Libraries as Data Publishers & Consumers

O Data from library ‘publishers’ should look like a
  supermarket: lots of choices, with decisions made
  by consumers
   O Right now we seem to be operating as Soviet
     bakeries
   O This is not what open linked data is supposed to be
     doing for us
O “Be conservative in what you send, liberal in what
  you accept” (Robustness Principle)
Our Goals as Data Publishers
O If we want people outside libraries to use our
  data, we need to offer them choices
O This strategy is based on mapping all of our
  legacy data
   O Not a selection
   O Filtering accomplished by data consumers, who
     know best what they need
O This requires active innovation and a new
  understanding of how to manage the data
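
Publishing everything and leaving the filtering to consumers might look roughly like this. The statements and property names are hypothetical; the point is only that the selection happens on the consumer side, not before publication.

```python
# Hypothetical published statements: the publisher maps and exposes everything.
published = [
    ("ex:book1", "title", "Kristin Lavransdatter"),
    ("ex:book1", "creator", "Undset, Sigrid"),
    ("ex:book1", "extent", "3 v."),
    ("ex:book1", "bindingNote", "Rebound 1954"),
]


def select(statements, wanted_properties):
    """Consumer-side filter: keep only the statements whose property is wanted."""
    return [s for s in statements if s[1] in wanted_properties]


# Two consumers with different needs draw on the same published data.
discovery_view = select(published, {"title", "creator"})
conservation_view = select(published, {"title", "bindingNote"})
```

A property that looks like noise to one consumer (the binding note) is exactly what another one came for.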
Our Goals as Data Consumers
     O As aggregators of relevant metadata content
            O Developing methods to gather and redistribute
              without necessarily re-creating OCLC
     O Modeling and documenting best practices in
        metadata creation, improvement and
        exposure
            O Application profiles important in this effort
     O As developers of vocabularies exposing a
       variety of bibliographic relationships
     O As innovators in using social networks to
       enhance bibliographic description
Mapping Legacy Data for Re-distribution

O If we want data consumers to value our data, we
 should map it all
   O We can distribute limited ‘flavors’ as well, as we
     gain experience and feedback
O Current mapping strategies are based on
   O One-time, inflexible, programmatic methods that
     effectively hide the process from consumers
   O Assumptions that data must be improved at the
     time it is mapped, or never
If we don’t distribute our best data, how can anybody do cool stuff with it?
     Isn’t that what we want?
     We can use the cool stuff ourselves!
Harvest/Ingest Plan
     O Choosing data sources
            O There are known sources out there: some
              are of good quality, others are
              usable with improvement
     O Tools are needed to help pull data,
        validate it, cache it, and set it up for
        evaluation
            O Most of these tasks can/should be set up
              with automated processes, with alerts to
              human minders when something goes
              wrong
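
The pull, validate, cache and alert steps could be wired together along these lines. This is a sketch under stated assumptions: the `harvest` function, the source names, and the stand-in fetch and validation functions are all hypothetical, and no particular harvesting protocol or format is implied.

```python
import logging

logger = logging.getLogger("harvest")


def harvest(sources, fetch, validate, cache, alert=logger.warning):
    """Pull each source, validate it, cache what passes, and alert a person on failure.

    `fetch` and `validate` are supplied by the caller; nothing here assumes a
    particular protocol or format.
    """
    for name in sources:
        try:
            data = fetch(name)
            problems = validate(data)
        except Exception as exc:  # a failed pull should not stop the whole run
            alert(f"{name}: fetch or validation failed: {exc}")
            continue
        if problems:
            alert(f"{name}: {len(problems)} validation problem(s)")
        else:
            cache[name] = data  # ready for evaluation
    return cache


# Stand-in functions so the sketch runs without a network.
def fake_fetch(name):
    if name == "source-c":
        raise ConnectionError("timed out")
    return {"source-a": ["<record/>"], "source-b": []}[name]


def non_empty(data):
    """Return a list of problems; an empty list means the data passed."""
    return [] if data else ["no records returned"]


alerts = []
cached = harvest(["source-a", "source-b", "source-c"], fake_fetch, non_empty, {}, alerts.append)
```

Only the sources that pass land in the cache; the human minder sees one alert for the empty source and one for the failed pull.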
Metadata Evaluation
     O Evaluation needs to scale well beyond
       random sampling
     O Statistical and data mining tools need to
       be brought into the process, to provide
       both ‘overview’ and specifics of whole
       data sets
     O Improvement specifications, techniques,
       quality criteria and tools need to be
       iterative, granular, and shareable
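
A very small example of looking at a whole data set rather than a sample is a per-property profile. The `profile` function, the two measures it reports, and the sample records are illustrative assumptions, not an established quality metric.

```python
from collections import Counter


def profile(records):
    """Summarize a whole data set: how often each property is filled, and with what."""
    filled = Counter()
    distinct = {}
    for record in records:
        for prop, value in record.items():
            if value:  # ignore empty strings and missing values
                filled[prop] += 1
                distinct.setdefault(prop, set()).add(value)
    total = len(records)
    return {
        prop: {
            "completeness": filled[prop] / total,
            "distinct_values": len(distinct[prop]),
        }
        for prop in filled
    }


# Hypothetical harvested records.
records = [
    {"title": "Hamlet", "language": "eng", "date": "1603"},
    {"title": "Macbeth", "language": "eng", "date": ""},
    {"title": "Peer Gynt", "language": "nor"},
    {"title": "Hedda Gabler", "language": "nor", "date": "1891"},
]

overview = profile(records)
```

The overview shows at a glance that dates are filled in only half the records, which is the kind of finding an improvement specification can then target.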
Testing, Monitoring & Re-evaluation
     O Data will change, and processes must be able
        to detect that, based on data profiles
            O Human intervention should be limited
     O Tools need to be built so that
        non-programmers can run them
            O Reading logs, monitoring error reports,
              checking results, and writing specs can/should be
              done by data specialists (a.k.a. catalogers
              with training)
            O Looking for opportunities for programmers and
              catalogers to learn together is essential!
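
Detecting change from data profiles can be as simple as comparing two harvests of the same source. In this sketch a profile is assumed to map each property name to the fraction of records that fill it; the `detect_drift` function, the tolerance, and the sample figures are hypothetical.

```python
def detect_drift(baseline, current, tolerance=0.10):
    """Compare two completeness profiles and report properties that moved too far.

    Each profile maps a property name to the fraction of records that fill it.
    """
    findings = []
    for prop in sorted(set(baseline) | set(current)):
        if prop not in baseline:
            findings.append(f"{prop}: new property")
        elif prop not in current:
            findings.append(f"{prop}: property disappeared")
        else:
            before, after = baseline[prop], current[prop]
            if abs(after - before) > tolerance:
                findings.append(f"{prop}: completeness {before:.0%} -> {after:.0%}")
    return findings


# Hypothetical profiles from two harvests of the same source.
last_month = {"title": 1.00, "date": 0.80, "language": 0.95}
this_month = {"title": 1.00, "date": 0.55, "subject": 0.40}

report = detect_drift(last_month, this_month)
```

The report is plain text that a data specialist can read and act on without touching the code, and an empty report means no human attention is needed.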
Re-distribution Plan
     O If we improve data, we need to expose
        how we did it (and what we did), for the
        use of downstream consumers
            O New metadata provenance efforts are
              designed to do this at the statement level
     O This strategy can only exist successfully
       where open licenses allow innovation and
       wide re-use
     O Ideally, distribution AND redistribution
       should be accomplished with Application
       Profiles
Will This Shift Cost Too Much?
     O It’s the human effort that costs us
            O Cost of traditional cataloging is far too high, for
              increasingly dubious value
     O Our current investments have reached the
        end of their usefulness
            O All the possible efficiencies for traditional
              cataloging have already been accomplished
     O Waiting for leadership from the big players
       costs us valuable time with no guarantees of
       results
     O We need to figure out how to invest in more
       distributed innovation and focused
       collaboration
What About the Millions?
     O Our legacy MARC data is already a
       ‘graph’, but the resources defined there
       have no internet-resolvable identity
     O But even the transcribed text can be
       hugely valuable, with effort and software
       to help
            O Projects like the eXtensible Catalog have
              made an excellent start in demonstrating
              this point
     O MARC 21 is already available as basic
       RDF
The Bottom Line
            O Our big investment is (and has always
              been) in our data, not our systems
            O Over many changes in format of
              materials, we’ve always struggled to keep
              our focus on the data content that
              endures, regardless of presentation
              format
            O We are in a great position to have
              influence on how the future develops, but
              we can’t be afraid to change, or afraid to
              fail
Contact info:
            metadata.maven@gmail.com

            Metadata Matters:
            http://managemetadata.com/blog


Editor's Notes

  • #5: Some conflicts here
  • #10: This way of handling roles should be usable in either XML or RDF (essential for RDF)
  • #15: Smörgåsbord; in Norway it is called koldtbord
  • #28: This practice makes it difficult for community members to effectively respond to decisions made behind the curtain or to contribute to better maps
  • #42: (a la the open source OPAC replacements)