Repositories thru the looking glass. Andy Powell, Eduserv Foundation, www.eduserv.org.uk
There are many methods for predicting the future. For example, you can read horoscopes, tea leaves, tarot cards, or crystal balls. Collectively, these methods are known as “nutty methods.” Or you can put well-researched facts into sophisticated computer models, more commonly referred to as “a complete waste of time”. Dilbert
Either that wallpaper goes or I do. Oscar Wilde’s last words
 
some background…
 
The DCMI Abstract Model a set of rules defining how DC metadata descriptions are constructed A description is made up of one or more statements … Each statement instantiates a property/value pair and is made up of …  … Each value string is a simple, human-readable string … … a set of human-readable statements (as per above) also formalised using UML
The DCMI Abstract Model: independent of particular syntaxes, but descriptions that comply with the model can be encoded using any of the recognised DCMI encodings, i.e. XHTML, XML and RDF; simple; largely based on the resource, property, value triple; formally mapped to the RDF model; highly extensible
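The DCMI Abstract Model: a record (encoded as HTML, XML or RDF/XML) holds a description set; a description set holds descriptions (each about a resource, identified by a URI); a description holds statements; a statement has a property (URI), a value (URI), a vocabulary encoding scheme (URI), and a value string, optionally with a language (e.g. en-GB) or a syntax encoding scheme (URI)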
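The nesting on this slide can be sketched as a small data structure. This is only an illustration of the record / description set / description / statement hierarchy, not a DCMI encoding: the class and field names are assumptions made for the example, and the example.org URIs are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional

DC_TERMS = "http://purl.org/dc/terms/"  # DCMI terms namespace


@dataclass
class Statement:
    """One property/value pair within a description."""
    property_uri: str                                  # property (URI)
    value_uri: Optional[str] = None                    # value (URI)
    vocabulary_encoding_scheme: Optional[str] = None   # vocabulary encoding scheme (URI)
    value_string: Optional[str] = None                 # simple, human-readable string
    language: Optional[str] = None                     # e.g. "en-GB"
    syntax_encoding_scheme: Optional[str] = None       # syntax encoding scheme (URI)


@dataclass
class Description:
    """A set of statements about exactly one resource."""
    resource_uri: str
    statements: List[Statement] = field(default_factory=list)


@dataclass
class DescriptionSet:
    """One or more descriptions, serialised together as a record."""
    descriptions: List[Description] = field(default_factory=list)


# Hypothetical resource and agent URIs, for illustration only.
eprint = Description(
    resource_uri="http://example.org/eprint/1",
    statements=[
        Statement(property_uri=DC_TERMS + "title",
                  value_string="Repositories thru the looking glass",
                  language="en-GB"),
        Statement(property_uri=DC_TERMS + "creator",
                  value_uri="http://example.org/agent/andy-powell",
                  value_string="Andy Powell"),
    ],
)
record = DescriptionSet(descriptions=[eprint])
```

Keeping the description tied to a single resource URI is what makes the link between descriptions and described resources explicit.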
The DCMI Abstract Model: relationships between the descriptions in a description set and the resources being described are made explicit (oddly, most metadata standards do not do this). DC application profiles now start by defining which set of resources are being described… then assigning the set of properties and so on that will be used to describe them
E.g. an application profile for CDs: start with the set of entities that we want to describe and the key relationships between those entities, e.g. a CD collection entity/relationship model… then define a set of properties for each. Entities: collection, CD, artist, owner, record label. Relationships: owned by, contained in, created by, released by
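A minimal sketch of that entity/relationship model follows. The slide only lists the entities and relationship labels, so which entity each relationship connects is an assumption here (collection owned by owner, CD contained in collection, CD created by artist, CD released by record label), and all instance names are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Owner:
    name: str


@dataclass(frozen=True)
class Artist:
    name: str


@dataclass(frozen=True)
class RecordLabel:
    name: str


@dataclass(frozen=True)
class Collection:
    title: str
    owned_by: Owner            # assumed: a collection is owned by an owner


@dataclass(frozen=True)
class CD:
    title: str
    contained_in: Collection   # assumed: a CD is contained in a collection
    created_by: Artist         # assumed: a CD is created by an artist
    released_by: RecordLabel   # assumed: a CD is released by a record label


# Illustrative instances only.
owner = Owner("Example Owner")
shelf = Collection("Living-room shelf", owned_by=owner)
cd = CD("Example Album",
        contained_in=shelf,
        created_by=Artist("Example Artist"),
        released_by=RecordLabel("Example Records"))

# Relationships can be followed from one entity to the next.
print(cd.contained_in.owned_by.name)
```

Only once the entities and relationships are fixed like this does it make sense to choose properties (title, name and so on) for each one.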
JISC Information Environment
 
are we heading in the right direction?
open access: not ‘if’ but ‘when’
3 issues…
issue #1 have we got our terminology right?
a university-based institutional repository is a set of services that a university offers to the members of its community for the management and dissemination of digital materials created by the institution and its community members. It is most essentially an organizational commitment to the stewardship of these digital materials, including long-term preservation where appropriate, as well as organization and access or distribution. … An institutional repository is not simply a fixed set of software and hardware (Cliff Lynch, 2003)
a focus on ‘making content available on the Web’ would be more intuitive to researchers
a focus on ‘content management’ would change our emphasis: OAI-PMH out… search engine optimisation, usability, accessibility, tagging, information architecture, cool URIs in…
issue #2 service oriented vs. resource oriented
REST = Representational State Transfer: an architectural style with a focus on resources, their identifiers (e.g. URIs), and a simple uniform set of operations that each resource supports (e.g. GET, PUT, POST, DELETE)
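To make the "uniform set of operations" concrete, here is a toy in-memory sketch in which every resource is addressed by a URI and answers the same four operations. The `ResourceStore` class, the `/eprints` path and the choice of status codes are assumptions for illustration; a real service would sit behind an HTTP server.

```python
from typing import Dict, Optional, Tuple


class ResourceStore:
    """Toy store: resources are identified by URI and share four operations."""

    def __init__(self) -> None:
        self._resources: Dict[str, str] = {}
        self._next_id = 1

    def get(self, uri: str) -> Tuple[int, Optional[str]]:
        """Retrieve the current representation of a resource."""
        if uri in self._resources:
            return 200, self._resources[uri]
        return 404, None

    def put(self, uri: str, representation: str) -> Tuple[int, Optional[str]]:
        """Create or replace the resource at a URI chosen by the client."""
        created = uri not in self._resources
        self._resources[uri] = representation
        return (201 if created else 200), representation

    def post(self, collection_uri: str, representation: str) -> Tuple[int, Optional[str]]:
        """Add a new member to a collection; the store mints the member URI."""
        uri = f"{collection_uri.rstrip('/')}/{self._next_id}"
        self._next_id += 1
        self._resources[uri] = representation
        return 201, uri

    def delete(self, uri: str) -> Tuple[int, Optional[str]]:
        """Remove the resource at a URI."""
        if self._resources.pop(uri, None) is None:
            return 404, None
        return 204, None


store = ResourceStore()
status, eprint_uri = store.post("/eprints", "draft of the paper")
store.put(eprint_uri, "final version of the paper")
```

The point of the style is that a client needs no service-specific method names: knowing the URI and the four operations is enough.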
issue #3 national vs. global
The impact of Web 2.0: prosumer, remote apps, social, API, diffusion, concentration
 
thinking about the future…
what would a Web 2.0 repository look like? potential impact of the Semantic Web on repositories
 
high-quality browser-based document viewer (not Acrobat!); tagging, commentary, more-like-this, favorites, …; persistent (cool) URIs to content; ability to form simple social groups; ability to embed documents in other Web sites; high visibility to Google; offer RSS as primary API; use of Amazon S3 to cope with scalability
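As a rough illustration of "RSS as primary API", the sketch below builds a minimal RSS 2.0 feed of recently deposited documents using only the Python standard library. The `build_rss` helper, the feed title and the example.org URLs are all hypothetical; the talk does not prescribe any particular feed layout.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Tuple


def build_rss(title: str, link: str, description: str,
              items: Iterable[Tuple[str, str]]) -> str:
    """Return an RSS 2.0 document; each item is a (title, link) pair."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    for item_title, item_link in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = item_title
        # The item link is the persistent (cool) URI of the document.
        ET.SubElement(item, "link").text = item_link
    return ET.tostring(rss, encoding="unicode")


feed = build_rss(
    "Recent deposits",
    "http://repository.example.org/",
    "Latest documents added to the repository",
    [
        ("First example paper", "http://repository.example.org/doc/1"),
        ("Second example paper", "http://repository.example.org/doc/2"),
    ],
)
print(feed)
```

Because feed readers and other Web sites already understand RSS, a feed like this doubles as a simple machine interface without a repository-specific protocol.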
a Web 2.0 repository would be a global service; global concentration is an enabler of social interaction
But… they don’t do preservation; they don’t handle complex workflows; they don’t expose rich metadata. Yes, scholarly communication has some particular functional requirements which are not met by Google… author searching, citation counting, object complexity: not handled well by the current Web. How are these requirements best met? thru richer metadata?
what would a Web 2.0 repository look like? potential impact of the Semantic Web on repositories
SWAP: the Scholarly Works Application Profile
A model based on FRBR (Functional Requirements for Bibliographic Records): a model for the entities that bibliographic records are intended to describe. FRBR models the world using 4 key entities: Work, Expression, Manifestation and Item
FRBR and scholarly works: FRBR is a useful model in the context of scholarly works (eprints) because it allows us to answer questions like: what is the URL of the most appropriate copy (an item) of the PDF format (a manifestation) of the pre-print version (an expression) for this eprint (the work)? are these two copies related? if so, how?
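The first question on this slide can be sketched as a walk down the four FRBR levels. The class names, the `copy_urls` helper and the example URLs are assumptions for illustration; deciding which of the returned copies is "most appropriate" (for instance by access rights) is deliberately left out.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Copy:                       # FRBR item
    url: str


@dataclass
class Manifestation:              # FRBR manifestation
    format: str                   # e.g. "pdf", "html"
    copies: List[Copy] = field(default_factory=list)


@dataclass
class Expression:                 # FRBR expression
    version: str                  # e.g. "pre-print"
    manifestations: List[Manifestation] = field(default_factory=list)


@dataclass
class Work:                       # FRBR work (the eprint)
    title: str
    expressions: List[Expression] = field(default_factory=list)


def copy_urls(work: Work, version: str, fmt: str) -> List[str]:
    """URLs of every copy of the given format of the given version of a work."""
    return [
        copy.url
        for expression in work.expressions if expression.version == version
        for manifestation in expression.manifestations if manifestation.format == fmt
        for copy in manifestation.copies
    ]


# Hypothetical eprint with two versions, for illustration only.
eprint = Work("An example eprint", [
    Expression("pre-print", [
        Manifestation("pdf", [Copy("http://repository.example.org/1/preprint.pdf")]),
        Manifestation("html", [Copy("http://repository.example.org/1/preprint.html")]),
    ]),
    Expression("version of record", [
        Manifestation("pdf", [Copy("http://publisher.example.com/1.pdf")]),
    ]),
])

print(copy_urls(eprint, "pre-print", "pdf"))
```

The second question (are two copies related, and how?) follows from the same structure: two copies are related if walking upward from each reaches the same manifestation, expression or work.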
FRBR for scholarly works. The eprint as a scholarly work: scholarly work (work), version (expression), format (manifestation), copy (item). Versions: Author’s Original 1.0, Author’s Original 1.1, Version of Record (French), Version of Record (English), … Formats: html, pdf. Copies: publisher’s copy, institutional repository copy
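SWAP application profile model. Entities: ScholarlyWork, Expression, Manifestation, Copy, Agent, AffiliatedInstitution. Relationships (cardinality 0..∞ where marked): ScholarlyWork isExpressedAs Expression; Expression isManifestedAs Manifestation; Manifestation isAvailableAs Copy; isCreatedBy, isPublishedBy, isEditedBy, isFundedBy and isSupervisedBy link to Agent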
SWAP and FRBR: ScholarlyWork corresponds to FRBR Work, Expression to FRBR Expression, Manifestation to FRBR Manifestation, and Copy to FRBR Item (overlaid on the same SWAP model diagram as the previous slide)
SWAP and FRBR: ScholarlyWork is the eprint (an abstract concept); Expression is the ‘version of record’ or the ‘French version’ or ‘version 2.1’; Manifestation is the PDF format of the version of record; Copy is the publisher’s copy of the PDF; … Agent is the author or the publisher (overlaid on the same SWAP model diagram)
Attributes: the application model defines the entities and relationships; each entity needs to be described using an agreed set of attributes
Example attributes. ScholarlyWork: title, subject, abstract, affiliated institution, identifier. Agent: name, type of agent, date of birth, mailbox, homepage, identifier. Expression: title, date available, status, version number, language, genre / type, copyright holder, bibliographic citation, identifier. Manifestation: format, date modified. Copy: date available, access rights, licence, identifier
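One way to picture "an agreed set of attributes" is a lookup from entity type to permitted attribute names, with a check that flags anything outside it. The table below simply copies the example attributes from this slide, so it is not the full profile; the `unexpected_attributes` helper and the sample values are hypothetical.

```python
from typing import Dict, Set

# Example attributes per entity, as listed on the slide (not the full profile).
AGREED_ATTRIBUTES: Dict[str, Set[str]] = {
    "ScholarlyWork": {"title", "subject", "abstract",
                      "affiliated institution", "identifier"},
    "Agent": {"name", "type of agent", "date of birth",
              "mailbox", "homepage", "identifier"},
    "Expression": {"title", "date available", "status", "version number",
                   "language", "genre / type", "copyright holder",
                   "bibliographic citation", "identifier"},
    "Manifestation": {"format", "date modified"},
    "Copy": {"date available", "access rights", "licence", "identifier"},
}


def unexpected_attributes(entity_type: str, description: Dict[str, str]) -> Set[str]:
    """Return the attribute names in a description that are not agreed for its entity type."""
    return set(description) - AGREED_ATTRIBUTES[entity_type]


# 'format' belongs to Manifestation, so it is flagged on a Copy.
copy_description = {
    "access rights": "open",
    "licence": "http://example.org/licence",  # hypothetical URI
    "format": "pdf",
}
print(unexpected_attributes("Copy", copy_description))
```

A check like this can run behind a deposit form, which keeps the richer model from leaking into the user interface.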
time to reflect?
Repositories: what can we learn from Web 2.0? user interface design matters; global ‘concentration’ is an enabler of social interaction; simple DC is both too simple and too complex; richer DC application profiles such as SWAP may be a way forward, but we need to ensure that their use does not over-complicate user interfaces and workflows
Open Access: in policy terms, talking about the aim, “making content available on the Web”, is more compelling and intuitive than talking about the objective, “putting content in repositories”
more generally… resource orientation, REST, Semantic Web, Web architecture … are important; digital libraries ignore them at their peril
Thank you. Images by eNil, Poppyseed Bandits, m o d e, striatic, estherase, Gen Kanai, //bwr - Hieronymus Karl Frederick, dullhunk, Today is a good day (all @Flickr), and yours truly
