From Open Access to Open
Standards, (Linked) Data
and Collaborations
Simeon Warner https://orcid.org/0000-0002-7970-7855
(Director of IT for Library Linked Data and Repository
Architecture, Cornell University Library, USA)
National Library of Finland Kirjastoverkkopäivät (Library
Network Days), Helsinki, Finland, 2017-10-25
How?
•  This was xxx.lanl.gov, now known as
arXiv.org
•  I worked in a narrow field
•  Everyone posted to one place
•  It was a newish field
•  (I was perhaps happy to not read
widely enough)
arXiv submissions
https://arxiv.org/help/stats/2016_by_area/index
New submission rate,
color = subject
Fraction of total rate
for each subject area
What have we learned?
•  Researchers are happy to use e-prints
•  E-print repositories can scale
•  Cost is low ($10-15/article)
•  Some moderation necessary
•  Not very disruptive to journal
publishing (in physics)
Demonstrates substrate for article distribution
supporting overlay, but there has not been
significant adoption of the overlay model
All primary (scientific)
research outputs
should be openly
accessible
Why?
Because research will be
done more effectively if
all shoulders are
available to stand on
SCOAP3 contract values
Preprint tipping point?
•  arXiv “next generation” funding from Sloan
and Heising-Simons foundations
•  bioRxiv funding from the Chan Zuckerberg Initiative
•  ASAPbio initiative funded by
Sloan, Moore, Arnold and Simons
foundations
•  ...
New abcXiv and acquisitions
Overlap & competition
Open standards
for repository
data harvesting
Long long ago,
when XML was hard,
Unicode was merely one
possible character set,
a big hard drive was 10GB,
and HotBot & AltaVista
had a new competitor...
... it was 1999 and the UPS meeting in
Santa Fe aimed to
“... identify technologies to stimulate
the adoption of the concept of [Open
Access] author self-archived systems in
scholarly communication; theorize a
framework for the integration of e-print
services in the academic
document system ...”
https://www.openarchives.org/meetings/SantaFe1999/ups-invitation-ori.htm
Thus was born OAI-PMH
v1.0 2001,
v1.1 2002,
v2.0 2003
OAI-PMH was great!
•  It works
•  Scales to millions of items
•  Easy to implement (good s/w libraries)
•  XML, which brought UTF-8 for good
multi-language support (hurrah!)
•  Widely deployed, stable since 2003 (v2.0)
•  Registries & validators
•  Community & documentation
BASE harvests
>5000 sources
>112M documents
Technical deficiencies
•  Not RESTful
•  Repository-centric
•  XML metadata only
•  Metadata is wrapped
•  Dynamic set membership bug
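A minimal sketch of how the first few of these points look to a harvester: the operation travels as a `verb` query parameter rather than as a resource URI, and the descriptive metadata sits inside an XML envelope. The base URL and the sample response are made-up placeholders, not output from a real repository; only the verb, argument, and namespace names are taken from OAI-PMH v2.0.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"


def list_records_url(base_url, metadata_prefix="oai_dc", token=None):
    """Build a ListRecords request; the operation is a query parameter."""
    if token:
        # A resumptionToken is exclusive: it replaces every argument except verb.
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


# Hand-written sample response (hypothetical record and token).
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A sample title</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""


def parse_page(xml_text):
    """Unwrap the titles from the envelope and return the next token, if any."""
    root = ET.fromstring(xml_text)
    titles = [t.text for t in root.iter(DC + "title")]
    token_el = root.find(f"{OAI}ListRecords/{OAI}resumptionToken")
    token = token_el.text if token_el is not None and token_el.text else None
    return titles, token
```

A harvester would call `list_records_url` once without a token, then keep requesting with each returned token until `parse_page` yields `None`.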
"Currently, OAI-PMH is the only
behavior that is uniformly exposed by
most repositories.
[But], its focus on metadata, its pull-based
paradigm, and its technological
roots that date back to the web of the
nineties put it at odds with ... current
web technologies."
COAR Next Generation Repositories
http://comment.coar-repositories.org/2-next-generation-repositories/
Photo by drivethrucafe CC BY-SA
https://www.flickr.com/photos/128758398@N07/15836296662
Google Scholar
is great, but
not the answer
Replacement with no gap
We need a new approach that:
•  Meets existing OAI-PMH use cases
•  Supports content as well as metadata
•  Scales better
•  Follows web standards
•  Is modern and developer friendly
Push-me pull-you
many items / sources
low latency / efficiency
=> push/notification
modest size
low barrier
=> pull
ResourceSync
ANSI/NISO Z39.99-2017
Sitemaps +
•  multiple sets
•  fixity
•  links
•  changes only
•  dumps
Also supports Notifications (push) as
optional extension
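A rough sketch of the "Sitemaps +" idea: a ResourceSync Resource List is an ordinary sitemap with extra `rs:md` elements carrying the capability and per-resource fixity. The sample document below is hand-written with placeholder URIs and hashes; the two namespaces and the `capability`, `hash`, and `length` attribute names are the ones the specification uses.

```python
import xml.etree.ElementTree as ET

NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "rs": "http://www.openarchives.org/rs/terms/",
}

# Hand-written sample Resource List (hypothetical resources).
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="resourcelist" at="2017-10-25T09:00:00Z"/>
  <url>
    <loc>http://example.com/res1</loc>
    <lastmod>2017-10-24T18:00:00Z</lastmod>
    <rs:md hash="md5:1584abdf8ebdc9802ac0c6a7402c03b6" length="8876"/>
  </url>
  <url>
    <loc>http://example.com/res2</loc>
    <lastmod>2017-10-24T19:00:00Z</lastmod>
    <rs:md hash="md5:1e0d5cb8ef6ba40c99b14c0237be735e" length="14599"/>
  </url>
</urlset>"""


def read_resource_list(xml_text):
    """Return the document capability and one dict per listed resource."""
    root = ET.fromstring(xml_text)
    capability = root.find("rs:md", NS).get("capability")
    resources = []
    for url in root.findall("sm:url", NS):
        md = url.find("rs:md", NS)  # fixity metadata is optional per resource
        length = md.get("length") if md is not None else None
        resources.append({
            "uri": url.findtext("sm:loc", namespaces=NS),
            "lastmod": url.findtext("sm:lastmod", namespaces=NS),
            "hash": md.get("hash") if md is not None else None,
            "length": int(length) if length else None,
        })
    return capability, resources
```

Because the base format is a plain sitemap, a client that ignores the `rs:` namespace still sees a valid list of URLs.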
CORE
>6000 journals
>2400 repositories
>77M articles
(>6M full text)
metadata +
content
Slide from Petr Knoth / CORE – DPLAfest 2017 presentation -- https://goo.gl/vz3zuJ
Tested with
resync client. 20
x 25MB sitemaps,
1M items ✔
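The figures on this slide can be checked with a little arithmetic. This sketch assumes the items are spread evenly over the sitemaps and that 1 MB means 10^6 bytes; neither is stated on the slide.

```python
SITEMAPS = 20          # number of sitemap files, from the slide
ITEMS = 1_000_000      # total items, from the slide
SITEMAP_MB = 25        # size of each sitemap in MB, from the slide

# Assumption: items are divided evenly between the sitemap files.
items_per_sitemap = ITEMS // SITEMAPS

# Assumption: 1 MB = 1,000,000 bytes.
bytes_per_entry = SITEMAP_MB * 1_000_000 / items_per_sitemap

total_mb = SITEMAPS * SITEMAP_MB

print(items_per_sitemap, bytes_per_entry, total_mb)
```

Under those assumptions each file holds 50,000 entries, which is the per-file URL limit of the sitemap protocol, at roughly 500 bytes per entry and 500 MB in total.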
Repository
Harvesting
Conclusions
The repository
community should
agree on a
common new
approach to
harvesting
ResourceSync was
designed to meet
Repository prescription
•  Metadata and content should be web
resources
o  stable URIs, follow web standards, not hidden
behind query interfaces
•  Support ResourceSync as the primary
harvesting interface
o  see e.g.
http://hydrainabox.projecthydra.org/2017/06/22/resourcesync.html
o  OAI-PMH as secondary where necessary
•  Distinguish and relate metadata and content
entries
Person
identifiers and
ORCID
Some of my person ids
http://orcid.org/0000-0002-7970-7855
http://www.isni.org/isni/0000000351311901
http://www.researcherid.com/rid/E-2423-2011
https://www.scopus.com/authid/detail.uri?authorId=7103063073
https://arxiv.org/a/warner_s_1
http://vivo.cornell.edu/display/individual24416
https://github.com/zimeon
http://zimeon.com/me
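The ORCID and ISNI identifiers in this list share a 16-character form whose last character is an ISO 7064 MOD 11-2 check character. A minimal sketch of that check follows; the function names are illustrative and not part of any ORCID or ISNI library.

```python
def mod11_2_check_char(base_digits):
    """Compute the ISO 7064 MOD 11-2 check character for 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_id(identifier):
    """Check a 16-character ORCID iD or ISNI, with or without separators."""
    chars = identifier.replace("-", "").replace(" ", "").upper()
    if len(chars) != 16:
        return False
    if not all(c in "0123456789" for c in chars[:15]):
        return False
    return mod11_2_check_char(chars[:15]) == chars[15]


print(is_valid_id("0000-0002-7970-7855"))   # ORCID iD from the list above
print(is_valid_id("0000000351311901"))      # ISNI from the list above
```

The check catches single mistyped digits; it says nothing about whether the identifier has actually been assigned.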
Scopes and scales
ORCID: Scope: 8-20M active, +2-4M/year?  Now: 3.2M
ISNI:  Scope: ?M  Now: 9M
VIAF:  Scope: ?M  Now: 6M
Why must ORCID be different?
How many people should have ORCID iDs?
o  UNESCO 2013 estimate: 7.8 million researchers
o  OECD 2014 estimate: 25.5 million researchers
o  Average “active lifetime” 3-6 years (guess)
o  Far more than person records in authority systems
How many research and scholarship outputs
should be connected to these ORCID iDs?
o  ~2 million journal articles published per year
(https://arxiv.org/abs/1402.4578)
o  + >> more if notions of scholarly output extend to
data, code, specimens
=> “Sort it all out after the fact with manual effort”
solution not practical
=> Solve with researcher engagement and use in
publication workflows
ORCID: Open Researcher
and Contributor ID
“ORCID’s vision is a world where all who participate in
research, scholarship, and innovation are uniquely
identified and connected to their contributions across
disciplines, borders, and time.”
“ORCID provides an identifier for individuals to use with
their name as they engage in research, scholarship, and
innovation activities. We provide open tools that enable
transparent and trustworthy connections between
researchers, their contributions, and affiliations. We
provide this service to help people find information and
to simplify reporting and analysis.” (https://orcid.org/)
=> Research and scholarship focus
=> Expect use by individuals identified in workflows
Contributor-Output graph
[Figure: contributor nodes C1, C2, C3 and output nodes O1, O2, O3, O4, O5, joined by "Contributed-to" and "Cites" edges]
Generalize:
o  many contributor roles
o  expand “cites” to include other notions of
derivation
o  ++ add organization nodes for affiliation/funding/
etc. (and time dependence)
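A small sketch of the contributor-output graph as data. The specific edges below are invented for illustration, since the extracted slide text names the nodes but not which ones are connected; the helper names are also hypothetical.

```python
from collections import defaultdict

# Hypothetical edges: (contributor, output) and (citing output, cited output).
CONTRIBUTED_TO = [
    ("C1", "O1"), ("C1", "O2"),
    ("C2", "O2"), ("C2", "O3"),
    ("C3", "O4"), ("C3", "O5"),
]
CITES = [("O2", "O1"), ("O4", "O2"), ("O5", "O3")]


def outputs_by_contributor(edges):
    """Index the contributed-to edges by contributor."""
    index = defaultdict(set)
    for contributor, output in edges:
        index[contributor].add(output)
    return index


def contributors_citing(contributor, contributed_to, cites):
    """Other contributors with an output that cites one of `contributor`'s."""
    mine = outputs_by_contributor(contributed_to)[contributor]
    citing_outputs = {src for src, dst in cites if dst in mine}
    return {
        c for c, o in contributed_to
        if o in citing_outputs and c != contributor
    }


print(contributors_citing("C1", CONTRIBUTED_TO, CITES))
```

The generalizations on the slide would turn each edge into a typed, dated relationship and add organization nodes alongside contributors and outputs.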
For full benefit ORCID
needs most researchers
to willingly use their
ORCID iD.
Links to other identities
– leverage overlaps
Biography and
information shown
under my control
... sources indicated
Researcher control
Researcher can choose
what appears on their
record
ORCID iD use
•  7000 journals use ORCID iDs, over
1500 of which require use by
corresponding authors
•  Researcher support from surveys:
o  In 2017 85.9% of respondents now believe
requiring the use of ORCID iDs is
beneficial to the global research
community, compared with 72.2% of 2015
respondents
o  In 2017 83.1% of respondents strongly
agree/agree that ORCID is “essential”,
compared with 48.8% in 2015.
ORCID community
Over 700
members
from 41
countries
https://orcid.org/statistics
3.9m researcher records,
1.5m records with at
least one connection:
24m works, 339K grants, 151K
reviews, 1.9m education and
1.5m employment items
More than 550 integrations
across all sectors of the
research community
Consortia in the UK, Denmark,
Finland, Sweden, Netherlands,
Belgium, Germany, Italy, South Africa,
Taiwan, Australia, New Zealand,
Canada and the US
ORCID Stakeholders, Actions and
Benefits
ORCID
Manuscript
submission
Review
Publication
with ORCID
ORCID
Author(s)
Readers
Reviewers
Automated record update - work
Journal article round trip
ORCID iDs are intended to be integrated into
research and publication workflows, and become
embedded in metadata. Thus ORCID iDs are
associated with works when published
=> Ambiguity avoidance rather than disambiguation!
Linked Open
Data
Not (quite) the
semantic web
“it is clearly a good idea, and some very
nice demonstrations exist, but it has
not yet changed the world”
[out of context quote from “The Semantic Web”, Berners-Lee,
Hendler and Lassila, Scientific American, May 17, 2001]
Linked Data
•  A practical
“semantic web lite”
•  Narrower focus
(“RDF standards” such
as ontologies, SPARQL,
etc. are the gateway to
a more complete
semantic web.)
https://www.w3.org/DesignIssues/LinkedData.html
Why replace
MARC with
Linked Data
formats?
1. MARC is inadequate
MARC continues to meet many needs,
but there are several areas of stress:
•  Translation of record, not descriptions
of appropriate entities
•  Use of text when we want data
•  Limited extensibility
•  Imprecise URI references (record or
RWO?)
•  ...
2. Use identifiers not names
Identifiers provide necessary layer of
indirection that authorized names do
not:
•  Identifiers more easily stable
o  e.g. no change from “Banks, Iain, 1953-” to
“Banks, Iain, 1953-2013”
•  Exact matching
•  URIs make the web work well
•  Does not replace authority ideas, just
makes them work better
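A minimal sketch of the indirection argument, using plain tuples as triples. The person and work URIs are hypothetical placeholders under example.org; the two property URIs are the standard rdfs:label and Dublin Core Terms creator.

```python
PERSON = "http://example.org/person/banks-iain"   # hypothetical identifier
WORK = "http://example.org/work/1"                # hypothetical identifier
LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
CREATOR = "http://purl.org/dc/terms/creator"

# The work points at the person's identifier, not at the name string.
triples = {
    (PERSON, LABEL, "Banks, Iain, 1953-"),
    (WORK, CREATOR, PERSON),
}


def relabel(graph, subject, new_label):
    """Replace the label of one subject and leave every other triple alone."""
    kept = {t for t in graph if not (t[0] == subject and t[1] == LABEL)}
    return kept | {(subject, LABEL, new_label)}


updated = relabel(triples, PERSON, "Banks, Iain, 1953-2013")

# Descriptions of works are untouched by the heading change.
work_before = {t for t in triples if t[0] == WORK}
work_after = {t for t in updated if t[0] == WORK}
print(work_before == work_after)
```

With name strings as the link, the same change would require editing every record that carries the old heading.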
3. Connect to the web
“Fortress MARC”
protects and
isolates libraries
from the web
•  Little reuse of
our data
•  Can’t use
standard tools
•  Difficult to
generalize
[Figure: "Libraries" enclosed by a wall built from the letters M, A, R, C]
Libraries
Web
The web is big ...
... most of
our users
spend most
of their time
there
[not to scale]
BIBFRAME & related ontologies
[Figure: timeline of BIBFRAME versions (BIBFRAME1.0, BIBFRAME2.0, BIBFRAME3.x?, BIBFRAME4.x?) along a "Time" axis, with the labels "Community adoption & revision?" (three times), "LD4L critique" and "NOW". Extensions shown: bflc extension, bibliotek-o, ArtFrame, RareMat, …others…, ???]
LD4L & LD4L Labs
Cornell, Harvard, Stanford, Iowa; 2014-2016
•  Conversion of MARC -> BIBFRAME at scale (~30M
records, ~3 billion triples)
•  Blacklight-based search over combined catalogs
•  Ontology work around “LD4L ontology” which
provided significant input for BIBFRAME2.0
•  Support use of linked data authorities in the Hydra
stack via Questioning Authority gem
2016-2018
•  bibliotek-o ontology
•  Data conversion MARC & non-MARC to LD
•  VitroLib editor
•  Authority infrastructure and UI refinement including
context
https://ld4l.org/ld4l-labs/
LD4P – ... for Production
Columbia, Cornell, Harvard, LC, Princeton, Stanford –
2016-2018
•  Develop extension ontologies for
BIBFRAME2.0/bibliotek-o (ArtFrame,
Cartographic, Moving Image, Performed
Music, & Rare Materials)
•  Pilot transition of technical services
workflows to a linked data environment
o  copy cataloging
o  original cataloging
(“production” in LD4P means creation of catalog
records, not production-ready)
https://ld4l.org/ld4p
BIBFLOW (UCDavis, 2014-2016)
https://goo.gl/vwUiJY
Conservative
suggestion:
•  add URIs first
•  establish 2-way
conversions for
import/export
National Library of Finland
•  MARC to BIBFRAME to schema.org
•  Focus on web publication, hence
schema.org
http://swib.org/swib16/slides/suominen_silos.pdf
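To make the web-publication end of this concrete, here is a rough sketch that turns a simplified bibliographic description into schema.org JSON-LD. It is not the National Library of Finland pipeline and it skips the BIBFRAME step entirely; the input record, its field names, and its URIs are invented for illustration. Only the schema.org type and property names are real.

```python
import json


def to_schema_org(record):
    """Map a simplified record (hypothetical field names) to schema.org."""
    doc = {
        "@context": "http://schema.org",
        "@type": "Book",
        "@id": record["uri"],
        "name": record["title"],
        "author": {
            "@type": "Person",
            "@id": record["author_uri"],
            "name": record["author_name"],
        },
    }
    if record.get("language"):
        doc["inLanguage"] = record["language"]
    return doc


# Invented example record; the URIs are placeholders.
example = {
    "uri": "http://example.org/bib/1",
    "title": "An example title",
    "author_uri": "http://example.org/person/1",
    "author_name": "Example, Author",
    "language": "fi",
}

print(json.dumps(to_schema_org(example), indent=2))
```

The resulting JSON-LD can be embedded in a catalog page so that general web tools can read it without knowing anything about MARC.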
How close are we
to linked data
catalogs?
Let’s not forget utility
“Catalogers are primarily concerned
about the quality and consistency of the
data they produce, while technologists
are primarily concerned with the
techniques and tools that can be used to
manipulate it.”
[Jeff Edmunds,
https://scholarsphere.psu.edu/concern/generic_works/44558d45t ]
Discovery
system
ILS
(bib, holdings,
auth, circ)
MARC to LD
Data sharing between libraries
circ
LD cooperative
and vendor
sources
Browse and
explore with
context
Reconciliation
Lookup tools
(with
reconciliation)
Local LD
authorities
LD editors
LC marc2bibframe
LD4L Labs bib2lod
Blacklight with
LD extensions
LD4L Labs
VitroLib,
LC BFEdit
CEDAR
Vitro /
Triplestore
Non-library
web data
sources
Manual, automated and
semi-supervised
reconciliation tools
& practices
Web-based context:
Wikidata, DBpedia,
etc.
Web-scale
search
Analysis and
validation W3C SHACL
LD4L Labs
validation
OCLC schema,
LC pilots
schema.org
Authorities
with LD
descriptions
id.loc.gov, LC FAST,
VIAF, ORCID, Getty,
etc…
context data
users
Linked Data catalog ecosystem
Data
modeling &
profile
creation
Community
review and
discussion
Tool
building
Cataloging
and
conversion
Community
review and
discussion
Community
review and
discussion
Data use
(discovery)
End user
evaluation
Community
review and
discussion
Catalog system feedback cycles
Open
Collaborations
(around software)
Free and
Open
Source
Software
“Over The Wall”
•  Simply make a copy of the source
code available
•  Exemplified by many uses of
SourceForge (though has more
features)
•  Sharing but not collaboration
... better than not sharing
Open Development
•  and related: “Social Coding”
•  Share changes as they are made and
provide means of contact/input
•  Exemplified by basic use of GitHub
(other services too)
•  License for re-use
better than
“Over The Wall”
Community Development
•  aka “Community Source
Software”
•  Multiple parties working
together toward shared
goals
•  Norms
•  Coordination
•  Governance
https://commons.wikimedia.org/wiki/File:Tux.svg
https://commons.wikimedia.org/wiki/File:Apache_Software_Foundation_Logo_(2016).svg
Apache 2.0 License
Home in
Helsinki !
Samvera (formerly Hydra)
•  Framework and “solution bundles” for
repository and DAM systems
•  Blacklight/Solr + Fedora + Ruby
•  30+ partner institutions
•  Vibrant and supportive community
•  Yearly conference and other meetings
•  Training
•  Currently considering stronger
governance options
https://samvera.org/
International Image
Interoperability Framework
“A community of the world’s leading libraries
and image repositories working to produce a
community framework and interoperable
technology for image delivery.”
•  Primary outputs are specifications, software
developed by sub-groups
•  IIIF Consortium formed in 2015 to support
growth and adoption
o  > 40 members, growing rapidly
o  Memberships pay for staff (2)
o  Libraries, museums, galleries, vendors
http://iiif.io/
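As one example of what those specifications define, the IIIF Image API addresses an image, or part of one, with a URL of the form `{base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}`. The sketch below builds such URLs; the server base and identifiers are placeholders, and the helper function is illustrative rather than part of any IIIF library.

```python
from urllib.parse import quote


def iiif_image_url(base, identifier, region="full", size="max",
                   rotation="0", quality="default", fmt="jpg"):
    """Build an IIIF Image API request URL from its path components."""
    # The identifier must be URI-encoded, including any slashes it contains.
    encoded_id = quote(identifier, safe="")
    return (
        f"{base.rstrip('/')}/{encoded_id}/"
        f"{region}/{size}/{rotation}/{quality}.{fmt}"
    )


# Hypothetical server and identifier.
BASE = "https://images.example.org/iiif"

print(iiif_image_url(BASE, "abc123"))
print(iiif_image_url(BASE, "abc123", region="0,0,512,512", size="256,"))
```

Because every compliant server answers the same URL pattern, a viewer written for one institution's images works against another's.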
Final thoughts
Most of the interesting big challenges
require collaboration to realize,
including the ones I’ve mentioned:
•  opening access to scholarly literature,
making it discoverable, and linking
researchers to their contributions
•  moving to the next generation of
library catalogs better integrated with
the web
Kiitos!
@zimeon
simeon.warner@cornell.edu
