Archives & The
Semantic Web
      Mark A. Matienzo
  The New York Public Library
New York Archivists’ Round Table
Annual Meeting, June 23, 2009
Disclaimer

   The following presentation, while
 factual, expresses opinions of my own
and not of my employer, my coworkers,
             my family, etc.
Archives & The Web
The Web isn’t new,
 even to archivists.
http://listserv.muohio.edu/scripts/wa.exe?S1=archives
http://web.archive.org/web/19970606072913/http://www.nara.gov/
Web-based archival
description isn’t new.
http://sunsite.berkeley.edu/FindingAids/EAD/bfap.html
http://web.archive.org/web/19970523152709/http://www.library.yale.edu/beinecke/aboutosb.htm
http://web.archive.org/web/19990203012659/lcweb.loc.gov/ead/
http://digilib.nypl.org/dynaweb/ead/nypl/greco/@Generic__BookView
The Web, at its essence,
    is about links.
We take links for
granted in our work.
http://www.nypl.org/research/manuscripts/result.cfm?find=1
http://www.nypl.org/research/chss/spe/brg/berg.html
http://www.nypl.org/research/manuscripts/berg/brgabbey.xml
http://catnyp.nypl.org/record=b7621732
Links go beyond the
easily accessible sort.
http://catnyp.nypl.org/record=b7621732
http://catnyp.nypl.org/search?/dAbbey+Theatre./dabbey+theatre/-3%2C-1%2C0%2CB/exact&FF=dabbey+theatre&1%2C41%2C
http://leopac4.nypl.org/ipac20/ipac.jsp?session=1O4572959KO37.4065&profile=dial--3&uri=link=1100083~!S908632~!1100001~!1100087&aspect=basic&menu=search&ri=2&source=~!dial&term=Abbey+Theatre&index=SL
http://authorities.loc.gov/cgi-bin/Pwebrecon.cgi?AuthRecID=969701&v1=1&HC=2&SEQ=20090622233419&PID=vDA4Ugr3s8SGy7-dKIByauO
Further down the
  rabbit hole.
http://en.wikipedia.org/wiki/Abbey_Theatre
http://www.abbeytheatre.ie/
Links become implicit.
Computers don’t “do”
   implicit links.
Humans must correlate
 data on both ends.
These access points don’t link to anything.
The Semantic Web
(blame this guy)
  http://www.flickr.com/photos/tanaka/3212373419/
I have a dream for the Web [in which
computers] become capable of analyzing
all the data on the Web – the content, links,
and transactions between people and
computers. A ‘Semantic Web’, which should
make this possible, has yet to emerge, but
when it does, the day-to-day mechanisms of
trade, bureaucracy and our daily lives will
be handled by machines talking to
machines. The ‘intelligent agents’ people
have touted for ages will finally materialize.
              Tim Berners-Lee, Weaving The Web.
Linked Data is a way
    to link better.

               Dan Chudnov, Better Living Through Linking.
  http://onebiglibrary.net/story/tcdl-2009-talk-better-living-through-linking
Linked data is not a new
   concept in archives.
If the series becomes the primary level of
classification, and the item the secondary
level, a) items are kept in their
administrative context and original order by
physical allocation to their appropriate
series, and b) series are no longer kept in
any original physical order in a record or
shelf group (if there is any such order) but
simply have their administrative context and
associations recorded on paper.

Peter J. Scott, “The Record Group Concept: A Case For Abandonment,” American Archivist 29(4), 1966.
Design Principles
1. Use URIs for names of things
2. Use HTTP URIs so people can look up
   those names
3. Provide useful information in standard
   formats at those URIs
4. Include links to other URIs so people
   can discover more things

Tim Berners-Lee, Linked Data - Design Issues. http://www.w3.org/DesignIssues/LinkedData.html
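
A minimal sketch of principles 2 and 3 in practice: a client asks an HTTP URI for a standard RDF format through the Accept header. The URI below is hypothetical, and the list of media types is an assumption about what a server might offer, not something stated in the talk. The request is built but never sent.

```python
from urllib.request import Request

# Assumed preference order of formats; a real server may offer others.
RDF_ACCEPT = "text/turtle, application/rdf+xml;q=0.9, text/html;q=0.5"


def linked_data_request(uri: str) -> Request:
    """Build (but do not send) a GET request asking for RDF at `uri`."""
    # Principle 2: the name should be an HTTP URI so it can be looked up.
    if not uri.startswith(("http://", "https://")):
        raise ValueError("not an HTTP URI: " + uri)
    # Principle 3: ask for useful information in a standard format.
    return Request(uri, headers={"Accept": RDF_ACCEPT})


# Hypothetical identifier, used only for illustration.
req = linked_data_request("http://example.org/authorities/abbey-theatre")
print(req.get_method(), req.full_url)
print("Accept:", req.get_header("Accept"))
```

Passing the request to `urllib.request.urlopen` would perform the lookup; principle 4 then applies to whatever links the returned data contains.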
Naming things with
URIs tells us where
  to find them.
Using HTTP (Web)
URIs tells us how to
 find these things.
Providing data in
standard formats tells
 us what that thing is.
EAD is not a standard
 format in this sense.
RDF
1. Resource Description Framework
2. Presents relationships in a simple data
   structure
3. We can draw graphs of those relationships
4. We can represent those relationships in
   multiple formats for computers

  Tim Berners-Lee, Linked Data - Design Issues. http://www.w3.org/DesignIssues/LinkedData.html
In RDF, we say some
thing has a property
with a certain value.
<http://matienzo.org/#me> foaf:firstName "Mark" .
        thing (Me)          property    value
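
As a rough illustration, the thing/property/value statement above can be held as a three-part record and written out as one N-Triples line. The `Triple` class and `to_ntriples` helper are hypothetical names; the subject is written in plain HTTP form, which is an assumption about the identifier the slide intends, and the escaping covers only backslashes and double quotes.

```python
from typing import NamedTuple

FOAF = "http://xmlns.com/foaf/0.1/"  # FOAF vocabulary namespace


class Triple(NamedTuple):
    subject: str    # URI naming the thing
    predicate: str  # URI naming the property
    value: str      # plain literal value


def to_ntriples(t: Triple) -> str:
    """Serialize a literal-valued triple as a single N-Triples line."""
    # Simplified escaping: backslash and double quote only.
    escaped = t.value.replace("\\", "\\\\").replace('"', '\\"')
    return f'<{t.subject}> <{t.predicate}> "{escaped}" .'


me = Triple("http://matienzo.org/#me", FOAF + "firstName", "Mark")
line = to_ntriples(me)
print(line)
```

The prefixed name `foaf:firstName` on the slide is shorthand for the full property URI that appears in the output.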
An RDF Graph

http://matienzo.org/#me  --foaf:firstName-->  “Mark”
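An RDF Graph

http://matienzo.org/#me  --foaf:based_near-->  http://dbpedia.org/page/Brooklyn
http://matienzo.org/#me  --foaf:firstName-->   “Mark”
http://matienzo.org/#me  --foaf:surname-->     “Matienzo”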
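
One way to picture this graph in code, assuming a plain list of triples rather than any particular RDF library: each edge records its property, and each value is tagged as either a literal or a URI. The helper names are hypothetical and the URIs are written in plain HTTP form as an assumption. Only the URI-valued edge is a link a crawler could follow.

```python
from collections import defaultdict

FOAF = "http://xmlns.com/foaf/0.1/"  # FOAF vocabulary namespace
ME = "http://matienzo.org/#me"

# (subject, property, (kind, value)) where kind is "uri" or "literal".
triples = [
    (ME, FOAF + "based_near", ("uri", "http://dbpedia.org/page/Brooklyn")),
    (ME, FOAF + "firstName", ("literal", "Mark")),
    (ME, FOAF + "surname", ("literal", "Matienzo")),
]


def outgoing(graph, subject):
    """Group the values attached to `subject` by property URI."""
    edges = defaultdict(list)
    for s, p, o in graph:
        if s == subject:
            edges[p].append(o)
    return dict(edges)


def linked_resources(graph, subject):
    """Return only the values that are URIs, i.e. links to other things."""
    return [value for s, _, (kind, value) in graph
            if s == subject and kind == "uri"]


for prop, values in outgoing(triples, ME).items():
    print(prop, values)
print(linked_resources(triples, ME))
```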
Simply linking to things
    is not enough.
RDF graphs show why
we link to other things.
These links say what
the relationships are.
Links between things
become cross-references.
Precision improves
with explicit links and
 “smart crawlers.”
http://sindice.com/search?q=abbey+theatre&qt=term
http://www4.wiwiss.fu-berlin.de/bizer/pub/lod-datasets_2009-03-27.png
Examples in Libraries
LIBRIS




http://libris.kb.se/data/bib/4721351
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix libris: <http://libris.kb.se/vocabulary/experimental#> .
@prefix bibo: <http://purl.org/ontology/bibo/> .
<http://libris.kb.se/resource/bib/4721351> rdfs:isDefinedBy <http://libris.kb.se/data/bib/4721351> .
<http://libris.kb.se/resource/bib/4721351> rdf:type bibo:Book .
<http://libris.kb.se/resource/bib/4721351> dc:title "The Abbey : Ireland's national theatre 1904-1979"@en .
<http://libris.kb.se/resource/bib/4721351> dc:creator "Hunt, Hugh" .
<http://libris.kb.se/resource/bib/4721351> dc:creator "Hugh Hunt" .
<http://libris.kb.se/resource/bib/4721351> dc:type "text" .
<http://libris.kb.se/resource/bib/4721351> dc:publisher "Columbia U.P" .
<http://libris.kb.se/resource/bib/4721351> dc:date "1979" .
<http://libris.kb.se/resource/bib/4721351> dc:description "U Can $ 33.60"@en .
<http://libris.kb.se/resource/bib/4721351> dc:identifier <URN:ISBN:0231049064> .
<http://libris.kb.se/resource/bib/4721351> bibo:isbn10 "0231049064" .
<http://libris.kb.se/resource/bib/4721351> libris:held_by <http://libris.kb.se/resource/library/G> .
<http://libris.kb.se/resource/bib/4721351> libris:held_by <http://libris.kb.se/resource/library/L> .
<http://libris.kb.se/resource/bib/4721351> libris:held_by <http://libris.kb.se/resource/library/Li> .
<http://libris.kb.se/resource/bib/4721351> libris:held_by <http://libris.kb.se/resource/library/U> .
<http://libris.kb.se/resource/bib/4721351> libris:held_by <http://libris.kb.se/resource/library/Uh> .




                   http://libris.kb.se/data/bib/4721351?format=text%2Frdf%2Bn3
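
A hedged sketch of what a program can do with statements like these: pick out the holding libraries by their `libris:held_by` property. The regular expression below handles only the simple one-statement-per-line shape shown on the slide and is not a Turtle parser. The sample repeats three of the statements with URIs in plain HTTP form, which is an assumption, and the function names are hypothetical.

```python
import re

SAMPLE = """\
<http://libris.kb.se/resource/bib/4721351> dc:creator "Hunt, Hugh" .
<http://libris.kb.se/resource/bib/4721351> libris:held_by <http://libris.kb.se/resource/library/G> .
<http://libris.kb.se/resource/bib/4721351> libris:held_by <http://libris.kb.se/resource/library/L> .
"""

# <subject> prefix:name then either <uri> or "literal" (optional @lang), then "."
STATEMENT = re.compile(
    r'^<([^>]+)>\s+(\S+)\s+(?:<([^>]+)>|"([^"]*)"(?:@\w+)?)\s*\.$'
)


def parse(text):
    """Yield (subject, property, value, is_uri) for each simple statement."""
    for raw in text.splitlines():
        match = STATEMENT.match(raw.strip())
        if match is None:
            continue  # skip @prefix lines and anything more complex
        subject, prop, uri, literal = match.groups()
        if uri is not None:
            yield subject, prop, uri, True
        else:
            yield subject, prop, literal, False


def holders(text):
    """Return the URIs of libraries named by libris:held_by."""
    return [value for _, prop, value, is_uri in parse(text)
            if prop == "libris:held_by" and is_uri]


print(holders(SAMPLE))
```

Because each holder is itself a URI, a client could look it up in turn, which is the step a string-valued access point does not allow.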
id.loc.gov




http://id.loc.gov/authorities/sh96007490
Chronicling America




    http://chroniclingamerica.loc.gov/lccn/sn85066387/
NSDL Registry




   http://metadataregistry.org/
Examples in Archives
UK Archival Thesaurus




   http://www.w3.org/TR/2005/WD-swbp-skos-core-guide-20050510/
Archives de France “Thesaurus W”




    http://www.archivesdefrance.culture.gouv.fr/gerer/classement/normes-outils/thesaurus/
Agrippa (AMVC)




http://www.analogousspaces.com/media/docs/GUNS_AS_MAY08.pdf
Barriers are both
cultural and technical.
Archival description
  contains lots of
implicit information.
“Inheritance” of data in
multi-level description is
     highly implicit.
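
To make the point concrete, here is a small sketch of inheritance in a multi-level description. A lower-level component usually states only what differs from its parent, so a program has to walk up the hierarchy to learn what a human reader infers. The `Component` class, its field names, and the sample records are all hypothetical and do not come from EAD or any real finding aid.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Component:
    level: str                      # e.g. "collection", "series", "item"
    title: str
    fields: dict = field(default_factory=dict)
    parent: Optional["Component"] = None

    def resolve(self, name):
        """Return (value, level it was stated at), searching upward."""
        node = self
        while node is not None:
            if name in node.fields:
                return node.fields[name], node.level
            node = node.parent
        return None, None

    def explicit(self, names):
        """Copy inherited values down so nothing is left implicit."""
        resolved = {name: self.resolve(name)[0] for name in names}
        return {k: v for k, v in resolved.items() if v is not None}


# Hypothetical three-level description.
collection = Component("collection", "Example Theatre records",
                       {"repository": "Example Repository", "access": "open"})
series = Component("series", "Correspondence", {"access": "restricted"}, collection)
item = Component("item", "Letter, 1904", {"date": "1904"}, series)

print(item.resolve("repository"))
print(item.explicit(["repository", "access", "date"]))
```

The item never mentions its repository or access status; both come from higher levels, and the nearest statement wins.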
EAD is a document-centric
    standard, not a
 data-centric standard.
EAC, a standard in
development, is more
    data-centric.
Archival description, in
its current state, is not
   computer-friendly.
Archival description, in
its current state, is not
 Linked Data-friendly.
EAD needs to change to
interoperate with EAC as
 well as other standards.
It is up to the archival
community to steer the
standards accordingly.
Thank You
     mark@matienzo.org
     http://matienzo.org/
http://twitter.com/anarchivist
