iPhylo: SICI

Roderic D. M. Page

Showing posts with label SICI. Show all posts

Thursday, May 29, 2008

When DOIs collide and then disappear: when is a unique, resolvable identifier a bad idea?

As much as I like the idea of a globally unique, resolvable identifier, my recent experience with JSTOR is making me wonder.

JSTOR has three identifiers for articles it archives, DOIs, SICIs, and stable URLs (the later being introduced with the new platform released April 4, 2008). Previously JSTOR would publish DOIs for many of its articles. However, not all of these work, and many are now embedded in the HTML (say, in Dublin Core meta elements) but not publicly displayed.

I suspect the issue is the moving wall:

Journals in JSTOR have "moving walls" that define the time lag between the most current issue published and the content available in JSTOR. The majority of journals in the archive have moving walls of between 3 and 5 years, but publishers may elect walls anywhere from zero to 10 years.

Now, imagine that a publisher has an article on its web site, complete with a DOI, and that article is then add to JSTOR, but is still displayed on the publisher's site.

To make this concrete, consider the article by Baum et al. . On the InformaWorld site this is displayed with doi:10.1080/106351598260879. The same article is also in JSTOR, with the URL http://www.jstor.org/pss/2585367. No DOI is displayed on the page, but if you look at the HTML source, we find:
<meta name="dc.Identifier" scheme="doi" content="10.2307/2585367">. The DOI prefix 10.2307 is used for all JSTOR DOIs, and some for Systematic Biology still work, e.g. 10.2307/2413524.

Now, what happens when the JSTOR moving wall overlaps with publisher's material? What happens if a publisher digitises back issues, then assigns them DOIs? Do the JSOR DOIs then die (as some of them seem to have already done)? And what happens to the poor sap like me, who has been linking to JSTOR DOIs in the naive belief that DOIs don't die?

Suddenly separating identity from resolution is starting to look very attractive...

Sunday, October 28, 2007

Universal Serial Item Names

Following on from the discussion of BHL and DOIs, I stumbled across some remarkable work by Robert Cameron at SFU. Cameron has developed Universal Serial Item Names (USIN). The approach is spelled out in detail in Towards Universal Serial Item Names (also on Scribd). This lengthy document deals with how to develop user-friendly identifiers for journal articles, books, and other documents. The solution looks less baroque than SICIs, which I've discussed earlier.

There is also a web site (USIN.org), complete with examples and source code. Identifiers for books are straightforward, for instance bibp:ISBN/0-86542-889-1 identifies a certain book:

For journals things are slightly more complicated. However, Cameron simplified things a little in his subsequent paper Scholar-Friendly DOI Suffixes with JACC: Journal Article Citation Convention (also on Scribd).

JACC (Journal Article Citation Convention) is proposed as an alternative to SICI (Serial Item and Contribution Identifier) as a convention for specifying journal articles in DOI (Digital Object Identifier) suffixes. JACC is intended to provide a very simple tool for scholars to easily create Web links to DOIs and to also support interoperability between legacy article citation systems and DOI-based services. The simplicity of JACC in comparison to SICI should be a boon both to the scholar and to the implementor of DOI responders.

USIN and JACC use the minimal number of elements in order to identifier an article, such as journal code (e.g., ISSN or an accepted acronym), volume number, and starting page. Using ISSNs ensures globally unique identifiers for journals, but the scheme can also use acronyms, hence those journals that lack ISSNs could be catered for. The scheme is simple, and in many cases will provide the bare minimum of information necessary to locate an item via an OpenURL resolver. Indeed, one simple way to implement USIN identifiers would be to have a service that takes URIs of the form <journal-code>:<volume>@<page> and resolves them behind the scenes using OpenURL. Hence we get simple identifiers that are resolvable, without the baroque approach of SICIs.

When I get the chance I may add support for something like this to bioGUID.

Monday, May 07, 2007

Catalogue of Life, OpenURL, and taxonomic literature

Playing with the recently released "Catalogue of Life" CD, and pondering Charles Hussey's recent post to TAXACOM about the "European Virtual Library of Taxonomic Literature (E-ViTL)" (part of EDIT) has got me thinking more and more about how primitive our handling of taxonomic literature is, and how it cripples the utility of taxonomic databases such as the Catalogue of Life. For example, none of the literature listed in the Catalogue of Life is associated with any digital identifier (such as a DOI, Handle, SICI, or even a URL). In the digital age, this renders the literature section nearly useless -- a user has to search for the reference in Google. Surely we want identifiers, not (often poorly formed) bibliographic citations? For example, I think hdl:2246/4613 is more useful than

Schmidt, K. P. 1921. New species of North American lizards of the genera Holbrookia and Uta. American Museum Novitates (22)

Given the Handle hdl:2246/4613, we get straight to the bibliographic resource, and in this case, a PDF of the paper. In the digital age this is what we need.

So, how to get there? Well, I think we need to focus on developing services to associate references with identifiers. Imagine a service that takes a bibliographic record and returns a globally unique identifier for that reference. This, of course, is part of what CrossRef provides through its OpenURL resolver.

OpenURL has been around a while, and despite the fact that it is probably over complicated (see I hate library standards for more on the seeming desire of librarians to make things harder than they need to be), I think it is a useful way to think about the task of linking taxonomic names to literature, especially if we keep things simple (see Rethinking OpenURL). In particular, drop the obsession with local context -- I don't care what my local library has, my library is the cloud.

So, what if we had an OpenURL service that took a bibliographic citation and queried local and remote sources for a digital identifier, such as a DOI or a Handle, for that citation? If there is no such identifier, then the next step is to create one. For example, the service could create a SICI (see my del.icio.us bookmarks for sici) for that item. Ideally, for those items that were digitised, we could have a database that associated SICIs with the resource location. For example, most of the journal Psyche is available free online as PDFs, and has XML files for each volume providing full bibliographic details (including URLs). It would be trivial to harvest these and add this information to an OpenURL service.

These ideas need a little more fleshing out, but I think it's time the taxonomic community started thinking seriously about digital identifiers for literature, and how they would be used. CrossRef is a great example of what can be done with some simple services (Handles + OpenURL), and it's a tragedy that every time DOIs come up people get blinded by cost, and don't spend time trying to understand how CrossRef works. If nyou want a good demonstration of what can be done with CrossRef, just look at Connotea, which builds much of its functionality on top of CrossRef web services.

It is also interesting that CrossRef is much simpler to use than repositories such as DSpace (used by the AMNH's digital library) -- each DSpace installation has it's own hooks to retrieve metadata (in some cases, such as the AMNH, appallingly badly formed), and as a result there is no easy way to discover what metadata is associated with a given handle, nor given a citation whether a handle exists for that citation.

So, when projects such as EDIT start talking about taxonomic libraries, I think they need to think in terms of simple web services that will serve as the building blocks for other tools. An OpenURL service would be a major boon, and would speed us towards the day when databases such as the Catalogue of Life would not contain (often inconsistently formed) text records of bibliographic works, but actionable identifiers. Any thing less and we remain in the dark ages.