Showing posts with label ngram. Show all posts

Friday, December 23, 2016

Taxonomic name timelines for BHL

Given a big corpus of literature one of the fun things to do is look at how the use of a term has changed over time. When did people first use a particular word? When did one word start to replace another, etc.? Google's Ngram Viewer is perhaps the best known tool for exploring these questions.

In the context of biodiversity, doing something similar for BHL is an obvious thing to do. I've made various clunky attempts in the past (e.g., Biodiversity Heritage Library sparklines) but these all died.

Ryan Schenk (who did a lot of the user interface for my BioNames project) wrote a very stylish tool to display changes in names over time. Called "Synynyms", his tool is now defunct, but you can read about it here and the source code is on GitHub. Ryan would take a name, find synonyms, then graph the changes in use of all those names over time.

[Image: Synynyms chart for Bison bison Linnaeus 1758]

The death of Synynyms has not gone unnoticed:

I've had a tool for my own use that searches BHL for a name and displays the results after first trying to aggregate the hits in a sensible way. For example, if there is more than one hit in a scanned volume, and those hits all fall on pages in the same article in BioStor, then I display the BioStor article instead of a list of each hit separately. Inspired by @PhyloJCAM's question I've built a simple tool to explore the use of one or more names over time.
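The aggregation rule is easy to sketch. What follows is a minimal illustration in Python, not the code behind the tool: the field names `item_id`, `page_id` and `article_id` are made up for the example, with `article_id` set to `None` when a page doesn't belong to any known BioStor article.

```python
from collections import defaultdict


def aggregate_hits(hits):
    """Collapse hits in a scanned volume into one article entry when possible.

    Each hit is a dict with hypothetical keys: item_id (the scanned volume),
    page_id, and article_id (None if the page is not in a known article).
    """
    # Group the hits by the scanned volume they occur in.
    by_item = defaultdict(list)
    for hit in hits:
        by_item[hit["item_id"]].append(hit)

    results = []
    for item_hits in by_item.values():
        articles = {h["article_id"] for h in item_hits}
        same_article = len(articles) == 1 and None not in articles
        if len(item_hits) > 1 and same_article:
            # Several hits, all inside one article: show the article once.
            results.append({
                "type": "article",
                "article_id": next(iter(articles)),
                "pages": [h["page_id"] for h in item_hits],
            })
        else:
            # Otherwise fall back to listing each hit separately.
            results.extend(
                {"type": "page", "page_id": h["page_id"]} for h in item_hits
            )
    return results
```

A volume with a single hit, or with hits spread over different articles, is left as a plain list of pages.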

Located in the "labs" section of BioStor, the BHL timeline takes one or more names and searches for those names in BHL, displaying the results as a chart and a list of hits. I often use it simply to search BHL for a particular name, but you can also use it to compare names, e.g. Aspidoscelis costata and Cnemidophorus costatus:

[Screenshot: BHL timeline comparing Aspidoscelis costata and Cnemidophorus costatus]

The timeline tool is pretty crude, and it's slow if there are lots of hits in BHL. So, it's not as slick as Synynyms (Ryan Schenk is a cleverer programmer than I am). Still, it is a useful way to explore BHL and discover articles that you might not have known existed.
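For anyone curious, the counting behind a chart like this can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not the code running on BioStor: it assumes you already have, for each name, a list of publication years for the hits, and the function name `timeline` and the `bin_size` parameter are invented for the example.

```python
from collections import Counter


def timeline(years_by_name, bin_size=10):
    """Count hits per time bin (a decade by default) for each name.

    years_by_name maps a name to the publication years of its hits.
    Returns, for each name, a dict of bin start year -> number of hits.
    """
    series = {}
    for name, years in years_by_name.items():
        # Round each year down to the start of its bin, then tally.
        counts = Counter((year // bin_size) * bin_size for year in years)
        series[name] = dict(sorted(counts.items()))
    return series


# Placeholder names and invented years, purely to show the shape of the data.
example = timeline({"Name a": [1901, 1905, 1912], "Name b": [1999]})
```

Plotting one series per name on a shared time axis then gives a chart where you can see one name taking over from another.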

Wednesday, September 07, 2011

Suggested apps for BHL's Life and Literature Code Challenge


Since I won't be able to be at the Biodiversity Heritage Library's Life and Literature meeting I thought I'd share some ideas for their Life and Literature Code Challenge. The deadline is pretty close (October 17) so having ideas now isn't terribly helpful, I admit. That aside, here are some thoughts inspired by the challenge. In part this post has been inspired by the Results of the PLoS and Mendeley "Call for Apps", where PLoS and Mendeley asked for people (not necessarily developers) to suggest the kind of apps they'd like to see. As an aside, one thing conspicuous by its absence is a prize for winning the challenge. PLoS and Mendeley have an "API Binary Battle" with a prize of $US 10,001, which seems more likely to inspire people to take part.

Visual search engine
I suspect that many BHL users are looking for illustrations (exemplified by the images being gathered in BHL's Flickr group). One way to search for images would be to search within the OCR text for figure and plate captions, such as "Fig. 1". Indexing these captions by taxonomic name would provide a simple image search tool. For modern publications most figures are on the same page as the caption, but for older publications with illustrations as plates, the caption and corresponding image may be separated (e.g., on facing pages), so the search results might need to show pages around the page containing the caption. As an aside, it's a pity the Flickr images only link to the BHL item and not the BHL page. If they did the latter, and the images were tagged with what they depict, you could create a visual search engine using the Flickr API (of course, this might be just the way to implement the visual search engine: harvest images, tag them with PageID and taxon names, and upload to Flickr).
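To make the caption idea a bit more concrete, here is a rough Python sketch. It assumes the OCR text is available as a mapping from page identifier to text, and the caption pattern is only a guess at common forms ("Fig. 1", "Figure 2", "Plate IV"); real OCR will be messier. The names `find_captions` and `pages_to_show` are invented for the example.

```python
import re

# Lines that start with a figure or plate label followed by an Arabic or
# Roman numeral. This pattern is an assumption, not a tested rule for BHL OCR.
CAPTION = re.compile(
    r"\s*(?:Fig\.|Figure|FIG\.|Plate|PLATE|Pl\.)\s*(?:\d+|[IVXLC]+)\b"
)


def find_captions(pages):
    """Return (page_id, caption line) pairs from a dict of page_id -> OCR text."""
    found = []
    for page_id, text in pages.items():
        for line in text.splitlines():
            if CAPTION.match(line):
                found.append((page_id, line.strip()))
    return found


def pages_to_show(page_ids, hit_page, window=1):
    """Return the hit page plus its neighbours, since a plate and its
    caption may sit on facing pages."""
    i = page_ids.index(hit_page)
    return page_ids[max(0, i - window): i + window + 1]
```

Matching the caption lines against a list of taxonomic names would then give the index by name.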

Mobile interface
The BHL web site doesn't look great on an iPhone. It makes no concessions to the mobile device, and there are some weird things such as the way the list of pages is rendered. A number of mainstream science publishers are exploring mobile versions of their web sites, for example Taylor and Francis have a jQuery Mobile powered interface for mobile users. I've explored iPad interfaces to scientific articles in previous posts. BHL content poses some challenges, but is fundamentally the same as viewing PDFs: you have fixed pages that you may want to zoom.

OCR correction
There is a lot of scope for cleaning up the OCR text in BHL. Part of the trick would be to have a simple user interface for people to contribute to this task. In an earlier post I discussed a Firefox hOCR add-on that provides a nice way to do this. Take this as a starting point, add a way to save the cleaned up text, and you'd be well on the way to making a useful tool.

Taxon name timeline
Despite the shiny new interface, the Encyclopedia of Life still displays BHL literature in the same clunky way I described in an earlier blog post. It would be great to have a timeline of the usage of a name, especially if you could compare the usage of different names (such as synonyms). In many ways this is the BHL equivalent of the Google Books Ngram Viewer.

These are just a few hastily put together thoughts. If you have any other ideas or suggestions, feel free to add them as comments below.
