iPhylo: e-Biosphere

Roderic D. M. Page


Friday, June 05, 2009

ChrisFreeland.com: #ebio09, silverbacks, & haiku

Chris Freeland has written a thoughtful summary of his experiences of the two-day closed session to create a road map for biodiversity informatics, entitled #ebio09, silverbacks, & haiku.

Wednesday, June 03, 2009

e-Biosphere '09: Twitter rules, and all that

So, e-Biosphere '09 is over (at least for the plebs like me, the grown ups get to spend two days charting the future of biodiversity informatics). It was an interesting event, on several levels. It's late, and I'm shattered, so this post will cover only a few things.

This was the first conference I'd attended where some of the participants twittered during proceedings. A bunch of us settled on the hashtag #ebio09 (you can also see the tweets at search.twitter.com). For the uninitiated, a "hashtag" is a string preceded by a hash symbol (#), to indicate that it is a tag, such as #fail. It provides a simple way to tag tweets so that others interested in that topic can find them.
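To make the idea concrete, here is a minimal Python sketch of pulling hashtags out of a tweet. The function name extract_hashtags and the regular expression are my own assumptions for illustration, not anything Twitter specifies; real hashtag matching is fussier (this pattern would, for instance, also pick up a URL fragment).

```python
import re

# Assumed pattern: a '#' followed by one or more word characters.
HASHTAG_RE = re.compile(r"#(\w+)")

def extract_hashtags(tweet: str) -> list[str]:
    """Return the hashtags in a tweet, lower-cased, without the leading '#'."""
    return [tag.lower() for tag in HASHTAG_RE.findall(tweet)]

tweet = "lime banana chocolote cheescake = worst dessert ever #ebio09 #dessert #fail"
print(extract_hashtags(tweet))  # ['ebio09', 'dessert', 'fail']
```

Lower-casing the tags is a design choice so that #EBIO09 and #ebio09 end up in the same bucket.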

Twittering created a whole additional layer to the conference. We were able to:

Moan about the appallingly bad wifi @Acronema: Using wifi here at #ebio09 is like wading through treacle!

Moan about the food (which in general was good) @peetucket: lime banana chocolote cheescake = worst dessert ever #ebio09 #dessert #fail

Embellish the presentations with links to related material @rdmpage: Vishwas Chavan ZooKeys example of publishing data with paper, see http://dx.doi.org/10.3897/zookeys.11.210 #ebio09

Engage with people outside the room @jatorre @rdmpage thanks Rod.For those of us who are on the booths is great to get your live reports :) #ebio09, or indeed on the other side of the planet @kejames: @Jim_Croft Re. blogger bioblitz google group's 'new members': I noticed too and have emailed Joel Sachs who promoted the site 2day. #ebio09

Twitter greatly enhanced the conversation, noticeably when a speaker said something controversial (all too rare, sadly), or when a group rapporteur's summary didn't reflect all the views in that group. It also helped document what was going on, and this can be further exploited. For fun, I grabbed tweets from days 2 and 3 and made a wordle:

As @edwbaker noted @edwbaker @rdmpage The size of 'together', 'people' & 'visionary' is somewhat telling...... In case you're wondering about the prominence of "Knowlton", it's because Nancy Knowlton gave a nice talk highlighting the ever-increasing number of cases where we have no names for the things we are encountering (for example, when barcoding fresh samples from poorly studied environments). This is just one example of the huge disconnect between the obsession with taxonomic names in biodiversity informatics, and the reality of metagenomics and DNA barcoding. Just as worrying is the lack of resemblance between the taxonomic classification used by the Encyclopedia of Life and our notion of the evolutionary tree of those organisms. A systematist would find much of EOL's classification laughable. I don't want to bash EOL, but it's worrying that they can continue to crank out press releases, but fail to provide something like a modern classification.

But I digress. In many ways this was less of a scientific conference and more of an event to birth a discipline, namely "biodiversity informatics" (which I'm sure some would claim has been around for quite a while). So, the event was to attract attention to the topic, and assure the outside world (and those attending) that the field exists and has something to say. It also was billed as a forum to discuss strategies for its future. Sadly, much of this discussion will take place behind closed doors, and will feature the major players who bring money and influence (but not much innovation) to the table.

Symptomatic of this lack of innovation, in a sense, was the contrast between the official "Online Conference Community", and the twitter feed. When I asked if anybody on twitter had used the official forum, @fak3r replied tellingly: @rdmpage thought we were on it ;) #ebio09. As fun as it is to use the new hotness to conduct a parallel (and slightly subversive) discussion at a conference, it's worrying that, in a field that calls itself "informatics", the big beasts probably had little idea what was going on. If we are going to exploit the tools the web provides, we need people who "get it", and I'm unconvinced that the big players in this area truly grasp the web (in all its forms). There's also a worrying degree of physics envy, which might be cured by reading The Unreasonable Effectiveness of Data (doi:10.1109/mis.2009.36).

I tried to stir things up a little (almost literally, as captured in this photo by Chris Freeland), with a couple of questions, but to not much effect (other than apparently driving to despair the poor chap behind me).

But enough grumbling. It was great to see lots of people attending the event, there were lots of interesting posters and booths (creating a market for this field may go some way towards providing an incentive to provide better, more reliable services), and my challenge entry won joint first prize, so perhaps I should sit back, enjoy the wine Joel Sachs chose as the prize (many thanks for his efforts in putting the challenge event together), and let others say what they thought of the meeting.

Saturday, May 30, 2009

e-Biosphere 09 Challenge slides

I've put the slides for my e-Biosphere 09 challenge entry on SlideShare.


Not much information on the other entries yet, except for the eBiosphere Citizen Science Challenge, by Joel Sachs and colleagues, which will demonstrate a "global human sensor net". Their plan is to aggregate observations posted on Flickr, Twitter, Spotter, and email. It might be fun to make use of some of this for my own entry (by default it already will, because we are both using EOL's Flickr pool).

Tuesday, May 26, 2009

e-Biosphere Challenge: visualising biodiversity digitisation in real time

e-Biosphere '09 kicks off next week, and features the challenge:

Prepare and present a real-time demonstration during the days of the Conference of the capabilities in your community of practice to discover, disseminate, integrate, and explore new biodiversity-related data by:
Capturing data in private and public databases;
Conducting quality assurance on the data by automated validation and/or peer review;
Indexing, linking and/or automatically submitting the new data records to other relevant databases;
Integrating the data with other databases and data streams;
Making these data available to relevant audiences;
Make the data and links to the data widely accessible; and
Offering interfaces for users to query or explore the data.

Originally I planned to enter the wiki project I've been working on for a while, but time was running out and the deadline was too ambitious. Hence, I switched to thinking about RSS feeds. The idea was to first create a set of RSS feeds for sources that lack them, which I've been doing over at http://bioguid.info/rss, then integrate these feeds in a useful way. For example, the feeds would include images from Flickr (such as EOL's pool), geotagged sequences from GenBank, the latest papers from Zootaxa, and new names from uBio (I'd hoped to include ION as well, but they've been spectacularly hacked).
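As a rough idea of what "integrating" feeds can mean at its simplest, here is a minimal Python sketch that pools items from several feeds into one stream ordered by publication date. It is not the code behind bioguid.info; the sample items, dates, and the function name merge_feeds are all made up for illustration, and I'm assuming each item has already been reduced to a (source, title, pubDate) tuple.

```python
from email.utils import parsedate_to_datetime

# Hypothetical items: (source, title, RSS pubDate string in RFC 822 form).
flickr = [("Flickr", "Photo of a beetle", "Mon, 25 May 2009 09:30:00 GMT")]
zootaxa = [("Zootaxa", "A new copepod species", "Tue, 26 May 2009 08:00:00 GMT")]
genbank = [("GenBank", "Geotagged sequence", "Sun, 24 May 2009 17:45:00 GMT")]

def merge_feeds(*feeds):
    """Combine items from several feeds into one list, newest first."""
    items = [item for feed in feeds for item in feed]
    # Parse the date string so that sorting is chronological, not alphabetical.
    return sorted(items, key=lambda item: parsedate_to_datetime(item[2]), reverse=True)

for source, title, date in merge_feeds(flickr, zootaxa, genbank):
    print(date, source, title)
```

Sorting on the parsed date rather than the raw string matters because RSS dates start with the day name, so a plain string sort would put "Mon" before "Sun" regardless of the actual date.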

After playing with triple stores and SPARQL (incompatible vocabularies and multiple identifiers rather bugger this approach), and visualisations based on Google Maps (building on my swine flu timemap), it dawned on me that what I really needed was an eye-catching way of displaying geotagged, timestamped information, just like David Troy's wonderful twittervision and flickrvision.com. In particular, David took the Poly9 Globe and added Twitter and Flickr feeds (see twittervision 3D and flickrvision 3D). So, I hacked David's code and created this, which you can view at http://bioguid.info/ebio09/www/3d/:

It's a lot easier to simply look at it rather than describe what it does, but here's a quick sketch of what's under the hood.

Firstly, I take RSS feeds, either the raw geoFeed from Flickr, or from http://bioguid.info/rss. The bioGUID feeds include the latest papers in Zootaxa (most new animal species are described in this journal), a modified version of uBio's new names feed, and a feed of the latest geotagged sequences in GenBank (I'd hoped to use only DNA barcodes, but it turns out rather few barcode sequences are geotagged, and few have the "BARCODE" keyword). The Flickr feeds are simple to handle because they include locality information (including latitude, longitude, and Yahoo Where-on-Earth Identifiers (WOEIDs)). Similarly, the GenBank feed I created has latitudes and longitudes (although extracting these isn't always as straightforward as it should be). Other feeds require more processing. The uBio feed already has taxonomic names, but no geotagging, so I use services from Yahoo! GeoPlanet™ to find localities from article titles. For the Zootaxa feed that I created I use uBio's SOAP service to extract taxonomic names, and Yahoo! GeoPlanet™ to extract localities.
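The easy case, a feed that already carries coordinates, can be sketched in a few lines of Python. This is an illustration rather than the demo's actual code: I'm assuming items are tagged with a GeoRSS-Simple georss:point element ("latitude longitude"), and the sample feed and the function name geotagged_items are made up. Items without a point are simply skipped here, which is where a geocoding step such as the GeoPlanet lookup would have to come in.

```python
import xml.etree.ElementTree as ET

# GeoRSS-Simple namespace, in ElementTree's {uri}tag notation.
GEORSS = "{http://www.georss.org/georss}"

# Hypothetical sample feed, made up for illustration.
SAMPLE_RSS = """<rss version="2.0" xmlns:georss="http://www.georss.org/georss">
  <channel>
    <title>Example feed</title>
    <item>
      <title>Geotagged photo</title>
      <georss:point>51.5 -0.12</georss:point>
    </item>
    <item>
      <title>Item with no locality</title>
    </item>
  </channel>
</rss>"""

def geotagged_items(rss_text):
    """Yield (title, latitude, longitude) for items that have a georss:point."""
    root = ET.fromstring(rss_text)
    for item in root.iter("item"):
        point = item.findtext(GEORSS + "point")
        if point is None:
            continue  # no coordinates: this item would need geocoding
        lat, lon = (float(value) for value in point.split())
        yield item.findtext("title"), lat, lon

print(list(geotagged_items(SAMPLE_RSS)))  # [('Geotagged photo', 51.5, -0.12)]
```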

I've tried to create a useful display popup. For Zootaxa papers you get a thumbnail of the paper, and where possible an icon of the taxonomic group the paper talks about (the presence of this icon depends on the success of uBio's taxonomic name finding service, the Catalogue of Life having the same name, and my having a suitable icon). The example above shows a paper about copepods. Other papers have an icon for the journal (again, a function of my being able to determine the journal ISSN and having a suitable icon). Flickr images simply display a thumbnail of the image.

What does it all mean? Well, I could say all sorts of things about integration and mash-ups but, dammit, it's pretty. I think it's a fun way to see just what is happening in digital biodiversity. I've deliberately limited the demo to items that came online in the month of May, and I'll be adding items during the conference (June 1-3rd in London). For example, if any more papers appear in Zootaxa, or in the uBio feeds I'll add those. If anybody uploads geotagged photos to EOL's Flickr group, I'll grab those as well. It's still a bit crude, but it shows some of the potential of bringing things together, coupled with a nice visualisation. I welcome any feedback.

Wednesday, May 06, 2009

Integrating and displaying data using RSS

Although I'd been thinking of getting the wiki project ready for e-Biosphere '09 as a challenge entry, lately I've been playing with RSS as a complementary, but quicker, way to achieve some simple integration.

I've been playing with RSS on and off for a while, but what reignited my interest was the swine flu timemap I made last week. The neatest thing about the timemap was how easy it was to make. Just take some RSS that is geotagged and you get the timemap (courtesy of Nick Rabinowitz's wonderful Timemap library).

So, I began to think about taking RSS feeds for, say, journals and taxonomic and genomic databases, adding them together, and displaying them using tools such as timemap (see here for an earlier mock up of some GenBank data). Two obstacles are in the way. The first is that not every data source of interest provides RSS feeds. To address this I've started to develop wrappers around some sources, the first of which is ZooBank.

The second obstacle is that integration requires shared content (e.g., tags, identifiers, or localities). Some integration will be possible geographically (for example, adding geotagged sequences and images to a map), but this won't work for everything. So, I need to spend some time trying to link stuff together. In the case of ZooBank there's some scope for this, as ZooBank metadata sometimes includes DOIs, which enables us to link to the original publication, as well as bookmarking services such as Connotea. I'm aiming to include these links within the feed, as shown in this snippet (see the <link rel="related"...> element):

<entry>
<title>New Protocetid Whale from the Middle Eocene of Pakistan: Birth on Land, Precocial Development, and Sexual Dimorphism</title>
<link rel="alternate" type="text/html" href="http://zoobank.org/urn:lsid:zoobank.org:pub:8625FB9A-1FC3-43C3-9A99-7A3CDE0DFC9C"/>
<updated>2009-05-06T18:37:34+01:00</updated>
<id>urn:uuid:c8f6be01-2359-1805-8bdb-02f271a95ab4</id>
<content type="html">Gingerich, Philip D., Munir ul-Haq, Wighart von Koenigswald, William J. Sanders, B. Holly Smith &amp; Iyad S. Zalmout<br/><a href="http://dx.doi.org/10.1371/journal.pone.0004366">doi:10.1371/journal.pone.0004366</a></content>
<summary type="html">Gingerich, Philip D., Munir ul-Haq, Wighart von Koenigswald, William J. Sanders, B. Holly Smith &amp; Iyad S. Zalmout<br/><a href="http://dx.doi.org/10.1371/journal.pone.0004366">doi:10.1371/journal.pone.0004366</a></summary>
<link rel="related" type="text/html" href="http://dx.doi.org/10.1371/journal.pone.0004366" title="doi:10.1371/journal.pone.0004366"/>
<link rel="related" type="text/html" href="http://bioguid.info/urn:lsid:zoobank.org:pub:8625FB9A-1FC3-43C3-9A99-7A3CDE0DFC9C" title="urn:lsid:zoobank.org:pub:8625FB9A-1FC3-43C3-9A99-7A3CDE0DFC9C"/>
</entry>
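To show how a consumer of the feed might use those related links, here is a minimal Python sketch that collects them from an entry and picks out the DOI. It is only an illustration: I'm assuming the feed declares the standard Atom namespace (not visible in the snippet above), the entry is cut down with a shortened title so it parses on its own, and the function name related_links is made up.

```python
import xml.etree.ElementTree as ET

# Standard Atom namespace (assumed to be what the feed declares).
ATOM = "{http://www.w3.org/2005/Atom}"

# Cut-down copy of the entry, with the namespace added and the title shortened.
ENTRY = """<entry xmlns="http://www.w3.org/2005/Atom">
<title>New Protocetid Whale from the Middle Eocene of Pakistan</title>
<link rel="alternate" type="text/html" href="http://zoobank.org/urn:lsid:zoobank.org:pub:8625FB9A-1FC3-43C3-9A99-7A3CDE0DFC9C"/>
<link rel="related" type="text/html" href="http://dx.doi.org/10.1371/journal.pone.0004366" title="doi:10.1371/journal.pone.0004366"/>
<link rel="related" type="text/html" href="http://bioguid.info/urn:lsid:zoobank.org:pub:8625FB9A-1FC3-43C3-9A99-7A3CDE0DFC9C" title="urn:lsid:zoobank.org:pub:8625FB9A-1FC3-43C3-9A99-7A3CDE0DFC9C"/>
</entry>"""

def related_links(entry_xml):
    """Return {title: href} for every <link rel="related"> in an Atom entry."""
    entry = ET.fromstring(entry_xml)
    return {
        link.get("title"): link.get("href")
        for link in entry.findall(ATOM + "link")
        if link.get("rel") == "related"
    }

links = related_links(ENTRY)
# The link titles carry the identifiers, so a DOI can be spotted by its prefix.
dois = [title for title in links if title.startswith("doi:")]
print(dois)  # ['doi:10.1371/journal.pone.0004366']
```

Keying on the title attribute works here only because the snippet puts the identifier there; a feed that used free-text titles would need the DOI parsed out of the href instead.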

What I'm hoping is that there will be enough links to create something rather like my Elsevier Challenge entry, but with a much more diverse set of sources.

Wednesday, April 08, 2009

Patenting biodiversity tools

The e-Biosphere online forum has a topic entitled Why open source code? Why not patented softwares?. This thread was started by Mauri Åhlberg, who notes that the EOL codebase is open source, and asks "why not patent software." Åhlberg has patented NatureGate (US Patent 7400295), which claims:

In the method of the invention objects can be identified on the basis of location and one or more characteristics in an improved way. The method is performed by means of a user device and a service product offering a service with which objects can be identified. In the method of the invention, the object to be identified is positioned and the position of the object is informed to the service to which the user device has connected. The user of the user device selects one or more characteristics presented by the service for an object to be identified. A message containing the position of the object to be identified and selected characteristic(s) is then sent from the user device to the service product. The service fetches information on the basis of the position of the object and the selected characteristic(s) from a database. The fetched information is presented for the user device in the form of alternative objects to be identified. The system of the invention is characterized by a...

The description seems very general, and I can't see anything that qualifies as novel, but a quick read suggests that anybody writing, say, an iPhone application to identify an organism based on where you are and what it looks like will be in trouble.

There are other patent applications in this area, such as Managing taxonomic information by the developers of uBio, which claims:

In a management of taxonomic information, a name that specifies an organism is identified. Based on the name and a database of organism names or classifications, another name that specifies the organism and that represents a link between pieces of biological identification information in the database, or a classification for the organism, is determined. Based on the other name or the classification, information associated with the organism is identified.

Now, I'm clearly no expert on software patents, and this is a contentious area, but this strikes me as a little worrying (worst case scenario, biodiversity informatics becomes a victim of patent trolls). One could argue that patents can be used defensively (i.e., to ensure that a technology is not claimed by another, say commercial, party who then limits access based on cost), but I'd like to see a little more discussion of these issues by the biodiversity community.

e-Biosphere Challenge

The e-Biosphere meeting in London June 1-3 has announced the e-Biosphere 09 Informatics Challenge:

Prepare and present a real-time demonstration during the days of the Conference of the capabilities in your community of practice to discover, disseminate, integrate, and explore new biodiversity-related data by:
Capturing data in private and public databases
Conducting quality assurance on the data by automated validation and/or peer review
Indexing, linking and/or automatically submitting the new data records to other relevant databases
Integrating the data with other databases and data streams
Making these data available to relevant audiences
Make the data and links to the data widely accessible and
Offering interfaces for users to query or explore the data.

The "real time" aspect of the challenge seems a bit forced. I think they originally wanted a "live demo", but now they seem to be happy with a demo that unfolds over the three day meeting, without necessarily literally taking three days (what the organisers term "cooking shows"). I also think cash prizes would have been a good idea (the web site simply says "there will be prizes"). It's not the cash itself that matters, it's the fact that it indicates that the organisers are serious about wanting to attract entries. Entrants are likely to invest more time than they'd recoup in cash.

In any event, given that challenges are a great way to focus the mind on a deadline, I'll be entering the wiki of taxonomic names that I've been working on.