
Thursday, October 20, 2011

Reflections on the TDWG RDF "Challenge"

This is a follow-up to my previous post, "TDWG Challenge - what is RDF good for?", where I'm being, frankly, a pain in the arse, and asking why we bother with RDF. In many ways I'm not particularly anti-RDF, but it bothers me that there's a big disconnect between the reasons we are going down this route and how we are actually using RDF. In other words, if you like RDF and buy the promise of large-scale data integration while still being decentralised ("the web as database"), then we're doing it wrong.

As an aside, my own perspective is one of data integration. I want to link all this stuff together so I can follow a path through multiple datasets and extract the information I want. In other words, "linked data" (little "l", little "d"). I'm interested in fairly lightweight integration, typically through shared identifiers. There is also integration via ontologies, which strikes me as a different, if related, problem that in many ways is closer to the original vision of the Semantic Web as a giant inference engine. I think the concerns (and experience) of these two communities are somewhat different. I don't particularly care about ontologies; I want key-value pairs and reusable identifiers so I can link stuff together. If, for example, you're working on something like Phenoscape, then I think you have a rather more circumscribed set of data, with potentially complicated interrelationships that you want to make inferences on, in which case ontologies are your friend.
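Here is a minimal sketch of the kind of lightweight integration meant here. It is written in Python rather than RDF: each dataset is just a set of key-value records, and the only thing linking them is that one dataset stores another dataset's identifier. The dataset names, keys and identifiers are hypothetical, and the conservation status is taken from the table below. Following the path fails (returns None) as soon as a shared identifier is missing.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical datasets: each maps an identifier to a dict of key-value pairs.
# Some values are themselves identifiers, and those are what let us hop
# from one dataset to the next. The identifiers here are made up.
TAXA: Dict[str, Dict[str, str]] = {
    "ncbi:example-1": {
        "scientificName": "Atelopus nanay",
        "seeAlso": "dbpedia:Atelopus_nanay",
    },
}
WIKIPEDIA: Dict[str, Dict[str, str]] = {
    "dbpedia:Atelopus_nanay": {"conservationStatus": "CR"},
}


def follow(start_id: str,
           path: List[Tuple[Dict[str, Dict[str, str]], str]]) -> Optional[str]:
    """Walk a path of (dataset, key) steps starting from start_id.

    At each step the current identifier is looked up in the dataset, and the
    value stored under `key` becomes the next identifier (or, on the last
    step, the answer). Returns None as soon as any link is missing.
    """
    current = start_id
    for dataset, key in path:
        record = dataset.get(current)
        if record is None or key not in record:
            return None
        current = record[key]
    return current


PATH = [(TAXA, "seeAlso"), (WIKIPEDIA, "conservationStatus")]
print(follow("ncbi:example-1", PATH))
print(follow("ncbi:unknown", PATH))
```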

So, I posted a "challenge". It wasn't a challenge so much as a set of RDF to play with. What I'm interested in is seeing how easily we can string this data together to learn stuff. For example, using the RDF I posted earlier, here is a table listing the name, conservation status, publication DOI and date, and (where available) image from Wikipedia for frogs with sequences in GenBank.

Species | Status | DOI | Year described
Atelopus nanay | CR | http://dx.doi.org/10.1655/0018-0831(2002)058[0229:TNSOAA]2.0.CO;2 | 2002
Eleutherodactylus mariposa | CR | http://dx.doi.org/10.2307/1466962 | 1992
Phrynopus kauneorum | CR | http://dx.doi.org/10.2307/1565993 | 2002
Eleutherodactylus eunaster | CR | http://dx.doi.org/10.2307/1563010 | 1973
Eleutherodactylus amadeus | CR | http://dx.doi.org/10.2307/1445557 | 1987
Eleutherodactylus lamprotes | CR | http://dx.doi.org/10.2307/1563010 | 1973
Churamiti maridadi | CR | http://dx.doi.org/10.1080/21564574.2002.9635467 | 2002
Eleutherodactylus thorectes | CR | http://dx.doi.org/10.2307/1445381 | 1988
Eleutherodactylus apostates | CR | http://dx.doi.org/10.2307/1563010 | 1973
Leptodactylus silvanimbus | CR | http://dx.doi.org/10.2307/1563691 | 1980
Eleutherodactylus sciagraphus | CR | http://dx.doi.org/10.2307/1563010 | 1973
Bufo chavin | CR | http://dx.doi.org/10.1643/0045-8511(2001)001[0216:NSOBAB]2.0.CO;2 | 2001
Eleutherodactylus fowleri | CR | http://dx.doi.org/10.2307/1563010 | 1973
Ptychohyla hypomykter | CR | http://dx.doi.org/10.2307/3672060 | 1993
Hyla suweonensis | DD | http://dx.doi.org/10.2307/1444138 | 1980
Proceratophrys concavitympanum | DD | http://dx.doi.org/10.2307/1565412 | 2000
Phrynopus bufoides | DD | http://dx.doi.org/10.1643/CH-04-278R2 | 2005
Boophis periegetes | DD | http://dx.doi.org/10.1111/j.1096-3642.1995.tb01427.x | 1995
Phyllomedusa duellmani | DD | http://dx.doi.org/10.2307/1444649 | 1982
Boophis liami | DD | http://dx.doi.org/10.1163/156853803322440772 | 2003
Hyalinobatrachium ignioculus | DD | http://dx.doi.org/10.1670/0022-1511(2003)037[0091:ANSOHA]2.0.CO;2 | 2003
Proceratophrys cururu | DD | http://dx.doi.org/10.2307/1447712 | 1998
Amolops bellulus | DD | http://dx.doi.org/10.1643/0045-8511(2000)000[0536:ABANSO]2.0.CO;2 | 2000
Centrolene bacatum | DD | http://dx.doi.org/10.2307/1564528 | 1994
Litoria kumae | DD | http://dx.doi.org/10.1071/ZO03008 | 2004
Phrynopus pesantesi | DD | http://dx.doi.org/10.1643/CH-04-278R2 | 2005
Gastrotheca galeata | DD | http://dx.doi.org/10.2307/1443617 | 1978
Paratelmatobius cardosoi | DD | http://dx.doi.org/10.2307/1447976 | 1999
Rhacophorus catamitus | DD | http://dx.doi.org/10.1655/0733-1347(2002)016[0046:NAPKPF]2.0.CO;2 | 2002
Huia melasma | DD | http://dx.doi.org/10.1643/CH-04-137R3 | 2005
Telmatobius vilamensis | DD | http://dx.doi.org/10.1655/0018-0831(2003)059[0253:ANSOTA]2.0.CO;2 | 2003
Callulina kisiwamsitu | EN | http://dx.doi.org/10.1670/209-03A | 2004
Arthroleptis nikeae | EN | http://dx.doi.org/10.1080/21564574.2003.9635486 | 2003
Eleutherodactylus amplinympha | EN | http://dx.doi.org/10.1139/z94-297 | 1994
Eleutherodactylus glaphycompus | EN | http://dx.doi.org/10.2307/1563010 | 1973
Bufo tacanensis | EN | http://dx.doi.org/10.2307/1439700 | 1952
Phrynopus bracki | EN | http://dx.doi.org/10.2307/1445826 | 1990
Telmatobius sibiricus | EN | http://dx.doi.org/10.1655/0018-0831(2003)059[0127:ANSOTF]2.0.CO;2 | 2003
Cochranella mache | EN | http://dx.doi.org/10.1655/03-74 | 2004
Eleutherodactylus melacara | EN | http://dx.doi.org/10.2307/1466962 | 1992
Plectrohyla glandulosa | EN | http://dx.doi.org/10.2307/1441046 | 1964
Aglyptodactylus laticeps | EN | http://dx.doi.org/10.1111/j.1439-0469.1998.tb00775.x | 1998
Eleutherodactylus glamyrus | EN | http://dx.doi.org/10.2307/1565664 | 1997
Gastrotheca trachyceps | EN | http://dx.doi.org/10.2307/1564375 | 1987
Eleutherodactylus grahami | EN | http://dx.doi.org/10.2307/1563929 | 1979
Litoria havina | LC | http://dx.doi.org/10.1071/ZO9930225 | 1993
Crinia riparia | LC | http://dx.doi.org/10.2307/1440794 | 1965
Litoria longirostris | LC | http://dx.doi.org/10.2307/1443159 | 1977
Osteocephalus mutabor | LC | http://dx.doi.org/10.1163/156853802320877609 | 2002
Leptobrachium nigrops | LC | http://dx.doi.org/10.2307/1440966 | 1963
Pseudis tocantins | LC | http://dx.doi.org/10.1590/S0101-81751998000400011 | 1998
Mantidactylus argenteus | LC | http://dx.doi.org/10.1111/j.1096-3642.1919.tb02128.x | 1919
Aglyptodactylus securifer | LC | http://dx.doi.org/10.1111/j.1439-0469.1998.tb00775.x | 1998
Pseudis cardosoi | LC | http://dx.doi.org/10.1163/156853800507264 | 2000
Uperoleia inundata | LC | http://dx.doi.org/10.1071/AJZS079 | 1981
Litoria pronimia | LC | http://dx.doi.org/10.1071/ZO9930225 | 1993
Litoria paraewingi | LC | http://dx.doi.org/10.1071/ZO9760283 | 1976
Philautus aurifasciatus | LC | http://dx.doi.org/10.1163/156853887X00036 | 1987
Proceratophrys avelinoi | LC | http://dx.doi.org/10.1163/156853893X00156 | 1993
Osteocephalus deridens | LC | http://dx.doi.org/10.1163/156853800507525 | 2000
Gephyromantis boulengeri | LC | http://dx.doi.org/10.1111/j.1096-3642.1919.tb02128.x | 1919
Crossodactylus caramaschii | LC | http://dx.doi.org/10.2307/1446907 | 1995
Rana yavapaiensis | LC | http://dx.doi.org/10.2307/1445338 | 1984
Boophis lichenoides | LC | http://dx.doi.org/10.1163/156853898X00025 | 1998
Megistolotis lignarius | LC | http://dx.doi.org/10.1071/ZO9790135 | 1979
Ansonia endauensis | NE | http://dx.doi.org/10.1655/0018-0831(2006)62[466:ANSOAS]2.0.CO;2 | 2006
Ansonia kraensis | NE | http://dx.doi.org/10.2108/zsj.22.809 | 2005
Arthroleptella landdrosia | NT | http://dx.doi.org/10.2307/1565359 | 2000
Litoria jungguy | NT | http://dx.doi.org/10.1071/ZO02069 | 2004
Phrynobatrachus phyllophilus | NT | http://dx.doi.org/10.2307/1565925 | 2002
Philautus ingeri | VU | http://dx.doi.org/10.1163/156853887X00036 | 1987
Gastrotheca dendronastes | VU | http://dx.doi.org/10.2307/1445088 | 1983
Hyperolius cystocandicans | VU | http://dx.doi.org/10.2307/1443911 | 1977
Boophis sambirano | VU | http://dx.doi.org/10.1080/21564574.2005.9635520 | 2005
Ansonia torrentis | VU | http://dx.doi.org/10.1163/156853883X00021 | 1983
Telmatobufo australis | VU | http://dx.doi.org/10.2307/1563086 | 1972
Stefania coxi | VU | http://dx.doi.org/10.1655/0018-0831(2002)058[0327:EDOSAH]2.0.CO;2 | 2002
Oreolalax multipunctatus | VU | http://dx.doi.org/10.2307/1564828 | 1993
Eleutherodactylus guantanamera | VU | http://dx.doi.org/10.2307/1466962 | 1992
Spicospina flammocaerulea | VU | http://dx.doi.org/10.2307/1447757 | 1997
Cycloramphus acangatan | VU | http://dx.doi.org/10.1655/02-78 | 2003
Leiopelma pakeka | VU | http://dx.doi.org/10.1080/03014223.1998.9517554 | 1998
Rana okaloosae | VU | http://dx.doi.org/10.2307/1444847 | 1985
Phrynobatrachus uzungwensis | VU | http://dx.doi.org/10.1163/156853883X00030 | 1983


This is only a small fraction of the frog species in GenBank, because I've filtered the list down to those that have been linked to Wikipedia (from where we get the conservation status) and that were described in papers with DOIs (from which we get the date of description).

I generated this result using this SPARQL query on a triple store that had the primary data sources (UniProt, DBpedia, CrossRef, ION) loaded, together with the all-important "glue" datasets that link ION to CrossRef, and UniProt to DBpedia (see previous post for details):


PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
PREFIX uniprot: <http://purl.uniprot.org/core/>
PREFIX tdwg_tn: <http://rs.tdwg.org/ontology/voc/TaxonName#>
PREFIX tdwg_co: <http://rs.tdwg.org/ontology/voc/Common#>
PREFIX dcterms: <http://purl.org/dc/terms/>

SELECT ?name ?status ?doi ?date ?thumbnail
WHERE {
  # UniProt taxon: scientific name, plus the glue link to its DBpedia page
  ?ncbi uniprot:scientificName ?name .
  ?ncbi rdfs:seeAlso ?dbpedia .
  # DBpedia (Wikipedia): conservation status
  ?dbpedia dbpedia-owl:conservationStatus ?status .
  # ION: the same name string, and (via glue) the DOI of the publication
  ?ion tdwg_tn:nameComplete ?name .
  ?ion tdwg_co:publishedInCitation ?doi .
  # CrossRef: date of that publication
  ?doi dcterms:date ?date .

  # The thumbnail is optional, so species without one are still returned
  OPTIONAL
  {
    ?dbpedia dbpedia-owl:thumbnail ?thumbnail
  }
}
ORDER BY ASC(?status)
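To make the join semantics of this query concrete, here is a toy Python re-implementation of how the WHERE clause is evaluated. Every triple pattern acts as an inner join on shared variables, so a taxon that lacks any one link (no DBpedia page, no DOI, no date) silently drops out of the result. This is why the table is only a fraction of the frogs in GenBank. The OPTIONAL clause, by contrast, is a left join that keeps rows without a thumbnail. This is an illustrative sketch only, not how a real triple store works. Prefixed names stand in for full IRIs. The ncbi:/ion:/dbpedia: identifiers, the thumbnail and the species "Examplea nolinka" are hypothetical; the names, statuses, DOIs and years come from the table.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

# Toy data; see the lead-in for which values are hypothetical.
TRIPLES: List[Triple] = [
    ("ncbi:1", "uniprot:scientificName", "Eleutherodactylus mariposa"),
    ("ncbi:1", "rdfs:seeAlso", "dbpedia:Eleutherodactylus_mariposa"),
    ("ncbi:2", "uniprot:scientificName", "Bufo tacanensis"),
    ("ncbi:2", "rdfs:seeAlso", "dbpedia:Bufo_tacanensis"),
    ("ncbi:3", "uniprot:scientificName", "Examplea nolinka"),  # no DBpedia link
    ("dbpedia:Eleutherodactylus_mariposa", "dbpedia-owl:conservationStatus", "CR"),
    ("dbpedia:Eleutherodactylus_mariposa", "dbpedia-owl:thumbnail", "thumb:mariposa.jpg"),
    ("dbpedia:Bufo_tacanensis", "dbpedia-owl:conservationStatus", "EN"),
    ("ion:1", "tdwg_tn:nameComplete", "Eleutherodactylus mariposa"),
    ("ion:1", "tdwg_co:publishedInCitation", "doi:10.2307/1466962"),
    ("ion:2", "tdwg_tn:nameComplete", "Bufo tacanensis"),
    ("ion:2", "tdwg_co:publishedInCitation", "doi:10.2307/1439700"),
    ("doi:10.2307/1466962", "dcterms:date", "1992"),
    ("doi:10.2307/1439700", "dcterms:date", "1952"),
]


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Unify one triple pattern with one triple, extending an existing binding.
    Terms starting with '?' are variables; returns None on any conflict."""
    extended = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            if extended.get(p, t) != t:
                return None
            extended[p] = t
        elif p != t:
            return None
    return extended


def bgp(patterns: List[Triple], triples: List[Triple],
        bindings: Optional[List[Binding]] = None) -> List[Binding]:
    """Basic graph pattern: every pattern must match (a chain of inner joins)."""
    results = bindings if bindings is not None else [{}]
    for pattern in patterns:
        next_results = []
        for b in results:
            for t in triples:
                m = match(pattern, t, b)
                if m is not None:
                    next_results.append(m)
        results = next_results
    return results


def optional(pattern: Triple, triples: List[Triple],
             bindings: List[Binding]) -> List[Binding]:
    """OPTIONAL as a left join: keep a binding even if the pattern has no match."""
    out: List[Binding] = []
    for b in bindings:
        extended = bgp([pattern], triples, [b])
        out.extend(extended if extended else [b])
    return out


WHERE = [
    ("?ncbi", "uniprot:scientificName", "?name"),
    ("?ncbi", "rdfs:seeAlso", "?dbpedia"),
    ("?dbpedia", "dbpedia-owl:conservationStatus", "?status"),
    ("?ion", "tdwg_tn:nameComplete", "?name"),
    ("?ion", "tdwg_co:publishedInCitation", "?doi"),
    ("?doi", "dcterms:date", "?date"),
]

rows = optional(("?dbpedia", "dbpedia-owl:thumbnail", "?thumbnail"),
                TRIPLES, bgp(WHERE, TRIPLES))
rows.sort(key=lambda r: r["?status"])  # ORDER BY ASC(?status)
for r in rows:
    print(r["?name"], r["?status"], r["?doi"], r["?date"], r.get("?thumbnail", "-"))
```

Note that the ION and UniProt records are joined on the literal name string (?name), not on a shared identifier. Any difference in spelling or formatting between the two sources would break that join.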


This table doesn't tell us a great deal, but we could, for example, graph date of description against conservation status (CR=critically endangered, EN=endangered, VU=vulnerable, NT=near threatened, LC=least concern, DD=data deficient, NE=not evaluated):
[Chart: year of description plotted against conservation status]
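Before drawing a chart like this, it helps to summarise the description years within each status category. The minimal Python sketch below works on a dozen rows copied from the table, so, like the chart, it is not something to read much into. The choice of count, range and median as the summary is an assumption; a real analysis would use every row the query returns.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, List, Tuple

# A handful of (species, status, year described) rows copied from the table.
ROWS: List[Tuple[str, str, int]] = [
    ("Atelopus nanay", "CR", 2002),
    ("Eleutherodactylus mariposa", "CR", 1992),
    ("Bufo chavin", "CR", 2001),
    ("Ptychohyla hypomykter", "CR", 1993),
    ("Bufo tacanensis", "EN", 1952),
    ("Cochranella mache", "EN", 2004),
    ("Plectrohyla glandulosa", "EN", 1964),
    ("Mantidactylus argenteus", "LC", 1919),
    ("Crinia riparia", "LC", 1965),
    ("Litoria havina", "LC", 1993),
    ("Huia melasma", "DD", 2005),
    ("Gastrotheca galeata", "DD", 1978),
]

# Print order: threatened categories first, then the rest.
ORDER = ["CR", "EN", "VU", "NT", "LC", "DD", "NE"]


def summarise(rows: List[Tuple[str, str, int]]) -> Dict[str, Tuple[int, int, float, int]]:
    """Group description years by status; return (count, earliest, median, latest)."""
    years: Dict[str, List[int]] = defaultdict(list)
    for _species, status, year in rows:
        years[status].append(year)
    return {s: (len(y), min(y), median(y), max(y)) for s, y in years.items()}


summary = summarise(ROWS)
for status in ORDER:
    if status in summary:
        n, earliest, mid, latest = summary[status]
        print(f"{status}: n={n}, {earliest}-{latest}, median {mid}")
```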
In other words, is it the case that more recently described species are more likely to be endangered than taxa we've known about for some time (on the assumption that we've already found all the common species)? We could imagine extending this query to retrieve sequences for a class of frog (e.g., critically endangered) so we could compute a measure of population genetic variation, etc. We shouldn't take the graph above too seriously because it's based on a small fraction of the data, but you get the idea. As more frog taxonomy goes online (there's a lot of stuff in BHL and BioStor, for example) we could add more dates and build a dataset worth analysing properly.
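One common measure of population genetic variation is nucleotide diversity (π): the average proportion of sites that differ between two sequences drawn from the sample. The minimal sketch below assumes that the sequences for one species have already been retrieved and aligned; neither step is shown. Skipping gaps and ambiguous bases pair by pair is one convention among several, and the example sequences are invented.

```python
from itertools import combinations
from typing import Sequence

VALID = set("ACGT")


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap or an ambiguous base are skipped
    for that pair (pairwise deletion).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in VALID and y in VALID:
            compared += 1
            differences += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def nucleotide_diversity(seqs: Sequence[str]) -> float:
    """Estimate pi as the mean p-distance over all distinct pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


# Made-up aligned sequences standing in for one species' GenBank records.
example = ["ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAC"]
print(nucleotide_diversity(example))
```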

It seems to me that these should be fairly simple things to do, yet if we attempt them today it's a world of hurt involving scripts, Excel, data cleaning, etc. before we can do the science.

The thing is, without the "glue" files mapping identifiers across different databases, even this simple query isn't possible. Obviously we have no say in how organisations outside our community publish RDF, but within the biodiversity informatics community we should make every effort to use external identifiers wherever possible so that we can make these links. This is the core of my complaint. If we are using RDF to foster data integration, so that we can query across the diverse data sets that speak to biodiversity, then by not reusing external identifiers we are doing it wrong.
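The difference between needing a glue file and not needing one can be shown with a toy example. It uses plain Python dictionaries rather than RDF, and all identifiers except the DOI are hypothetical. A name record that cites its publication by a local identifier cannot be linked to a DOI-keyed bibliographic record until a separate mapping is supplied. A record that stores the DOI itself links directly, with no glue at all.

```python
from typing import Dict, Optional

Record = Dict[str, str]

# A DOI-keyed bibliographic dataset (the DOI and year come from the table).
BIBLIO: Dict[str, Record] = {"doi:10.2307/1439700": {"date": "1952"}}

# Two hypothetical name datasets: one cites the publication by a local id,
# the other reuses the external identifier (the DOI) directly.
NAMES_LOCAL: Dict[str, Record] = {
    "name:1": {"nameComplete": "Bufo tacanensis", "publishedIn": "localpub:42"},
}
NAMES_SHARED: Dict[str, Record] = {
    "name:1": {"nameComplete": "Bufo tacanensis", "publishedIn": "doi:10.2307/1439700"},
}

# The "glue": a separate mapping from local publication ids to DOIs.
GLUE: Dict[str, str] = {"localpub:42": "doi:10.2307/1439700"}


def link(records: Dict[str, Record], key: str, target: Dict[str, Record],
         glue: Optional[Dict[str, str]] = None) -> Dict[str, Optional[Record]]:
    """Follow the identifier stored under `key` in each record into `target`.

    If a glue mapping is given, the identifier is translated first. Links
    that cannot be resolved come back as None.
    """
    linked: Dict[str, Optional[Record]] = {}
    for rid, rec in records.items():
        ref = rec.get(key)
        if ref is not None and glue is not None:
            ref = glue.get(ref, ref)
        linked[rid] = target.get(ref) if ref is not None else None
    return linked


print(link(NAMES_LOCAL, "publishedIn", BIBLIO))        # local id: unresolved
print(link(NAMES_LOCAL, "publishedIn", BIBLIO, GLUE))  # resolved via the glue
print(link(NAMES_SHARED, "publishedIn", BIBLIO))       # shared id: no glue needed
```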

Update
Here is a nice visualisation of this dataset from @orovellotti (original here), made using ecoRelevé:

[Image: visualisation of the dataset made with ecoRelevé]