Agile Descriptions Tony Hammond Nature Publishing Group [email_address]
Web 2.0 Stuff
  Some topics to consider:
    Web Feeds
    Markup
    Tagging
    Collaborative Filtering
    Social Networking
    Text Mining
Web Feeds
  Feeds = news, blogs, podcasts, ...
  Feed formats are RSS / Atom
  Feeds are just
    Rivers of Metadata
    XML documents polled by subscribers
  <item> / <entry> = <title>, <link>, <description> / <summary>
    Administrative metadata
    Descriptive metadata (dc, prism, mods, ...)
  Common description model - RDF
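The shared <title> / <link> / <description> core of RSS items and Atom entries can be read with one small routine. The following is a minimal sketch, not taken from the slides: the function names `local` and `feed_items` and the sample feed (including its example.org URL) are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

def local(tag):
    """Strip any '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit('}', 1)[-1]

def feed_items(xml_text):
    """Yield one dict per <item> (RSS) or <entry> (Atom) in a feed document."""
    root = ET.fromstring(xml_text)
    for node in root.iter():
        if local(node.tag) not in ("item", "entry"):
            continue
        record = {}
        for child in node:
            name = local(child.tag)
            if name == "link":
                # RSS carries the URL as element text, Atom as an href attribute.
                record["link"] = child.get("href") or (child.text or "").strip()
            elif name in ("title", "description", "summary"):
                record[name] = (child.text or "").strip()
        yield record

# Hypothetical sample feed, used only to exercise the function.
SAMPLE_RSS = """<rss version="2.0"><channel>
  <item>
    <title>WorldCat Identities</title>
    <link>http://example.org/identities/</link>
    <description>pages for people</description>
  </item>
</channel></rss>"""

items = list(feed_items(SAMPLE_RSS))
print(items)
```

Matching on the local tag name keeps the sketch indifferent to whether the feed is RSS 1.0, RSS 2.0 or Atom, at the cost of ignoring namespaces entirely.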
Markup
  Two main approaches to adding semantics to documents:
  Embedded Metadata
    Content hidden from user
    ‘Blob’ added to page content
  Exposed Metadata
    Microformats (uF’s)
    RDFa = RDF w/attributes
    eRDF = embedded RDF
Embedded Metadata (i)
  HTTP headers - exchanged in request
  HTTP entities - in GET, PUT, POST
  HTML <head> <meta>
  HTML <head> / <body>
  HTML comments
  HTML comments as RDF ‘islands’
Embedded Metadata (ii)
  Binary data:
    XMP - PDFs (and arbitrary binary files)
    EXIF - JPEGs (digital photos)
    ID3 tags - MP3s (audio files)
Exposed Metadata - uF’s
  uF’s qualify display content
  uF’s add semantic tier to existing structure
  uF’s limit search indexing abuse
  uF = nested markup layered on top of host markup environment (HTML/XML)
  Examples:
    rel-license, rel-tag, ...
    hCard, hCalendar, hReview, xFolk, XFN, ...
uF’s
  Design patterns with semantics
  Content becomes ‘live’
  Can replace custom APIs
  A poorboy semantic web
  XMDP profiles
  GRDDL (W3C Draft - Last Call)
    uF -> RDF
rel-license - Licenses
  <a rel="license" href="http://.../licenses">CC</a>
  Target of URI is license for current page
rel-tag - Tags
  <a rel="tag" href="http://.../tag/microformats">uF’s</a>
  Target of URI is tag for current page
  Last URI path element is tag
  Preceding URI string is tag space
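The rule that the last path element is the tag and everything before it is the tag space can be sketched in a few lines. This is an illustrative sketch only: `split_tag_uri` is a hypothetical helper, the example.org URLs are made up, and query strings or fragments are deliberately not handled.

```python
from urllib.parse import unquote

def split_tag_uri(href):
    """Split a rel-tag href into (tag space, tag).

    The tag is the last path element; the tag space is the URI string
    that precedes it. Percent-escapes in the tag are decoded.
    """
    href = href.strip().rstrip("/")           # tolerate padding and a trailing slash
    space, _, last = href.rpartition("/")     # cut at the final '/'
    return space + "/", unquote(last)

print(split_tag_uri("http://example.org/tag/microformats"))
print(split_tag_uri("http://example.org/tag/semantic%20web/"))
```

Two links share a tag space exactly when the first element of the returned pair is equal, which is what lets a consumer tell one site's "python" tag from another's.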
XFN – Social Networks
  Example: Ingrid’s blogroll:
    <a href="http://.../..." rel="friend met">Josh</a>
    <a href="http://.../..." rel="met acquaintance">Kat</a>
    <a href="http://.../..." rel="co-worker friend met">Mary</a>
    <a href="http://.../...">Nick</a>
  Target of URI is typically blog or personal site
  List of rel attributes indicates personal relationship
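Because XFN relationships are just space-separated values in the rel attribute, a blogroll like Ingrid's can be read with the standard-library HTML parser. A minimal sketch follows; the class name `XFNParser` and the example.org link targets are assumptions (the slide elides the real URLs).

```python
from html.parser import HTMLParser

class XFNParser(HTMLParser):
    """Collect (href, [rel values]) for every <a> element seen."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        rel = (attrs.get("rel") or "").split()   # rel is a space-separated list
        self.links.append(((attrs.get("href") or "").strip(), rel))

# Hypothetical blogroll modelled on the slide; the hrefs are placeholders.
BLOGROLL = '''
<a href="http://example.org/josh" rel="friend met">Josh</a>
<a href="http://example.org/kat" rel="met acquaintance">Kat</a>
<a href="http://example.org/nick">Nick</a>
'''

parser = XFNParser()
parser.feed(BLOGROLL)
friends = [href for href, rel in parser.links if "friend" in rel]
print(friends)
```

A link with no rel attribute, like Nick's, still appears in `links` but carries an empty relationship list.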
XFN (i)
XFN (ii) #1 #4 #2 #3
hCard - Addresses
  <div class="vcard">
    <a class="url fn" href="http://.../...">Tony Hammond</a>
    <span class="org">Nature Publishing Group</span>
    <span class="tel">
      <span class="type">work</span>
      <span class="value">+44-20-7843-4659</span>
    </span>
  </div>
  Use vCard (RFC 2426) as basis
  Nest vCard objects into nested XHTML
  Make vCard object/property names lcase
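Since hCard property names live in class attributes, the card can be recovered by tracking which element is currently open. The sketch below is a simplification under stated assumptions: `HCardParser` is a hypothetical class, only a handful of properties are recognised, void elements such as <img> are not handled, and the href is a placeholder because the slide elides it.

```python
from html.parser import HTMLParser

class HCardParser(HTMLParser):
    """Very small hCard reader: maps property class names to text content."""

    PROPS = {"fn", "org", "type", "value"}   # text properties handled by this sketch

    def __init__(self):
        super().__init__()
        self.open = []    # class lists of the currently open elements
        self.card = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        self.open.append(classes)
        if "url" in classes and attrs.get("href"):
            self.card["url"] = attrs["href"].strip()   # url comes from href, not text

    def handle_endtag(self, tag):
        if self.open:
            self.open.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text or not self.open:
            return
        # Assign the text to every known property on the innermost open element.
        for name in self.open[-1]:
            if name in self.PROPS:
                self.card[name] = text

SAMPLE = '''
<div class="vcard">
  <a class="url fn" href="http://example.org/">Tony Hammond</a>
  <span class="org">Nature Publishing Group</span>
  <span class="tel"><span class="type">work</span> <span class="value">+44-20-7843-4659</span></span>
</div>
'''

parser = HCardParser()
parser.feed(SAMPLE)
card = parser.card
print(card)
```

The host markup is free to change (a <span> can become a <div>) without affecting the result, because only the class names carry the vCard semantics.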
hCalendar - Events
  <div class="vevent">
    <a class="url" href="http://.../...">
      <span class="summary">Future of ...</span>
      <abbr class="dtstart" title="2007-03-08">March 8</abbr>
      <abbr class="dtend" title="2007-03-08">March 8</abbr>
      at <span class="location">Google, Inc.</span>
    </a>
  </div>
  Use iCalendar (RFC 2445) as basis
  Nest iCalendar objects into nested XHTML
  Make iCalendar object/property names lcase
hReview - Reviews
  <li class="hreview">
    <h3 class="item summary"><a href="/reviews/programming/web+standards" class="fn" rel="self bookmark">Web standards</a></h3>
    <div class="reviewer vcard">
      <p class="icon">
        <a href="/users/katemonkey"><img src="/images/icons/katemonkey.png" class="photo"/></a>
      </p>
      <p class="description">Web standards help designers and developers create the pedantic web.</p>
      <p class="by-line">
        a <a href="/reviews/programming">programming</a> review by
        <a href="/users/katemonkey" class="fn url">katemonkey</a> that was written
        <a title="2007-01-27 11:09:21" class="dtreviewed" href="/reviews/date/2007/01/27/">on 27th January</a>
        and has been rated by 3 users as a good review.
      </p>
    </div>
  </li>
xFolk - Bookmarks
  <div class="folkentry">
    <div>
      <a class="taggedlink" href="http://.../...">bookmark</a>
    </div>
    <div class="description">A description ...</div>
    <div class="meta">
      <a rel="tag" href="http://.../tag/tag">tag</a>
    </div>
  </div>
hcite - Citations
  <li class="hcite" xml:lang="en">
    <div class="publisher vcard">  <!-- publisher data as hCard -->
      <span class="fn org">ABC Publishing Co.</span>
      <span class="country-name">United Kingdom</span>
      ...
    </div>
    <div class="creator vcard">  <!-- author(s) data as hCard -->
      <span class="fn n">
        <span class="given-name">John</span>
        <span class="family-name">Doe</span>
      </span>
      ...
    </div>
    <span class="fn">Foobar!</span>
    <span class="description">World Class Book about foobar</span>
    <span class="volume">1</span>
    <span class="issue">1</span>
    <span class="pages">1-10</span>
    <span class="format">article</span>
    <span class="identifier">12345678</span>
    <a class="keyword" rel="tag" href="/tags/foo">foo</a>
    Published <abbr class="dtpublished" title="20060101">January 1st 2006</abbr>
    Copyright <abbr class="copyright" title="20060101">2006</abbr>
  </li>
Tagging
  Tags as simple labels
  Tags as personal ‘aides-memoire’
  Bottom-up rather than top-down
  Just-in-time vs just-in-case
  Folksonomies
    user categories
    group tag clouds, clustering
Tag Namespaces
  Tags as simple text tokens
  Tags as globally unique URIs
    Naming authority issues
    DNS- & Registry-based authorities
    Persistence
  Per application namespaces
  Per user, per application namespaces
Tag Terms
  Any lessons from dynamically vs statically typed programming languages?
    Static compilation provides for type safety
    Dynamic compilation allows typing on demand
    Unit tests substitute for type checking at compilation time
Flickr / Photo Clusters
Connotea / Many Eyes
Unalog / Starlight Tag “python” Tag “iceland”
Tagging Tools
  Let’s look at some cases:
    del.icio.us – general tagging tool
    Flickr – photo tagging tool
    CiteULike – scholarly tagging
    Connotea – scholarly tagging
    Unalog – library tagging tool
    HubMed – an interface to PubMed
del.icio.us
  <item rdf:about="http://orlabs.oclc.org/Identities/">
    <title>WorldCat Identities</title>
    <link>http://orlabs.oclc.org/Identities/</link>
    <description>pages for people</description>
    <dc:creator>hublicious</dc:creator>
    <dc:date>2007-02-14T10:37:46Z</dc:date>
    <dc:subject>person</dc:subject>
    <dc:subject>identifier</dc:subject>
    <dc:subject>oclc</dc:subject>
    <taxo:topics>
      <rdf:Bag>
        <rdf:li resource="http://del.icio.us/tag/person"/>
        <rdf:li resource="http://del.icio.us/tag/identifier"/>
        <rdf:li resource="http://del.icio.us/tag/oclc"/>
      </rdf:Bag>
    </taxo:topics>
  </item>
Flickr
  <entry>
    <title>The Eclipsified Moon</title>
    <link rel="alternate" type="text/html" href="http://www.flickr.com/photos/alf/409522855/"/>
    <id>tag:flickr.com,2005:/photo/409522855</id>
    <published>2007-03-04T04:53:33Z</published>
    <updated>2007-03-04T04:53:33Z</updated>
    <dc:date.Taken>2007-03-03T23:34:23-08:00</dc:date.Taken>
    <content type="html"> ... </content>
    <author>
      <name>balloon in a sock</name>
      <uri>http://www.flickr.com/people/alf/</uri>
    </author>
    <link rel="license" type="text/html" href="http://.../licenses/by-nc/2.0/" />
    <link rel="enclosure" type="image/jpeg" href="http://.../133/409522855_6ed97bbd2a_o.jpg" />
    <category term="moon" scheme="http://www.flickr.com/photos/tags/" />
    <category term="eclipse" scheme="http://www.flickr.com/photos/tags/" />
  </entry>
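In the Atom entry above, each tag is a <category> whose scheme attribute plays the role of the tag space. A minimal sketch of pulling these out follows. It assumes the entry is in the Atom 1.0 namespace (the slide does not show the xmlns declaration), and `atom_tags` and the trimmed sample entry are illustrative only.

```python
import xml.etree.ElementTree as ET

# Assumption: the feed uses the Atom 1.0 namespace; the slide omits the declaration.
ATOM = "{http://www.w3.org/2005/Atom}"

ENTRY = """<entry xmlns="http://www.w3.org/2005/Atom">
  <title>The Eclipsified Moon</title>
  <category term="moon" scheme="http://www.flickr.com/photos/tags/" />
  <category term="eclipse" scheme="http://www.flickr.com/photos/tags/" />
</entry>"""

def atom_tags(entry_xml):
    """Return (scheme, term) pairs for every <category> child of an Atom entry."""
    entry = ET.fromstring(entry_xml)
    return [
        (cat.get("scheme"), (cat.get("term") or "").strip())
        for cat in entry.findall(ATOM + "category")
    ]

tags = atom_tags(ENTRY)
print(tags)
```

Concatenating scheme and term gives a globally unique URI for the tag, which is the same shape the del.icio.us feed spells out explicitly in its taxo:topics list.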
CiteULike (i)
CiteULike (ii)
  <item rdf:about="http://www.citeulike.org/user/timflutre/article/1139879">
    <title>Dating the tree of life.</title>
    <link>http://www.citeulike.org/user/timflutre/article/1139879</link>
    <description> ... </description>
    <dc:title>Dating the tree of life.</dc:title>
    <dc:creator>MJ Benton</dc:creator>
    <dc:creator>FJ Ayala</dc:creator>
    <dc:identifier>doi:10.1126/science.1077795</dc:identifier>
    <dc:source>Science, Vol. 300, No. 5626. (13 June 2003), pp. 1698-1700.</dc:source>
    <prism:publicationName>Science</prism:publicationName>
    <prism:issn>1095-9203</prism:issn>
    <prism:volume>300</prism:volume>
    <prism:number>5626</prism:number>
    <prism:startingPage>1698</prism:startingPage>
    <prism:endingPage>1700</prism:endingPage>
    <category>evolution</category>
    <category>molecular_clock</category>
  </item>
Connotea (i)
  <item rdf:about="http://.../user/penguin/uri/ceac242dfd66979fe07ac5ebf8381afc">
    <title>Induction of IL-10 suppressors in lung transplant patients by CD4+25+ regulatory T cells through CTLA-4 signaling.</title>
    <link>http://.../user/penguin/uri/ceac242dfd66979fe07ac5ebf8381afc</link>
    <description>Posted by penguin to test on Wed Feb 21 2007</description>
    <dc:creator>penguin</dc:creator>
    <dc:date>2007-02-21T10:22:00Z</dc:date>
    <dc:subject>test</dc:subject>
    <slash:comments>0</slash:comments>
    <content:encoded><![CDATA[ ... ]]></content:encoded>
    <connotea:uri>
      <dcTerms:URI/>
    </connotea:uri>
    <annotate:reference rdf:resource="http://.../comments/uri/ceac242dfd66979fe07ac5ebf8381afc"/>
  </item>
Connotea (ii)
  <dcterms:URI rdf:about="http://www.hubmed.org/display.cgi?uids=17015751">
    <dc:title>Induction of IL-10 suppressors in lung transplant patients by CD4+25+ regulatory T cells through CTLA-4 signaling.</dc:title>
    <dc:creator>Ankit Bharat</dc:creator>
    <dc:creator>Ryan Fields</dc:creator>
    <dc:creator>Elbert Trulock</dc:creator>
    <dc:creator>G Patterson</dc:creator>
    <dc:creator>Thalachallour Mohanakumar</dc:creator>
    <dc:identifier>info:pmid/17015751</dc:identifier>
    <dc:date>2006-10-15</dc:date>
    <prism:publicationName>Journal of immunology (Baltimore, Md. : 1950)</prism:publicationName>
    <prism:issn></prism:issn>
    <prism:volume>177</prism:volume>
    <prism:number>8</prism:number>
    <prism:startingPage>5631</prism:startingPage>
    <prism:endingPage>5638</prism:endingPage>
  </dcterms:URI>
Unalog (i)
Unalog (ii)
  <mods>
    <titleInfo><title>3quarksdaily</title></titleInfo>
    <typeOfResource>text</typeOfResource>
    <physicalDescription>
      <form authority="marcform">electronic</form>
    </physicalDescription>
    <location>
      <url>http://.../2006/07/random_walks_he.html</url>
    </location>
    <note>Mode of access: Internet.</note>
    <subject>
      <topic>jackchick</topic>
      <topic>fear</topic>
      <topic>altercall</topic>
    </subject>
    <recordInfo> ... </recordInfo>
  </mods>
HubMed (i)
  <item rdf:about="info:pmid/17015751">
    <rdf:type rdf:resource="http://purl.oclc.org/NET/nknouf/ns/bibtex#Article"/>
    <title>Induction of IL-10 suppressors in lung transplant patients by CD4+25+ regulatory T cells through CTLA-4 signaling.</title>
    <link>http://www.hubmed.org/display.cgi?uids=17015751</link>
    <description>stored by aorchid</description>
    <dc:date>2006</dc:date>
    <dc:title>Induction of IL-10 suppressors in lung transplant patients by CD4+25+ regulatory T cells through CTLA-4 signaling.</dc:title>
    <dc:subject>bos</dc:subject>
    <dc:subject>treg</dc:subject>
  </item>
  <tags:Tagging>
    ...
    <tags:taggedResource rdf:resource="info:pmid/17015751"/>
    ...
  </tags:Tagging>
HubMed (ii)
  <tags:Tagging rdf:about="http://.../tags/users/aorchid/item/17015751">
    <tags:taggedBy rdf:resource="http://.../tags/users#aorchid"/>
    <tags:taggedOn></tags:taggedOn>
    <tags:taggedResource rdf:resource="info:pmid/17015751"/>
    <tags:taggedWithTag>
      <tags:Tag rdf:about="http://.../tags/users/aorchid/tags#bos">
        <tags:tagName>bos</tags:tagName>
      </tags:Tag>
    </tags:taggedWithTag>
    <tags:taggedWithTag>
      <tags:Tag rdf:about="http://.../tags/users/aorchid/tags#treg">
        <tags:tagName>treg</tags:tagName>
      </tags:Tag>
    </tags:taggedWithTag>
  </tags:Tagging>
Collaborative Filtering
  Amazon
  Wikipedia
  Last.fm
  Technorati
  Digg
  Postgenomic
  Dissect Medicine
Postgenomic
Chemical blogspace
Dissect Medicine
Dissect Medicine
Social Networking
  MySpace
  YouTube
  Facebook
  Friendster
  LinkedIn
  Nature Network
Nature Network
Nature Network
Nature Network
Nature Network
Nature Network
Nature Network
Nature Network
Nature Network
Text Mining OTMI
OTMI
  Open Text Mining Interface
  [email_address]
  nature.com/otmi
    Repository
    *.tar.gz, *.otmi - content
    *.opml - navigation
  opentextmining.org/
    Wiki
    Resources (draft spec, scripts)
OTMI - Goals
  Publish machine-readable full text to
    Enable text mining
    Allow document categorization
    Allow domain entities (e.g. chemical compounds, genomes, etc.) to be mapped
  Entity maps can be published in turn and related back to original document
OTMI – Details
  OTMI = Atom <entry> document
  OTMI += document sections
  OTMI += vectors (word frequencies)
  OTMI += ‘snippets’ (nonlinear full text)
  OTMI += references (with DOI)
  OTMI filters stopwords (e.g. NLM)
OTMI – Entry (i)
  <atom:entry xmlns:otmi='...' xmlns:prism='...' xmlns:atom='...'>
    <atom:title>Structural biology Dangerous liaisons on neurons</atom:title>
    <atom:author>
      <atom:name>Giampietro Schiavo</atom:name>
    </atom:author>
    <atom:id>info:doi/10.1038/nature05410</atom:id>
    <atom:link href='http://dx.doi.org/10.1038/nature05410' />
    <atom:link href='http://.../nature/journal/v444/n7122/otmi/nature05410.otmi' rel='self' />
    <atom:link href='http://opentextmining.org/' rel='related' />
    <atom:published>2006-12-21T00:00:00Z</atom:published>
    <atom:updated>2006-12-21T00:00:00Z</atom:updated>
    <atom:rights type='html'>(c) 2006 Nature Publishing Group</atom:rights>
    <prism:publicationName>Nature</prism:publicationName>
    <prism:volume>444</prism:volume>
    <prism:number>7122</prism:number>
    <prism:startingPage>1019</prism:startingPage>
    <prism:endingPage>1020</prism:endingPage>
    <prism:issn>0028-0836</prism:issn>
    <prism:eIssn />
    <otmi:data/>
  </atom:entry>
OTMI – Entry (ii)
  <otmi:data>
    <otmi:stoplist href='http://www.nature.com/nature/journal/v444/n7122/otmi/otmi-stoplist.xml' />
    <otmi:section name='body'>
      <otmi:section name='other'>
        <otmi:vectors> ... </otmi:vectors>
        <otmi:snippets> ... </otmi:snippets>
      </otmi:section>
    </otmi:section>
    <otmi:figure> ... </otmi:figure>
    <otmi:references> ... </otmi:references>
  </otmi:data>
OTMI – Entry (ii) / Vectors
  <otmi:vectors>
    <otmi:split-regex>(?-mix:\s+\W+|\W+\s+|\s+|\/)</otmi:split-regex>
    ...
    <otmi:vector count='9'>vesicles</otmi:vector>
    <otmi:vector count='8'>al</otmi:vector>
    <otmi:vector count='8'>et</otmi:vector>
    <otmi:vector count='8'>protein</otmi:vector>
    <otmi:vector count='8'>synaptic</otmi:vector>
    <otmi:vector count='8'>vesicle</otmi:vector>
    <otmi:vector count='7'>chain</otmi:vector>
    <otmi:vector count='7'>neuron</otmi:vector>
    ...
  </otmi:vectors>
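A word-frequency vector of this kind can be approximated by splitting the text on the published split pattern, dropping stopwords and counting what remains. The sketch below is an assumption-laden illustration, not the OTMI scripts themselves: the pattern is the slide's regex rewritten without its Ruby-style `(?-mix:...)` wrapper, the lowercasing step and the tiny stopword set are my own choices, and `word_vector` is a hypothetical name.

```python
import re
from collections import Counter

# The slide's split pattern, minus the Ruby-style (?-mix:...) wrapper.
SPLIT = re.compile(r"\s+\W+|\W+\s+|\s+|/")

def word_vector(text, stopwords=frozenset()):
    """Count tokens in text after splitting on SPLIT and removing stopwords.

    Lowercasing is an assumption of this sketch, not something the slide states.
    """
    tokens = (t.strip().lower() for t in SPLIT.split(text))
    return Counter(t for t in tokens if t and t not in stopwords)

# Hypothetical input and stoplist, for illustration only.
TEXT = "synaptic vesicles fuse; the vesicles reload and/or recycle"
STOP = frozenset({"the", "and", "or"})

vec = word_vector(TEXT, STOP)
for word, count in vec.most_common():
    print(count, word)
```

Note that the pattern only splits on punctuation that sits next to whitespace (plus a bare slash), so punctuation at the very end of the input would stay attached to the last token.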
OTMI – Entry (iii)
  <otmi:data>
    <otmi:stoplist href='http://www.nature.com/nature/journal/v444/n7122/otmi/otmi-stoplist.xml' />
    <otmi:section name='body'>
      <otmi:section name='other'>
        <otmi:vectors> ... </otmi:vectors>
        <otmi:snippets> ... </otmi:snippets>
      </otmi:section>
    </otmi:section>
    <otmi:figure> ... </otmi:figure>
    <otmi:references> ... </otmi:references>
  </otmi:data>
OTMI – Entry (iii) / Snippets
  <otmi:snippets>
    <otmi:split-regex>(?-mix:\.\s+(?=[A-Z]))</otmi:split-regex>
    ...
    <otmi:snippet>The amino acids lining this cleft are very similar to those found in BoNT/G (ref. 10 ), but differ in the other toxin family members, which explains why different BoNTs recognize distinct protein receptors</otmi:snippet>
    <otmi:snippet>The model predicts that the interaction of BoNTs with both PSGs and protein receptors is necessary to explain their awesome potency , with a different protein receptor being recognized by each BoNT</otmi:snippet>
    <otmi:snippet>The rigid character of this interaction might be further enhanced by the association of the toxins heavy chain with nearby negatively charged lipid molecules, which play an accessory role in stabilizing the toxin on membranes </otmi:snippet>
    <otmi:snippet>The simplest possibility is that BoNT/B binds to PSGs and synaptotagmin within the lumen of a synaptic vesicle that is fused to the neuron membrane</otmi:snippet>
    <otmi:snippet>The toxins then escape from the vesicle lumen when the vesicles are acidified as they reload with neurotransmitters</otmi:snippet>
    <otmi:snippet>The two binding sites would firmly anchor the tip of BoNT/B to the vesicles inner surface, constraining the toxins mobility</otmi:snippet>
    ...
  </otmi:snippets>
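Snippets can be pictured as the full text cut at sentence boundaries and then taken out of reading order. The sketch below uses the slide's split pattern (again without the Ruby-style wrapper). Reordering by sorting is only a guess prompted by the sample above, where the snippets happen to appear in alphabetical order, and `make_snippets` is a hypothetical name.

```python
import re

# The slide's sentence-boundary pattern: a full stop, whitespace, then a capital letter.
SNIPPET_SPLIT = re.compile(r"\.\s+(?=[A-Z])")

def make_snippets(text):
    """Split text into sentence-like snippets and return them sorted.

    Sorting is an assumption of this sketch; it is one simple way to make
    the output nonlinear while keeping every sentence available for mining.
    """
    parts = (p.strip().rstrip(".") for p in SNIPPET_SPLIT.split(text))
    return sorted(p for p in parts if p)

# Hypothetical input text, for illustration only.
TEXT = "Toxins bind receptors. A model predicts this. The toxins then escape."

for snippet in make_snippets(TEXT):
    print(snippet)
```

The lookahead `(?=[A-Z])` keeps the capital letter with the following sentence and avoids splitting after abbreviations that are followed by lowercase text or digits, such as "ref. 10".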
OTMI – Entry (iv)
  <otmi:data>
    <otmi:stoplist href='http://www.nature.com/nature/journal/v444/n7122/otmi/otmi-stoplist.xml' />
    <otmi:section name='body'>
      <otmi:section name='other'>
        <otmi:vectors> ... </otmi:vectors>
        <otmi:snippets> ... </otmi:snippets>
      </otmi:section>
    </otmi:section>
    <otmi:figure> ... </otmi:figure>
    <otmi:references> ... </otmi:references>
  </otmi:data>
OTMI – Entry (iv) / Figs
  <otmi:figure>
    <otmi:title>
      <otmi:reduced-text>Possible binding sites botulinum neurotoxin B (BoNT/B) neurons. Crystal studies Jin et al . Chai et al . suggest BoNT/B invades neurons stowing away carriers known synaptic vesicles. forming complex lipid molecules (polysialogangliosides, PSGs) vesicle protein ( synaptotagmin or synaptotagmin II) neuronal membrane. complex stabilized interactions neighbouring acidic lipid molecules (orange). BoNT/B enter open vesicles neurons membrane, one three possible sequences. , BoNT/B enters vesicle directly forms required complex. b , BoNT/B binds first PSGs membrane, transferred synaptic vesicle containing synaptotagmin. c , BoNT/B forms full complex membrane, synaptotagmin left behind inaccurate vesicle recycling. transferred lumen vesicle.</otmi:reduced-text>
    </otmi:title>
    <otmi:caption>
      <otmi:reduced-text>Possible binding sites botulinum neurotoxin B (BoNT/B) neurons. Crystal studies Jin et al . Chai et al . suggest BoNT/B invades neurons stowing away carriers known synaptic vesicles. forming complex lipid molecules (polysialogangliosides, PSGs) vesicle protein ( synaptotagmin or synaptotagmin II) neuronal membrane. complex stabilized interactions neighbouring acidic lipid molecules (orange). BoNT/B enter open vesicles neurons membrane, one three possible sequences. , BoNT/B enters vesicle directly forms required complex. b , BoNT/B binds first PSGs membrane, transferred synaptic vesicle containing synaptotagmin. c , BoNT/B forms full complex membrane, synaptotagmin left behind inaccurate vesicle recycling. transferred lumen vesicle.</otmi:reduced-text>
    </otmi:caption>
  </otmi:figure>
OTMI – Entry (v)
  <otmi:data>
    <otmi:stoplist href='http://www.nature.com/nature/journal/v444/n7122/otmi/otmi-stoplist.xml' />
    <otmi:section name='body'>
      <otmi:section name='other'>
        <otmi:vectors> ... </otmi:vectors>
        <otmi:snippets> ... </otmi:snippets>
      </otmi:section>
    </otmi:section>
    <otmi:figure> ... </otmi:figure>
    <otmi:references> ... </otmi:references>
  </otmi:data>
OTMI – Entry (v) / Refs
  <otmi:references>
    <otmi:ref-id>info:doi/10.1038/nature05387</otmi:ref-id>
    <otmi:ref-id>info:doi/10.1038/nature05411</otmi:ref-id>
    <otmi:ref-id>info:doi/10.1016/0968-0004(86)90282-3</otmi:ref-id>
    <otmi:ref-id>info:doi/10.1016/0014-5793(95)01471-3</otmi:ref-id>
    <otmi:ref-id>info:doi/10.1083/jcb.200305098</otmi:ref-id>
    <otmi:ref-id>info:doi/10.1074/jbc.M403945200</otmi:ref-id>
    <otmi:ref-id>info:doi/10.1126/science.1123654</otmi:ref-id>
    <otmi:ref-id>info:doi/10.1016/j.febslet.2006.02.074</otmi:ref-id>
    <otmi:ref-id>info:doi/10.1083/jcb.200508170</otmi:ref-id>
    <otmi:refs-noid>3</otmi:refs-noid>
  </otmi:references>
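Each reference is an info:doi identifier, and the entry's own atom:id and atom:link pair shows the same DOI behind an http://dx.doi.org/ address. Turning the reference list into resolvable links could therefore look like the sketch below; `doi_links` is a hypothetical helper, and using the dx.doi.org prefix for references is an assumption extrapolated from that one pair.

```python
INFO_DOI = "info:doi/"
RESOLVER = "http://dx.doi.org/"   # prefix seen in the entry's own atom:link

def doi_links(ref_ids):
    """Map info:doi identifiers to resolver URLs, skipping anything else."""
    return [
        RESOLVER + ref[len(INFO_DOI):]
        for ref in (r.strip() for r in ref_ids)
        if ref.startswith(INFO_DOI)
    ]

refs = [
    "info:doi/10.1038/nature05387",
    "info:doi/10.1126/science.1123654",
    "info:pmid/17015751",            # not a DOI, so it is ignored
]
links = doi_links(refs)
print(links)
```

References counted under otmi:refs-noid carry no identifier at all, so they cannot be linked this way.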
snoitpircseD eligA Tony Hammond Nature Publishing Group [email_address]

Agile Descriptions

  • 1.
    Agile Descriptions TonyHammond Nature Publishing Group [email_address]
  • 2.
    Web 2.0 StuffSome topics to consider: Web Feeds Markup Tagging Collaborative Filtering Social Networking Text Mining
  • 3.
    Web Feeds Feeds = news, blogs, podcasts, ... Feed formats are RSS / Atom Feeds are just Rivers of Metadata XML documents polled by subscribers <item> / <entry> = <title>, <link>, <description> / <summary> Administrative metadata Descriptive metadata (dc, prism, mods, ...) Common description model - RDF
  • 4.
    Markup Twomain approaches to adding semantics to documents: Embedded Metadata Content hidden from user ‘ Blob’ added to page content Exposed Metadata Microformats (  F’s) RDFa = RDF w/attributes eRDF = embedded RDF
  • 5.
    Embedded Metadata (i)HTTP headers - exchanged in request HTTP entities - in GET, PUT, POST HTML <head> <meta> HTML <head> / <body> HTML comments HTML comments as RDF ‘islands’
  • 6.
    Embedded Metadata (ii)Binary data: XMP PDF’s (and arbitrary binary files) EXIF JPEG’s (digital photos) ID3 tags MP3’s (audio files)
  • 7.
    Exposed Metadata -  F’s  F’s qualify display content  F’s add semantic tier to existing structure  F’s limit search indexing abuse  F = nested markup layered on top of host markup environment (HTML/XML) Examples: rel-license, rel-tag, ... hCard, hCalendar, hReview, xFolk, XFN, ...
  • 8.
     F’s Designpatterns with semantics Content becomes ‘live’ Can replace custom APIs A poorboy semantic web XMDP profiles GRDDL (W3C Draft - Last Call)  F -> RDF
  • 9.
    rel-license - Licenses<a rel=&quot;license&quot; href=&quot; http://.../licenses &quot; > CC </a> Target of URI is license for current page
  • 10.
    rel-tag - Tags<a rel=&quot;tag&quot; href=&quot; http://.../tag/microformats &quot; > uF’s </a> Target of URI is tag for current page Last URI path element is tag Preceding URI string is tag space
  • 11.
    XFN – SocialNetworks Example: Ingrid’s blogroll: <a href=&quot; http://.../... &quot; rel=&quot;friend met&quot; >Josh</a> <a href=&quot; http://.../... &quot; rel=&quot;met acquaintance&quot; >Kat</a> <a href=&quot; http://.../... &quot; rel=&quot;co-worker friend met&quot; >Mary</a> <a href=&quot; http://.../... &quot;> Nick </a> Target of URI is typically blog or personal site List of rel attributes indicates personal relationship
  • 12.
  • 13.
    XFN (ii) #1#4 #2 #3
  • 14.
    hCard - Addresses<div class=&quot;vcard&quot; > <a class=&quot;url fn&quot; href=&quot; http://.../... &quot; > Tony Hammond </a> <span class=&quot;org&quot; > Nature Publishing Group </span> <span class=&quot;tel&quot; > <span class=&quot;type&quot; > work </span> <span class=&quot;value&quot; > +44-20-7843-4659 </span> </span> </div> Use vCard (RFC 2426) as basis Nest vCard objects into nested XHTML Make vCard object/property names lcase
  • 15.
    hCalendar - Events<div class=&quot;vevent&quot; > <a class=&quot;url&quot; href=&quot; http://.../... &quot; > <span class=&quot;summary&quot; > Future of ... </span> <abbr class=&quot;dtstart&quot; title=&quot; 2007-03-08 &quot; > March 8 </abbr> <abbr class=&quot;dtend&quot; title=&quot; 2007-03-08 &quot; > March 8 </abbr> at <span class=“location&quot; > Google, Inc. </span> </a> </div> Use iCalendar (RFC 2445) as basis Nest iCalendar objects into nested XHTML Make iCalendar object/property names lcase
  • 16.
    hReview - Reviews<li class=&quot;hreview&quot; > <h3 class=&quot;item summary&quot; ><a href=&quot;/reviews/programming/web+standards&quot; class=&quot;fn&quot; rel=&quot;self bookmark&quot; > Web standards </a></h3> <div class=&quot;reviewer vcard&quot; > <p class=&quot;icon&quot; > <a href=&quot;/users/katemonkey&quot;><img src=&quot;/images/icons/katemonkey.png&quot; class=&quot;photo&quot;/></a> </p> <p class=&quot;description&quot; > Web standards help designers and developers create the pedantic web. </p> <p class=&quot;by-line&quot; > a <a href=&quot;/reviews/programming&quot;> programming </a> review by <a href=&quot;/users/katemonkey&quot; class=&quot;fn url&quot; > katemonkey </a> that was written <a title=&quot;2007-01-27 11:09:21&quot; class=&quot;dtreviewed&quot; href=&quot;/reviews/date/2007/01/27/&quot;> on 27 th January </a> and has been rated by 3 users as a good review. </p> </div> </li>
  • 17.
    xFolk - Bookmarks<div class=&quot;folkentry&quot; > <div> <a class=&quot;taggedlink&quot; href=&quot; http://.../... &quot; > bookmark </a> </div> <div class=&quot;description&quot; > A description ... </div> <div class=&quot;meta&quot; > <a rel=&quot;tag&quot; href=&quot; http://.../tag/tag &quot; > tag </a> </div> </div>
  • 18.
    hcite - Citations<li class=&quot;hcite&quot; xml:lang=&quot;en&quot;> <div class=&quot;publisher vcard&quot; > <!-- publisher data as hCard --> <span class=&quot;fn org&quot; > ABC Publishing Co. </span> <span class=&quot;country-name&quot; > United Kingdom </span> ... </div> <div class=&quot;creator vcard&quot; > <!-- author(s) data as hCard --> <span class=&quot;fn n&quot; > <span class=&quot;given-name&quot; > John <span class=&quot;family-name&quot; > Doe </span></span> ... </div> <span class=&quot;fn&quot; > Foobar! </span> <span class=&quot;description&quot; > World Class Book about foobar </span> <span class=&quot;volume&quot; > 1 </span> <span class=&quot;issue&quot; > 1 </span> <span class=&quot;pages&quot; > 1-10 </span> <span class=&quot;format&quot; > article </span> <span class=&quot;identifier&quot; > 12345678 </span> <a class=&quot;keyword&quot; rel=&quot;tag&quot; href=&quot; /tags/foo &quot;> foo </a> Published <abbr class=&quot;dtpublished&quot; title=&quot; 20060101 &quot;> January 1st 2006 </abbr> Copyright <abbr class=&quot;copyright&quot; title=&quot; 20060101 &quot;> 2006 </abbr> </li>
  • 19.
    Tagging Tagsas simple labels Tags as personal ‘aides-memoire’ Bottom-up rather than top-down Just-in-time vs just-in-case Folksonomies user categories group tag clouds, clustering
  • 20.
    Tag Namespaces Tags as simple text tokens Tags as globally unique URIs Naming authority issues DNS- & Registry-based authorities Persistence Per application namespaces Per user, per application namespaces
  • 21.
    Tag Terms Anylessons from dynamic vs static typed programming languages? Static compilation provides for type safety Dynamic compilation allows typing on demand Unit tests substitute for type checking at compilation time
  • 22.
  • 23.
  • 24.
    Unalog / StarlightTag “python” Tag “iceland”
  • 25.
    Tagging Tools Let’slook at some cases: del.icio.us – general tagging tool Flickr – photo tagging tool CiteULike – scolarly tagging Connotea – scholarly tagging Unalog – library tagging tool HubMed – an interface to PubMed
  • 26.
    del.icio.us <item rdf:about=&quot;http://orlabs.oclc.org/Identities/ &quot;> <title> WorldCat Identities </title> <link> http://orlabs.oclc.org/Identities/ </link> <description> pages for people </description> <dc:creator> hublicious </dc:creator> <dc:date> 2007-02-14T10:37:46Z </dc:date> <dc:subject> person </dc:subject> <dc:subject> identifier </dc:subject> <dc:subject> oclc </dc:subject> <taxo:topics> <rdf:Bag> <rdf:li resource=&quot; http://del.icio.us/tag/person &quot;/> <rdf:li resource=&quot; http://del.icio.us/tag/identifier &quot;/> <rdf:li resource=&quot; http://del.icio.us/tag/oclc &quot;/> </rdf:Bag> </taxo:topics> </item>
  • 27.
    Flickr <entry> <title>The Eclipsified Moon </title> <link rel=&quot;alternate&quot; type=&quot;text/html“ href=&quot; http://www.flickr.com/photos/alf/409522855/ &quot;/> <id> tag:flickr.com,2005:/photo/409522855 </id> <published> 2007-03-04T04:53:33Z </published> <updated> 2007-03-04T04:53:33Z </updated> <dc:date.Taken> 2007-03-03T23:34:23-08:00 </dc:date.Taken> <content type=&quot;html”> ... </content> <author> <name> balloon in a sock </name> <uri> http://www.flickr.com/people/alf/ </uri> </author> <link rel=&quot;license&quot; type=&quot;text/html&quot; href=&quot; http://.../licenses/by-nc/2.0/ &quot; /> <link rel=&quot;enclosure&quot; type=&quot;image/jpeg“ href=&quot; http://.../133/409522855_6ed97bbd2a_o.jpg &quot; /> <category term=&quot; moon &quot; scheme=&quot; http://www.flickr.com/photos/tags/ &quot; /> <category term=&quot; eclipse &quot; scheme=&quot; http://www.flickr.com/photos/tags/ &quot; /> </entry>
  • 28.
    Cite U Like CiteULike (i)
  • 29.
    Cite U Like CiteULike (ii) <item rdf:about= http://www.citeulike.org/user/timflutre/article/1139879 > <title> Dating the tree of life. </title> <link> http://www.citeulike.org/user/timflutre/article/1139879 </link> <description> ... </description> <dc:title> Dating the tree of life. </dc:title> <dc:creator> MJ Benton </dc:creator> <dc:creator> FJ Ayala </dc:creator> <dc:identifier> doi:10.1126/science.1077795 </dc:identifier> <dc:source> Science, Vol. 300, No. 5626. (13 June 2003), pp. 1698-1700. </dc:source> <prism:publicationName> Science </prism:publicationName> <prism:issn> 1095-9203 </prism:issn> <prism:volume> 300 </prism:volume> <prism:number> 5626 </prism:number> <prism:startingPage> 1698 </prism:startingPage> <prism:endingPage> 1700 </prism:endingPage> <category> evolution </category> <category> molecular_clock </category> </item>
  • 30.
    Connotea (i) <itemrdf:about=&quot; http://.../user/penguin/uri/ceac242dfd66979fe07ac5ebf8381afc &quot;> <title> Induction of IL-10 suppressors in lung transplant patients by CD4+25+ regulatory T cells through CTLA-4 signaling. </title> <link> http://.../user/penguin/uri/ceac242dfd66979fe07ac5ebf8381afc </link> <description> Posted by penguin to test on Wed Feb 21 2007 </description> <dc:creator> penguin </dc:creator> <dc:date> 2007-02-21T10:22:00Z </dc:date> <dc:subject> test </dc:subject> <slash:comments> 0 </slash:comments> <content:encoded><![CDATA[ ... ]]></content:encoded> <connotea:uri> <dcTerms:URI/> </connotea:uri> <annotate:reference rdf:resource=&quot; http://.../comments/uri/ceac242dfd66979fe07ac5ebf8381afc &quot;/> </item>
  • 31.
    Connotea (ii) <dcterms:URIrdf:about=&quot; http://www.hubmed.org/display.cgi?uids=17015751 &quot;> <dc:title> Induction of IL-10 suppressors in lung transplant patients by CD4+25+ regulatory T cells through CTLA-4 signaling. </dc:title> <dc:creator> Ankit Bharat </dc:creator> <dc:creator> Ryan Fields </dc:creator> <dc:creator> Elbert Trulock </dc:creator> <dc:creator> G Patterson </dc:creator> <dc:creator> Thalachallour Mohanakumar </dc:creator> <dc:identifier> info:pmid/17015751 </dc:identifier> <dc:date> 2006-10-15 </dc:date> <prism:publicationName> Journal of immunology (Baltimore, Md. : 1950) </prism:publicationName> <prism:issn></prism:issn> <prism:volume> 177 </prism:volume> <prism:number> 8 </prism:number> <prism:startingPage> 5631 </prism:startingPage> <prism:endingPage> 5638 </prism:endingPage> </dcterms:URI>
  • 32.
  • 33.
    Unalog (ii)
    <mods>
      <titleInfo><title>3quarksdaily</title></titleInfo>
      <typeOfResource>text</typeOfResource>
      <physicalDescription>
        <form authority="marcform">electronic</form>
      </physicalDescription>
      <location>
        <url>http://.../2006/07/random_walks_he.html</url>
      </location>
      <note>Mode of access: Internet.</note>
      <subject>
        <topic>jackchick</topic>
        <topic>fear</topic>
        <topic>altercall</topic>
      </subject>
      <recordInfo> ... </recordInfo>
    </mods>
  • 34.
    HubMed (i)
    <item rdf:about="info:pmid/17015751">
      <rdf:type rdf:resource="http://purl.oclc.org/NET/nknouf/ns/bibtex#Article"/>
      <title>Induction of IL-10 suppressors in lung transplant patients by CD4+25+ regulatory T cells through CTLA-4 signaling.</title>
      <link>http://www.hubmed.org/display.cgi?uids=17015751</link>
      <description>stored by aorchid</description>
      <dc:date>2006</dc:date>
      <dc:title>Induction of IL-10 suppressors in lung transplant patients by CD4+25+ regulatory T cells through CTLA-4 signaling.</dc:title>
      <dc:subject>bos</dc:subject>
      <dc:subject>treg</dc:subject>
    </item>
    <tags:Tagging>
      ...
      <tags:taggedResource rdf:resource="info:pmid/17015751"/>
      ...
    </tags:Tagging>
  • 35.
    HubMed (ii)
    <tags:Tagging rdf:about="http://.../tags/users/aorchid/item/17015751">
      <tags:taggedBy rdf:resource="http://.../tags/users#aorchid"/>
      <tags:taggedOn></tags:taggedOn>
      <tags:taggedResource rdf:resource="info:pmid/17015751"/>
      <tags:taggedWithTag>
        <tags:Tag rdf:about="http://.../tags/users/aorchid/tags#bos">
          <tags:tagName>bos</tags:tagName>
        </tags:Tag>
      </tags:taggedWithTag>
      <tags:taggedWithTag>
        <tags:Tag rdf:about="http://.../tags/users/aorchid/tags#treg">
          <tags:tagName>treg</tags:tagName>
        </tags:Tag>
      </tags:taggedWithTag>
    </tags:Tagging>
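The Tagging structure above reduces to a simple (user, resource, tags) triple. A rough sketch of that reduction follows; the `tags:` namespace URI, the `rdf:RDF` wrapper and the `taggings` helper are assumptions for illustration, and the elided `http://.../` URIs are carried through exactly as they appear on the slide.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Assumed namespace for the tags: prefix; the slide does not declare it.
TAGS = "http://www.holygoat.co.uk/owl/redwood/0.1/tags/"
NS = {"rdf": RDF, "tags": TAGS}

# The Tagging from the slide, wrapped in a hypothetical rdf:RDF root.
SAMPLE = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:tags="http://www.holygoat.co.uk/owl/redwood/0.1/tags/">
  <tags:Tagging rdf:about="http://.../tags/users/aorchid/item/17015751">
    <tags:taggedBy rdf:resource="http://.../tags/users#aorchid"/>
    <tags:taggedResource rdf:resource="info:pmid/17015751"/>
    <tags:taggedWithTag>
      <tags:Tag rdf:about="http://.../tags/users/aorchid/tags#bos">
        <tags:tagName>bos</tags:tagName>
      </tags:Tag>
    </tags:taggedWithTag>
    <tags:taggedWithTag>
      <tags:Tag rdf:about="http://.../tags/users/aorchid/tags#treg">
        <tags:tagName>treg</tags:tagName>
      </tags:Tag>
    </tags:taggedWithTag>
  </tags:Tagging>
</rdf:RDF>"""


def taggings(root):
    """Yield (tagger URI, resource URI, [tag names]) for each tags:Tagging."""
    resource_attr = f"{{{RDF}}}resource"
    for tagging in root.findall("tags:Tagging", NS):
        who = tagging.find("tags:taggedBy", NS).get(resource_attr)
        what = tagging.find("tags:taggedResource", NS).get(resource_attr)
        names = [
            n.text.strip()
            for n in tagging.findall(
                "tags:taggedWithTag/tags:Tag/tags:tagName", NS)
        ]
        yield who, what, names


triples = list(taggings(ET.fromstring(SAMPLE)))
print(triples)
```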
  • 36.
    Collaborative Filtering: Amazon, Wikipedia, Last.fm, Technorati, Digg, Postgenomic, Dissect Medicine
  • 37.
  • 38.
  • 39.
  • 40.
  • 41.
    Social Networking: MySpace, YouTube, Facebook, Friendster, LinkedIn, Nature Network
  • 42.
  • 43.
  • 44.
  • 45.
  • 46.
  • 47.
  • 48.
  • 49.
  • 50.
  • 51.
    OTMI: Open Text Mining Interface [email_address] nature.com/otmi: Repository (*.tar.gz, *.otmi - content; *.opml - navigation) opentextmining.org/: Wiki, Resources (draft spec, scripts)
  • 52.
    OTMI – Goals: Publish machine-readable full text to: enable text mining; allow document categorization; allow domain entities (e.g. chemical compounds, genomes, etc.) to be mapped. Entity maps can be published in turn and related back to the original document.
  • 53.
    OTMI – Details: OTMI = Atom <entry> document; OTMI += document sections; OTMI += vectors (word frequencies); OTMI += ‘snippets’ (nonlinear full text); OTMI += references (with DOI); OTMI filters stopwords (e.g. NLM)
  • 54.
    OTMI – Entry (i)
    <atom:entry xmlns:otmi='...' xmlns:prism='...' xmlns:atom='...'>
      <atom:title>Structural biology Dangerous liaisons on neurons</atom:title>
      <atom:author>
        <atom:name>Giampietro Schiavo</atom:name>
      </atom:author>
      <atom:id>info:doi/10.1038/nature05410</atom:id>
      <atom:link href='http://dx.doi.org/10.1038/nature05410' />
      <atom:link href='http://.../nature/journal/v444/n7122/otmi/nature05410.otmi' rel='self' />
      <atom:link href='http://opentextmining.org/' rel='related' />
      <atom:published>2006-12-21T00:00:00Z</atom:published>
      <atom:updated>2006-12-21T00:00:00Z</atom:updated>
      <atom:rights type='html'>(c) 2006 Nature Publishing Group</atom:rights>
      <prism:publicationName>Nature</prism:publicationName>
      <prism:volume>444</prism:volume>
      <prism:number>7122</prism:number>
      <prism:startingPage>1019</prism:startingPage>
      <prism:endingPage>1020</prism:endingPage>
      <prism:issn>0028-0836</prism:issn>
      <prism:eIssn />
      <otmi:data/>
    </atom:entry>
  • 55.
    OTMI – Entry (ii)
    <otmi:data>
      <otmi:stoplist href='http://www.nature.com/nature/journal/v444/n7122/otmi/otmi-stoplist.xml' />
      <otmi:section name='body'>
        <otmi:section name='other'>
          <otmi:vectors> ... </otmi:vectors>
          <otmi:snippets> ... </otmi:snippets>
        </otmi:section>
      </otmi:section>
      <otmi:figure> ... </otmi:figure>
      <otmi:references> ... </otmi:references>
    </otmi:data>
  • 56.
    OTMI – Entry (ii) / Vectors
    <otmi:vectors>
      <otmi:split-regex>(?-mix:\s+\W+|\W+\s+|\s+|\/)</otmi:split-regex>
      ...
      <otmi:vector count='9'>vesicles</otmi:vector>
      <otmi:vector count='8'>al</otmi:vector>
      <otmi:vector count='8'>et</otmi:vector>
      <otmi:vector count='8'>protein</otmi:vector>
      <otmi:vector count='8'>synaptic</otmi:vector>
      <otmi:vector count='8'>vesicle</otmi:vector>
      <otmi:vector count='7'>chain</otmi:vector>
      <otmi:vector count='7'>neuron</otmi:vector>
      ...
    </otmi:vectors>
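The vectors above are word frequencies taken after splitting on the published regex and filtering stopwords. The sketch below shows one way such a list could be produced. It is not the OTMI reference script: the tiny `STOPWORDS` set, the lowercasing step and the sort order (count descending, then alphabetical, which matches the ordering visible on the slide) are assumptions, and the Ruby-style `(?-mix:...)` wrapper is dropped from the pattern.

```python
import re
from collections import Counter

# Body of the split-regex from the slide, without the Ruby (?-mix:...) wrapper.
SPLIT_RE = re.compile(r"\s+\W+|\W+\s+|\s+|/")

# Hypothetical stand-in for the stoplist referenced by <otmi:stoplist>.
STOPWORDS = {"a", "and", "in", "is", "of", "the", "to", "with"}


def word_vectors(text, stopwords=STOPWORDS):
    """Return (word, count) pairs, most frequent first, ties alphabetical."""
    tokens = (t.lower() for t in SPLIT_RE.split(text) if t)
    counts = Counter(t for t in tokens if t not in stopwords)
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def to_xml(vectors):
    """Render the pairs in the element form shown on the slide."""
    return "\n".join(
        f"<otmi:vector count='{count}'>{word}</otmi:vector>"
        for word, count in vectors
    )


text = ("Synaptic vesicles fuse with the neuron membrane. The synaptic "
        "vesicle protein binds the toxin, and the vesicles reload")
vectors = word_vectors(text)
print(to_xml(vectors))
```

Note that the regex only splits on punctuation that is adjacent to whitespace, so a token at the very end of the text keeps any trailing punctuation.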
  • 57.
    OTMI – Entry (iii)
    <otmi:data>
      <otmi:stoplist href='http://www.nature.com/nature/journal/v444/n7122/otmi/otmi-stoplist.xml' />
      <otmi:section name='body'>
        <otmi:section name='other'>
          <otmi:vectors> ... </otmi:vectors>
          <otmi:snippets> ... </otmi:snippets>
        </otmi:section>
      </otmi:section>
      <otmi:figure> ... </otmi:figure>
      <otmi:references> ... </otmi:references>
    </otmi:data>
  • 58.
    OTMI – Entry (iii) / Snippets
    <otmi:snippets>
      <otmi:split-regex>(?-mix:\.\s+(?=[A-Z]))</otmi:split-regex>
      ...
      <otmi:snippet>The amino acids lining this cleft are very similar to those found in BoNT/G (ref. 10 ), but differ in the other toxin family members, which explains why different BoNTs recognize distinct protein receptors</otmi:snippet>
      <otmi:snippet>The model predicts that the interaction of BoNTs with both PSGs and protein receptors is necessary to explain their awesome potency , with a different protein receptor being recognized by each BoNT</otmi:snippet>
      <otmi:snippet>The rigid character of this interaction might be further enhanced by the association of the toxins heavy chain with nearby negatively charged lipid molecules, which play an accessory role in stabilizing the toxin on membranes </otmi:snippet>
      <otmi:snippet>The simplest possibility is that BoNT/B binds to PSGs and synaptotagmin within the lumen of a synaptic vesicle that is fused to the neuron membrane</otmi:snippet>
      <otmi:snippet>The toxins then escape from the vesicle lumen when the vesicles are acidified as they reload with neurotransmitters</otmi:snippet>
      <otmi:snippet>The two binding sites would firmly anchor the tip of BoNT/B to the vesicles inner surface, constraining the toxins mobility</otmi:snippet>
      ...
    </otmi:snippets>
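Snippets are sentence-sized fragments cut with the split-regex above and published out of reading order. The following is a small sketch under assumptions: the snippets on the slide appear in alphabetical order, so sorting is used here as the reordering step, but the actual OTMI scripts may scramble the text differently. The `make_snippets` name and the sample text are illustrative only.

```python
import re

# Body of the snippet split-regex from the slide: a full stop, whitespace,
# then a lookahead for a capital letter starting the next sentence.
SNIPPET_RE = re.compile(r"\.\s+(?=[A-Z])")


def make_snippets(text):
    """Split text into sentence snippets and return them out of order.

    Sorting is an assumption based on the alphabetical order visible on
    the slide; it removes the original reading order of the full text.
    """
    parts = (p.strip().rstrip(".") for p in SNIPPET_RE.split(text))
    return sorted(p for p in parts if p)


sample = ("The toxins then escape from the vesicle lumen. "
          "Binding is tight. "
          "The model predicts potency.")
snippets = make_snippets(sample)
for s in snippets:
    print(f"<otmi:snippet>{s}</otmi:snippet>")
```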
  • 59.
    OTMI – Entry (iv)
    <otmi:data>
      <otmi:stoplist href='http://www.nature.com/nature/journal/v444/n7122/otmi/otmi-stoplist.xml' />
      <otmi:section name='body'>
        <otmi:section name='other'>
          <otmi:vectors> ... </otmi:vectors>
          <otmi:snippets> ... </otmi:snippets>
        </otmi:section>
      </otmi:section>
      <otmi:figure> ... </otmi:figure>
      <otmi:references> ... </otmi:references>
    </otmi:data>
  • 60.
    OTMI – Entry (iv) / Figs
    <otmi:figure>
      <otmi:title>
        <otmi:reduced-text>Possible binding sites botulinum neurotoxin B (BoNT/B) neurons. Crystal studies Jin et al . Chai et al . suggest BoNT/B invades neurons stowing away carriers known synaptic vesicles. forming complex lipid molecules (polysialogangliosides, PSGs) vesicle protein ( synaptotagmin or synaptotagmin II) neuronal membrane. complex stabilized interactions neighbouring acidic lipid molecules (orange). BoNT/B enter open vesicles neurons membrane, one three possible sequences. , BoNT/B enters vesicle directly forms required complex. b , BoNT/B binds first PSGs membrane, transferred synaptic vesicle containing synaptotagmin. c , BoNT/B forms full complex membrane, synaptotagmin left behind inaccurate vesicle recycling. transferred lumen vesicle.</otmi:reduced-text>
      </otmi:title>
      <otmi:caption>
        <otmi:reduced-text>Possible binding sites botulinum neurotoxin B (BoNT/B) neurons. Crystal studies Jin et al . Chai et al . suggest BoNT/B invades neurons stowing away carriers known synaptic vesicles. forming complex lipid molecules (polysialogangliosides, PSGs) vesicle protein ( synaptotagmin or synaptotagmin II) neuronal membrane. complex stabilized interactions neighbouring acidic lipid molecules (orange). BoNT/B enter open vesicles neurons membrane, one three possible sequences. , BoNT/B enters vesicle directly forms required complex. b , BoNT/B binds first PSGs membrane, transferred synaptic vesicle containing synaptotagmin. c , BoNT/B forms full complex membrane, synaptotagmin left behind inaccurate vesicle recycling. transferred lumen vesicle.</otmi:reduced-text>
      </otmi:caption>
    </otmi:figure>
  • 61.
    OTMI – Entry (v)
    <otmi:data>
      <otmi:stoplist href='http://www.nature.com/nature/journal/v444/n7122/otmi/otmi-stoplist.xml' />
      <otmi:section name='body'>
        <otmi:section name='other'>
          <otmi:vectors> ... </otmi:vectors>
          <otmi:snippets> ... </otmi:snippets>
        </otmi:section>
      </otmi:section>
      <otmi:figure> ... </otmi:figure>
      <otmi:references> ... </otmi:references>
    </otmi:data>
  • 62.
    OTMI – Entry (v) / Refs
    <otmi:references>
      <otmi:ref-id>info:doi/10.1038/nature05387</otmi:ref-id>
      <otmi:ref-id>info:doi/10.1038/nature05411</otmi:ref-id>
      <otmi:ref-id>info:doi/10.1016/0968-0004(86)90282-3</otmi:ref-id>
      <otmi:ref-id>info:doi/10.1016/0014-5793(95)01471-3</otmi:ref-id>
      <otmi:ref-id>info:doi/10.1083/jcb.200305098</otmi:ref-id>
      <otmi:ref-id>info:doi/10.1074/jbc.M403945200</otmi:ref-id>
      <otmi:ref-id>info:doi/10.1126/science.1123654</otmi:ref-id>
      <otmi:ref-id>info:doi/10.1016/j.febslet.2006.02.074</otmi:ref-id>
      <otmi:ref-id>info:doi/10.1083/jcb.200508170</otmi:ref-id>
      <otmi:refs-noid>3</otmi:refs-noid>
    </otmi:references>
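Each ref-id above is an info:doi URI, the same form the entry uses for its own atom:id alongside a dx.doi.org link. A minimal sketch of mapping one form to the other follows; the `doi_to_url` helper is hypothetical and does no validation or percent-encoding of the DOI itself.

```python
INFO_DOI_PREFIX = "info:doi/"
RESOLVER = "http://dx.doi.org/"  # resolver used in the entry's atom:link


def doi_to_url(ref_id):
    """Map an info:doi/... identifier to a resolver URL.

    Raises ValueError for identifiers in any other scheme.
    """
    if not ref_id.startswith(INFO_DOI_PREFIX):
        raise ValueError(f"not an info:doi identifier: {ref_id!r}")
    return RESOLVER + ref_id[len(INFO_DOI_PREFIX):]


ref_ids = [
    "info:doi/10.1038/nature05387",
    "info:doi/10.1016/0968-0004(86)90282-3",
]
urls = [doi_to_url(r) for r in ref_ids]
print(urls)
```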
  • 63.
    snoitpircseD eligA Tony Hammond Nature Publishing Group [email_address]