SemTechBiz 2012, San Francisco, June 4th 2012




Domeo: a web-based tool for semantic
  annotation of online documents
        http://www.annotationframework.org/



          Paolo Ciccarese, PhD
            http://www.paolociccarese.info/
               paolo.ciccarese@gmail.com



           Mass General Hospital               Harvard Medical School
About Me
     •   Assistant in Neurology at Mass General Hospital
     •   Research faculty at Harvard Medical School
     •   Author of 30+ scientific publications
     •   Senior software and knowledge engineer
     •   Member of W3C HCLS Interest Group
     •   Co-chair of the W3C Open Annotation
         Community Group

         http://www.paolociccarese.info/



Paolo Ciccarese, PhD                          SemTechBiz 2012, June 4th 2012
As (biomedical) scientists…
     • We deal with an increasing amount of digital
       resources: documents, images, videos,
       datasets, vocabularies, databases, software…
          – About 150-200 articles a week
          – 10 min/article ≈ 34 hours/week?
          – How can we manage it?
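
The estimate above can be checked with a few lines of arithmetic. This is only a sketch: the helper name is hypothetical, and the 10 minutes per article is the slide's own assumption. The slide's figure of about 34 hours corresponds to the upper end of the range (200 articles a week).

```python
# Hypothetical helper: weekly reading time for a given article load.
def weekly_reading_hours(articles_per_week: int, minutes_per_article: float = 10) -> float:
    """Return the hours per week needed to read the given number of articles."""
    return articles_per_week * minutes_per_article / 60


low = weekly_reading_hours(150)   # lower end of the slide's range
high = weekly_reading_hours(200)  # upper end of the slide's range
print(f"{low:.1f} to {high:.1f} hours/week")
```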


         http://www.ncbi.nlm.nih.gov/pubmed/



… we commonly use annotation
     • We annotate prints, HTML and PDFs
     • We bookmark/tag web pages…
     • … and publications (citations/references)
     • We comment on web pages, blogs, forums
       and emails
     • We tweet…
     • …


Are we efficient and effective?
     •   Can we integrate our annotations?
     •   Can we leverage machine computation?
     •   Can we share them easily with our colleagues?
     •   Can we capitalize on the work of colleagues?
     •   Can we integrate them with other resources?
     •   Can we easily observe how science evolves?
     •   Can we easily detect the most up-to-date science?
     •   Can we discover valuable resources?
A ‘semantic’ view of a publication
                                    Semantic Web Applications in Neuromedicine
                                              (SWAN) project [2007]




              classic publication

                                      scientific discourse ‘semantic’ representation



      http://tinyurl.com/cgyna2m

graph representation




SWAN Creation/Curation Process




How do we empower ‘Joe Scientist’?
     • Even simple linking tasks are not ‘standardized’: they are
       hard to share and not easy to perform




                        http://antibodyregistry.org/antibody17/antibodyform.html?gui_type=advanced&ab_id=2266850




                                                    antibodyregistry.org




Enable manual annotation of digital resources
     • Visually and effectively annotate (or, better,
       semantically annotate) any digital resource
       or resource fragment while performing our
       regular browsing/reading activities
   http://www.ncbi.nlm.nih.gov/pubmed/19822029  ≈  http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2874257/




Leverage text mining and community curation
     • Run text mining and entity recognition
       algorithms on scientific documents and persist
       the results in a standard format
     • Benefit from crowdsourcing by supporting
       curation of manual and automatic annotation
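
As a rough illustration of the first bullet, the sketch below runs a toy dictionary-based entity recognizer and keeps each hit as a structured record with character offsets, so it can later be serialized and curated. Everything here is an assumption made for illustration: the `Mention` record, the `recognize` function, the tiny lexicon and the sample sentence are hypothetical, and real text mining services are far more sophisticated. Only the term URI is taken from the Protein Ontology slide.

```python
import re
from dataclasses import asdict, dataclass


@dataclass
class Mention:
    """One recognized entity occurrence (hypothetical record layout)."""
    term_uri: str
    label: str
    start: int
    end: int
    exact: str
    creator: str  # which algorithm produced the mention


# Hypothetical mini-lexicon mapping surface forms to an ontology term URI.
LEXICON = {
    "amyloid beta A4 protein": "http://purl.obolibrary.org/obo/PR_000004168",
    "APP": "http://purl.obolibrary.org/obo/PR_000004168",
}


def recognize(text, lexicon=LEXICON, creator="toy-dictionary-matcher"):
    """Return case-sensitive, whole-word lexicon matches ordered by position."""
    mentions = []
    for label, uri in lexicon.items():
        pattern = r"\b" + re.escape(label) + r"\b"
        for m in re.finditer(pattern, text):
            mentions.append(Mention(uri, label, m.start(), m.end(), m.group(0), creator))
    return sorted(mentions, key=lambda mention: mention.start)


text = "Mutations in APP alter processing of the amyloid beta A4 protein."
results = [asdict(m) for m in recognize(text)]
for record in results:
    print(record["start"], record["end"], record["exact"])
```

Keeping offsets and the producing algorithm with every mention is what makes later curation and comparison of algorithms possible.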




Enable semantic tagging (ontologies)




                   http://purl.obolibrary.org/obo/PR_000004168
                   Label: ‘amyloid beta A4 protein’
                   Exact synonyms: ‘APP’, ‘amyloidogenic glycoprotein’, …
                   Related synonyms: ‘A4’, ‘ABPP’, …

                   Is a
                   http://purl.obolibrary.org/obo/PR_000000001
                   Label: ‘protein’
                   Definition: ‘An amino acid chain that…’

       Source: Protein Ontology (PRO) https://pir5.georgetown.edu/wiki/PRO
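
To make the benefit of tagging with an ontology term concrete, here is a minimal sketch of the record shown above and of two things a tool can do with it: match synonyms and follow the "is a" link. The `Term` class and both helper functions are hypothetical; the labels, synonyms and URIs are the ones on the slide, and the direct "is a" link to ‘protein’ mirrors what the slide displays.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    """A simplified ontology term (hypothetical structure)."""
    uri: str
    label: str
    exact_synonyms: tuple = ()
    related_synonyms: tuple = ()
    is_a: tuple = ()  # URIs of parent terms


PROTEIN = Term(uri="http://purl.obolibrary.org/obo/PR_000000001", label="protein")
APP = Term(
    uri="http://purl.obolibrary.org/obo/PR_000004168",
    label="amyloid beta A4 protein",
    exact_synonyms=("APP", "amyloidogenic glycoprotein"),
    related_synonyms=("A4", "ABPP"),
    is_a=(PROTEIN.uri,),
)
TERMS = {term.uri: term for term in (PROTEIN, APP)}


def ancestors(uri, terms=TERMS):
    """Collect every term reachable through 'is a' links."""
    seen = []
    stack = list(terms[uri].is_a)
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.append(parent)
            stack.extend(terms[parent].is_a)
    return seen


def matches(term, text):
    """True if text equals the label or an exact synonym, ignoring case."""
    names = (term.label, *term.exact_synonyms)
    return any(name.lower() == text.lower() for name in names)


print(matches(APP, "app"), ancestors(APP.uri))
```

A tag that points at the URI, rather than at the string "APP", lets a search for ‘protein’ also find documents tagged with the more specific term.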
APPs for the Semantic Resources Project, May 2010
Zooming in




                             APPs for the Semantic Resources Project, May 2010




…and more
     •   Share the annotation in a common format
     •   Search the annotation efficiently (inference, rules)
     •   Reuse/integrate the annotation
     •   Exercise access control
     •   Subscribe to feeds related to topics of interest
          – Proteins, Cells, Authors, Papers…
     • Retrieve additional content (mashups)
     • Find new resources
     • Find collaborators

Annotation Ontology (AO)
     • OWL vocabulary for representing and sharing
       annotation of digital resources and their fragments
     • Not only for biomedicine!


                  Ciccarese et al., 2011
                  An open annotation ontology for science on web 3.0
                  http://www.jbiomedsem.com/content/2/S2/S4

                  http://purl.org/ao/home (Website/Wiki)




AO Overview
  AO allows annotating:
     Resources: documents (HTML, PDF, Word, Excel), images,
     databases, web services... (and their fragments)
  Optionally specifying an:
     Annotation Type: one of the already available
     types (errata, highlight, qualifiers...) or one that users
     define.
  With (or without) a:
     Topic: free text, structured text, URIs, RDF entities,
     RDF graphs, domain ontologies…
  Tracing:
     Provenance: who created what, when, with which
     software, with what expectations…
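
The four ingredients above (resource, optional type, optional topic, provenance) can be sketched as a small data structure that flattens into RDF-style triples. This is an illustrative sketch, not the AO reference implementation: the `Annotation` class is hypothetical, the `urn:example:` identifiers are made up, and the namespace and property names (`annotatesResource`, `hasTopic`, `createdBy`, `createdOn`, `createdWith`, the `Qualifier` type) reflect my reading of AO and PAV and should be checked against the published vocabularies.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Assumed namespaces; verify against the published AO and PAV vocabularies.
AO = "http://purl.org/ao/"
AOT = "http://purl.org/ao/types/"
PAV = "http://purl.org/pav/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


@dataclass
class Annotation:
    """Hypothetical in-memory model of an AO-style annotation."""
    uri: str
    resource: str
    annotation_type: Optional[str] = None   # "specifying (or not)" a type
    topics: List[str] = field(default_factory=list)  # "with (or without)" a topic
    created_by: Optional[str] = None
    created_on: Optional[str] = None        # ISO 8601 date string
    created_with: Optional[str] = None

    def triples(self):
        """Flatten the annotation into (subject, predicate, object) tuples."""
        s = self.uri
        out = [
            (s, RDF_TYPE, self.annotation_type or AO + "Annotation"),
            (s, AO + "annotatesResource", self.resource),
        ]
        out += [(s, AO + "hasTopic", topic) for topic in self.topics]
        if self.created_by:
            out.append((s, PAV + "createdBy", self.created_by))
        if self.created_on:
            out.append((s, PAV + "createdOn", self.created_on))
        if self.created_with:
            out.append((s, PAV + "createdWith", self.created_with))
        return out


def to_ntriples(triples):
    """Serialize triples in an N-Triples-like form (URIs vs. plain literals)."""
    def node(value):
        if value.startswith(("http://", "https://", "urn:")):
            return f"<{value}>"
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    return "\n".join(f"<{s}> <{p}> {node(o)} ." for s, p, o in triples)


ann = Annotation(
    uri="urn:example:annotation:1",
    resource="http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2874257/",
    annotation_type=AOT + "Qualifier",
    topics=["http://purl.obolibrary.org/obo/PR_000004168"],
    created_by="urn:example:person:joe-scientist",
    created_on="2012-06-04",
    created_with="urn:example:software:domeo",
)
nt = to_ntriples(ann.triples())
print(nt)
```

Type, topic and provenance are all optional in the sketch, so `Annotation(uri, resource)` on its own still yields a valid, if minimal, annotation.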
Annotating a document
AlzSWAN: http://tinyurl.com/18r




Annotating a document fragment




Protein Ontology – PRO: http://purl.org/obo/owl/PRO
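
Annotating a fragment requires some way to point at a span of text that survives small changes to the page. One common approach, sketched here as an assumption rather than as AO's exact mechanism, is to store the selected text together with a little context before and after it. The `TextSelector` class, its field names and both functions are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class TextSelector:
    """Selected text plus surrounding context (hypothetical structure)."""
    prefix: str
    exact: str
    postfix: str


def make_selector(text: str, start: int, end: int, context: int = 20) -> TextSelector:
    """Capture text[start:end] with up to `context` characters on each side."""
    return TextSelector(
        prefix=text[max(0, start - context):start],
        exact=text[start:end],
        postfix=text[end:end + context],
    )


def resolve(selector: TextSelector, text: str) -> Optional[Tuple[int, int]]:
    """Find the selected span again; return (start, end) or None if not found."""
    needle = selector.prefix + selector.exact + selector.postfix
    pos = text.find(needle)
    if pos == -1:
        return None
    start = pos + len(selector.prefix)
    return start, start + len(selector.exact)


text = "APP is cleaved. Later, APP is processed by secretases."
second = text.index("APP", 1)            # the second occurrence of "APP"
selector = make_selector(text, second, second + 3, context=8)
print(resolve(selector, text))
```

The context disambiguates repeated words: the selector above resolves to the second "APP", not the first, and still resolves if text is added before the sentence.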

HyQue triples



                                                                                Experiments
                                                                    Workflows




SWAN Ontology 2.0: http://code.google.com/p/swan-ontology/
Annotation Ontology Network

                                          Biotea




           The Living Document
                  Project

Open Annotation Community Group
     • Annotation Ontology is going to be replaced in
       our applications by the Open Annotation
       Model developed through the W3C Open
       Annotation Community Group

          – Website http://www.w3.org/community/openannotation/
          – Core Model http://www.openannotation.org/spec/core/
          – Extensions http://www.openannotation.org/spec/extension/



     • The DOMEO Annotation Toolkit is a web
       application for producing and sharing manual,
       semi-automatic and automatic annotation

                  Ciccarese et al., 2012
                  Open semantic annotation of scientific publications using DOMEO
                  http://www.jbiomedsem.com/content/3/S1/S1

                  http://annotationframework.org




DOMEO: Document Metadata Organizer




Semantic Tags or Qualifiers [1]




Semantic Tags or Qualifiers [2]




Semantic Tags or Qualifiers [3]




Domeo and the NCBO Annotator




                                                                    http://www.bioontology.org/annotator-service
     • Domeo allows automatic/manual annotation with
       terms from selected ontologies managed by
       BioPortal
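
Restricting the annotator to "selected ontologies" amounts to passing a list of ontology identifiers along with the text. The sketch below only assembles a request URL and sends nothing. The endpoint and the parameter names (`text`, `ontologies`, `apikey`) are assumptions based on the publicly documented BioPortal REST API and may differ from the service Domeo called in 2012; the function name, ontology acronyms and API key are placeholders.

```python
from urllib.parse import urlencode

# Assumed endpoint; check the current BioPortal documentation before use.
ANNOTATOR_ENDPOINT = "http://data.bioontology.org/annotator"


def build_annotator_url(text, ontologies, api_key):
    """Build (but do not send) an annotator request limited to some ontologies."""
    query = urlencode({
        "text": text,
        "ontologies": ",".join(ontologies),  # assumed comma-separated acronyms
        "apikey": api_key,                    # placeholder credential
    })
    return f"{ANNOTATOR_ENDPOINT}?{query}"


url = build_annotator_url("amyloid beta A4 protein", ["PR", "GO"], "YOUR-API-KEY")
print(url)
```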




Running NCBO Annotator




            Additional text mining services
            will be listed here




NCBO Annotator Results in Domeo




        List of recognized
        entities



Results Curation

                                          Customizable




Cumulative Results Curation
     • One item only
     • All instances with the same text match
     • All instances, regardless of the text
       match
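
The three scopes above can be read as three filters over the list of text mining results. The sketch below is one possible interpretation, not Domeo's code: `Scope`, `Result`, `curate` and `sample` are hypothetical names, the second term URI is a made-up placeholder, and "regardless of the text match" is taken to mean every instance tagged with the same term.

```python
from dataclasses import dataclass
from enum import Enum


class Scope(Enum):
    ONE_ITEM = "one item only"
    SAME_TEXT = "all instances with the same text match"
    SAME_TERM = "all instances, regardless of the text match"


@dataclass
class Result:
    """One text mining hit awaiting curation (hypothetical record)."""
    id: int
    exact: str       # the matched text
    term_uri: str    # the recognized ontology term
    status: str = "pending"


def curate(results, target_id, status, scope):
    """Apply a curation decision to the target and, depending on scope, its siblings."""
    target = next(r for r in results if r.id == target_id)
    for r in results:
        same_term = r.term_uri == target.term_uri
        if scope is Scope.ONE_ITEM:
            selected = r.id == target.id
        elif scope is Scope.SAME_TEXT:
            selected = same_term and r.exact == target.exact
        else:  # Scope.SAME_TERM
            selected = same_term
        if selected:
            r.status = status
    return results


def sample():
    """Fresh sample data; the second URI is a placeholder, not a real term."""
    app = "http://purl.obolibrary.org/obo/PR_000004168"
    other = "urn:example:term:other"
    return [
        Result(1, "APP", app),
        Result(2, "APP", app),
        Result(3, "amyloid beta A4 protein", app),
        Result(4, "tau", other),
    ]


for scope in Scope:
    statuses = [r.status for r in curate(sample(), 1, "rejected", scope)]
    print(scope.value, statuses)
```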




Serialization in AO/RDF (Share)




UIMA, Clerezza and AO
      http://www.slideshare.net/paolociccarese/domeo-and-text-mining

      [Diagram labels, left to right: Text Mining Results; AO RDF; Curated Text Mining Results; AO RDF; Applications, Publishing]
      [Diagram labels, top right: Evaluating Performance, Comparing Algorithms, Learning, …]
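
Once curated results are available, "Evaluating Performance" and "Comparing Algorithms" can start from something as simple as the share of each algorithm's hits that curators accepted. The sketch below assumes a hypothetical record layout with `algorithm` and `status` fields; the function name and the sample data are made up for illustration.

```python
from collections import defaultdict


def precision_by_algorithm(curated):
    """Fraction of judged mentions that curators accepted, per algorithm."""
    accepted = defaultdict(int)
    judged = defaultdict(int)
    for item in curated:
        if item["status"] not in ("accepted", "rejected"):
            continue  # skip mentions nobody has curated yet
        judged[item["algorithm"]] += 1
        if item["status"] == "accepted":
            accepted[item["algorithm"]] += 1
    return {name: accepted[name] / judged[name] for name in judged}


# Made-up curation outcomes for two hypothetical algorithms.
curated = [
    {"algorithm": "A", "status": "accepted"},
    {"algorithm": "A", "status": "accepted"},
    {"algorithm": "A", "status": "rejected"},
    {"algorithm": "A", "status": "pending"},
    {"algorithm": "B", "status": "accepted"},
    {"algorithm": "B", "status": "rejected"},
]
scores = precision_by_algorithm(curated)
print(scores)
```

Curation of existing hits only supports a precision estimate; measuring recall would also require curators to add the mentions an algorithm missed.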






Thank you!
    Paolo Ciccarese, PhD
      http://www.paolociccarese.info/
         paolo.ciccarese@gmail.com



     Mass General Hospital               Harvard Medical School


Editor's Notes

  • #19 (Annotating a document fragment): the topic can be an antibody (NIF Antibody Registry)