INCOT.NET
René Deplanque | International Chemical Ontology Network
@The International Information Conference on Search, Data Mining and Visualization.
SUMMARY
• The final goal of this project is to build a system that is
available to all members of the network.
• It will be developed to make Big Data collections from various
fields of chemistry / pharmacology manageable using parent
ontologies.
• This will simplify and accelerate the development of an
innovative Industry 4.0 approach.
PAIN POINTS OF TODAY'S DATA COLLECTIONS:
Access Issues => problems with finding and/or getting access to data
Audience Issues => who is looking at data, how they perceive it, perspectives, language of discipline
Chemical Structure Representation Issues => what areas are problems - inorganic, organometallic, large molecules, mixtures, chiral centers
Community Issues => policies, procedures, and best practices we need to adopt to move things forward
Data Issues => standardization/interoperability, metadata, gaps, scale, sharing, and dark data
Ontology/Vocabulary Issues => consensus on terms, maintenance, versions, optimal vocabularies, areas where needed
Tools to Help Data/Metadata Capture Issues => adding metadata, feedback, consistency, synchronization
[Figure: Technology Trends of the Coming 3-5 Years (basis: 3,700 managers worldwide).
Technologies: Internet of Things, AI, 3D printing, virtual reality, cloud computing, social media, mobility, analytics, security.
Industries: energy / utilities, consumer goods, entertainment / media, administration, insurance, IT technologies, pharmaceuticals, production industries, trade, telecommunication, banks.
Legend: important, unchanged, unimportant.]
Source: Krallinger, M. et al. (2005) Text-mining approaches in molecular biology and biomedicine. DDT 10(6) 440
Ontology Defined
Google Definitions on the web
• An ontology is a controlled vocabulary that describes objects and the
relations between them in a formal way, and has a grammar for using the
vocabulary terms to express something meaningful within a specified domain
of interest. Source: members.optusnet.com.au/~webindexing/Webbook2Ed/glossary.htm
• Ontology is the newest label attached to some KOSs. Ontologies are being
developed as specific concept models by the Knowledge Management
community. They can represent complex relationships between objects, and
include the rules and axioms missing from semantic networks. Ontologies
that describe knowledge in a specific area are often connected with systems
for data mining and knowledge management.
Source: www.und.nodak.edu/dept/library/Departments/abc/SACSEM-SemInGlossary.htm
CREATING A COMPUTABLE CHEMICAL TAXONOMY
REQUIRES THREE KEY COMPONENTS:
A well-defined hierarchical taxonomic structure;
A dictionary of chemical classes (with full definitions
and category mappings); and
Computable rules or algorithms for assigning chemicals
to taxonomic categories.
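The three components above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the class names, definitions, and the representation of a compound as a set of functional-group labels are all hypothetical, and a real system would match rules against actual chemical structures rather than labels.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List

# Component 1: a hierarchical taxonomic structure (child -> parent links).
# The categories here are illustrative, not taken from any real taxonomy.
PARENT = {
    "organic molecules": "molecules",
    "carboxylic acids": "organic molecules",
    "amines": "organic molecules",
}


@dataclass(frozen=True)
class ChemicalClass:
    name: str                                # category mapping into PARENT
    definition: str                          # human-readable definition
    rule: Callable[[FrozenSet[str]], bool]   # component 3: computable rule


# Component 2: a dictionary of chemical classes with definitions.
# Compounds are modelled as sets of functional-group labels (an assumption).
DICTIONARY = {
    "carboxylic acids": ChemicalClass(
        "carboxylic acids",
        "Compounds containing a carboxyl group.",
        lambda groups: "CO2H" in groups,
    ),
    "amines": ChemicalClass(
        "amines",
        "Compounds containing an amino group.",
        lambda groups: "NH2" in groups,
    ),
}


def ancestors(name: str) -> List[str]:
    """Walk the hierarchy upwards from a category to the root."""
    chain = []
    while name in PARENT:
        name = PARENT[name]
        chain.append(name)
    return chain


def classify(groups) -> List[str]:
    """Assign every matching class plus all of its ancestor categories."""
    groups = frozenset(groups)
    assigned = set()
    for cls in DICTIONARY.values():
        if cls.rule(groups):
            assigned.add(cls.name)
            assigned.update(ancestors(cls.name))
    return sorted(assigned)


if __name__ == "__main__":
    # A toy amino acid: carries both a carboxyl and an amino group.
    print(classify({"CO2H", "NH2"}))
```

Keeping the hierarchy, the dictionary, and the rules as separate pieces means each can be maintained and versioned independently.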
Semantic Web
The Semantic Web "layer cake" as presented by Tim Berners-Lee.
Source: Hendler, J. (2001) Agents and the semantic web. http://www.cs.umd.edu/users/hendler/AgentWeb.html
KNOWN CLASSIFICATION SYSTEMS OF
CHEMICAL SUBSTANCES
• Classification as defined by EU regulations
  • Regulation (EC) No 1272/2008 on classification, labelling and packaging of
    substances and mixtures (the 'CLP Regulation').
• Classification as defined by UBA (Germany's environmental protection
  agency)
  • These criteria and limiting values help to determine hazardous physical-chemical
    properties as well as health and environmental hazards.
• The Anatomical Therapeutic Chemical Classification System (ATC/DDD) of
  the World Health Organization (WHO)
  • The purpose of the ATC/DDD system is to serve as a tool for drug utilization research
    in order to improve the quality of drug use.
• Guidance on the Classification of Hazardous Chemicals under
  the WHS Regulations
  • This Guidance is intended for manufacturers and importers of
    substances, mixtures and articles who have a duty under the Work
    Health and Safety (WHS) Act and Regulations to classify them.
  • Classification is based on the Globally Harmonised System of
    Classification and Labelling of Chemicals (the GHS).
  • The WHS Regulations also implement the harmonised hazard
    communication elements of the GHS that are to appear on labels and
    safety data sheets (SDS).
• The Chemical Fragmentation Coding system
  • It was developed in 1963 by the Derwent World Patent Index (DWPI) to
    facilitate the manual classification of chemical compounds reported in
    patents.
  • The system consists of 2200 numerical codes corresponding to a set of
    pre-defined, chemically significant structure fragments.
Tools for developing Chemical Ontologies
• HOSE (Hierarchical Organisation of Spherical Environments) code.
  • This hierarchical substructure system allows one to automatically
    characterize atoms and complete rings in terms of their spherical
    environment.
• Chemical Ontology (CO) system
  • Inspired by the Gene Ontology (GO), CO was one of the first open-source,
    automated functional group ontologies to be formalized.
  • CO functional groups can be automatically assigned to a given structure
    by Checkmol, a freely available program. CO's assignment of functional
    groups is accurate and consistent, and it has been applied to several
    small datasets.
  • However, the CO system is limited to just ~200 chemical groups.
• SODIAC tool for automatic compound classification.
  • It uses a comprehensive chemical ontology and an elegant structure-based
    reasoning logic.
  • The underlying chemical ontology can be freely downloaded, and the
    SODIAC software, which is closed-source, is free for academics.
WHAT ARE THE MAJOR PROBLEMS
➢ In contrast to biology, geology, and many other scientific
disciplines, the world of chemistry still lacks a standardized
chemical ontology or taxonomy.
➢ The chemical classification of a compound could help predict its
metabolic fate in humans, its druggability, or potential hazards
associated with it.
➢ The sheer number (tens of millions of compounds) and complexity
of chemical structures mean that any manual classification effort
would be nearly impossible.
FRAGMENT OF CHEMICAL ONTOLOGY
[Figure: structure of morphine placed in two IsA hierarchies, each listed from general to specific.]
grouped_by_chemistry:
  two-ring heterocyclic compounds
    isoquinolines
      isoquinoline alkaloids
        morphinans
          morphine
molecules:
  organic molecules
    heterocyclic compounds
      bridged-ring heterocyclic compounds
        morphinans
          morphine
morphine IsA morphinan
Source: Ennis, M. (2004) ChEBI A Dictionary of Chemical Entities with an Associated Ontology.
SOFG-2, Philadelphia, October 23-26 2004
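A minimal sketch of how such an IsA fragment could be queried. The parent links below are an assumption read off the order in which the terms are listed in the fragment above (general to specific); they are not copied from ChEBI itself, and the function names are hypothetical.

```python
# IsA links: each term maps to its direct parents.
# Assumption: the listing order in the fragment runs from general to specific.
IS_A = {
    "morphine": ["morphinans"],
    "morphinans": [
        "isoquinoline alkaloids",
        "bridged-ring heterocyclic compounds",
    ],
    "isoquinoline alkaloids": ["isoquinolines"],
    "isoquinolines": ["two-ring heterocyclic compounds"],
    "bridged-ring heterocyclic compounds": ["heterocyclic compounds"],
    "heterocyclic compounds": ["organic molecules"],
    "organic molecules": ["molecules"],
}


def all_superclasses(term):
    """Collect every class reachable from `term` by following IsA links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in IS_A.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def is_a(term, candidate):
    """IsA is transitive: true if `candidate` is any ancestor of `term`."""
    return candidate in all_superclasses(term)


if __name__ == "__main__":
    print(is_a("morphine", "molecules"))      # reached via the second hierarchy
    print(is_a("morphine", "isoquinolines"))  # reached via the first hierarchy
    print(is_a("isoquinolines", "morphine"))  # IsA is not symmetric
```

Because a term may have more than one parent, the structure is a directed acyclic graph rather than a tree, which is why the traversal tracks the terms it has already seen.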
[Figure: amino acid structures (neutral CO2H forms and CO2¯ forms) grouped under the classes Amino acid, L-Amino acid and D-Amino acid, and linked by the relationship types is_a, is_part_of, is_enantiomer_of, is_conjugate_base_of and is_tautomer_of.]
Source: Ennis, M. (2004) ChEBI A Dictionary of Chemical
Entities with an Associated Ontology. SOFG-2,
Philadelphia, October 23-26 2004
AUTOMATED CHEMICAL
CLASSIFICATION SYSTEM
However, as in PubChem, the annotation is incomplete. Class assignments to “clavams” and
“azetidines”, among others, are missing
ONTOLOGY ENGINEERING
HOW TO WORK WITH ONTOLOGIES
Michael Büttner Ontology Learning
THE ONTOLOGY DEVELOPMENT PROCESS
Michael Büttner Ontology Learning
WHAT DO WE HAVE - WHAT DO WE NEED
➢ Chemists have a standardized nomenclature (IUPAC, CAS,
Reaxys)
➢ Chemists have standardized methods for drawing or
exchanging chemical structures
➢ Chemistry still lacks a standardized, comprehensive, and
clearly defined chemical taxonomy or chemical ontology
WHAT WAS DONE
➢ Chemists have developed domain-specific ontologies
➢ Medicinal chemists classify according to pharmaceutical
activities (antibacterial, antihypertensive)
➢ Biochemists classify according to biosynthetic origin
(nucleic acids, terpenoids)
➢ These classifications do not fit together
➢ In the PubChem database only 0.12% of the >91,000,000 compounds (as
of June 2016) are classified via the MeSH thesaurus
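To put the last figure in absolute terms, here is a quick back-of-the-envelope calculation. It assumes exactly 91,000,000 compounds, whereas the slide gives this only as a lower bound, so the results are approximate.

```python
# Figures quoted on the slide; the total is a lower bound (">91,000,000").
TOTAL_COMPOUNDS = 91_000_000
CLASSIFIED_PER_10_000 = 12  # 0.12 % expressed as 12 per 10,000

# Integer arithmetic avoids floating-point rounding noise.
classified = TOTAL_COMPOUNDS * CLASSIFIED_PER_10_000 // 10_000
unclassified = TOTAL_COMPOUNDS - classified

print(f"{classified:,} classified, {unclassified:,} unclassified")
```

Under that assumption, only about 109,200 compounds carry a MeSH classification, leaving roughly 90.9 million without one.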
WHO AND WHAT IS INCOT.NET
• The problem of defining overlapping ontologies is of such
  a magnitude that it cannot be solved by a single
  organisation.
• INCOT.NET is an organisation based on an idea, need and
  interest of major chemical companies.
• It is organized as an independent partnership.
• It is attempting to coordinate a large variety of
  organisations to solve major pre-production problems.
• One of the prototype problems will be the use of
  ontologies in the development of new methodologies for
  the development of new antibiotics.
Thank you for your patience
you will need it for your future

II-SDV 2017: The "International Chemical Ontology Network"
