Day 4: KNIME Tutorial

George Papadatos, PhD
Francis Atkinson, PhD
ChEMBL group
  
Outline

•  Introduction to KNIME
•  Basic components
        •  Desktop, nodes, dialogs, workflows
•  Exercise
        •  Compound selection for focused screening
             •  Read chemical data
             •  Calculate properties
             •  Apply drug- and lead-likeness filters
             •  Remove “nasty” compounds
             •  Pick diverse molecules
             •  Visualize results and plot properties

05/07/2012, Resources for Computational Drug Design
  
What is KNIME?

•  KNIME = Konstanz Information Miner
•  Developed at the University of Konstanz in Germany
•  Desktop version available free of charge (open source)
•  Modular platform for building and executing workflows using predefined components, called nodes
•  Core functionality available for tasks such as standard data mining, analysis and manipulation
•  Extra features and functionality available in KNIME through extensions from various groups and vendors
•  Written in Java, based on the Eclipse SDK platform
  
KNIME resources

•  Web pages (documentation)
        •  www.knime.org | tech.knime.org | tech.knime.org/installation-0
•  Downloads
        •  knime.org/download-desktop
•  Community forum
        •  tech.knime.org/forum
•  KNIME User Training Manual
•  Books and white papers
        •  knime.org/node/33079
•  Myself
        •  georgep@ebi.ac.uk
  
What can you do with KNIME?

•  Data manipulation and analysis
        •  File & database I/O, sorting, filtering, grouping, joining, pivoting
•  Data mining / machine learning
        •  R, WEKA, interactive plotting
•  Chemoinformatics
        •  Conversions, similarity, clustering, (Q)SAR analysis, reaction enumeration
•  Scripting integration
        •  R, Perl, Python, Matlab, Octave, Groovy
•  Reporting
•  Much more
        •  Bioinformatics, image analysis, network & text mining
  
Community contributions

•  http://tech.knime.org/community
•  Chemoinformatics
        •  CDK (EBI), RDKit (Novartis), Indigo (GGA), Erl Wood (Eli Lilly), Enalos (NovaMechanics)
•  Bioinformatics
        •  HCS (MPI), NGS (Konstanz)
•  Text mining
        •  Palladian
•  Integration
        •  Python, Perl, R, Groovy, Matlab (MPI), PDB web services client (Vernalis)
  
Installation & updates

•  Download and unzip KNIME
        •  No further setup required
        •  Additional nodes after first launch
        •  knime.ini contains arguments & parameters for launch
•  New software (nodes) from update sites
        •  http://tech.knime.org/update/community-contributions/release
•  Workflows and data are stored in a workspace
        •  /Users/georgep/knime/workspace_mac_new
        •  C:\knime_2.5.4\workspace
•  Customization in: File → Preferences → KNIME
  
KNIME Workbench

[Figure: annotated screenshot of the KNIME workbench. Labelled areas: toolbar buttons (Auto-layout, Execute, Execute all nodes), tabs, workflow projects, favorite nodes, node repository, outline, workflow editor, node description, public server, console.]
  
KNIME nodes: Overview

Node = basic processing unit of a KNIME workflow, which performs a particular task

•  Title
•  Icon
•  Input port(s): on the left of the icon
•  Output port(s): on the right of the icon
•  Sequence number
•  Status display (‘traffic lights’)
        •  Red (not ready)
        •  Amber (ready)
        •  Green (executed)
        •  Blue bar during execution (with percentage or flashing)
•  Right-click menu
        •  To configure and execute the node, display the output views, edit the node, and display data for the ports
  
KNIME nodes: Dialogs

•  Double click to configure…
•  Configuration menus for selected nodes
•  Explicit column type
  
An example completed workflow

•  Workflows can be imported and exported as .zip files
        •  With or without the underlying data
        •  File → Import KNIME workflow…
        •  File → Export KNIME workflow…

Any questions so far?
  
Compound selection for focused screening

1.  Read chemical data
2.  Remove duplicates
        •  Identity ensured by InChI keys
3.  Filter out compounds in ChEMBL
        •  Identity ensured by InChI keys
4.  Calculate phys/chem properties
5.  Apply drug- and lead-likeness filters
6.  Apply more filters (e.g. remove solubility liabilities)
7.  Apply substructural filters (PAINS subset)
8.  Pick diverse molecules
  
Your objective

First steps - I

•  Locate the directory with today’s material
•  Copy and paste it to your desktop
        •  You can take it with you too
•  Open the presentation file
•  Import the FocusedScreeningSelection.zip to KNIME
        •  Menu → File → Import workflow to KNIME

First steps - II

•  Open a new workflow
        •  Right click on the workflow projects area
  
Part 1: Reading and cleaning up

SDF Reader

•  .\data\SMDC_cleaned.sdf

Inspect the structures…

•  Right click on the node

GroupBy
  
GroupBy Example

Input table:

Name      Course     Grade
George    German        68
George    Maths         86
George    Physics       99

Group by Name and then take first row:

Name      Course (first)    Grade (first)
George    German                       68

Group by Name and then average Grade:

Name      Grade (avg.)
George           84.33
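The two aggregations in the GroupBy example can be reproduced in a few lines of plain Python. This is only a sketch of the group-by idea, not how the KNIME GroupBy node is implemented; the helper names `group_first` and `group_mean` are made up for illustration.

```python
from collections import defaultdict

# Rows from the slide: (Name, Course, Grade)
rows = [
    ("George", "German", 68),
    ("George", "Maths", 86),
    ("George", "Physics", 99),
]


def group_first(rows):
    """Keep the first row seen for each Name (hypothetical helper)."""
    first = {}
    for name, course, grade in rows:
        # setdefault stores the row only if the name has not been seen yet
        first.setdefault(name, (course, grade))
    return first


def group_mean(rows):
    """Average Grade per Name, rounded to two decimals (hypothetical helper)."""
    grades = defaultdict(list)
    for name, _course, grade in rows:
        grades[name].append(grade)
    return {name: round(sum(g) / len(g), 2) for name, g in grades.items()}


print(group_first(rows))  # {'George': ('German', 68)}
print(group_mean(rows))   # {'George': 84.33}
```

Grouping on an InChI key column and keeping the first row of each group follows the same pattern, which is one way the duplicate removal in step 2 of the exercise can be expressed.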
  
File Reader

•  .\data\all_human_chembl.csv

Reference Row Filter
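The idea behind this step (step 3 of the exercise: drop compounds that are already in ChEMBL, matched by InChI key) can be sketched as a set-membership filter. The code below is an illustration in plain Python, not the node's implementation; the function name, the dictionary keys and the identifier strings are placeholders invented for the example.

```python
def reference_row_filter(rows, reference_keys, key="inchi_key", include=False):
    """Return the rows whose key is (include=True) or is not (include=False)
    present in the reference set. All names here are hypothetical."""
    reference = set(reference_keys)  # set lookup is O(1) per row
    return [row for row in rows if (row[key] in reference) == include]


# Placeholder identifiers, not real InChI keys
screening_set = [
    {"id": "cmpd-1", "inchi_key": "KEY-A"},
    {"id": "cmpd-2", "inchi_key": "KEY-B"},
    {"id": "cmpd-3", "inchi_key": "KEY-C"},
]
chembl_keys = ["KEY-B", "KEY-X"]

novel = reference_row_filter(screening_set, chembl_keys)
print([row["id"] for row in novel])  # ['cmpd-1', 'cmpd-3']
```

Passing `include=True` returns the overlap instead, which is handy for checking what was removed.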
  
Molecule to RDKit

Any questions so far?

Part 2: Property-based filtering

Descriptor Calculation
  
Java Snippet

•  .\code\Lipinski.txt
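The contents of Lipinski.txt are not reproduced in these slides. As a rough guide to what such a snippet might compute, here is a Python sketch that counts rule-of-five violations using the commonly quoted thresholds (molecular weight at most 500, logP at most 5, at most 5 hydrogen-bond donors, at most 10 acceptors). The function and argument names are assumptions, and the descriptor values are made up.

```python
def lipinski_violations(mol_weight, logp, h_donors, h_acceptors):
    """Count rule-of-five violations for one compound.

    Thresholds are the commonly quoted ones; the workshop's own
    Lipinski.txt may differ (assumption).
    """
    checks = [
        mol_weight > 500,   # molecular weight above 500 Da
        logp > 5,           # too lipophilic
        h_donors > 5,       # too many hydrogen-bond donors
        h_acceptors > 10,   # too many hydrogen-bond acceptors
    ]
    return sum(checks)


# Made-up descriptor values for two hypothetical compounds
print(lipinski_violations(342.4, 2.1, 2, 5))   # 0
print(lipinski_violations(612.8, 6.3, 3, 11))  # 3
```

Writing the count to a new numeric column lets a downstream node split the table on it.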
  
Numeric Row Splitter
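Conceptually, this node sends each row to one of two outputs depending on whether a numeric column falls inside a range. A minimal Python sketch of that behaviour follows; the function name, the column name `violations` and the bounds are illustrative assumptions, not the settings used in the workshop workflow.

```python
def numeric_row_splitter(rows, column, lower=None, upper=None):
    """Split rows into (inside, outside) by a closed numeric range.

    A bound of None means that side is unbounded. Hypothetical helper.
    """
    inside, outside = [], []
    for row in rows:
        value = row[column]
        ok = (lower is None or value >= lower) and (upper is None or value <= upper)
        (inside if ok else outside).append(row)
    return inside, outside


# Made-up violation counts
table = [
    {"id": "cmpd-1", "violations": 0},
    {"id": "cmpd-2", "violations": 2},
    {"id": "cmpd-3", "violations": 1},
]
passes, fails = numeric_row_splitter(table, "violations", upper=1)
print([r["id"] for r in passes])  # ['cmpd-1', 'cmpd-3']
print([r["id"] for r in fails])   # ['cmpd-2']
```

Keeping the rejected rows in a second output, rather than discarding them, is what makes it possible to inspect the fails afterwards.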
  
Inspect the Lipinski fails…

•  Right click on the node

Java Snippet

•  .\code\Oprea.txt

Numeric Row Splitter

Inspect the Oprea fails…

•  Right click on the node

Numeric Row Splitter

Inspect the Solubility fails…

•  Right click on the node

Any questions so far?

Part 3: Substructure-based filtering
  
Molecule to Indigo

File Reader

•  .\data\PAINS_clean_half.sdf

Query Molecule to Indigo

Inspect the SMARTS rules

Chunk Loop Start

Substructure Matcher

Loop End
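The three nodes above (Chunk Loop Start, Substructure Matcher, Loop End) form a loop: a table is processed one chunk at a time, each chunk is matched against the query patterns, and the per-chunk results are collected at the end. The sketch below mimics that control flow in Python. It does not perform real SMARTS matching; the `matches` predicate is a stand-in (a plain substring test on made-up strings), which table is chunked in the workshop workflow is not stated in the slides, and all names are hypothetical.

```python
def chunks(rows, size):
    """Yield consecutive slices of at most `size` rows (the loop start)."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def matches(structure, pattern):
    """Stand-in for a substructure test: plain substring search.
    A real workflow would use a cheminformatics toolkit here."""
    return pattern in structure


def chunked_match(structures, patterns, chunk_size):
    """Match every structure against every pattern, one chunk at a time,
    and collect the hits from all iterations (the loop end)."""
    hits = []
    for chunk in chunks(structures, chunk_size):
        for structure in chunk:
            for pattern in patterns:
                if matches(structure, pattern):
                    hits.append((structure, pattern))
    return hits


# Made-up strings standing in for structures and query patterns
structures = ["AAB", "BCC", "CAD", "DDD", "ABC"]
patterns = ["AB", "DD"]
print(chunked_match(structures, patterns, chunk_size=2))
# [('AAB', 'AB'), ('DDD', 'DD'), ('ABC', 'AB')]
```

Chunking does not change the result; it only bounds how much data each iteration handles at once.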
  
Inspect matched structures…

•  Right click on the node

Reference Row Filter

Any questions so far?

Part 4: Diversity picking and plotting

RDKit Fingerprint

Inspect the fingerprints…

•  Right click on the node

RDKit Diversity Picker
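One common way to pick a diverse subset from fingerprints is the MaxMin heuristic: repeatedly add the candidate whose nearest already-picked neighbour is farthest away, with distance taken as 1 minus the Tanimoto similarity. The sketch below is a plain-Python illustration of that heuristic under assumed inputs (fingerprints as sets of on-bit indices). It is not the code behind the RDKit Diversity Picker node, and the function names and the choice of the first pick are assumptions.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def maxmin_pick(fingerprints, n_pick, first=0):
    """Greedy MaxMin selection; returns indices of the picked fingerprints.

    Starts from index `first` (an arbitrary choice made for this sketch).
    """
    picked = [first]
    # Distance from every candidate to its nearest picked fingerprint
    nearest = [1.0 - tanimoto(fp, fingerprints[first]) for fp in fingerprints]
    while len(picked) < min(n_pick, len(fingerprints)):
        # Candidate that is farthest from everything picked so far
        best = max(
            (i for i in range(len(fingerprints)) if i not in picked),
            key=lambda i: nearest[i],
        )
        picked.append(best)
        for i, fp in enumerate(fingerprints):
            d = 1.0 - tanimoto(fp, fingerprints[best])
            nearest[i] = min(nearest[i], d)
    return picked


# Made-up fingerprints: 0 and 1 are near-duplicates, 2 and 3 are distinct
fps = [{1, 2, 3, 4}, {1, 2, 3, 5}, {10, 11, 12}, {1, 2, 20, 21}]
print(maxmin_pick(fps, 3))  # [0, 2, 3]
```

In this toy input the near-duplicate of the starting fingerprint (index 1) is picked last, which is the behaviour one wants from a diversity selection.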
  
2D/3D Scatterplot

Inspect the plot…

•  Right click on the node

Any questions so far?
  
Conclusions

•  Compound selection for focused screening
        •  Theory and practice
        •  Typical scenario
•  KNIME
        •  Open and free
        •  Chemoinformatics toolkits
             •  Erl Wood, RDKit and Indigo
        •  Not perfect
  
Further reading

•  Open data and tools

1.       Irwin, J. J.; Sterling, T.; Mysinger, M. M.; Bolstad, E. S.; Coleman, R. G., ZINC:
         A free tool to discover chemistry for biology. Journal of Chemical Information
         and Modeling 2012 ASAP.
2.       Saubern, S.; Guha, R.; Baell, J. B., KNIME workflow to assess PAINS filters in
         SMARTS format. Comparison of RDKit and Indigo cheminformatics libraries.
         Molecular Informatics 2011, 30, (10), 847-850.
3.       Barnes, M. R.; Harland, L.; Foord, S. M.; Hall, M. D.; Dix, I.; Thomas, S.;
         Williams-Jones, B. I.; Brouwer, C. R., Lowering industry firewalls: pre-
         competitive informatics initiatives in drug discovery. Nature Reviews Drug
         Discovery 2009, 8, (9), 701-708.
4.       Berthold, M. R.; Cebron, N.; Dill, F.; Gabriel, T. R.; Kötter, T.; Meinl, T.; Ohl, P.;
         Sieb, C.; Thiel, K.; Wiswedel, B., KNIME: The Konstanz Information Miner. In
         Data Analysis, Machine Learning and Applications, Preisach, C.; Burkhardt, H.;
         Schmidt-Thieme, L.; Decker, R., Eds. Springer: Berlin, 2008; pp 319-326.
5.       Tiwari, A.; Sekhar, A. K. T., Workflow based framework for life science
         informatics. Computational Biology and Chemistry 2007, 31, (5-6), 305-319.



Further reading

•  High throughput screening
1.  Bajorath, J., Integration of virtual and high-throughput screening. Nature
    Reviews Drug Discovery 2002, 1, (11), 882-894.
2.  Harper, G.; Pickett, S. D.; Green, D. V. S., Design of a compound
    screening collection for use in High Throughput Screening. Combinatorial
    Chemistry & High Throughput Screening 2004, 7, (1), 63-70.
•  Lead- and drug-likeness
1.  Chuprina, A.; Lukin, O.; Demoiseaux, R.; Buzko, A.; Shivanyuk, A., Drug- and
    lead-likeness, target class, and molecular diversity analysis of 7.9 million
    commercially available organic compounds provided by 29 suppliers. Journal of
    Chemical Information and Modeling 2010, 50, (4), 470-479.
2.  Lipinski, C. A., Lead- and drug-like compounds: the rule-of-five revolution. Drug
    Discovery Today: Technologies 2004, 1, (4), 337-341.
3.  Oprea, T. I.; Davis, A. M.; Teague, S. J.; Leeson, P. D., Is there a difference
    between leads and drugs? A historical perspective. Journal of Chemical
    Information and Computer Sciences 2001, 41, (5), 1308-1315.



Further reading

•  Physicochemical properties and drug discovery
1.  Brüstle, M.; Beck, B.; Schindler, T.; King, W.; Mitchell, T.; Clark, T., Descriptors,
    physical properties, and drug-likeness. Journal of Medicinal Chemistry 2002, 45,
    (16), 3345-3355.
2.  Hill, A. P.; Young, R. J., Getting physical in drug discovery: A contemporary
    perspective on solubility and hydrophobicity. Drug Discovery Today 2010, 15,
    (15/16), 648-655.
3.  Leeson, P. D.; Springthorpe, B., The influence of drug-like concepts on decision-
    making in medicinal chemistry. Nature Reviews Drug Discovery 2007, 6, (11),
    881-890.

•  Structural alerts in HTS
1.  Baell, J. B.; Holloway, G. A., New substructure filters for removal of Pan Assay
    Interference Compounds (PAINS) from screening libraries and for their exclusion in
    bioassays. Journal of Medicinal Chemistry 2010, 53, (7), 2719-2740.
2.  Rishton, G. M., Reactive compounds and in vitro false positives in HTS. Drug
    Discovery Today 1997, 2, (9), 382-384.



Further reading

•  Similarity and diversity

1.       Ashton, M.; Barnard, J.; Casset, F.; Charlton, M.; Downs, G.; Gorse, D.; Holliday,
         J.; Lahana, R.; Willett, P., Identification of diverse database subsets using
         property-based and fragment-based molecular descriptions. Quantitative
         Structure-Activity Relationships 2002, 21, (6), 598-604.
2.       Bender, A.; Glen, R. C., Molecular similarity: a key technique in molecular
         informatics. Organic and Biomolecular Chemistry 2004, 2, 3204-3218.
3.       Gorse, A.-D., Diversity in medicinal chemistry space. Current Topics in Medicinal
         Chemistry 2006, 6, (1), 3-18.
4.       Maldonado, A.; Doucet, J.; Petitjean, M.; Fan, B.-T., Molecular similarity and
         diversity in chemoinformatics: From theory to applications. Molecular Diversity
         2006, 10, (1), 39-79.
5.       Rogers, D.; Hahn, M., Extended-connectivity fingerprints. Journal of Chemical
         Information and Modeling 2010, 50, (5), 742-754.
6.       Schuffenhauer, A.; Brown, N., Chemical diversity and biological activity. Drug
         Discovery Today: Technologies 2006, 3, (4), 387-395.
7.       Willett, P.; Barnard, J. M.; Downs, G. M., Chemical similarity searching. Journal
         of Chemical Information and Computer Sciences 1998, 38, (6), 983-996.


More Related Content

What's hot (20)

PDF
Automating the Cloud with Terraform, and Ansible
Brian Hogan
 
PDF
Data Visualization(s) Using Python
Aniket Maithani
 
PPTX
Support Vector Machine ppt presentation
AyanaRukasar
 
PDF
Anomaly detection
Hitesh Mohapatra
 
PDF
Pegasus
Hangil Kim
 
PDF
A Universe of Knowledge Graphs
Neo4j
 
PDF
Machine Learning in 5 Minutes— Classification
Brian Lange
 
PPTX
Screw DevOps, Let's Talk DataOps
Kellyn Pot'Vin-Gorman
 
PDF
VictoriaLogs: Open Source Log Management System - Preview
VictoriaMetrics
 
PDF
Predictive Maintenance Systems, Technologies & Equipment Management Softwares...
proudmehta
 
PDF
MLOps Bridging the gap between Data Scientists and Ops.
Knoldus Inc.
 
PPTX
Natural language processing and transformer models
Ding Li
 
PDF
Grafana overview deck - Tech - 2023 May v1.pdf
BillySin5
 
PDF
Databricks Overview for MLOps
Databricks
 
PPTX
K Means Clustering Algorithm | K Means Clustering Example | Machine Learning ...
Simplilearn
 
PDF
Unsupervised Anomaly Detection with Isolation Forest - Elena Sharova
PyData
 
PPTX
[Paper review] BERT
JEE HYUN PARK
 
PDF
Introduction to MLflow
Databricks
 
PDF
Dmitry Spodarets: Modern MLOps toolchain 2023
Lviv Startup Club
 
PPTX
MLOps - The Assembly Line of ML
Jordan Birdsell
 
Automating the Cloud with Terraform, and Ansible
Brian Hogan
 
Data Visualization(s) Using Python
Aniket Maithani
 
Support Vector Machine ppt presentation
AyanaRukasar
 
Anomaly detection
Hitesh Mohapatra
 
Pegasus
Hangil Kim
 
A Universe of Knowledge Graphs
Neo4j
 
Machine Learning in 5 Minutes— Classification
Brian Lange
 
Screw DevOps, Let's Talk DataOps
Kellyn Pot'Vin-Gorman
 
VictoriaLogs: Open Source Log Management System - Preview
VictoriaMetrics
 
Predictive Maintenance Systems, Technologies & Equipment Management Softwares...
proudmehta
 
MLOps Bridging the gap between Data Scientists and Ops.
Knoldus Inc.
 
Natural language processing and transformer models
Ding Li
 
Grafana overview deck - Tech - 2023 May v1.pdf
BillySin5
 
Databricks Overview for MLOps
Databricks
 
K Means Clustering Algorithm | K Means Clustering Example | Machine Learning ...
Simplilearn
 
Unsupervised Anomaly Detection with Isolation Forest - Elena Sharova
PyData
 
[Paper review] BERT
JEE HYUN PARK
 
Introduction to MLflow
Databricks
 
Dmitry Spodarets: Modern MLOps toolchain 2023
Lviv Startup Club
 
MLOps - The Assembly Line of ML
Jordan Birdsell
 

Viewers also liked (8)

PPT
Bootp and dhcp
Mohd Arif
 
PPT
Arp and rarp
Mohd Arif
 
PDF
V5I3_IJERTV5IS031157
ahmad abdelhafeez
 
PPTX
Firewall, Trusted Systems,IP Security ,ESP Encryption and Authentication
Gopal Sakarkar
 
PPTX
Trusted systems
ahmad abdelhafeez
 
PPTX
Computer Virus powerpoint presentation
shohrabkhan
 
PPT
Network management
Mohd Arif
 
PPTX
ANTIVIRUS AND VIRUS Powerpoint presentation
abhijit chintamani
 
Bootp and dhcp
Mohd Arif
 
Arp and rarp
Mohd Arif
 
V5I3_IJERTV5IS031157
ahmad abdelhafeez
 
Firewall, Trusted Systems,IP Security ,ESP Encryption and Authentication
Gopal Sakarkar
 
Trusted systems
ahmad abdelhafeez
 
Computer Virus powerpoint presentation
shohrabkhan
 
Network management
Mohd Arif
 
ANTIVIRUS AND VIRUS Powerpoint presentation
abhijit chintamani
 
Ad

Similar to KNIME tutorial (20)

PDF
KNIME For Data Analytics Course Overview
BakhtiarAmaludin
 
PPTX
KNIME_Introduction_panduan mengggunakan knimepptx
muhajirin10
 
PDF
Big Data with KNIME.pdf
James Vp
 
PPTX
Building an AI and ML Model Using KNIME and Python.pptx
ssuser448ad3
 
PPTX
201204quickstartguide
pluskjw
 
PPTX
Introduction of vertical crawler
Jinglun Li
 
PDF
NeuroDimension Neuro Solutions HELP
ESCOM
 
PDF
Knime & bioinformatics
BioinformaticsInstitute
 
PPTX
Gimias Architecture
CISTIB, The University of Sheffield
 
PDF
EUGM 2014 - Brock Luty (Dart Neuroscience): A ChemAxon/KNIME based tool for ...
ChemAxon
 
PDF
Code camp 2015 visual programming mm
Mitch Miller
 
PDF
What's New in KNIME Analytics Platform 4.1
KNIMESlides
 
PDF
Cyberinfrastructure Day 2010: Applications in Biocomputing
Jeremy Yang
 
PDF
UKLUG 2009 - Extending Domino Designer on Eclipse
René Winkelmeyer
 
PPTX
Exploring Chemical and Biological Knowledge Spaces with PubChem
Paul Thiessen
 
PPTX
KNIME in Life Science, Cheminformatics and Computational Chemistry
Girinath Pillai
 
PDF
Programming Without Coding Technology (PWCT) Environment
Mahmoud Samir Fayed
 
PDF
Programming Without Coding Technology (PWCT) - Telephone Database
Mahmoud Samir Fayed
 
PDF
WebGUI Developers Workshop
Plain Black Corporation
 
DOCX
Microstation Training Chapter-1 By Misson Choudhury.docx
Misson Choudhury
 
KNIME For Data Analytics Course Overview
BakhtiarAmaludin
 
KNIME_Introduction_panduan mengggunakan knimepptx
muhajirin10
 
Big Data with KNIME.pdf
James Vp
 
Building an AI and ML Model Using KNIME and Python.pptx
ssuser448ad3
 
201204quickstartguide
pluskjw
 
Introduction of vertical crawler
Jinglun Li
 
NeuroDimension Neuro Solutions HELP
ESCOM
 
Knime & bioinformatics
BioinformaticsInstitute
 
EUGM 2014 - Brock Luty (Dart Neuroscience): A ChemAxon/KNIME based tool for ...
ChemAxon
 
Code camp 2015 visual programming mm
Mitch Miller
 
What's New in KNIME Analytics Platform 4.1
KNIMESlides
 
Cyberinfrastructure Day 2010: Applications in Biocomputing
Jeremy Yang
 
UKLUG 2009 - Extending Domino Designer on Eclipse
René Winkelmeyer
 
Exploring Chemical and Biological Knowledge Spaces with PubChem
Paul Thiessen
 
KNIME in Life Science, Cheminformatics and Computational Chemistry
Girinath Pillai
 
Programming Without Coding Technology (PWCT) Environment
Mahmoud Samir Fayed
 
Programming Without Coding Technology (PWCT) - Telephone Database
Mahmoud Samir Fayed
 
WebGUI Developers Workshop
Plain Black Corporation
 
Microstation Training Chapter-1 By Misson Choudhury.docx
Misson Choudhury
 
Ad

Recently uploaded (20)

PDF
BUCAS supporting DOH 8 Health Priorities
pedrofamorca
 
PPT
Upfront Upgrades for Mantle Cell Lymphoma: Integrating the First-Line Use of ...
PVI, PeerView Institute for Medical Education
 
PPTX
UPDATE on NEWER MALARIA VACCINE.pptx
AshwaniSood12
 
PPTX
molecular pharmaceutics m pharm ppt.pptx
abdulubed0701
 
PPTX
2.5 Role of Nasal & Pharyngeal Cavity in Voice Production (aqsa mehsood).pptx
Aqsa Mehsood
 
DOCX
Why Inflammation Markers Are Reshaping Heart Disease Risk Assessment
Ram Gopal Varma
 
PPTX
Intra-uterine fetal circulation in obstetrics
aniyakhan948
 
PPTX
Beyond Compliance Embracing Quality by Design (QbD) for Next-Generation Pharm...
Dr. Smita Kumbhar
 
PPTX
Therapeutic-Diet-PPT. for B.sc nursing 2nd sem
Rekhanjali Gupta
 
PPTX
CHILD PSYCHIATRY. CAUSES , PREVENTION.pptx ABHILAL KURUPALATH
Abhilal Kurupalath
 
PPTX
maternal pelvis and it's diameters in obstetrics
aniyakhan948
 
PPTX

KNIME tutorial

  • 1. Day 4: KNIME Tutorial. George Papadatos, PhD; Francis Atkinson, PhD; ChEMBL group
  • 2. Outline
      •  Introduction to KNIME
      •  Basic components
          •  Desktop, nodes, dialogs, workflows
      •  Exercise
          •  Compound selection for focused screening
              •  Read chemical data
              •  Calculate properties
              •  Apply drug- and lead-likeness filters
              •  Remove "nasty" compounds
              •  Pick diverse molecules
              •  Visualize results and plot properties
      05/07/2012   Resources for Computational Drug Design
  • 3. What is KNIME?
      •  KNIME = Konstanz Information Miner
      •  Developed at University of Konstanz in Germany
      •  Desktop version available free of charge (Open Source)
      •  Modular platform for building and executing workflows using predefined components, called nodes
      •  Core functionality available for tasks such as standard data mining, analysis and manipulation
      •  Extra features and functionality available in KNIME through extensions from various groups and vendors
      •  Written in Java based on the Eclipse SDK platform
  • 4. KNIME resources
      •  Web pages (documentation)
          •  www.knime.org | tech.knime.org | tech.knime.org/installation-0
      •  Downloads
          •  knime.org/download-desktop
      •  Community forum
          •  tech.knime.org/forum
      •  KNIME User Training Manual
      •  Books and white papers
          •  knime.org/node/33079
      •  Myself
          •  [email protected]
  • 5. What can you do with KNIME?
      •  Data manipulation and analysis
          •  File & database I/O, sorting, filtering, grouping, joining, pivoting
      •  Data mining / machine learning
          •  R, WEKA, interactive plotting
      •  Chemoinformatics
          •  Conversions, similarity, clustering, (Q)SAR analysis, reaction enumeration
      •  Scripting integration
          •  R, Perl, Python, Matlab, Octave, Groovy
      •  Reporting
      •  Much more
          •  Bioinformatics, image analysis, network & text mining
  • 6. Community contributions
      •  http://tech.knime.org/community
      •  Chemoinformatics
          •  CDK (EBI), RDKit (Novartis), Indigo (GGA), ErlWood (Eli Lilly), Enalos (NovaMechanics)
      •  Bioinformatics
          •  HCS (MPI), NGS (Konstanz)
      •  Text mining
          •  Palladian
      •  Integration
          •  Python, Perl, R, Groovy, Matlab (MPI), PDB web services client (Vernalis)
  • 7. Installation & updates
      •  Download and unzip KNIME
          •  No further setup required
          •  Additional nodes after first launch
          •  knime.ini contains arguments & parameters for launch
      •  New software (nodes) from update sites
          •  http://tech.knime.org/update/community-contributions/release
      •  Workflows and data are stored in a workspace
          •  /Users/georgep/knime/workspace_mac_new
          •  C:\knime_2.5.4\workspace
      •  Customization in: File → Preferences → KNIME
  • 8. KNIME Workbench (annotated screenshot; labels: auto-layout, execute, execute all nodes, node description, tabs, workflow projects, favorite nodes, public server, workflow editor, node repository, outline, console)
  • 9. KNIME nodes: Overview
      Node = basic processing unit of a KNIME workflow, which performs a particular task
      •  Title
      •  Icon
      •  Sequence number
      •  Input port(s): on the left of the icon
      •  Output port(s): on the right of the icon
      •  Status display ('traffic lights')
          •  Red (not ready)
          •  Amber (ready)
          •  Green (executed)
          •  Blue bar during execution (with percentage or flashing)
      •  Right-click menu: to configure and execute the node, display the output views, edit the node, and display data for the ports
  • 10. KNIME nodes: Dialogs
      •  Double click to configure…
      •  Configuration menus for selected nodes
      •  Explicit column type
  • 11. An example completed workflow
      •  Workflows can be imported and exported as .zip files
          •  With or without the underlying data
          •  File → Import KNIME workflow…
          •  File → Export KNIME workflow…
  • 12. Any questions so far?
  • 13. Compound selection for focused screening
      1.  Read chemical data
      2.  Remove duplicates
          •  Identity ensured by InChI keys
      3.  Filter out compounds in ChEMBL
          •  Identity ensured by InChI keys
      4.  Calculate phys/chem properties
      5.  Apply drug- and lead-likeness filters
      6.  Apply more filters (e.g. remove solubility liabilities)
      7.  Apply substructural filters (PAINS subset)
      8.  Pick diverse molecules
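Steps 2 and 3 above both reduce to set operations on an identifier column. The following is a minimal plain-Python sketch of that idea, not the KNIME nodes themselves: the column name `inchi_key`, the compound names and the placeholder keys are all hypothetical, and real InChIKeys are 27-character hashes.

```python
def remove_duplicates(rows, key="inchi_key"):
    """Keep the first row seen for each key value."""
    seen = set()
    unique = []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            unique.append(row)
    return unique


def exclude_reference(rows, reference_keys, key="inchi_key"):
    """Drop every row whose key also appears in the reference collection."""
    reference = set(reference_keys)
    return [row for row in rows if row[key] not in reference]


# Hypothetical input: placeholder strings stand in for real InChIKeys.
vendor = [
    {"name": "cpd-1", "inchi_key": "KEY-A"},
    {"name": "cpd-2", "inchi_key": "KEY-B"},
    {"name": "cpd-3", "inchi_key": "KEY-A"},  # duplicate of cpd-1
    {"name": "cpd-4", "inchi_key": "KEY-C"},
]
chembl_keys = ["KEY-B"]  # hypothetical set of keys already in ChEMBL

unique = remove_duplicates(vendor)
novel = exclude_reference(unique, chembl_keys)
print([row["name"] for row in novel])
```

In the workflow, the same two operations are carried out by the GroupBy and Reference Row Filter nodes introduced in Part 1.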
  • 14. Your objective
  • 15. First steps - I
      •  Locate the directory with today's material
          •  Copy and paste it to your desktop
          •  You can take it with you too
      •  Open the presentation file
      •  Import the FocusedScreeningSelection.zip to KNIME
          •  Menu → File → Import workflow to KNIME
  • 16. First steps - II
      •  Open a new workflow
          •  Right click on the workflow projects area
  • 17. Part 1: Reading and cleaning up
  • 18. SDF Reader: .\data\SMDC_cleaned.sdf
  • 19. Inspect the structures… Right click on the node
  • 20. GroupBy
  • 21. GroupBy Example
      Input table:
          Name     Course    Grade
          George   German    68
          George   Maths     86
          George   Physics   99
      Group by Name and then take first row:
          Name     Course (first)   Grade (first)
          George   German           68
      Group by Name and then average Grade:
          Name     Grade (avg.)
          George   84.33
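The two aggregations in this example can be reproduced in a few lines of plain Python. This is only an illustration of what "group, then aggregate" means, using the table from the slide; it is not how the GroupBy node is implemented.

```python
from collections import defaultdict
from statistics import mean

rows = [
    {"Name": "George", "Course": "German", "Grade": 68},
    {"Name": "George", "Course": "Maths", "Grade": 86},
    {"Name": "George", "Course": "Physics", "Grade": 99},
]

# Collect the rows that share the same value in the grouping column.
groups = defaultdict(list)
for row in rows:
    groups[row["Name"]].append(row)

# Aggregation 1: keep the first row of each group.
first_rows = {name: members[0] for name, members in groups.items()}

# Aggregation 2: average the Grade column within each group.
avg_grades = {
    name: round(mean(m["Grade"] for m in members), 2)
    for name, members in groups.items()
}

print(first_rows["George"])
print(avg_grades["George"])
```

Taking the first row per group is what makes GroupBy usable for duplicate removal: grouping on the InChI key column and keeping the first row leaves one record per structure.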
  • 22. File Reader: .\data\all_human_chembl.csv
  • 23. Reference Row Filter
  • 24. Molecule to RDKit
  • 25. Any questions so far?
  • 26. Part 2: Property-based filtering
  • 27. Descriptor Calculation
  • 28. Java Snippet: .\code\Lipinski.txt
  • 29. Numeric Row Splitter
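The contents of Lipinski.txt are not reproduced in this transcript, so the following is only a sketch of the kind of logic such a snippet plus a row splitter would apply. It uses the standard rule-of-five thresholds (molecular weight over 500, logP over 5, more than 5 hydrogen-bond donors, more than 10 hydrogen-bond acceptors). The descriptor values, the column names and the cut-off of at most one violation are assumptions for illustration, and the sketch is in Python rather than the Java used inside the node.

```python
def lipinski_violations(mol_wt, logp, h_donors, h_acceptors):
    """Count how many of the four rule-of-five criteria are violated."""
    return sum([mol_wt > 500, logp > 5, h_donors > 5, h_acceptors > 10])


def split_rows(rows, column, upper_bound):
    """Split rows into (passed, failed) on a numeric upper bound."""
    passed = [r for r in rows if r[column] <= upper_bound]
    failed = [r for r in rows if r[column] > upper_bound]
    return passed, failed


# Hypothetical descriptor values, made up for this example.
compounds = [
    {"name": "cpd-1", "mw": 350.4, "logp": 2.1, "hbd": 2, "hba": 5},
    {"name": "cpd-2", "mw": 612.8, "logp": 6.3, "hbd": 3, "hba": 9},
    {"name": "cpd-3", "mw": 480.0, "logp": 5.4, "hbd": 1, "hba": 7},
]

for c in compounds:
    c["violations"] = lipinski_violations(c["mw"], c["logp"], c["hbd"], c["hba"])

# Assumed cut-off: allow at most one violation.
passed, failed = split_rows(compounds, "violations", upper_bound=1)
print([c["name"] for c in passed], [c["name"] for c in failed])
```

Keeping the violation count as its own column, rather than filtering directly, is what lets the failed rows be inspected afterwards.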
  • 30. Inspect the Lipinski fails… Right click on the node
  • 31. Java Snippet: .\code\Oprea.txt
  • 32. Numeric Row Splitter
  • 33. Inspect the Oprea fails… Right click on the node
  • 34. Numeric Row Splitter
  • 35. Inspect the Solubility fails… Right click on the node
  • 36. Any questions so far?
  • 37. Part 3: Substructure-based filtering
  • 38. Molecule to Indigo
  • 39. File reader: .\data\PAINS_clean_half.sdf
  • 40. Query Molecule to Indigo
  • 41. Inspect the SMARTS rules
  • 42. Chunk Loop Start
  • 43. Substructure Matcher
  • 44. Loop End
  • 45. Inspect matched structures… Right click on the node
  • 46. Reference Row Filter
  • 47. Any questions so far?
  • 48. Part 4: Diversity picking and plotting
  • 49. RDKit Fingerprint
  • 50. Inspect the fingerprints… Right click on the node
  • 51. RDKit Diversity Picker
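Diversity picking on fingerprints is commonly done with a MaxMin-style selection: repeatedly pick the molecule whose nearest already-picked neighbour is furthest away. The sketch below illustrates that idea in plain Python with Tanimoto distance. It assumes fingerprints are represented as sets of on-bit indices, uses made-up bit sets, and always starts from the first row; it is not a description of how the RDKit node is implemented or configured.

```python
def tanimoto_distance(a, b):
    """1 - Tanimoto similarity for two fingerprints given as sets of on-bits."""
    union = len(a | b)
    if union == 0:
        return 0.0
    return 1.0 - len(a & b) / union


def maxmin_pick(fingerprints, n_pick, first=0):
    """Greedy MaxMin selection; returns the indices of the picked rows."""
    picked = [first]
    # Distance from every row to its nearest picked row so far.
    min_dist = [tanimoto_distance(fp, fingerprints[first]) for fp in fingerprints]
    while len(picked) < min(n_pick, len(fingerprints)):
        remaining = (i for i in range(len(fingerprints)) if i not in picked)
        candidate = max(remaining, key=lambda i: min_dist[i])
        picked.append(candidate)
        for i, fp in enumerate(fingerprints):
            d = tanimoto_distance(fp, fingerprints[candidate])
            min_dist[i] = min(min_dist[i], d)
    return picked


# Hypothetical fingerprints: each set lists the indices of the bits that are on.
fps = [
    {1, 2, 3, 4},
    {1, 2, 3, 5},    # close to row 0
    {10, 11, 12},    # shares nothing with row 0
    {1, 2, 10, 11},  # partly overlaps rows 0 and 2
]

print(maxmin_pick(fps, 3))
```

Row 1 is the last to be chosen because it is nearly identical to the starting row, which is the behaviour wanted when assembling a diverse screening subset.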
  • 52. 2D/3D Scatterplot
  • 53. Inspect the plot… Right click on the node
  • 54. Any questions so far?
  • 55. Conclusions
      •  Compound selection for focused screening
          •  Theory and practice
          •  Typical scenario
      •  KNIME
          •  Open and free
      •  Chemoinformatics toolkits
          •  Erl Wood, RDKit and Indigo
          •  Not perfect
  • 56. Further reading
      •  Open data and tools
          1.  Irwin, J. J.; Sterling, T.; Mysinger, M. M.; Bolstad, E. S.; Coleman, R. G., ZINC: A free tool to discover chemistry for biology. Journal of Chemical Information and Modeling 2012 ASAP.
          2.  Saubern, S.; Guha, R.; Baell, J. B., KNIME workflow to assess PAINS filters in SMARTS format. Comparison of RDKit and Indigo cheminformatics libraries. Molecular Informatics 2011, 30, (10), 847-850.
          3.  Barnes, M. R.; Harland, L.; Foord, S. M.; Hall, M. D.; Dix, I.; Thomas, S.; Williams-Jones, B. I.; Brouwer, C. R., Lowering industry firewalls: pre-competitive informatics initiatives in drug discovery. Nature Reviews Drug Discovery 2009, 8, (9), 701-708.
          4.  Berthold, M. R.; Cebron, N.; Dill, F.; Gabriel, T. R.; Kötter, T.; Meinl, T.; Ohl, P.; Sieb, C.; Thiel, K.; Wiswedel, B., KNIME: The Konstanz Information Miner. In Data Analysis, Machine Learning and Applications, Preisach, C.; Burkhardt, H.; Schmidt-Thieme, L.; Decker, R., Eds. Springer: Berlin, 2008; pp 319-326.
          5.  Tiwari, A.; Sekhar, A. K. T., Workflow based framework for life science informatics. Computational Biology and Chemistry 2007, 31, (5-6), 305-319.
  • 57. Further reading
      •  High throughput screening
          1.  Bajorath, J., Integration of virtual and high-throughput screening. Nature Reviews Drug Discovery 2002, 1, (11), 882-894.
          2.  Harper, G.; Pickett, S. D.; Green, D. V. S., Design of a compound screening collection for use in High Throughput Screening. Combinatorial Chemistry & High Throughput Screening 2004, 7, (1), 63-70.
      •  Lead- and drug-likeness
          1.  Chuprina, A.; Lukin, O.; Demoiseaux, R.; Buzko, A.; Shivanyuk, A., Drug- and lead-likeness, target class, and molecular diversity analysis of 7.9 million commercially available organic compounds provided by 29 suppliers. Journal of Chemical Information and Modeling 2010, 50, (4), 470-479.
          2.  Lipinski, C. A., Lead- and drug-like compounds: the rule-of-five revolution. Drug Discovery Today: Technologies 2004, 1, (4), 337-341.
          3.  Oprea, T. I.; Davis, A. M.; Teague, S. J.; Leeson, P. D., Is there a difference between leads and drugs? A historical perspective. Journal of Chemical Information and Computer Sciences 2001, 41, (5), 1308-1315.
  • 58. Further reading
      •  Physicochemical properties and drug discovery
          1.  Brüstle, M.; Beck, B.; Schindler, T.; King, W.; Mitchell, T.; Clark, T., Descriptors, physical properties, and drug-likeness. Journal of Medicinal Chemistry 2002, 45, (16), 3345-3355.
          2.  Hill, A. P.; Young, R. J., Getting physical in drug discovery: A contemporary perspective on solubility and hydrophobicity. Drug Discovery Today 2010, 15, (15/16), 648-655.
          3.  Leeson, P. D.; Springthorpe, B., The influence of drug-like concepts on decision-making in medicinal chemistry. Nature Reviews Drug Discovery 2007, 6, (11), 881-890.
      •  Structural alerts in HTS
          1.  Baell, J. B.; Holloway, G. A., New substructure filters for removal of Pan Assay Interference Compounds (PAINS) from screening libraries and for their exclusion in bioassays. Journal of Medicinal Chemistry 2010, 53, (7), 2719-2740.
          2.  Rishton, G. M., Reactive compounds and in vitro false positives in HTS. Drug Discovery Today 1997, 2, (9), 382-384.
  • 59. Further reading
      •  Similarity and diversity
          1.  Ashton, M.; Barnard, J.; Casset, F.; Charlton, M.; Downs, G.; Gorse, D.; Holliday, J.; Lahana, R.; Willett, P., Identification of diverse database subsets using property-based and fragment-based molecular descriptions. Quantitative Structure-Activity Relationships 2002, 21, (6), 598-604.
          2.  Bender, A.; Glen, R. C., Molecular similarity: a key technique in molecular informatics. Organic and Biomolecular Chemistry 2004, 2, 3204-3218.
          3.  Gorse, A.-D., Diversity in medicinal chemistry space. Current Topics in Medicinal Chemistry 2006, 6, (1), 3-18.
          4.  Maldonado, A.; Doucet, J.; Petitjean, M.; Fan, B.-T., Molecular similarity and diversity in chemoinformatics: From theory to applications. Molecular Diversity 2006, 10, (1), 39-79.
          5.  Rogers, D.; Hahn, M., Extended-connectivity fingerprints. Journal of Chemical Information and Modeling 2010, 50, (5), 742-754.
          6.  Schuffenhauer, A.; Brown, N., Chemical diversity and biological activity. Drug Discovery Today: Technologies 2006, 3, (4), 387-395.
          7.  Willett, P.; Barnard, J. M.; Downs, G. M., Chemical similarity searching. Journal of Chemical Information and Computer Sciences 1998, 38, (6), 983-996.
  • 60. Day 4: KNIME Tutorial. George Papadatos, PhD; Francis Atkinson, PhD; ChEMBL group