SUMOylation-site Prediction
Denis C. Bauer, Fabian A. Buske, Mikael Bodén
Overview
  • Background
  • SUMOylation: what is that?
  • Published predictors
  • Our approach
  • What makes SUMO hard to tackle
SUMO is not 相撲 (sumo wrestling)
  • Small Ubiquitin-related Modifier: a small protein of 97 amino acids
  • 20% homology to ubiquitin
  • Post-translational modification
  • Covalently attached to lysines
  • Involved in many pathways/mechanisms
      • Transcriptional regulation
      • Compartmentalisation
SUMOylation pathway
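SUMOylation motif
  • One consensus motif, [ILV]K.E, accounts for about 60% of known sites
  • However:
      • Not all [ILV]K.E sites are SUMOylated
      • Not all SUMOylated sites have the consensus motif
  • For a motif-based predictor, SUMOylated motif sites are true positives (TP), non-SUMOylated motif sites are false positives (FP), and SUMOylated sites without the motif are false negatives (FN)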
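The consensus motif can be scanned for with an ordinary regular expression. The snippet below is a minimal sketch of such a scanner, not the authors' implementation; the function name `scan_consensus_sites` and the toy sequence are hypothetical.

```python
import re

# Consensus motif from the slide: [ILV]K.E
CONSENSUS = re.compile(r"[ILV]K.E")

def scan_consensus_sites(sequence):
    """Return 1-based positions of lysines that sit inside an [ILV]K.E motif."""
    # m.start() is the 0-based index of the [ILV] residue; the lysine is the
    # next residue, so its 1-based position is m.start() + 2.
    return [m.start() + 2 for m in CONSENSUS.finditer(sequence.upper())]

# Hypothetical toy sequence, not a real protein.
toy = "MALKDEGGVKTEAAK"
print(scan_consensus_sites(toy))  # the last lysine has no motif and is skipped
```

Every lysine reported by the scanner is predicted as SUMOylated and every other lysine as not SUMOylated, which is why the scanner cannot recover sites that lack the motif.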
Baseline prediction
  Method                        CC
  Regular Expression scanner    0.68
Comparison with existing predictors
  Method                        CC
  Regular Expression scanner    0.68
  SUMOpre +                     0.64
  SUMOsp ‡                      0.26
  SUMOplot †                    0.48
  + Xu J., BMC Bioinformatics 2008, 9:8
  ‡ Xue Y., Nucleic Acids Res 2006, W254-W257
  † http://www.abgent.com/doc/sumoplot (commercial)
Case study: Core histones in yeast
  • Identified SUMOylation sites +
      • H2B: K6/7, K16/17
      • H2A: K2, K126
      • H4: somewhere in the tail
  • No SUMOylation consensus site
  • Predictors to date are not able to predict even a single SUMOylation site in the histone sequences
  + Nathan D., Genes Dev 2006, 20(8):966-76
Our approach
  • Identify
      • the window size
      • which ML method is best
  • Voilà: better predictor!
  • Figure: a sequence window (xxxx K xxxx) is fed to the ML method, which outputs SUMOylation 1/0
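Training in more detail
  • Figure: each lysine (K) of a protein sequence is presented to the ML method together with its flanking windows w_U and w_D; legend: SUMOylated K, not SUMOylated K; example labels T = 0 1 0, P = 1 1 0
  • Imbalance in the dataset: more negatives than positives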
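Prediction in more detail
  • Figure: the trained ML method receives each lysine (K) of a protein sequence with its flanking windows w_U and w_D and outputs one label per lysine (example output: 1 1 0); legend: SUMOylated K, not SUMOylated K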
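The windowing step used for both training and prediction can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: w_U and w_D are read as the number of residues upstream and downstream of the lysine, sequence ends are padded with "-", and the function names and toy sequence are hypothetical.

```python
def lysine_windows(sequence, w_up, w_down, pad="-"):
    """Yield (position, window) for every lysine in the sequence.

    position is 1-based; window holds w_up residues upstream, the lysine
    itself, and w_down residues downstream, padded at the termini.
    """
    seq = sequence.upper()
    padded = pad * w_up + seq + pad * w_down
    for i, residue in enumerate(seq):
        if residue == "K":
            # In padded coordinates the lysine sits at i + w_up, so the
            # window starts at i and spans w_up + 1 + w_down characters.
            yield i + 1, padded[i : i + w_up + 1 + w_down]

def label_windows(sequence, sumo_positions, w_up, w_down):
    """Pair each window with a 1/0 label from known SUMOylated positions."""
    known = set(sumo_positions)
    return [(window, 1 if pos in known else 0)
            for pos, window in lysine_windows(sequence, w_up, w_down)]

# Hypothetical toy sequence with one lysine (position 4) marked as SUMOylated.
toy = "MALKDEGGVKTEAAK"
for window, label in label_windows(toy, {4}, w_up=4, w_down=4):
    print(window, label)
```

For training, the labelled windows are the input to the ML method; for prediction, only `lysine_windows` is needed and the trained model supplies the label.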
ML methods
  • Bidirectional Recurrent Neural Network (BRNN)
      • Uses information from flanking windows
      • Decaying with distance to the center window
      • Prone to overfit
  • Support Vector Machine (SVM)
      • Regularized
      • Requires a suitable kernel and feature representation
      • Standard kernels: linear, polynomial, RBF
      • String kernels: P-kernel, local-alignment kernel
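To make "kernel and feature representation" concrete, here is a minimal sketch of one standard combination: a one-hot encoding of the sequence window with an RBF kernel. The encoding, the gamma value, and the function names are assumptions for illustration; the slides do not state which representation was used, and the string kernels (P-kernel, local-alignment kernel) are not shown here.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def one_hot(window):
    """Encode a residue window as a flat 0/1 vector (20 slots per position).

    Characters outside the 20 standard amino acids (for example padding)
    map to an all-zero slot.
    """
    vec = []
    for residue in window:
        vec.extend(1.0 if residue == aa else 0.0 for aa in AMINO_ACIDS)
    return vec

def rbf_kernel(x, y, gamma=0.1):
    """RBF kernel: k(x, y) = exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

# Two hypothetical windows that differ by a single substitution (V vs. L).
a = one_hot("GGVKTEAA")
b = one_hot("GGLKTEAA")
print(rbf_kernel(a, a))  # identical windows
print(rbf_kernel(a, b))  # one substitution lowers the similarity
```

With this encoding every substitution contributes the same squared distance, whatever the residues involved, which is one reason a sequence-aware string kernel may be preferable.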
Data set
  • Training/testing data
      • 144 proteins with 241 SUMOylation sites
      • 5,741 non-SUMOylated lysines
      • 68% of the SUMOylated sites conform to the consensus motif
  • Hold-out
      • 13 proteins with 27 SUMOylation sites
      • 48% consensus motif
  Source: Xu J., BMC Bioinformatics 2008, 9:8
Evaluation
  • 5-fold cross-validation
  • Matthews correlation coefficient (CC)
  • Sensitivity, specificity, accuracy
  • Area under the curve (AUC)
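The threshold-based measures listed above can all be computed from the four confusion-matrix counts. The sketch below uses the standard definitions; the function name and the convention of reporting CC as 0 when its denominator vanishes are assumptions. The example applies it to a trivial predictor that labels every lysine in the training set as not SUMOylated (241 positives, 5,741 negatives).

```python
import math

def confusion_metrics(tp, fp, tn, fn):
    """Return sensitivity, specificity, accuracy and Matthews CC."""
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: report CC as 0 when the denominator is zero.
    cc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "cc": cc,
    }

# "Always negative" predictor on the training set counts from the slides.
print(confusion_metrics(tp=0, fp=0, tn=5741, fn=241))
```

The always-negative predictor reaches about 96% accuracy while its CC is 0, which illustrates why CC is the more informative summary on a data set this imbalanced.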
Performance overview SUMOsvm
Comparison with existing methods
Quest to improve performance
  • Protein structural features and evolutionary features
  • Separating SUMOylation sites from different species or compartments
  • Clustering for other motifs using kernel hierarchical clustering
Summary
  • The Regular Expression Scanner is still the best classifier.
  • SUMO is more versatile than expected!
  • The road to better predictions
      • Are there other motifs?
      • Which features can discriminate?
      • Is the dataset biased?
  Image: http://spot.colorado.edu/~colemab/Theatre_Resources/SumoBallerina.jpg
Acknowledgment
  • Predictor/Analysis: Mikael Bodén, Fabian Buske
  • Dataset: Xu et al.
  • PhD supervisors: Tim Bailey, Andrew Perkins, Mikael Bodén
  • Other bioinformatic tools: STREAM, a practical workbench for modeling transcriptional regulation. www.bioinformatics.org.au/stream/
