Statistics on big biomedical data
Methods and pitfalls when analyzing high-throughput screens
Lars Juhl Jensen
t-test
ANOVA
normal distribution
useful tests
counts
contingency table
Jensen et al., Nature Reviews Genetics, 2012
Fisher’s exact test
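For count data in a 2x2 contingency table, Fisher's exact test can be sketched with the standard library alone. This is a minimal illustration, not the analysis from the cited paper: the function name and the example counts are made up, and the two-sided p-value sums the probabilities of all tables no more likely than the observed one.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell,
        # with all row and column totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over every table at least as unlikely as the observed one
    # (small tolerance guards against floating-point ties).
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))

# Hypothetical counts: 8 of 10 hits annotated vs. 1 of 6 non-hits.
p = fisher_exact_two_sided(8, 2, 1, 5)
print(round(p, 4))  # about 0.035
```

In practice a library routine such as `scipy.stats.fisher_exact` would normally be used instead of hand-rolled code.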
real numbers
no theoretical distribution
non-parametric statistics
do the medians differ?
Mann–Whitney U test
medians can mislead you
do the distributions differ?
Kolmogorov–Smirnov test
does not tell how they differ
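A small sketch of the idea behind the Kolmogorov–Smirnov test: the statistic is the largest gap between the two empirical cumulative distributions. The two samples below are invented so that their medians agree while their spreads do not. Only the statistic is computed here; turning it into a p-value requires the KS distribution or resampling.

```python
from statistics import median

def ks_statistic(xs, ys):
    """Largest vertical gap between the two empirical CDFs."""
    d = 0.0
    for v in sorted(set(xs) | set(ys)):
        # Fraction of each sample that is <= v
        fx = sum(1 for x in xs if x <= v) / len(xs)
        fy = sum(1 for y in ys if y <= v) / len(ys)
        d = max(d, abs(fx - fy))
    return d

# Hypothetical samples with the same median but different spread
narrow = [4, 5, 5, 5, 6]
wide = [0, 1, 5, 9, 10]

print(median(narrow), median(wide))  # identical medians
print(ks_statistic(narrow, wide))    # non-zero gap between the distributions
```

The statistic says that the distributions differ somewhere, but not whether the difference is in location, spread, or shape.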
resampling
Monte Carlo testing
always applicable
compute intensive
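A minimal sketch of Monte Carlo testing by label permutation, assuming the test statistic is the absolute difference in means (any other statistic could be swapped in, which is why the approach is always applicable). The function name, the default number of resamples, and the seed are illustrative choices.

```python
import random

def permutation_test(xs, ys, n_resamples=10000, seed=0):
    """Monte Carlo p-value for the absolute difference in means."""
    rng = random.Random(seed)
    observed = abs(sum(xs) / len(xs) - sum(ys) / len(ys))
    pooled = list(xs) + list(ys)
    hits = 0
    for _ in range(n_resamples):
        # Shuffle the group labels and recompute the statistic
        rng.shuffle(pooled)
        px, py = pooled[:len(xs)], pooled[len(xs):]
        if abs(sum(px) / len(px) - sum(py) / len(py)) >= observed:
            hits += 1
    # Add one to numerator and denominator so the estimate is never zero
    return (hits + 1) / (n_resamples + 1)
```

The cost is visible in the loop: the smallest p-value that can be reported is roughly 1 / n_resamples, so small p-values need many resamples.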
multiple testing
xkcd.com
compare multiple conditions
Gene Ontology enrichment
Bonferroni
avoid making any errors
too conservative
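The Bonferroni correction can be sketched in a few lines: each p-value is multiplied by the number of tests and capped at 1. The p-values below are made up for illustration.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Hypothetical p-values from four tests
adjusted = bonferroni([0.001, 0.02, 0.04, 0.5])
print(adjusted)
```

With thousands of tests, as in a Gene Ontology enrichment analysis, the multiplier becomes so large that many real effects no longer pass, which is the sense in which the correction is too conservative.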
Benjamini–Hochberg
control false discovery rate
assumes independence (or positive dependence) between tests
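A sketch of the Benjamini–Hochberg step-up procedure, returning adjusted p-values (often called q-values) in the original order. The function name and input values are illustrative.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never increase as the raw p-values decrease.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical p-values from four tests
q = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print(q)
```

Calling everything with an adjusted value below 0.05 a hit then aims for at most about 5% false discoveries among the hits, rather than no errors at all.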
resampling
negative set
systematic biases
Huang et al., Journal of Proteome Research, 2014
studiedness bias
we study disease proteins
thus we know many PTMs
abundance bias
higher expressed
easier to detect in assays
better characterized
matched background
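One way a matched background could be built, as a sketch only: for each foreground protein, draw a background protein from the same bin of the confounding variable (for example an abundance or studiedness bin). The function, the bin labels, and the protein names are hypothetical and not taken from the cited paper.

```python
import random
from collections import defaultdict

def matched_background(foreground, candidates, bin_of, seed=0):
    """Pick one unused candidate from the same bin for each foreground item."""
    rng = random.Random(seed)
    fg = set(foreground)
    pools = defaultdict(list)
    for item in candidates:
        if item not in fg:
            pools[bin_of[item]].append(item)
    for pool in pools.values():
        rng.shuffle(pool)
    matched = []
    for item in foreground:
        pool = pools[bin_of[item]]
        if pool:  # foreground items with no match left are skipped
            matched.append(pool.pop())
    return matched

# Hypothetical abundance bins for six proteins
bin_of = {"A": "high", "B": "high", "E": "high",
          "C": "low", "D": "low", "F": "low"}
background = matched_background(["A", "C"], list(bin_of), bin_of)
print(background)
```

Comparing the foreground against this background, instead of against all proteins, keeps the binned variable from driving the result.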
the big data effect
if you have enough data
any difference is significant
but maybe not relevant
biases become significant
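The big data effect can be illustrated with a simple two-sample z-test, assuming a known common standard deviation. The effect size of 0.02 standard deviations is an arbitrary choice for illustration: it stays the same while only the sample size grows.

```python
from math import erfc, sqrt

def two_sample_z_p_value(mean_diff, sd, n_per_group):
    """Two-sided p-value for a difference in means with known common SD."""
    se = sd * sqrt(2.0 / n_per_group)  # standard error of the difference
    z = abs(mean_diff) / se
    return erfc(z / sqrt(2.0))         # equals 2 * (1 - Phi(z))

# Same tiny difference, increasing amounts of data
for n in (100, 10_000, 1_000_000):
    print(n, two_sample_z_p_value(0.02, 1.0, n))
```

The p-value drops from clearly non-significant to vanishingly small even though the difference itself never changes, and a systematic bias of the same size would behave identically.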
“significant”
statistical significance
p-value
biological relevance
effect size
significant and relevant
volcano plots
Lundby et al., Science Signaling, 2013
rather ad hoc
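A volcano plot puts effect size on one axis and significance on the other, and hits are whatever passes both cutoffs. The sketch below computes the plot coordinates and the hit calls; the cutoffs (at least a two-fold change and p of at most 0.05) and the example rows are assumptions, which is exactly where the ad hoc part lies.

```python
from math import log10

def volcano_calls(rows, min_abs_log2fc=1.0, max_p=0.05):
    """Return (name, log2 fold change, -log10 p, is_hit) for each row."""
    out = []
    for name, log2fc, p in rows:
        # A hit must be both relevant (effect size) and significant (p-value)
        hit = abs(log2fc) >= min_abs_log2fc and p <= max_p
        out.append((name, log2fc, -log10(p), hit))
    return out

# Hypothetical measurements: (name, log2 fold change, p-value)
rows = [("a", 2.0, 0.001),   # large and significant
        ("b", 0.1, 1e-10),   # significant but tiny effect
        ("c", -1.5, 0.2)]    # large but not significant
calls = volcano_calls(rows)
for call in calls:
    print(call)
```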
questions?
