Multivariate Data Analysis and Visualization Tools for Understanding Biological Data   Dmitry Grapov
Introduction: Systems
  - Emergent, reductionist
  - Deterministic systems, complex systems
  - Chemical analysis, physiology, biochemistry, graph theory, modeling, informatics
  - Source: Oltvai, et al. Science 25 October 2002: 763-764.
Introduction:  Inference
Overview
  Types:            Univariate (1-D)        Bivariate (2-D)    Multivariate (n-D)
  Properties:       vector                  matrix             matrix
  Representations:  histograms, densities   scatter plots      dendrograms, heatmaps, biplots, networks
  Central idea:     mean                    correlation        many
  Image source: http://www.thefullwiki.org/Hypercube
Univariate: Properties
  - vector of length m
  - mean
  - variance
Univariate: Representations
Univariate: Assumptions
  - Normality
Univariate: Utility
  - Hypothesis testing
  - α: type I error (false positive)
  - β: type II error (false negative)
  - power: 1 - β
  - effect size: standardized difference in means
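The quantities on this slide can be sketched in a few lines of Python. This is an illustration only: the sample values are made up, the effect size is computed as Cohen's d (one common standardized difference in means), and `approx_power` uses a normal approximation for a two-sided, two-sample test rather than any method named in the slides.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def cohens_d(a, b):
    """Standardized difference in means using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)


def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power (1 - beta) of a two-sided, two-sample z-test.

    Normal approximation; the small rejection probability in the tail
    opposite to the effect is ignored.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    return z.cdf(abs(d) * sqrt(n_per_group / 2) - z_crit)


# Hypothetical measurements for two groups (illustrative values only)
group_a = [5.1, 4.9, 5.6, 5.8, 6.0, 5.2]
group_b = [4.2, 4.8, 4.4, 5.0, 4.1, 4.6]
d = cohens_d(group_a, group_b)

# Same effect size, different sample sizes: power grows with n
power_small_n = approx_power(0.5, n_per_group=20)
power_large_n = approx_power(0.5, n_per_group=100)
```

Holding the effect size fixed and increasing the number of samples per group raises the power, which is the relationship between sample size and test power that the next slide lists as a limitation.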
Univariate: Limitations
  - Biological definition of the mean?
  - Relationship between sample size and test power
  - Multiple hypothesis testing
  - False discovery rate
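As a rough sketch of how the false discovery rate can be controlled when many hypotheses are tested, the block below implements the Benjamini-Hochberg step-up procedure. The slides name only "false discovery rate", so the choice of this procedure is an assumption, and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return a list of booleans: True where the hypothesis is rejected."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) with p_(k) <= (k / m) * fdr.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            max_k = rank
    # Reject every hypothesis ranked at or below k.
    rejected = [False] * m
    for idx in order[:max_k]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values from ten univariate tests
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
rejected = benjamini_hochberg(p_values, fdr=0.05)

n_uncorrected = sum(p <= 0.05 for p in p_values)  # tests passing p <= 0.05
n_bh = sum(rejected)                              # tests passing after BH
```

With these values, fewer tests survive the correction than pass the uncorrected 0.05 cutoff.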
Old Faithful Data
  - 272 observations
  - time between eruptions: 70 ± 14 min
  - duration of eruption: 3.5 ± 1 min
  - Source: Azzalini, A. and Bowman, A. W. (1990). A look at some data on the Old Faithful geyser. Applied Statistics 39, 357–365
Bivariate: Properties
  - Matrix of 2 vectors of length m
Bivariate: Representations
  - (X, Y)
Bivariate: Utility
  - (X, Y)
  - bivariate distribution
  - correlation
  - Variable 2 = m * Variable 1 + b
Bivariate: Limitations
  - correlation coefficient: measure of linear or monotonic relationship
  - Image source: http://en.wikipedia.org/wiki/Correlation
Bivariate: Limitations
  - Sensitive to outliers
  - Image source: http://en.wikipedia.org/wiki/Correlation
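Both limitations can be illustrated with a small, self-contained sketch. It assumes Pearson's coefficient for the linear case and Spearman's rank coefficient for the monotonic case (the slides do not name specific coefficients), and all data are made up.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation: strength of a linear relationship."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """Ranks starting at 1; ties receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Monotonic but non-linear relationship
x = list(range(1, 11))
y_monotonic = [v ** 3 for v in x]
r_linear = pearson(x, y_monotonic)
r_rank = spearman(x, y_monotonic)

# Weakly related data plus one extreme point (the last pair)
x_out = [1, 2, 3, 4, 5, 6, 30]
y_out = [3, 1, 4, 1, 5, 2, 40]
r_with_outlier = pearson(x_out, y_out)
r_without_outlier = pearson(x_out[:-1], y_out[:-1])
```

The rank-based coefficient captures the monotonic relationship fully while the linear coefficient does not, and a single extreme point is enough to turn a weak Pearson correlation into a strong one.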
Old Faithful Azzalini, A. and Bowman, A. W. (1990). A look at some data on the Old Faithful geyser.  Applied Statistics   39 , 357–365
Old Unfaithful?
Old Unfaithful?
  - Additional variables
  - Nearby hydrofracking
  - Improve inference based on more information
Multivariate: Properties
  - A matrix of n vectors of length m
  - Correlation matrix
  - Challenges: data often wide, structured, integration, noise
  - Rewards: robust inference, signal amplification, holistic/systems approach
Multivariate: Dimensional Reduction
  - Principal Components Analysis (PCA): linear n-dimensional encoding of original data
  - Dimensions are orthogonal (uncorrelated)
  - Top k dimensions are ordered by variance explained
  - Figure axes: PC 1, PC 2
Multivariate: Dimensional Reduction
  - Calculating PCs: singular value decomposition (SVD) of the original data
  - Scores (m x PC): sample representation based on all variables
  - Eigenvalues (PC x PC): explained variance
  - Loadings (n x PC): variable contribution to scores
  - Source: Wall, Michael E., Andreas Rechtsteiner, Luis M. Rocha. "Singular value decomposition and principal component analysis", in A Practical Approach to Microarray Data Analysis. D.P. Berrar, W. Dubitzky, M. Granzow, eds. pp. 91-109, Kluwer: Norwell, MA (2003). LANL LA-UR-02-4001.
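A minimal sketch of calculating PCs by SVD, assuming NumPy is available. The data are simulated stand-ins (m samples in rows, n variables in columns), and the variable names are illustrative rather than taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: m = 200 samples, n = 5 variables; the first two
# variables share a common signal, the rest are independent noise.
m, n = 200, 5
latent = rng.normal(size=m)
X = rng.normal(size=(m, n))
X[:, 0] += 3 * latent
X[:, 1] += 3 * latent

# Center each variable before decomposing.
Xc = X - X.mean(axis=0)

# SVD: Xc = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

scores = U * s                   # m x PC: sample coordinates on each PC
loadings = Vt.T                  # n x PC: variable contributions to each PC
eigenvalues = s ** 2 / (m - 1)   # variance carried by each PC
explained = eigenvalues / eigenvalues.sum()

# Scores and loadings together reproduce the centered data.
reconstructed = scores @ loadings.T
```

Because the singular values come back in decreasing order, the PCs are already sorted by explained variance, and keeping the first k columns of `scores` gives the reduced representation.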
Multivariate: Representations
  - A matrix of n vectors of length m
  - Old Faithful 2.0: 272 measurements, 8 variables (2 real, 6 random noise)
Multivariate: Representation
  - Identify outliers using all measurements
  - Use known values to impute missing ones
  - Identify interesting groups
  - Evaluate uni- and bivariate observations
  - The number of PCs can be used to estimate true data complexity
PCA: Considerations
  - data pre-treatment, outliers, noise, unsupervised projection
  - Data pre-treatment: no pre-treatment vs. centered and scaled to unit variance
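The effect of pre-treatment can be sketched as follows, assuming NumPy. The two simulated variables are correlated but measured on very different scales; `autoscale` and `first_loading` are illustrative helper names, not functions from any package mentioned in the slides.

```python
import numpy as np


def autoscale(X):
    """Center each column to mean 0 and scale it to unit variance."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)


def first_loading(X):
    """Loading vector of PC 1 from the SVD of column-centered data."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[0]


rng = np.random.default_rng(0)
a = rng.normal(size=300)
b = 1000 * (a + 0.5 * rng.normal(size=300))   # correlated with a, much larger scale
X = np.column_stack([a, b])

pc1_raw = first_loading(X)                # centered only
pc1_scaled = first_loading(autoscale(X))  # centered and scaled to unit variance
```

Without scaling, PC 1 points almost entirely along the large-scale variable; after scaling to unit variance, both variables contribute equally.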
PCA: Considerations
  - data pre-treatment, outliers, linear reconstruction, noise, unsupervised projection
  - Independent components analysis (ICA): use ICA to calculate statistically independent components
PCA: Considerations
  - data pre-treatment, outliers, linear reconstruction, noise, supervised projection
  - Non-negative matrix factorization (NMF): NMF uses an additive, parts-based encoding
  - Source: Learning the parts of objects by nonnegative matrix factorization, D.D. Lee, H.S. Seung, Zhipeng Zhao, ppt.
PCA: Considerations
  - data pre-treatment, outliers, linear reconstruction, noise, supervised projection
  - Partial Least Squares Projection to Latent Structures (PLS/-DA): identify a projection correlated with class assignment (classification) or continuous variables (regression)
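To make the contrast between an unsupervised and a supervised projection concrete, here is a sketch of the first latent variable of a single-response PLS model next to PC 1, assuming NumPy. For centered data, the first PLS weight vector is the unit vector proportional to X'y, which maximizes the covariance between the projected samples and the response. The data and the 0/1 class coding are simulated assumptions, and this is only the first component, not a full PLS-DA implementation.

```python
import numpy as np

rng = np.random.default_rng(1)

m = 100
y = np.repeat([0.0, 1.0], m // 2)   # two classes coded 0/1
X = rng.normal(size=(m, 6))
X[:, 0] *= 5.0                      # high-variance variable unrelated to class
X[:, 1] += 2.0 * y                  # variable that differs between classes

Xc = X - X.mean(axis=0)
yc = y - y.mean()

# Supervised: first PLS weight vector and scores.
w = Xc.T @ yc
w /= np.linalg.norm(w)
t_pls = Xc @ w

# Unsupervised: PC 1 direction and scores.
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
t_pca = Xc @ Vt[0]

# Covariance of each projection with the class vector.
cov_pls = abs(t_pls @ yc) / (m - 1)
cov_pca = abs(t_pca @ yc) / (m - 1)
```

PC 1 follows the direction of greatest variance regardless of class, whereas the PLS direction is chosen using the class labels, so its covariance with the response is never smaller than that of PC 1.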
PLS/-DA: Utility
  - Strengths: predicts multiple dependent variables; avoids issues of multicollinearity; independent measure of variable importance
  - Weaknesses: need to derive an empirical reference for model performance; poorly established model optimization methods
PLS-DA: Example
  - Data: Old Faithful 2.0, 272 observations on 8 variables
  - Latent variables (LVs) are analogous to PCs
  - Important statistics (CV): Q2 = fit; RMSEP = error of prediction; AU(RO)C = specificity vs. sensitivity
  - Select the number of LVs that maximizes Q2
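Two of these statistics can be written out directly. The sketch below assumes the commonly used definitions Q2 = 1 - PRESS/TSS (computed from cross-validated predictions) and RMSEP = root mean squared error of prediction; the slides do not spell out the formulas, and the predictions shown are hypothetical.

```python
from math import sqrt


def rmsep(y_true, y_pred):
    """Root mean squared error of prediction."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def q2(y_true, y_cv_pred):
    """Q2 = 1 - PRESS / TSS, with predictions taken from cross-validation."""
    mean_y = sum(y_true) / len(y_true)
    press = sum((t - p) ** 2 for t, p in zip(y_true, y_cv_pred))
    tss = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - press / tss


# Hypothetical class labels (coded 0/1) and held-out predictions
y_true = [0, 0, 0, 1, 1, 1]
y_cv_pred = [0.1, 0.2, 0.0, 0.9, 0.8, 1.0]

q2_value = q2(y_true, y_cv_pred)
rmsep_value = rmsep(y_true, y_cv_pred)

# Predicting the overall mean for every sample gives Q2 = 0
q2_baseline = q2(y_true, [0.5] * len(y_true))
```

Under this definition, a Q2 near 1 means the held-out predictions track the response closely, and a Q2 at or below 0 means the model predicts no better than the mean.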
PLS-DA: Performance
  - Use permutation tests to empirically determine model performance
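The idea behind a permutation test can be sketched with the standard library alone. As a simplification, the statistic here is the absolute difference in group means rather than a PLS-DA performance measure such as Q2; in practice the model would be refit on each permuted label vector. The data are made up.

```python
import random
from statistics import mean


def permutation_p_value(values, labels, n_permutations=2000, seed=0):
    """Share of label permutations whose statistic is at least the observed one."""
    rng = random.Random(seed)

    def statistic(lab):
        a = [v for v, l in zip(values, lab) if l == 1]
        b = [v for v, l in zip(values, lab) if l == 0]
        return abs(mean(a) - mean(b))

    observed = statistic(labels)
    shuffled = list(labels)
    count = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)   # break any real link between values and labels
        if statistic(shuffled) >= observed:
            count += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (count + 1) / (n_permutations + 1)


# Hypothetical measurements: ten samples per class
values = [1.0, 1.2, 0.8, 1.1, 0.9, 1.3, 0.7, 1.0, 1.1, 0.9,
          2.0, 2.2, 1.8, 2.1, 1.9, 2.3, 1.7, 2.0, 2.1, 1.9]
labels = [0] * 10 + [1] * 10

p_value = permutation_p_value(values, labels)
```

The permuted statistics form an empirical reference distribution, so the observed performance is judged against what the same procedure achieves when the labels carry no information.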
PLS: Predictive Performance
  - Split data into training (2/3) and test (1/3) sets
  - Generate the model using the training set, then predict class assignment for the test set
  - Use permutation tests to generate confidence bounds for future predictions
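The split-and-predict workflow can be sketched as below. A one-dimensional nearest-centroid rule stands in for the PLS-DA model, and the 272 values are simulated; both are assumptions made to keep the example self-contained.

```python
import random
from statistics import mean


def train_test_split(n_samples, test_fraction=1 / 3, seed=0):
    """Randomly assign sample indices to training and test sets."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return sorted(indices[n_test:]), sorted(indices[:n_test])


# Simulated stand-in data: 272 samples, two well-separated classes
rng = random.Random(1)
labels = [i % 2 for i in range(272)]
values = [rng.gauss(10.0 * lab, 1.0) for lab in labels]

train_idx, test_idx = train_test_split(272)

# "Model": class centroids estimated from the training set only
centroids = {
    c: mean(values[i] for i in train_idx if labels[i] == c) for c in (0, 1)
}

# Predict class assignment for the held-out test set
predicted = [
    min(centroids, key=lambda c: abs(values[i] - centroids[c])) for i in test_idx
]
accuracy = mean(int(p == labels[i]) for p, i in zip(predicted, test_idx))
```

Keeping the test samples out of model fitting is what makes the resulting accuracy an estimate of performance on future data.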
PLS: Predictive Performance
PLS: Feature Selection
  - Use the PLS-DA as an objective function to identify the most informative variables
Networks
  - Network: representation of relationships among objects
  - Utility: project statistical results into a biological context; explore informative data aspects in the context of all that was observed; identify emergent patterns
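One simple way to represent relationships among measured variables is a correlation network, sketched below with the standard library. Using a Pearson correlation threshold to define edges is an assumption made for illustration (the slides do not say how their networks were built), and the variable names and values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_network(data, threshold=0.8):
    """Return edges (a, b, r) for variable pairs with |r| >= threshold.

    data: dict mapping a variable name to its list of measurements.
    """
    names = sorted(data)
    edges = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(data[a], data[b])
            if abs(r) >= threshold:
                edges.append((a, b, round(r, 3)))
    return edges


# Hypothetical measurements of three variables across five samples
data = {
    "var_a": [1, 2, 3, 4, 5],
    "var_b": [2.1, 3.9, 6.2, 8.1, 9.8],
    "var_c": [3, 1, 4, 1, 5],
}
edges = correlation_network(data, threshold=0.8)
```

Building one such edge list per group, for example per phenotype, and comparing them is one way to look for changes in patterns of relationships.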
Networks
  - Interpret statistical results within a biological context
Networks
  - Highlight changes in patterns of relationships
  - Figure panels: non-diabetics, type 2 diabetics
Networks
  - Display complex interactions
  - Figure panels: non-diabetics, type 2 diabetics
imDEV: interactive modules for Data Exploration and Visualization
  - An integrated environment for systems level analysis of multivariate data
  - http://sourceforge.net/apps/mediawiki/imdev
  - Figure panels: non-diabetics, type 2 diabetics
Acknowledgements
  - Newman Lab
  - Designated Emphasis in Biotechnology (DEB)
  - NIH
  - This project is funded in part by the NIH grant NIGMS-NIH T32-GM008799, USDA-ARS 5306-51530-019-00D, and NIH-NIDDK R01DK078328-01.
