The T-BioInfo Platform
RNA-seq: whole transcriptome analysis
Noncoding RNA:
RNA functions directly, based on its own
shape
Messenger RNA:
Codes for proteins, which function based
on their shape
Types of noncoding RNA:
• tRNA (transfer RNA)
• rRNA (ribosomal RNA)
• ribozymes (RNA enzymes)
• miRNA (micro RNA)
• snRNA (small nuclear RNA)
• siRNA (small interfering RNA)
• piRNA (Piwi-interacting RNA)
• Xist
• Many more
Exons, introns, and isoforms from NGS data
Alternative splicing can generate different isoforms from the same gene's primary transcript:
* Noncoding RNAs can have introns, too! Examples: Xist, HOTAIR, other lincRNAs
Why do a whole transcriptome analysis?
• Unknown disease correlations
Finding what you’re looking for when you don’t know exactly what you are looking for: a “hypothesis-free approach”
Example: 70% treatment efficacy means 30% of patients have a poor response or no response
– Whole transcriptome analysis: compare responders and non-responders
– The computer can identify differences, even in the absence of a hypothesis
– The computer can present unexpected results that a researcher would not look for because of preconceptions about the disease biology
• Disease correlations with events detectable at the transcript level, e.g. gene fusions and alternative splicing
• Species without a reference genome or annotation (GTF): unsequenced species, poorly annotated genomes, environmental sequencing
• Power: can outperform microarrays
Microarray vs. RNA-seq
• Microarrays can only detect sequences the array was designed to detect (you must know in advance what to put on the chip)
• Certain analyses are not possible with microarrays, such as:
– Distinguishing mature mRNA from unspliced RNA, as well as different isoforms/splice variants
– Strandedness
– Single-cell analysis
• RNA-seq gives an untargeted ("fuzzy") overview and facilitates novel transcript discovery
• RNA-seq lends itself to further and confirmatory analyses
• Lower error rate, and problems such as cross-hybridization are avoided in RNA-seq
NGS
Steps:
1. Fragment RNA
2. Reverse transcribe => cDNA
3. High-throughput sequencing
Read length: longer = more information, but more errors and higher cost
Variety of machines:
- choose based on experimental design and cost
- output: 7.5 Gb to 1800 Gb
- max reads/run: 25 million to 6 billion
- max read length: 2 x 150 bp to 2 x 300 bp
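As a quick sanity check on these figures, run output is roughly the number of reads multiplied by the bases per read. The sketch below assumes that each counted "read" is a 2 x 150 bp pair (300 bases); `run_output_gb` is a hypothetical helper, not a vendor formula, and real yields also depend on the instrument, flow cell, and chemistry.

```python
def run_output_gb(reads: float, read_length_bp: int, paired: bool = True) -> float:
    """Approximate run output in gigabases (1 Gb = 1e9 bases).

    Hypothetical helper: reads x bases per read, where a paired read
    contributes two mates of read_length_bp bases each.
    """
    bases_per_read = read_length_bp * (2 if paired else 1)
    return reads * bases_per_read / 1e9


# Assumption: each counted "read" is a 2 x 150 bp pair.
low = run_output_gb(25e6, 150)   # 25 million read pairs
high = run_output_gb(6e9, 150)   # 6 billion read pairs
print(f"{low:.1f} Gb to {high:.0f} Gb")  # prints: 7.5 Gb to 1800 Gb
```

Under that assumption the two ends of the quoted ranges line up: 25 million pairs give 7.5 Gb and 6 billion pairs give 1800 Gb.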
RNA-seq overview
de novo
Step 1: Preparation of raw RNA reads
- Primer/adapter sequences cleaned from the library (the library of fragments)
- Read length: a trade-off between computation and sequencing power
- Single-end vs. paired-end reads
The sequences of the fragments (reads) will be aligned to a reference genome with a GTF annotation file.
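Cleaning primers/adapters from reads is normally done with a dedicated trimming tool. To show the idea, here is a minimal sketch that removes an adapter from the 3' end of a read using exact matching only. The adapter string `AGATCGGAAGAGC` is an assumption (it is the start of the common Illumina TruSeq adapter), and `trim_adapter` is a hypothetical function, not part of the T-BioInfo platform.

```python
ADAPTER = "AGATCGGAAGAGC"  # assumed: start of the common Illumina TruSeq adapter


def trim_adapter(read: str, adapter: str = ADAPTER, min_overlap: int = 3) -> str:
    """Trim an adapter from the 3' end of a read (exact matches only).

    Case 1: the whole adapter occurs in the read -> cut at its first occurrence.
    Case 2: the read ends with the first k bases of the adapter (k >= min_overlap)
            -> cut those k bases, trying the longest k first.
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    for k in range(min(len(adapter) - 1, len(read)), min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read


print(trim_adapter("ACGTACGTAGATCGGAAGAGCTTT"))  # ACGTACGT (full adapter)
print(trim_adapter("ACGTACGTAGATC"))             # ACGTACGT (partial adapter at 3' end)
print(trim_adapter("ACGTACGTTT"))                # ACGTACGTTT (no adapter)
```

Real trimmers also tolerate mismatches and trim low-quality bases; this exact-match version only shows the two cases that matter: a full adapter inside the read and a partial adapter hanging off its end.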
Align RNA-seq library to genome
For today’s analysis, we will be mapping to a genome using an existing GTF file
• Genes
• Isoforms
Step 2: Mapping to the transcriptome
Step 3: Generating expression tables (genes and isoforms)
For our purposes, mapping (aligning) reads to a transcriptome works like mapping to a genome, except that reads are assigned to annotated transcripts, so an expression level can be estimated for each transcript.
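One common way to turn read counts into per-transcript expression levels is TPM (transcripts per million), one of the measures RSEM reports: divide each transcript's count by its length, then rescale so the values sum to one million. This sketch assumes plain transcript lengths; RSEM itself uses effective lengths, so its numbers will differ somewhat. `tpm` is a hypothetical helper.

```python
def tpm(counts: list[float], lengths_bp: list[float]) -> list[float]:
    """Transcripts per million: TPM_i = (c_i / l_i) / sum_j (c_j / l_j) * 1e6.

    Simplification: uses plain transcript lengths; RSEM uses effective lengths.
    """
    if len(counts) != len(lengths_bp):
        raise ValueError("counts and lengths_bp must have the same length")
    rates = [c / l for c, l in zip(counts, lengths_bp)]  # reads per base of transcript
    total = sum(rates)
    if total == 0:
        return [0.0] * len(rates)
    return [r / total * 1e6 for r in rates]


# Hypothetical: two transcripts with equal read counts but different lengths.
values = tpm([100, 100], [1000, 2000])
print(values)  # first value is twice the second; together they sum to 1e6
```

Length normalization is the point: two transcripts with the same read count but different lengths get different TPMs, here the 1 kb transcript scores twice the 2 kb one.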
Building pipelines in the T-BioInfo platform
The T-BioInfo pipeline we will be building in
today’s workshop
So, the pipeline will give us a table of transcript expression levels. Now what?
• Normalization: methods for removing variance caused by technical factors or other factors unrelated to the experiment
• Post-processing:
– Principal Component Analysis (PCA): provides a visual overview of the data
– Statistical analysis (e.g. t-test)
– Machine learning techniques
• Biological interpretation of results: use databases to find out more about the identified genes, e.g. publications and correlations
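For the statistical step, a two-group comparison of one gene (for example, responders vs. non-responders, or triple-negative vs. ER+ samples) can be sketched with Welch's t statistic, a t-test variant that does not assume equal variances. This is an illustrative choice, since the slides only say "e.g. t-test"; `welch_t` is a hypothetical helper using only the Python standard library.

```python
import math
from statistics import mean, variance


def welch_t(group_a: list[float], group_b: list[float]) -> float:
    """Welch's t statistic for the difference in mean expression of one gene."""
    if len(group_a) < 2 or len(group_b) < 2:
        raise ValueError("each group needs at least two samples")
    # variance() is the sample variance (n - 1 in the denominator)
    se = math.sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / se


# Hypothetical expression values for one gene in two groups of samples.
t = welch_t([5.0, 6.0, 7.0], [1.0, 2.0, 3.0])
print(round(t, 3))  # 4.899
```

To get a p-value, the statistic is compared with a t distribution using Welch–Satterthwaite degrees of freedom; `scipy.stats.ttest_ind(a, b, equal_var=False)` performs both steps.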
Output you will see (Excel table):
The first two components (“principal components”) can be plotted on a 2D graph to detect clustering.
The plot is a “shadow” of the data: it does not show the whole picture.
Benchmark: the first two components should capture about 40% of the variability.
PCA
A dimension-reduction technique that projects many variables onto a few new axes (principal components) that capture most of the variation in the original data.
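A minimal PCA sketch with NumPy: center each gene, take the singular value decomposition (SVD), and read off the sample coordinates (scores) and the fraction of variance each component explains. The 21 x 500 random matrix is hypothetical stand-in data; in practice the input would be the normalized (typically log-transformed) expression table with samples as rows and genes as columns.

```python
import numpy as np


def pca(X: np.ndarray, n_components: int = 2) -> tuple[np.ndarray, np.ndarray]:
    """PCA of a samples x genes matrix via singular value decomposition.

    Returns (scores, explained_variance_ratio) for the first n_components.
    """
    centered = X - X.mean(axis=0)                      # center each gene
    U, S, _ = np.linalg.svd(centered, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]    # sample coordinates on each PC
    explained = S**2 / np.sum(S**2)                    # fraction of variance per PC
    return scores, explained[:n_components]


# Hypothetical stand-in data: 21 samples x 500 genes.
rng = np.random.default_rng(0)
X = rng.normal(size=(21, 500))
scores, ratio = pca(X)
print(scores.shape, ratio)
```

Plotting `scores[:, 0]` against `scores[:, 1]` gives the 2D clustering view. The explained-variance ratios are the percentages the slides report (e.g. PC1: 22.16% and PC2: 9.22% for the human tumor PCA, 31.38% combined) and compare with the 40% benchmark.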
A brief explanation of machine learning
Using a training set to teach a computer to categorize
Duck vs. Not Duck:
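To make the training-set idea concrete, here is a nearest-centroid classifier, one of the simplest supervised methods: it averages the training examples of each class and labels a new sample by the closest average. The features (quacks, webbed feet, swims) and their values are invented for the duck example; the slides do not name a specific algorithm, and scikit-learn ships a fuller version as `sklearn.neighbors.NearestCentroid`.

```python
import numpy as np


class NearestCentroid:
    """Learn one mean feature vector per class from a training set, then
    label new samples by the class whose mean is closest (Euclidean distance)."""

    def fit(self, X: np.ndarray, y: list[str]) -> "NearestCentroid":
        labels = np.asarray(y)
        self.classes_ = sorted(set(y))
        self.centroids_ = np.stack([X[labels == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X: np.ndarray) -> list[str]:
        # distances has shape (n_samples, n_classes)
        distances = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)
        return [self.classes_[i] for i in distances.argmin(axis=1)]


# Invented features: [quacks, webbed feet, swims], each scored 0 to 1.
X_train = np.array([[1.0, 1.0, 1.0], [1.0, 1.0, 0.8], [0.0, 0.0, 0.0], [0.0, 0.2, 1.0]])
y_train = ["duck", "duck", "not duck", "not duck"]
model = NearestCentroid().fit(X_train, y_train)
print(model.predict(np.array([[0.9, 1.0, 1.0], [0.0, 0.0, 0.9]])))  # ['duck', 'not duck']
```

With expression data, the rows would be samples, the columns genes, and the labels cancer subtypes.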
Three subtypes of breast cancer
1. ER+: Positive for the estrogen receptor; treatment includes hormone therapy and drugs targeting the estrogen receptor. The most commonly diagnosed subtype of breast cancer. Positive outlook in the short term.
2. HER2+: Overexpresses human epidermal growth factor receptor 2 (HER2/neu), a growth-promoting protein. This type of cancer tends to be more aggressive than ER+ or PR+ breast cancer. It cannot be treated with hormone therapy, but targeted drug treatments exist.
3. Triple Negative: Negative for the estrogen receptor and the progesterone receptor, and does not overexpress HER2/neu. Most cancers with mutated BRCA1 genes are triple negative. This type responds to surgery/chemotherapy but tends to recur later. There is no targeted therapy yet, although some treatments are in development. Survival rates are lower than for other breast cancer subtypes. This cancer type occurs in 15-20% of those diagnosed with breast cancer in the United States.
Patient-Derived Xenograft (PDX) mouse models
Each strain is immunocompromised in a different way. Examples:
Athymic Nude: lacks the thymus, unable to produce T cells.
NOD/CB17 SCID: severe combined immunodeficiency, no mature T cells or B cells; functional natural killer cells, macrophages, and granulocytes.
Tumor = human, stroma = mouse (the original transplant had human stroma)
Whole Transcriptome Profiling of Cancer Tumors in Mouse PDX Models
http://www.impactjournals.com/oncotarget/index.php?journal=oncotarget&page=article&op=view&path%5B%5D=8014
Based on breast cancer samples taken from the publication “Whole transcriptome profiling of patient-derived xenograft models as a tool to identify both tumor and stromal specific biomarkers” (James R. Bradford et al.; DOI: 10.18632/oncotarget.8014)
Introduction
• Dataset: 21 samples from 3 subtypes of breast cancer in 3 different mouse models.
• Goals: identify a clear signal showing transcriptional differences between cancer subtypes:
1) Identify differences in expression between cancer subtypes and between mouse models
2) Select representative genes that could be considered as biomarker candidates
PDX Mouse Strains
XID: characterized by the absence of the thymus, mutant B lymphocytes, and no T-cell function.
NOD SCID: severe combined immunodeficiency, with no mature T cells or B cells.
Athymic Nude: lacks the thymus and is unable to produce T cells.
Breast TN (triple negative): survival rates are lower for this cancer than for ER+ cancer types.
Breast ER+: treatment often includes hormone therapy; has a more positive outlook in the short term.
Breast HER2+: tends to be a more aggressive cancer type than ER+.
Breast Cancer Subtypes
Sample Summary
For more information: http://www.cancer.org/cancer/breastcancer/detailedguide/breast-cancer-classifying
Biological Data Repositories
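What is a FASTQ file? A text-based file that stores each read's sequence together with a per-base quality score.
Project Accession Number
FASTA format: a text-based file with the sequence but without the quality scores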
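A FASTQ record spans four lines: an `@` header, the sequence, a `+` separator, and one quality character per base. The sketch below converts FASTQ text to FASTA by dropping the quality lines, and decodes quality characters assuming the Phred+33 encoding used by current Illumina output. It assumes unwrapped records (exactly four lines each); `fastq_to_fasta` and `phred33` are hypothetical helpers.

```python
def fastq_to_fasta(fastq_text: str) -> str:
    """Convert FASTQ text (4 lines per record) to FASTA by dropping quality lines."""
    lines = fastq_text.strip().splitlines()
    if len(lines) % 4 != 0:
        raise ValueError("expected 4 lines per FASTQ record")
    out = []
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+") or len(seq) != len(qual):
            raise ValueError(f"malformed record starting at line {i + 1}")
        out.append(">" + header[1:])  # FASTA headers start with '>'
        out.append(seq)
    return "\n".join(out) + "\n"


def phred33(qual: str) -> list[int]:
    """Decode Phred+33 quality characters into numeric quality scores."""
    return [ord(ch) - 33 for ch in qual]


record = "@read1\nACGT\n+\nII#5\n"
print(fastq_to_fasta(record), end="")  # >read1 then ACGT
print(phred33("II#5"))                 # [40, 40, 2, 20]
```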
Step 1: The RNA-seq pipeline estimates expression levels for all annotated and non-annotated genomic elements.
Genomic elements with no expression (all zeros) are removed from the RSEM tables, both the gene table and the isoform table.
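Removing all-zero rows is a one-line filter. This NumPy sketch assumes the RSEM table has already been loaded as a matrix (rows = genes or isoforms, columns = samples) plus a list of row names; `drop_all_zero_rows` is hypothetical. With pandas, the equivalent is `df.loc[(df != 0).any(axis=1)]`.

```python
import numpy as np


def drop_all_zero_rows(matrix: np.ndarray, row_names: list[str]) -> tuple[np.ndarray, list[str]]:
    """Drop genes/isoforms whose expression is zero in every sample."""
    keep = ~np.all(matrix == 0, axis=1)  # True for rows with any non-zero value
    return matrix[keep], [name for name, k in zip(row_names, keep) if k]


# Hypothetical 4 genes x 3 samples table.
table = np.array([[0, 0, 0], [1, 0, 2], [0, 0, 0], [3, 3, 3]])
kept, names = drop_all_zero_rows(table, ["geneA", "geneB", "geneC", "geneD"])
print(names)  # ['geneB', 'geneD']
```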
Quantile Normalization
Principal Component Analysis
Step 2: RSEM output tables of genes and isoforms are
prepared for Machine Learning Analysis
1. Mapping by Bowtie2 using the original GTF (mouse and human genomes combined)
2. RSEM expression table: quantification of gene- and isoform-level abundance
3. Outputs include a genes table and an isoform table
Factor Regression Analysis
Visualization of T-Bioinfo Bioinformatics Functions
Let's First Build Our RNA-seq Pipeline!
When your RNA-seq pipeline is complete…
Quantile Normalization
Figure: expression table before and after normalization (rows: gene names; columns: sample names)
Multi-sample normalization is considered a standard and necessary part of RNA-seq analysis: it removes unwanted technical variation between samples.
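Quantile normalization forces every sample to share the same distribution: sort each column, average across samples at each rank, then put those averages back in each sample's original order. A minimal NumPy sketch, assuming a genes x samples matrix; tie handling is simplified (tied values keep their order of appearance), whereas some implementations average the values assigned to tied ranks.

```python
import numpy as np


def quantile_normalize(X: np.ndarray) -> np.ndarray:
    """Quantile-normalize a genes x samples matrix (one column per sample)."""
    order = np.argsort(X, axis=0, kind="stable")  # row indices that sort each column
    ranks = np.argsort(order, axis=0)             # rank of every entry within its column
    reference = np.sort(X, axis=0).mean(axis=1)   # mean across samples at each rank
    return reference[ranks]                       # put rank means back in original order


# Hypothetical 4 genes x 3 samples.
X = np.array([[5.0, 4.0, 3.0],
              [2.0, 1.0, 4.0],
              [3.0, 4.0, 6.0],
              [4.0, 2.0, 8.0]])
print(quantile_normalize(X))
```

After normalization every column contains the same set of values (here 2, 3, 14/3, and 17/3), only in a different order per sample.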
Biological Databases: Great for Annotation!
https://david.ncifcrf.gov/
http://www.ensembl.org
Now back to the T-BioInfo Platform!
1. Start a PCA Pipeline
2. Create a Scatter Plot Image from our Results
3. Use DAVID and Ensembl to investigate biological meaning
4. Learn about other machine learning methods
5. Understand a “real” RNA-seq project timeline
T-Bio.Info Platform: http://tbioinfopb1.pine-biotech.com:3000
PCA of Human Tumor by Samples and by Genes
Link: https://pinebio.shinyapps.io/app_genes/
Link: https://pinebio.shinyapps.io/app_samples/
PC1: 22.16%, PC2: 9.22%
• Extracellular matrix remodeling
• Cell migration
• Tumor growth
• Angiogenesis
Figure: Matrix Metalloprotease 14 expression in breast cancer samples (y-axis: level of expression; x-axis: breast cancer samples)
Upregulated in Triple Negative Cancer Samples
Defining the Breast Cancer Subtypes
• Estrogen-regulated protein
• Oncogenic
• Bone metastasis
TFF3 promotes angiogenesis in breast cancer. The protein is secreted from mammary carcinoma cells and also acts directly on endothelial cellular processes to promote angiogenesis. By stimulating angiogenesis, TFF3 coordinates with its own growth-promoting and metastatic actions in mammary carcinoma to enhance tumor progression and dissemination.
Figure: Trefoil Factor 3 in breast cancer (y-axis: level of expression; x-axis: breast cancer samples)
Upregulated in Estrogen Receptor+ Samples
Significance of Hormones to Breast Cancer: Endocrine Therapy
Figure: Estrogen Receptor expression in breast cancer samples (y-axis: level of expression; x-axis: breast cancer samples)
Estrogen stimulates the proliferation of breast cancer cells.
Progesterone receptor testing is a standard part of testing for breast cancer diagnosis.
Figure: Progesterone Receptor expression in breast cancer samples (y-axis: level of expression; x-axis: breast cancer samples)
Progesterone receptors, when activated by progesterone, attached themselves to the estrogen receptors, which caused the estrogen receptors to stop turning on cancer-promoting genes.
The progesterone receptors then turned on genes that promote the death of cancer cells (apoptosis) and the growth of healthy cells.
Upregulated in Estrogen Receptor+ Cancer
Estrogen Receptor, HER2, Triple Negative
Expression Profile 1: high estrogen receptor, high progesterone receptor, low Matrix Metalloprotease 14
Expression Profile 2: low estrogen receptor, no progesterone receptor, high Matrix Metalloprotease 14
Expression Profile 3: low estrogen receptor, low progesterone receptor, high Matrix Metalloprotease 14
HER2 Breast Cancer
Luminal B: Estrogen-Positive Breast Cancer
Basal: Triple-Negative Breast Cancer
Figure: bar charts of Estrogen, MMP14, and Progesterone expression in Breast Cancer Samples 1, 2, and 3
Factor Regression Analysis
A0B0: Triple Neg / Athymic Nude
A0B1: Triple Neg / SCID
A1B0: ER+ / Athymic Nude
A1B1: ER+ / SCID
Factor Table (2 factors, 2 levels each)
Factor A: Triple Negative vs. ER+
Factor B: Athymic Nude vs. SCID
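The slides do not spell out the model behind Factor Regression Analysis. As a hedged illustration only, here is a plain additive linear model fitted per gene by least squares, with factor A coded 0 = Triple Negative, 1 = ER+ and factor B coded 0 = Athymic Nude, 1 = SCID, matching the A0B0 to A1B1 labels in the factor table. `fit_two_factor` and the expression values are hypothetical, and the T-BioInfo platform may use a different model.

```python
import numpy as np


def fit_two_factor(y: np.ndarray, a: np.ndarray, b: np.ndarray) -> dict[str, float]:
    """Fit expression = intercept + factor_A * a + factor_B * b by least squares.

    a: 0 = Triple Negative, 1 = ER+       (factor A)
    b: 0 = Athymic Nude,    1 = SCID      (factor B)
    """
    design = np.column_stack([np.ones(len(y)), a, b])  # one row per sample
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return {"intercept": float(coef[0]), "factor_A": float(coef[1]), "factor_B": float(coef[2])}


# Hypothetical expression of one gene in 8 samples (2 per A/B combination).
a = np.array([0, 0, 0, 0, 1, 1, 1, 1])
b = np.array([0, 0, 1, 1, 0, 0, 1, 1])
y = np.array([2.1, 1.9, 2.4, 2.6, 5.0, 5.2, 5.6, 5.4])
print(fit_two_factor(y, a, b))  # factor_A is about 3.05, factor_B about 0.45
```

A gene with a large `factor_A` coefficient and a near-zero `factor_B` coefficient tracks the cancer subtype rather than the mouse strain, which is the kind of gene the factor analysis aims to select.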
RNA-Seq Experiment Overview
Based on breast cancer samples taken from the publication “Whole transcriptome profiling of patient-derived xenograft models as a tool to identify both tumor and stromal specific biomarkers” (James R. Bradford et al.; DOI: 10.18632/oncotarget.8014)
Cancer subtypes: HER2+, ER+, TNBC
Mouse strains: NOD SCID, XID, Athymic Nude, CB17 SCID
1. Ribosomal RNA-depleted RNA
2. Fragment RNA
3. TruSeq RNA Sample Preparation Kit
4. Concatenated genome (mouse/human)
5. Indexed with the STAR aligner
Secondary Analysis
Tertiary Analysis
Gene Summary and Ontology Report
1. Mapping using TopHat
2. Finding isoforms using Cufflinks
3. GTF file of isoforms using Cuffmerge
4. Mapping with Bowtie 2 on the new transcriptome
Thanks for Listening!
Any Questions?
Contact: Info@pine-biotech.com
T-Bioinfo Platform: http://tbioinfopb1.pine-biotech.com:3000
Pine Biotech Website: http://pine-biotech.com
Pine Biotech Education Website: http://edu.t-bio.info
Factor Regression Analysis
A0B0: Triple Neg / Athymic Nude
A0B1: Triple Neg / SCID
A1B0: ER+ / Athymic Nude
A1B1: ER+ / SCID
Factor Table (2 factors, 2 levels each)
Triple Negative Samples / ER+ Samples
Selecting Human Genes Under the Influence of Either Triple Negative Breast Cancer or Estrogen Receptor-Positive Breast Cancer
Gene Expression Key
*No significant mouse genes
Link: https://pinebio.shinyapps.io/app_faca/
