1
|
1
|
deleted file mode 100644
|
...
|
...
|
@@ -1,382 +0,0 @@
|
1
|
|
-title: "Use of SpatialDecon in a Spatial Transcriptomics dataset"
|
2
|
|
-output:
|
3
|
|
- rmarkdown::html_vignette:
|
4
|
|
- toc: true
|
5
|
|
-vignette: >
|
6
|
|
- %\VignetteIndexEntry{Use of SpatialDecon in a Spatial Transcriptomics dataset}
|
7
|
|
- %\VignetteEngine{knitr::rmarkdown}
|
8
|
|
- %\VignetteEncoding{UTF-8}
|
9
|
|
-
|
10
|
|
-<style>
|
11
|
|
-p.caption {
|
12
|
|
- font-size: 1.5em;
|
13
|
|
-}
|
14
|
|
-</style>
|
15
|
|
-
|
16
|
|
-```{r, include = FALSE}
|
17
|
|
-knitr::opts_chunk$set(
|
18
|
|
- collapse = TRUE,
|
19
|
|
- comment = "#>"
|
20
|
|
-)
|
21
|
|
-```
|
22
|
|
-
|
23
|
|
-
|
24
|
|
-### Installation
|
25
|
|
-
|
26
|
|
-```{r installation, eval=FALSE}
|
27
|
|
-
|
28
|
|
-if(!requireNamespace("BiocManager", quietly = TRUE))
|
29
|
|
- install.packages("BiocManager")
|
30
|
|
-BiocManager::install("SpatialDecon")
|
31
|
|
-
|
32
|
|
-```
|
33
|
|
-
|
34
|
|
-### Overview
|
35
|
|
-
|
36
|
|
-
|
37
|
|
-This vignette demonstrates the use of the SpatialDecon package to estimate cell
|
38
|
|
-abundance in spatial gene expression studies.
|
39
|
|
-
|
40
|
|
-The workflow demonstrated here focuses on analysis of Seurat objects, which
|
41
|
|
- are commonly used to store Visium and Spatial Transcriptomics data. See our other
|
42
|
|
- vignettes for examples in GeoMx data.
|
43
|
|
-
|
44
|
|
-We'll analyze a Spatial Transcriptomics dataset from a HER2+ breast tumor, looking for abundance of
|
45
|
|
-different immune cell types.
|
46
|
|
-
|
47
|
|
-
|
48
|
|
-
|
49
|
|
-### Data preparation
|
50
|
|
-
|
51
|
|
-First, we load the package:
|
52
|
|
-```{r setup}
|
53
|
|
-library(SpatialDecon)
|
54
|
|
-library(SeuratObject)
|
55
|
|
-```
|
56
|
|
-
|
57
|
|
-Let's load the example ST Seurat object and examine it:
|
58
|
|
-```{r loaddata}
|
59
|
|
-# download data:
|
60
|
|
-con <- gzcon(url("https://github.com/almaan/her2st/raw/master/data/ST-cnts/G1.tsv.gz"))
|
61
|
|
-txt <- readLines(con)
|
62
|
|
-temp <- read.table(textConnection(txt), sep = "\t", header = TRUE, row.names = 1)
|
63
|
|
-# parse data
|
64
|
|
-raw = t(as.matrix(temp))
|
65
|
|
-norm = sweep(raw, 2, colSums(raw), "/") * mean(colSums(raw))
|
66
|
|
-x = as.numeric(substr(rownames(temp), 1, unlist(gregexpr("x", rownames(temp))) - 1))
|
67
|
|
-y = -as.numeric(substr(rownames(temp), unlist(gregexpr("x", rownames(temp))) + 1, nchar(rownames(temp))))
|
68
|
|
-# put into a seurat object:
|
69
|
|
-andersson_g1 = CreateSeuratObject(counts = raw, assay="Spatial")
|
70
|
|
[email protected]$x = x
|
71
|
|
[email protected]$y = y
|
72
|
|
-
|
73
|
|
-```
|
74
|
|
-
|
75
|
|
-
|
76
|
|
-
|
77
|
|
-
|
78
|
|
-### Cell profile matrices
|
79
|
|
-
|
80
|
|
-A "cell profile matrix" is a pre-defined matrix that specifies the expected
|
81
|
|
-expression profiles of each cell type in the experiment.
|
82
|
|
-The SpatialDecon library comes with one such matrix pre-loaded, the "SafeTME"
|
83
|
|
-matrix, designed for estimation of immune and stroma cells in the tumor
|
84
|
|
-microenvironment.
|
85
|
|
-(This matrix was designed to avoid genes commonly expressed by cancer cells;
|
86
|
|
-see the SpatialDecon manuscript for details.)
|
87
|
|
-
|
88
|
|
-Let's take a glance at the safeTME matrix:
|
89
|
|
-
|
90
|
|
-```{r showsafetme, fig.height=5, fig.width=10, fig.cap = "The safeTME cell profile matrix"}
|
91
|
|
-data("safeTME")
|
92
|
|
-data("safeTME.matches")
|
93
|
|
-
|
94
|
|
-signif(safeTME[seq_len(3), seq_len(3)], 2)
|
95
|
|
-
|
96
|
|
-heatmap(sweep(safeTME, 1, apply(safeTME, 1, max), "/"),
|
97
|
|
- labRow = NA, margins = c(10, 5))
|
98
|
|
-
|
99
|
|
-```
|
100
|
|
-
|
101
|
|
-
|
102
|
|
-For studies of other tissue types, we have provided a library of cell profile
|
103
|
|
-matrices, available on Github and downloadable with the "download_profile_matrix" function.
|
104
|
|
-
|
105
|
|
-For a complete list of matrices, see [CellProfileLibrary GitHub Page](https://github.com/Nanostring-Biostats/CellProfileLibrary/tree/NewProfileMatrices).
|
106
|
|
-
|
107
|
|
-Below we download a matrix of cell profiles derived from scRNA-seq of a mouse
|
108
|
|
-spleen.
|
109
|
|
-
|
110
|
|
-```{r downloadmatrix, fig.height=7, fig.width=10, fig.cap = "The Mouse Spleen profile matrix", eval=T}
|
111
|
|
-mousespleen <- download_profile_matrix(species = "Mouse",
|
112
|
|
- age_group = "Adult",
|
113
|
|
- matrixname = "Spleen_MCA")
|
114
|
|
-dim(mousespleen)
|
115
|
|
-
|
116
|
|
-mousespleen[1:4,1:4]
|
117
|
|
-
|
118
|
|
-head(cellGroups)
|
119
|
|
-
|
120
|
|
-metadata
|
121
|
|
-
|
122
|
|
-heatmap(sweep(mousespleen, 1, apply(mousespleen, 1, max), "/"),
|
123
|
|
- labRow = NA, margins = c(10, 5), cexCol = 0.7)
|
124
|
|
-
|
125
|
|
-```
|
126
|
|
-
|
127
|
|
-For studies where the provided cell profile matrices aren't sufficient or if a specific single cell dataset is wanted, we can make a custom profile matrix using the function create_profile_matrix().
|
128
|
|
-
|
129
|
|
-This mini single cell dataset is a fraction of the data from Kinchen, J. et al. Structural Remodeling of the Human Colonic Mesenchyme in Inflammatory Bowel Disease. Cell 175, 372-386.e17 (2018).
|
130
|
|
-
|
131
|
|
-```{r single cell data}
|
132
|
|
-data("mini_singleCell_dataset")
|
133
|
|
-
|
134
|
|
-mini_singleCell_dataset$mtx@Dim # genes x cells
|
135
|
|
-
|
136
|
|
-as.matrix(mini_singleCell_dataset$mtx)[1:4,1:4]
|
137
|
|
-
|
138
|
|
-head(mini_singleCell_dataset$annots)
|
139
|
|
-
|
140
|
|
-table(mini_singleCell_dataset$annots$LabeledCellType)
|
141
|
|
-
|
142
|
|
-```
|
143
|
|
-
|
144
|
|
-**Pericyte cell** and **smooth muscle cell of colon** will be dropped from this matrix due to low cell count. The average expression across all cells of one type is returned so the more cells of one type, the better reflection of the true gene expression. The confidence in these averages can be changed using the minCellNum filter.
|
145
|
|
-
|
146
|
|
-```{r creatematrix, fig.height=7, fig.width=10, fig.cap = "Custom profile matrix"}
|
147
|
|
-custom_mtx <- create_profile_matrix(mtx = mini_singleCell_dataset$mtx, # cell x gene count matrix
|
148
|
|
- cellAnnots = mini_singleCell_dataset$annots, # cell annotations with cell type and cell name as columns
|
149
|
|
- cellTypeCol = "LabeledCellType", # column containing cell type
|
150
|
|
- cellNameCol = "CellID", # column containing cell ID/name
|
151
|
|
- matrixName = "custom_mini_colon", # name of final profile matrix
|
152
|
|
- outDir = NULL, # path to desired output directory, set to NULL if matrix should not be written
|
153
|
|
- normalize = FALSE, # Should data be normalized?
|
154
|
|
- minCellNum = 5, # minimum number of cells of one type needed to create profile, exclusive
|
155
|
|
- minGenes = 10, # minimum number of genes expressed in a cell, exclusive
|
156
|
|
- scalingFactor = 5, # what should all values be multiplied by for final matrix
|
157
|
|
- discardCellTypes = TRUE) # should cell types be filtered for types like mitotic, doublet, low quality, unknown, etc.
|
158
|
|
-
|
159
|
|
-head(custom_mtx)
|
160
|
|
-
|
161
|
|
-heatmap(sweep(custom_mtx, 1, apply(custom_mtx, 1, max), "/"),
|
162
|
|
- labRow = NA, margins = c(10, 5), cexCol = 0.7)
|
163
|
|
-
|
164
|
|
-```
|
165
|
|
-
|
166
|
|
-Custom matrices can be created from all single cell data classes as long as a counts matrix and cell annotations can be passed to the function. Here is an example of creating a matrix using a Seurat object.
|
167
|
|
-
|
168
|
|
-```{r createSeuratmatrix}
|
169
|
|
-library(SeuratObject)
|
170
|
|
-
|
171
|
|
-data("mini_singleCell_dataset")
|
172
|
|
-
|
173
|
|
-rownames(mini_singleCell_dataset$annots) <- mini_singleCell_dataset$annots$CellID
|
174
|
|
-
|
175
|
|
-seuratObject <- CreateSeuratObject(counts = mini_singleCell_dataset$mtx, meta.data = mini_singleCell_dataset$annots)
|
176
|
|
-Idents(seuratObject) <- seuratObject$LabeledCellType
|
177
|
|
-
|
178
|
|
-rm(mini_singleCell_dataset)
|
179
|
|
-
|
180
|
|
-annots <- data.frame(cbind(cellType=as.character(Idents(seuratObject)),
|
181
|
|
- cellID=names(Idents(seuratObject))))
|
182
|
|
-
|
183
|
|
-custom_mtx_seurat <- create_profile_matrix(mtx = seuratObject@assays$RNA@counts,
|
184
|
|
- cellAnnots = annots,
|
185
|
|
- cellTypeCol = "cellType",
|
186
|
|
- cellNameCol = "cellID",
|
187
|
|
- matrixName = "custom_mini_colon",
|
188
|
|
- outDir = NULL,
|
189
|
|
- normalize = FALSE,
|
190
|
|
- minCellNum = 5,
|
191
|
|
- minGenes = 10)
|
192
|
|
-
|
193
|
|
-head(custom_mtx_seurat)
|
194
|
|
-
|
195
|
|
-paste("custom_mtx and custom_mtx_seurat are identical", all(custom_mtx == custom_mtx_seurat))
|
196
|
|
-```
|
197
|
|
-
|
198
|
|
-
|
199
|
|
-### Deconvolving a Seurat object with the runspatialdecon function
|
200
|
|
-
|
201
|
|
-Now our data is ready for deconvolution.
|
202
|
|
-First we'll show how to use spatialdecon under the basic settings, omitting
|
203
|
|
-optional bells and whistles.
|
204
|
|
-
|
205
|
|
-
|
206
|
|
-```{r runiss}
|
207
|
|
-res = runspatialdecon(object = andersson_g1,
|
208
|
|
- bg = 0.01,
|
209
|
|
- X = safeTME,
|
210
|
|
- align_genes = TRUE)
|
211
|
|
-str(res)
|
212
|
|
-```
|
213
|
|
-
|
214
|
|
-We're most interested in "beta", the matrix of estimated cell abundances.
|
215
|
|
-
|
216
|
|
-```{r plotissres, fig.height = 5, fig.width = 8, fig.cap = "Cell abundance estimates"}
|
217
|
|
-heatmap(res$beta, cexCol = 0.5, cexRow = 0.7, margins = c(10,7))
|
218
|
|
-```
|
219
|
|
-
|
220
|
|
-
|
221
|
|
-### Using the advanced settings of spatialdecon
|
222
|
|
-
|
223
|
|
-spatialdecon has several abilities beyond basic deconvolution:
|
224
|
|
-
|
225
|
|
-1. If given the nuclei counts for each region/observation, it returns results on
|
226
|
|
-the scale of total cell counts. This option is generally not available for ST/Visium data.
|
227
|
|
-2. If given the identities of pure tumor regions/observations, it infers a
|
228
|
|
-handful of tumor-specific expression profiles and appends them to the cell
|
229
|
|
-profile matrix. Doing this accounts for cancer cell-derived expression from any
|
230
|
|
-genes in the cell profile matrix, removing contaminating signal from cancer
|
231
|
|
-cells. This operation is complicated in ST/Visium studies, since they generally
|
232
|
|
-do not contain regions previously marked as pure tumor. As a heuristic for identifying pure
|
233
|
|
-tumor regions, we identify a small subset of regions with relatively low counts from
|
234
|
|
-genes in the safeTME matrix vs. from the rest of the transcriptome.
|
235
|
|
-3. If given raw count data, it derives per-data-point weights. For Visium/ST data,
|
236
|
|
-we assume Poisson error.
|
237
|
|
-4. If given a "cellmatches" argument, it sums multiple closely-related cell
|
238
|
|
-types into a single score. E.g. if the safeTME matrix is used with the
|
239
|
|
-cell-matching data object "safeTME.matches", it e.g. sums the "T.CD8.naive" and
|
240
|
|
-"T.CD8.memory" scores into a single "CD8.T.cells" score.
|
241
|
|
-
|
242
|
|
-Let's take a look at an example cell matching object:
|
243
|
|
-```{r showmatches}
|
244
|
|
-str(safeTME.matches)
|
245
|
|
-```
|
246
|
|
-
|
247
|
|
-Now let's mark some regions as pure tumor:
|
248
|
|
-
|
249
|
|
-```{r puretumor}
|
250
|
|
-sharedgenes = intersect(rownames(raw), rownames(safeTME))
|
251
|
|
-plot(colSums(raw), colSums(raw[sharedgenes, ]), log = "xy")
|
252
|
|
-hist(colSums(raw[sharedgenes, ]) / colSums(raw), breaks = 20)
|
253
|
|
-alltumor = colSums(raw[sharedgenes, ]) / colSums(raw) < 0.03 # for alma data
|
254
|
|
-
|
255
|
|
-table(alltumor)
|
256
|
|
-
|
257
|
|
-```
|
258
|
|
-Calculate weights:
|
259
|
|
-```{r wts}
|
260
|
|
-sd_from_noise <- runErrorModel(counts = raw, platform = "st")
|
261
|
|
-wts <- 1 / sd_from_noise
|
262
|
|
-```
|
263
|
|
-
|
264
|
|
-Now let's run spatialdecon:
|
265
|
|
-
|
266
|
|
-```{r runisstils}
|
267
|
|
-
|
268
|
|
-# run spatialdecon with all the bells and whistles:
|
269
|
|
-restils = runspatialdecon(object = andersson_g1, # Seurat object
|
270
|
|
- X = safeTME, # safeTME matrix, used by default
|
271
|
|
- bg = 0.01, # Recommended value of 0.01 in Visium/ST data
|
272
|
|
- wts = wts, # weight
|
273
|
|
- cellmerges = safeTME.matches, # safeTME.matches object, used by default
|
274
|
|
- is_pure_tumor = alltumor, # identities of the Tumor segments/observations
|
275
|
|
- n_tumor_clusters = 5) # how many distinct tumor profiles to append to safeTME
|
276
|
|
-
|
277
|
|
-str(restils)
|
278
|
|
-```
|
279
|
|
-
|
280
|
|
-There are quite a few readouts here. Let's review the important ones:
|
281
|
|
-
|
282
|
|
-* beta: the cell abundance scores of the rolled-up/major cell types
|
283
|
|
-* beta.granular: the cell abundance scores of the granular cell types,
|
284
|
|
-corresponding to the columns of the cell profile matrix
|
285
|
|
-* yhat, resids: the fitted values and log2-scale residuals from the
|
286
|
|
-deconvolution fit. Can be used to measure each observation's goodness-of-fit, a
|
287
|
|
-possible QC metric.
|
288
|
|
-* prop_of_nontumor: the beta matrix rescaled to give the proportions of
|
289
|
|
-non-tumor cells in each observation.
|
290
|
|
-* X: the cell profile matrix used, including newly-derived tumor-specific
|
291
|
|
-columns.
|
292
|
|
-
|
293
|
|
-To illustrate the derivation of tumor profiles, let's look at the cell profile
|
294
|
|
-matrix output by spatialdecon:
|
295
|
|
-
|
296
|
|
-```{r shownewX, fig.height=5, fig.width=8, fig.cap = "safeTME merged with newly-derived tumor profiles"}
|
297
|
|
-heatmap(sweep(restils$X, 1, apply(restils$X, 1, max), "/"),
|
298
|
|
- labRow = NA, margins = c(10, 5))
|
299
|
|
-
|
300
|
|
-```
|
301
|
|
-
|
302
|
|
-Note the new tumor-specific columns.
|
303
|
|
-
|
304
|
|
-
|
305
|
|
-### Plotting deconvolution results
|
306
|
|
-
|
307
|
|
-The "florets" function plotting cell abundances atop some
|
308
|
|
-2-D projection.
|
309
|
|
-Here, we'll plot cell abundances atop the first 2 principal components of the
|
310
|
|
-data:
|
311
|
|
-
|
312
|
|
-```{r florets, fig.width=8, fig.height=6, fig.cap = "TIL abundance plotted on PC space"}
|
313
|
|
-# PCA of the normalized data:
|
314
|
|
-pc = prcomp(t(log2(pmax(norm, 1))))$x[, c(1, 2)]
|
315
|
|
-
|
316
|
|
-# run florets function:
|
317
|
|
-par(mar = c(5,5,1,1))
|
318
|
|
-layout(mat = (matrix(c(1, 2), 1)), widths = c(6, 2))
|
319
|
|
-florets(x = pc[, 1], y = pc[, 2],
|
320
|
|
- b = restils$beta, cex = .5,
|
321
|
|
- xlab = "PC1", ylab = "PC2")
|
322
|
|
-par(mar = c(0,0,0,0))
|
323
|
|
-frame()
|
324
|
|
-legend("center", fill = cellcols[rownames(restils$beta)],
|
325
|
|
- legend = rownames(restils$beta), cex = 0.7)
|
326
|
|
-```
|
327
|
|
-
|
328
|
|
-```{r floretsxy, fig.width=8, fig.height=6, fig.cap = "TIL abundance plotted on physical space"}
|
329
|
|
-# and plot florets in space:
|
330
|
|
-par(mar = c(5,5,1,1))
|
331
|
|
-layout(mat = (matrix(c(1, 2), 1)), widths = c(6, 2))
|
332
|
|
-florets(x = x, y = y,
|
333
|
|
- b = restils$beta, cex = .5,
|
334
|
|
- xlab = "", ylab = "")
|
335
|
|
-par(mar = c(0,0,0,0))
|
336
|
|
-frame()
|
337
|
|
-legend("center", fill = cellcols[rownames(restils$beta)],
|
338
|
|
- legend = rownames(restils$beta), cex = 0.7)
|
339
|
|
-
|
340
|
|
-
|
341
|
|
-```
|
342
|
|
-
|
343
|
|
-So we can see that PC1 roughly tracks many vs. few immune cells, and PC2 tracks
|
344
|
|
-the relative abundance of lymphoid/myeloid populations.
|
345
|
|
-
|
346
|
|
-
|
347
|
|
-### Other functions
|
348
|
|
-
|
349
|
|
-The SpatialDecon library includes several helpful functions for further
|
350
|
|
-analysis/fine-tuning of deconvolution results.
|
351
|
|
-
|
352
|
|
-#### Combining cell types:
|
353
|
|
-
|
354
|
|
-When two cell types are too similar, the estimation of their abundances becomes
|
355
|
|
-unstable. However, their sum can still be estimated easily.
|
356
|
|
-The function "collapseCellTypes" takes a deconvolution results object and
|
357
|
|
-collapses any colsely-related cell types you tell it to:
|
358
|
|
-
|
359
|
|
-```{r collapse, fig.width=5, fig.height=5, fig.cap="Cell abundance estimates with related cell types collapsed"}
|
360
|
|
-matching = list()
|
361
|
|
-matching$myeloid = c( "macrophages", "monocytes", "mDCs")
|
362
|
|
-matching$T.NK = c("CD4.T.cells","CD8.T.cells", "Treg", "NK")
|
363
|
|
-matching$B = c("B")
|
364
|
|
-matching$mast = c("mast")
|
365
|
|
-matching$neutrophils = c("neutrophils")
|
366
|
|
-matching$stroma = c("endothelial.cells", "fibroblasts")
|
367
|
|
-
|
368
|
|
-
|
369
|
|
-collapsed = collapseCellTypes(fit = restils,
|
370
|
|
- matching = matching)
|
371
|
|
-
|
372
|
|
-heatmap(collapsed$beta, cexRow = 0.85, cexCol = 0.75)
|
373
|
|
-```
|
374
|
|
-
|
375
|
|
-
|
376
|
|
-### Session Info
|
377
|
|
-
|
378
|
|
-```{r sessioninfo}
|
379
|
|
-sessionInfo()
|
380
|
|
-```
|
381
|
0
|
\ No newline at end of file
|