2deecfa7 |
```{r setup, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```
# Purpose
This vignette describes how to use the `SimpleCOMPASS` interface in the `COMPASS` package to fit a polyfunctionality response model to tabular ICS data.
In contrast to `COMPASS()` which is tied more closely to the `GatingSet` data structures available in the core BioConductor flow cytometry infrastructure, `SimpleCOMPASS()` allows the user to used tabular cell count data (exported from FlowJo, for example) directly for `COMPASS` model fitting.
This approach requires a bit more care on the user's part, to ensure data are formatted and ordered correctly, but can can be less intimidating than working with the core flow infrastructure.
## Prerequisites
We'll assume that the data are provided as an `excel` document. We'll use some of the data from the paper: @Rakshit2017-go. The data are available in the COMPASS package as of version 1.17.2.
We need a good package to load the excel file, we'll use the tidyverse pacakge `readxl`. If it's not installed, you can install it with the command
```{r install_readxl, eval=FALSE}
|
2deecfa7 |
dim(counts)
# NOTE that we still have too many columns.
# We should have 256, not 257.
# We'll fix this. The first column is the total. We'd need this for COMPASS, but not for SimpleCOMPASS.
# drop the total count
counts = counts[,-1L]
dim(counts)
```
Read the code comments above, there are some imporant details there.
Next we split the counts matrix, but we notice that we have a typo in our metadata `Stimulation` variable, we'll fix that and proceed.
We assign the stimulated counts to the new `n_s` variable and the unstimulated counts to the `n_u` variable.
```{r split_counts}
unique(metadata$Stimulation)
# Which entries have the typos
typos = which(metadata$Stimulation=="UnStimulated")
#Correct them
metadata$Stimulation[typos] = "Unstimulated"
# Better
unique(metadata$Stimulation)
# old school R
n_u = subset(counts, metadata$Stimulation == "Unstimulated")
n_s = subset(counts, metadata$Stimulation == "ESAT6/CFP10")
```
### 3. A Unique Sample Specific Identifier
In this case, this is simple as each subject corresponds to a unique sample. In more complex cases where there are multiple timepoints, we would construct a unique identifier by concatenating the visit and the subject ids.
```{r unique_id}
metadata$unique_id = metadata$`PATIENT ID`
```
### 4. Name and order rows of all the matrices.
The rows of `metadata`, `n_s` and `n_u` are all consistent, although `metadata` has twice as many rows as `n_s` or `n_u` since we split the counts matrix.
```{r}
# assign consistent row names to n_u and n_s
rownames(n_u) = subset(metadata$unique_id,
metadata$Stimulation=="Unstimulated")
rownames(n_s) = subset(metadata$unique_id,
metadata$Stimulation=="ESAT6/CFP10")
# Now all matrices have the same dimensions and appropriate rownames
metadata = subset(metadata,
metadata$Stimulation=="ESAT6/CFP10")
```
We have subset metadata to include only the stimulated rows. We have named the columns of the split count matrices according to the unique ids of the metadata matrix entries for the stimulated or non-stimulated entries, as appropriate.
### 5. Reformat cell population names.
Let's look at the current population names.
```{r}
colnames(n_s)[1]
```
There is a gating path prefix (we won't need that), and a trailing ",Count" string, presumably to specify that these are count values as oppposed to proportions. We won't need that either.
The cytokine names are in a short form (i.e. 17F corresponds to IL17F), and they use the "+/-" naming scheme to indicate cells that are positive or negative for a particular cytokine. This is not the format required by `COMPASS`.
We need to drop the path prefix, and use boolean operators to specify the cell combinations. For example "A+/B+C-" would translate to "A&B&!C", read as "A and B and NOT C".
COMPASS provides a convenience function to do this for you `translate_marker_names()`.
```{r}
# remove the path
nms = basename(colnames(n_s))
# translate the marker names to a COMPASS compatible format.
nms = COMPASS:::translate_marker_names(nms)
sample(nms,2)
colnames(n_s) = nms
nms = basename(colnames(n_u))
# translate the marker names to a COMPASS compatible format.
nms = COMPASS:::translate_marker_names(nms)
sample(nms,2)
colnames(n_u) = nms
```
If we want, we can rename the cytokines to familiar form.
```{r rename_markers}
#2 to IL2
colnames(n_s) = gsub("([&!])2([&!])","\\1IL2\\2",colnames(n_s))
#10 to IL10
colnames(n_s) = gsub("([&!])10([&!])","\\1IL10\\2",colnames(n_s))
# 17A to IL17A
colnames(n_s) = gsub("([&!])17A([&!])","\\1IL17A\\2",colnames(n_s))
# 17F to IL17F
colnames(n_s) = gsub("([&!])17F([&!])","\\1IL17F\\2",colnames(n_s))
# 22 to IL22
colnames(n_s) = gsub("([&!])22([&!])","\\1IL22\\2",colnames(n_s))
```
We can put the above in a function:
```{r rename_function}
rename_cytokines = function(nms){
#2 to IL2
nms = gsub("([&!])2([&!])",
"\\1IL2\\2", nms)
#10 to IL10
nms = gsub("([&!])10([&!])",
"\\1IL10\\2", nms)
# 17A to IL17A
nms = gsub("([&!])17A([&!])",
"\\1IL17A\\2", nms)
# 17F to IL17F
nms = gsub("([&!])17F([&!])",
"\\1IL17F\\2", nms)
# 22 to IL22
nms = gsub("([&!])22([&!])",
"\\1IL22\\2", nms)
}
```
And apply it as so:
```{r rename_markers_with_function}
colnames(n_s) = rename_cytokines(colnames(n_s))
colnames(n_u) = rename_cytokines(colnames(n_u))
```
### 6. Last column should be the all-negative boolean combination.
The last column of the count matrices should be the all-negative cell subset. Let's confirm that is is.
```{r}
colnames(n_s)[ncol(n_s)]
colnames(n_u)[ncol(n_u)]
```
## Fit a COMPASS model
The arguments required by `SimpleCOMPASS` are:
```{r}
args(COMPASS::SimpleCOMPASS)
```
We have already defined `n_s`, `n_u`, and `meta` (metadata). The `individual_id` is a character name of the variable for the unique_id, in our case "unique_id".
We'll limit the number of iterations and replications for the purpose of the vignette, so we won't have a good fit, but it's only for demonstration.
If you see a message about reordering the rows of n_s and n_u, double check your work, it likely means that the rownames of the count matrices don't match the `unique_id` variable, or are not consistent between each other.
```{r fit_model, cache = TRUE}
fit = COMPASS::SimpleCOMPASS(n_s = n_s,
n_u = n_u,
meta = metadata,
individual_id = "unique_id",
iterations = 1000,
replications = 3)
```
For a complete run, iterations should be 400000, and replications should be 8 (the defaults for `COMPASS`).
# Visualization of results
We can extract polyfunctionality scores with the `scores()` function. We look at the distribution of the functionality score (FS) across groups.
```{r scores}
library(ggplot2)
library(dplyr)
scores(fit) %>%
ggplot() +
geom_boxplot(outlier.colour = NA) +
geom_jitter() +
aes(y = FS, x = GROUP)
```
A heatmap can be drawn using `plot()`.
```{r}
plot(fit,"GROUP")
```
# References
|