... | ... |
@@ -69,6 +69,10 @@ First, we load the dataset. To reduce the computational complexity of this vigne |
69 | 69 |
```{r} |
70 | 70 |
zilionis <- ZilionisLungData() |
71 | 71 |
zilionis <- zilionis[, 1:5000] |
72 |
+ |
|
73 |
+# now we add simple colnames (= cell ids) to the dataset |
|
74 |
+# Note: This is normally not necessary |
|
75 |
+colnames(zilionis) <- paste0("Cell", 1:ncol(zilionis)) |
|
72 | 76 |
``` |
73 | 77 |
|
74 | 78 |
We split this dataset into two parts, one for the training and the other for the testing. |
... | ... |
@@ -24,7 +24,7 @@ easy tools to train their own model classifying new cell types from labeled |
24 | 24 |
scRNA-seq data. |
25 | 25 |
|
26 | 26 |
This vignette shows how to train a basic |
27 |
-classification model for an independant cell type, which is not a child of |
|
27 |
+classification model for an independent cell type, which is not a child of |
|
28 | 28 |
any other cell type. |
29 | 29 |
|
30 | 30 |
## Preparing train object and test object |
... | ... |
@@ -148,8 +148,8 @@ times for one model, users can use `set.seed`. |
148 | 148 |
```{r} |
149 | 149 |
set.seed(123) |
150 | 150 |
classifier_B <- train_classifier(train_obj = train_set, cell_type = "B cells", |
151 |
- marker_genes = selected_marker_genes_B, |
|
152 |
- sce_assay = 'counts', sce_tag_slot = 'B_cell') |
|
151 |
+ marker_genes = selected_marker_genes_B, |
|
152 |
+ assay = 'counts', tag_slot = 'B_cell') |
|
153 | 153 |
``` |
154 | 154 |
```{r} |
155 | 155 |
classifier_B |
... | ... |
@@ -169,8 +169,8 @@ The `test_classifier` model automatically tests a classifier's performance |
169 | 169 |
against another dataset. Here, we used the `test_set` created before: |
170 | 170 |
|
171 | 171 |
```{r} |
172 |
-classifier_B_test <- test_classifier(test_obj = test_set, classifier = classifier_B, |
|
173 |
- sce_assay = 'counts', sce_tag_slot = 'B_cell') |
|
172 |
+classifier_B_test <- test_classifier(classifier = classifier_B, test_obj = test_set, |
|
173 |
+ assay = 'counts', tag_slot = 'B_cell') |
|
174 | 174 |
``` |
175 | 175 |
|
176 | 176 |
### Interpreting test model result |
Former-commit-id: 4a69d2314d0fe5849d5002a1dc1cdd474f9afaff
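Taken together, the renamed arguments in the hunks above are used as follows. This is a minimal sketch assuming the `train_set`, `test_set` and `selected_marker_genes_B` objects defined earlier in the vignette; the argument names (`marker_genes`, `assay`, `tag_slot`) are taken verbatim from the added lines.

```r
# train a binary B cell classifier; 'assay' and 'tag_slot' replace the older
# 'sce_assay' and 'sce_tag_slot' argument names
set.seed(123)
classifier_B <- train_classifier(train_obj = train_set, cell_type = "B cells",
                                 marker_genes = selected_marker_genes_B,
                                 assay = 'counts', tag_slot = 'B_cell')

# evaluate the trained classifier on the held-out test set
classifier_B_test <- test_classifier(classifier = classifier_B, test_obj = test_set,
                                     assay = 'counts', tag_slot = 'B_cell')
```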
... | ... |
@@ -233,7 +233,7 @@ New classification models can be stored using the `save_new_model` function: |
233 | 233 |
|
234 | 234 |
```{r} |
235 | 235 |
# no copy of pretrained models is performed |
236 |
-save_new_model(new_model = classifier_B, path.to.models = tempdir(), |
|
236 |
+save_new_model(new_model = classifier_B, path_to_models = tempdir(), |
|
237 | 237 |
include.default = FALSE) |
238 | 238 |
``` |
239 | 239 |
|
... | ... |
@@ -241,7 +241,7 @@ Parameters: |
241 | 241 |
|
242 | 242 |
* **new_model**: The new model that should be added to the database in the |
243 | 243 |
specified directory. |
244 |
- * **path.to.models**: The directory where the new models should be stored. |
|
244 |
+ * **path_to_models**: The directory where the new models should be stored. |
|
245 | 245 |
* **include.default**: If set, the default models shipped with the package |
246 | 246 |
are added to the database. |
247 | 247 |
|
... | ... |
@@ -253,7 +253,7 @@ Models can be deleted from the model database using the `delete_model` function: |
253 | 253 |
|
254 | 254 |
```{r} |
255 | 255 |
# delete the "B cells" model from the new database |
256 |
-delete_model("B cells", path.to.models = tempdir()) |
|
256 |
+delete_model("B cells", path_to_models = tempdir()) |
|
257 | 257 |
``` |
258 | 258 |
|
259 | 259 |
## Session Info |
Former-commit-id: 91ec8da24f45c110ab4f2e728ab71e9c1d255ca8
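For completeness, the two renamed calls above combine into the following round trip; it is a sketch assuming the `classifier_B` model trained earlier in the vignette and the `path_to_models` argument introduced in this commit.

```r
# store the new model in a temporary model database
# (no copy of the pretrained models is performed)
save_new_model(new_model = classifier_B, path_to_models = tempdir(),
               include.default = FALSE)

# remove the "B cells" model from that database again
delete_model("B cells", path_to_models = tempdir())
```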
... | ... |
@@ -147,12 +147,12 @@ from the majority group. To use the same set of cells while training multiple |
147 | 147 |
times for one model, users can use `set.seed`. |
148 | 148 |
```{r} |
149 | 149 |
set.seed(123) |
150 |
-clf_B <- train_classifier(train_obj = train_set, cell_type = "B cells", |
|
150 |
+classifier_B <- train_classifier(train_obj = train_set, cell_type = "B cells", |
|
151 | 151 |
marker_genes = selected_marker_genes_B, |
152 | 152 |
sce_assay = 'counts', sce_tag_slot = 'B_cell') |
153 | 153 |
``` |
154 | 154 |
```{r} |
155 |
-clf_B |
|
155 |
+classifier_B |
|
156 | 156 |
``` |
157 | 157 |
The classification model is a `scAnnotatR` object. |
158 | 158 |
Details about the classification model are accessible via getter methods. |
... | ... |
@@ -160,7 +160,7 @@ Details about the classification model are accessible via getter methods. |
160 | 160 |
For example: |
161 | 161 |
|
162 | 162 |
```{r} |
163 |
-clf(clf_B) |
|
163 |
+caret_model(classifier_B) |
|
164 | 164 |
``` |
165 | 165 |
|
166 | 166 |
## Test model |
... | ... |
@@ -169,7 +169,7 @@ The `test_classifier` model automatically tests a classifier's performance |
169 | 169 |
against another dataset. Here, we used the `test_set` created before: |
170 | 170 |
|
171 | 171 |
```{r} |
172 |
-clf_B_test <- test_classifier(test_obj = test_set, classifier = clf_B, |
|
172 |
+classifier_B_test <- test_classifier(test_obj = test_set, classifier = classifier_B, |
|
173 | 173 |
sce_assay = 'counts', sce_tag_slot = 'B_cell') |
174 | 174 |
``` |
175 | 175 |
|
... | ... |
@@ -191,7 +191,7 @@ Every classifier internally consists of a trained SVM and a probability threshol |
191 | 191 |
this threshold are classified as the respective cell type. The *overall_roc* slot summarizes the True Positive Rate (sensitivity) and False Positive Rate (1 - specificity) obtained by the trained model according to different thresholds. |
192 | 192 |
|
193 | 193 |
```{r} |
194 |
-clf_B_test$overall_roc |
|
194 |
+classifier_B_test$overall_roc |
|
195 | 195 |
``` |
196 | 196 |
|
197 | 197 |
In this example of the B cell classifier, the current threshold is at 0.5. A higher sensitivity can be reached if we set the p_thres at 0.4. However, we will then have lower specificity, which means that we will incorrectly classify some cells as B cells. At the same time, we may not retrieve all actual B cells with a higher p_thres (0.6, for example). |
... | ... |
@@ -202,7 +202,7 @@ There is of course a certain trade-off between the sensitivity and the specifici |
202 | 202 |
|
203 | 203 |
Apart from numbers, we also provide a method to plot the ROC curve. |
204 | 204 |
```{r} |
205 |
-roc_curve <- plot_roc_curve(test_result = clf_B_test) |
|
205 |
+roc_curve <- plot_roc_curve(test_result = classifier_B_test) |
|
206 | 206 |
``` |
207 | 207 |
```{r} |
208 | 208 |
plot(roc_curve) |
... | ... |
@@ -233,7 +233,7 @@ New classification models can be stored using the `save_new_model` function: |
233 | 233 |
|
234 | 234 |
```{r} |
235 | 235 |
# no copy of pretrained models is performed |
236 |
-save_new_model(new_model = clf_B, path.to.models = tempdir(), |
|
236 |
+save_new_model(new_model = classifier_B, path.to.models = tempdir(), |
|
237 | 237 |
include.default = FALSE) |
238 | 238 |
``` |
239 | 239 |
|
Former-commit-id: b8f82581ead9bfe31c088db02857a1d1c996d832
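With the renamed objects above, inspecting a test result looks roughly like this. The `acc`, `auc` and `overall_roc` fields are the ones listed in the vignette's "Interpreting test model result" section; treat the snippet as a sketch, not additional vignette code.

```r
# elements of the list returned by test_classifier
classifier_B_test$acc           # accuracy at the fixed probability threshold
classifier_B_test$auc           # AUC score of the classifier
classifier_B_test$overall_roc   # TPR/FPR over a range of probability thresholds

# ROC curve of the tested classifier
roc_curve <- plot_roc_curve(test_result = classifier_B_test)
plot(roc_curve)
```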
... | ... |
@@ -112,13 +112,13 @@ We may want to check the number of cells in each category: |
112 | 112 |
table(train_set$B_cell) |
113 | 113 |
``` |
114 | 114 |
|
115 |
-## Defining features |
|
115 |
+## Defining marker genes |
|
116 | 116 |
|
117 |
-Next, we define a set of features, which will be used in training the |
|
117 |
+Next, we define a set of marker genes, which will be used in training the |
|
118 | 118 |
classification model. Supposing we are training a model for classifying |
119 |
-B cells, we define the set of features as follows: |
|
119 |
+B cells, we define the set of marker genes as follows: |
|
120 | 120 |
```{r} |
121 |
-selected_features_B <- c("CD19", "MS4A1", "CD79A", "CD79B", 'CD27', 'IGHG1', 'IGHG2', 'IGHM', |
|
121 |
+selected_marker_genes_B <- c("CD19", "MS4A1", "CD79A", "CD79B", 'CD27', 'IGHG1', 'IGHG2', 'IGHM', |
|
122 | 122 |
"CR2", "MEF2C", 'VPREB3', 'CD86', 'LY86', "BLK", "DERL3") |
123 | 123 |
``` |
124 | 124 |
|
... | ... |
@@ -128,7 +128,7 @@ When the model is being trained, three pieces of information must be |
128 | 128 |
provided: |
129 | 129 |
|
130 | 130 |
* the Seurat/SCE object used for training |
131 |
- * the set of applied features |
|
131 |
+ * the set of applied marker genes |
|
132 | 132 |
* the cell type defining the trained model |
133 | 133 |
|
134 | 134 |
In case the dataset does not contain any cell classified as the target |
... | ... |
@@ -148,7 +148,7 @@ times for one model, users can use `set.seed`. |
148 | 148 |
```{r} |
149 | 149 |
set.seed(123) |
150 | 150 |
clf_B <- train_classifier(train_obj = train_set, cell_type = "B cells", |
151 |
- features = selected_features_B, |
|
151 |
+ marker_genes = selected_marker_genes_B, |
|
152 | 152 |
sce_assay = 'counts', sce_tag_slot = 'B_cell') |
153 | 153 |
``` |
154 | 154 |
```{r} |
... | ... |
@@ -210,8 +210,8 @@ plot(roc_curve) |
210 | 210 |
|
211 | 211 |
### Which model to choose? |
212 | 212 |
|
213 |
-Changes in the training data, in the set of features and in the prediction probability |
|
214 |
-threshold will all lead to a change in model performance. |
|
213 |
+Changes in the training data, in the set of marker genes and in the prediction |
|
214 |
+probability threshold will all lead to a change in model performance. |
|
215 | 215 |
|
216 | 216 |
There are several ways to evaluate the trained model, including the overall |
217 | 217 |
accuracy, the AUC score and the sensitivity/specificity of the model when |
Former-commit-id: 15b64d9e0186c72b444cb313c15cb97cd80a3b4e
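Before training with the renamed `marker_genes` argument, it can be useful to confirm that the chosen marker genes actually occur in the training object. The check below is not part of the vignette; it is a small sketch using base R on the SCE object defined above.

```r
# the trimmed B cell marker set from this commit
selected_marker_genes_B <- c("CD19", "MS4A1", "CD79A", "CD79B", "CD27", "IGHG1", "IGHG2",
                             "IGHM", "CR2", "MEF2C", "VPREB3", "CD86", "LY86", "BLK", "DERL3")

# marker genes missing from the training data; ideally this returns character(0)
setdiff(selected_marker_genes_B, rownames(train_set))
```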
... | ... |
@@ -147,7 +147,8 @@ from the majority group. To use the same set of cells while training multiple |
147 | 147 |
times for one model, users can use `set.seed`. |
148 | 148 |
```{r} |
149 | 149 |
set.seed(123) |
150 |
-clf_B <- train_classifier(train_obj = train_set, cell_type = "B cells", features = selected_features_B, |
|
150 |
+clf_B <- train_classifier(train_obj = train_set, cell_type = "B cells", |
|
151 |
+ features = selected_features_B, |
|
151 | 152 |
sce_assay = 'counts', sce_tag_slot = 'B_cell') |
152 | 153 |
``` |
153 | 154 |
```{r} |
... | ... |
@@ -232,7 +233,8 @@ New classification models can be stored using the `save_new_model` function: |
232 | 233 |
|
233 | 234 |
```{r} |
234 | 235 |
# no copy of pretrained models is performed |
235 |
-save_new_model(new_model = clf_B, path.to.models = getwd(),include.default = FALSE) |
|
236 |
+save_new_model(new_model = clf_B, path.to.models = tempdir(), |
|
237 |
+ include.default = FALSE) |
|
236 | 238 |
``` |
237 | 239 |
|
238 | 240 |
Parameters: |
... | ... |
@@ -251,7 +253,7 @@ Models can be deleted from the model database using the `delete_model` function: |
251 | 253 |
|
252 | 254 |
```{r} |
253 | 255 |
# delete the "B cells" model from the new database |
254 |
-delete_model("B cells", path.to.models = getwd()) |
|
256 |
+delete_model("B cells", path.to.models = tempdir()) |
|
255 | 257 |
``` |
256 | 258 |
|
257 | 259 |
## Session Info |
Former-commit-id: 9823f9b8cf0ad38ebbe2bcd6f870a63a98c29d07
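Switching from `getwd()` to `tempdir()` keeps the vignette from writing into the user's working directory. Outside of a vignette one would typically point the model database at a persistent directory instead; the path below is purely illustrative, and the argument is still called `path.to.models` at this point in the history.

```r
# hypothetical persistent location for the model database (example path only)
model_dir <- file.path(path.expand("~"), "scAnnotatR_models")
dir.create(model_dir, showWarnings = FALSE)

# no copy of pretrained models is performed
save_new_model(new_model = clf_B, path.to.models = model_dir,
               include.default = FALSE)
```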
... | ... |
@@ -19,7 +19,7 @@ options(rmarkdown.html_vignette.check_title = FALSE) |
19 | 19 |
|
20 | 20 |
## Introduction |
21 | 21 |
|
22 |
-One of key functions of the scClassifR package is to provide users |
|
22 |
+One of the key functions of the scAnnotatR package is to provide users |
|
23 | 23 |
easy tools to train their own model classifying new cell types from labeled |
24 | 24 |
scRNA-seq data. |
25 | 25 |
|
... | ... |
@@ -50,9 +50,9 @@ To start the training workflow, we first install and load the necessary librarie |
50 | 50 |
if (!requireNamespace("BiocManager", quietly = TRUE)) |
51 | 51 |
install.packages("BiocManager") |
52 | 52 |
|
53 |
-# the scClassifR package |
|
54 |
-if (!require(scClassifR)) |
|
55 |
- BiocManager::install("scClassifR") |
|
53 |
+# the scAnnotatR package |
|
54 |
+if (!require(scAnnotatR)) |
|
55 |
+ BiocManager::install("scAnnotatR") |
|
56 | 56 |
|
57 | 57 |
# we use the scRNAseq package to load example data |
58 | 58 |
if (!require(scRNAseq)) |
... | ... |
@@ -61,7 +61,7 @@ if (!require(scRNAseq)) |
61 | 61 |
|
62 | 62 |
```{r} |
63 | 63 |
library(scRNAseq) |
64 |
-library(scClassifR) |
|
64 |
+library(scAnnotatR) |
|
65 | 65 |
``` |
66 | 66 |
|
67 | 67 |
First, we load the dataset. To reduce the computational complexity of this vignette, we only use the first 5000 cells of the dataset. |
... | ... |
@@ -105,7 +105,7 @@ test_set$B_cell <- unlist(lapply(test_set$`Most likely LM22 cell type`, |
105 | 105 |
function(x) if (is.na(x)) {'ambiguous'} else if (x %in% c('Plasma cells', 'B cells memory', 'B cells naive')) {'B cells'} else {'others'})) |
106 | 106 |
``` |
107 | 107 |
|
108 |
-We observe that there are cells marked NAs. Those can be understood as 1/different from all indicated cell types or 2/any unknown cell types. Here we consider the second case, ie. we don't know whether they are positive or negative to B cells. To avoid any effect of these cells, we can assign them as 'ambiguous'. All cells tagged 'ambiguous' will be ignored by scClassifR from training and testing. |
|
108 |
+We observe that there are cells marked as NA. Those can be understood as (1) different from all indicated cell types or (2) unknown cell types. Here we consider the second case, i.e. we don't know whether they are positive or negative for B cells. To avoid any effect of these cells, we can assign them as 'ambiguous'. All cells tagged 'ambiguous' will be ignored by scAnnotatR during training and testing. |
|
109 | 109 |
|
110 | 110 |
We may want to check the number of cells in each category: |
111 | 111 |
```{r} |
... | ... |
@@ -153,7 +153,7 @@ clf_B <- train_classifier(train_obj = train_set, cell_type = "B cells", features |
153 | 153 |
```{r} |
154 | 154 |
clf_B |
155 | 155 |
``` |
156 |
-The classification model is a `scClassifR` object. |
|
156 |
+The classification model is a `scAnnotatR` object. |
|
157 | 157 |
Details about the classification model are accessible via getter methods. |
158 | 158 |
|
159 | 159 |
For example: |
Former-commit-id: c41bda4739a20cb0a346da77aaa02256d756b67f
... | ... |
@@ -19,7 +19,7 @@ options(rmarkdown.html_vignette.check_title = FALSE) |
19 | 19 |
|
20 | 20 |
## Introduction |
21 | 21 |
|
22 |
-One of key functions of the scTypeR package is to provide users |
|
22 |
+One of key functions of the scClassifR package is to provide users |
|
23 | 23 |
easy tools to train their own model classifying new cell types from labeled |
24 | 24 |
scRNA-seq data. |
25 | 25 |
|
... | ... |
@@ -50,9 +50,9 @@ To start the training workflow, we first install and load the necessary librarie |
50 | 50 |
if (!requireNamespace("BiocManager", quietly = TRUE)) |
51 | 51 |
install.packages("BiocManager") |
52 | 52 |
|
53 |
-# the scTypeR package |
|
54 |
-if (!require(scTypeR)) |
|
55 |
- BiocManager::install("scTypeR") |
|
53 |
+# the scClassifR package |
|
54 |
+if (!require(scClassifR)) |
|
55 |
+ BiocManager::install("scClassifR") |
|
56 | 56 |
|
57 | 57 |
# we use the scRNAseq package to load example data |
58 | 58 |
if (!require(scRNAseq)) |
... | ... |
@@ -61,7 +61,7 @@ if (!require(scRNAseq)) |
61 | 61 |
|
62 | 62 |
```{r} |
63 | 63 |
library(scRNAseq) |
64 |
-library(scTypeR) |
|
64 |
+library(scClassifR) |
|
65 | 65 |
``` |
66 | 66 |
|
67 | 67 |
First, we load the dataset. To reduce the computational complexity of this vignette, we only use the first 5000 cells of the dataset. |
... | ... |
@@ -105,7 +105,7 @@ test_set$B_cell <- unlist(lapply(test_set$`Most likely LM22 cell type`, |
105 | 105 |
function(x) if (is.na(x)) {'ambiguous'} else if (x %in% c('Plasma cells', 'B cells memory', 'B cells naive')) {'B cells'} else {'others'})) |
106 | 106 |
``` |
107 | 107 |
|
108 |
-We observe that there are cells marked NAs. Those can be understood as 1/different from all indicated cell types or 2/any unknown cell types. Here we consider the second case, ie. we don't know whether they are positive or negative to B cells. To avoid any effect of these cells, we can assign them as 'ambiguous'. All cells tagged 'ambiguous' will be ignored by scTypeR from training and testing. |
|
108 |
+We observe that there are cells marked NAs. Those can be understood as 1/different from all indicated cell types or 2/any unknown cell types. Here we consider the second case, ie. we don't know whether they are positive or negative to B cells. To avoid any effect of these cells, we can assign them as 'ambiguous'. All cells tagged 'ambiguous' will be ignored by scClassifR from training and testing. |
|
109 | 109 |
|
110 | 110 |
We may want to check the number of cells in each category: |
111 | 111 |
```{r} |
... | ... |
@@ -153,7 +153,7 @@ clf_B <- train_classifier(train_obj = train_set, cell_type = "B cells", features |
153 | 153 |
```{r} |
154 | 154 |
clf_B |
155 | 155 |
``` |
156 |
-The classification model is a `scTypeR` object. |
|
156 |
+The classification model is a `scClassifR` object. |
|
157 | 157 |
Details about the classification model are accessible via getter methods. |
158 | 158 |
|
159 | 159 |
For example: |
Former-commit-id: 864885ec7ff787bf98b464fcc188c50693100b77
... | ... |
@@ -45,7 +45,7 @@ which is available in the scRNAseq (2.4.0) library. The dataset is stored as a |
45 | 45 |
SCE object. |
46 | 46 |
|
47 | 47 |
To start the training workflow, we first install and load the necessary libraries. |
48 |
-```{r} |
|
48 |
+```{r, eval = FALSE} |
|
49 | 49 |
# use BiocManager to install from Bioconductor |
50 | 50 |
if (!requireNamespace("BiocManager", quietly = TRUE)) |
51 | 51 |
install.packages("BiocManager") |
... | ... |
@@ -59,6 +59,11 @@ if (!require(scRNAseq)) |
59 | 59 |
BiocManager::install("scRNAseq") |
60 | 60 |
``` |
61 | 61 |
|
62 |
+```{r} |
|
63 |
+library(scRNAseq) |
|
64 |
+library(scTypeR) |
|
65 |
+``` |
|
66 |
+ |
|
62 | 67 |
First, we load the dataset. To reduce the computational complexity of this vignette, we only use the first 5000 cells of the dataset. |
63 | 68 |
|
64 | 69 |
```{r} |
Former-commit-id: d3cd57efacc4694501f4f615a49b7fe899a06e5b
... | ... |
@@ -59,10 +59,11 @@ if (!require(scRNAseq)) |
59 | 59 |
BiocManager::install("scRNAseq") |
60 | 60 |
``` |
61 | 61 |
|
62 |
-Load the dataset: |
|
62 |
+First, we load the dataset. To reduce the computational complexity of this vignette, we only use the first 5000 cells of the dataset. |
|
63 | 63 |
|
64 | 64 |
```{r} |
65 | 65 |
zilionis <- ZilionisLungData() |
66 |
+zilionis <- zilionis[, 1:5000] |
|
66 | 67 |
``` |
67 | 68 |
|
68 | 69 |
We split this dataset into two parts, one for the training and the other for the testing. |
... | ... |
@@ -112,12 +113,8 @@ Next, we define a set of features, which will be used in training the |
112 | 113 |
classification model. Supposing we are training a model for classifying |
113 | 114 |
B cells, we define the set of features as follows: |
114 | 115 |
```{r} |
115 |
-selected_features_B <- c("CD19", "MS4A1", "SDC1", "CD79A", "CD79B", |
|
116 |
- "CD38", "CD37", "CD83", "CR2", "MVK", "MME", |
|
117 |
- "IL2RA", "PTEN", "POU2AF1", "MEF2C", "IRF8", |
|
118 |
- "TCF3", "BACH2", "MZB1", 'VPREB3', 'RASGRP2', |
|
119 |
- 'CD86', 'CD84', 'LY86', 'CD74', 'SP140', "BLK", |
|
120 |
- 'FLI1', 'CD14', "DERL3", "LRMP") |
|
116 |
+selected_features_B <- c("CD19", "MS4A1", "CD79A", "CD79B", 'CD27', 'IGHG1', 'IGHG2', 'IGHM', |
|
117 |
+ "CR2", "MEF2C", 'VPREB3', 'CD86', 'LY86', "BLK", "DERL3") |
|
121 | 118 |
``` |
122 | 119 |
|
123 | 120 |
## Train model |
Former-commit-id: af53b823be827fde1d0a8435136b9455293ccbdc
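After the subsetting introduced above, the train/test split later in the vignette operates on 5000 cells, i.e. 2500 cells per half. A compact sketch of the combined loading, subsetting and splitting steps (the individual lines all appear in the vignette):

```r
zilionis <- ZilionisLungData()
zilionis <- zilionis[, 1:5000]                        # keep only the first 5000 cells

pivot <- ncol(zilionis) %/% 2                         # integer division: 2500
train_set <- zilionis[, 1:pivot]                      # cells 1..2500
test_set  <- zilionis[, (pivot + 1):ncol(zilionis)]   # cells 2501..5000
```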
... | ... |
@@ -19,7 +19,7 @@ options(rmarkdown.html_vignette.check_title = FALSE) |
19 | 19 |
|
20 | 20 |
## Introduction |
21 | 21 |
|
22 |
-One of key functions of the SingleCellClassR package is to provide users |
|
22 |
+One of key functions of the scTypeR package is to provide users |
|
23 | 23 |
easy tools to train their own model classifying new cell types from labeled |
24 | 24 |
scRNA-seq data. |
25 | 25 |
|
... | ... |
@@ -50,9 +50,9 @@ To start the training workflow, we first install and load the necessary librarie |
50 | 50 |
if (!requireNamespace("BiocManager", quietly = TRUE)) |
51 | 51 |
install.packages("BiocManager") |
52 | 52 |
|
53 |
-# the SingleCellClassR package |
|
54 |
-if (!require(SingleCellClassR)) |
|
55 |
- BiocManager::install("SingleCellClassR") |
|
53 |
+# the scTypeR package |
|
54 |
+if (!require(scTypeR)) |
|
55 |
+ BiocManager::install("scTypeR") |
|
56 | 56 |
|
57 | 57 |
# we use the scRNAseq package to load example data |
58 | 58 |
if (!require(scRNAseq)) |
... | ... |
@@ -99,7 +99,7 @@ test_set$B_cell <- unlist(lapply(test_set$`Most likely LM22 cell type`, |
99 | 99 |
function(x) if (is.na(x)) {'ambiguous'} else if (x %in% c('Plasma cells', 'B cells memory', 'B cells naive')) {'B cells'} else {'others'})) |
100 | 100 |
``` |
101 | 101 |
|
102 |
-We observe that there are cells marked NAs. Those can be understood as 1/different from all indicated cell types or 2/any unknown cell types. Here we consider the second case, ie. we don't know whether they are positive or negative to B cells. To avoid any effect of these cells, we can assign them as 'ambiguous'. All cells tagged 'ambiguous' will be ignored by SingleCellClassR from training and testing. |
|
102 |
+We observe that there are cells marked NAs. Those can be understood as 1/different from all indicated cell types or 2/any unknown cell types. Here we consider the second case, ie. we don't know whether they are positive or negative to B cells. To avoid any effect of these cells, we can assign them as 'ambiguous'. All cells tagged 'ambiguous' will be ignored by scTypeR from training and testing. |
|
103 | 103 |
|
104 | 104 |
We may want to check the number of cells in each category: |
105 | 105 |
```{r} |
... | ... |
@@ -151,7 +151,7 @@ clf_B <- train_classifier(train_obj = train_set, cell_type = "B cells", features |
151 | 151 |
```{r} |
152 | 152 |
clf_B |
153 | 153 |
``` |
154 |
-The classification model is a `SingleCellClassR` object. |
|
154 |
+The classification model is a `scTypeR` object. |
|
155 | 155 |
Details about the classification model are accessible via getter methods. |
156 | 156 |
|
157 | 157 |
For example: |
Former-commit-id: f8479b40bc3ff528d07dd7c08bb7b95ed163e1fb
... | ... |
@@ -73,10 +73,12 @@ test_set <- zilionis[, (1+pivot):ncol(zilionis)] |
73 | 73 |
``` |
74 | 74 |
|
75 | 75 |
In this dataset, the cell type meta data is stored in the *Most likely LM22 cell type* |
76 |
-slot of the SingleCellExperiment object (in both train object and test object). |
|
77 |
-If cell type is stored in another slot of object meta data, |
|
78 |
-the slot/tag slot name must be then provided as a parameter in the |
|
79 |
-train and test method. |
|
76 |
+slot of the SingleCellExperiment object (in both the train object and test object). |
|
77 |
+ |
|
78 |
+If the cell type is not stored as the default identification (set through |
|
79 |
+`Idents` for Seurat objects), the slot must be set as a parameter in the training |
|
80 |
+and testing functions (see below). |
|
81 |
+ |
|
80 | 82 |
```{r} |
81 | 83 |
unique(train_set$`Most likely LM22 cell type`) |
82 | 84 |
``` |
Former-commit-id: e5f1d8de2f05f7473f211949af702734a4c14cd9
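In other words, the cell type labels are simply one column of the SCE's cell metadata, and the name of that column is what gets passed to the tag slot argument of the train and test functions. A short sketch, assuming the SingleCellExperiment accessors are available through the packages loaded above:

```r
# list the cell metadata columns of the training SCE; the column holding the labels
# ("Most likely LM22 cell type" here, or "B_cell" after the relabelling below)
# is the one handed to the tag slot argument
colnames(colData(train_set))
```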
... | ... |
@@ -19,18 +19,21 @@ options(rmarkdown.html_vignette.check_title = FALSE) |
19 | 19 |
|
20 | 20 |
## Introduction |
21 | 21 |
|
22 |
-One of basic functions of the SingleCellClassR package is to provide users |
|
22 |
+One of key functions of the SingleCellClassR package is to provide users |
|
23 | 23 |
easy tools to train their own model classifying new cell types from labeled |
24 | 24 |
scRNA-seq data. |
25 | 25 |
|
26 |
-From the very beginning, this vignette shows how to train a basic |
|
26 |
+This vignette shows how to train a basic |
|
27 | 27 |
classification model for an independant cell type, which is not a child of |
28 | 28 |
any other cell type. |
29 | 29 |
|
30 | 30 |
## Preparing train object and test object |
31 | 31 |
|
32 |
-The workflow starts from a couple of objects where cells have been |
|
33 |
-assigned to be different cell types. To do this, users may have annotated |
|
32 |
+The workflow starts with either a [Seurat](https://satijalab.org/seurat/) or |
|
33 |
+[SingleCellExperiment](https://osca.bioconductor.org/) object where cells have already |
|
34 |
+been assigned to different cell types. |
|
35 |
+ |
|
36 |
+To do this, users may have annotated |
|
34 | 37 |
scRNA-seq data (by a FACS-sorting process, for example), create a Seurat/ |
35 | 38 |
SingleCellExperiment (SCE) object based on the sequencing data and assign the |
36 | 39 |
predetermined cell types as cell meta data. If the scRNA-seq data has not |
... | ... |
@@ -41,10 +44,19 @@ In this vignette, we use the human lung dataset from Zilionis et al., 2019, |
41 | 44 |
which is available in the scRNAseq (2.4.0) library. The dataset is stored as a |
42 | 45 |
SCE object. |
43 | 46 |
|
44 |
-To start the training workflow, we first load the neccessary libraries. |
|
47 |
+To start the training workflow, we first install and load the necessary libraries. |
|
45 | 48 |
```{r} |
46 |
-library(SingleCellClassR) |
|
47 |
-library(scRNAseq) |
|
49 |
+# use BiocManager to install from Bioconductor |
|
50 |
+if (!requireNamespace("BiocManager", quietly = TRUE)) |
|
51 |
+ install.packages("BiocManager") |
|
52 |
+ |
|
53 |
+# the SingleCellClassR package |
|
54 |
+if (!require(SingleCellClassR)) |
|
55 |
+ BiocManager::install("SingleCellClassR") |
|
56 |
+ |
|
57 |
+# we use the scRNAseq package to load example data |
|
58 |
+if (!require(scRNAseq)) |
|
59 |
+ BiocManager::install("scRNAseq") |
|
48 | 60 |
``` |
49 | 61 |
|
50 | 62 |
Load the dataset: |
... | ... |
@@ -53,7 +65,7 @@ Load the dataset: |
53 | 65 |
zilionis <- ZilionisLungData() |
54 | 66 |
``` |
55 | 67 |
|
56 |
-We cut this dataset into two parts, one for the training and the other for the testing. |
|
68 |
+We split this dataset into two parts, one for the training and the other for the testing. |
|
57 | 69 |
```{r} |
58 | 70 |
pivot = ncol(zilionis)%/%2 |
59 | 71 |
train_set <- zilionis[, 1:pivot] |
... | ... |
@@ -85,14 +97,14 @@ test_set$B_cell <- unlist(lapply(test_set$`Most likely LM22 cell type`, |
85 | 97 |
function(x) if (is.na(x)) {'ambiguous'} else if (x %in% c('Plasma cells', 'B cells memory', 'B cells naive')) {'B cells'} else {'others'})) |
86 | 98 |
``` |
87 | 99 |
|
88 |
-We observe that there are cells marked NAs. Those can be understood as 1/different from all indicated cell types or 2/any unknown cell types. Here we consider the second case, ie. we don't know whether they are positive or negative to B cells. To avoid the affect of those NAs cells, we can assign them as 'ambiguous'. All cells tagged 'ambiguous' will be ignored by SingleCellClassR from training and testing. |
|
100 |
+We observe that there are cells marked NAs. Those can be understood as 1/different from all indicated cell types or 2/any unknown cell types. Here we consider the second case, ie. we don't know whether they are positive or negative to B cells. To avoid any effect of these cells, we can assign them as 'ambiguous'. All cells tagged 'ambiguous' will be ignored by SingleCellClassR from training and testing. |
|
89 | 101 |
|
90 | 102 |
We may want to check the number of cells in each category: |
91 | 103 |
```{r} |
92 | 104 |
table(train_set$B_cell) |
93 | 105 |
``` |
94 | 106 |
|
95 |
-## Defining set of features |
|
107 |
+## Defining features |
|
96 | 108 |
|
97 | 109 |
Next, we define a set of features, which will be used in training the |
98 | 110 |
classification model. Supposing we are training a model for classifying |
... | ... |
@@ -108,21 +120,27 @@ selected_features_B <- c("CD19", "MS4A1", "SDC1", "CD79A", "CD79B", |
108 | 120 |
|
109 | 121 |
## Train model |
110 | 122 |
|
111 |
-When the model is being trained, three most important information must be |
|
112 |
-provided are: the Seurat/SCE object used for training, the set of applied features |
|
113 |
-and the cell type defining the trained model. |
|
123 |
+When the model is being trained, three pieces of information must be |
|
124 |
+provided: |
|
125 |
+ |
|
126 |
+ * the Seurat/SCE object used for training |
|
127 |
+ * the set of applied features |
|
128 |
+ * the cell type defining the trained model |
|
129 |
+ |
|
130 |
+In case the dataset does not contain any cell classified as the target |
|
131 |
+cell type, the function will fail. |
|
114 | 132 |
|
115 |
-Cell type corresponding to the trained model must exist among identities |
|
116 |
-assigned to cells in the trained Seurat object. Remember if cell types |
|
117 |
-are not indicated as active identification of the trained object, name |
|
118 |
-of the tag slot in object meta data must be provided to the sce_tag_slot parameter. |
|
133 |
+If the cell type annotation is not set in the default identification slot |
|
134 |
+(`Idents` for `Seurat` objects), the name |
|
135 |
+of the metadata field must be provided to the `sce_tag_slot` parameter. |
|
119 | 136 |
|
120 |
-When training on a imbalanced dataset, the trained model may bias toward the |
|
137 |
+When training on an imbalanced dataset (e.g. a dataset containing 90% B cells and |
|
138 |
+only very few other cell types), the trained model may bias toward the |
|
121 | 139 |
majority group and ignore the presence of the minority group. To avoid this, |
122 |
-the number of positive cells and negative cells will automatically be balanced |
|
140 |
+the number of positive cells and negative cells will be automatically balanced |
|
123 | 141 |
before training. Therefore, a smaller number of cells will be randomly picked |
124 | 142 |
from the majority group. To use the same set of cells while training multiple |
125 |
-times for one model, users can use set.seed. |
|
143 |
+times for one model, users can use `set.seed`. |
|
126 | 144 |
```{r} |
127 | 145 |
set.seed(123) |
128 | 146 |
clf_B <- train_classifier(train_obj = train_set, cell_type = "B cells", features = selected_features_B, |
... | ... |
@@ -131,15 +149,20 @@ clf_B <- train_classifier(train_obj = train_set, cell_type = "B cells", features |
131 | 149 |
```{r} |
132 | 150 |
clf_B |
133 | 151 |
``` |
134 |
-The classifying model is a S4 object named SingleCellClassR. |
|
135 |
-Details about the classifying model is accessible via getter methods. |
|
152 |
+The classification model is a `SingleCellClassR` object. |
|
153 |
+Details about the classification model are accessible via getter methods. |
|
154 |
+ |
|
136 | 155 |
For example: |
156 |
+ |
|
137 | 157 |
```{r} |
138 | 158 |
clf(clf_B) |
139 | 159 |
``` |
140 | 160 |
|
141 | 161 |
## Test model |
142 | 162 |
|
163 |
+The `test_classifier` function automatically tests a classifier's performance |
|
164 |
+against another dataset. Here, we used the `test_set` created before: |
|
165 |
+ |
|
143 | 166 |
```{r} |
144 | 167 |
clf_B_test <- test_classifier(test_obj = test_set, classifier = clf_B, |
145 | 168 |
sce_assay = 'counts', sce_tag_slot = 'B_cell') |
... | ... |
@@ -155,17 +178,18 @@ Apart from the output exported to console, test classifier function also returns |
155 | 178 |
|
156 | 179 |
* **acc**: prediction accuracy at the fixed probability threshold, the probability threshold value can also be queried using *p_thres(classifier)* |
157 | 180 |
|
158 |
- * **auc**: AUC score of provided by current classifier |
|
181 |
+ * **auc**: AUC score provided by current classifier |
|
159 | 182 |
|
160 | 183 |
* **overall_roc**: True Positive Rate and False Positive Rate with a certain number of prediction probability thresholds |
161 | 184 |
|
162 |
-With the same classification model, the sensitivity and the specification of classification can be different because of the prediction probability threshold. To optimize user experience, we have the *overall_roc* as a summary of True Positive Rate (sensitivity) and False Positive Rate (1 - specificity) obtained by the trained model according to different thresholds: |
|
185 |
+Every classifier internally consists of a trained SVM and a probability threshold. Only cells whose prediction probability exceeds |
|
186 |
+this threshold are classified as the respective cell type. The *overall_roc* slot summarizes the True Positive Rate (sensitivity) and False Positive Rate (1 - specificity) obtained by the trained model according to different thresholds. |
|
163 | 187 |
|
164 | 188 |
```{r} |
165 | 189 |
clf_B_test$overall_roc |
166 | 190 |
``` |
167 | 191 |
|
168 |
-In this example of B cell classifier, the current threshold is at 0.5. The higher sensitivity can be reached if we set the p_thres at 0.4. However, we will then have lower specificity, which means that we misclassify more stranger cells as B cells. In contradiction, we may not retrieve all actual B cells with higher p_thres (0.6, for example). |
|
192 |
+In this example of the B cell classifier, the current threshold is at 0.5. A higher sensitivity can be reached if we set the p_thres at 0.4. However, we will then have lower specificity, which means that we will incorrectly classify some cells as B cells. At the same time, we may not retrieve all actual B cells with a higher p_thres (0.6, for example). |
|
169 | 193 |
|
170 | 194 |
There is of course a certain trade-off between the sensitivity and the specificity of the model. Depending on the need of the project or the user-own preference, a probability threshold giving higher sensitivity or higher specificity can be chosen. In our perspective, p_thres at 0.5 is a good choice for the current B cell model. |
171 | 195 |
|
... | ... |
@@ -181,13 +205,12 @@ plot(roc_curve) |
181 | 205 |
|
182 | 206 |
### Which model to choose? |
183 | 207 |
|
184 |
-Changes in train data, in the set of features and in the prediction probability |
|
208 |
+Changes in the training data, in the set of features and in the prediction probability |
|
185 | 209 |
threshold will all lead to a change in model performance. |
186 | 210 |
|
187 | 211 |
There are several ways to evaluate the trained model, including the overall |
188 | 212 |
accuracy, the AUC score and the sensitivity/specificity of the model when |
189 |
-testing on an independent dataset. Here, we export all these statistics to |
|
190 |
-help user have a wider range of choices. In this example, we choose the model |
|
213 |
+testing on an independent dataset. In this example, we choose the model |
|
191 | 214 |
which has the best AUC score. |
192 | 215 |
|
193 | 216 |
*Tip: Using more general markers of the whole population leads to higher |
... | ... |
@@ -195,29 +218,36 @@ sensitivity. This sometimes produces lower specificity because of close |
195 | 218 |
cell types (T cells and NK cells, for example). While training some models, |
196 | 219 |
we observed that we can use the markers producing high sensitivity but at |
197 | 220 |
the same time can improve the specificity by increasing the probability |
198 |
-threshold. Of course, this tip can only applied in some cases, because |
|
221 |
+threshold. Of course, this can only be applied in some cases, because |
|
199 | 222 |
some markers can even have a larger effect on the specificity than the |
200 | 223 |
prediction probability threshold.* |
201 | 224 |
|
202 | 225 |
## Save classification model for further use |
203 | 226 |
|
204 |
-After having obtained a good classification model, users may want to save it |
|
205 |
-for future classification. To do this, we provide a method that helps the user |
|
206 |
-step-by-step store all new classification models. |
|
227 |
+New classification models can be stored using the `save_new_model` function: |
|
207 | 228 |
|
208 |
-To use this method, two information must be provided: the to be saved model and |
|
209 |
-the directory path where the new model will be stored. This method will then |
|
210 |
-create a small database containing all trained models. Therefore, users must |
|
211 |
-indicate the same path to models in order to use multiple classification models |
|
212 |
-at the same time. |
|
229 |
+```{r} |
|
230 |
+# no copy of pretrained models is performed |
|
231 |
+save_new_model(new_model = clf_B, path.to.models = getwd(),include.default = FALSE) |
|
232 |
+``` |
|
233 |
+ |
|
234 |
+Parameters: |
|
235 |
+ |
|
236 |
+ * **new_model**: The new model that should be added to the database in the |
|
237 |
+ specified directory. |
|
238 |
+ * **path.to.models**: The directory where the new models should be stored. |
|
239 |
+ * **include.default**: If set, the default models shipped with the package |
|
240 |
+ are added to the database. |
|
213 | 241 |
|
214 | 242 |
Users can also choose whether to copy all pretrained models of the package to the |
215 | 243 |
new model database. If not, users can later only use either the default |
216 | 244 |
pretrained models or the new models, since only one path to models can be specified. |
217 | 245 |
|
246 |
+Models can be deleted from the model database using the `delete_model` function: |
|
247 |
+ |
|
218 | 248 |
```{r} |
219 |
-# no copy of pretrained models is performed |
|
220 |
-save_new_model(new_model = clf_B, path.to.models = getwd(),include.default = FALSE) |
|
249 |
+# delete the "B cells" model from the new database |
|
250 |
+delete_model("B cells", path.to.models = getwd()) |
|
221 | 251 |
``` |
222 | 252 |
|
223 | 253 |
## Session Info |
Former-commit-id: fa2385410e4cefdc694545daa48892d15ffe8921
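To make the threshold discussion in this commit concrete: the fixed probability threshold can be queried from the classifier and compared against the TPR/FPR table in the test result. This is a sketch using only the getters named above (`p_thres`, `overall_roc`); changing the threshold itself is not shown, since no setter appears in this vignette.

```r
# current probability threshold of the classifier (0.5 in the vignette's example)
p_thres(clf_B)

# TPR/FPR summary used to weigh alternative thresholds such as 0.4 or 0.6
clf_B_test$overall_roc
```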
... | ... |
@@ -29,41 +29,67 @@ any other cell type. |
29 | 29 |
|
30 | 30 |
## Preparing train object and test object |
31 | 31 |
|
32 |
-The workflow starts from a couple of Seurat objects where cells have been |
|
32 |
+The workflow starts from a couple of objects where cells have been |
|
33 | 33 |
assigned to be different cell types. To do this, users may have annotated |
34 |
-scRNA-seq data (by a FACS-sorting process, for example), create a Seurat |
|
35 |
-object based on the sequencing data and assign the predetermined cell types |
|
36 |
-as Seurat meta data. If the scRNA-seq data has not been annotated yet, |
|
37 |
-another possible approach is to follow the basic Seurat workflow until |
|
38 |
-assigning cell type identity to clusters. |
|
34 |
+scRNA-seq data (by a FACS-sorting process, for example), create a Seurat/ |
|
35 |
+SingleCellExperiment (SCE) object based on the sequencing data and assign the |
|
36 |
+predetermined cell types as cell meta data. If the scRNA-seq data has not |
|
37 |
+been annotated yet, another possible approach is to follow the basic |
|
38 |
+workflow (Seurat, for example) until assigning cell type identity to clusters. |
|
39 |
+ |
|
40 |
+In this vignette, we use the human lung dataset from Zilionis et al., 2019, |
|
41 |
+which is available in the scRNAseq (2.4.0) library. The dataset is stored as a |
|
42 |
+SCE object. |
|
39 | 43 |
|
40 | 44 |
To start the training workflow, we first load the neccessary libraries. |
41 | 45 |
```{r} |
42 | 46 |
library(SingleCellClassR) |
43 |
-library(SingleCellClassR.data) |
|
47 |
+library(scRNAseq) |
|
44 | 48 |
``` |
45 | 49 |
|
46 |
-One Seurat object will be used as train object, while the other is the test |
|
47 |
-object. In this example, we used Sade-Feldman dataset to create the train |
|
48 |
-object. |
|
50 |
+Load the dataset: |
|
49 | 51 |
|
50 | 52 |
```{r} |
51 |
-data("feldman_seurat") |
|
52 |
-feldman_seurat |
|
53 |
+zilionis <- ZilionisLungData() |
|
53 | 54 |
``` |
54 | 55 |
|
55 |
-We load Jerby-Arnon dataset for the testing object. |
|
56 |
+We cut this dataset into two parts, one for the training and the other for the testing. |
|
56 | 57 |
```{r} |
57 |
-data("jerby_seurat") |
|
58 |
-jerby_seurat |
|
58 |
+pivot = ncol(zilionis)%/%2 |
|
59 |
+train_set <- zilionis[, 1:pivot] |
|
60 |
+test_set <- zilionis[, (1+pivot):ncol(zilionis)] |
|
59 | 61 |
``` |
60 |
-In our example, the cell type meta data is indicated as the active |
|
61 |
-identification of the Seurat object (in both train object and test |
|
62 |
-object). If cell type is stored in another slot of object meta data, |
|
62 |
+ |
|
63 |
+In this dataset, the cell type meta data is stored in the *Most likely LM22 cell type* |
|
64 |
+slot of the SingleCellExperiment object (in both train object and test object). |
|
65 |
+If cell type is stored in another slot of object meta data, |
|
63 | 66 |
the slot/tag slot name must be then provided as a parameter in the |
64 | 67 |
train and test method. |
65 | 68 |
```{r} |
66 |
-head(Idents(feldman_seurat)) |
|
69 |
+unique(train_set$`Most likely LM22 cell type`) |
|
70 |
+``` |
|
71 |
+```{r} |
|
72 |
+unique(test_set$`Most likely LM22 cell type`) |
|
73 |
+``` |
|
74 |
+ |
|
75 |
+We want to train a classifier for B cells and their phenotypes. Considering memory B cells, |
|
76 |
+naive B cells and plasma cells as B cell phenotypes, we convert all those cells to a uniform |
|
77 |
+cell label, i.e. B cells. All non-B cells are converted into 'others'. |
|
78 |
+ |
|
79 |
+```{r} |
|
80 |
+# change cell label |
|
81 |
+train_set$B_cell <- unlist(lapply(train_set$`Most likely LM22 cell type`, |
|
82 |
+ function(x) if (is.na(x)) {'ambiguous'} else if (x %in% c('Plasma cells', 'B cells memory', 'B cells naive')) {'B cells'} else {'others'})) |
|
83 |
+ |
|
84 |
+test_set$B_cell <- unlist(lapply(test_set$`Most likely LM22 cell type`, |
|
85 |
+ function(x) if (is.na(x)) {'ambiguous'} else if (x %in% c('Plasma cells', 'B cells memory', 'B cells naive')) {'B cells'} else {'others'})) |
|
86 |
+``` |
|
87 |
+ |
|
88 |
+We observe that there are cells marked NAs. Those can be understood as 1/different from all indicated cell types or 2/any unknown cell types. Here we consider the second case, ie. we don't know whether they are positive or negative to B cells. To avoid the affect of those NAs cells, we can assign them as 'ambiguous'. All cells tagged 'ambiguous' will be ignored by SingleCellClassR from training and testing. |
|
89 |
+ |
|
90 |
+We may want to check the number of cells in each category: |
|
91 |
+```{r} |
|
92 |
+table(train_set$B_cell) |
|
67 | 93 |
``` |
68 | 94 |
|
69 | 95 |
## Defining set of features |
... | ... |
@@ -83,13 +109,13 @@ selected_features_B <- c("CD19", "MS4A1", "SDC1", "CD79A", "CD79B", |
83 | 109 |
## Train model |
84 | 110 |
|
85 | 111 |
When the model is being trained, three most important information must be |
86 |
-provided are: the Seurat object used for training, the set of applied features |
|
112 |
+provided are: the Seurat/SCE object used for training, the set of applied features |
|
87 | 113 |
and the cell type defining the trained model. |
88 | 114 |
|
89 | 115 |
Cell type corresponding to the trained model must exist among identities |
90 | 116 |
assigned to cells in the trained Seurat object. Remember if cell types |
91 | 117 |
are not indicated as active identification of the trained object, name |
92 |
-of the tag slot in object meta data must be provided to the tag_slot parameter. |
|
118 |
+of the tag slot in object meta data must be provided to the sce_tag_slot parameter. |
|
93 | 119 |
|
94 | 120 |
When training on a imbalanced dataset, the trained model may bias toward the |
95 | 121 |
majority group and ignore the presence of the minority group. To avoid this, |
... | ... |
@@ -99,8 +125,8 @@ from the majority group. To use the same set of cells while training multiple |
99 | 125 |
times for one model, users can use set.seed. |
100 | 126 |
```{r} |
101 | 127 |
set.seed(123) |
102 |
-clf_B <- train_classifier(train_obj = feldman_seurat, |
|
103 |
-features = selected_features_B, cell_type = "B cells") |
|
128 |
+clf_B <- train_classifier(train_obj = train_set, cell_type = "B cells", features = selected_features_B, |
|
129 |
+ sce_assay = 'counts', sce_tag_slot = 'B_cell') |
|
104 | 130 |
``` |
105 | 131 |
```{r} |
106 | 132 |
clf_B |
... | ... |
@@ -115,7 +141,8 @@ clf(clf_B) |
115 | 141 |
## Test model |
116 | 142 |
|
117 | 143 |
```{r} |
118 |
-clf_B_test <- test_classifier(test_obj = jerby_seurat, classifier = clf_B) |
|
144 |
+clf_B_test <- test_classifier(test_obj = test_set, classifier = clf_B, |
|
145 |
+ sce_assay = 'counts', sce_tag_slot = 'B_cell') |
|
119 | 146 |
``` |
120 | 147 |
|
121 | 148 |
### Interpreting test model result |
... | ... |
@@ -138,7 +165,7 @@ With the same classification model, the sensitivity and the specification of cla |
138 | 165 |
clf_B_test$overall_roc |
139 | 166 |
``` |
140 | 167 |
|
141 |
-In this example of B cell classifier, the current threshold is at 0.5. The sensitivity is 0.9932203, and the specificity is 0.9890493 (FPR = 0.010950643). The higher sensitivity (0.9943503) can be reached if we set the p_thres at 0.4. However, we will have lower specificity (FPR = 0.013966037), which means that we misclassify more stranger cells as B cells. In contradiction, we may not retrieve all actual B cells with higher p_thres (0.6, for example). |
|
168 |
+In this example of B cell classifier, the current threshold is at 0.5. The higher sensitivity can be reached if we set the p_thres at 0.4. However, we will then have lower specificity, which means that we misclassify more stranger cells as B cells. In contradiction, we may not retrieve all actual B cells with higher p_thres (0.6, for example). |
|
142 | 169 |
|
143 | 170 |
There is of course a certain trade-off between the sensitivity and the specificity of the model. Depending on the need of the project or the user-own preference, a probability threshold giving higher sensitivity or higher specificity can be chosen. In our perspective, p_thres at 0.5 is a good choice for the current B cell model. |
144 | 171 |
|
Former-commit-id: c30d5e0eb17749597b12316855f05517b93b7be9
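The relabelling added in this commit collapses the LM22 annotations into 'B cells', 'others' and 'ambiguous'. The same logic, written with a small named helper purely for readability (the helper name is not part of the vignette):

```r
# collapse LM22 labels: B cell phenotypes -> 'B cells', NA -> 'ambiguous', rest -> 'others'
relabel_b_cell <- function(x) {
  if (is.na(x)) {
    'ambiguous'
  } else if (x %in% c('Plasma cells', 'B cells memory', 'B cells naive')) {
    'B cells'
  } else {
    'others'
  }
}

train_set$B_cell <- unlist(lapply(train_set$`Most likely LM22 cell type`, relabel_b_cell))
test_set$B_cell  <- unlist(lapply(test_set$`Most likely LM22 cell type`, relabel_b_cell))

table(train_set$B_cell)   # number of cells per category, as checked in the vignette
```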
1 | 1 |
new file mode 100644 |
... | ... |
@@ -0,0 +1,199 @@ |
1 |
+--- |
|
2 |
+title: "Training basic model classifying a cell type from scRNA-seq data" |
|
3 |
+author: "Vy Nguyen" |
|
4 |
+date: "`r Sys.Date()`" |
|
5 |
+output: rmarkdown::html_vignette |
|
6 |
+vignette: > |
|
7 |
+ %\VignetteIndexEntry{2. Training basic model} |
|
8 |
+ %\VignetteEngine{knitr::rmarkdown} |
|
9 |
+ \usepackage[utf8]{inputenc} |
|
10 |
+--- |
|
11 |
+ |
|
12 |
+```{r setup, include = FALSE} |
|
13 |
+knitr::opts_chunk$set( |
|
14 |
+ collapse = TRUE, |
|
15 |
+ comment = "#>" |
|
16 |
+) |
|
17 |
+options(rmarkdown.html_vignette.check_title = FALSE) |
|
18 |
+``` |
|
19 |
+ |
|
20 |
+## Introduction |
|
21 |
+ |
|
22 |
+One of basic functions of the SingleCellClassR package is to provide users |
|
23 |
+easy tools to train their own model classifying new cell types from labeled |
|
24 |
+scRNA-seq data. |
|
25 |
+ |
|
26 |
+From the very beginning, this vignette shows how to train a basic |
|
27 |
+classification model for an independant cell type, which is not a child of |
|
28 |
+any other cell type. |
|
29 |
+ |
|
30 |
+## Preparing train object and test object |
|
31 |
+ |
|
32 |
+The workflow starts from a couple of Seurat objects where cells have been |
|
33 |
+assigned to be different cell types. To do this, users may have annotated |
|
34 |
+scRNA-seq data (by a FACS-sorting process, for example), create a Seurat |
|
35 |
+object based on the sequencing data and assign the predetermined cell types |
|
36 |
+as Seurat meta data. If the scRNA-seq data has not been annotated yet, |
|
37 |
+another possible approach is to follow the basic Seurat workflow until |
|
38 |
+assigning cell type identity to clusters. |
|
39 |
+ |
|
40 |
+To start the training workflow, we first load the neccessary libraries. |
|
41 |
+```{r} |
|
42 |
+library(SingleCellClassR) |
|
43 |
+library(SingleCellClassR.data) |
|
44 |
+``` |
|
45 |
+ |
|
46 |
+One Seurat object will be used as train object, while the other is the test |
|
47 |
+object. In this example, we used Sade-Feldman dataset to create the train |
|
48 |
+object. |
|
49 |
+ |
|
50 |
+```{r} |
|
51 |
+data("feldman_seurat") |
|
52 |
+feldman_seurat |
|
53 |
+``` |
|
54 |
+ |
|
55 |
+We load Jerby-Arnon dataset for the testing object. |
|
56 |
+```{r} |
|
57 |
+data("jerby_seurat") |
|
58 |
+jerby_seurat |
|
59 |
+``` |
|
60 |
+In our example, the cell type meta data is indicated as the active |
|
61 |
+identification of the Seurat object (in both train object and test |
|
62 |
+object). If cell type is stored in another slot of object meta data, |
|
63 |
+the slot/tag slot name must be then provided as a parameter in the |
|
64 |
+train and test method. |
|
65 |
+```{r} |
|
66 |
+head(Idents(feldman_seurat)) |
|
67 |
+``` |
|
68 |
+ |
|
69 |
+## Defining set of features |
|
70 |
+ |
|
71 |
+Next, we define a set of features, which will be used in training the |
|
72 |
+classification model. Supposing we are training a model for classifying |
|
73 |
+B cells, we define the set of features as follows: |
|
74 |
+```{r} |
|
75 |
+selected_features_B <- c("CD19", "MS4A1", "SDC1", "CD79A", "CD79B", |
|
76 |
+ "CD38", "CD37", "CD83", "CR2", "MVK", "MME", |
|
77 |
+ "IL2RA", "PTEN", "POU2AF1", "MEF2C", "IRF8", |
|
78 |
+ "TCF3", "BACH2", "MZB1", 'VPREB3', 'RASGRP2', |
|
79 |
+ 'CD86', 'CD84', 'LY86', 'CD74', 'SP140', "BLK", |
|
80 |
+ 'FLI1', 'CD14', "DERL3", "LRMP") |
|
81 |
+``` |
|
82 |
+ |
|
83 |
+## Train model |
|
84 |
+ |
|
85 |
+When the model is being trained, three most important information must be |
|
86 |
+provided are: the Seurat object used for training, the set of applied features |
|
87 |
+and the cell type defining the trained model. |
|
88 |
+ |
|
89 |
+Cell type corresponding to the trained model must exist among identities |
|
90 |
+assigned to cells in the trained Seurat object. Remember if cell types |
|
91 |
+are not indicated as active identification of the trained object, name |
|
92 |
+of the tag slot in object meta data must be provided to the tag_slot parameter. |
|
93 |
+ |
|
94 |
+When training on a imbalanced dataset, the trained model may bias toward the |
|
95 |
+majority group and ignore the presence of the minority group. To avoid this, |
|
96 |
+the number of positive cells and negative cells will automatically be balanced |
|
97 |
+before training. Therefore, a smaller number of cells will be randomly picked |
|
98 |
+from the majority group. To use the same set of cells while training multiple |
|
99 |
+times for one model, users can use set.seed. |
|
100 |
+```{r} |
|
101 |
+set.seed(123) |
|
102 |
+clf_B <- train_classifier(train_obj = feldman_seurat, |
|
103 |
+features = selected_features_B, cell_type = "B cells") |
|
104 |
+``` |
|
105 |
+```{r} |
|
106 |
+clf_B |
|
107 |
+``` |
|
108 |
+The classifying model is a S4 object named SingleCellClassR. |
|
109 |
+Details about the classifying model is accessible via getter methods. |
|
110 |
+For example: |
|
111 |
+```{r} |
|
112 |
+clf(clf_B) |
|
113 |
+``` |
|
114 |
+ |
|
115 |
+## Test model |
|
116 |
+ |
|
117 |
+```{r} |
|
118 |
+clf_B_test <- test_classifier(test_obj = jerby_seurat, classifier = clf_B) |
|
119 |
+``` |
|
120 |
+ |
|
121 |
+### Interpreting test model result |
|
122 |
+ |
|
123 |
+Apart from the output exported to console, test classifier function also returns an object, which is a list of: |
|
124 |
+ |
|
125 |
+ * **test_tag**: actual cell label, this can be different from the label provided by users because of ambiguous characters or the incoherence in cell type and sub cell type label assignment. |
|
126 |
+ |
|
127 |
+ * **pred**: cell type prediction using current classifier |
|
128 |
+ |
|
129 |
+ * **acc**: prediction accuracy at the fixed probability threshold, the probability threshold value can also be queried using *p_thres(classifier)* |
|
130 |
+ |
|
131 |
+ * **auc**: AUC score of provided by current classifier |
|
132 |
+ |
|
133 |
+ * **overall_roc**: True Positive Rate and False Positive Rate with a certain number of prediction probability thresholds |
|
134 |
+ |
|
135 |
+With the same classification model, the sensitivity and the specification of classification can be different because of the prediction probability threshold. To optimize user experience, we have the *overall_roc* as a summary of True Positive Rate (sensitivity) and False Positive Rate (1 - specificity) obtained by the trained model according to different thresholds: |
|
136 |
+ |
|
137 |
+```{r} |
|
138 |
+clf_B_test$overall_roc |
|
139 |
+``` |
|
140 |
+ |
|
141 |
+In this example of B cell classifier, the current threshold is at 0.5. The sensitivity is 0.9932203, and the specificity is 0.9890493 (FPR = 0.010950643). The higher sensitivity (0.9943503) can be reached if we set the p_thres at 0.4. However, we will have lower specificity (FPR = 0.013966037), which means that we misclassify more stranger cells as B cells. In contradiction, we may not retrieve all actual B cells with higher p_thres (0.6, for example). |
|
142 |
+ |
|
143 |
+There is of course a certain trade-off between the sensitivity and the specificity of the model. Depending on the need of the project or the user-own preference, a probability threshold giving higher sensitivity or higher specificity can be chosen. In our perspective, p_thres at 0.5 is a good choice for the current B cell model. |
|
144 |
+ |
|
145 |
+### Plotting ROC curve |
|
146 |
+ |
|
147 |
+Apart from numbers, we also provide a method to plot the ROC curve. |
|
148 |
+```{r} |
|
149 |
+roc_curve <- plot_roc_curve(test_result = clf_B_test) |
|
150 |
+``` |
|
151 |
+```{r} |
|
152 |
+plot(roc_curve) |
|
153 |
+``` |
|
154 |
+ |
|
155 |
+### Which model to choose? |
|
156 |
+ |
|
157 |
+Changes in train data, in the set of features and in the prediction probability |
|
158 |
+threshold will all lead to a change in model performance. |
|
159 |
+ |
|
160 |
+There are several ways to evaluate the trained model, including the overall |
|
161 |
+accuracy, the AUC score and the sensitivity/specificity of the model when |
|
162 |
+testing on an independent dataset. Here, we export all these statistics to |
|
163 |
+help user have a wider range of choices. In this example, we choose the model |
|
164 |
+which has the best AUC score. |
|
165 |
+ |
|
166 |
+*Tip: Using more general markers of the whole population leads to higher |
|
167 |
+sensitivity. This sometimes produces lower specificity because of close |
|
168 |
+cell types (T cells and NK cells, for example). While training some models, |
|
169 |
+we observed that we can use the markers producing high sensitivity but at |
|
170 |
+the same time can improve the specificity by increasing the probability |
|
171 |
+threshold. Of course, this tip can only applied in some cases, because |
|
172 |
+some markers can even have a larger effect on the specificity than the |
|
173 |
+prediction probability threshold.* |
|
174 |
+ |
|
175 |
+## Save classification model for further use |
|
176 |
+ |
|
177 |
+After having obtained a good classification model, users may want to save it |
|
178 |
+for future classification. To do this, we provide a method that helps the user |
|
179 |
+step-by-step store all new classification models. |
|
180 |
+ |
|
181 |
+To use this method, two information must be provided: the to be saved model and |
|
182 |
+the directory path where the new model will be stored. This method will then |
|
183 |
+create a small database containing all trained models. Therefore, users must |
|
184 |
+indicate the same path to models in order to use multiple classification models |
|
185 |
+at the same time. |
|
186 |
+ |
|
187 |
+Users can also choose whether to copy all pretrained models of the package to the |
|
188 |
+new model database. If not, users can later only use either the default |
|
189 |
+pretrained models or the new models, since only one path to models can be specified. |
|
190 |
+ |
|
191 |
+```{r} |
|
192 |
+# no copy of pretrained models is performed |
|
193 |
+save_new_model(new_model = clf_B, path.to.models = getwd(),include.default = FALSE) |
|
194 |
+``` |
|
195 |
+ |
|
196 |
+## Session Info |
|
197 |
+```{r} |
|
198 |
+sessionInfo() |
|
199 |
+``` |
|
0 | 200 |
\ No newline at end of file |