% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/celda_decontX.R
\name{runDecontX}
\alias{runDecontX}
\title{Detecting contamination with DecontX.}
\usage{
runDecontX(
  inSCE,
  sample = NULL,
  useAssay = "counts",
  background = NULL,
  bgAssayName = NULL,
  bgBatch = NULL,
  z = NULL,
  maxIter = 500,
  delta = c(10, 10),
  estimateDelta = TRUE,
  convergence = 0.001,
  iterLogLik = 10,
  varGenes = 5000,
  dbscanEps = 1,
  seed = 12345,
  logfile = NULL,
  verbose = TRUE
)
}
\arguments{
\item{inSCE}{A \link[SingleCellExperiment]{SingleCellExperiment} object.}

\item{sample}{A single character specifying a name that can be found in
\code{colData(inSCE)} to directly use the cell annotation; or a character
vector with as many elements as cells to indicate which sample each cell
belongs to. Default NULL. \link[celda]{decontX} will be run on cells from
each sample separately.}

\item{useAssay}{A string specifying which assay in the SCE to use. Default
'counts'.}

\item{background}{A \link[SingleCellExperiment]{SingleCellExperiment}
with the matrix located in the assay slot under \code{bgAssayName}. It should
have the same structure as \code{inSCE}, except that it contains the matrix of
empty droplets instead of cells. When supplied, the empirical distribution of
transcripts from these empty droplets will be used as the contamination
distribution. Default NULL.}

\item{bgAssayName}{Character. Name of the assay to use if background is a 
\link[SingleCellExperiment]{SingleCellExperiment}. If NULL, the function
will use the same value as \code{useAssay}. Default is NULL.}

\item{bgBatch}{Batch labels for \code{background}. If \code{background} is a
\link[SingleCellExperiment]{SingleCellExperiment} object, this can be a single
character specifying a name that can be found in \code{colData(background)}
to directly use the barcode annotation; or a numeric / character vector with
as many elements as barcodes to indicate which sample each barcode belongs to.
Its unique values should be the same as those in \code{sample}, so that each
batch of cells is matched to its corresponding batch of empty droplets.
Default NULL.}

\item{z}{Numeric or character vector. Cell cluster labels. If NULL,
PCA will be used to reduce the dimensionality of the dataset initially,
'\link[uwot]{umap}' from the 'uwot' package
will be used to further reduce the dataset to 2 dimensions and
the '\link[dbscan]{dbscan}' function from the 'dbscan' package
will be used to identify clusters of broad cell types. Default NULL.}

\item{maxIter}{Integer. Maximum iterations of the EM algorithm. Default 500.}

\item{delta}{Numeric vector of length 2. Concentration parameters for
the Dirichlet prior for the contamination in each cell. The first element
is the prior for the native counts while the second element is the prior for
the contamination counts. These essentially act as pseudocounts for the
native and contamination counts in each cell. If \code{estimateDelta = TRUE},
this is only used to produce a random sample of proportions for an initial
value of contamination in each cell. Then
\code{\link[MCMCprecision]{fit_dirichlet}} is used to update
\code{delta} in each iteration.
If \code{estimateDelta = FALSE}, then \code{delta} is fixed with these
values for the entire inference procedure. Fixing \code{delta} and
setting a high number in the second element will force \code{decontX}
to be more aggressive and estimate higher levels of contamination at
the expense of potentially removing native expression.
Default \code{c(10, 10)}.}

\item{estimateDelta}{Boolean. Whether to update \code{delta} at each
iteration. Default TRUE.}

\item{convergence}{Numeric. The EM algorithm will be stopped if the maximum
difference in the contamination estimates between the previous and
current iterations is less than this. Default 0.001.}

\item{iterLogLik}{Integer. Calculate the log likelihood every
\code{iterLogLik} iterations. Default 10.}

\item{varGenes}{Integer. The number of variable genes to use in
dimensionality reduction before clustering. Variability is calculated using
the \code{\link[scran]{modelGeneVar}} function from the 'scran' package.
Used only when \code{z} is not provided. Default 5000.}

\item{dbscanEps}{Numeric. The clustering resolution parameter
used in '\link[dbscan]{dbscan}' to estimate broad cell clusters.
Used only when z is not provided. Default 1.}

\item{seed}{Integer. Passed to \link[withr]{with_seed}. For reproducibility,
a default value of 12345 is used. If NULL, no calls to
\link[withr]{with_seed} are made.}

\item{logfile}{Character. Messages will be redirected to a file named
\code{logfile}. If NULL, messages will be printed to stdout. Default NULL.}

\item{verbose}{Logical. Whether to print log messages. Default TRUE.}
}
\value{
A \link[SingleCellExperiment]{SingleCellExperiment} object with
 'decontX_Contamination' and 'decontX_Clusters' added to the
 \link{colData} slot. Additionally, the
decontaminated counts will be added as an assay called 'decontXCounts'.
}
\description{
A wrapper function for \link[celda]{decontX}. Identifies
potential contamination from experimental factors such as ambient RNA.
}
\examples{
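# Load the example dataset, which provides the 'sce' object
data(scExample, package = "singleCellTK")
# Keep only cell barcodes by dropping empty droplets
sce <- subsetSCECols(sce, colData = "type != 'EmptyDroplet'")
# Run on a random subset of 20 cells to keep the example small
sce <- runDecontX(sce[,sample(ncol(sce),20)])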
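
# A minimal sketch of inspecting the result, assuming the 'sce' object
# returned by runDecontX() above. Matching on the "decontX" pattern is an
# assumption made here so that the check does not depend on the exact
# capitalization of the column names listed in the Value section.
decontXCols <- grep("decontX",
                    colnames(SummarizedExperiment::colData(sce)),
                    value = TRUE)
print(decontXCols)
# Per the Value section, the decontaminated counts are expected in an assay
# named 'decontXCounts'; this reports whether that assay is present.
print(is.element("decontXCounts", SummarizedExperiment::assayNames(sce)))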
}