Bioconductor Code: qmtools

Browse code

Prototyped compareSamples function

Jaehyun Joo authored on 02/04/2023 18:38:22
Showing 5 changed files

NAMESPACE index b4f8649..6111119 100644
R/clusterFeatures.R index 89d8a4d..459850b 100644
R/compareSamples.R index 0000000..70186dd
man/clusterFeatures.Rd index c95198c..ee35239 100644
man/compareSamples.Rd index 0000000..ea0fa5d

History View file @ da9a8da

@@ -7,6 +7,7 @@ S3method(summary,reduced.pca)
                      S3method(summary,reduced.plsda)
                      S3method(summary,reduced.tsne)
                      export(clusterFeatures)
                     +export(compareSamples)
                      export(imputeIntensity)
                      export(imputeKNN)
                      export(normalizeIntensity)

R/clusterFeatures.R

History View file @ da9a8da

@@ -47,7 +47,7 @@
                      ##'
                      ##' @param x A \linkS4class{SummarizedExperiment} object.
                      ##' @param i A string or integer value specifying which assay values to use.
                     -##' @param rtime_var A string specifying the names of variable containing a
                     +##' @param rtime_var A string specifying the name of variable containing a
                      ##'   numeric vector of retention times in `rowData(x)`.
                      ##' @param rt_cut A numeric value specifying a cut-off for the retention-time
                      ##'   based feature grouping.

R/compareSamples.R

History View file @ da9a8da

                     new file mode 100644
@@ -0,0 +1,138 @@
                     +##' Sample comparison
                     +##'
                     +##' Function to make a comparisons between two groups in study samples with a
                     +##' \linkS4class{SummarizedExperiment}.
                     +##'
                     +##' This function provides a simplified interface of fitting a linear model to
                     +##' make a comparison of interest using the [limma::lmFit], [limma::eBayes], and
                     +##' [limma::topTable] functions. For more flexible model specification (e.g.,
                     +##' interaction model, multi-level model), please use a standard workflow
                     +##' outlined in the `limma` package manual.
                     +##'
                     +##' @param x A \linkS4class{SummarizedExperiment} object.
                     +##' @param i A string or integer value specifying which assay values to use. The
                     +##'   assay is expected to contain log-transformed intensities.
                     +##' @param group A string specifying the name of variable containing a class
                     +##'   label of each sample in `colData(x)`.
                     +##' @param class1,class2 A string specifying the class label of samples to be
                     +##'   compared. Must be one of `group` levels. No need to be specified if
                     +##'   `group` has only two levels. This function evaluates the contrast: class2
                     +##'   - class1.
                     +##' @param covariates A vector indicating the names of variables to be included
                     +##'   in the model as covariates. The covariates must be found in `colData(x)`.
                     +##' @param confint A logical specifying whether 95% confidence intervals of
                     +##'   log-fold-change need to be reported. Alternatively, a numeric value
                     +##'   between zero and one specifying the confidence level required.
                     +##' @param number The maximum number of metabolic features to list.
                     +##' @param adjust.method A string specifying which p-value adjustment method to
                     +##'   use. Options, in increasing conservatism, include "none", "BH", "BY" and
                     +##'   "holm". See [p.adjust] for the complete list of options. A NULL value will
                     +##'   result in the default adjustment method, which is "BH".
                     +##' @param sort.by A string specifying which statistic to rank the metabolic
                     +##'   features by. Possible values for topTable are "logFC", "AveExpr", "t",
                     +##'   "P", "p", "B" or "none" (Permitted synonyms are "M" for "logFC", "A" or
                     +##'   "Amean" for "AveExpr", "T" for "t" and "p" for "P").
                     +##' @param resort.by A string specifying statistic to sort the selected
                     +##'   metabolic features by in the output. Possibilities are the same as for
                     +##'   sort.by.
                     +##' @param p.value A numeric value specifying a cut-off for adjusted p-values.
                     +##'   Only metabolic features with lower p-values are listed.
                     +##' @param fc A numeric value specifying a minimum fold-change to be required.
                     +##'   If specified, the function output only includes metabolic features with
                     +##'   absolute fold-change greater than `fc`.
                     +##' @param lfc A numeric value specifying a minimum log2-fold-change required,
                     +##'   equal to log2(fc). `fc` and `lfc` are alternative ways to specify a
                     +##'   fold-change cut-off and, if both are specified, then `fc` take precedence.
                     +##' @param ... Additional arguments passed to [limma::eBayes].
                     +##' @return A data.frame with a row for the metabolic features and the following
                     +##'   columns:
                     +##' * logFC: an estimate of log-fold-change corresponding to the contrast tested
                     +##' * CI.L: a left limit of confidence interval for `logFC` (if `confint` is
                     +##' enabled)
                     +##' * CI.R: a right limit of confidence interval for `logFC` (if `confint` is
                     +##' enabled)
                     +##' * AveExpr: an average log-expression/abundance of metabolic features
                     +##' * t: a moderated t-statistic
                     +##' * P.Value: a raw p-value
                     +##' * adj.P.Value: an adjusted p-value
                     +##' * B: a log-odds that the metabolic feature is differentially expressed
                     +##'
                     +##' @references
                     +##'
                     +##' Ritchie ME, Phipson B, Wu D, Hu Y, Law CW, Shi W, Smyth GK. limma powers
                     +##' differential expression analyses for RNA-sequencing and microarray studies.
                     +##' Nucleic Acids Res. 2015 Apr 20;43(7):e47. doi: 10.1093/nar/gkv007. Epub 2015
                     +##' Jan 20. PMID: 25605792; PMCID: PMC4402510.
                     +##'
                     +##' @seealso
                     +##'
                     +##' See [limma::lmFit], [limma::eBayes], and [limma::topTable] for underlying
                     +##' functions that do work.
                     +##'
                     +##' @examples
                     +##'
                     +##' data(faahko_se)
                     +##'
                     +##' compareSamples(faahko_se, i = "knn_vsn", group = "sample_group", number = 5)
                     +##'
                     +##' @export
                     +compareSamples <- function(x, i, group,
                     +                           class1, class2, covariates = NULL, confint = TRUE,
                     +                           number = nrow(x), adjust.method = "BH",
                     +                           sort.by = "B", resort.by = NULL,
                     +                           p.value = 1, fc = NULL, lfc = NULL, ...) {
                     +  if (!is(x, "SummarizedExperiment")) {
                     +    stop("`x` must be a SummarizedExperiment object.")
                     +  }
                     +  ## Meta data describing the study samples
                     +  cdat <- colData(x)
                     +  ## Assert `group` is a column of colData(x)
                     +  if (!any(colnames(colData(x)) == group)) {
                     +    stop("Variable `", group, "` is not found in `colData(x)`." )
                     +  }
                     +  ## Convert `group` into a factor variable
                     +  if (!is.factor(cdat[[group]])) {
                     +    cdat[[group]] <- as.factor(cdat[[group]])
                     +  }
                     +  ## Two-level group does not have to specify class levels; instead, infer a
                     +  ## contrast to test.
                     +  if (any(missing(class1), missing(class2))) {
                     +    if (length(levels(cdat[[group]])) > 2) {
                     +      stop("`", group, "` has more than two levels.",
                     +           " Please specify class1 and class2 arguments.")
                     +    } else if (missing(class1) && !missing(class2)) {
                     +      class1 <- setdiff(levels(cdat[[group]]), class2)
                     +    } else if (!missing(class1) && missing(class2)) {
                     +      class2 <- setdiff(levels(cdat[[group]]), class1)
                     +    } else {
                     +      class1 <- levels(cdat[[group]])[1]
                     +      class2 <- levels(cdat[[group]])[2]
                     +    }
                     +  }
                     +  if (!all(c(class1, class2) %in% levels(cdat[[group]]))) {
                     +    stop(setdiff(c(class1, class2), levels(cdat[[group]])),
                     +         " is not the levels of '", group, "'")
                     +  }
                     +  message(paste0("Contrast: ", class2, " - ", class1))
                     +  ## Check covariates are found in cdat
                     +  if (!all(covariates %in% colnames(cdat))) {
                     +    stop("`colData(x)` does not have a column(s): ",
                     +         paste(setdiff(covariates, colnames(cdat)), collapse = ", "))
                     +  }
                     +  ## Model formula
                     +  fm <- as.formula(paste(c("~ 0", group, covariates), collapse = " + "))
                     +  ## Design matrix (group mean specification)
                     +  dm <- model.matrix(fm, data = cdat)
                     +  colnames(dm) <- sub(group, "", colnames(dm))
                     +  contrast <- limma::makeContrasts(contrasts = paste(class2, class1, sep = " - "), levels = dm)
                     +  ## Fit model
                     +  fit <- limma::lmFit(assay(x, i), design = dm)
                     +  fit <- limma::contrasts.fit(fit, contrasts = contrast)
                     +  fit <- limma::eBayes(fit, ...)
                     +  ## Output
                     +  limma::topTable(
                     +           fit, number = number, adjust.method = adjust.method,
                     +           sort.by = sort.by, resort.by = resort.by, p.value = p.value,
                     +           fc = fc, lfc = lfc, confint = confint
                     +         )
                     +}

man/clusterFeatures.Rd

History View file @ da9a8da

@@ -24,7 +24,7 @@ clusterFeatures(
                      \item{i}{A string or integer value specifying which assay values to use.}
                     -\item{rtime_var}{A string specifying the names of variable containing a
                     +\item{rtime_var}{A string specifying the name of variable containing a
                      numeric vector of retention times in \code{rowData(x)}.}
                      \item{rt_cut}{A numeric value specifying a cut-off for the retention-time

man/compareSamples.Rd

History View file @ da9a8da

                     new file mode 100644
@@ -0,0 +1,120 @@
                     +% Generated by roxygen2: do not edit by hand
                     +% Please edit documentation in R/compareSamples.R
                     +\name{compareSamples}
                     +\alias{compareSamples}
                     +\title{Sample comparison}
                     +\usage{
                     +compareSamples(
                     +  x,
                     +  i,
                     +  group,
                     +  class1,
                     +  class2,
                     +  covariates = NULL,
                     +  confint = FALSE,
                     +  number = nrow(x),
                     +  adjust.method = "BH",
                     +  sort.by = "B",
                     +  resort.by = NULL,
                     +  p.value = 1,
                     +  fc = NULL,
                     +  lfc = NULL,
                     +  ...
                     +)
                     +}
                     +\arguments{
                     +\item{x}{A \linkS4class{SummarizedExperiment} object.}
+                    +
                     +\item{i}{A string or integer value specifying which assay values to use. The
                     +assay is expected to contain log-transformed intensities.}
+                    +
                     +\item{group}{A string specifying the name of variable containing a class
                     +label of each sample in \code{colData(x)}.}
+                    +
                     +\item{class1, class2}{A string specifying the class label of samples to be
                     +compared. Must be one of \code{group} levels. No need to be specified if
                     +\code{group} has only two levels. This function evaluates the contrast: class2
                     +\itemize{
                     +\item class1.
                     +}}
+                    +
                     +\item{covariates}{A vector indicating the names of variables to be included
                     +in the model as covariates. The covariates must be found in \code{colData(x)}.}
+                    +
                     +\item{confint}{A logical specifying whether 95\% confidence intervals of
                     +log-fold-change need to be reported. Alternatively, a numeric value
                     +between zero and one specifying the confidence level required.}
+                    +
                     +\item{number}{The maximum number of metabolic features to list.}
+                    +
                     +\item{adjust.method}{A string specifying which p-value adjustment method to
                     +use. Options, in increasing conservatism, include "none", "BH", "BY" and
                     +"holm". See \link{p.adjust} for the complete list of options. A NULL value will
                     +result in the default adjustment method, which is "BH".}
+                    +
                     +\item{sort.by}{A string specifying which statistic to rank the metabolic
                     +features by. Possible values for topTable are "logFC", "AveExpr", "t",
                     +"P", "p", "B" or "none" (Permitted synonyms are "M" for "logFC", "A" or
                     +"Amean" for "AveExpr", "T" for "t" and "p" for "P").}
+                    +
                     +\item{resort.by}{A string specifying statistic to sort the selected
                     +metabolic features by in the output. Possibilities are the same as for
                     +sort.by.}
+                    +
                     +\item{p.value}{A numeric value specifying a cut-off for adjusted p-values.
                     +Only metabolic features with lower p-values are listed.}
+                    +
                     +\item{fc}{A numeric value specifying a minimum fold-change to be required.
                     +If specified, the function output only includes metabolic features with
                     +absolute fold-change greater than \code{fc}.}
+                    +
                     +\item{lfc}{A numeric value specifying a minimum log2-fold-change required,
                     +equal to log2(fc). \code{fc} and \code{lfc} are alternative ways to specify a
                     +fold-change cut-off and, if both are specified, then \code{fc} take precedence.}
+                    +
                     +\item{...}{Additional arguments passed to \link[limma:ebayes]{limma::eBayes}.}
                     +}
                     +\value{
                     +A data.frame with a row for the metabolic features and the following
                     +columns:
                     +\itemize{
                     +\item logFC: an estimate of log-fold-change corresponding to the contrast tested
                     +\item CI.L: a left limit of confidence interval for \code{logFC} (if \code{confint} is
                     +enabled)
                     +\item CI.R: a right limit of confidence interval for \code{logFC} (if \code{confint} is
                     +enabled)
                     +\item AveExpr: an average log-expression/abundance of metabolic features
                     +\item t: a moderated t-statistic
                     +\item P.Value: a raw p-value
                     +\item adj.P.Value: an adjusted p-value
                     +\item B: a log-odds that the metabolic feature is differentially expressed
                     +}
                     +}
                     +\description{
                     +Function to make a comparisons between two groups in study samples with a
                     +\linkS4class{SummarizedExperiment}.
                     +}
                     +\details{
                     +This function provides a simplified interface of fitting a linear model to
                     +make a comparison of interest using the \link[limma:lmFit]{limma::lmFit}, \link[limma:ebayes]{limma::eBayes}, and
                     +\link[limma:toptable]{limma::topTable} functions. For more flexible model specification (e.g.,
                     +interaction model, multi-level model), please use a standard workflow
                     +outlined in the \code{limma} package manual.
                     +}
                     +\examples{
+                    +
                     +data(faahko_se)
+                    +
                     +compareSamples(faahko_se, i = "knn_vsn", group = "sample_group", number = 5)
+                    +
                     +}
                     +\references{
                     +Ritchie ME, Phipson B, Wu D, Hu Y, Law CW, Shi W, Smyth GK. limma powers
                     +differential expression analyses for RNA-sequencing and microarray studies.
                     +Nucleic Acids Res. 2015 Apr 20;43(7):e47. doi: 10.1093/nar/gkv007. Epub 2015
                     +Jan 20. PMID: 25605792; PMCID: PMC4402510.
                     +}
                     +\seealso{
                     +See \link[limma:lmFit]{limma::lmFit}, \link[limma:ebayes]{limma::eBayes}, and \link[limma:toptable]{limma::topTable} for underlying
                     +functions that do work.
                     +}