% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/clusterFeatures.R
\name{clusterFeatures}
\alias{clusterFeatures}
\title{Feature clustering}
\usage{
clusterFeatures(
  x,
  i,
  rtime_var = "rtime",
  rt_cut = 10,
  cor_cut = 0.7,
  rt_grouping = c("hclust", "closest", "consecutive"),
  cor_grouping = c("louvain", "SimilarityMatrix", "connected", "none"),
  cor_use = c("everything", "all.obs", "complete.obs", "na.or.complete",
    "pairwise.complete.obs"),
  cor_method = c("pearson", "kendall", "spearman"),
  log2 = FALSE,
  hclust_linkage = "complete"
)
}
\arguments{
\item{x}{A \linkS4class{SummarizedExperiment} object.}

\item{i}{A string or integer value specifying which assay values to use.}

\item{rtime_var}{A string specifying the name of the variable in
\code{rowData(x)} that contains a numeric vector of retention times.}

\item{rt_cut}{A numeric value specifying a cut-off for the retention-time
based feature grouping.}

\item{cor_cut}{A numeric value specifying a cut-off for the
correlation-based feature grouping.}

\item{rt_grouping}{A string specifying which method to use for the
retention-time based feature grouping.}

\item{cor_grouping}{A string specifying which method to use for the
correlation-based feature grouping.}

\item{cor_use}{A string specifying how correlations are computed in the
presence of missing values. Refer to \code{?cor} for details.}

\item{cor_method}{A string specifying which correlation coefficient is to be
computed. See \code{?cor} for details.}

\item{log2}{A logical specifying whether feature intensities need to be
log2-transformed before calculating a correlation matrix.}

\item{hclust_linkage}{A string specifying the linkage method to be used when
\code{rt_grouping} is "hclust".}
}
\value{
A \linkS4class{SummarizedExperiment} object with the grouping
results added to columns "rtime_group" (initial grouping on retention
times) and "feature_group" in its \code{rowData}.
}
\description{
Clusters LC-MS features stored in a \linkS4class{SummarizedExperiment}
according to their retention times and intensity correlations across
samples.
}
\details{
For soft ionization methods (e.g., LC/ESI-MS) commonly used in metabolomics,
one or more ions can be generated from an individual compound upon
ionization. The resulting redundancy in the feature data needs to be
addressed because we are typically interested in compounds rather than
individual ion species. This function attempts to identify groups of features
originating from the same compound with the following steps:
\enumerate{
\item Features are grouped by their retention times to identify co-eluting
compounds.
\item For each retention time-based group, features are further clustered by
patterns of the intensity correlations across samples to identify a subset
of features from the same compound.
}

The retention time-based grouping is performed using either a hierarchical
clustering via \link{hclust} or the methods available in the \pkg{MsFeatures}
package via \link[MsFeatures:groupClosest]{MsFeatures::groupClosest} and \link[MsFeatures:groupConsecutive]{MsFeatures::groupConsecutive}.
For \code{rt_grouping} = "hclust", complete-linkage clustering is conducted
by default using the Manhattan distance (i.e., the difference in retention
times), where the distance between two clusters is defined as the difference
in retention times between the farthest pair of elements in the two
clusters. Group memberships are assigned by specifying the cut height for
the distance metric. Other linkage methods can be specified with
\code{hclust_linkage}. Please refer to \code{?hclust} for details. For
"closest" and "consecutive", please refer to
\code{?MsFeatures::groupClosest} and \code{?MsFeatures::groupConsecutive}
for details of the algorithms.
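
As a rough illustration of the "hclust" option, the sketch below reproduces
the idea with base R on a made-up vector of retention times. The vector
\code{rt} and the cut height of 10 are assumptions chosen for illustration;
this is not the package's internal code.

\preformatted{
## Hypothetical retention times (in seconds) for six features
rt <- c(100, 102, 105, 130, 133, 200)

## Complete-linkage clustering on Manhattan distances, i.e., absolute
## differences in retention time
hc <- hclust(dist(rt, method = "manhattan"), method = "complete")

## Cut the tree at a height of 10 (assumed here to play the role of rt_cut)
rtime_group <- cutree(hc, h = 10)
}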

For the correlation-based grouping, \code{cor_grouping} = "connected"
creates an undirected graph using feature correlations as an adjacency matrix
(i.e., correlations serve as edge weights). Edges whose weights are below the
cut-off specified by \code{cor_cut} are removed from the graph, separating
features into several disconnected subgroups. Features in the same subgroup
are assigned to the same feature cluster. For "louvain", the function further
applies the Louvain algorithm to the graph in order to identify densely
connected features via
\link[igraph:cluster_louvain]{igraph::cluster_louvain}. For "SimilarityMatrix",
\link[MsFeatures:groupSimilarityMatrix]{MsFeatures::groupSimilarityMatrix} is used for feature grouping. Please
refer to \code{?MsFeatures::groupSimilarityMatrix} for details of the
algorithm.
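
The "connected" and "louvain" options can be pictured with the following
sketch, which calls \pkg{igraph} directly on a small made-up correlation
matrix. The matrix \code{cc}, the feature names, and the way the cut-off is
applied are assumptions for illustration and may differ from the package
internals.

\preformatted{
library(igraph)

## Hypothetical correlation matrix for four co-eluting features
cc <- matrix(c(1.0, 0.9,  0.8,  0.1,
               0.9, 1.0,  0.85, 0.2,
               0.8, 0.85, 1.0,  0.3,
               0.1, 0.2,  0.3,  1.0),
             nrow = 4, dimnames = list(paste0("F", 1:4), paste0("F", 1:4)))

## Remove edges whose weights fall below the cut-off (0.7, as in cor_cut)
cc[cc < 0.7] <- 0

## Undirected weighted graph with correlations as edge weights
g <- graph_from_adjacency_matrix(cc, mode = "undirected", weighted = TRUE,
                                 diag = FALSE)

## "connected": each connected component forms one feature group
components(g)$membership

## "louvain": community detection on the thresholded graph
membership(cluster_louvain(g))
}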
}
\examples{

data(faahko_se)

se <- clusterFeatures(faahko_se, i = "knn_vsn", rtime_var = "rtmed")
rowData(se)[, c("rtmed", "rtime_group", "feature_group")]
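
## A variation using non-default grouping methods. The argument values below
## are illustrative choices only, not recommendations.
se2 <- clusterFeatures(
  faahko_se, i = "knn_vsn", rtime_var = "rtmed",
  rt_grouping = "closest", cor_grouping = "connected",
  cor_method = "spearman"
)

## Number of features assigned to each feature group
table(rowData(se2)$feature_group)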

}
\references{
Johannes Rainer (2022). MsFeatures: Functionality for Mass Spectrometry
Features. R package version 1.3.0.
https://github.com/RforMassSpectrometry/MsFeatures

Vincent D. Blondel, Jean-Loup Guillaume, Renaud Lambiotte, Etienne Lefebvre:
Fast unfolding of communities in large networks. J. Stat. Mech. (2008)
P10008

Csardi G, Nepusz T: The igraph software package for complex network
research, InterJournal, Complex Systems 1695. 2006. https://igraph.org
}
\seealso{
See \link{hclust}, \link{cutree}, \link[MsFeatures:groupClosest]{MsFeatures::groupClosest},
\link[MsFeatures:groupConsecutive]{MsFeatures::groupConsecutive}, \link[MsFeatures:groupSimilarityMatrix]{MsFeatures::groupSimilarityMatrix}, and
\link[igraph:cluster_louvain]{igraph::cluster_louvain} for the underlying functions that do the work.

See \link{plotRTgroup} to visualize the grouping result.
}