... | ... |
@@ -72,10 +72,12 @@ |
72 | 72 |
|
73 | 73 |
- Pre-release Bioconductor version. |
74 | 74 |
|
75 |
-## Version 0.99.1 (2023-04-24) |
|
75 |
+## Version 0.99.1 (2023-04-25) |
|
76 | 76 |
|
77 |
- - BioPlanet database vanished from internet and there is no sign of it coming back. Removing all BioPlanet-relaed code and replacing BioPlanet with GO in the vignette and examples (this, alas, makes it longer to check). |
|
78 |
- - OK, it is back, but I keep GO examples and vinettes. |
|
77 |
+ - BioPlanet database vanished from the internet and there is no sign of it coming back. Removing all BioPlanet-related code and replacing BioPlanet with GO in the vignette and examples (this, alas, makes checks take longer). |
|
78 |
+ - OK, it is back, but I am keeping the GO examples and vignettes. |
|
79 |
+ - Minor improvements to documentation. |
|
80 |
+ |
|
79 | 81 |
|
80 | 82 |
|
81 | 83 |
|
... | ... |
@@ -1,38 +1,39 @@ |
1 |
-#' Prepare term data for enrichment analysis |
|
1 |
+#' Prepare Term Data for Enrichment Analysis |
|
2 | 2 |
#' |
3 |
-#' Prepare term data downloaded with \code{fetch_*} functions for fast |
|
4 |
-#' enrichment analysis. |
|
3 |
+#' Process term data downloaded with the \code{fetch_*} functions, preparing it |
|
4 |
+#' for fast enrichment analysis using \code{functional_enrichment}. |
|
5 | 5 |
#' |
6 | 6 |
#' @details |
7 | 7 |
#' |
8 |
-#' Takes two tibbles with functional term information (\code{terms}) and |
|
9 |
-#' feature mapping (\code{mapping}) and converts them into an object required by |
|
10 |
-#' \code{functional_enrichment} for fast analysis. Terms and mapping can be |
|
11 |
-#' created with database access functions in this package, for example |
|
12 |
-#' \code{fetch_reactome} or \code{fetch_go_from_go}. |
|
8 |
+#' This function takes two tibbles containing functional term information |
|
9 |
+#' (\code{terms}) and feature mapping (\code{mapping}), and converts them into |
|
10 |
+#' an object required by \code{functional_enrichment} for efficient analysis. |
|
11 |
+#' Terms and mapping can be generated with the database access functions |
|
12 |
+#' included in this package, such as \code{fetch_reactome} or |
|
13 |
+#' \code{fetch_go_from_go}. |
|
13 | 14 |
#' |
14 | 15 |
#' @param terms A tibble with at least two columns: \code{term_id} and |
15 |
-#' \code{term_name}. Contains information about functional term |
|
16 |
-#' names/descriptions. |
|
17 |
-#' @param mapping A tibble with at least two columns, containing mapping between |
|
18 |
-#' functional terms and features. One column needs to be called \code{term_id} |
|
19 |
-#' and the other column has a name specified by \code{feature_name} argument. |
|
20 |
-#' For example, if \code{mapping} contains columns \code{term_id}, |
|
21 |
-#' \code{accession_number} and \code{gene_symbol} then setting |
|
22 |
-#' \code{feature_name = "gene_symbol"} indicates that gene symbols will be |
|
23 |
-#' used for enrichment. |
|
24 |
-#' @param all_features A vector with all feature ids used as background for |
|
16 |
+#' \code{term_name}. This tibble contains information about functional term |
|
17 |
+#' names and descriptions. |
|
18 |
+#' @param mapping A tibble with at least two columns, containing the mapping |
|
19 |
+#' between functional terms and features. One column must be named |
|
20 |
+#' \code{term_id}, while the other column should have a name specified by the |
|
21 |
+#' \code{feature_name} argument. For example, if \code{mapping} contains |
|
22 |
+#' columns \code{term_id}, \code{accession_number}, and \code{gene_symbol}, |
|
23 |
+#' setting \code{feature_name = "gene_symbol"} indicates that gene symbols |
|
24 |
+#' will be used for enrichment analysis. |
|
25 |
+#' @param all_features A vector with all feature IDs used as the background for |
|
25 | 26 |
#' enrichment. If not specified, all features found in \code{mapping} will be |
26 | 27 |
#' used, resulting in a larger object size. |
27 |
-#' @param feature_name The name of the column in \code{mapping} tibble to be |
|
28 |
-#' used as feature.For example, if \code{mapping} contains columns \code{term_id}, |
|
29 |
-#' \code{accession_number} and \code{gene_symbol} then setting |
|
30 |
-#' \code{feature_name = "gene_symbol"} indicates that gene symbols will be |
|
31 |
-#' used for enrichment. |
|
28 |
+#' @param feature_name The name of the column in the \code{mapping} tibble to be |
|
29 |
+#' used as the feature identifier. For example, if \code{mapping} contains |
|
30 |
+#' columns \code{term_id}, \code{accession_number}, and \code{gene_symbol}, |
|
31 |
+#' setting \code{feature_name = "gene_symbol"} indicates that gene symbols |
|
32 |
+#' will be used for enrichment analysis. |
|
32 | 33 |
#' |
33 |
-#' |
|
34 |
-#' @return An object class \code{fenr_terms} required by |
|
34 |
+#' @return An object of class \code{fenr_terms} required by |
|
35 | 35 |
#' \code{functional_enrichment}. |
36 |
+#' @importFrom assertthat assert_that |
|
36 | 37 |
#' @export |
37 | 38 |
#' @examples |
38 | 39 |
#' data(exmpl_all) |
... | ... |
@@ -43,35 +44,32 @@ prepare_for_enrichment <- function(terms, mapping, all_features = NULL, feature_ |
43 | 44 |
feature_id <- term_id <- NULL |
44 | 45 |
|
45 | 46 |
# Argument checks |
46 |
- if (!is.data.frame(terms) && !tibble::is_tibble(terms)) { |
|
47 |
- stop("'terms' must be a data frame or tibble.") |
|
48 |
- } |
|
47 |
+ assert_that(is.data.frame(terms) || tibble::is_tibble(terms), |
|
48 |
+ msg = "'terms' must be a data frame or tibble.") |
|
49 | 49 |
|
50 |
- if (!is.data.frame(mapping) && !tibble::is_tibble(mapping)) { |
|
51 |
- stop("'mapping' must be a data frame or tibble.") |
|
52 |
- } |
|
50 |
+ assert_that(is.data.frame(mapping) || tibble::is_tibble(mapping), |
|
51 |
+ msg = "'mapping' must be a data frame or tibble.") |
|
53 | 52 |
|
54 |
- if (!is.null(all_features) && !is.vector(all_features)) { |
|
55 |
- stop("'all_features' must be a vector or NULL.") |
|
56 |
- } |
|
53 |
+ assert_that(is.null(all_features) || is.vector(all_features), |
|
54 |
+ msg = "'all_features' must be a vector or NULL.") |
|
57 | 55 |
|
58 |
- if (!is.character(feature_name) || length(feature_name) != 1) { |
|
59 |
- stop("'feature_name' must be a single string.") |
|
60 |
- } |
|
56 |
+ assert_that(is.character(feature_name) && length(feature_name) == 1, |
|
57 |
+ msg = "'feature_name' must be a single string.") |
|
61 | 58 |
|
62 | 59 |
# Check terms |
63 |
- if (!all(c("term_id", "term_name") %in% colnames(terms))) |
|
64 |
- stop("Column names in 'terms' should be 'term_id' and 'term_name'.") |
|
65 |
- if(anyDuplicated(terms$term_id) > 0) |
|
66 |
- stop("Duplicated term_id detected in 'terms'.") |
|
60 |
+ assert_that(all(c("term_id", "term_name") %in% colnames(terms)), |
|
61 |
+ msg = "Column names in 'terms' should be 'term_id' and 'term_name'.") |
|
62 |
+ |
|
63 |
+ assert_that(anyDuplicated(terms$term_id) == 0, |
|
64 |
+ msg = "Duplicated term_id detected in 'terms'.") |
|
67 | 65 |
|
68 | 66 |
# Check mapping |
69 |
- if (!("term_id" %in% colnames(mapping))) |
|
70 |
- stop("'mapping' should contain a column named 'term_id'.") |
|
67 |
+ assert_that("term_id" %in% colnames(mapping), |
|
68 |
+ msg = "'mapping' should contain a column named 'term_id'.") |
|
71 | 69 |
|
72 | 70 |
# Check for feature name |
73 |
- if (!(feature_name %in% colnames(mapping))) |
|
74 |
- stop(feature_name, " column not found in mapping table. Check feature_name argument.") |
|
71 |
+ assert_that(feature_name %in% colnames(mapping), |
|
72 |
+ msg = paste0(feature_name, " column not found in mapping table. Check 'feature_name' argument.")) |
|
75 | 73 |
|
76 | 74 |
# Replace empty all_features with everything from mapping |
77 | 75 |
map_features <- mapping[[feature_name]] |> |
... | ... |
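The argument checks in the hunk above imply a specific input shape. The minimal base-R sketch below builds toy `terms` and `mapping` inputs that would satisfy those checks. All term IDs, term names and gene symbols are hypothetical, and the `stopifnot()` call only mirrors the checks shown in the diff. It is not part of `fenr`.

```r
# Toy inputs shaped like those prepare_for_enrichment() checks for.
# All IDs, names and gene symbols here are hypothetical.
terms <- data.frame(
  term_id   = c("T1", "T2", "T3"),
  term_name = c("ribosome biogenesis", "DNA repair", "autophagy")
)

# One row per term-feature pair; the feature column name is free,
# and is later passed via feature_name
mapping <- data.frame(
  term_id     = c("T1", "T1", "T2", "T3", "T3"),
  gene_symbol = c("RPL3", "RPS2", "RAD51", "ATG1", "ATG8")
)

# Background: every feature detected in the (imaginary) experiment
all_features <- c("RPL3", "RPS2", "RAD51", "ATG1", "ATG8", "CDC28", "ACT1")
feature_name <- "gene_symbol"

# Base-R mirror of the checks in the diff: required columns,
# unique term_id values, and presence of the feature column
stopifnot(
  all(c("term_id", "term_name") %in% colnames(terms)),
  anyDuplicated(terms$term_id) == 0,
  "term_id" %in% colnames(mapping),
  feature_name %in% colnames(mapping)
)

# With fenr installed, these would be passed on as in the vignette:
# term_data <- fenr::prepare_for_enrichment(terms, mapping, all_features,
#                                           feature_name = "gene_symbol")
```

Tibbles also have class `data.frame`, so `is.data.frame()` is `TRUE` for them. Plain data frames like these therefore pass the same checks as the tibbles returned by the `fetch_*` functions.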
@@ -134,37 +132,39 @@ prepare_for_enrichment <- function(terms, mapping, all_features = NULL, feature_ |
134 | 132 |
} |
135 | 133 |
|
136 | 134 |
|
137 |
-#' Fast functional enrichment |
|
138 |
-#' |
|
139 |
-#' Fast functional enrichment based on hypergeometric distribution. Can be used |
|
140 |
-#' in interactive applications. |
|
135 |
+ |
|
136 |
+#' Fast Functional Enrichment |
|
141 | 137 |
#' |
142 |
-#' @details |
|
138 |
+#' Perform fast functional enrichment analysis based on the hypergeometric |
|
139 |
+#' distribution. Designed for use in interactive applications. |
|
143 | 140 |
#' |
144 |
-#' Functional enrichment in a selection (e.g. differentially expressed genes) of |
|
145 |
-#' features, using hypergeometric probability (that is, Fisher's exact test). A |
|
146 |
-#' feature can be a gene, protein, etc. \code{term_data} is an object with |
|
147 |
-#' functional term information and feature-term mapping |
|
141 |
+#' @details This function carries out functional enrichment analysis on a |
|
142 |
+#' selection of features (e.g., differentially expressed genes) using the |
|
143 |
+#' hypergeometric probability distribution (Fisher's exact test). Features can |
|
144 |
+#' be genes, proteins, etc. The \code{term_data} object contains functional |
|
145 |
+#' term information and feature-term mapping. |
|
148 | 146 |
#' |
149 |
-#' @param feat_all A character vector with all feature identifiers. This is the |
|
150 |
-#' background for enrichment. |
|
147 |
+#' @param feat_all A character vector with all feature identifiers, serving as |
|
148 |
+#' the background for enrichment. |
|
151 | 149 |
#' @param feat_sel A character vector with feature identifiers in the selection. |
152 |
-#' @param term_data An object class \code{fenr_terms}, created by |
|
150 |
+#' @param term_data An object of class \code{fenr_terms}, created by |
|
153 | 151 |
#' \code{prepare_for_enrichment}. |
154 |
-#' @param feat2name An optional named list to convert feature ids into feature |
|
152 |
+#' @param feat2name An optional named list to convert feature IDs into feature |
|
155 | 153 |
#' names. |
156 | 154 |
#' |
157 |
-#' @return A tibble with enrichment results. For each term the following |
|
158 |
-#' quantities are reported: \itemize{ \item{\code{N_with} - number of features |
|
159 |
-#' with this term among all features} \item{\code{n_with_sel} - number |
|
160 |
-#' of features with this term in the selection} \item{\code{n_expect} - |
|
161 |
-#' expected number of features with this term in the selection, under the null |
|
162 |
-#' hypothesis that terms are mapped to features randomly} |
|
163 |
-#' \item{\code{enrichment} - ratio of n_with_sel / n_expect} |
|
164 |
-#' \item{\code{odds_ratio} - odds ratio for enrichment; is infinite, when all |
|
165 |
-#' features with the given term are in the selection} \item{\code{p_value} - |
|
166 |
-#' p-value from a single hypergeometric test} \item{\code{p_adjust} - p-value |
|
167 |
-#' adjusted for multiple tests using Benjamini-Hochberg approach}}. |
|
155 |
+#' @return A tibble with enrichment results, providing the following information |
|
156 |
+#' for each term: |
|
157 |
+#' \itemize{ |
|
158 |
+#' \item{\code{N_with} - number of features with this term among all features} |
|
159 |
+#' \item{\code{n_with_sel} - number of features with this term in the selection} |
|
160 |
+#' \item{\code{n_expect} - expected number of features with this term in the selection, |
|
161 |
+#' under the null hypothesis that terms are mapped to features randomly} |
|
162 |
+#' \item{\code{enrichment} - ratio of n_with_sel / n_expect} |
|
163 |
+#' \item{\code{odds_ratio} - odds ratio for enrichment; is infinite when all |
|
164 |
+#' features with the given term are in the selection} |
|
165 |
+#' \item{\code{p_value} - p-value from a single hypergeometric test} |
|
166 |
+#' \item{\code{p_adjust} - p-value adjusted for multiple tests using the Benjamini-Hochberg approach} |
|
167 |
+#' }. |
|
168 | 168 |
#' |
169 | 169 |
#' @importFrom assertthat assert_that |
170 | 170 |
#' @importFrom methods is |
... | ... |
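The following base-R sketch makes the `@return` quantities concrete by computing them for a single term from hypothetical counts. It assumes an upper-tail (over-representation) hypergeometric test and a plain sample odds ratio. `fenr`'s internal formulas are not shown in this diff and may differ in detail.

```r
# Hypothetical counts for a single term (not fenr example data)
N          <- 6000  # features in the background (feat_all)
n_sel      <- 200   # features in the selection (feat_sel)
N_with     <- 50    # background features annotated with the term
n_with_sel <- 8     # selected features annotated with the term

# Expected number in the selection if terms were assigned at random
n_expect   <- n_sel * N_with / N
enrichment <- n_with_sel / n_expect

# 2x2 contingency table: rows = with / without term,
# columns = selected / not selected
tab <- matrix(
  c(n_with_sel, n_sel - n_with_sel,
    N_with - n_with_sel, (N - n_sel) - (N_with - n_with_sel)),
  nrow = 2,
  dimnames = list(term = c("with", "without"),
                  sel  = c("selected", "not_selected"))
)

# Sample odds ratio; becomes Inf when every feature with the term is
# selected (tab[1, 2] == 0). Assumption: fenr may compute it differently.
odds_ratio <- (tab[1, 1] * tab[2, 2]) / (tab[2, 1] * tab[1, 2])

# Upper-tail hypergeometric probability P(X >= n_with_sel)
p_value  <- phyper(n_with_sel - 1, N_with, N - N_with, n_sel, lower.tail = FALSE)
# Cross-check: one-sided Fisher's exact test on the same table
p_fisher <- fisher.test(tab, alternative = "greater")$p.value

# Benjamini-Hochberg adjustment across terms; term_B and term_C are made up
p_adjust <- p.adjust(c(term_A = p_value, term_B = 0.03, term_C = 0.4),
                     method = "BH")
```

For this table, a one-sided Fisher's exact test gives the same p-value as the hypergeometric upper tail. However, `fisher.test()` reports a conditional maximum-likelihood odds ratio, which is not the same as the sample odds ratio computed here.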
@@ -2,44 +2,47 @@ |
2 | 2 |
% Please edit documentation in R/enrichment.R |
3 | 3 |
\name{functional_enrichment} |
4 | 4 |
\alias{functional_enrichment} |
5 |
-\title{Fast functional enrichment} |
|
5 |
+\title{Fast Functional Enrichment} |
|
6 | 6 |
\usage{ |
7 | 7 |
functional_enrichment(feat_all, feat_sel, term_data, feat2name = NULL) |
8 | 8 |
} |
9 | 9 |
\arguments{ |
10 |
-\item{feat_all}{A character vector with all feature identifiers. This is the |
|
11 |
-background for enrichment.} |
|
10 |
+\item{feat_all}{A character vector with all feature identifiers, serving as |
|
11 |
+the background for enrichment.} |
|
12 | 12 |
|
13 | 13 |
\item{feat_sel}{A character vector with feature identifiers in the selection.} |
14 | 14 |
|
15 |
-\item{term_data}{An object class \code{fenr_terms}, created by |
|
15 |
+\item{term_data}{An object of class \code{fenr_terms}, created by |
|
16 | 16 |
\code{prepare_for_enrichment}.} |
17 | 17 |
|
18 |
-\item{feat2name}{An optional named list to convert feature ids into feature |
|
18 |
+\item{feat2name}{An optional named list to convert feature IDs into feature |
|
19 | 19 |
names.} |
20 | 20 |
} |
21 | 21 |
\value{ |
22 |
-A tibble with enrichment results. For each term the following |
|
23 |
- quantities are reported: \itemize{ \item{\code{N_with} - number of features |
|
24 |
- with this term among all features} \item{\code{n_with_sel} - number |
|
25 |
- of features with this term in the selection} \item{\code{n_expect} - |
|
26 |
- expected number of features with this term in the selection, under the null |
|
27 |
- hypothesis that terms are mapped to features randomly} |
|
28 |
- \item{\code{enrichment} - ratio of n_with_sel / n_expect} |
|
29 |
- \item{\code{odds_ratio} - odds ratio for enrichment; is infinite, when all |
|
30 |
- features with the given term are in the selection} \item{\code{p_value} - |
|
31 |
- p-value from a single hypergeometric test} \item{\code{p_adjust} - p-value |
|
32 |
- adjusted for multiple tests using Benjamini-Hochberg approach}}. |
|
22 |
+A tibble with enrichment results, providing the following information |
|
23 |
+ for each term: |
|
24 |
+ \itemize{ |
|
25 |
+ \item{\code{N_with} - number of features with this term among all features} |
|
26 |
+ \item{\code{n_with_sel} - number of features with this term in the selection} |
|
27 |
+ \item{\code{n_expect} - expected number of features with this term in the selection, |
|
28 |
+ under the null hypothesis that terms are mapped to features randomly} |
|
29 |
+ \item{\code{enrichment} - ratio of n_with_sel / n_expect} |
|
30 |
+ \item{\code{odds_ratio} - odds ratio for enrichment; is infinite when all |
|
31 |
+ features with the given term are in the selection} |
|
32 |
+ \item{\code{p_value} - p-value from a single hypergeometric test} |
|
33 |
+ \item{\code{p_adjust} - p-value adjusted for multiple tests using the Benjamini-Hochberg approach} |
|
34 |
+ }. |
|
33 | 35 |
} |
34 | 36 |
\description{ |
35 |
-Fast functional enrichment based on hypergeometric distribution. Can be used |
|
36 |
-in interactive applications. |
|
37 |
+Perform fast functional enrichment analysis based on the hypergeometric |
|
38 |
+distribution. Designed for use in interactive applications. |
|
37 | 39 |
} |
38 | 40 |
\details{ |
39 |
-Functional enrichment in a selection (e.g. differentially expressed genes) of |
|
40 |
-features, using hypergeometric probability (that is, Fisher's exact test). A |
|
41 |
-feature can be a gene, protein, etc. \code{term_data} is an object with |
|
42 |
-functional term information and feature-term mapping |
|
41 |
+This function carries out functional enrichment analysis on a |
|
42 |
+ selection of features (e.g., differentially expressed genes) using the |
|
43 |
+ hypergeometric probability distribution (Fisher's exact test). Features can |
|
44 |
+ be genes, proteins, etc. The \code{term_data} object contains functional |
|
45 |
+ term information and feature-term mapping. |
|
43 | 46 |
} |
44 | 47 |
\examples{ |
45 | 48 |
data(exmpl_all, exmpl_sel) |
... | ... |
@@ -2,7 +2,7 @@ |
2 | 2 |
% Please edit documentation in R/enrichment.R |
3 | 3 |
\name{prepare_for_enrichment} |
4 | 4 |
\alias{prepare_for_enrichment} |
5 |
-\title{Prepare term data for enrichment analysis} |
|
5 |
+\title{Prepare Term Data for Enrichment Analysis} |
|
6 | 6 |
\usage{ |
7 | 7 |
prepare_for_enrichment( |
8 | 8 |
terms, |
... | ... |
@@ -13,41 +13,42 @@ prepare_for_enrichment( |
13 | 13 |
} |
14 | 14 |
\arguments{ |
15 | 15 |
\item{terms}{A tibble with at least two columns: \code{term_id} and |
16 |
-\code{term_name}. Contains information about functional term |
|
17 |
-names/descriptions.} |
|
16 |
+\code{term_name}. This tibble contains information about functional term |
|
17 |
+names and descriptions.} |
|
18 | 18 |
|
19 |
-\item{mapping}{A tibble with at least two columns, containing mapping between |
|
20 |
-functional terms and features. One column needs to be called \code{term_id} |
|
21 |
-and the other column has a name specified by \code{feature_name} argument. |
|
22 |
-For example, if \code{mapping} contains columns \code{term_id}, |
|
23 |
-\code{accession_number} and \code{gene_symbol} then setting |
|
24 |
-\code{feature_name = "gene_symbol"} indicates that gene symbols will be |
|
25 |
-used for enrichment.} |
|
19 |
+\item{mapping}{A tibble with at least two columns, containing the mapping |
|
20 |
+between functional terms and features. One column must be named |
|
21 |
+\code{term_id}, while the other column should have a name specified by the |
|
22 |
+\code{feature_name} argument. For example, if \code{mapping} contains |
|
23 |
+columns \code{term_id}, \code{accession_number}, and \code{gene_symbol}, |
|
24 |
+setting \code{feature_name = "gene_symbol"} indicates that gene symbols |
|
25 |
+will be used for enrichment analysis.} |
|
26 | 26 |
|
27 |
-\item{all_features}{A vector with all feature ids used as background for |
|
27 |
+\item{all_features}{A vector with all feature IDs used as the background for |
|
28 | 28 |
enrichment. If not specified, all features found in \code{mapping} will be |
29 | 29 |
used, resulting in a larger object size.} |
30 | 30 |
|
31 |
-\item{feature_name}{The name of the column in \code{mapping} tibble to be |
|
32 |
-used as feature.For example, if \code{mapping} contains columns \code{term_id}, |
|
33 |
-\code{accession_number} and \code{gene_symbol} then setting |
|
34 |
-\code{feature_name = "gene_symbol"} indicates that gene symbols will be |
|
35 |
-used for enrichment.} |
|
31 |
+\item{feature_name}{The name of the column in the \code{mapping} tibble to be |
|
32 |
+used as the feature identifier. For example, if \code{mapping} contains |
|
33 |
+columns \code{term_id}, \code{accession_number}, and \code{gene_symbol}, |
|
34 |
+setting \code{feature_name = "gene_symbol"} indicates that gene symbols |
|
35 |
+will be used for enrichment analysis.} |
|
36 | 36 |
} |
37 | 37 |
\value{ |
38 |
-An object class \code{fenr_terms} required by |
|
38 |
+An object of class \code{fenr_terms} required by |
|
39 | 39 |
\code{functional_enrichment}. |
40 | 40 |
} |
41 | 41 |
\description{ |
42 |
-Prepare term data downloaded with \code{fetch_*} functions for fast |
|
43 |
-enrichment analysis. |
|
42 |
+Process term data downloaded with the \code{fetch_*} functions, preparing it |
|
43 |
+for fast enrichment analysis using \code{functional_enrichment}. |
|
44 | 44 |
} |
45 | 45 |
\details{ |
46 |
-Takes two tibbles with functional term information (\code{terms}) and |
|
47 |
-feature mapping (\code{mapping}) and converts them into an object required by |
|
48 |
-\code{functional_enrichment} for fast analysis. Terms and mapping can be |
|
49 |
-created with database access functions in this package, for example |
|
50 |
-\code{fetch_reactome} or \code{fetch_go_from_go}. |
|
46 |
+This function takes two tibbles containing functional term information |
|
47 |
+(\code{terms}) and feature mapping (\code{mapping}), and converts them into |
|
48 |
+an object required by \code{functional_enrichment} for efficient analysis. |
|
49 |
+Terms and mapping can be generated with the database access functions |
|
50 |
+included in this package, such as \code{fetch_reactome} or |
|
51 |
+\code{fetch_go_from_go}. |
|
51 | 52 |
} |
52 | 53 |
\examples{ |
53 | 54 |
data(exmpl_all) |
... | ... |
@@ -7,7 +7,7 @@ output: |
7 | 7 |
toc_float: true |
8 | 8 |
css: style.css |
9 | 9 |
abstract: | |
10 |
- `fenr` performs functional enrichment analysis quickly, typically in a fraction of a second, making it ideal for interactive applications, e.g. Shiny apps. To achieve this, `fenr` downloads functional data (e.g. GO terms of KEGG pathways) in advance, storing them in a format designed for fast analysis of any arbitrary selection of features (genes or proteins). |
|
10 |
+ The `fenr` R package enables rapid functional enrichment analysis, typically completing in a fraction of a second, which makes it well suited for interactive applications such as Shiny apps. To achieve this, `fenr` downloads functional data (e.g., GO terms or KEGG pathways) in advance and stores them in a format optimized for fast analysis of any selection of features, such as genes or proteins. |
|
11 | 11 |
vignette: > |
12 | 12 |
%\VignetteIndexEntry{Fast functional enrichment} |
13 | 13 |
%\VignetteEngine{knitr::rmarkdown} |
... | ... |
@@ -24,26 +24,25 @@ knitr::opts_chunk$set( |
24 | 24 |
|
25 | 25 |
# Purpose |
26 | 26 |
|
27 |
-Functional enrichment determines whether some biological functions or pathways are enriched in a selection of features (genes, proteins etc.). The selection often comes from differential expression analysis, while functions and pathways are obtained from databases as *GO*, *Reactome* or *KEGG*. At its simplest, enrichment analysis tells us if a given function is enriched in the selection based on Fisher's test. The null hypothesis is that the proportion of features annotated with that function is the same among selected and non-selected features. |
|
27 |
+Functional enrichment analysis determines whether specific biological functions or pathways are overrepresented in a selection of features (e.g., genes or proteins). The selection often comes from differential expression analysis, while the functions and pathways are obtained from databases such as *GO*, *Reactome*, or *KEGG*. In its simplest form, enrichment analysis uses Fisher's exact test to evaluate whether a given function is enriched in the selection. The null hypothesis is that the proportion of features annotated with that function is the same among selected and non-selected features. |
|
28 | 28 |
|
29 |
-Performing functional enrichment involves downloading large data sets from the aforementioned databases before the actual analysis is done. Downloading data takes time, while Fisher's test can be performed quickly. The purpose of this package is to separate the two and allow for fast enrichment analysis for a given database on various selections of features. It is designed with interactive applications, like Shiny, in mind. A small Shiny app is included in the package to demonstrate usage of `fenr`. |
|
29 |
+Functional enrichment analysis requires downloading large datasets from the aforementioned databases before the actual analysis can be done. Downloading data is time-consuming, while Fisher's test can be performed quickly. This package separates the two steps: data for a given database are downloaded and prepared once, after which enrichment analysis of any number of feature selections is fast. It is designed with interactive applications, such as Shiny apps, in mind. A small Shiny app included in the package demonstrates the usage of `fenr`. |
|
30 | 30 |
|
31 | 31 |
## Caveats |
32 | 32 |
|
33 |
-Functional enrichment is not the final answer about biology. Quite often is does not give any answer about biology. In particular, when arbitrary groups of genes are selected, enrichment tells us only about simplified statistical overrepresentation of a functional term in the selection. Statistics does not equal biology. This package is meant to be only a tool to explore data and search for clues. Any further statements about biology need independent validation. |
|
34 |
- |
|
33 |
+Functional enrichment analysis is not the final answer about biology, and quite often it does not give any biological answer at all. In particular, when arbitrary groups of genes are selected, enrichment analysis only reveals the statistical overrepresentation of a functional term within the selection, which does not necessarily reflect biological relevance. Statistics does not equal biology. This package is meant only as a tool for exploring data and searching for clues; any conclusions about biology require independent validation. |
|
35 | 34 |
|
36 | 35 |
# Installation |
37 | 36 |
|
38 | 37 |
`fenr` can be installed from GitHub (you need to install `remotes` package first). |
39 | 38 |
|
40 |
-``` |
|
39 |
+```{r install, eval=FALSE} |
|
41 | 40 |
remotes::install_github("bartongroup/fenr", build_vignettes = TRUE) |
42 | 41 |
``` |
43 | 42 |
|
44 | 43 |
# Example |
45 | 44 |
|
46 |
-Package `fenr` and example data are loaded with |
|
45 |
+Package `fenr` and example data are loaded with the following commands: |
|
47 | 46 |
|
48 | 47 |
```{r load_fenr} |
49 | 48 |
library(fenr) |
... | ... |
@@ -52,13 +51,13 @@ data(exmpl_all, exmpl_sel) |
52 | 51 |
|
53 | 52 |
## Data preparation |
54 | 53 |
|
55 |
-The first step is to download functional term data. `fenr` supports downloads from *Gene Ontology*, *Reactome*, *KEGG* and *WikiPathways*. Other ontologies can be used as long as they are converted into a suitable format (see function `prepare_for_enrichment` for details). The following command downloads functional terms and gene mapping from Gene Ontology (GO): |
|
54 |
+The first step is to download functional term data. `fenr` supports downloads from *Gene Ontology*, *Reactome*, *KEGG*, *BioPlanet*, and *WikiPathways*. Custom ontologies can also be used, provided they are converted into a suitable format (see the `prepare_for_enrichment` function for details). The command below downloads functional terms and gene mapping from Gene Ontology (GO): |
|
56 | 55 |
|
57 | 56 |
```{r fetch_go} |
58 | 57 |
go <- fetch_go(species = "sgd") |
59 | 58 |
``` |
60 | 59 |
|
61 |
-This is a list with two tibbles. The first tibble contains term information: |
|
60 |
+This command returns a list with two tibbles. The first tibble contains term information: |
|
62 | 61 |
|
63 | 62 |
```{r go_terms} |
64 | 63 |
go$terms |
... | ... |
@@ -70,23 +69,22 @@ The second tibble contains gene-term mapping: |
70 | 69 |
go$mapping |
71 | 70 |
``` |
72 | 71 |
|
73 |
-Next, these user-friendly data need to be converted into machine-friendly object suitable for fast functional enrichment with the following function: |
|
72 |
+Next, these user-friendly data need to be converted into a machine-friendly object suitable for fast functional enrichment analysis, using the following function: |
|
74 | 73 |
|
75 | 74 |
```{r prepare_for_enrichment} |
76 | 75 |
go_terms <- prepare_for_enrichment(go$terms, go$mapping, exmpl_all, feature_name = "gene_symbol") |
77 | 76 |
``` |
78 | 77 |
|
79 |
-`exmpl_all` is an example of gene background - a vector with gene symbols related to all detections in an imaginary RNA-seq experiment. As different datasets use different features (gene id, gene symbol, protein id), the column name containing features in `go$mapping` needs to be specified with `feature_name = "gene_symbol"`. The result, `go_terms`, is a data structure containing all the mappings in a quickly accessible form. From this point on, `go_terms` can be used to do multiple functional enrichments on various gene selections. |
|
78 |
+`exmpl_all` is an example gene background: a vector of gene symbols for all genes detected in an imaginary RNA-seq experiment. Since different datasets use different features (gene IDs, gene symbols, protein IDs), the column of `go$mapping` that contains the features needs to be specified using `feature_name = "gene_symbol"`. The resulting object, `go_terms`, is a data structure containing all the mappings in a quickly accessible form. From this point on, `go_terms` can be used to perform multiple functional enrichment analyses on various gene selections. |
|
80 | 79 |
|
81 | 80 |
## Functional enrichment |
82 | 81 |
|
83 |
-There are two gene sets attached to the package. `exmpl_all` contains all background gene symbols and `exmpl_sel` contains genes of interest. Functional enrichment in the selection can be found using one fast function call: |
|
82 |
+The package includes two example gene sets: `exmpl_all` contains all background gene symbols, while `exmpl_sel` contains the genes of interest. Functional enrichment in the selection is computed with a single, fast function call: |
|
84 | 83 |
|
85 | 84 |
```{r enrichment} |
86 | 85 |
enr <- functional_enrichment(exmpl_all, exmpl_sel, go_terms) |
87 | 86 |
``` |
88 | 87 |
|
89 |
- |
|
90 | 88 |
## The output |
91 | 89 |
|
92 | 90 |
The result of `functional_enrichment` is a tibble with enrichment results. |
... | ... |
@@ -98,19 +96,18 @@ enr |> |
98 | 96 |
|
99 | 97 |
The columns are as follows |
100 | 98 |
|
101 |
- - `N_with` - number of features (genes) with this term in the background of all genes, |
|
102 |
- - `n_with_sel` - number of features with this term in the selection, |
|
103 |
- - `n_expect` - expected number of features with this term under the null hypothesis (terms are randomly distributed), |
|
104 |
- - `enrichment` - ratio of observed to expected, |
|
105 |
- - `odds_ratio` - effect size, odds ratio from the contingency table, |
|
106 |
- - `ids` - identifiers of features with term in the selection, |
|
107 |
- - `p_value` - raw p-value from hypergeometric distribution, |
|
108 |
- - `p_adjust` - p-value adjusted for multiple tests using Benjamini-Hochberg approach. |
|
109 |
- |
|
110 |
- |
|
111 |
-# Interactive example |
|
99 |
+ - `N_with`: The number of features (genes) associated with this term in the background of all genes. |
|
100 |
+ - `n_with_sel`: The number of features associated with this term in the selection. |
|
101 |
+ - `n_expect`: The expected number of features associated with this term under the null hypothesis (terms are randomly distributed). |
|
102 |
+ - `enrichment`: The ratio of observed to expected. |
|
103 |
+ - `odds_ratio`: The effect size, represented by the odds ratio from the contingency table. |
|
104 |
+ - `ids`: The identifiers of features with the term in the selection. |
|
105 |
+ - `p_value`: The raw p-value from the hypergeometric distribution. |
|
106 |
+ - `p_adjust`: The p-value adjusted for multiple tests using the Benjamini-Hochberg approach. |
|
107 |
+ |
|
108 |
+# Interactive example |
|
112 | 109 |
|
113 |
-A small Shiny app is included in the package to illustrate usage of `fenr` in intractive environment. All slow data loading and preparation is done before the app is started. |
|
110 |
+A small Shiny app is included in the package to demonstrate the usage of `fenr` in an interactive environment. All time-consuming data loading and preparation tasks are performed before the app is launched. |
|
114 | 111 |
|
115 | 112 |
```{r interactive_prepare, eval=FALSE} |
116 | 113 |
data(yeast_de) |
... | ... |
@@ -119,9 +116,9 @@ term_data <- fetch_terms_for_example(yeast_de) |
119 | 116 |
|
120 | 117 |
`yeast_de` is the result of differential expression (using `edgeR`) on a subset of 6+6 replicates from [Gierlinski et al. (2015)](https://academic.oup.com/bioinformatics/article/31/22/3625/240923). |
121 | 118 |
|
122 |
-The function `fetch_terms_for_example` uses `fetch_*` functions from `fenr` to download and process data from *GO*, *Reactome* and *KEGG*. One can see how this is done, step by step, by reading the function code from [GitHub](https://github.com/bartongroup/fenr/blob/main/R/iteractive_example.R). The object `term_data` is a named list of `fenr_terms` objects, one for each ontology. |
|
119 |
+The function `fetch_terms_for_example` uses `fetch_*` functions from `fenr` to download and process data from *GO*, *Reactome* and *KEGG*. You can view the step-by-step process by examining the function code on [GitHub](https://github.com/bartongroup/fenr/blob/main/R/iteractive_example.R). The object `term_data` is a named list of `fenr_terms` objects, one for each ontology. |
|
123 | 120 |
|
124 |
-Once the slow part is over, the Shiny app can be started with |
|
121 |
+After completing the slow tasks, you can start the Shiny app by running: |
|
125 | 122 |
|
126 | 123 |
```{r shiny_app, eval=FALSE} |
127 | 124 |
enrichment_interactive(yeast_de, term_data) |