0 | 2 |
new file mode 100644 |
... | ... |
@@ -0,0 +1,30 @@ |
1 |
+Package: ScaledMatrix |
|
2 |
+Version: 0.99.0 |
|
3 |
+Date: 2020-12-12 |
|
4 |
+Title: Creating a DelayedMatrix of Scaled and Centered Values |
|
5 |
+Authors@R: person("Aaron", "Lun", role=c("aut", "cre", "cph"), |
|
6 |
+ email="[email protected]") |
|
7 |
+Imports: |
|
8 |
+ methods, |
|
9 |
+ Matrix, |
|
10 |
+ S4Vectors, |
|
11 |
+ DelayedArray |
|
12 |
+Suggests: |
|
13 |
+ testthat, |
|
14 |
+ BiocStyle, |
|
15 |
+ knitr, |
|
16 |
+ rmarkdown, |
|
17 |
+ BiocSingular |
|
18 |
+biocViews: |
|
19 |
+ Software, |
|
20 |
+ DataRepresentation |
|
21 |
+Description: |
|
22 |
+ Provides delayed computation of a matrix of scaled and centered values. |
|
23 |
+ The result is equivalent to using the scale() function but avoids explicit |
|
24 |
+ realization of a dense matrix during block processing. This permits greater |
|
25 |
+ efficiency in common operations, most notably matrix multiplication. |
|
26 |
+License: GPL-3 |
|
27 |
+VignetteBuilder: knitr |
|
28 |
+RoxygenNote: 7.1.1 |
|
29 |
+BugReports: https://github.com/LTLA/ScaledMatrix/issues |
|
30 |
+URL: https://github.com/LTLA/ScaledMatrix |
0 | 31 |
new file mode 100644 |
... | ... |
@@ -0,0 +1,39 @@ |
1 |
+# Generated by roxygen2: do not edit by hand |
|
2 |
+ |
|
3 |
+export(ScaledMatrix) |
|
4 |
+export(ScaledMatrixSeed) |
|
5 |
+exportClasses(ScaledMatrix) |
|
6 |
+exportClasses(ScaledMatrixSeed) |
|
7 |
+exportMethods("%*%") |
|
8 |
+exportMethods("[") |
|
9 |
+exportMethods("dimnames<-") |
|
10 |
+exportMethods(DelayedArray) |
|
11 |
+exportMethods(colMeans) |
|
12 |
+exportMethods(colSums) |
|
13 |
+exportMethods(crossprod) |
|
14 |
+exportMethods(dim) |
|
15 |
+exportMethods(dimnames) |
|
16 |
+exportMethods(extract_array) |
|
17 |
+exportMethods(rowMeans) |
|
18 |
+exportMethods(rowSums) |
|
19 |
+exportMethods(show) |
|
20 |
+exportMethods(t) |
|
21 |
+exportMethods(tcrossprod) |
|
22 |
+importClassesFrom(DelayedArray,DelayedMatrix) |
|
23 |
+importFrom(DelayedArray,DelayedArray) |
|
24 |
+importFrom(DelayedArray,extract_array) |
|
25 |
+importFrom(DelayedArray,new_DelayedArray) |
|
26 |
+importFrom(DelayedArray,seed) |
|
27 |
+importFrom(DelayedArray,sweep) |
|
28 |
+importFrom(Matrix,colMeans) |
|
29 |
+importFrom(Matrix,colSums) |
|
30 |
+importFrom(Matrix,crossprod) |
|
31 |
+importFrom(Matrix,drop) |
|
32 |
+importFrom(Matrix,rowMeans) |
|
33 |
+importFrom(Matrix,rowSums) |
|
34 |
+importFrom(Matrix,t) |
|
35 |
+importFrom(Matrix,tcrossprod) |
|
36 |
+importFrom(S4Vectors,setValidity2) |
|
37 |
+importFrom(methods,is) |
|
38 |
+importFrom(methods,new) |
|
39 |
+importFrom(methods,show) |
0 | 40 |
new file mode 100644 |
... | ... |
@@ -0,0 +1,6 @@ |
1 |
+#' @export |
|
2 |
+setClass("ScaledMatrixSeed", slots=c(.matrix="ANY", center="numeric", scale="numeric", use_center="logical", use_scale="logical", transposed="logical")) |
|
3 |
+ |
|
4 |
+#' @export |
|
5 |
+#' @importClassesFrom DelayedArray DelayedMatrix |
|
6 |
+setClass("ScaledMatrix", contains="DelayedMatrix", slots=c(seed="ScaledMatrixSeed")) |
0 | 7 |
new file mode 100644 |
... | ... |
@@ -0,0 +1,197 @@ |
1 |
+#' The ScaledMatrix class |
|
2 |
+#' |
|
3 |
+#' Defines the ScaledMatrixSeed and ScaledMatrix classes and their associated methods. |
|
4 |
+#' These classes support delayed centering and scaling of the columns in the same manner as \code{\link{scale}}, |
|
5 |
+#' but preserving the original data structure for more efficient operations like matrix multiplication. |
|
6 |
+#' |
|
7 |
+#' @param x A matrix or any matrix-like object (e.g., from the \pkg{Matrix} package). |
|
8 |
+#' |
|
9 |
+#' This can alternatively be a ScaledMatrixSeed, in which case any values of \code{center} and \code{scale} are ignored. |
|
10 |
+#' @param center A numeric vector of length equal to \code{ncol(x)}, where each element is to be subtracted from the corresponding column of \code{x}. |
|
11 |
+#' A \code{NULL} value indicates that no subtraction is to be performed. |
|
12 |
+#' Alternatively \code{TRUE}, in which case it is set to the column means of \code{x}. |
|
13 |
+#' @param scale A numeric vector of length equal to \code{ncol(x)}, where each element is to divided from the corresponding column of \code{x} (after subtraction). |
|
14 |
+#' A \code{NULL} value indicates that no division is to be performed. |
|
15 |
+#' Alternatively \code{TRUE}, in which case it is set to the column-wise root-mean-squared differences from \code{center} |
|
16 |
+#' (interpretable as standard deviations if \code{center} is set to the column means, see \code{\link{scale}} for commentary). |
|
17 |
+#' |
|
18 |
+#' @return |
|
19 |
+#' The \code{ScaledMatrixSeed} constructor will return a ScaledMatrixSeed object. |
|
20 |
+#' |
|
21 |
+#' The \code{ScaledMatrix} constructor will return a ScaledMatrix object equivalent to \code{t((t(x) - center)/scale)}. |
|
22 |
+#' |
|
23 |
+#' @section Methods for ScaledMatrixSeed objects: |
|
24 |
+#' ScaledMatrixSeed objects are implemented as \linkS4class{DelayedMatrix} backends. |
|
25 |
+#' They support standard operations like \code{dim}, \code{dimnames} and \code{extract_array}. |
|
26 |
+#' |
|
27 |
+#' Passing a ScaledMatrixSeed object to the \code{\link{DelayedArray}} constructor will create a ScaledMatrix object. |
|
28 |
+#' |
|
29 |
+#' It is possible for \code{x} to contain a ScaledMatrix, thus nesting one ScaledMatrix inside another. |
|
30 |
+#' This can occasionally be useful in combination with transposition to achieve centering/scaling in both dimensions. |
|
31 |
+#' |
|
32 |
+#' @section Methods for ScaledMatrix objects: |
|
33 |
+#' ScaledMatrix objects are derived from \linkS4class{DelayedMatrix} objects and support all of valid operations on the latter. |
|
34 |
+#' Several functions are specialized for greater efficiency when operating on ScaledMatrix instances, including: |
|
35 |
+#' \itemize{ |
|
36 |
+#' \item Subsetting, transposition and replacement of row/column names. |
|
37 |
+#' These will return a new ScaledMatrix rather than a DelayedMatrix. |
|
38 |
+#' \item Matrix multiplication via \code{\%*\%}, \code{crossprod} and \code{tcrossprod}. |
|
39 |
+#' These functions will return a DelayedMatrix. |
|
40 |
+#' \item Calculation of row and column sums and means by \code{colSums}, \code{rowSums}, etc. |
|
41 |
+#' } |
|
42 |
+#' |
|
43 |
+#' All other operations applied to a ScaledMatrix will use the underlying \pkg{DelayedArray} machinery. |
|
44 |
+#' Unary or binary operations will generally create a new DelayedMatrix instance containing a ScaledMatrixSeed. |
|
45 |
+#' |
|
46 |
+#' Tranposition can effectively be used to allow centering/scaling on the rows if the input \code{x} is transposed. |
|
47 |
+#' |
|
48 |
+#' @section Efficiency vs precision: |
|
49 |
+#' The raison d'etre of the ScaledMatrix is that it can offer faster matrix multiplication by avoiding the \pkg{DelayedArray} block processing. |
|
50 |
+#' This is done by refactoring the scaling/centering operations to use the (hopefully more efficient) multiplication operator of the original matrix \code{x}. |
|
51 |
+#' Unfortunately, the speed-up comes at the cost of increasing the risk of catastrophic cancellation. |
|
52 |
+#' The procedure requires subtraction of one large intermediate number from another to obtain the values of the final matrix product. |
|
53 |
+#' This could result in a loss of numerical precision that compromises the accuracy of downstream algorithms. |
|
54 |
+#' In practice, this does not seem to be a major concern though one should be careful if the input \code{x} contains very large positive/negative values. |
|
55 |
+#' |
|
56 |
+#' @author |
|
57 |
+#' Aaron Lun |
|
58 |
+#' |
|
59 |
+#' @examples |
|
60 |
+#' library(Matrix) |
|
61 |
+#' y <- ScaledMatrix(rsparsematrix(10, 20, 0.1), |
|
62 |
+#' center=rnorm(20), scale=1+runif(20)) |
|
63 |
+#' y |
|
64 |
+#' |
|
65 |
+#' crossprod(y) |
|
66 |
+#' tcrossprod(y) |
|
67 |
+#' y %*% rnorm(20) |
|
68 |
+#' |
|
69 |
+#' @aliases |
|
70 |
+#' ScaledMatrixSeed |
|
71 |
+#' ScaledMatrixSeed-class |
|
72 |
+#' |
|
73 |
+#' dim,ScaledMatrixSeed-method |
|
74 |
+#' dimnames,ScaledMatrixSeed-method |
|
75 |
+#' extract_array,ScaledMatrixSeed-method |
|
76 |
+#' DelayedArray,ScaledMatrixSeed-method |
|
77 |
+#' show,ScaledMatrixSeed-method |
|
78 |
+#' |
|
79 |
+#' ScaledMatrix |
|
80 |
+#' ScaledMatrix-class |
|
81 |
+#' |
|
82 |
+#' dimnames<-,ScaledMatrix,ANY-method |
|
83 |
+#' t,ScaledMatrix-method |
|
84 |
+#' [,ScaledMatrix,ANY,ANY,ANY-method |
|
85 |
+#' |
|
86 |
+#' colSums,ScaledMatrix-method |
|
87 |
+#' rowSums,ScaledMatrix-method |
|
88 |
+#' colMeans,ScaledMatrix-method |
|
89 |
+#' rowMeans,ScaledMatrix-method |
|
90 |
+#' |
|
91 |
+#' %*%,ANY,ScaledMatrix-method |
|
92 |
+#' %*%,ScaledMatrix,ANY-method |
|
93 |
+#' %*%,ScaledMatrix,ScaledMatrix-method |
|
94 |
+#' |
|
95 |
+#' crossprod,ScaledMatrix,missing-method |
|
96 |
+#' crossprod,ScaledMatrix,ANY-method |
|
97 |
+#' crossprod,ANY,ScaledMatrix-method |
|
98 |
+#' crossprod,ScaledMatrix,ScaledMatrix-method |
|
99 |
+#' |
|
100 |
+#' tcrossprod,ScaledMatrix,missing-method |
|
101 |
+#' tcrossprod,ScaledMatrix,ANY-method |
|
102 |
+#' tcrossprod,ANY,ScaledMatrix-method |
|
103 |
+#' tcrossprod,ScaledMatrix,ScaledMatrix-method |
|
104 |
+#' |
|
105 |
+#' @docType class |
|
106 |
+#' @name ScaledMatrix |
|
107 |
+NULL |
|
108 |
+ |
|
109 |
+#' @export |
|
110 |
+#' @rdname ScaledMatrix |
|
111 |
+#' @importFrom DelayedArray DelayedArray |
|
112 |
+#' @importFrom Matrix colMeans rowSums t |
|
113 |
+ScaledMatrix <- function(x, center=NULL, scale=NULL) { |
|
114 |
+ if (isTRUE(center)) { |
|
115 |
+ center <- colMeans(x) |
|
116 |
+ } |
|
117 |
+ if (isTRUE(scale)) { |
|
118 |
+ tx <- t(DelayedArray(x)) |
|
119 |
+ if (!is.null(center)) { |
|
120 |
+ tx <- tx - center |
|
121 |
+ } |
|
122 |
+ ss <- rowSums(tx^2) |
|
123 |
+ scale <- sqrt(ss / (nrow(x) - 1)) |
|
124 |
+ } |
|
125 |
+ DelayedArray(ScaledMatrixSeed(x, center=center, scale=scale)) |
|
126 |
+} |
|
127 |
+ |
|
128 |
+#' @export |
|
129 |
+#' @importFrom DelayedArray DelayedArray new_DelayedArray |
|
130 |
+setMethod("DelayedArray", "ScaledMatrixSeed", |
|
131 |
+ function(seed) new_DelayedArray(seed, Class="ScaledMatrix") |
|
132 |
+) |
|
133 |
+ |
|
134 |
+################################### |
|
135 |
+# Overridden utilities from DelayedArray, for efficiency. |
|
136 |
+ |
|
137 |
+#' @export |
|
138 |
+#' @importFrom DelayedArray DelayedArray seed |
|
139 |
+setReplaceMethod("dimnames", "ScaledMatrix", function(x, value) { |
|
140 |
+ DelayedArray(rename_ScaledMatrixSeed(seed(x), value)) |
|
141 |
+}) |
|
142 |
+ |
|
143 |
+#' @export |
|
144 |
+#' @importFrom DelayedArray DelayedArray seed |
|
145 |
+setMethod("t", "ScaledMatrix", function(x) { |
|
146 |
+ DelayedArray(transpose_ScaledMatrixSeed(seed(x))) |
|
147 |
+}) |
|
148 |
+ |
|
149 |
+#' @export |
|
150 |
+#' @importFrom DelayedArray DelayedArray seed |
|
151 |
+setMethod("[", "ScaledMatrix", function(x, i, j, ..., drop=TRUE) { |
|
152 |
+ if (missing(i)) i <- NULL |
|
153 |
+ if (missing(j)) j <- NULL |
|
154 |
+ out <- DelayedArray(subset_ScaledMatrixSeed(seed(x), i=i, j=j)) |
|
155 |
+ |
|
156 |
+ if (drop && any(dim(out)==1L)) { |
|
157 |
+ return(drop(out)) |
|
158 |
+ } |
|
159 |
+ out |
|
160 |
+}) |
|
161 |
+ |
|
162 |
+################################### |
|
163 |
+# Basic matrix stats. |
|
164 |
+ |
|
165 |
+#' @export |
|
166 |
+#' @importFrom Matrix colSums rowSums drop |
|
167 |
+setMethod("colSums", "ScaledMatrix", function(x, na.rm = FALSE, dims = 1L) { |
|
168 |
+ if (is_transposed(seed(x))) { |
|
169 |
+ return(rowSums(t(x))) |
|
170 |
+ } |
|
171 |
+ |
|
172 |
+ out <- rep(1, nrow(x)) %*% x |
|
173 |
+ out <- drop(out) |
|
174 |
+ names(out) <- colnames(x) |
|
175 |
+ out |
|
176 |
+}) |
|
177 |
+ |
|
178 |
+#' @export |
|
179 |
+#' @importFrom Matrix colSums rowSums drop |
|
180 |
+setMethod("rowSums", "ScaledMatrix", function(x, na.rm = FALSE, dims = 1L) { |
|
181 |
+ if (is_transposed(seed(x))) { |
|
182 |
+ return(colSums(t(x))) |
|
183 |
+ } |
|
184 |
+ |
|
185 |
+ out <- x %*% rep(1, ncol(x)) |
|
186 |
+ out <- drop(out) |
|
187 |
+ names(out) <- rownames(x) |
|
188 |
+ out |
|
189 |
+}) |
|
190 |
+ |
|
191 |
+#' @export |
|
192 |
+#' @importFrom Matrix colMeans colSums |
|
193 |
+setMethod("colMeans", "ScaledMatrix", function(x, na.rm = FALSE, dims = 1L) colSums(x)/nrow(x)) |
|
194 |
+ |
|
195 |
+#' @export |
|
196 |
+#' @importFrom Matrix rowMeans rowSums |
|
197 |
+setMethod("rowMeans", "ScaledMatrix", function(x, na.rm = FALSE, dims = 1L) rowSums(x)/ncol(x)) |
0 | 198 |
new file mode 100644 |
... | ... |
@@ -0,0 +1,165 @@ |
1 |
+#' @export |
|
2 |
+#' @importFrom methods new is |
|
3 |
+ScaledMatrixSeed <- function(x, center=NULL, scale=NULL) { |
|
4 |
+ if (missing(x)) { |
|
5 |
+ x <- matrix(0, 0, 0) |
|
6 |
+ } else if (is(x, "ScaledMatrixSeed")) { |
|
7 |
+ return(x) |
|
8 |
+ } |
|
9 |
+ |
|
10 |
+ use_center <- !is.null(center) |
|
11 |
+ use_scale <- !is.null(scale) |
|
12 |
+ new("ScaledMatrixSeed", .matrix=x, center=as.numeric(center), scale=as.numeric(scale), use_center=use_center, use_scale=use_scale, transposed=FALSE) |
|
13 |
+} |
|
14 |
+ |
|
15 |
+#' @importFrom S4Vectors setValidity2 |
|
16 |
+setValidity2("ScaledMatrixSeed", function(object) { |
|
17 |
+ msg <- character(0) |
|
18 |
+ |
|
19 |
+ # Checking scalars. |
|
20 |
+ if (length(use_center(object))!=1L) { |
|
21 |
+ msg <- c(msg, "'use_center' must be a logical scalar") |
|
22 |
+ } |
|
23 |
+ if (length(use_scale(object))!=1L) { |
|
24 |
+ msg <- c(msg, "'use_scale' must be a logical scalar") |
|
25 |
+ } |
|
26 |
+ if (length(is_transposed(object))!=1L) { |
|
27 |
+ msg <- c(msg, "'transposed' must be a logical scalar") |
|
28 |
+ } |
|
29 |
+ |
|
30 |
+ # Checking vectors. |
|
31 |
+ if (use_center(object) && length(get_center(object))!=ncol(object)) { |
|
32 |
+ msg <- c(msg, "length of 'center' must equal 'ncol(object)'") |
|
33 |
+ } |
|
34 |
+ if (use_scale(object) && length(get_scale(object))!=ncol(object)) { |
|
35 |
+ msg <- c(msg, "length of 'scale' must equal 'ncol(object)'") |
|
36 |
+ } |
|
37 |
+ |
|
38 |
+ if (length(msg)) { |
|
39 |
+ return(msg) |
|
40 |
+ } |
|
41 |
+ return(TRUE) |
|
42 |
+}) |
|
43 |
+ |
|
44 |
+#' @export |
|
45 |
+#' @importFrom methods show |
|
46 |
+setMethod("show", "ScaledMatrixSeed", function(object) { |
|
47 |
+ cat(sprintf("%i x %i ScaledMatrixSeed object", nrow(object), ncol(object)), |
|
48 |
+ sprintf("representation: %s", class(get_matrix2(object))), |
|
49 |
+ sprintf("centering: %s", if (use_center(object)) "yes" else "no"), |
|
50 |
+ sprintf("scaling: %s", if (use_scale(object)) "yes" else "no"), |
|
51 |
+ sep="\n") |
|
52 |
+}) |
|
53 |
+ |
|
54 |
+################################### |
|
55 |
+# Internal getters. |
|
56 |
+ |
|
57 |
+get_matrix2 <- function(x) [email protected] |
|
58 |
+ |
|
59 |
+get_center <- function(x) x@center |
|
60 |
+ |
|
61 |
+get_scale <- function(x) x@scale |
|
62 |
+ |
|
63 |
+use_center <- function(x) x@use_center |
|
64 |
+ |
|
65 |
+use_scale <- function(x) x@use_scale |
|
66 |
+ |
|
67 |
+is_transposed <- function(x) x@transposed |
|
68 |
+ |
|
69 |
+################################### |
|
70 |
+# DelayedArray support utilities. |
|
71 |
+ |
|
72 |
+#' @export |
|
73 |
+setMethod("dim", "ScaledMatrixSeed", function(x) { |
|
74 |
+ d <- dim(get_matrix2(x)) |
|
75 |
+ if (is_transposed(x)) { d <- rev(d) } |
|
76 |
+ d |
|
77 |
+}) |
|
78 |
+ |
|
79 |
+#' @export |
|
80 |
+setMethod("dimnames", "ScaledMatrixSeed", function(x) { |
|
81 |
+ d <- dimnames(get_matrix2(x)) |
|
82 |
+ if (is_transposed(x)) { d <- rev(d) } |
|
83 |
+ d |
|
84 |
+}) |
|
85 |
+ |
|
86 |
+#' @export |
|
87 |
+#' @importFrom DelayedArray extract_array |
|
88 |
+setMethod("extract_array", "ScaledMatrixSeed", function(x, index) { |
|
89 |
+ x2 <- subset_ScaledMatrixSeed(x, index[[1]], index[[2]]) |
|
90 |
+ realize_ScaledMatrixSeed(x2) |
|
91 |
+}) |
|
92 |
+ |
|
93 |
+################################### |
|
94 |
+# Other utilities. |
|
95 |
+ |
|
96 |
+rename_ScaledMatrixSeed <- function(x, value) { |
|
97 |
+ if (is_transposed(x)) value <- rev(value) |
|
98 |
+ dimnames([email protected]) <- value |
|
99 |
+ x |
|
100 |
+} |
|
101 |
+ |
|
102 |
+transpose_ScaledMatrixSeed <- function(x) { |
|
103 |
+ x@transposed <- !is_transposed(x) |
|
104 |
+ x |
|
105 |
+} |
|
106 |
+ |
|
107 |
+#' @importFrom Matrix t |
|
108 |
+#' @importFrom methods is |
|
109 |
+realize_ScaledMatrixSeed <- function(x, ...) { |
|
110 |
+ out <- get_matrix2(x) |
|
111 |
+ |
|
112 |
+ if (use_scale(x) || use_center(x)) { |
|
113 |
+ if (is(out, "ScaledMatrix")) { |
|
114 |
+ # Any '-' and '/' would collapse this to a DelayedArray, |
|
115 |
+ # which would then call extract_array, which would then |
|
116 |
+ # call realize_ScaledMatrixSeed, forming an infinite loop. |
|
117 |
+ # So we might as well realize it now. |
|
118 |
+ out <- realize_ScaledMatrixSeed(seed(out)) |
|
119 |
+ } |
|
120 |
+ |
|
121 |
+ out <- t(out) |
|
122 |
+ if (use_center(x)) { |
|
123 |
+ out <- out - get_center(x) |
|
124 |
+ } |
|
125 |
+ if (use_scale(x)) { |
|
126 |
+ out <- out / get_scale(x) |
|
127 |
+ } |
|
128 |
+ |
|
129 |
+ if (!is_transposed(x)) out <- t(out) |
|
130 |
+ } else { |
|
131 |
+ if (is_transposed(x)) out <- t(out) |
|
132 |
+ } |
|
133 |
+ |
|
134 |
+ as.matrix(out) |
|
135 |
+} |
|
136 |
+ |
|
137 |
+subset_ScaledMatrixSeed <- function(x, i, j) { |
|
138 |
+ if (is_transposed(x)) { |
|
139 |
+ x2 <- transpose_ScaledMatrixSeed(x) |
|
140 |
+ x2 <- subset_ScaledMatrixSeed(x2, i=j, j=i) |
|
141 |
+ return(transpose_ScaledMatrixSeed(x2)) |
|
142 |
+ } |
|
143 |
+ |
|
144 |
+ if (!is.null(i)) { |
|
145 |
+ [email protected] <- get_matrix2(x)[i,,drop=FALSE] |
|
146 |
+ } |
|
147 |
+ |
|
148 |
+ if (!is.null(j)) { |
|
149 |
+ if (is.character(j)) { |
|
150 |
+ j <- match(j, colnames(x)) |
|
151 |
+ } |
|
152 |
+ |
|
153 |
+ [email protected] <- get_matrix2(x)[,j,drop=FALSE] |
|
154 |
+ |
|
155 |
+ if (use_scale(x)) { |
|
156 |
+ x@scale <- get_scale(x)[j] |
|
157 |
+ } |
|
158 |
+ |
|
159 |
+ if (use_center(x)) { |
|
160 |
+ x@center <- get_center(x)[j] |
|
161 |
+ } |
|
162 |
+ } |
|
163 |
+ |
|
164 |
+ return(x) |
|
165 |
+} |
0 | 166 |
new file mode 100644 |
... | ... |
@@ -0,0 +1,634 @@ |
1 |
+# We attempt to use operators defined for '.matrix' in the 'ScaledMatrixSeed'. |
|
2 |
+# This avoids expensive modifications such as loss of sparsity. |
|
3 |
+# Centering and scaling are factored out into separate operations. |
|
4 |
+# |
|
5 |
+# We assume that the non-'ScaledMatrix' argument is small and can be modified cheaply. |
|
6 |
+# We also assume that the matrix product is small and can be modified cheaply. |
|
7 |
+# This allows centering and scaling to be applied *after* multiplication. |
|
8 |
+# |
|
9 |
+# Here are some ground rules for how these functions must work: |
|
10 |
+# |
|
11 |
+# - NO arithmetic operations shall be applied to a ScaledMatrix. |
|
12 |
+# This includes nested ScaledMatrices that are present in '.matrix'. |
|
13 |
+# Such operations collapses the ScaledMatrix to a DelayedMatrix, |
|
14 |
+# resulting in slow block processing during multiplication. |
|
15 |
+# |
|
16 |
+# - NO addition/subtraction operations shall be applied to '.matrix'. |
|
17 |
+# This is necessary to avoid loss of sparsity for sparse '.matrix', |
|
18 |
+# as well as to avoid block processing for ScaledMatrix '.matrix'. |
|
19 |
+# |
|
20 |
+# - NO division/multiplication operations should be applied to '.matrix'. |
|
21 |
+# This is largely a consequence of the first point above. |
|
22 |
+# Exceptions are only allowed when this is unavoidable, e.g., in '.internal_tcrossprod'. |
|
23 |
+# |
|
24 |
+# - NO calling of %*% or (t)crossprod on a ScaledMatrix of the same nesting depth as an input ScaledMatrix. |
|
25 |
+# Internal multiplication should always be applied to '.matrix', to avoid infinite S4 recursion. |
|
26 |
+# Each method call should strip away one nesting level, i.e., operate on the seed. |
|
27 |
+# Exceptions are allowed for dual ScaledMatrix multiplication, |
|
28 |
+# where one argument is allowed to be of the same depth. |
|
29 |
+ |
|
30 |
+#' @export |
|
31 |
+#' @importFrom Matrix t |
|
32 |
+#' @importFrom DelayedArray seed DelayedArray |
|
33 |
+setMethod("%*%", c("ScaledMatrix", "ANY"), function(x, y) { |
|
34 |
+ x_seed <- seed(x) |
|
35 |
+ if (is_transposed(x_seed)) { |
|
36 |
+ out <- t(.leftmult_ScaledMatrix(t(y), x_seed)) |
|
37 |
+ } else { |
|
38 |
+ out <- .rightmult_ScaledMatrix(x_seed, y) |
|
39 |
+ } |
|
40 |
+ DelayedArray(out) |
|
41 |
+}) |
|
42 |
+ |
|
43 |
+#' @importFrom DelayedArray sweep |
|
44 |
+.rightmult_ScaledMatrix <- function(x_seed, y) { |
|
45 |
+ if (use_scale(x_seed)) { |
|
46 |
+ y <- y / get_scale(x_seed) |
|
47 |
+ } |
|
48 |
+ |
|
49 |
+ out <- as.matrix(get_matrix2(x_seed) %*% y) |
|
50 |
+ |
|
51 |
+ if (use_center(x_seed)) { |
|
52 |
+ out <- sweep(out, 2, as.numeric(get_center(x_seed) %*% y), "-", check.margin=FALSE) |
|
53 |
+ } |
|
54 |
+ |
|
55 |
+ out |
|
56 |
+} |
|
57 |
+ |
|
58 |
+#' @export |
|
59 |
+#' @importFrom Matrix t |
|
60 |
+#' @importFrom DelayedArray seed DelayedArray |
|
61 |
+setMethod("%*%", c("ANY", "ScaledMatrix"), function(x, y) { |
|
62 |
+ y_seed <- seed(y) |
|
63 |
+ if (is_transposed(y_seed)) { |
|
64 |
+ if (!is.null(dim(x))) { |
|
65 |
+ # Vectors don't quite behave as 1-column matrices here. |
|
66 |
+ # so we need to be a bit more careful. |
|
67 |
+ x <- t(x) |
|
68 |
+ } |
|
69 |
+ out <- t(.rightmult_ScaledMatrix(y_seed, x)) |
|
70 |
+ } else { |
|
71 |
+ out <- .leftmult_ScaledMatrix(x, y_seed) |
|
72 |
+ } |
|
73 |
+ DelayedArray(out) |
|
74 |
+}) |
|
75 |
+ |
|
76 |
+#' @importFrom Matrix rowSums |
|
77 |
+#' @importFrom DelayedArray sweep |
|
78 |
+.leftmult_ScaledMatrix <- function(x, y_seed) { |
|
79 |
+ out <- as.matrix(x %*% get_matrix2(y_seed)) |
|
80 |
+ |
|
81 |
+ if (use_center(y_seed)) { |
|
82 |
+ if (is.null(dim(x))) { |
|
83 |
+ out <- out - get_center(y_seed) * sum(x) |
|
84 |
+ } else { |
|
85 |
+ out <- out - outer(rowSums(x), get_center(y_seed), "*") |
|
86 |
+ } |
|
87 |
+ } |
|
88 |
+ |
|
89 |
+ if (use_scale(y_seed)) { |
|
90 |
+ out <- sweep(out, 2, get_scale(y_seed), "/", check.margin=FALSE) |
|
91 |
+ } |
|
92 |
+ |
|
93 |
+ out |
|
94 |
+} |
|
95 |
+ |
|
96 |
+#' @export |
|
97 |
+#' @importFrom DelayedArray seed DelayedArray |
|
98 |
+setMethod("%*%", c("ScaledMatrix", "ScaledMatrix"), function(x, y) { |
|
99 |
+ x_seed <- seed(x) |
|
100 |
+ y_seed <- seed(y) |
|
101 |
+ res <- .dual_mult_dispatcher(x_seed, y_seed, is_transposed(x_seed), is_transposed(y_seed)) |
|
102 |
+ DelayedArray(res) |
|
103 |
+}) |
|
104 |
+ |
|
105 |
+#' @importFrom Matrix t |
|
106 |
+.dual_mult_dispatcher <- function(x_seed, y_seed, x_trans, y_trans) { |
|
107 |
+ if (!x_trans) { |
|
108 |
+ if (!y_trans) { |
|
109 |
+ res <- .multiply_u2u(x_seed, y_seed) |
|
110 |
+ } else { |
|
111 |
+ res <- .multiply_u2t(x_seed, y_seed) |
|
112 |
+ } |
|
113 |
+ } else { |
|
114 |
+ if (!y_trans) { |
|
115 |
+ res <- .multiply_t2u(x_seed, y_seed) |
|
116 |
+ } else { |
|
117 |
+ res <- .multiply_u2u(y_seed, x_seed) |
|
118 |
+ res <- t(res) |
|
119 |
+ } |
|
120 |
+ } |
|
121 |
+ res |
|
122 |
+} |
|
123 |
+ |
|
124 |
+################################### |
|
125 |
+# ScMat %*% ScMat utilities. |
|
126 |
+ |
|
127 |
+# We do not implement ScMat %*% ScMat in terms of left/right %*%. |
|
128 |
+# This would cause scaling to be applied on one of the ScMats, |
|
129 |
+# collapsing it into a DelayedMatrix. Subsequent multiplication |
|
130 |
+# would use block processing, which would be too slow. |
|
131 |
+ |
|
132 |
+#' @importFrom Matrix drop rowSums |
|
133 |
+#' @importFrom DelayedArray sweep |
|
134 |
+.multiply_u2u <- function(x_seed, y_seed) |
|
135 |
+# Considering the problem of (X - C_x)S_x (Y - C_y)S_y. |
|
136 |
+{ |
|
137 |
+ # Computing X S_x Y S_y |
|
138 |
+ x0 <- get_matrix2(x_seed) |
|
139 |
+ if (use_scale(x_seed)) { |
|
140 |
+ x0 <- ScaledMatrix(x0, scale=get_scale(x_seed)) |
|
141 |
+ } |
|
142 |
+ |
|
143 |
+ result <- as.matrix(x0 %*% get_matrix2(y_seed)) |
|
144 |
+ if (use_scale(y_seed)) { |
|
145 |
+ result <- sweep(result, 2, get_scale(y_seed), "/", check.margin=FALSE) |
|
146 |
+ } |
|
147 |
+ |
|
148 |
+ # Computing C_x S_x Y S_y, and subtracting it from 'result'. |
|
149 |
+ if (use_center(x_seed)) { |
|
150 |
+ x.center <- get_center(x_seed) |
|
151 |
+ if (use_scale(x_seed)) { |
|
152 |
+ x.center <- x.center / get_scale(x_seed) |
|
153 |
+ } |
|
154 |
+ |
|
155 |
+ component2 <- drop(x.center %*% get_matrix2(y_seed)) |
|
156 |
+ if (use_scale(y_seed)) { |
|
157 |
+ component2 <- component2 / get_scale(y_seed) |
|
158 |
+ } |
|
159 |
+ |
|
160 |
+ result <- sweep(result, 2, component2, "-", check.margin=FALSE) |
|
161 |
+ } |
|
162 |
+ |
|
163 |
+ # Computing C_x S_x C_y S_y, and adding it to 'result'. |
|
164 |
+ if (use_center(x_seed) && use_center(y_seed)) { |
|
165 |
+ x.center <- get_center(x_seed) |
|
166 |
+ if (use_scale(x_seed)) { |
|
167 |
+ x.center <- x.center / get_scale(x_seed) |
|
168 |
+ } |
|
169 |
+ |
|
170 |
+ y.center <- get_center(y_seed) |
|
171 |
+ if (use_scale(y_seed)) { |
|
172 |
+ y.center <- y.center / get_scale(y_seed) |
|
173 |
+ } |
|
174 |
+ |
|
175 |
+ component4 <- sum(x.center) * y.center |
|
176 |
+ result <- sweep(result, 2, component4, "+", check.margin=FALSE) |
|
177 |
+ } |
|
178 |
+ |
|
179 |
+ # Computing X S_x C_y S_y, and subtracting it from 'result'. |
|
180 |
+ # This is done last to avoid subtracting large values. |
|
181 |
+ if (use_center(y_seed)) { |
|
182 |
+ y.center <- get_center(y_seed) |
|
183 |
+ if (use_scale(y_seed)) { |
|
184 |
+ y.center <- y.center / get_scale(y_seed) |
|
185 |
+ } |
|
186 |
+ |
|
187 |
+ component3 <- outer(rowSums(x0), y.center) |
|
188 |
+ result <- result - component3 |
|
189 |
+ } |
|
190 |
+ |
|
191 |
+ result |
|
192 |
+} |
|
193 |
+ |
|
194 |
+#' @importFrom Matrix tcrossprod drop |
|
195 |
+#' @importFrom DelayedArray sweep |
|
196 |
+.multiply_u2t <- function(x_seed, y_seed) |
|
197 |
+# Considering the problem of (X - C_x)S_x S_y(Y' - C_y') |
|
198 |
+{ |
|
199 |
+ # Computing X S_x S_y Y' |
|
200 |
+ x0 <- get_matrix2(x_seed) |
|
201 |
+ if (use_scale(x_seed) || use_scale(y_seed)) { |
|
202 |
+ scaling <- 1 |
|
203 |
+ if (use_scale(x_seed)) { |
|
204 |
+ scaling <- scaling * get_scale(x_seed) |
|
205 |
+ } |
|
206 |
+ if (use_scale(y_seed)) { |
|
207 |
+ scaling <- scaling * get_scale(y_seed) |
|
208 |
+ } |
|
209 |
+ x0 <- ScaledMatrix(x0, scale=scaling) |
|
210 |
+ } |
|
211 |
+ result <- as.matrix(tcrossprod(x0, get_matrix2(y_seed))) |
|
212 |
+ |
|
213 |
+ # Computing C_x S_x S_y Y', and subtracting it from 'result'. |
|
214 |
+ if (use_center(x_seed)) { |
|
215 |
+ x.center <- get_center(x_seed) |
|
216 |
+ if (use_scale(x_seed)) { |
|
217 |
+ x.center <- x.center / get_scale(x_seed) |
|
218 |
+ } |
|
219 |
+ if (use_scale(y_seed)) { |
|
220 |
+ x.center <- x.center / get_scale(y_seed) |
|
221 |
+ } |
|
222 |
+ |
|
223 |
+ component2 <- drop(tcrossprod(x.center, get_matrix2(y_seed))) |
|
224 |
+ result <- sweep(result, 2, component2, "-", check.margin=FALSE) |
|
225 |
+ } |
|
226 |
+ |
|
227 |
+ # Computing C_x S_x S_y C_y', and adding it to 'result'. |
|
228 |
+ if (use_center(x_seed) && use_center(y_seed)) { |
|
229 |
+ x.center <- get_center(x_seed) |
|
230 |
+ if (use_scale(x_seed)) { |
|
231 |
+ x.center <- x.center / get_scale(x_seed) |
|
232 |
+ } |
|
233 |
+ |
|
234 |
+ y.center <- get_center(y_seed) |
|
235 |
+ if (use_scale(y_seed)) { |
|
236 |
+ y.center <- y.center / get_scale(y_seed) |
|
237 |
+ } |
|
238 |
+ |
|
239 |
+ component4 <- sum(x.center*y.center) |
|
240 |
+ result <- result + component4 |
|
241 |
+ } |
|
242 |
+ |
|
243 |
+ # Computing X S_x S_y C_y', and subtracting it from 'result'. |
|
244 |
+ # This is done last to avoid subtracting large values. |
|
245 |
+ if (use_center(y_seed)) { |
|
246 |
+ component3 <- drop(x0 %*% get_center(y_seed)) |
|
247 |
+ result <- result - component3 |
|
248 |
+ } |
|
249 |
+ |
|
250 |
+ result |
|
251 |
+} |
|
252 |
+ |
|
253 |
+#' @importFrom Matrix crossprod colSums |
|
254 |
+#' @importFrom DelayedArray sweep |
|
255 |
+.multiply_t2u <- function(x_seed, y_seed) |
|
256 |
+# Considering the problem of S_x(X' - C_x') (Y - C_y)S_y |
|
257 |
+{ |
|
258 |
+ # C mputing X' Y |
|
259 |
+ x0 <- get_matrix2(x_seed) |
|
260 |
+ y0 <- get_matrix2(y_seed) |
|
261 |
+ result <- as.matrix(crossprod(x0, y0)) |
|
262 |
+ |
|
263 |
+ # Computing C_x' Y, and subtracting it from 'result'. |
|
264 |
+ if (use_center(x_seed)) { |
|
265 |
+ x.center <- get_center(x_seed) |
|
266 |
+ component2 <- outer(x.center, colSums(y0)) |
|
267 |
+ result <- result - component2 |
|
268 |
+ } |
|
269 |
+ |
|
270 |
+ # Computing C_x' C_y, and adding it to 'result'. |
|
271 |
+ if (use_center(x_seed) && use_center(y_seed)) { |
|
272 |
+ x.center <- get_center(x_seed) |
|
273 |
+ y.center <- get_center(y_seed) |
|
274 |
+ component4 <- outer(x.center, y.center) * nrow(y0) |
|
275 |
+ result <- result + component4 |
|
276 |
+ } |
|
277 |
+ |
|
278 |
+ # Computing X' C_y, and subtracting it from 'result'. |
|
279 |
+ # This is done last to avoid subtracting large values. |
|
280 |
+ if (use_center(y_seed)) { |
|
281 |
+ component3 <- outer(colSums(x0), get_center(y_seed)) |
|
282 |
+ result <- result - component3 |
|
283 |
+ } |
|
284 |
+ |
|
285 |
+ if (use_scale(x_seed)) { |
|
286 |
+ result <- result / get_scale(x_seed) |
|
287 |
+ } |
|
288 |
+ if (use_scale(y_seed)) { |
|
289 |
+ result <- sweep(result, 2, get_scale(y_seed), "/", check.margin=FALSE) |
|
290 |
+ } |
|
291 |
+ |
|
292 |
+ result |
|
293 |
+} |
|
294 |
+ |
|
295 |
+################################### |
|
296 |
+# Cross-product. |
|
297 |
+ |
|
298 |
+# Technically, we could implement this in terms of '%*%', |
|
299 |
+# but we use specializations to exploit native crossprod() for '.matrix', |
|
300 |
+# which is probably more efficient. |
|
301 |
+ |
|
302 |
+#' @export |
|
303 |
+#' @importFrom Matrix crossprod |
|
304 |
+#' @importFrom DelayedArray seed DelayedArray |
|
305 |
+setMethod("crossprod", c("ScaledMatrix", "missing"), function(x, y) { |
|
306 |
+ x_seed <- seed(x) |
|
307 |
+ if (is_transposed(x_seed)) { |
|
308 |
+ # No need to t(), the output is symmetric anyway. |
|
309 |
+ out <- .tcp_ScaledMatrix(x_seed) |
|
310 |
+ } else { |
|
311 |
+ out <- .cross_ScaledMatrix(x_seed) |
|
312 |
+ } |
|
313 |
+ |
|
314 |
+ DelayedArray(out) |
|
315 |
+}) |
|
316 |
+ |
|
317 |
+#' @importFrom Matrix crossprod colSums |
|
318 |
+#' @importFrom DelayedArray sweep |
|
319 |
+.cross_ScaledMatrix <- function(x_seed) { |
|
320 |
+ x0 <- get_matrix2(x_seed) |
|
321 |
+ out <- as.matrix(crossprod(x0)) |
|
322 |
+ |
|
323 |
+ if (use_center(x_seed)) { |
|
324 |
+ centering <- get_center(x_seed) |
|
325 |
+ colsums <- colSums(x0) |
|
326 |
+ |
|
327 |
+ # Minus, then add, then minus, to mitigate cancellation. |
|
328 |
+ out <- out - outer(centering, colsums) |
|
329 |
+ out <- out + outer(centering, centering) * nrow(x0) |
|
330 |
+ out <- out - outer(colsums, centering) |
|
331 |
+ } |
|
332 |
+ |
|
333 |
+ if (use_scale(x_seed)) { |
|
334 |
+ scaling <- get_scale(x_seed) |
|
335 |
+ out <- sweep(out / scaling, 2, scaling, "/", check.margin=FALSE) |
|
336 |
+ } |
|
337 |
+ out |
|
338 |
+} |
|
339 |
+ |
|
340 |
+#' @export |
|
341 |
+#' @importFrom Matrix crossprod |
|
342 |
+#' @importFrom DelayedArray seed DelayedArray |
|
343 |
+setMethod("crossprod", c("ScaledMatrix", "ANY"), function(x, y) { |
|
344 |
+ x_seed <- seed(x) |
|
345 |
+ if (is_transposed(x_seed)) { |
|
346 |
+ out <- .rightmult_ScaledMatrix(x_seed, y) |
|
347 |
+ } else { |
|
348 |
+ out <- .rightcross_ScaledMatrix(x_seed, y) |
|
349 |
+ } |
|
350 |
+ DelayedArray(out) |
|
351 |
+}) |
|
352 |
+ |
|
353 |
+#' @importFrom Matrix crossprod colSums |
|
354 |
+.rightcross_ScaledMatrix <- function(x_seed, y) { |
|
355 |
+ out <- as.matrix(crossprod(get_matrix2(x_seed), y)) |
|
356 |
+ |
|
357 |
+ if (use_center(x_seed)) { |
|
358 |
+ if (is.null(dim(y))) { |
|
359 |
+ out <- out - get_center(x_seed) * sum(y) |
|
360 |
+ } else { |
|
361 |
+ out <- out - outer(get_center(x_seed), colSums(y)) |
|
362 |
+ } |
|
363 |
+ } |
|
364 |
+ |
|
365 |
+ if (use_scale(x_seed)) { |
|
366 |
+ out <- out / get_scale(x_seed) |
|
367 |
+ } |
|
368 |
+ |
|
369 |
+ out |
|
370 |
+} |
|
371 |
+ |
|
372 |
+#' @export |
|
373 |
+#' @importFrom Matrix crossprod |
|
374 |
+#' @importFrom DelayedArray seed DelayedArray |
|
375 |
+setMethod("crossprod", c("ANY", "ScaledMatrix"), function(x, y) { |
|
376 |
+ y_seed <- seed(y) |
|
377 |
+ if (is_transposed(y_seed)) { |
|
378 |
+ out <- t(.rightmult_ScaledMatrix(y_seed, x)) |
|
379 |
+ } else { |
|
380 |
+ out <- .leftcross_ScaledMatrix(x, y_seed) |
|
381 |
+ } |
|
382 |
+ DelayedArray(out) |
|
383 |
+}) |
|
384 |
+ |
|
385 |
+#' @importFrom Matrix crossprod colSums |
|
386 |
+#' @importFrom DelayedArray sweep |
|
387 |
+.leftcross_ScaledMatrix <- function(x, y_seed) { |
|
388 |
+ out <- as.matrix(crossprod(x, get_matrix2(y_seed))) |
|
389 |
+ |
|
390 |
+ if (use_center(y_seed)) { |
|
391 |
+ if (is.null(dim(x))) { |
|
392 |
+ out <- sweep(out, 2, sum(x) * get_center(y_seed), "-", check.margin=FALSE) |
|
393 |
+ } else { |
|
394 |
+ out <- out - outer(colSums(x), get_center(y_seed)) |
|
395 |
+ } |
|
396 |
+ } |
|
397 |
+ |
|
398 |
+ if (use_scale(y_seed)) { |
|
399 |
+ out <- sweep(out, 2, get_scale(y_seed), "/", check.margin=FALSE) |
|
400 |
+ } |
|
401 |
+ |
|
402 |
+ out |
|
403 |
+} |
|
404 |
+ |
|
405 |
+#' @export |
|
406 |
+#' @importFrom Matrix crossprod |
|
407 |
+#' @importFrom DelayedArray DelayedArray seed |
|
408 |
+setMethod("crossprod", c("ScaledMatrix", "ScaledMatrix"), function(x, y) { |
|
409 |
+ x_seed <- seed(x) |
|
410 |
+ y_seed <- seed(y) |
|
411 |
+ res <- .dual_mult_dispatcher(x_seed, y_seed, !is_transposed(x_seed), is_transposed(y_seed)) |
|
412 |
+ DelayedArray(res) |
|
413 |
+}) |
|
414 |
+ |
|
415 |
+################################### |
|
416 |
+# Transposed cross-product. |
|
417 |
+ |
|
418 |
+# Technically, we could implement this in terms of '%*%', |
|
419 |
+# but we use specializations to exploit native tcrossprod() for '.matrix', |
|
420 |
+# which is probably more efficient. |
|
421 |
+ |
|
422 |
+#' @export |
|
423 |
+#' @importFrom Matrix tcrossprod |
|
424 |
+#' @importFrom DelayedArray seed DelayedArray sweep |
|
425 |
+setMethod("tcrossprod", c("ScaledMatrix", "missing"), function(x, y) { |
|
426 |
+ x_seed <- seed(x) |
|
427 |
+ if (is_transposed(x_seed)) { |
|
428 |
+ out <- .cross_ScaledMatrix(x_seed) |
|
429 |
+ } else { |
|
430 |
+ out <- .tcp_ScaledMatrix(x_seed) |
|
431 |
+ } |
|
432 |
+ DelayedArray(out) |
|
433 |
+}) |
|
434 |
+ |
|
435 |
+#' @importFrom Matrix tcrossprod |
|
436 |
+.tcp_ScaledMatrix <- function(x_seed) { |
|
437 |
+ x0 <- get_matrix2(x_seed) |
|
438 |
+ |
|
439 |
+ if (use_scale(x_seed)) { |
|
440 |
+ out <- as.matrix(.internal_tcrossprod(x0, get_scale(x_seed))) |
|
441 |
+ } else { |
|
442 |
+ out <- as.matrix(tcrossprod(x0)) |
|
443 |
+ } |
|
444 |
+ |
|
445 |
+ if (use_center(x_seed)) { |
|
446 |
+ centering <- get_center(x_seed) |
|
447 |
+ |
|
448 |
+ if (use_scale(x_seed)) { |
|
449 |
+ centering <- centering / get_scale(x_seed) |
|
450 |
+ extra <- centering / get_scale(x_seed) |
|
451 |
+ } else { |
|
452 |
+ extra <- centering |
|
453 |
+ } |
|
454 |
+ |
|
455 |
+ # With scaling, the use of 'extra' mimics sweep(x0, 2, get_scale(x), "/"), |
|
456 |
+ # except that the scaling is applied to 'centering' rather than directly to 'x0'. |
|
457 |
+ # Without scaling, 'extra' and 'centering' are interchangeable. |
|
458 |
+ component <- tcrossprod(extra, x0) |
|
459 |
+ |
|
460 |
+ # Minus, then add, then minus, to mitigate cancellation. |
|
461 |
+ out <- sweep(out, 2, as.numeric(component), "-", check.margin=FALSE) |
|
462 |
+ out <- out + sum(centering^2) |
|
463 |
+ out <- out - as.numeric(x0 %*% extra) |
|
464 |
+ } |
|
465 |
+ |
|
466 |
+ out |
|
467 |
+} |
|
468 |
+ |
|
469 |
+#' @export |
|
470 |
+#' @importFrom Matrix tcrossprod t |
|
471 |
+#' @importFrom DelayedArray seed DelayedArray sweep |
|
472 |
+setMethod("tcrossprod", c("ScaledMatrix", "ANY"), function(x, y) { |
|
473 |
+ if (is.null(dim(y))) { # for consistency with base::tcrossprod. |
|
474 |
+ stop("non-conformable arguments") |
|
475 |
+ } |
|
476 |
+ |
|
477 |
+ x_seed <- seed(x) |
|
478 |
+ if (is_transposed(x_seed)) { |
|
479 |
+ out <- t(.leftmult_ScaledMatrix(y, x_seed)) |
|
480 |
+ } else { |
|
481 |
+ out <- .righttcp_ScaledMatrix(x_seed, y) |
|
482 |
+ } |
|
483 |
+ DelayedArray(out) |
|
484 |
+}) |
|
485 |
+ |
|
486 |
+#' @importFrom Matrix tcrossprod |
|
487 |
+.righttcp_ScaledMatrix <- function(x_seed, y) { |
|
488 |
+ if (use_scale(x_seed)) { |
|
489 |
+ # 'y' cannot be a vector anymore, due to the check above. |
|
490 |
+ y <- sweep(y, 2, get_scale(x_seed), "/", check.margin=FALSE) |
|
491 |
+ } |
|
492 |
+ |
|
493 |
+ out <- as.matrix(tcrossprod(get_matrix2(x_seed), y)) |
|
494 |
+ |
|
495 |
+ if (use_center(x_seed)) { |
|
496 |
+ out <- sweep(out, 2, as.numeric(tcrossprod(get_center(x_seed), y)), "-", check.margin=FALSE) |
|
497 |
+ } |
|
498 |
+ |
|
499 |
+ out |
|
500 |
+} |
|
501 |
+ |
|
502 |
+#' @export |
|
503 |
+#' @importFrom Matrix tcrossprod t |
|
504 |
+#' @importFrom DelayedArray seed DelayedArray |
|
505 |
+setMethod("tcrossprod", c("ANY", "ScaledMatrix"), function(x, y) { |
|
506 |
+ y_seed <- seed(y) |
|
507 |
+ if (is_transposed(y_seed)) { |
|
508 |
+ out <- .leftmult_ScaledMatrix(x, y_seed) |
|
509 |
+ } else { |
|
510 |
+ out <- .lefttcp_ScaledMatrix(x, y_seed) |
|
511 |
+ } |
|
512 |
+ DelayedArray(out) |
|
513 |
+}) |
|
514 |
+ |
|
515 |
+#' @importFrom Matrix tcrossprod |
|
516 |
+.lefttcp_ScaledMatrix <- function(x, y_seed) { |
|
517 |
+ if (use_scale(y_seed)) { |
|
518 |
+ if (is.null(dim(x))) { |
|
519 |
+ x <- x / get_scale(y_seed) |
|
520 |
+ } else { |
|
521 |
+ x <- sweep(x, 2, get_scale(y_seed), "/", check.margin=FALSE) |
|
522 |
+ } |
|
523 |
+ } |
|
524 |
+ |
|
525 |
+ out <- as.matrix(tcrossprod(x, get_matrix2(y_seed))) |
|
526 |
+ |
|
527 |
+ if (use_center(y_seed)) { |
|
528 |
+ out <- out - as.numeric(x %*% get_center(y_seed)) |
|
529 |
+ } |
|
530 |
+ out |
|
531 |
+} |
|
532 |
+ |
|
533 |
+#' @export |
|
534 |
+#' @importFrom Matrix tcrossprod |
|
535 |
+#' @importFrom DelayedArray DelayedArray seed |
|
536 |
+setMethod("tcrossprod", c("ScaledMatrix", "ScaledMatrix"), function(x, y) { |
|
537 |
+ x_seed <- seed(x) |
|
538 |
+ y_seed <- seed(y) |
|
539 |
+ res <- .dual_mult_dispatcher(x_seed, y_seed, is_transposed(x_seed), !is_transposed(y_seed)) |
|
540 |
+ DelayedArray(res) |
|
541 |
+}) |
|
542 |
+ |
|
543 |
+################################### |
|
544 |
+# Extra code for corner-case calculations of the transposed cross-product. |
|
545 |
+ |
|
546 |
+#' @importFrom DelayedArray seed DelayedArray |
|
547 |
+.update_scale <- function(x, s) { |
|
548 |
+ x_seed <- seed(x) |
|
549 |
+ if (use_scale(x_seed)) { |
|
550 |
+ s <- s * get_scale(x_seed) |
|
551 |
+ } |
|
552 |
+ x_seed@scale <- s |
|
553 |
+ x_seed@use_scale <- TRUE |
|
554 |
+ DelayedArray(x_seed) |
|
555 |
+} |
|
556 |
+ |
|
557 |
+#' @importFrom Matrix tcrossprod |
|
558 |
+#' @importFrom methods is |
|
559 |
+#' @importFrom DelayedArray seed |
|
560 |
+.internal_tcrossprod <- function(x, scale.) |
|
561 |
+# Computes tcrossprod(sweep(x, 2, scale, "/")) when 'x' is a matrix-like object. |
|
562 |
+# 'scale' can be assumed to be non-NULL here. |
|
563 |
+# This will always return a dense ordinary matrix. |
|
564 |
+{ |
|
565 |
+ if (!is(x, "ScaledMatrix")) { |
|
566 |
+ x <- sweep(x, 2, scale., "/", check.margin=FALSE) |
|
567 |
+ return(as.matrix(tcrossprod(x))) |
|
568 |
+ } |
|
569 |
+ |
|
570 |
+ x_seed <- seed(x) |
|
571 |
+ if (!is_transposed(x_seed)) { |
|
572 |
+ x <- .update_scale(x, scale.) |
|
573 |
+ return(as.matrix(tcrossprod(x))) |
|
574 |
+ } |
|
575 |
+ |
|
576 |
+ inner <- get_matrix2(x_seed) |
|
577 |
+ if (is(inner, "ScaledMatrix")) { |
|
578 |
+ if (is_transposed(seed(inner))) { |
|
579 |
+ component1 <- as.matrix(crossprod(.update_scale(inner, scale.))) |
|
580 |
+ } else { |
|
581 |
+ component1 <- .internal_tcrossprod(t(inner), scale.) # recurses. |
|
582 |
+ } |
|
583 |
+ } else { |
|
584 |
+ component1 <- as.matrix(crossprod(inner/scale.)) |
|
585 |
+ } |
|
586 |
+ |
|
587 |
+ if (use_center(x_seed)) { |
|
588 |
+ centering <- get_center(x_seed) |
|
589 |
+ component2 <- .internal_mult_special(centering, scale., inner) |
|
590 |
+ component3 <- t(component2) |
|
591 |
+ component4 <- outer(centering, centering) * sum(1/scale.^2) |
|
592 |
+ final <- (component1 - component2) + (component4 - component3) |
|
593 |
+ } else { |
|
594 |
+ final <- component1 |
|
595 |
+ } |
|
596 |
+ |
|
597 |
+ if (use_scale(x_seed)) { |
|
598 |
+ x.scale <- get_scale(x_seed) |
|
599 |
+ final <- final / x.scale |
|
600 |
+ final <- sweep(final, 2, x.scale, "/", check.margin=FALSE) |
|
601 |
+ } |
|
602 |
+ |
|
603 |
+ final |
|
604 |
+} |
|
605 |
+ |
|
606 |
+#' @importFrom methods is |
|
607 |
+#' @importFrom DelayedArray seed |
|
608 |
+.internal_mult_special <- function(center, scale., Z) |
|
609 |
+# Computes C^T * S^2 * Z where C is a matrix of 'centers' copied byrow=TRUE; |
|
610 |
+# S is a diagonal matrix filled with '1/scale'; and 'Z' is a ScaledMatrix. |
|
611 |
+# This will always return a dense ordinary matrix. |
|
612 |
+{ |
|
613 |
+ if (!is(Z, "ScaledMatrix")) { |
|
614 |
+ return(outer(center, colSums(Z / scale.^2))) |
|
615 |
+ } |
|
616 |
+ |
|
617 |
+ Z_seed <- seed(Z) |
|
618 |
+ if (is_transposed(Z_seed)) { |
|
619 |
+ Z <- .update_scale(Z, scale.^2) |
|
620 |
+ return(outer(center, colSums(Z))) |
|
621 |
+ } |
|
622 |
+ |
|
623 |
+ output <- .internal_mult_special(center, scale., get_matrix2(Z_seed)) # recurses. |
|
624 |
+ |
|
625 |
+ if (use_center(Z_seed)) { |
|
626 |
+ output <- output - outer(center, get_center(Z_seed)) * sum(1/scale.^2) |
|
627 |
+ } |
|
628 |
+ |
|
629 |
+ if (use_scale(Z_seed)) { |
|
630 |
+ output <- sweep(output, 2, get_scale(Z_seed), "/") |
|
631 |
+ } |
|
632 |
+ |
|
633 |
+ output |
|
634 |
+} |
0 | 635 |
new file mode 100644 |
... | ... |
@@ -0,0 +1,112 @@ |
1 |
+% Generated by roxygen2: do not edit by hand |
|
2 |
+% Please edit documentation in R/ScaledMatrix.R |
|
3 |
+\docType{class} |
|
4 |
+\name{ScaledMatrix} |
|
5 |
+\alias{ScaledMatrix} |
|
6 |
+\alias{ScaledMatrixSeed} |
|
7 |
+\alias{ScaledMatrixSeed-class} |
|
8 |
+\alias{dim,ScaledMatrixSeed-method} |
|
9 |
+\alias{dimnames,ScaledMatrixSeed-method} |
|
10 |
+\alias{extract_array,ScaledMatrixSeed-method} |
|
11 |
+\alias{DelayedArray,ScaledMatrixSeed-method} |
|
12 |
+\alias{show,ScaledMatrixSeed-method} |
|
13 |
+\alias{ScaledMatrix-class} |
|
14 |
+\alias{dimnames<-,ScaledMatrix,ANY-method} |
|
15 |
+\alias{t,ScaledMatrix-method} |
|
16 |
+\alias{[,ScaledMatrix,ANY,ANY,ANY-method} |
|
17 |
+\alias{colSums,ScaledMatrix-method} |
|
18 |
+\alias{rowSums,ScaledMatrix-method} |
|
19 |
+\alias{colMeans,ScaledMatrix-method} |
|
20 |
+\alias{rowMeans,ScaledMatrix-method} |
|
21 |
+\alias{\%*\%,ANY,ScaledMatrix-method} |
|
22 |
+\alias{\%*\%,ScaledMatrix,ANY-method} |
|
23 |
+\alias{\%*\%,ScaledMatrix,ScaledMatrix-method} |
|
24 |
+\alias{crossprod,ScaledMatrix,missing-method} |
|
25 |
+\alias{crossprod,ScaledMatrix,ANY-method} |
|
26 |
+\alias{crossprod,ANY,ScaledMatrix-method} |
|
27 |
+\alias{crossprod,ScaledMatrix,ScaledMatrix-method} |
|
28 |
+\alias{tcrossprod,ScaledMatrix,missing-method} |
|
29 |
+\alias{tcrossprod,ScaledMatrix,ANY-method} |
|
30 |
+\alias{tcrossprod,ANY,ScaledMatrix-method} |
|
31 |
+\alias{tcrossprod,ScaledMatrix,ScaledMatrix-method} |
|
32 |
+\title{The ScaledMatrix class} |
|
33 |
+\usage{ |
|
34 |
+ScaledMatrix(x, center = NULL, scale = NULL) |
|
35 |
+} |
|
36 |
+\arguments{ |
|
37 |
+\item{x}{A matrix or any matrix-like object (e.g., from the \pkg{Matrix} package). |
|
38 |
+ |
|
39 |
+This can alternatively be a ScaledMatrixSeed, in which case any values of \code{center} and \code{scale} are ignored.} |
|
40 |
+ |
|
41 |
+\item{center}{A numeric vector of length equal to \code{ncol(x)}, where each element is to be subtracted from the corresponding column of \code{x}. |
|
42 |
+A \code{NULL} value indicates that no subtraction is to be performed. |
|
43 |
+Alternatively \code{TRUE}, in which case it is set to the column means of \code{x}.} |
|
44 |
+ |
|
45 |
+\item{scale}{A numeric vector of length equal to \code{ncol(x)}, where each element is to divided from the corresponding column of \code{x} (after subtraction). |
|
46 |
+A \code{NULL} value indicates that no division is to be performed. |
|
47 |
+Alternatively \code{TRUE}, in which case it is set to the column-wise root-mean-squared differences from \code{center} |
|
48 |
+(interpretable as standard deviations if \code{center} is set to the column means, see \code{\link{scale}} for commentary).} |
|
49 |
+} |
|
50 |
+\value{ |
|
51 |
+The \code{ScaledMatrixSeed} constructor will return a ScaledMatrixSeed object. |
|
52 |
+ |
|
53 |
+The \code{ScaledMatrix} constructor will return a ScaledMatrix object equivalent to \code{t((t(x) - center)/scale)}. |
|
54 |
+} |
|
55 |
+\description{ |
|
56 |
+Defines the ScaledMatrixSeed and ScaledMatrix classes and their associated methods. |
|
57 |
+These classes support delayed centering and scaling of the columns in the same manner as \code{\link{scale}}, |
|
58 |
+but preserving the original data structure for more efficient operations like matrix multiplication. |
|
59 |
+} |
|
60 |
+\section{Methods for ScaledMatrixSeed objects}{ |
|
61 |
+ |
|
62 |
+ScaledMatrixSeed objects are implemented as \linkS4class{DelayedMatrix} backends. |
|
63 |
+They support standard operations like \code{dim}, \code{dimnames} and \code{extract_array}. |
|
64 |
+ |
|
65 |
+Passing a ScaledMatrixSeed object to the \code{\link{DelayedArray}} constructor will create a ScaledMatrix object. |
|
66 |
+ |
|
67 |
+It is possible for \code{x} to contain a ScaledMatrix, thus nesting one ScaledMatrix inside another. |
|
68 |
+This can occasionally be useful in combination with transposition to achieve centering/scaling in both dimensions. |
|
69 |
+} |
|
70 |
+ |
|
71 |
+\section{Methods for ScaledMatrix objects}{ |
|
72 |
+ |
|
73 |
+ScaledMatrix objects are derived from \linkS4class{DelayedMatrix} objects and support all of valid operations on the latter. |
|
74 |
+Several functions are specialized for greater efficiency when operating on ScaledMatrix instances, including: |
|
75 |
+\itemize{ |
|
76 |
+ \item Subsetting, transposition and replacement of row/column names. |
|
77 |
+ These will return a new ScaledMatrix rather than a DelayedMatrix. |
|
78 |
+ \item Matrix multiplication via \code{\%*\%}, \code{crossprod} and \code{tcrossprod}. |
|
79 |
+ These functions will return a DelayedMatrix. |
|
80 |
+ \item Calculation of row and column sums and means by \code{colSums}, \code{rowSums}, etc. |
|
81 |
+} |
|
82 |
+ |
|
83 |
+All other operations applied to a ScaledMatrix will use the underlying \pkg{DelayedArray} machinery. |
|
84 |
+Unary or binary operations will generally create a new DelayedMatrix instance containing a ScaledMatrixSeed. |
|
85 |
+ |
|
86 |
+Tranposition can effectively be used to allow centering/scaling on the rows if the input \code{x} is transposed. |
|
87 |
+} |
|
88 |
+ |
|
89 |
+\section{Efficiency vs precision}{ |
|
90 |
+ |
|
91 |
+The raison d'etre of the ScaledMatrix is that it can offer faster matrix multiplication by avoiding the \pkg{DelayedArray} block processing. |
|
92 |
+This is done by refactoring the scaling/centering operations to use the (hopefully more efficient) multiplication operator of the original matrix \code{x}. |
|
93 |
+Unfortunately, the speed-up comes at the cost of increasing the risk of catastrophic cancellation. |
|
94 |
+The procedure requires subtraction of one large intermediate number from another to obtain the values of the final matrix product. |
|
95 |
+This could result in a loss of numerical precision that compromises the accuracy of downstream algorithms. |
|
96 |
+In practice, this does not seem to be a major concern though one should be careful if the input \code{x} contains very large positive/negative values. |
|
97 |
+} |
|
98 |
+ |
|
99 |
+\examples{ |
|
100 |
+library(Matrix) |
|
101 |
+y <- ScaledMatrix(rsparsematrix(10, 20, 0.1), |
|
102 |
+ center=rnorm(20), scale=1+runif(20)) |
|
103 |
+y |
|
104 |
+ |
|
105 |
+crossprod(y) |
|
106 |
+tcrossprod(y) |
|
107 |
+y \%*\% rnorm(20) |
|
108 |
+ |
|
109 |
+} |
|
110 |
+\author{ |
|
111 |
+Aaron Lun |
|
112 |
+} |
0 | 4 |
new file mode 100644 |
... | ... |
@@ -0,0 +1,76 @@ |
1 |
+scale_and_center <- function(y, ref, code) { |
|
2 |
+ center <- scale <- NULL |
|
3 |
+ |
|
4 |
+ if (code==1L) { |
|
5 |
+ center <- colMeans(ref) |
|
6 |
+ scale <- runif(ncol(ref)) |
|
7 |
+ ref <- scale(ref, center=center, scale=scale) |
|
8 |
+ } else if (code==2L) { |
|
9 |
+ center <- rnorm(ncol(ref)) |
|
10 |
+ ref <- scale(ref, center=center, scale=FALSE) |
|
11 |
+ } else if (code==3L) { |
|
12 |
+ scale <- runif(ncol(ref)) |
|
13 |
+ ref <- scale(ref, center=FALSE, scale=scale) |
|
14 |
+ } |
|
15 |
+ |
|
16 |
+ # Getting rid of excess attributes. |
|
17 |
+ attr(ref, "scaled:center") <- NULL |
|
18 |
+ attr(ref, "scaled:scale") <- NULL |
|
19 |
+ |
|
20 |
+ def <- ScaledMatrix(y, center=center, scale=scale) |
|
21 |
+ list(def=def, ref=ref) |
|
22 |
+} |
|
23 |
+ |
|
24 |
+spawn_scenarios_basic <- function(NR, NC, CREATOR, REALIZER) { |
|
25 |
+ output <- vector("list", 8) |
|
26 |
+ counter <- 1L |
|
27 |
+ |
|
28 |
+ for (trans in c(FALSE, TRUE)) { |
|
29 |
+ for (it in 1:4) { |
|
30 |
+ if (trans) { |
|
31 |
+ # Ensure output matrix has NR rows and NC columns after t(). |
|
32 |
+ y <- CREATOR(NC, NR) |
|
33 |
+ } else { |
|
34 |
+ y <- CREATOR(NR, NC) |
|
35 |
+ } |
|
36 |
+ ref <- REALIZER(y) |
|
37 |
+ |
|
38 |
+ adjusted <- scale_and_center(y, ref, it) |
|
39 |
+ if (trans) { |
|
40 |
+ adjusted$def <- t(adjusted$def) |
|
41 |
+ adjusted$ref <- t(adjusted$ref) |
|
42 |
+ } |
|
43 |
+ |
|
44 |
+ output[[counter]] <- adjusted |
|
45 |
+ counter <- counter+1L |
|
46 |
+ } |
|
47 |
+ } |
|
48 |
+ output |
|
49 |
+} |
|
50 |
+ |
|
51 |
+spawn_scenarios <- function(NR=50, NC=20) { |
|
52 |
+ c( |
|
53 |
+ spawn_scenarios_basic(NR, NC, |
|
54 |
+ CREATOR=function(r, c) { |
|
55 |
+ matrix(rnorm(r*c), ncol=c) |
|
56 |
+ }, |
|
57 |
+ REALIZER=identity |
|
58 |
+ ), |
|
59 |
+ spawn_scenarios_basic(NR, NC, |
|
60 |
+ CREATOR=function(r, c) { |
|
61 |
+ Matrix::rsparsematrix(r, c, 0.1) |
|
62 |
+ }, |
|
63 |
+ REALIZER=as.matrix |
|
64 |
+ ) |
|
65 |
+ ) |
|
66 |
+} |
|
67 |
+ |
|
68 |
+expect_equal_product <- function(x, y) { |
|
69 |
+ expect_s4_class(x, "DelayedMatrix") |
|
70 |
+ X <- as.matrix(x) |
|
71 |
+ |
|
72 |
+ # standardize NULL dimnames. |
|
73 |
+ if (all(lengths(dimnames(X))==0L)) dimnames(X) <- NULL |
|
74 |
+ if (all(lengths(dimnames(y))==0L)) dimnames(y) <- NULL |
|
75 |
+ expect_equal(X, y) |
|
76 |
+} |
0 | 77 |
new file mode 100644 |
... | ... |
@@ -0,0 +1,101 @@ |
1 |
+# Tests the ScaledMatrix implementation. |
|
2 |
+# library(testthat); library(ScaledMatrix); source("setup.R"); source("test-class.R") |
|
3 |
+ |
|
4 |
+set.seed(100001) |
|
5 |
+test_that("ScaledMatrix utility functions work as expected", { |
|
6 |
+ possibles <- spawn_scenarios() |
|
7 |
+ for (test in possibles) { |
|
8 |
+ expect_s4_class(test$def, "ScaledMatrix") |
|
9 |
+ expect_identical(test$def, ScaledMatrix(DelayedArray::seed(test$def))) |
|
10 |
+ |
|
11 |
+ expect_identical(dim(test$def), dim(test$ref)) |
|
12 |
+ expect_identical(extract_array(test$def, list(1:10, 1:10)), test$ref[1:10, 1:10]) |
|
13 |
+ expect_identical(extract_array(test$def, list(1:10, NULL)), test$ref[1:10,]) |
|
14 |
+ expect_identical(extract_array(test$def, list(NULL, 1:10)), test$ref[,1:10]) |
|
15 |
+ expect_identical(as.matrix(test$def), test$ref) |
|
16 |
+ |
|
17 |
+ expect_equal(rowSums(test$def), rowSums(test$ref)) |
|
18 |
+ expect_equal(colSums(test$def), colSums(test$ref)) |
|
19 |
+ expect_equal(rowMeans(test$def), rowMeans(test$ref)) |
|
20 |
+ expect_equal(colMeans(test$def), colMeans(test$ref)) |
|
21 |
+ |
|
22 |
+ tdef <- t(test$def) |
|
23 |
+ expect_s4_class(tdef, "ScaledMatrix") # still a DefMat! |
|
24 |
+ expect_identical(t(tdef), test$def) |
|
25 |
+ expect_identical(as.matrix(tdef), t(test$ref)) |
|
26 |
+ |
|
27 |
+ # Checking column names getting and setting. |
|
28 |
+ spawn_names <- sprintf("THING_%i", seq_len(ncol(test$def))) |
|
29 |
+ colnames(test$def) <- spawn_names |
|
30 |
+ expect_identical(spawn_names, colnames(test$def)) |
|
31 |
+ expect_s4_class(test$def, "ScaledMatrix") # still a DefMat! |
|
32 |
+ } |
|
33 |
+}) |
|
34 |
+ |
|
35 |
+set.seed(10000101) |
|
36 |
+test_that("ScaledMatrix silly inputs work as expected", { |
|
37 |
+ default <- ScaledMatrix() |
|
38 |
+ expect_identical(dim(default), c(0L, 0L)) |
|
39 |
+ val <- as.matrix(default) |
|
40 |
+ dimnames(val) <- NULL |
|
41 |
+ expect_identical(val, matrix(0,0,0)) |
|
42 |
+ |
|
43 |
+ # Checking erronious inputs. |
|
44 |
+ y <- matrix(rnorm(400), ncol=20) |
|
45 |
+ expect_error(ScaledMatrix(y, center=1), "length of 'center' must equal") |
|
46 |
+ expect_error(ScaledMatrix(y, scale=1), "length of 'scale' must equal") |
|
47 |
+}) |
|
48 |
+ |
|
49 |
+set.seed(1000011) |
|
50 |
+test_that("ScaledMatrix subsetting works as expected", { |
|
51 |
+ expect_identical_and_defmat <- function(x, y) { |
|
52 |
+ expect_s4_class(x, "ScaledMatrix") # class is correctly preserved by direct seed modification. |
|
53 |
+ expect_identical(as.matrix(x), y) |
|
54 |
+ } |
|
55 |
+ |
|
56 |
+ possibles <- spawn_scenarios() |
|
57 |
+ for (test in possibles) { |
|
58 |
+ i <- sample(nrow(test$def)) |
|
59 |
+ j <- sample(ncol(test$def)) |
|
60 |
+ expect_identical_and_defmat(test$def[i,], test$ref[i,]) |
|
61 |
+ expect_identical_and_defmat(test$def[,j], test$ref[,j]) |
|
62 |
+ expect_identical_and_defmat(test$def[i,j], test$ref[i,j]) |
|
63 |
+ |
|
64 |
+ # Works with zero dimensions. |
|
65 |
+ expect_identical_and_defmat(test$def[0,], test$ref[0,]) |
|
66 |
+ expect_identical_and_defmat(test$def[,0], test$ref[,0]) |
|
67 |
+ expect_identical_and_defmat(test$def[0,0], test$ref[0,0]) |
|
68 |
+ |
|
69 |
+ # Dimension dropping works as expected. |
|
70 |
+ expect_identical(test$def[i[1],], test$ref[i[1],]) |
|
71 |
+ expect_identical(test$def[,j[1]], test$ref[,j[1]]) |
|
72 |
+ expect_identical_and_defmat(test$def[i[1],drop=FALSE], test$ref[i[1],,drop=FALSE]) |
|
73 |
+ expect_identical_and_defmat(test$def[,j[1],drop=FALSE], test$ref[,j[1],drop=FALSE]) |
|
74 |
+ |
|
75 |
+ # Transposition with subsetting works as expected. |
|
76 |
+ alt <- t(test$def) |
|
77 |
+ expect_identical(t(alt[,i]), test$def[i,]) |
|
78 |
+ expect_identical(t(alt[j,]), test$def[,j]) |
|
79 |
+ |
|
80 |
+ # Subsetting behaves with column names. |
|
81 |
+ spawn_names <- sprintf("THING_%i", seq_len(ncol(test$def))) |
|
82 |
+ colnames(test$def) <- spawn_names |
|
83 |
+ colnames(test$ref) <- spawn_names |
|
84 |
+ ch <- sample(spawn_names) |
|
85 |
+ expect_identical_and_defmat(test$def[,ch], test$ref[,ch]) |
|
86 |
+ } |
|
87 |
+}) |
|
88 |
+ |
|
89 |
+test_that("DelayedMatrix wrapping works", { |
|
90 |
+ possibles <- spawn_scenarios(80, 50) |
|
91 |
+ for (test in possibles) { |
|
92 |
+ expect_equal_product(test$def+1, test$ref+1) |
|
93 |
+ |
|
94 |
+ v <- rnorm(nrow(test$def)) |
|
95 |
+ expect_equal_product(test$def+v, test$ref+v) |
|
96 |
+ expect_equal_product(test$def*v, test$ref*v) |
|
97 |
+ |
|
98 |
+ w <- rnorm(ncol(test$def)) |
|
99 |
+ expect_equal_product(sweep(test$def, 2, w, "*"), sweep(test$ref, 2, w, "*")) |
|
100 |
+ } |
|
101 |
+}) |
0 | 102 |
new file mode 100644 |
... | ... |
@@ -0,0 +1,381 @@ |
1 |
+# Tests the ScaledMatrix implementation. |
|
2 |
+# library(testthat); library(ScaledMatrix); source("setup.R"); source("test-mult.R") |
|
3 |
+ |
|
4 |
+########################## |
|
5 |
+# Defining a class that can't do anything but get multiplied. |
|
6 |
+# This checks that there isn't any hidden DelayedArray realization |
|
7 |
+# happening, which would give the same results but slower. |
|
8 |
+ |
|
9 |
+setClass("CrippledMatrix", slots=c(x="matrix")) |
|
10 |
+ |
|
11 |
+setMethod("dim", c("CrippledMatrix"), function(x) dim(x@x)) |
|
12 |
+ |
|
13 |
+setMethod("colSums", c("CrippledMatrix"), function(x) colSums(x@x)) |
|
14 |
+ |
|
15 |
+setMethod("rowSums", c("CrippledMatrix"), function(x) rowSums(x@x)) |
|
16 |
+ |
|
17 |
+setMethod("sweep", c("CrippledMatrix"), function (x, MARGIN, STATS, FUN = "-", check.margin = TRUE, ...) { |
|
18 |
+ sweep(x@x, MARGIN, STATS, FUN, check.margin, ...) |
|
19 |
+}) |
|
20 |
+ |
|
21 |
+setMethod("%*%", c("CrippledMatrix", "ANY"), function(x, y) x@x %*% y) |
|
22 |
+ |
|
23 |
+setMethod("%*%", c("ANY", "CrippledMatrix"), function(x, y) x %*% y@x) |
|
24 |
+ |
|
25 |
+setMethod("crossprod", c("CrippledMatrix", "missing"), function(x, y) crossprod(x@x)) |
|
26 |
+ |
|
27 |
+setMethod("crossprod", c("CrippledMatrix", "ANY"), function(x, y) crossprod(x@x, y)) |
|
28 |
+ |
|
29 |
+setMethod("crossprod", c("ANY", "CrippledMatrix"), function(x, y) crossprod(x, y@x)) |
|
30 |
+ |
|
31 |
+setMethod("tcrossprod", c("CrippledMatrix", "missing"), function(x, y) tcrossprod(x@x)) |
|
32 |
+ |
|
33 |
+setMethod("tcrossprod", c("CrippledMatrix", "ANY"), function(x, y) tcrossprod(x@x, y)) |
|
34 |
+ |
|
35 |
+setMethod("tcrossprod", c("ANY", "CrippledMatrix"), function(x, y) tcrossprod(x, y@x)) |
|
36 |
+ |
|
37 |
+spawn_extra_scenarios <- function(NR=50, NC=20) { |
|
38 |
+ c( |
|
39 |
+ spawn_scenarios(NR, NC), |
|
40 |
+ spawn_scenarios_basic(NR, NC, |
|
41 |
+ CREATOR=function(r, c) { |
|
42 |
+ new("CrippledMatrix", x=matrix(runif(NR*NC), ncol=NC)) |
|
43 |
+ }, |
|
44 |
+ REALIZER=function(x) x@x |
|
45 |
+ ) |
|
46 |
+ ) |
|
47 |
+} |
|
48 |
+ |
|
49 |
+########################## |
|
50 |
+ |
|
51 |
+test_that("ScaledMatrix right multiplication works as expected", { |
|
52 |
+ possibles <- spawn_extra_scenarios(100, 50) |
|
53 |
+ for (test in possibles) { |
|
54 |
+ ref.y <- test$ref |
|
55 |
+ bs.y <- test$def |
|
56 |
+ |
|
57 |
+ # Multiply by a vector. |
|
58 |
+ z <- rnorm(ncol(ref.y)) |
|
59 |
+ expect_equal_product(bs.y %*% z, ref.y %*% z) |
|
60 |
+ |
|
61 |
+ # Multiply by a matrix. |
|
62 |
+ z <- matrix(rnorm(ncol(ref.y)*10), ncol=10) |
|
63 |
+ expect_equal_product(bs.y %*% z, ref.y %*% z) |
|
64 |
+ |
|
65 |
+ # Multiply by an empty matrix. |
|
66 |
+ z <- matrix(0, ncol=0, nrow=ncol(ref.y)) |
|
67 |
+ expect_equal_product(bs.y %*% z, ref.y %*% z) |
|
68 |
+ } |
|
69 |
+}) |
|
70 |
+ |
|
71 |
+test_that("ScaledMatrix left multiplication works as expected", { |
|
72 |
+ possibles <- spawn_extra_scenarios(50, 80) |
|
73 |
+ for (test in possibles) { |
|
74 |
+ ref.y <- test$ref |
|
75 |
+ bs.y <- test$def |
|
76 |
+ |
|
77 |
+ # Multiply by a vector. |
|
78 |
+ z <- rnorm(nrow(ref.y)) |
|
79 |
+ expect_equal_product(z %*% bs.y, z %*% ref.y) |
|
80 |
+ |
|
81 |
+ # Multiply by a matrix. |
|
82 |
+ z <- matrix(rnorm(nrow(ref.y)*10), nrow=10) |
|
83 |
+ expect_equal_product(z %*% bs.y, z %*% ref.y) |
|
84 |
+ |
|
85 |
+ # Multiply by an empty matrix. |
|
86 |
+ z <- matrix(0, nrow=0, ncol=nrow(ref.y)) |
|
87 |
+ expect_equal_product(z %*% bs.y, z %*% ref.y) |
|
88 |
+ } |
|
89 |
+}) |
|
90 |
+ |
|
91 |
+test_that("ScaledMatrix dual multiplication works as expected", { |
|
92 |
+ # Not using the CrippledMatrix here; some scaling of the inner matrix is unavoidable |
|
93 |
+ # when the inner matrix is _not_ a ScaledMatrix but is being multiplied by one. |
|
94 |
+ possibles1 <- spawn_scenarios(10, 20) |
|
95 |
+ for (test1 in possibles1) { |
|
96 |
+ possibles2 <- spawn_scenarios(20, 15) |
|
97 |
+ for (test2 in possibles2) { |
|
98 |
+ |
|
99 |
+ expect_equal_product(test1$def %*% test2$def, test1$ref %*% test2$ref) |
|
100 |
+ |
|
101 |
+ # Checking that zero-dimension behaviour is as expected. |
|
102 |
+ expect_equal_product(test1$def[0,] %*% test2$def, test1$ref[0,] %*% test2$ref) |
|
103 |
+ expect_equal_product(test1$def %*% test2$def[,0], test1$ref %*% test2$ref[,0]) |
|
104 |
+ expect_equal_product(test1$def[,0] %*% test2$def[0,], test1$ref[,0] %*% test2$ref[0,]) |
|
105 |
+ expect_equal_product(test1$def[0,] %*% test2$def[,0], test1$ref[0,] %*% test2$ref[,0]) |
|
106 |
+ } |
|
107 |
+ } |
|
108 |
+}) |
|
109 |
+ |
|
110 |
+########################## |
|
111 |
+ |
|
112 |
+test_that("ScaledMatrix lonely crossproduct works as expected", { |
|
113 |
+ possibles <- spawn_extra_scenarios(90, 30) |
|
114 |
+ for (test in possibles) { |
|
115 |
+ ref.y <- test$ref |
|
116 |
+ bs.y <- test$def |
|
117 |
+ expect_equal_product(crossprod(bs.y), crossprod(ref.y)) |
|
118 |
+ } |
|
119 |
+}) |
|
120 |
+ |
|
121 |
+test_that("ScaledMatrix crossproduct from right works as expected", { |
|
122 |
+ possibles <- spawn_extra_scenarios(60, 50) |
|
123 |
+ for (test in possibles) { |
|
124 |
+ ref.y <- test$ref |
|
125 |
+ bs.y <- test$def |
|
126 |
+ |
|
127 |
+ # Multiply by a vector. |
|
128 |
+ z <- rnorm(nrow(ref.y)) |
|
129 |
+ expect_equal_product(crossprod(bs.y, z), crossprod(ref.y, z)) |
|
130 |
+ |
|
131 |
+ # Multiply by a matrix. |
|
132 |
+ z <- matrix(rnorm(nrow(ref.y)*10), ncol=10) |
|
133 |
+ expect_equal_product(crossprod(bs.y, z), crossprod(ref.y, z)) |
|
134 |
+ |
|
135 |
+ # Multiply by an empty matrix. |
|
136 |
+ z <- matrix(0, ncol=0, nrow=nrow(ref.y)) |
|
137 |
+ expect_equal_product(crossprod(bs.y, z), crossprod(ref.y, z)) |
|
138 |
+ } |
|
139 |
+}) |
|
140 |
+ |
|
141 |
+test_that("ScaledMatrix crossproduct from left works as expected", { |
|
142 |
+ possibles <- spawn_extra_scenarios(40, 100) |
|
143 |
+ for (test in possibles) { |
|
144 |
+ ref.y <- test$ref |
|
145 |
+ bs.y <- test$def |
|
146 |
+ |
|
147 |
+ # Multiply by a vector. |
|
148 |
+ z <- rnorm(nrow(ref.y)) |
|
149 |
+ expect_equal_product(crossprod(z, bs.y), crossprod(z, ref.y)) |
|
150 |
+ |
|
151 |
+ # Multiply by a matrix. |
|
152 |
+ z <- matrix(rnorm(nrow(ref.y)*10), ncol=10) |
|
153 |
+ expect_equal_product(crossprod(z, bs.y), crossprod(z, ref.y)) |
|
154 |
+ |
|
155 |
+ # Multiply by an empty matrix. |
|
156 |
+ z <- matrix(0, ncol=0, nrow=nrow(ref.y)) |
|
157 |
+ expect_equal_product(crossprod(z, bs.y), crossprod(z, ref.y)) |
|
158 |
+ } |
|
159 |
+}) |
|
160 |
+ |
|
161 |
+test_that("ScaledMatrix dual crossprod works as expected", { |
|
162 |
+ possibles1 <- spawn_scenarios(20, 50) |
|
163 |
+ for (test1 in possibles1) { |
|
164 |
+ possibles2 <- spawn_scenarios(20, 15) |
|
165 |
+ for (test2 in possibles2) { |
|
166 |
+ |
|
167 |
+ expect_equal_product(crossprod(test1$def, test2$def), crossprod(test1$ref, test2$ref)) |
|
168 |
+ |
|
169 |
+ # Checking that zero-dimension behaviour is as expected. |
|
170 |
+ expect_equal_product(crossprod(test1$def[,0], test2$def), crossprod(test1$ref[,0], test2$ref)) |
|
171 |
+ expect_equal_product(crossprod(test1$def, test2$def[,0]), crossprod(test1$ref, test2$ref[,0])) |
|
172 |
+ expect_equal_product(crossprod(test1$def[0,], test2$def[0,]), crossprod(test1$ref[0,], test2$ref[0,])) |
|
173 |
+ } |
|
174 |
+ } |
|
175 |
+}) |
|
176 |
+ |
|
177 |
+########################## |
|
178 |
+ |
|
179 |
+test_that("ScaledMatrix lonely tcrossproduct works as expected", { |
|
180 |
+ possibles <- spawn_extra_scenarios(50, 80) |
|
181 |
+ for (test in possibles) { |
|
182 |
+ ref.y <- test$ref |
|
183 |
+ bs.y <- test$def |
|
184 |
+ expect_equal_product(tcrossprod(bs.y), tcrossprod(ref.y)) |
|
185 |
+ } |
|
186 |
+}) |
|
187 |
+ |
|
188 |
+test_that("ScaledMatrix tcrossproduct from right works as expected", { |
|
189 |
+ possibles <- spawn_extra_scenarios(60, 70) |
|
190 |
+ for (test in possibles) { |
|
191 |
+ ref.y <- test$ref |
|
192 |
+ bs.y <- test$def |
|
193 |
+ |
|
194 |
+ # Multiply by a vector (this doesn't work). |
|
195 |
+ z <- rnorm(ncol(ref.y)) |
|
196 |
+ expect_error(tcrossprod(bs.y, z), "non-conformable") |
|
197 |
+ expect_error(tcrossprod(ref.y, z), "non-conformable") |
|
198 |
+ |
|
199 |
+ # Multiply by a matrix. |
|
200 |
+ z <- matrix(rnorm(ncol(ref.y)*10), nrow=10) |
|
201 |
+ expect_equal_product(tcrossprod(bs.y, z), tcrossprod(ref.y, z)) |
|
202 |
+ |
|
203 |
+ # Multiply by an empty matrix. |
|
204 |
+ z <- matrix(0, nrow=0, ncol=ncol(ref.y)) |
|
205 |
+ expect_equal_product(tcrossprod(bs.y, z), tcrossprod(ref.y, z)) |
|
206 |
+ } |
|
207 |
+}) |
|
208 |
+ |
|
209 |
+test_that("ScaledMatrix tcrossproduct from left works as expected", { |
|
210 |
+ possibles <- spawn_extra_scenarios(80, 50) |
|
211 |
+ for (test in possibles) { |
|
212 |
+ ref.y <- test$ref |
|
213 |
+ bs.y <- test$def |
|
214 |
+ |
|
215 |
+ # Multiply by a vector. |
|
216 |
+ z <- rnorm(ncol(ref.y)) |
|
217 |
+ expect_equal_product(tcrossprod(z, bs.y), tcrossprod(z, ref.y)) |
|
218 |
+ |
|
219 |
+ # Multiply by a matrix. |
|
220 |
+ z <- matrix(rnorm(ncol(ref.y)*10), nrow=10) |
|
221 |
+ expect_equal_product(tcrossprod(z, bs.y), tcrossprod(z, ref.y)) |
|
222 |
+ |
|
223 |
+ # Multiply by an empty matrix. |
|
224 |
+ z <- matrix(0, nrow=0, ncol=ncol(ref.y)) |
|
225 |
+ expect_equal_product(tcrossprod(z, bs.y), tcrossprod(z, ref.y)) |
|
226 |
+ } |
|
227 |
+}) |
|
228 |
+ |
|
229 |
+test_that("ScaledMatrix dual tcrossprod works as expected", { |
|
230 |
+ possibles1 <- spawn_scenarios(20, 50) |
|
231 |
+ for (test1 in possibles1) { |
|
232 |
+ possibles2 <- spawn_scenarios(25, 50) |
|
233 |
+ for (test2 in possibles2) { |
|
234 |
+ |
|
235 |
+ expect_equal_product(tcrossprod(test1$def, test2$def), tcrossprod(test1$ref, test2$ref)) |
|
236 |
+ |
|
237 |
+ # Checking that zero-dimension behaviour is as expected. |
|
238 |
+ expect_equal_product(tcrossprod(test1$def[0,], test2$def), tcrossprod(test1$ref[0,], test2$ref)) |
|
239 |
+ expect_equal_product(tcrossprod(test1$def, test2$def[0,]), tcrossprod(test1$ref, test2$ref[0,])) |
|
240 |
+ expect_equal_product(tcrossprod(test1$def[,0], test2$def[,0]), tcrossprod(test1$ref[,0], test2$ref[,0])) |
|
241 |
+ } |
|
242 |
+ } |
|
243 |
+}) |
|
244 |
+ |
|
245 |
+########################## |
|
246 |
+ |
|
247 |
+wrap_in_ScMat <- function(input, reference) |
|
248 |
+# Wrapping an input matrix in a ScaledMatrix. |
|
249 |
+{ |
|
250 |
+ output <- vector("list", 8) |
|
251 |
+ counter <- 1L |
|
252 |
+ |
|
253 |
+ for (trans in c(FALSE, TRUE)) { |
|
254 |
+ for (it in 1:4) { |
|
255 |
+ if (trans) { |
|
256 |
+ y <- t(input) |
|
257 |
+ ref <- t(reference) |
|
258 |
+ } else { |
|
259 |
+ ref <- reference |
|
260 |
+ y <- input |
|
261 |
+ } |
|
262 |
+ |
|
263 |
+ adjusted <- scale_and_center(y, ref, it) |
|
264 |
+ if (trans) { |
|
265 |
+ adjusted$def <- t(adjusted$def) |
|
266 |
+ adjusted$ref <- t(adjusted$ref) |
|
267 |
+ } |
|
268 |
+ |
|
269 |
+ output[[counter]] <- adjusted |
|
270 |
+ counter <- counter+1L |
|
271 |
+ } |
|
272 |
+ } |
|
273 |
+ output |
|
274 |
+} |
|
275 |
+ |
|
276 |
+test_that("nested ScaledMatrix works as expected", { |
|
277 |
+ basic <- matrix(rnorm(400), ncol=20) |
|
278 |
+ |
|
279 |
+ available <- list(list(def=basic, ref=basic)) |
|
280 |
+ for (nesting in 1:2) { |
|
281 |
+ # Creating nested ScMats with and without scaling/centering/transposition. |
|
282 |
+ next_available <- vector("list", length(available)) |
|
283 |
+ for (i in seq_along(available)) { |
|
284 |
+ current <- available[[i]] |
|
285 |
+ next_available[[i]] <- wrap_in_ScMat(current$def, current$ref) |
|
286 |
+ } |
|
287 |
+ |
|
288 |
+ # Testing each one of the newly created ScMats. |
|
289 |
+ available <- unlist(next_available, recursive=FALSE) |
|
290 |
+ for (i in seq_along(available)) { |
|
291 |
+ test <- available[[i]] |
|
292 |
+ |
|
293 |
+ # Coercion works. |
|
294 |
+ expect_equal(as.matrix(test$def), test$ref) |
|
295 |
+ |
|
296 |
+ # Basic stats work. |
|
297 |
+ expect_equal(rowSums(test$ref), rowSums(test$def)) |
|
298 |
+ expect_equal(colSums(test$ref), colSums(test$def)) |
|
299 |
+ |
|
300 |
+ # Multiplication works. |
|
301 |
+ y <- matrix(rnorm(20*2), ncol=2) |
|
302 |
+ expect_equal_product(test$def %*% y, test$ref %*% y) |
|
303 |
+ expect_equal_product(t(y) %*% test$def, t(y) %*% test$ref) |
|
304 |
+ |
|
305 |
+ # Cross product. |
|
306 |
+ y <- matrix(rnorm(20*2), ncol=2) |
|
307 |
+ expect_equal_product(crossprod(test$def), crossprod(test$ref)) |
|
308 |
+ expect_equal_product(crossprod(test$def, y), crossprod(test$ref, y)) |
|
309 |
+ expect_equal_product(crossprod(y, test$def), crossprod(y, test$ref)) |
|
310 |
+ |
|
311 |
+ # Transposed cross product. |
|
312 |
+ y <- matrix(rnorm(20*2), nrow=2) |
|
313 |
+ expect_equal_product(tcrossprod(test$def), tcrossprod(test$ref)) |
|
314 |
+ expect_equal_product(tcrossprod(test$def, y), tcrossprod(test$ref, y)) |
|
315 |
+ expect_equal_product(tcrossprod(y, test$def), tcrossprod(y, test$ref)) |
|
316 |
+ } |
|
317 |
+ } |
|
318 |
+}) |
|
319 |
+ |
|
320 |
+set.seed(1200001) |
|
321 |
+test_that("deep testing of tcrossproduct internals: special mult", { |
|
322 |
+ NR <- 20 |
|
323 |
+ NC <- 10 |
|
324 |
+ basic <- matrix(rnorm(NC*NR), ncol=NC) |
|
325 |
+ c <- runif(NC) |
|
326 |
+ s <- runif(NR) |
|
327 |
+ |
|
328 |
+ ref <- t(matrix(c, NR, NC, byrow=TRUE)) %*% (basic/s^2) |
|
329 |
+ out <- BiocSingular:::.internal_mult_special(c, s, basic) |
|
330 |
+ expect_equal(ref, out) |
|
331 |
+ |
|
332 |
+ available <- list(list(def=basic, ref=basic)) |
|
333 |
+ for (nesting in 1:2) { |
|
334 |
+ # Creating nested ScMats with and without scaling/centering/transposition. |
|
335 |
+ next_available <- vector("list", length(available)) |
|
336 |
+ for (i in seq_along(available)) { |
|
337 |
+ current <- available[[i]] |
|
338 |
+ next_available[[i]] <- wrap_in_ScMat(current$def, current$ref) |
|
339 |
+ } |
|
340 |
+ |
|
341 |
+ # Testing each one of the newly created nested ScMats. |
|
342 |
+ available <- unlist(next_available, recursive=FALSE) |
|
343 |
+ for (i in seq_along(available)) { |
|
344 |
+ test <- available[[i]] |
|
345 |
+ ref <- t(matrix(c, NR, NC, byrow=TRUE)) %*% (test$ref/s^2) |
|
346 |
+ out <- BiocSingular:::.internal_mult_special(c, s, test$def) |
|
347 |
+ expect_equal(ref, out) |
|
348 |
+ } |
|
349 |
+ } |
|
350 |
+}) |
|
351 |
+ |
|
352 |
+set.seed(1200002) |
|
353 |
+test_that("deep testing of tcrossproduct internals: scaled tcrossprod", { |
|
354 |
+ NC <- 30 |
|
355 |
+ NR <- 15 |
|
356 |
+ s <- runif(NC) |
|
357 |
+ basic <- matrix(rnorm(NC*NR), ncol=NC) |
|
358 |
+ |
|
359 |
+ ref <- crossprod(t(basic)/s) |
|
360 |
+ out <- BiocSingular:::.internal_tcrossprod(basic, s) |
|
361 |
+ expect_equal(ref, out) |
|
362 |
+ |
|
363 |
+ available <- list(list(def=basic, ref=basic)) |
|
364 |
+ for (nesting in 1:2) { |
|
365 |
+ # Creating nested ScMats with and without scaling/centering/transposition. |
|
366 |
+ next_available <- vector("list", length(available)) |
|
367 |
+ for (i in seq_along(available)) { |
|
368 |
+ current <- available[[i]] |
|
369 |
+ next_available[[i]] <- wrap_in_ScMat(current$def, current$ref) |
|
370 |
+ } |
|
371 |
+ |
|
372 |
+ # Testing each one of the newly created nested ScMats. |
|
373 |
+ available <- unlist(next_available, recursive=FALSE) |
|
374 |
+ for (i in seq_along(available)) { |
|
375 |
+ test <- available[[i]] |
|
376 |
+ ref <- crossprod(t(test$ref)/s) |
|
377 |
+ out <- BiocSingular:::.internal_tcrossprod(test$def, s) |
|
378 |
+ expect_equal(ref, out) |
|
379 |
+ } |
|
380 |
+ } |
|
381 |
+}) |
0 | 382 |
new file mode 100644 |
... | ... |
@@ -0,0 +1,25 @@ |
1 |
+# This tests the auto-scaling in the constructor. |
|
2 |
+# library(testthat); library(ScaledMatrix); source("test-scale.R") |
|
3 |
+ |
|
4 |
+stripscale <- function(mat, ...) { |
|
5 |
+ out <- scale(mat, ...) |
|
6 |
+ attr(out, "scaled:center") <- NULL |
|
7 |
+ attr(out, "scaled:scale") <- NULL |
|
8 |
+ out |
|
9 |
+} |
|
10 |
+ |
|
11 |
+test_that("ScaledMatrix mimics scale()", { |
|
12 |
+ mat <- matrix(rnorm(10000), ncol=10) |
|
13 |
+ |
|
14 |
+ out <- ScaledMatrix(mat, center=TRUE) |
|
15 |
+ expect_identical(as.matrix(out), stripscale(mat, scale=FALSE)) |
|
16 |
+ |
|
17 |
+ out <- ScaledMatrix(mat, scale=TRUE) |
|
18 |
+ expect_identical(as.matrix(out), stripscale(mat, center=FALSE)) |
|
19 |
+ |
|
20 |
+ out <- ScaledMatrix(mat, center=TRUE, scale=TRUE) |
|
21 |
+ expect_identical(as.matrix(out), stripscale(mat)) |
|
22 |
+ |
|
23 |
+ out <- ScaledMatrix(mat) |
|
24 |
+ expect_identical(as.matrix(out), mat) |
|
25 |
+}) |
0 | 26 |
new file mode 100644 |
... | ... |
@@ -0,0 +1,128 @@ |
1 |
+--- |
|
2 |
+title: Using the `ScaledMatrix` class |
|
3 |
+author: |
|
4 |
+- name: Aaron Lun |
|
5 |
+ email: [email protected] |
|
6 |
+date: "Revised: 12 December 2020" |
|
7 |
+output: |
|
8 |
+ BiocStyle::html_document: |
|
9 |
+ toc_float: true |
|
10 |
+package: ResidualMatrix |
|
11 |
+vignette: > |
|
12 |
+ %\VignetteIndexEntry{Using the ScaledMatrix} |
|
13 |
+ %\VignetteEngine{knitr::rmarkdown} |
|
14 |
+ %\VignetteEncoding{UTF-8} |
|
15 |
+--- |
|
16 |
+ |
|
17 |
+```{r, echo=FALSE, results="hide", message=FALSE} |
|
18 |
+knitr::opts_chunk$set(error=FALSE, message=FALSE, warning=FALSE) |
|
19 |
+``` |
|
20 |
+ |
|
21 |
+# Overview |
|
22 |
+ |
|
23 |
+The `ScaledMatrix` provides yet another method of running `scale()` on a matrix. |
|
24 |
+In other words, these three operations are equivalent: |
|
25 |
+ |
|
26 |
+```{r} |
|
27 |
+mat <- matrix(rnorm(10000), ncol=10) |
|
28 |
+ |
|
29 |
+smat1 <- scale(mat) |
|
30 |
+head(smat1) |
|
31 |
+ |
|
32 |
+library(DelayedArray) |
|
33 |
+smat2 <- scale(DelayedArray(mat)) |
|
34 |
+head(smat2) |
|
35 |
+ |
|
36 |
+library(ScaledMatrix) |
|
37 |
+smat3 <- ScaledMatrix(mat, center=TRUE, scale=TRUE) |
|
38 |
+head(smat3) |
|
39 |
+``` |
|
40 |
+ |
|
41 |
+The biggest difference lies in how they behave in downstream matrix operations. |
|
42 |
+ |
|
43 |
+- `smat1` is an ordinary matrix, with the scaled and centered values fully realized in memory. |
|
44 |
+Nothing too unusual here. |
|
45 |
+- `smat2` is a `DelayedMatrix` and undergoes block processing whereby chunks are realized and operated on, one at a time. |
|
46 |
+This sacrifices speed for greater memory efficiency by avoiding a copy of the entire matrix. |
|
47 |
+In particular, it preserves the structure of the original `mat`, e.g., from a sparse or file-backed representation. |
|
48 |
+- `smat3` is a `ScaledMatrix` that refactors certain operations so that they can be applied to the original `mat` without any scaling or centering. |
|
49 |
+This takes advantage of the original data structure to speed up matrix multiplication and row/column sums, |
|
50 |
+albeit at the cost of numerical precision. |
|
51 |
+ |
|
52 |
+# Matrix multiplication |
|
53 |
+ |
|
54 |
+Given an original matrix $\mathbf{X}$ with $n$ columns, a vector of column centers $\mathbf{c}$ and a vector of column scaling values $\mathbf{s}$, |
|
55 |
+our scaled matrix can be written as: |
|
56 |
+ |
|
57 |
+$$ |
|
58 |
+\mathbf{Y} = (\mathbf{X} - \mathbf{c} \cdot \mathbf{1}_n^T) \mathbf{S} |
|
59 |
+$$ |
|
60 |
+ |
|
61 |
+where $\mathbf{S} = \text{diag}(s_1^{-1}, ..., s_n^{-1})$. |
|
62 |
+If we wanted to right-multiply it with another matrix $\mathbf{A}$, we would have: |
|
63 |
+ |
|
64 |
+$$ |
|
65 |
+\mathbf{YA} = \mathbf{X}\mathbf{S}\mathbf{A} - \mathbf{c} \cdot \mathbf{1}_n^T \mathbf{S}\mathbf{A} |
|
66 |
+$$ |
|
67 |
+ |
|
68 |
+The right-most expression is simply the outer product of $\mathbf{c}$ with the column sums of $\mathbf{SA}$. |
|
69 |
+More important is the fact that we can use the matrix multiplication operator for $\mathbf{X}$ with $\mathbf{SA}$, |
|
70 |
+as this allows us to use highly efficient algorithms for certain data representations, e.g., sparse matrices. |
|
71 |
+ |
|
72 |
+```{r} |
|
73 |
+library(Matrix) |
|
74 |
+mat <- rsparsematrix(20000, 10000, density=0.01) |
|
75 |
+smat <- ScaledMatrix(mat, center=TRUE, scale=TRUE) |
|
76 |
+ |
|
77 |
+blob <- matrix(runif(ncol(mat) * 5), ncol=5) |
|
78 |
+system.time(out <- smat %*% blob) |
|
79 |
+ |
|
80 |
+# The slower way with block processing. |
|
81 |
+da <- scale(DelayedArray(mat)) |
|
82 |
+system.time(out2 <- da %*% blob) |
|
83 |
+``` |
|
84 |
+ |
|
85 |
+The same logic applies for left-multiplication and cross-products. |
|
86 |
+This allows us to easily speed up high-level operations involving matrix multiplication by just switching to a `ScaledMatrix`, |
|
87 |
+e.g., in approximate PCA algorithms from the `r Biocpkg("BiocSingular")` package. |
|
88 |
+ |
|
89 |
+# Other utilities |
|
90 |
+ |
|
91 |
+Row and column sums are special cases of matrix multiplication and can be computed quickly: |
|
92 |
+ |
|
93 |
+```{r} |
|
94 |
+system.time(rowSums(smat)) |
|
95 |
+system.time(rowSums(da)) |
|
96 |
+``` |
|
97 |
+ |
|
98 |
+Subsetting, transposition and renaming of the dimensions are all supported without loss of the `ScaledMatrix` representation: |
|
99 |
+ |
|
100 |
+```{r} |
|
101 |
+smat[,1:5] |
|
102 |
+t(smat) |
|
103 |
+rownames(smat) <- paste0("GENE_", 1:20000) |
|
104 |
+smat |
|
105 |
+``` |
|
106 |
+ |
|
107 |
+Other operations will cause the `ScaledMatrix` to collapse to the general `DelayedMatrix` representation, after which point block processing will be used. |
|
108 |
+ |
|
109 |
+```{r} |
|
110 |
+smat + 1 |
|
111 |
+``` |
|
112 |
+ |
|
113 |
+# Caveats |
|
114 |
+ |
|
115 |
+For most part, the implementation of the multiplication assumes that the $\mathbf{A}$ matrix and the matrix product are small compared to $\mathbf{X}$. |
|
116 |
+It is also possible to multiply two `ScaledMatrix`es together if the underlying matrices have efficient operators for their product. |
|
117 |
+However, if this is not the case, the `ScaledMatrix` offers little benefit for increased overhead. |
|
118 |
+ |
|
119 |
+It is also worth noting that this speed-up is not entirely free. |
|
120 |
+The expression above involves subtracting two matrix with potentially large values, which runs the risk of catastrophic cancellation. |
|
121 |
+In most practical applications, though, this does not seem to be a major concern, |
|
122 |
+especially as most values (e.g., log-normalized expression matrices) lie close to zero anyway. |
|
123 |
+ |
|
124 |
+# Session information {-} |
|
125 |
+ |
|
126 |
+```{r} |
|
127 |
+sessionInfo() |
|
128 |
+``` |