Browse code

Initial commit.

LTLA authored on 13/12/2020 01:02:11
Showing 15 changed files

1 1
new file mode 100644
... ...
@@ -0,0 +1 @@
1
+^\.gitignore$
0 2
new file mode 100644
... ...
@@ -0,0 +1 @@
1
+*.html
0 2
new file mode 100644
... ...
@@ -0,0 +1,30 @@
1
+Package: ScaledMatrix
2
+Version: 0.99.0
3
+Date: 2020-12-12
4
+Title: Creating a DelayedMatrix of Scaled and Centered Values 
5
+Authors@R: person("Aaron", "Lun", role=c("aut", "cre", "cph"),
6
+        email="[email protected]")
7
+Imports:
8
+    methods,
9
+    Matrix,
10
+    S4Vectors,
11
+    DelayedArray
12
+Suggests: 
13
+    testthat, 
14
+    BiocStyle, 
15
+    knitr, 
16
+    rmarkdown,
17
+    BiocSingular 
18
+biocViews: 
19
+    Software, 
20
+    DataRepresentation
21
+Description: 
22
+    Provides delayed computation of a matrix of scaled and centered values.
23
+    The result is equivalent to using the scale() function but avoids explicit 
24
+    realization of a dense matrix during block processing. This permits greater
25
+    efficiency in common operations, most notably matrix multiplication.
26
+License: GPL-3
27
+VignetteBuilder: knitr
28
+RoxygenNote: 7.1.1
29
+BugReports: https://github.com/LTLA/ScaledMatrix/issues
30
+URL: https://github.com/LTLA/ScaledMatrix
0 31
new file mode 100644
... ...
@@ -0,0 +1,39 @@
1
+# Generated by roxygen2: do not edit by hand
2
+
3
+export(ScaledMatrix)
4
+export(ScaledMatrixSeed)
5
+exportClasses(ScaledMatrix)
6
+exportClasses(ScaledMatrixSeed)
7
+exportMethods("%*%")
8
+exportMethods("[")
9
+exportMethods("dimnames<-")
10
+exportMethods(DelayedArray)
11
+exportMethods(colMeans)
12
+exportMethods(colSums)
13
+exportMethods(crossprod)
14
+exportMethods(dim)
15
+exportMethods(dimnames)
16
+exportMethods(extract_array)
17
+exportMethods(rowMeans)
18
+exportMethods(rowSums)
19
+exportMethods(show)
20
+exportMethods(t)
21
+exportMethods(tcrossprod)
22
+importClassesFrom(DelayedArray,DelayedMatrix)
23
+importFrom(DelayedArray,DelayedArray)
24
+importFrom(DelayedArray,extract_array)
25
+importFrom(DelayedArray,new_DelayedArray)
26
+importFrom(DelayedArray,seed)
27
+importFrom(DelayedArray,sweep)
28
+importFrom(Matrix,colMeans)
29
+importFrom(Matrix,colSums)
30
+importFrom(Matrix,crossprod)
31
+importFrom(Matrix,drop)
32
+importFrom(Matrix,rowMeans)
33
+importFrom(Matrix,rowSums)
34
+importFrom(Matrix,t)
35
+importFrom(Matrix,tcrossprod)
36
+importFrom(S4Vectors,setValidity2)
37
+importFrom(methods,is)
38
+importFrom(methods,new)
39
+importFrom(methods,show)
0 40
new file mode 100644
... ...
@@ -0,0 +1,6 @@
1
+#' @export
2
+setClass("ScaledMatrixSeed", slots=c(.matrix="ANY", center="numeric", scale="numeric", use_center="logical", use_scale="logical", transposed="logical"))
3
+
4
+#' @export
5
+#' @importClassesFrom DelayedArray DelayedMatrix
6
+setClass("ScaledMatrix", contains="DelayedMatrix", slots=c(seed="ScaledMatrixSeed"))
0 7
new file mode 100644
... ...
@@ -0,0 +1,197 @@
1
+#' The ScaledMatrix class
2
+#'
3
+#' Defines the ScaledMatrixSeed and ScaledMatrix classes and their associated methods.
4
+#' These classes support delayed centering and scaling of the columns in the same manner as \code{\link{scale}},
5
+#' but preserving the original data structure for more efficient operations like matrix multiplication.
6
+#' 
7
+#' @param x A matrix or any matrix-like object (e.g., from the \pkg{Matrix} package).
8
+#' 
9
+#' This can alternatively be a ScaledMatrixSeed, in which case any values of \code{center} and \code{scale} are ignored.
10
+#' @param center A numeric vector of length equal to \code{ncol(x)}, where each element is to be subtracted from the corresponding column of \code{x}.
11
+#' A \code{NULL} value indicates that no subtraction is to be performed.
12
+#' Alternatively \code{TRUE}, in which case it is set to the column means of \code{x}.
13
+#' @param scale A numeric vector of length equal to \code{ncol(x)}, where each element is to divided from the corresponding column of \code{x} (after subtraction).
14
+#' A \code{NULL} value indicates that no division is to be performed.
15
+#' Alternatively \code{TRUE}, in which case it is set to the column-wise root-mean-squared differences from \code{center} 
16
+#' (interpretable as standard deviations if \code{center} is set to the column means, see \code{\link{scale}} for commentary).
17
+#' 
18
+#' @return 
19
+#' The \code{ScaledMatrixSeed} constructor will return a ScaledMatrixSeed object.
20
+#' 
21
+#' The \code{ScaledMatrix} constructor will return a ScaledMatrix object equivalent to \code{t((t(x) - center)/scale)}.
22
+#' 
23
+#' @section Methods for ScaledMatrixSeed objects:
24
+#' ScaledMatrixSeed objects are implemented as \linkS4class{DelayedMatrix} backends.
25
+#' They support standard operations like \code{dim}, \code{dimnames} and \code{extract_array}.
26
+#' 
27
+#' Passing a ScaledMatrixSeed object to the \code{\link{DelayedArray}} constructor will create a ScaledMatrix object.
28
+#' 
29
+#' It is possible for \code{x} to contain a ScaledMatrix, thus nesting one ScaledMatrix inside another.
30
+#' This can occasionally be useful in combination with transposition to achieve centering/scaling in both dimensions.
31
+#' 
32
+#' @section Methods for ScaledMatrix objects:
33
+#' ScaledMatrix objects are derived from \linkS4class{DelayedMatrix} objects and support all of valid operations on the latter.
34
+#' Several functions are specialized for greater efficiency when operating on ScaledMatrix instances, including:
35
+#' \itemize{
36
+#'     \item Subsetting, transposition and replacement of row/column names.
37
+#'         These will return a new ScaledMatrix rather than a DelayedMatrix.
38
+#'     \item Matrix multiplication via \code{\%*\%}, \code{crossprod} and \code{tcrossprod}.
39
+#'         These functions will return a DelayedMatrix.
40
+#'     \item Calculation of row and column sums and means by \code{colSums}, \code{rowSums}, etc. 
41
+#' }
42
+#' 
43
+#' All other operations applied to a ScaledMatrix will use the underlying \pkg{DelayedArray} machinery.
44
+#' Unary or binary operations will generally create a new DelayedMatrix instance containing a ScaledMatrixSeed.
45
+#' 
46
+#' Tranposition can effectively be used to allow centering/scaling on the rows if the input \code{x} is transposed.
47
+#'
48
+#' @section Efficiency vs precision:
49
+#' The raison d'etre of the ScaledMatrix is that it can offer faster matrix multiplication by avoiding the \pkg{DelayedArray} block processing.
50
+#' This is done by refactoring the scaling/centering operations to use the (hopefully more efficient) multiplication operator of the original matrix \code{x}.
51
+#' Unfortunately, the speed-up comes at the cost of increasing the risk of catastrophic cancellation.
52
+#' The procedure requires subtraction of one large intermediate number from another to obtain the values of the final matrix product.
53
+#' This could result in a loss of numerical precision that compromises the accuracy of downstream algorithms. 
54
+#' In practice, this does not seem to be a major concern though one should be careful if the input \code{x} contains very large positive/negative values.
55
+#' 
56
+#' @author
57
+#' Aaron Lun
58
+#' 
59
+#' @examples
60
+#' library(Matrix)
61
+#' y <- ScaledMatrix(rsparsematrix(10, 20, 0.1), 
62
+#'     center=rnorm(20), scale=1+runif(20))
63
+#' y
64
+#' 
65
+#' crossprod(y)
66
+#' tcrossprod(y)
67
+#' y %*% rnorm(20)
68
+#' 
69
+#' @aliases
70
+#' ScaledMatrixSeed
71
+#' ScaledMatrixSeed-class
72
+#' 
73
+#' dim,ScaledMatrixSeed-method
74
+#' dimnames,ScaledMatrixSeed-method
75
+#' extract_array,ScaledMatrixSeed-method
76
+#' DelayedArray,ScaledMatrixSeed-method
77
+#' show,ScaledMatrixSeed-method
78
+#' 
79
+#' ScaledMatrix
80
+#' ScaledMatrix-class
81
+#' 
82
+#' dimnames<-,ScaledMatrix,ANY-method
83
+#' t,ScaledMatrix-method
84
+#' [,ScaledMatrix,ANY,ANY,ANY-method
85
+#' 
86
+#' colSums,ScaledMatrix-method
87
+#' rowSums,ScaledMatrix-method
88
+#' colMeans,ScaledMatrix-method
89
+#' rowMeans,ScaledMatrix-method
90
+#' 
91
+#' %*%,ANY,ScaledMatrix-method
92
+#' %*%,ScaledMatrix,ANY-method
93
+#' %*%,ScaledMatrix,ScaledMatrix-method
94
+#' 
95
+#' crossprod,ScaledMatrix,missing-method
96
+#' crossprod,ScaledMatrix,ANY-method
97
+#' crossprod,ANY,ScaledMatrix-method
98
+#' crossprod,ScaledMatrix,ScaledMatrix-method
99
+#' 
100
+#' tcrossprod,ScaledMatrix,missing-method
101
+#' tcrossprod,ScaledMatrix,ANY-method
102
+#' tcrossprod,ANY,ScaledMatrix-method
103
+#' tcrossprod,ScaledMatrix,ScaledMatrix-method
104
+#' 
105
+#' @docType class
106
+#' @name ScaledMatrix
107
+NULL
108
+
109
+#' @export
110
+#' @rdname ScaledMatrix
111
+#' @importFrom DelayedArray DelayedArray
112
+#' @importFrom Matrix colMeans rowSums t
113
+ScaledMatrix <- function(x, center=NULL, scale=NULL) {
114
+    if (isTRUE(center)) {
115
+        center <- colMeans(x)
116
+    }
117
+    if (isTRUE(scale)) {
118
+        tx <- t(DelayedArray(x))
119
+        if (!is.null(center)) {
120
+            tx <- tx - center
121
+        }
122
+        ss <- rowSums(tx^2)
123
+        scale <- sqrt(ss / (nrow(x) - 1))
124
+    }
125
+    DelayedArray(ScaledMatrixSeed(x, center=center, scale=scale))
126
+}
127
+
128
+#' @export
129
+#' @importFrom DelayedArray DelayedArray new_DelayedArray
130
+setMethod("DelayedArray", "ScaledMatrixSeed",
131
+    function(seed) new_DelayedArray(seed, Class="ScaledMatrix")
132
+)
133
+
134
+###################################
135
+# Overridden utilities from DelayedArray, for efficiency.
136
+
137
+#' @export
138
+#' @importFrom DelayedArray DelayedArray seed
139
+setReplaceMethod("dimnames", "ScaledMatrix", function(x, value) {
140
+    DelayedArray(rename_ScaledMatrixSeed(seed(x), value))
141
+})
142
+
143
+#' @export
144
+#' @importFrom DelayedArray DelayedArray seed
145
+setMethod("t", "ScaledMatrix", function(x) {
146
+    DelayedArray(transpose_ScaledMatrixSeed(seed(x)))
147
+})
148
+
149
+#' @export
150
+#' @importFrom DelayedArray DelayedArray seed
151
+setMethod("[", "ScaledMatrix", function(x, i, j, ..., drop=TRUE) {
152
+    if (missing(i)) i <- NULL
153
+    if (missing(j)) j <- NULL
154
+    out <- DelayedArray(subset_ScaledMatrixSeed(seed(x), i=i, j=j))
155
+
156
+    if (drop && any(dim(out)==1L)) {
157
+        return(drop(out))
158
+    }
159
+    out
160
+})
161
+
162
+###################################
163
+# Basic matrix stats.
164
+
165
+#' @export
166
+#' @importFrom Matrix colSums rowSums drop
167
+setMethod("colSums", "ScaledMatrix", function(x, na.rm = FALSE, dims = 1L) {
168
+    if (is_transposed(seed(x))) {
169
+        return(rowSums(t(x)))
170
+    }
171
+
172
+    out <- rep(1, nrow(x)) %*% x
173
+    out <- drop(out)
174
+    names(out) <- colnames(x)
175
+    out
176
+})
177
+
178
+#' @export
179
+#' @importFrom Matrix colSums rowSums drop
180
+setMethod("rowSums", "ScaledMatrix", function(x, na.rm = FALSE, dims = 1L) {
181
+    if (is_transposed(seed(x))) {
182
+        return(colSums(t(x)))
183
+    }
184
+
185
+    out <- x %*% rep(1, ncol(x))
186
+    out <- drop(out)
187
+    names(out) <- rownames(x)
188
+    out
189
+})
190
+
191
+#' @export
192
+#' @importFrom Matrix colMeans colSums
193
+setMethod("colMeans", "ScaledMatrix", function(x, na.rm = FALSE, dims = 1L) colSums(x)/nrow(x))
194
+
195
+#' @export
196
+#' @importFrom Matrix rowMeans rowSums
197
+setMethod("rowMeans", "ScaledMatrix", function(x, na.rm = FALSE, dims = 1L) rowSums(x)/ncol(x))
0 198
new file mode 100644
... ...
@@ -0,0 +1,165 @@
1
+#' @export
2
+#' @importFrom methods new is
3
+ScaledMatrixSeed <- function(x, center=NULL, scale=NULL) {
4
+    if (missing(x)) {
5
+        x <- matrix(0, 0, 0)
6
+    } else if (is(x, "ScaledMatrixSeed")) {
7
+        return(x)
8
+    } 
9
+
10
+    use_center <- !is.null(center)
11
+    use_scale <- !is.null(scale)
12
+    new("ScaledMatrixSeed", .matrix=x, center=as.numeric(center), scale=as.numeric(scale), use_center=use_center, use_scale=use_scale, transposed=FALSE)
13
+}
14
+
15
+#' @importFrom S4Vectors setValidity2
16
+setValidity2("ScaledMatrixSeed", function(object) {
17
+    msg <- character(0)
18
+
19
+    # Checking scalars.
20
+    if (length(use_center(object))!=1L) {
21
+        msg <- c(msg, "'use_center' must be a logical scalar")
22
+    } 
23
+    if (length(use_scale(object))!=1L) {
24
+        msg <- c(msg, "'use_scale' must be a logical scalar")
25
+    } 
26
+    if (length(is_transposed(object))!=1L) {
27
+        msg <- c(msg, "'transposed' must be a logical scalar")
28
+    } 
29
+
30
+    # Checking vectors.
31
+    if (use_center(object) && length(get_center(object))!=ncol(object)) {
32
+        msg <- c(msg, "length of 'center' must equal 'ncol(object)'")
33
+    }
34
+    if (use_scale(object) && length(get_scale(object))!=ncol(object)) {
35
+        msg <- c(msg, "length of 'scale' must equal 'ncol(object)'")
36
+    }
37
+
38
+    if (length(msg)) {
39
+        return(msg)
40
+    } 
41
+    return(TRUE)
42
+})
43
+
44
+#' @export
45
+#' @importFrom methods show
46
+setMethod("show", "ScaledMatrixSeed", function(object) {
47
+    cat(sprintf("%i x %i ScaledMatrixSeed object", nrow(object), ncol(object)),
48
+        sprintf("representation: %s", class(get_matrix2(object))),
49
+        sprintf("centering: %s", if (use_center(object)) "yes" else "no"),
50
+        sprintf("scaling: %s", if (use_scale(object)) "yes" else "no"),
51
+    sep="\n")
52
+})
53
+
54
+###################################
55
+# Internal getters. 
56
+
57
+get_matrix2 <- function(x) [email protected]
58
+
59
+get_center <- function(x) x@center
60
+
61
+get_scale <- function(x) x@scale
62
+
63
+use_center <- function(x) x@use_center
64
+
65
+use_scale <- function(x) x@use_scale
66
+
67
+is_transposed <- function(x) x@transposed
68
+
69
+###################################
70
+# DelayedArray support utilities. 
71
+
72
+#' @export
73
+setMethod("dim", "ScaledMatrixSeed", function(x) {
74
+    d <- dim(get_matrix2(x))
75
+    if (is_transposed(x)) { d <- rev(d) }
76
+    d
77
+})
78
+
79
+#' @export
80
+setMethod("dimnames", "ScaledMatrixSeed", function(x) {
81
+    d <- dimnames(get_matrix2(x))
82
+    if (is_transposed(x)) { d <- rev(d) }
83
+    d
84
+})
85
+
86
+#' @export
87
+#' @importFrom DelayedArray extract_array
88
+setMethod("extract_array", "ScaledMatrixSeed", function(x, index) {
89
+    x2 <- subset_ScaledMatrixSeed(x, index[[1]], index[[2]])        
90
+    realize_ScaledMatrixSeed(x2)
91
+})
92
+
93
+###################################
94
+# Other utilities. 
95
+
96
+rename_ScaledMatrixSeed <- function(x, value) {
97
+    if (is_transposed(x)) value <- rev(value)
98
+    dimnames([email protected]) <- value
99
+    x
100
+}
101
+
102
+transpose_ScaledMatrixSeed <- function(x) {
103
+    x@transposed <- !is_transposed(x)
104
+    x
105
+}
106
+
107
+#' @importFrom Matrix t
108
+#' @importFrom methods is
109
+realize_ScaledMatrixSeed <- function(x, ...) {
110
+    out <- get_matrix2(x)
111
+
112
+    if (use_scale(x) || use_center(x)) {
113
+        if (is(out, "ScaledMatrix")) {
114
+            # Any '-' and '/' would collapse this to a DelayedArray, 
115
+            # which would then call extract_array, which would then 
116
+            # call realize_ScaledMatrixSeed, forming an infinite loop.
117
+            # So we might as well realize it now.
118
+            out <- realize_ScaledMatrixSeed(seed(out))
119
+        }
120
+
121
+        out <- t(out)
122
+        if (use_center(x)) {
123
+            out <- out - get_center(x)
124
+        }
125
+        if (use_scale(x)) {
126
+            out <- out / get_scale(x)
127
+        }
128
+
129
+        if (!is_transposed(x)) out <- t(out)
130
+    } else {
131
+        if (is_transposed(x)) out <- t(out) 
132
+    }
133
+
134
+    as.matrix(out)
135
+}
136
+
137
+subset_ScaledMatrixSeed <- function(x, i, j) {
138
+    if (is_transposed(x)) {
139
+        x2 <- transpose_ScaledMatrixSeed(x)
140
+        x2 <- subset_ScaledMatrixSeed(x2, i=j, j=i)
141
+        return(transpose_ScaledMatrixSeed(x2))
142
+    }
143
+
144
+    if (!is.null(i)) {
145
+        [email protected] <- get_matrix2(x)[i,,drop=FALSE]
146
+    }
147
+    
148
+    if (!is.null(j)) {
149
+        if (is.character(j)) {
150
+            j <- match(j, colnames(x))
151
+        }
152
+        
153
+        [email protected] <- get_matrix2(x)[,j,drop=FALSE]
154
+            
155
+        if (use_scale(x)) {
156
+            x@scale <- get_scale(x)[j]
157
+        }
158
+        
159
+        if (use_center(x)) {
160
+            x@center <- get_center(x)[j]
161
+        }
162
+    }
163
+
164
+    return(x)
165
+}
0 166
new file mode 100644
... ...
@@ -0,0 +1,634 @@
1
+# We attempt to use operators defined for '.matrix' in the 'ScaledMatrixSeed'.
2
+# This avoids expensive modifications such as loss of sparsity.
3
+# Centering and scaling are factored out into separate operations.
4
+#
5
+# We assume that the non-'ScaledMatrix' argument is small and can be modified cheaply.
6
+# We also assume that the matrix product is small and can be modified cheaply.
7
+# This allows centering and scaling to be applied *after* multiplication.
8
+#
9
+# Here are some ground rules for how these functions must work:
10
+#
11
+#  - NO arithmetic operations shall be applied to a ScaledMatrix.
12
+#    This includes nested ScaledMatrices that are present in '.matrix'.
13
+#    Such operations collapses the ScaledMatrix to a DelayedMatrix, 
14
+#    resulting in slow block processing during multiplication.
15
+# 
16
+#  - NO addition/subtraction operations shall be applied to '.matrix'.
17
+#    This is necessary to avoid loss of sparsity for sparse '.matrix',
18
+#    as well as to avoid block processing for ScaledMatrix '.matrix'.
19
+# 
20
+#  - NO division/multiplication operations should be applied to '.matrix'.
21
+#    This is largely a consequence of the first point above.
22
+#    Exceptions are only allowed when this is unavoidable, e.g., in '.internal_tcrossprod'.
23
+#
24
+#  - NO calling of %*% or (t)crossprod on a ScaledMatrix of the same nesting depth as an input ScaledMatrix.
25
+#    Internal multiplication should always be applied to '.matrix', to avoid infinite S4 recursion.
26
+#    Each method call should strip away one nesting level, i.e., operate on the seed.
27
+#    Exceptions are allowed for dual ScaledMatrix multiplication,
28
+#    where one argument is allowed to be of the same depth.
29
+
30
+#' @export
31
+#' @importFrom Matrix t
32
+#' @importFrom DelayedArray seed DelayedArray
33
+setMethod("%*%", c("ScaledMatrix", "ANY"), function(x, y) {
34
+    x_seed <- seed(x)
35
+    if (is_transposed(x_seed)) {
36
+        out <- t(.leftmult_ScaledMatrix(t(y), x_seed))
37
+    } else {
38
+        out <- .rightmult_ScaledMatrix(x_seed, y)
39
+    }
40
+    DelayedArray(out)
41
+})
42
+
43
+#' @importFrom DelayedArray sweep
44
+.rightmult_ScaledMatrix <- function(x_seed, y) {
45
+    if (use_scale(x_seed)) {
46
+        y <- y / get_scale(x_seed)
47
+    }
48
+
49
+    out <- as.matrix(get_matrix2(x_seed) %*% y)
50
+
51
+    if (use_center(x_seed)) {
52
+        out <- sweep(out, 2, as.numeric(get_center(x_seed) %*% y), "-", check.margin=FALSE)
53
+    }
54
+
55
+    out
56
+}
57
+
58
+#' @export
59
+#' @importFrom Matrix t 
60
+#' @importFrom DelayedArray seed DelayedArray
61
+setMethod("%*%", c("ANY", "ScaledMatrix"), function(x, y) {
62
+    y_seed <- seed(y)
63
+    if (is_transposed(y_seed)) {
64
+        if (!is.null(dim(x))) {
65
+            # Vectors don't quite behave as 1-column matrices here.
66
+            # so we need to be a bit more careful.
67
+            x <- t(x) 
68
+        }
69
+        out <- t(.rightmult_ScaledMatrix(y_seed, x))
70
+    } else {
71
+        out <- .leftmult_ScaledMatrix(x, y_seed)
72
+    }
73
+    DelayedArray(out)
74
+})
75
+
76
+#' @importFrom Matrix rowSums
77
+#' @importFrom DelayedArray sweep
78
+.leftmult_ScaledMatrix <- function(x, y_seed) { 
79
+    out <- as.matrix(x %*% get_matrix2(y_seed))
80
+
81
+    if (use_center(y_seed)) {
82
+        if (is.null(dim(x))) {
83
+            out <- out - get_center(y_seed) * sum(x)
84
+        } else {
85
+            out <- out - outer(rowSums(x), get_center(y_seed), "*")
86
+        }
87
+    }
88
+
89
+    if (use_scale(y_seed)) {
90
+        out <- sweep(out, 2, get_scale(y_seed), "/", check.margin=FALSE)
91
+    }
92
+
93
+    out
94
+}
95
+
96
+#' @export
97
+#' @importFrom DelayedArray seed DelayedArray
98
+setMethod("%*%", c("ScaledMatrix", "ScaledMatrix"), function(x, y) {
99
+    x_seed <- seed(x)
100
+    y_seed <- seed(y)
101
+    res <- .dual_mult_dispatcher(x_seed, y_seed, is_transposed(x_seed), is_transposed(y_seed))
102
+    DelayedArray(res)
103
+})
104
+
105
+#' @importFrom Matrix t
106
+.dual_mult_dispatcher <- function(x_seed, y_seed, x_trans, y_trans) {
107
+    if (!x_trans) {
108
+        if (!y_trans) {
109
+            res <- .multiply_u2u(x_seed, y_seed)
110
+        } else {
111
+            res <- .multiply_u2t(x_seed, y_seed)
112
+        }
113
+    } else {
114
+        if (!y_trans) {
115
+            res <- .multiply_t2u(x_seed, y_seed)
116
+        } else {
117
+            res <- .multiply_u2u(y_seed, x_seed)
118
+            res <- t(res)
119
+        }
120
+    }
121
+    res
122
+}
123
+
124
+###################################
125
+# ScMat %*% ScMat utilities.
126
+
127
+# We do not implement ScMat %*% ScMat in terms of left/right %*%.
128
+# This would cause scaling to be applied on one of the ScMats,
129
+# collapsing it into a DelayedMatrix. Subsequent multiplication 
130
+# would use block processing, which would be too slow.
131
+
132
+#' @importFrom Matrix drop rowSums
133
+#' @importFrom DelayedArray sweep
134
+.multiply_u2u <- function(x_seed, y_seed) 
135
+# Considering the problem of (X - C_x)S_x (Y - C_y)S_y.
136
+{
137
+    # Computing X S_x Y S_y
138
+    x0 <- get_matrix2(x_seed)
139
+    if (use_scale(x_seed)) {
140
+        x0 <- ScaledMatrix(x0, scale=get_scale(x_seed))
141
+    } 
142
+
143
+    result <- as.matrix(x0 %*% get_matrix2(y_seed))
144
+    if (use_scale(y_seed)) {
145
+        result <- sweep(result, 2, get_scale(y_seed), "/", check.margin=FALSE)
146
+    }
147
+
148
+    # Computing C_x S_x Y S_y, and subtracting it from 'result'.
149
+    if (use_center(x_seed)) {
150
+        x.center <- get_center(x_seed)
151
+        if (use_scale(x_seed)) {
152
+            x.center <- x.center / get_scale(x_seed)
153
+        }
154
+
155
+        component2 <- drop(x.center %*% get_matrix2(y_seed))
156
+        if (use_scale(y_seed)) {
157
+            component2 <- component2 / get_scale(y_seed)
158
+        }
159
+
160
+        result <- sweep(result, 2, component2, "-", check.margin=FALSE)
161
+    }
162
+
163
+    # Computing C_x S_x C_y S_y, and adding it to 'result'.
164
+    if (use_center(x_seed) && use_center(y_seed)) {
165
+        x.center <- get_center(x_seed)
166
+        if (use_scale(x_seed)) {
167
+            x.center <- x.center / get_scale(x_seed)
168
+        }
169
+
170
+        y.center <- get_center(y_seed)
171
+        if (use_scale(y_seed)) {
172
+            y.center <- y.center / get_scale(y_seed)
173
+        }
174
+
175
+        component4 <- sum(x.center) * y.center
176
+        result <- sweep(result, 2, component4, "+", check.margin=FALSE)
177
+    }
178
+
179
+    # Computing X S_x C_y S_y, and subtracting it from 'result'.
180
+    # This is done last to avoid subtracting large values.
181
+    if (use_center(y_seed)) {
182
+        y.center <- get_center(y_seed)
183
+        if (use_scale(y_seed)) {
184
+            y.center <- y.center / get_scale(y_seed)
185
+        }
186
+
187
+        component3 <- outer(rowSums(x0), y.center)
188
+        result <- result - component3
189
+    }
190
+
191
+    result
192
+}
193
+
194
+#' @importFrom Matrix tcrossprod drop
195
+#' @importFrom DelayedArray sweep
196
+.multiply_u2t <- function(x_seed, y_seed) 
197
+# Considering the problem of (X - C_x)S_x S_y(Y' - C_y')
198
+{
199
+    # Computing X S_x S_y Y'
200
+    x0 <- get_matrix2(x_seed)
201
+    if (use_scale(x_seed) || use_scale(y_seed)) {
202
+        scaling <- 1
203
+        if (use_scale(x_seed)) {
204
+            scaling <- scaling * get_scale(x_seed)
205
+        }
206
+        if (use_scale(y_seed)) {
207
+            scaling <- scaling * get_scale(y_seed)
208
+        }
209
+        x0 <- ScaledMatrix(x0, scale=scaling)
210
+    }
211
+    result <- as.matrix(tcrossprod(x0, get_matrix2(y_seed)))
212
+
213
+    # Computing C_x S_x S_y Y', and subtracting it from 'result'.
214
+    if (use_center(x_seed)) {
215
+        x.center <- get_center(x_seed)
216
+        if (use_scale(x_seed)) {
217
+            x.center <- x.center / get_scale(x_seed)
218
+        }
219
+        if (use_scale(y_seed)) {
220
+            x.center <- x.center / get_scale(y_seed)
221
+        }
222
+
223
+        component2 <- drop(tcrossprod(x.center, get_matrix2(y_seed)))
224
+        result <- sweep(result, 2, component2, "-", check.margin=FALSE)
225
+    }
226
+
227
+    # Computing C_x S_x S_y C_y', and adding it to 'result'.
228
+    if (use_center(x_seed) && use_center(y_seed)) {
229
+        x.center <- get_center(x_seed)
230
+        if (use_scale(x_seed)) {
231
+            x.center <- x.center / get_scale(x_seed)
232
+        }
233
+
234
+        y.center <- get_center(y_seed)
235
+        if (use_scale(y_seed)) {
236
+            y.center <- y.center / get_scale(y_seed)
237
+        }
238
+
239
+        component4 <- sum(x.center*y.center)
240
+        result <- result + component4
241
+    }
242
+
243
+    # Computing X S_x S_y C_y', and subtracting it from 'result'.
244
+    # This is done last to avoid subtracting large values.
245
+    if (use_center(y_seed)) {
246
+        component3 <- drop(x0 %*% get_center(y_seed))
247
+        result <- result - component3
248
+    }
249
+
250
+    result
251
+}
252
+
253
+#' @importFrom Matrix crossprod colSums
254
+#' @importFrom DelayedArray sweep
255
+.multiply_t2u <- function(x_seed, y_seed) 
256
+# Considering the problem of S_x(X' - C_x') (Y - C_y)S_y
257
+{
258
+    # C mputing X' Y 
259
+    x0 <- get_matrix2(x_seed)
260
+    y0 <- get_matrix2(y_seed)
261
+    result <- as.matrix(crossprod(x0, y0))
262
+
263
+    # Computing C_x' Y, and subtracting it from 'result'.
264
+    if (use_center(x_seed)) {
265
+        x.center <- get_center(x_seed)
266
+        component2 <- outer(x.center, colSums(y0))
267
+        result <- result - component2
268
+    }
269
+
270
+    # Computing C_x' C_y, and adding it to 'result'.
271
+    if (use_center(x_seed) && use_center(y_seed)) {
272
+        x.center <- get_center(x_seed)
273
+        y.center <- get_center(y_seed)
274
+        component4 <- outer(x.center, y.center) * nrow(y0)
275
+        result <- result + component4
276
+    }
277
+
278
+    # Computing X' C_y, and subtracting it from 'result'.
279
+    # This is done last to avoid subtracting large values.
280
+    if (use_center(y_seed)) {
281
+        component3 <- outer(colSums(x0), get_center(y_seed))
282
+        result <- result - component3
283
+    }
284
+
285
+    if (use_scale(x_seed)) {
286
+        result <- result / get_scale(x_seed)
287
+    } 
288
+    if (use_scale(y_seed)) {
289
+        result <- sweep(result, 2, get_scale(y_seed), "/", check.margin=FALSE)
290
+    }
291
+
292
+    result
293
+}
294
+
295
+###################################
296
+# Cross-product. 
297
+
298
+# Technically, we could implement this in terms of '%*%',
299
+# but we use specializations to exploit native crossprod() for '.matrix',
300
+# which is probably more efficient.
301
+
302
+#' @export
303
+#' @importFrom Matrix crossprod
304
+#' @importFrom DelayedArray seed DelayedArray
305
+setMethod("crossprod", c("ScaledMatrix", "missing"), function(x, y) {
306
+    x_seed <- seed(x)
307
+    if (is_transposed(x_seed)) {
308
+        # No need to t(), the output is symmetric anyway. 
309
+        out <- .tcp_ScaledMatrix(x_seed)
310
+    } else {
311
+        out <- .cross_ScaledMatrix(x_seed)
312
+    }
313
+
314
+    DelayedArray(out)
315
+})
316
+
317
+#' @importFrom Matrix crossprod colSums
318
+#' @importFrom DelayedArray sweep
319
+.cross_ScaledMatrix <- function(x_seed) {
320
+    x0 <- get_matrix2(x_seed)
321
+    out <- as.matrix(crossprod(x0))
322
+
323
+    if (use_center(x_seed)) {
324
+        centering <- get_center(x_seed)
325
+        colsums <- colSums(x0)
326
+
327
+        # Minus, then add, then minus, to mitigate cancellation.
328
+        out <- out - outer(centering, colsums)
329
+        out <- out + outer(centering, centering) * nrow(x0)
330
+        out <- out - outer(colsums, centering)
331
+    }
332
+
333
+    if (use_scale(x_seed)) {
334
+        scaling <- get_scale(x_seed)
335
+        out <- sweep(out / scaling, 2, scaling, "/", check.margin=FALSE)
336
+    }
337
+    out
338
+}
339
+
340
+#' @export
341
+#' @importFrom Matrix crossprod
342
+#' @importFrom DelayedArray seed DelayedArray
343
+setMethod("crossprod", c("ScaledMatrix", "ANY"), function(x, y) {
344
+    x_seed <- seed(x)
345
+    if (is_transposed(x_seed)) {
346
+        out <- .rightmult_ScaledMatrix(x_seed, y)
347
+    } else {
348
+        out <- .rightcross_ScaledMatrix(x_seed, y)
349
+    }
350
+    DelayedArray(out)
351
+})
352
+
353
+#' @importFrom Matrix crossprod colSums
354
+.rightcross_ScaledMatrix <- function(x_seed, y) {
355
+    out <- as.matrix(crossprod(get_matrix2(x_seed), y))
356
+
357
+    if (use_center(x_seed)) {
358
+        if (is.null(dim(y))) {
359
+            out <- out - get_center(x_seed) * sum(y)
360
+        } else {
361
+            out <- out - outer(get_center(x_seed), colSums(y))
362
+        }
363
+    }
364
+    
365
+    if (use_scale(x_seed)) {
366
+        out <- out / get_scale(x_seed)
367
+    }
368
+
369
+    out
370
+}
371
+
372
+#' @export
373
+#' @importFrom Matrix crossprod
374
+#' @importFrom DelayedArray seed DelayedArray
375
+setMethod("crossprod", c("ANY", "ScaledMatrix"), function(x, y) {
376
+    y_seed <- seed(y)
377
+    if (is_transposed(y_seed)) {
378
+        out <- t(.rightmult_ScaledMatrix(y_seed, x))
379
+    } else {
380
+        out <- .leftcross_ScaledMatrix(x, y_seed)
381
+    }
382
+    DelayedArray(out)
383
+})
384
+
385
+#' @importFrom Matrix crossprod colSums
386
+#' @importFrom DelayedArray sweep
387
+.leftcross_ScaledMatrix <- function(x, y_seed) {
388
+    out <- as.matrix(crossprod(x, get_matrix2(y_seed)))
389
+
390
+    if (use_center(y_seed)) {
391
+        if (is.null(dim(x))) {
392
+            out <- sweep(out, 2, sum(x) * get_center(y_seed), "-", check.margin=FALSE)
393
+        } else {
394
+            out <- out - outer(colSums(x), get_center(y_seed))
395
+        }
396
+    }
397
+
398
+    if (use_scale(y_seed)) {
399
+        out <- sweep(out, 2, get_scale(y_seed), "/", check.margin=FALSE)
400
+    }
401
+
402
+    out
403
+}
404
+
405
+#' @export
406
+#' @importFrom Matrix crossprod
407
+#' @importFrom DelayedArray DelayedArray seed
408
+setMethod("crossprod", c("ScaledMatrix", "ScaledMatrix"), function(x, y) {
409
+    x_seed <- seed(x)
410
+    y_seed <- seed(y)
411
+    res <- .dual_mult_dispatcher(x_seed, y_seed, !is_transposed(x_seed), is_transposed(y_seed))
412
+    DelayedArray(res)
413
+})
414
+
415
+###################################
416
+# Transposed cross-product. 
417
+
418
+# Technically, we could implement this in terms of '%*%',
419
+# but we use specializations to exploit native tcrossprod() for '.matrix',
420
+# which is probably more efficient.
421
+
422
+#' @export
423
+#' @importFrom Matrix tcrossprod
424
+#' @importFrom DelayedArray seed DelayedArray sweep
425
+setMethod("tcrossprod", c("ScaledMatrix", "missing"), function(x, y) {
426
+    x_seed <- seed(x)
427
+    if (is_transposed(x_seed)) {
428
+        out <- .cross_ScaledMatrix(x_seed)
429
+    } else {
430
+        out <- .tcp_ScaledMatrix(x_seed)
431
+    }
432
+    DelayedArray(out)
433
+})
434
+
435
+#' @importFrom Matrix tcrossprod
436
+.tcp_ScaledMatrix <- function(x_seed) {
437
+    x0 <- get_matrix2(x_seed)
438
+
439
+    if (use_scale(x_seed)) {
440
+        out <- as.matrix(.internal_tcrossprod(x0, get_scale(x_seed)))
441
+    } else {
442
+        out <- as.matrix(tcrossprod(x0))
443
+    }
444
+    
445
+    if (use_center(x_seed)) {
446
+        centering <- get_center(x_seed)
447
+
448
+        if (use_scale(x_seed)) {
449
+            centering <- centering / get_scale(x_seed)
450
+            extra <- centering / get_scale(x_seed)
451
+        } else {
452
+            extra <- centering
453
+        }
454
+            
455
+        # With scaling, the use of 'extra' mimics sweep(x0, 2, get_scale(x), "/"),
456
+        # except that the scaling is applied to 'centering' rather than directly to 'x0'.
457
+        # Without scaling, 'extra' and 'centering' are interchangeable.
458
+        component <- tcrossprod(extra, x0)
459
+
460
+        # Minus, then add, then minus, to mitigate cancellation.
461
+        out <- sweep(out, 2, as.numeric(component), "-", check.margin=FALSE)
462
+        out <- out + sum(centering^2)
463
+        out <- out - as.numeric(x0 %*% extra)
464
+    }
465
+
466
+    out
467
+}
468
+
469
+#' @export
470
+#' @importFrom Matrix tcrossprod t
471
+#' @importFrom DelayedArray seed DelayedArray sweep
472
+setMethod("tcrossprod", c("ScaledMatrix", "ANY"), function(x, y) {
473
+    if (is.null(dim(y))) { # for consistency with base::tcrossprod.
474
+        stop("non-conformable arguments")
475
+    }
476
+
477
+    x_seed <- seed(x)
478
+    if (is_transposed(x_seed)) {
479
+        out <- t(.leftmult_ScaledMatrix(y, x_seed))
480
+    } else {
481
+        out <- .righttcp_ScaledMatrix(x_seed, y)
482
+    }
483
+    DelayedArray(out)
484
+})
485
+
486
+#' @importFrom Matrix tcrossprod
487
+.righttcp_ScaledMatrix <- function(x_seed, y) {
488
+    if (use_scale(x_seed)) {
489
+        # 'y' cannot be a vector anymore, due to the check above.
490
+        y <- sweep(y, 2, get_scale(x_seed), "/", check.margin=FALSE)
491
+    }
492
+
493
+    out <- as.matrix(tcrossprod(get_matrix2(x_seed), y))
494
+
495
+    if (use_center(x_seed)) {
496
+        out <- sweep(out, 2, as.numeric(tcrossprod(get_center(x_seed), y)), "-", check.margin=FALSE)
497
+    }
498
+
499
+    out
500
+}
501
+
502
+#' @export
503
+#' @importFrom Matrix tcrossprod t
504
+#' @importFrom DelayedArray seed DelayedArray
505
+setMethod("tcrossprod", c("ANY", "ScaledMatrix"), function(x, y) {
506
+    y_seed <- seed(y) 
507
+    if (is_transposed(y_seed)) {
508
+        out <- .leftmult_ScaledMatrix(x, y_seed)
509
+    } else {
510
+        out <- .lefttcp_ScaledMatrix(x, y_seed)
511
+    }
512
+    DelayedArray(out)
513
+})
514
+
515
+#' @importFrom Matrix tcrossprod 
516
+.lefttcp_ScaledMatrix <- function(x, y_seed) {
517
+    if (use_scale(y_seed)) {
518
+        if (is.null(dim(x))) {
519
+            x <- x / get_scale(y_seed)
520
+        } else { 
521
+            x <- sweep(x, 2, get_scale(y_seed), "/", check.margin=FALSE)
522
+        }
523
+    }
524
+
525
+    out <- as.matrix(tcrossprod(x, get_matrix2(y_seed)))
526
+
527
+    if (use_center(y_seed)) {
528
+        out <- out - as.numeric(x %*% get_center(y_seed))
529
+    }
530
+    out
531
+}
532
+
533
+#' @export
534
+#' @importFrom Matrix tcrossprod
535
+#' @importFrom DelayedArray DelayedArray seed
536
+setMethod("tcrossprod", c("ScaledMatrix", "ScaledMatrix"), function(x, y) {
537
+    x_seed <- seed(x)
538
+    y_seed <- seed(y)
539
+    res <- .dual_mult_dispatcher(x_seed, y_seed, is_transposed(x_seed), !is_transposed(y_seed))
540
+    DelayedArray(res)
541
+})
542
+
543
+###################################
544
+# Extra code for corner-case calculations of the transposed cross-product.
545
+
546
+#' @importFrom DelayedArray seed DelayedArray
547
+.update_scale <- function(x, s) {
548
+    x_seed <- seed(x)
549
+    if (use_scale(x_seed)) {
550
+        s <- s * get_scale(x_seed)
551
+    }
552
+    x_seed@scale <- s
553
+    x_seed@use_scale <- TRUE
554
+    DelayedArray(x_seed)
555
+}
556
+
557
+#' @importFrom Matrix tcrossprod 
558
+#' @importFrom methods is
559
+#' @importFrom DelayedArray seed
560
+.internal_tcrossprod <- function(x, scale.) 
561
+# Computes tcrossprod(sweep(x, 2, scale, "/")) when 'x' is a matrix-like object.
562
+# 'scale' can be assumed to be non-NULL here.
563
+# This will always return a dense ordinary matrix.
564
+{
565
+    if (!is(x, "ScaledMatrix")) {
566
+        x <- sweep(x, 2, scale., "/", check.margin=FALSE) 
567
+        return(as.matrix(tcrossprod(x)))
568
+    }
569
+
570
+    x_seed <- seed(x)
571
+    if (!is_transposed(x_seed)) {
572
+        x <- .update_scale(x, scale.)
573
+        return(as.matrix(tcrossprod(x)))
574
+    }
575
+
576
+    inner <- get_matrix2(x_seed)
577
+    if (is(inner, "ScaledMatrix")) {
578
+        if (is_transposed(seed(inner))) {
579
+            component1 <- as.matrix(crossprod(.update_scale(inner, scale.)))
580
+        } else {
581
+            component1 <- .internal_tcrossprod(t(inner), scale.) # recurses. 
582
+        }
583
+    } else {
584
+        component1 <- as.matrix(crossprod(inner/scale.))
585
+    }
586
+           
587
+    if (use_center(x_seed)) {
588
+        centering <- get_center(x_seed)
589
+        component2 <- .internal_mult_special(centering, scale., inner)
590
+        component3 <- t(component2)
591
+        component4 <- outer(centering, centering) * sum(1/scale.^2)
592
+        final <- (component1 - component2) + (component4 - component3)
593
+    } else {
594
+        final <- component1
595
+    }
596
+
597
+    if (use_scale(x_seed)) {
598
+        x.scale <- get_scale(x_seed)
599
+        final <- final / x.scale
600
+        final <- sweep(final, 2, x.scale, "/", check.margin=FALSE) 
601
+    }
602
+
603
+    final 
604
+}
605
+
606
+#' @importFrom methods is
607
+#' @importFrom DelayedArray seed
608
+.internal_mult_special <- function(center, scale., Z)
609
+# Computes C^T * S^2 * Z where C is a matrix of 'centers' copied byrow=TRUE;
610
+# S is a diagonal matrix filled with '1/scale'; and 'Z' is a ScaledMatrix.
611
+# This will always return a dense ordinary matrix.
612
+{
613
+    if (!is(Z, "ScaledMatrix")) {
614
+        return(outer(center, colSums(Z / scale.^2)))
615
+    }
616
+    
617
+    Z_seed <- seed(Z)
618
+    if (is_transposed(Z_seed)) {
619
+        Z <- .update_scale(Z, scale.^2)
620
+        return(outer(center, colSums(Z)))
621
+    }
622
+
623
+    output <- .internal_mult_special(center, scale., get_matrix2(Z_seed)) # recurses.
624
+
625
+    if (use_center(Z_seed)) {
626
+        output <- output - outer(center, get_center(Z_seed)) * sum(1/scale.^2)
627
+    }
628
+
629
+    if (use_scale(Z_seed)) {
630
+        output <- sweep(output, 2, get_scale(Z_seed), "/")
631
+    }
632
+
633
+    output
634
+}
0 635
new file mode 100644
... ...
@@ -0,0 +1,112 @@
1
+% Generated by roxygen2: do not edit by hand
2
+% Please edit documentation in R/ScaledMatrix.R
3
+\docType{class}
4
+\name{ScaledMatrix}
5
+\alias{ScaledMatrix}
6
+\alias{ScaledMatrixSeed}
7
+\alias{ScaledMatrixSeed-class}
8
+\alias{dim,ScaledMatrixSeed-method}
9
+\alias{dimnames,ScaledMatrixSeed-method}
10
+\alias{extract_array,ScaledMatrixSeed-method}
11
+\alias{DelayedArray,ScaledMatrixSeed-method}
12
+\alias{show,ScaledMatrixSeed-method}
13
+\alias{ScaledMatrix-class}
14
+\alias{dimnames<-,ScaledMatrix,ANY-method}
15
+\alias{t,ScaledMatrix-method}
16
+\alias{[,ScaledMatrix,ANY,ANY,ANY-method}
17
+\alias{colSums,ScaledMatrix-method}
18
+\alias{rowSums,ScaledMatrix-method}
19
+\alias{colMeans,ScaledMatrix-method}
20
+\alias{rowMeans,ScaledMatrix-method}
21
+\alias{\%*\%,ANY,ScaledMatrix-method}
22
+\alias{\%*\%,ScaledMatrix,ANY-method}
23
+\alias{\%*\%,ScaledMatrix,ScaledMatrix-method}
24
+\alias{crossprod,ScaledMatrix,missing-method}
25
+\alias{crossprod,ScaledMatrix,ANY-method}
26
+\alias{crossprod,ANY,ScaledMatrix-method}
27
+\alias{crossprod,ScaledMatrix,ScaledMatrix-method}
28
+\alias{tcrossprod,ScaledMatrix,missing-method}
29
+\alias{tcrossprod,ScaledMatrix,ANY-method}
30
+\alias{tcrossprod,ANY,ScaledMatrix-method}
31
+\alias{tcrossprod,ScaledMatrix,ScaledMatrix-method}
32
+\title{The ScaledMatrix class}
33
+\usage{
34
+ScaledMatrix(x, center = NULL, scale = NULL)
35
+}
36
+\arguments{
37
+\item{x}{A matrix or any matrix-like object (e.g., from the \pkg{Matrix} package).
38
+
39
+This can alternatively be a ScaledMatrixSeed, in which case any values of \code{center} and \code{scale} are ignored.}
40
+
41
+\item{center}{A numeric vector of length equal to \code{ncol(x)}, where each element is to be subtracted from the corresponding column of \code{x}.
42
+A \code{NULL} value indicates that no subtraction is to be performed.
43
+Alternatively \code{TRUE}, in which case it is set to the column means of \code{x}.}
44
+
45
+\item{scale}{A numeric vector of length equal to \code{ncol(x)}, where each element is to divided from the corresponding column of \code{x} (after subtraction).
46
+A \code{NULL} value indicates that no division is to be performed.
47
+Alternatively \code{TRUE}, in which case it is set to the column-wise root-mean-squared differences from \code{center} 
48
+(interpretable as standard deviations if \code{center} is set to the column means, see \code{\link{scale}} for commentary).}
49
+}
50
+\value{
51
+The \code{ScaledMatrixSeed} constructor will return a ScaledMatrixSeed object.
52
+
53
+The \code{ScaledMatrix} constructor will return a ScaledMatrix object equivalent to \code{t((t(x) - center)/scale)}.
54
+}
55
+\description{
56
+Defines the ScaledMatrixSeed and ScaledMatrix classes and their associated methods.
57
+These classes support delayed centering and scaling of the columns in the same manner as \code{\link{scale}},
58
+but preserving the original data structure for more efficient operations like matrix multiplication.
59
+}
60
+\section{Methods for ScaledMatrixSeed objects}{
61
+
62
+ScaledMatrixSeed objects are implemented as \linkS4class{DelayedMatrix} backends.
63
+They support standard operations like \code{dim}, \code{dimnames} and \code{extract_array}.
64
+
65
+Passing a ScaledMatrixSeed object to the \code{\link{DelayedArray}} constructor will create a ScaledMatrix object.
66
+
67
+It is possible for \code{x} to contain a ScaledMatrix, thus nesting one ScaledMatrix inside another.
68
+This can occasionally be useful in combination with transposition to achieve centering/scaling in both dimensions.
69
+}
70
+
71
+\section{Methods for ScaledMatrix objects}{
72
+
73
+ScaledMatrix objects are derived from \linkS4class{DelayedMatrix} objects and support all of valid operations on the latter.
74
+Several functions are specialized for greater efficiency when operating on ScaledMatrix instances, including:
75
+\itemize{
76
+    \item Subsetting, transposition and replacement of row/column names.
77
+        These will return a new ScaledMatrix rather than a DelayedMatrix.
78
+    \item Matrix multiplication via \code{\%*\%}, \code{crossprod} and \code{tcrossprod}.
79
+        These functions will return a DelayedMatrix.
80
+    \item Calculation of row and column sums and means by \code{colSums}, \code{rowSums}, etc. 
81
+}
82
+
83
+All other operations applied to a ScaledMatrix will use the underlying \pkg{DelayedArray} machinery.
84
+Unary or binary operations will generally create a new DelayedMatrix instance containing a ScaledMatrixSeed.
85
+
86
+Tranposition can effectively be used to allow centering/scaling on the rows if the input \code{x} is transposed.
87
+}
88
+
89
+\section{Efficiency vs precision}{
90
+
91
+The raison d'etre of the ScaledMatrix is that it can offer faster matrix multiplication by avoiding the \pkg{DelayedArray} block processing.
92
+This is done by refactoring the scaling/centering operations to use the (hopefully more efficient) multiplication operator of the original matrix \code{x}.
93
+Unfortunately, the speed-up comes at the cost of increasing the risk of catastrophic cancellation.
94
+The procedure requires subtraction of one large intermediate number from another to obtain the values of the final matrix product.
95
+This could result in a loss of numerical precision that compromises the accuracy of downstream algorithms. 
96
+In practice, this does not seem to be a major concern though one should be careful if the input \code{x} contains very large positive/negative values.
97
+}
98
+
99
+\examples{
100
+library(Matrix)
101
+y <- ScaledMatrix(rsparsematrix(10, 20, 0.1), 
102
+    center=rnorm(20), scale=1+runif(20))
103
+y
104
+
105
+crossprod(y)
106
+tcrossprod(y)
107
+y \%*\% rnorm(20)
108
+
109
+}
110
+\author{
111
+Aaron Lun
112
+}
0 113
new file mode 100644
... ...
@@ -0,0 +1,3 @@
1
+library(testthat)
2
+library(ScaledMatrix)
3
+test_check("ScaledMatrix")
0 4
new file mode 100644
... ...
@@ -0,0 +1,76 @@
1
+scale_and_center <- function(y, ref, code) {
2
+    center <- scale <- NULL
3
+
4
+    if (code==1L) {
5
+        center <- colMeans(ref)
6
+        scale <- runif(ncol(ref))
7
+        ref <- scale(ref, center=center, scale=scale)
8
+    } else if (code==2L) {
9
+        center <- rnorm(ncol(ref))
10
+        ref <- scale(ref, center=center, scale=FALSE)
11
+    } else if (code==3L) {
12
+        scale <- runif(ncol(ref))
13
+        ref <- scale(ref, center=FALSE, scale=scale)
14
+    }
15
+
16
+    # Getting rid of excess attributes.
17
+    attr(ref, "scaled:center") <- NULL
18
+    attr(ref, "scaled:scale") <- NULL
19
+
20
+    def <- ScaledMatrix(y, center=center, scale=scale)
21
+    list(def=def, ref=ref)
22
+}
23
+
24
+spawn_scenarios_basic <- function(NR, NC, CREATOR, REALIZER) {
25
+    output <- vector("list", 8)
26
+    counter <- 1L
27
+
28
+    for (trans in c(FALSE, TRUE)) {
29
+        for (it in 1:4) {
30
+            if (trans) { 
31
+                # Ensure output matrix has NR rows and NC columns after t().
32
+                y <- CREATOR(NC, NR)
33
+            } else {
34
+                y <- CREATOR(NR, NC)
35
+            }
36
+            ref <- REALIZER(y) 
37
+
38
+            adjusted <- scale_and_center(y, ref, it) 
39
+            if (trans) {
40
+                adjusted$def <- t(adjusted$def)
41
+                adjusted$ref <- t(adjusted$ref)
42
+            }
43
+            
44
+            output[[counter]] <- adjusted
45
+            counter <- counter+1L
46
+        }
47
+    }
48
+    output
49
+}
50
+
51
+spawn_scenarios <- function(NR=50, NC=20) {
52
+    c(
53
+        spawn_scenarios_basic(NR, NC,
54
+            CREATOR=function(r, c) {
55
+                matrix(rnorm(r*c), ncol=c)
56
+            }, 
57
+            REALIZER=identity
58
+        ),
59
+        spawn_scenarios_basic(NR, NC, 
60
+            CREATOR=function(r, c) {
61
+                Matrix::rsparsematrix(r, c, 0.1)
62
+            },
63
+            REALIZER=as.matrix
64
+        )
65
+    )
66
+}
67
+
68
+expect_equal_product <- function(x, y) {
69
+    expect_s4_class(x, "DelayedMatrix")
70
+    X <- as.matrix(x)
71
+
72
+    # standardize NULL dimnames.
73
+    if (all(lengths(dimnames(X))==0L)) dimnames(X) <- NULL 
74
+    if (all(lengths(dimnames(y))==0L)) dimnames(y) <- NULL 
75
+    expect_equal(X, y)
76
+}
0 77
new file mode 100644
... ...
@@ -0,0 +1,101 @@
1
+# Tests the ScaledMatrix implementation.
2
+# library(testthat); library(ScaledMatrix); source("setup.R"); source("test-class.R")
3
+
4
+set.seed(100001)
5
+test_that("ScaledMatrix utility functions work as expected", {
6
+    possibles <- spawn_scenarios()
7
+    for (test in possibles) {
8
+        expect_s4_class(test$def, "ScaledMatrix")
9
+        expect_identical(test$def, ScaledMatrix(DelayedArray::seed(test$def)))
10
+
11
+        expect_identical(dim(test$def), dim(test$ref))
12
+        expect_identical(extract_array(test$def, list(1:10, 1:10)), test$ref[1:10, 1:10])
13
+        expect_identical(extract_array(test$def, list(1:10, NULL)), test$ref[1:10,])
14
+        expect_identical(extract_array(test$def, list(NULL, 1:10)), test$ref[,1:10])
15
+        expect_identical(as.matrix(test$def), test$ref)
16
+
17
+        expect_equal(rowSums(test$def), rowSums(test$ref))
18
+        expect_equal(colSums(test$def), colSums(test$ref))
19
+        expect_equal(rowMeans(test$def), rowMeans(test$ref))
20
+        expect_equal(colMeans(test$def), colMeans(test$ref))
21
+
22
+        tdef <- t(test$def)
23
+        expect_s4_class(tdef, "ScaledMatrix") # still a DefMat!
24
+        expect_identical(t(tdef), test$def)
25
+        expect_identical(as.matrix(tdef), t(test$ref))
26
+
27
+        # Checking column names getting and setting.
28
+        spawn_names <- sprintf("THING_%i", seq_len(ncol(test$def)))
29
+        colnames(test$def) <- spawn_names
30
+        expect_identical(spawn_names, colnames(test$def))
31
+        expect_s4_class(test$def, "ScaledMatrix") # still a DefMat!
32
+    }
33
+})
34
+
35
+set.seed(10000101)
36
+test_that("ScaledMatrix silly inputs work as expected", {
37
+    default <- ScaledMatrix()
38
+    expect_identical(dim(default), c(0L, 0L))
39
+    val <- as.matrix(default)
40
+    dimnames(val) <- NULL
41
+    expect_identical(val, matrix(0,0,0))
42
+
43
+    # Checking erronious inputs.
44
+    y <- matrix(rnorm(400), ncol=20)
45
+    expect_error(ScaledMatrix(y, center=1), "length of 'center' must equal")
46
+    expect_error(ScaledMatrix(y, scale=1), "length of 'scale' must equal")
47
+})
48
+
49
+set.seed(1000011)
50
+test_that("ScaledMatrix subsetting works as expected", {
51
+    expect_identical_and_defmat <- function(x, y) {
52
+        expect_s4_class(x, "ScaledMatrix") # class is correctly preserved by direct seed modification.
53
+        expect_identical(as.matrix(x), y)
54
+    }
55
+
56
+    possibles <- spawn_scenarios()
57
+    for (test in possibles) {
58
+        i <- sample(nrow(test$def))
59
+        j <- sample(ncol(test$def))
60
+        expect_identical_and_defmat(test$def[i,], test$ref[i,])
61
+        expect_identical_and_defmat(test$def[,j], test$ref[,j])
62
+        expect_identical_and_defmat(test$def[i,j], test$ref[i,j])
63
+        
64
+        # Works with zero dimensions.
65
+        expect_identical_and_defmat(test$def[0,], test$ref[0,])
66
+        expect_identical_and_defmat(test$def[,0], test$ref[,0])
67
+        expect_identical_and_defmat(test$def[0,0], test$ref[0,0])
68
+        
69
+        # Dimension dropping works as expected.
70
+        expect_identical(test$def[i[1],], test$ref[i[1],])
71
+        expect_identical(test$def[,j[1]], test$ref[,j[1]])
72
+        expect_identical_and_defmat(test$def[i[1],drop=FALSE], test$ref[i[1],,drop=FALSE])
73
+        expect_identical_and_defmat(test$def[,j[1],drop=FALSE], test$ref[,j[1],drop=FALSE])
74
+
75
+        # Transposition with subsetting works as expected.
76
+        alt <- t(test$def)
77
+        expect_identical(t(alt[,i]), test$def[i,])
78
+        expect_identical(t(alt[j,]), test$def[,j])
79
+
80
+        # Subsetting behaves with column names.
81
+        spawn_names <- sprintf("THING_%i", seq_len(ncol(test$def)))
82
+        colnames(test$def) <- spawn_names
83
+        colnames(test$ref) <- spawn_names
84
+        ch <- sample(spawn_names)
85
+        expect_identical_and_defmat(test$def[,ch], test$ref[,ch])
86
+    }
87
+})
88
+
89
+test_that("DelayedMatrix wrapping works", {
90
+    possibles <- spawn_scenarios(80, 50)
91
+    for (test in possibles) {
92
+        expect_equal_product(test$def+1, test$ref+1)
93
+
94
+        v <- rnorm(nrow(test$def))
95
+        expect_equal_product(test$def+v, test$ref+v)
96
+        expect_equal_product(test$def*v, test$ref*v)
97
+
98
+        w <- rnorm(ncol(test$def))
99
+        expect_equal_product(sweep(test$def, 2, w, "*"), sweep(test$ref, 2, w, "*"))
100
+    }
101
+})
0 102
new file mode 100644
... ...
@@ -0,0 +1,381 @@
1
+# Tests the ScaledMatrix implementation.
2
+# library(testthat); library(ScaledMatrix); source("setup.R"); source("test-mult.R")
3
+
4
+##########################
5
+# Defining a class that can't do anything but get multiplied.
6
+# This checks that there isn't any hidden DelayedArray realization 
7
+# happening, which would give the same results but slower.
8
+
9
+setClass("CrippledMatrix", slots=c(x="matrix"))
10
+
11
+setMethod("dim", c("CrippledMatrix"), function(x) dim(x@x))
12
+
13
+setMethod("colSums", c("CrippledMatrix"), function(x) colSums(x@x))
14
+
15
+setMethod("rowSums", c("CrippledMatrix"), function(x) rowSums(x@x))
16
+
17
+setMethod("sweep", c("CrippledMatrix"), function (x, MARGIN, STATS, FUN = "-", check.margin = TRUE, ...) {
18
+    sweep(x@x, MARGIN, STATS, FUN, check.margin, ...)
19
+})
20
+
21
+setMethod("%*%", c("CrippledMatrix", "ANY"), function(x, y) x@x %*% y)
22
+
23
+setMethod("%*%", c("ANY", "CrippledMatrix"), function(x, y) x %*% y@x)
24
+
25
+setMethod("crossprod", c("CrippledMatrix", "missing"), function(x, y) crossprod(x@x))
26
+
27
+setMethod("crossprod", c("CrippledMatrix", "ANY"), function(x, y) crossprod(x@x, y))
28
+
29
+setMethod("crossprod", c("ANY", "CrippledMatrix"), function(x, y) crossprod(x, y@x))
30
+
31
+setMethod("tcrossprod", c("CrippledMatrix", "missing"), function(x, y) tcrossprod(x@x))
32
+
33
+setMethod("tcrossprod", c("CrippledMatrix", "ANY"), function(x, y) tcrossprod(x@x, y))
34
+
35
+setMethod("tcrossprod", c("ANY", "CrippledMatrix"), function(x, y) tcrossprod(x, y@x))
36
+
37
+spawn_extra_scenarios <- function(NR=50, NC=20) {
38
+    c(
39
+        spawn_scenarios(NR, NC),
40
+        spawn_scenarios_basic(NR, NC, 
41
+            CREATOR=function(r, c) { 
42
+                new("CrippledMatrix", x=matrix(runif(NR*NC), ncol=NC))
43
+            }, 
44
+            REALIZER=function(x) x@x
45
+        )
46
+    )
47
+}
48
+
49
+##########################
50
+
51
+test_that("ScaledMatrix right multiplication works as expected", {
52
+    possibles <- spawn_extra_scenarios(100, 50)
53
+    for (test in possibles) {
54
+        ref.y <- test$ref
55
+        bs.y <- test$def
56
+
57
+        # Multiply by a vector.
58
+        z <- rnorm(ncol(ref.y))
59
+        expect_equal_product(bs.y %*% z, ref.y %*% z)
60
+
61
+        # Multiply by a matrix.
62
+        z <- matrix(rnorm(ncol(ref.y)*10), ncol=10)
63
+        expect_equal_product(bs.y %*% z, ref.y %*% z)
64
+
65
+        # Multiply by an empty matrix.
66
+        z <- matrix(0, ncol=0, nrow=ncol(ref.y))
67
+        expect_equal_product(bs.y %*% z, ref.y %*% z)
68
+    }
69
+})
70
+
71
+test_that("ScaledMatrix left multiplication works as expected", {
72
+    possibles <- spawn_extra_scenarios(50, 80)
73
+    for (test in possibles) {
74
+        ref.y <- test$ref
75
+        bs.y <- test$def
76
+
77
+        # Multiply by a vector.
78
+        z <- rnorm(nrow(ref.y))
79
+        expect_equal_product(z %*% bs.y, z %*% ref.y)
80
+
81
+        # Multiply by a matrix.
82
+        z <- matrix(rnorm(nrow(ref.y)*10), nrow=10)
83
+        expect_equal_product(z %*% bs.y, z %*% ref.y)
84
+
85
+        # Multiply by an empty matrix.
86
+        z <- matrix(0, nrow=0, ncol=nrow(ref.y))
87
+        expect_equal_product(z %*% bs.y, z %*% ref.y)
88
+    }
89
+})
90
+
91
+test_that("ScaledMatrix dual multiplication works as expected", {
92
+    # Not using the CrippledMatrix here; some scaling of the inner matrix is unavoidable
93
+    # when the inner matrix is _not_ a ScaledMatrix but is being multiplied by one.
94
+    possibles1 <- spawn_scenarios(10, 20)
95
+    for (test1 in possibles1) {
96
+        possibles2 <- spawn_scenarios(20, 15)
97
+        for (test2 in possibles2) {
98
+
99
+            expect_equal_product(test1$def %*% test2$def, test1$ref %*% test2$ref)
100
+
101
+            # Checking that zero-dimension behaviour is as expected.
102
+            expect_equal_product(test1$def[0,] %*% test2$def, test1$ref[0,] %*% test2$ref)
103
+            expect_equal_product(test1$def %*% test2$def[,0], test1$ref %*% test2$ref[,0])
104
+            expect_equal_product(test1$def[,0] %*% test2$def[0,], test1$ref[,0] %*% test2$ref[0,])
105
+            expect_equal_product(test1$def[0,] %*% test2$def[,0], test1$ref[0,] %*% test2$ref[,0])
106
+        }
107
+    }
108
+})
109
+
110
+##########################
111
+
112
+test_that("ScaledMatrix lonely crossproduct works as expected", {
113
+    possibles <- spawn_extra_scenarios(90, 30)
114
+    for (test in possibles) {
115
+        ref.y <- test$ref
116
+        bs.y <- test$def
117
+        expect_equal_product(crossprod(bs.y), crossprod(ref.y))
118
+    }
119
+})
120
+
121
+test_that("ScaledMatrix crossproduct from right works as expected", {
122
+    possibles <- spawn_extra_scenarios(60, 50)
123
+    for (test in possibles) {
124
+        ref.y <- test$ref
125
+        bs.y <- test$def
126
+
127
+        # Multiply by a vector.
128
+        z <- rnorm(nrow(ref.y))
129
+        expect_equal_product(crossprod(bs.y, z), crossprod(ref.y, z))
130
+
131
+        # Multiply by a matrix.
132
+        z <- matrix(rnorm(nrow(ref.y)*10), ncol=10)
133
+        expect_equal_product(crossprod(bs.y, z), crossprod(ref.y, z))
134
+
135
+        # Multiply by an empty matrix.
136
+        z <- matrix(0, ncol=0, nrow=nrow(ref.y))
137
+        expect_equal_product(crossprod(bs.y, z), crossprod(ref.y, z))
138
+    }
139
+})
140
+
141
+test_that("ScaledMatrix crossproduct from left works as expected", {
142
+    possibles <- spawn_extra_scenarios(40, 100)
143
+    for (test in possibles) {
144
+        ref.y <- test$ref
145
+        bs.y <- test$def
146
+
147
+        # Multiply by a vector.
148
+        z <- rnorm(nrow(ref.y))
149
+        expect_equal_product(crossprod(z, bs.y), crossprod(z, ref.y))
150
+
151
+        # Multiply by a matrix.
152
+        z <- matrix(rnorm(nrow(ref.y)*10), ncol=10)
153
+        expect_equal_product(crossprod(z, bs.y), crossprod(z, ref.y))
154
+
155
+        # Multiply by an empty matrix.
156
+        z <- matrix(0, ncol=0, nrow=nrow(ref.y))
157
+        expect_equal_product(crossprod(z, bs.y), crossprod(z, ref.y))
158
+    }
159
+})
160
+
161
+test_that("ScaledMatrix dual crossprod works as expected", {
162
+    possibles1 <- spawn_scenarios(20, 50)
163
+    for (test1 in possibles1) {
164
+        possibles2 <- spawn_scenarios(20, 15)
165
+        for (test2 in possibles2) {
166
+
167
+            expect_equal_product(crossprod(test1$def, test2$def), crossprod(test1$ref, test2$ref))
168
+
169
+            # Checking that zero-dimension behaviour is as expected.
170
+            expect_equal_product(crossprod(test1$def[,0], test2$def), crossprod(test1$ref[,0], test2$ref))
171
+            expect_equal_product(crossprod(test1$def, test2$def[,0]), crossprod(test1$ref, test2$ref[,0]))
172
+            expect_equal_product(crossprod(test1$def[0,], test2$def[0,]), crossprod(test1$ref[0,], test2$ref[0,]))
173
+        }
174
+    }
175
+})
176
+
177
+##########################
178
+
179
+test_that("ScaledMatrix lonely tcrossproduct works as expected", {
180
+    possibles <- spawn_extra_scenarios(50, 80)
181
+    for (test in possibles) {
182
+        ref.y <- test$ref
183
+        bs.y <- test$def
184
+        expect_equal_product(tcrossprod(bs.y), tcrossprod(ref.y))
185
+    }
186
+})
187
+
188
+test_that("ScaledMatrix tcrossproduct from right works as expected", {
189
+    possibles <- spawn_extra_scenarios(60, 70)
190
+    for (test in possibles) {
191
+        ref.y <- test$ref
192
+        bs.y <- test$def
193
+
194
+        # Multiply by a vector (this doesn't work).
195
+        z <- rnorm(ncol(ref.y))
196
+        expect_error(tcrossprod(bs.y, z), "non-conformable")
197
+        expect_error(tcrossprod(ref.y, z), "non-conformable")
198
+
199
+        # Multiply by a matrix.
200
+        z <- matrix(rnorm(ncol(ref.y)*10), nrow=10)
201
+        expect_equal_product(tcrossprod(bs.y, z), tcrossprod(ref.y, z))
202
+
203
+        # Multiply by an empty matrix.
204
+        z <- matrix(0, nrow=0, ncol=ncol(ref.y))
205
+        expect_equal_product(tcrossprod(bs.y, z), tcrossprod(ref.y, z))
206
+    }
207
+})
208
+
209
+test_that("ScaledMatrix tcrossproduct from left works as expected", {
210
+    possibles <- spawn_extra_scenarios(80, 50)
211
+    for (test in possibles) {
212
+        ref.y <- test$ref
213
+        bs.y <- test$def
214
+
215
+        # Multiply by a vector.
216
+        z <- rnorm(ncol(ref.y))
217
+        expect_equal_product(tcrossprod(z, bs.y), tcrossprod(z, ref.y))
218
+
219
+        # Multiply by a matrix.
220
+        z <- matrix(rnorm(ncol(ref.y)*10), nrow=10)
221
+        expect_equal_product(tcrossprod(z, bs.y), tcrossprod(z, ref.y))
222
+
223
+        # Multiply by an empty matrix.
224
+        z <- matrix(0, nrow=0, ncol=ncol(ref.y))
225
+        expect_equal_product(tcrossprod(z, bs.y), tcrossprod(z, ref.y))
226
+    }
227
+})
228
+
229
+test_that("ScaledMatrix dual tcrossprod works as expected", {
230
+    possibles1 <- spawn_scenarios(20, 50)
231
+    for (test1 in possibles1) {
232
+        possibles2 <- spawn_scenarios(25, 50)
233
+        for (test2 in possibles2) {
234
+
235
+            expect_equal_product(tcrossprod(test1$def, test2$def), tcrossprod(test1$ref, test2$ref))
236
+
237
+            # Checking that zero-dimension behaviour is as expected.
238
+            expect_equal_product(tcrossprod(test1$def[0,], test2$def), tcrossprod(test1$ref[0,], test2$ref))
239
+            expect_equal_product(tcrossprod(test1$def, test2$def[0,]), tcrossprod(test1$ref, test2$ref[0,]))
240
+            expect_equal_product(tcrossprod(test1$def[,0], test2$def[,0]), tcrossprod(test1$ref[,0], test2$ref[,0]))
241
+        }
242
+    }
243
+})
244
+
245
+##########################
246
+
247
+wrap_in_ScMat <- function(input, reference) 
248
+# Wrapping an input matrix in a ScaledMatrix.
249
+{
250
+    output <- vector("list", 8)
251
+    counter <- 1L
252
+
253
+    for (trans in c(FALSE, TRUE)) {
254
+        for (it in 1:4) {
255
+            if (trans) { 
256
+                y <- t(input)
257
+                ref <- t(reference)
258
+            } else {
259
+                ref <- reference
260
+                y <- input
261
+            }
262
+
263
+            adjusted <- scale_and_center(y, ref, it)
264
+            if (trans) {
265
+                adjusted$def <- t(adjusted$def)
266
+                adjusted$ref <- t(adjusted$ref)
267
+            }
268
+
269
+            output[[counter]] <- adjusted
270
+            counter <- counter+1L
271
+        }
272
+    }
273
+    output
274
+}
275
+
276
+test_that("nested ScaledMatrix works as expected", {
277
+    basic <- matrix(rnorm(400), ncol=20)
278
+
279
+    available <- list(list(def=basic, ref=basic))
280
+    for (nesting in 1:2) {
281
+        # Creating nested ScMats with and without scaling/centering/transposition.
282
+        next_available <- vector("list", length(available))
283
+        for (i in seq_along(available)) {
284
+            current <- available[[i]]
285
+            next_available[[i]] <- wrap_in_ScMat(current$def, current$ref)
286
+        }
287
+
288
+        # Testing each one of the newly created ScMats.
289
+        available <- unlist(next_available, recursive=FALSE)
290
+        for (i in seq_along(available)) {
291
+            test <- available[[i]]
292
+
293
+            # Coercion works.
294
+            expect_equal(as.matrix(test$def), test$ref)
295
+
296
+            # Basic stats work.
297
+            expect_equal(rowSums(test$ref), rowSums(test$def))
298
+            expect_equal(colSums(test$ref), colSums(test$def))
299
+
300
+            # Multiplication works.
301
+            y <- matrix(rnorm(20*2), ncol=2)
302
+            expect_equal_product(test$def %*% y, test$ref %*% y)
303
+            expect_equal_product(t(y) %*% test$def, t(y) %*% test$ref)
304
+
305
+            # Cross product.
306
+            y <- matrix(rnorm(20*2), ncol=2)
307
+            expect_equal_product(crossprod(test$def), crossprod(test$ref))
308
+            expect_equal_product(crossprod(test$def, y), crossprod(test$ref, y))
309
+            expect_equal_product(crossprod(y, test$def), crossprod(y, test$ref))
310
+
311
+            # Transposed cross product.
312
+            y <- matrix(rnorm(20*2), nrow=2) 
313
+            expect_equal_product(tcrossprod(test$def), tcrossprod(test$ref))
314
+            expect_equal_product(tcrossprod(test$def, y), tcrossprod(test$ref, y))
315
+            expect_equal_product(tcrossprod(y, test$def), tcrossprod(y, test$ref))
316
+        }
317
+    }
318
+})
319
+
320
+set.seed(1200001)
321
+test_that("deep testing of tcrossproduct internals: special mult", {
322
+    NR <- 20
323
+    NC <- 10
324
+    basic <- matrix(rnorm(NC*NR), ncol=NC)
325
+    c <- runif(NC)
326
+    s <- runif(NR)
327
+
328
+    ref <- t(matrix(c, NR, NC, byrow=TRUE)) %*% (basic/s^2)
329
+    out <- BiocSingular:::.internal_mult_special(c, s, basic)
330
+    expect_equal(ref, out)
331
+
332
+    available <- list(list(def=basic, ref=basic))
333
+    for (nesting in 1:2) {
334
+        # Creating nested ScMats with and without scaling/centering/transposition.
335
+        next_available <- vector("list", length(available))
336
+        for (i in seq_along(available)) {
337
+            current <- available[[i]]
338
+            next_available[[i]] <- wrap_in_ScMat(current$def, current$ref)
339
+        }
340
+
341
+        # Testing each one of the newly created nested ScMats.
342
+        available <- unlist(next_available, recursive=FALSE)
343
+        for (i in seq_along(available)) {
344
+            test <- available[[i]]
345
+            ref <- t(matrix(c, NR, NC, byrow=TRUE)) %*% (test$ref/s^2)
346
+            out <- BiocSingular:::.internal_mult_special(c, s, test$def)
347
+            expect_equal(ref, out)
348
+        }
349
+    }
350
+})
351
+
352
+set.seed(1200002)
353
+test_that("deep testing of tcrossproduct internals: scaled tcrossprod", {
354
+    NC <- 30
355
+    NR <- 15
356
+    s <- runif(NC) 
357
+    basic <- matrix(rnorm(NC*NR), ncol=NC)
358
+
359
+    ref <- crossprod(t(basic)/s)
360
+    out <- BiocSingular:::.internal_tcrossprod(basic, s)
361
+    expect_equal(ref, out)
362
+
363
+    available <- list(list(def=basic, ref=basic))
364
+    for (nesting in 1:2) {
365
+        # Creating nested ScMats with and without scaling/centering/transposition.
366
+        next_available <- vector("list", length(available))
367
+        for (i in seq_along(available)) {
368
+            current <- available[[i]]
369
+            next_available[[i]] <- wrap_in_ScMat(current$def, current$ref)
370
+        }
371
+
372
+        # Testing each one of the newly created nested ScMats.
373
+        available <- unlist(next_available, recursive=FALSE)
374
+        for (i in seq_along(available)) {
375
+            test <- available[[i]]
376
+            ref <- crossprod(t(test$ref)/s)
377
+            out <- BiocSingular:::.internal_tcrossprod(test$def, s)
378
+            expect_equal(ref, out)
379
+        }
380
+    }
381
+})
0 382
new file mode 100644
... ...
@@ -0,0 +1,25 @@
1
+# This tests the auto-scaling in the constructor.
2
+# library(testthat); library(ScaledMatrix); source("test-scale.R")
3
+
4
+stripscale <- function(mat, ...) {
5
+    out <- scale(mat, ...)
6
+    attr(out, "scaled:center") <- NULL
7
+    attr(out, "scaled:scale") <- NULL
8
+    out
9
+}
10
+
11
+test_that("ScaledMatrix mimics scale()", {
12
+    mat <- matrix(rnorm(10000), ncol=10)
13
+
14
+    out <- ScaledMatrix(mat, center=TRUE)
15
+    expect_identical(as.matrix(out), stripscale(mat, scale=FALSE))
16
+
17
+    out <- ScaledMatrix(mat, scale=TRUE)
18
+    expect_identical(as.matrix(out), stripscale(mat, center=FALSE))
19
+
20
+    out <- ScaledMatrix(mat, center=TRUE, scale=TRUE)
21
+    expect_identical(as.matrix(out), stripscale(mat))
22
+
23
+    out <- ScaledMatrix(mat)
24
+    expect_identical(as.matrix(out), mat)
25
+})
0 26
new file mode 100644
... ...
@@ -0,0 +1,128 @@
1
+---
2
+title: Using the `ScaledMatrix` class
3
+author: 
4
+- name: Aaron Lun
5
+  email: [email protected] 
6
+date: "Revised: 12 December 2020"
7
+output:
8
+  BiocStyle::html_document:
9
+    toc_float: true 
10
+package: ResidualMatrix
11
+vignette: >
12
+  %\VignetteIndexEntry{Using the ScaledMatrix}
13
+  %\VignetteEngine{knitr::rmarkdown}
14
+  %\VignetteEncoding{UTF-8}    
15
+---
16
+
17
+```{r, echo=FALSE, results="hide", message=FALSE}
18
+knitr::opts_chunk$set(error=FALSE, message=FALSE, warning=FALSE)
19
+```
20
+
21
+# Overview
22
+
23
+The `ScaledMatrix` provides yet another method of running `scale()` on a matrix.
24
+In other words, these three operations are equivalent:
25
+
26
+```{r}
27
+mat <- matrix(rnorm(10000), ncol=10)
28
+
29
+smat1 <- scale(mat)
30
+head(smat1)
31
+
32
+library(DelayedArray)
33
+smat2 <- scale(DelayedArray(mat))
34
+head(smat2)
35
+
36
+library(ScaledMatrix)
37
+smat3 <- ScaledMatrix(mat, center=TRUE, scale=TRUE)
38
+head(smat3)
39
+```
40
+
41
+The biggest difference lies in how they behave in downstream matrix operations.
42
+
43
+- `smat1` is an ordinary matrix, with the scaled and centered values fully realized in memory.
44
+Nothing too unusual here.
45
+- `smat2` is a `DelayedMatrix` and undergoes block processing whereby chunks are realized and operated on, one at a time.
46
+This sacrifices speed for greater memory efficiency by avoiding a copy of the entire matrix. 
47
+In particular, it preserves the structure of the original `mat`, e.g., from a sparse or file-backed representation. 
48
+- `smat3` is a `ScaledMatrix` that refactors certain operations so that they can be applied to the original `mat` without any scaling or centering.
49
+This takes advantage of the original data structure to speed up matrix multiplication and row/column sums,
50
+albeit at the cost of numerical precision.
51
+
52
+# Matrix multiplication
53
+
54
+Given an original matrix $\mathbf{X}$ with $n$ columns, a vector of column centers $\mathbf{c}$ and a vector of column scaling values $\mathbf{s}$, 
55
+our scaled matrix can be written as:
56
+
57
+$$
58
+\mathbf{Y} = (\mathbf{X} - \mathbf{c} \cdot \mathbf{1}_n^T) \mathbf{S}
59
+$$
60
+
61
+where $\mathbf{S} = \text{diag}(s_1^{-1}, ..., s_n^{-1})$.
62
+If we wanted to right-multiply it with another matrix $\mathbf{A}$, we would have:
63
+
64
+$$
65
+\mathbf{YA} = \mathbf{X}\mathbf{S}\mathbf{A} - \mathbf{c} \cdot \mathbf{1}_n^T \mathbf{S}\mathbf{A}
66
+$$
67
+
68
+The right-most expression is simply the outer product of $\mathbf{c}$ with the column sums of $\mathbf{SA}$.
69
+More important is the fact that we can use the matrix multiplication operator for $\mathbf{X}$ with $\mathbf{SA}$,
70
+as this allows us to use highly efficient algorithms for certain data representations, e.g., sparse matrices.
71
+
72
+```{r}
73
+library(Matrix)
74
+mat <- rsparsematrix(20000, 10000, density=0.01)
75
+smat <- ScaledMatrix(mat, center=TRUE, scale=TRUE)
76
+
77
+blob <- matrix(runif(ncol(mat) * 5), ncol=5)
78
+system.time(out <- smat %*% blob)
79
+
80
+# The slower way with block processing.
81
+da <- scale(DelayedArray(mat))
82
+system.time(out2 <- da %*% blob)
83
+```
84
+
85
+The same logic applies for left-multiplication and cross-products.
86
+This allows us to easily speed up high-level operations involving matrix multiplication by just switching to a `ScaledMatrix`,
87
+e.g., in approximate PCA algorithms from the `r Biocpkg("BiocSingular")` package.
88
+
89
+# Other utilities
90
+
91
+Row and column sums are special cases of matrix multiplication and can be computed quickly:
92
+
93
+```{r}
94
+system.time(rowSums(smat))
95
+system.time(rowSums(da))
96
+```
97
+
98
+Subsetting, transposition and renaming of the dimensions are all supported without loss of the `ScaledMatrix` representation:
99
+
100
+```{r}
101
+smat[,1:5]
102
+t(smat)
103
+rownames(smat) <- paste0("GENE_", 1:20000)
104
+smat
105
+```
106
+
107
+Other operations will cause the `ScaledMatrix` to collapse to the general `DelayedMatrix` representation, after which point block processing will be used.
108
+
109
+```{r}
110
+smat + 1
111
+```
112
+
113
+# Caveats 
114
+
115
+For most part, the implementation of the multiplication assumes that the $\mathbf{A}$ matrix and the matrix product are small compared to $\mathbf{X}$.
116
+It is also possible to multiply two `ScaledMatrix`es together if the underlying matrices have efficient operators for their product.
117
+However, if this is not the case, the `ScaledMatrix` offers little benefit for increased overhead.
118
+
119
+It is also worth noting that this speed-up is not entirely free. 
120
+The expression above involves subtracting two matrix with potentially large values, which runs the risk of catastrophic cancellation.
121
+In most practical applications, though, this does not seem to be a major concern, 
122
+especially as most values (e.g., log-normalized expression matrices) lie close to zero anyway.
123
+
124
+# Session information {-}
125
+
126
+```{r}
127
+sessionInfo()
128
+```