* add selectivity ratio
* replace vip_summary with feature_importance
renamed; the chart now allows vip, sr and sr_pvalues to be plotted
* add equal_split model
random subsets are drawn to generate training sets with equal group sizes (see the sketch below)
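A minimal base-R sketch of the idea (not the structToolbox implementation; `equal_split_idx` is a hypothetical helper):

```r
# Conceptual sketch only: draw the same number of samples from each group so
# that the training subset contains equal group sizes.
equal_split_idx = function(group, n_per_group) {
    unlist(lapply(split(seq_along(group), group), function(idx) {
        sample(idx, min(n_per_group, length(idx)))
    }))
}
# e.g. idx = equal_split_idx(iris$Species, 25); train = iris[idx, , drop = FALSE]
```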
* plot 1 - p-value
so that the "best" feature corresponds to the maximum value
* add resample iterator
subsample at random over a number of iterations, with the option to use
different kinds of splitting method; a corresponding chart has been added (see the sketch below)
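A minimal base-R sketch of the resampling idea (not the structToolbox iterator; `resample_metric` and the lm model are placeholders):

```r
# Conceptual sketch only: repeat a random training/test split a number of
# times and record a test-set metric for each iteration.
resample_metric = function(d, n_iter = 10, p_train = 0.75) {
    vapply(seq_len(n_iter), function(i) {
        idx  = sample(nrow(d), size = floor(p_train * nrow(d)))
        fit  = lm(y ~ x, data = d[idx, ])
        pred = predict(fit, newdata = d[-idx, ])
        sqrt(mean((d$y[-idx] - pred)^2))   # test-set RMSE for this iteration
    }, FUN.VALUE = numeric(1))
}
# e.g. rmse = resample_metric(data.frame(x = rnorm(50), y = rnorm(50)), n_iter = 20)
```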
* allow use of list() for factor_name
* force apply not to simplify its output, to guarantee that a list is returned (see the sketch below)
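For illustration only (base R, not the structToolbox code), the difference between a simplifying and a non-simplifying apply:

```r
# sapply() silently simplifies a list of equal-length vectors into a matrix;
# lapply(), or sapply(..., simplify = FALSE), always returns a list.
as_matrix = sapply(1:3, function(i) c(a = i, b = i^2))                    # matrix
as_list_1 = sapply(1:3, function(i) c(a = i, b = i^2), simplify = FALSE)  # list
as_list_2 = lapply(1:3, function(i) c(a = i, b = i^2))                    # list
```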
* update example
* add collect parameter
collect gathers the requested model output over all iterations into a list (WORK IN PROGRESS)
* add collection of multiple outputs of model sequence
* plot regression coefficients on the right-hand side
* match outputs of xval for use with grid search etc
* specify levels when converting predictions to a factor (see the sketch below)
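For illustration only (base R): passing the training levels explicitly keeps classes that are absent from the predictions, so confusion matrices and metrics stay aligned:

```r
# Without explicit levels, a class that never appears in the predictions
# would be dropped from the resulting factor.
y_train = factor(c("control", "treated", "treated"))
pred    = c("treated", "treated", "treated")
table(factor(pred, levels = levels(y_train)))  # keeps a zero count for "control"
```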
* change PLSDA to inherit from PLSR
rename some charts to be compatible with both PLSR and PLSDA
* allow y-block column selection
* re-assign y output after PLSR with factor
* update vignettes wrt PLS changes
* update documentation
* update R version to 4.1
* update documentation
* update documentation
* update scatter plot
- new scatter chart object
- used by PCA scores, PLSR/PLSDA scores
- other charts updated to reflect changes in scores plots where necessary
- added ycol param to plots for when y-block is a matrix
* add url to github
* add plsda scores alias
- plsda_scores_plot and pls_scores_plot do the same thing
Included for backwards compatibility
- added components back as a parameter for scores plots for backwards compatibility
* fix broken example
* fix broken tests
- scores is now returned as a DatasetExperiment object not a data.frame
* Update data_analysis_omics_using_the_structtoolbox.Rmd
- wrt changes in scores plots
* update documentation
* fix colnames for Y matrix
* fix base=10 regardless of input (see #15)
the class constructor was always setting base to 10 instead of using the input value
* merge bug fix 1.01 into dev (#19)
* bug fix issue #7
Correctly re-order the sample_meta column for colouring samples in the dendrogram plot
* version bump
bug fix issue #7
* fix for https://github.com/computational-metabolomics/structToolbox/issues/18 (#20)
Correctly reorder the factor labels so that the control group always ends up in the denominator of the fold change calculation (see the sketch below).
* fix for https://github.com/computational-metabolomics/structToolbox/issues/18
fixed incorrect length check on matching class labels.
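For illustration only (base R), forcing the control group to be the reference level so it ends up in the denominator of treated/control fold changes:

```r
# relevel() moves the chosen group to the first position in the factor levels.
g = factor(c("treated", "control", "treated", "control"))
g = relevel(g, ref = "control")
levels(g)  # "control" "treated"
```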
* Issue 17 ttest factor (#21)
* convert to factor if not one already
fix for issue #17
* update roxygen version
* fix for issue #9 (#22)
changed from lapply to vapply and used drop=FALSE to ensure compatibility with a single factor (see the sketch below).
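For illustration only (base R), the two behaviours this fix relies on:

```r
# With a single column, "[" drops a data.frame to a vector unless drop = FALSE
# is used; vapply() also enforces the expected return type and length.
sm = data.frame(batch = c("a", "a", "b"))
sm[, "batch"]                    # character vector (dimension dropped)
sm[, "batch", drop = FALSE]      # still a one-column data.frame
vapply(sm, function(x) length(unique(x)), FUN.VALUE = numeric(1))
```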
* allow user to set lambda (#24)
- lambda changed to an input parameter; NULL uses the pmp optimisation (see the sketch below)
- model_predict now uses the set value of lambda, or lambda_opt if used.
- documentation updated
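A minimal sketch of the intended parameter logic (`choose_lambda` is a hypothetical helper, not the structToolbox code):

```r
# A user-supplied lambda takes priority; NULL falls back to the value
# selected by the pmp optimisation during training.
choose_lambda = function(lambda = NULL, lambda_opt) {
    if (is.null(lambda)) lambda_opt else lambda
}
choose_lambda(NULL, 0.2)  # 0.2, the optimised value
choose_lambda(0.5, 0.2)   # 0.5, the user-supplied value
```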
* Feature non parametric fold change (#26)
* add "median" method
based on DOI: 10.1080/00949650212140; fold changes equivalent to using medians, and the corresponding confidence intervals, can now be calculated
* update documentation
* update median method
now correctly calculates the ratio of medians
* use wilcox for paired median intervals
make use of wilcox.test to estimate confidence intervals for the median when using the median method with paired samples (see the sketch below)
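For illustration only (base R, not the structToolbox implementation), a median-based fold change with a nonparametric interval for paired samples:

```r
# Ratio of medians as the fold change; wilcox.test() with conf.int = TRUE
# gives an interval for the paired differences on the log2 scale, which is
# transformed back to the fold-change scale.
set.seed(1)
a = rlnorm(20, meanlog = 1)
b = rlnorm(20, meanlog = 0)
fc = median(a) / median(b)
wt = wilcox.test(log2(a), log2(b), paired = TRUE, conf.int = TRUE)
2^wt$conf.int  # approximate interval on the fold-change scale
```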
* Issue 23 filter by name (#27)
* fix for #23
moved all model_apply functionality to model_predict so that model_train and model_predict can be used as well as model_apply
* update documentation
* Update mean_of_medians.R (#29)
fix for #28
- correctly loop over all levels in the named factor
* Feature documentation 3 12 (#31)
* update documentation
Description and inputs now pulled from the object definitions for consistency.
* fix definition of label_features
now allows NULL; description updated
* replace non-ASCII characters
* export mixed_effect object
* use correct object name to generate documentation
* export mixed_effect object
* remove non-ASCII characters
* update tests with new object name
* add import for capture.output
* add import for capture.output
* use pca_biplot in tests
chart was renamed
* add utils import
* update struct dependency version
* update documentation
* update news, version bump
@@ -1,9 +1,4 @@
-#' Split data into subsets
-#'
-#' Splits the data into a training and test set.
-#' @param p_train The proportion of samples in the training set.
-#' @param ... additional slots and values passed to struct_class
-#' @return struct object
+#' @eval get_description('split_data')
 #' @export split_data
 #' @examples
 #' M = split_data(p_train=0.75)
@@ -26,14 +21,17 @@ split_data = function(p_train,...) {
 
     prototype=list(
         name = 'Split data',
-        description = 'Splits the data into a training and test set',
+        description = paste0('The data matrix is divided into two subsets.',
+            'A predefined proportion of the samples are randomly selected for a ',
+            'training set, and the remaining samples are used for the test set.'),
         type = 'processing',
         predicted = 'testing',
         .params=c('p_train'),
         .outputs=c('training','testing'),
 
         p_train=entity(name = 'Proportion in training set',
-            description = 'The proportion of samples selected for the training set. All other samples will be in assigned to the test set.',
+            description = paste0('The proportion of samples selected for the ',
+                'training set.'),
             value = 0.75,
             type='numeric'),
 
+ AUC metric
+ PLS charts (reg coeff, ROC, VIP scores)
+ stratified data set splitting
@@ -1,16 +1,16 @@
 #' Split data into subsets
 #'
 #' Splits the data into a training and test set.
-#' @param p The proportion of samples in the training set.
+#' @param p_train The proportion of samples in the training set.
 #' @param ... additional slots and values passed to struct_class
 #' @return struct object
 #' @export split_data
 #' @examples
-#' M = split_data(p=0.75)
+#' M = split_data(p_train=0.75)
 #'
-split_data = function(p,...) {
+split_data = function(p_train,...) {
     out=struct::new_struct('split_data',
-        p=p,
+        p_train=p_train,
         ...)
     return(out)
 }
@@ -19,19 +19,20 @@ split_data = function(p,...) {
 .split_data<-setClass(
     "split_data",
     contains = c('model'),
-    slots=c(p='entity',
+    slots=c(p_train='entity',
         training='entity',
         testing='entity'
     ),
 
-    prototype=list(name = 'Split data',
+    prototype=list(
+        name = 'Split data',
         description = 'Splits the data into a training and test set',
         type = 'processing',
         predicted = 'testing',
-        .params=c('p'),
+        .params=c('p_train'),
         .outputs=c('training','testing'),
 
-        p=entity(name = 'Proportion in training set',
+        p_train=entity(name = 'Proportion in training set',
             description = 'The proportion of samples selected for the training set. All other samples will be in assigned to the test set.',
             value = 0.75,
             type='numeric'),
@@ -58,7 +59,7 @@ setMethod(f="model_apply",
         # number of samples
         nMax=nrow(D$data)
         # number in the training set
-        n=floor(nMax*opt$p)
+        n=floor(nMax*opt$p_train)
         # select a random subset of the data for training
         in_training=sample(x=1:nMax,size = n, replace=FALSE,prob=NULL)
         training=DatasetExperiment(data=D$data[in_training,,drop=FALSE],
@@ -1,15 +1,17 @@
-#' split data into sets
+#' Split data into subsets
 #'
-#' Splits the data into a training and test set
+#' Splits the data into a training and test set.
+#' @param p The proportion of samples in the training set.
 #' @param ... additional slots and values passed to struct_class
 #' @return struct object
 #' @export split_data
 #' @examples
 #' M = split_data()
 #'
-split_data = function(...) {
-    out=.split_data()
-    out=struct::new_struct(out,...)
+split_data = function(p,...) {
+    out=struct::new_struct('split_data',
+        p=p,
+        ...)
     return(out)
 }
 
@@ -26,6 +28,8 @@ split_data = function(...) {
         description = 'Splits the data into a training and test set',
         type = 'processing',
         predicted = 'testing',
+        .params=c('p'),
+        .outputs=c('training','testing'),
 
         p=entity(name = 'Proportion in training set',
             description = 'The proportion of samples selected for the training set. All other samples will be in assigned to the test set.',
@@ -45,13 +49,11 @@ split_data = function(...) {
     )
 )
 
-#' @param ... additional slots and values passed to struct_class
 #' @export
 #' @template model_apply
 setMethod(f="model_apply",
     signature=c("split_data","DatasetExperiment"),
-    definition=function(M,D)
-    {
+    definition=function(M,D) {
         opt=param_list(M)
         # number of samples
         nMax=nrow(D$data)
@@ -1,7 +1,7 @@
 #' split data into sets
 #'
 #' Splits the data into a training and test set
-#' @param ... slots and values for the new object
+#' @param ... additional slots and values passed to struct_class
 #' @return struct object
 #' @export split_data
 #' @examples
@@ -45,7 +45,7 @@ split_data = function(...) {
     )
 )
 
-#' @param ... slots and values for the new object
+#' @param ... additional slots and values passed to struct_class
 #' @export
 #' @template model_apply
 setMethod(f="model_apply",
also fix resulting duplicate slot name 'type' for mixed_effects
@@ -9,7 +9,7 @@
 #'
 split_data = function(...) {
     out=.split_data()
-    out=struct::.initialize_struct_class(out,...)
+    out=struct::new_struct(out,...)
     return(out)
 }
 
@@ -17,9 +17,9 @@ split_data = function(...) {
 .split_data<-setClass(
     "split_data",
     contains = c('model'),
-    slots=c(params_p='entity',
-        outputs_training='entity',
-        outputs_testing='entity'
+    slots=c(p='entity',
+        training='entity',
+        testing='entity'
     ),
 
     prototype=list(name = 'Split data',
@@ -27,17 +27,17 @@ split_data = function(...) {
         type = 'processing',
         predicted = 'testing',
 
-        params_p=entity(name = 'Proportion in training set',
+        p=entity(name = 'Proportion in training set',
             description = 'The proportion of samples selected for the training set. All other samples will be in assigned to the test set.',
             value = 0.75,
             type='numeric'),
 
-        outputs_training=entity(name = 'A DatasetExperiment of training data',
+        training=entity(name = 'A DatasetExperiment of training data',
            description = 'A DatasetExperiment object containing samples selected for the training set.',
            type='DatasetExperiment',
            value=DatasetExperiment()
        ),
-        outputs_testing=entity(name = 'A DatasetExperiment of data for testing',
+        testing=entity(name = 'A DatasetExperiment of data for testing',
            description = 'A DatasetExperiment object containing samples selected for the testing set.',
            type='DatasetExperiment',
            value=DatasetExperiment()
@@ -28,7 +28,7 @@ split_data = function(...) {
         predicted = 'testing',
 
         params_p=entity(name = 'Proportion in training set',
-            description = 'The proportion of samples selected for the training set. All other samples willbe in assigned to the test set.',
+            description = 'The proportion of samples selected for the training set. All other samples will be in assigned to the test set.',
             value = 0.75,
             type='numeric'),
 
...update some documentation
@@ -1,11 +1,19 @@
 #' split data into sets
 #'
 #' Splits the data into a training and test set
+#' @param ... slots and values for the new object
 #' @export split_data
 #' @examples
 #' M = split_data()
 #'
-split_data<-setClass(
+split_data = function(...) {
+    out=.split_data()
+    out=struct::.initialize_struct_class(out,...)
+    return(out)
+}
+
+
+.split_data<-setClass(
     "split_data",
     contains = c('model'),
     slots=c(params_p='entity',
@@ -36,6 +44,7 @@ split_data<-setClass(
     )
 )
 
+#' @param ... slots and values for the new object
 #' @export
 #' @template model_apply
 setMethod(f="model_apply",
@@ -51,14 +60,14 @@ setMethod(f="model_apply",
         in_training=sample(x=1:nMax,size = n, replace=FALSE,prob=NULL)
         training=DatasetExperiment(data=D$data[in_training,,drop=FALSE],
             sample_meta=D$sample_meta[in_training,,drop=FALSE],
-            variable_meta=dobj$variable_meta,
-            name=c(name(D),'(Training set)'),
-            description=c(description(D),'A subset of the data has been selected as a training set'))
+            variable_meta=D$variable_meta,
+            name=c(D$name,'(Training set)'),
+            description=c(D$description,'A subset of the data has been selected as a training set'))
         testing=DatasetExperiment(data=D$data[-in_training,,drop=FALSE],
             sample_meta=D$sample_meta[-in_training,,drop=FALSE],
-            variable_meta=dobj$variable_meta,
-            name=c(name(D),'(Testing set)'),
-            description=c(description(D),'A subset of the data has been selected as a test set'))
+            variable_meta=D$variable_meta,
+            name=c(D$name,'(Testing set)'),
+            description=c(D$description,'A subset of the data has been selected as a test set'))
         output_value(M,'training')=training
         output_value(M,'testing')=testing
 
...rename all functions with dot to underscore
replace dataset with DatasetExperiment
@@ -8,9 +8,9 @@
 split_data<-setClass(
     "split_data",
     contains = c('model'),
-    slots=c(params.p='entity',
-        outputs.training='entity',
-        outputs.testing='entity'
+    slots=c(params_p='entity',
+        outputs_training='entity',
+        outputs_testing='entity'
     ),
 
     prototype=list(name = 'Split data',
@@ -18,49 +18,49 @@ split_data<-setClass(
         type = 'processing',
         predicted = 'testing',
 
-        params.p=entity(name = 'Proportion in training set',
+        params_p=entity(name = 'Proportion in training set',
             description = 'The proportion of samples selected for the training set. All other samples willbe in assigned to the test set.',
             value = 0.75,
             type='numeric'),
 
-        outputs.training=entity(name = 'A dataset of training data',
-            description = 'A dataset object containing samples selected for the training set.',
-            type='dataset',
-            value=dataset()
+        outputs_training=entity(name = 'A DatasetExperiment of training data',
+            description = 'A DatasetExperiment object containing samples selected for the training set.',
+            type='DatasetExperiment',
+            value=DatasetExperiment()
         ),
-        outputs.testing=entity(name = 'A dataset of data for testing',
-            description = 'A dataset object containing samples selected for the testing set.',
-            type='dataset',
-            value=dataset()
+        outputs_testing=entity(name = 'A DatasetExperiment of data for testing',
+            description = 'A DatasetExperiment object containing samples selected for the testing set.',
+            type='DatasetExperiment',
+            value=DatasetExperiment()
         )
     )
 )
 
 #' @export
 #' @template model_apply
-setMethod(f="model.apply",
-    signature=c("split_data","dataset"),
+setMethod(f="model_apply",
+    signature=c("split_data","DatasetExperiment"),
     definition=function(M,D)
     {
-        opt=param.list(M)
+        opt=param_list(M)
         # number of samples
-        nMax=nrow(dataset.data(D))
+        nMax=nrow(D$data)
         # number in the training set
         n=floor(nMax*opt$p)
         # select a random subset of the data for training
         in_training=sample(x=1:nMax,size = n, replace=FALSE,prob=NULL)
-        training=dataset(data=dataset.data(D)[in_training,,drop=FALSE],
-            sample_meta=dataset.sample_meta(D)[in_training,,drop=FALSE],
-            variable_meta=dataset.variable_meta(D),
+        training=DatasetExperiment(data=D$data[in_training,,drop=FALSE],
+            sample_meta=D$sample_meta[in_training,,drop=FALSE],
+            variable_meta=dobj$variable_meta,
             name=c(name(D),'(Training set)'),
            description=c(description(D),'A subset of the data has been selected as a training set'))
-        testing=dataset(data=dataset.data(D)[-in_training,,drop=FALSE],
-            sample_meta=dataset.sample_meta(D)[-in_training,,drop=FALSE],
-            variable_meta=dataset.variable_meta(D),
+        testing=DatasetExperiment(data=D$data[-in_training,,drop=FALSE],
+            sample_meta=D$sample_meta[-in_training,,drop=FALSE],
+            variable_meta=dobj$variable_meta,
            name=c(name(D),'(Testing set)'),
            description=c(description(D),'A subset of the data has been selected as a test set'))
-        output.value(M,'training')=training
-        output.value(M,'testing')=testing
+        output_value(M,'training')=training
+        output_value(M,'testing')=testing
 
         return(M)
     }
update due to removal of methods class from struct base
due to changes in base package struct
@@ -7,7 +7,7 @@
 #'
 split_data<-setClass(
     "split_data",
-    contains = c('method'),
+    contains = c('model'),
     slots=c(params.p='entity',
         outputs.training='entity',
         outputs.testing='entity'
@@ -38,7 +38,7 @@ split_data<-setClass(
 
 #' @export
 #' @template method_apply
-setMethod(f="method.apply",
+setMethod(f="model.apply",
     signature=c("split_data","dataset"),
     definition=function(M,D)
     {
@@ -3,61 +3,61 @@
 #' Splits the data into a training and test set
 #' @export split_data
 split_data<-setClass(
-  "split_data",
-  contains = c('method'),
-  slots=c(params.p='entity',
-          outputs.training='entity',
-          outputs.testing='entity'
-  ),
+    "split_data",
+    contains = c('method'),
+    slots=c(params.p='entity',
+        outputs.training='entity',
+        outputs.testing='entity'
+    ),
 
-  prototype=list(name = 'Split data',
-                 description = 'Splits the data into a training and test set',
-                 type = 'processing',
-                 predicted = 'testing',
+    prototype=list(name = 'Split data',
+        description = 'Splits the data into a training and test set',
+        type = 'processing',
+        predicted = 'testing',
 
-                 params.p=entity(name = 'Proportion in training set',
-                                 description = 'The proportion of samples selected for the training set. All other samples willbe in assigned to the test set.',
-                                 value = 0.75,
-                                 type='numeric'),
+        params.p=entity(name = 'Proportion in training set',
+            description = 'The proportion of samples selected for the training set. All other samples willbe in assigned to the test set.',
+            value = 0.75,
+            type='numeric'),
 
-                 outputs.training=entity(name = 'A dataset of training data',
-                                         description = 'A dataset object containing samples selected for the training set.',
-                                         type='dataset',
-                                         value=dataset()
-                 ),
-                 outputs.testing=entity(name = 'A dataset of data for testing',
-                                        description = 'A dataset object containing samples selected for the testing set.',
-                                        type='dataset',
-                                        value=dataset()
-                 )
-  )
+        outputs.training=entity(name = 'A dataset of training data',
+            description = 'A dataset object containing samples selected for the training set.',
+            type='dataset',
+            value=dataset()
+        ),
+        outputs.testing=entity(name = 'A dataset of data for testing',
+            description = 'A dataset object containing samples selected for the testing set.',
+            type='dataset',
+            value=dataset()
+        )
+    )
 )
 
 #' @export
 setMethod(f="method.apply",
-          signature=c("split_data","dataset"),
-          definition=function(M,D)
-          {
-            opt=param.list(M)
-            # number of samples
-            nMax=nrow(dataset.data(D))
-            # number in the training set
-            n=floor(nMax*opt$p)
-            # select a random subset of the data for training
-            in_training=sample(x=1:nMax,size = n, replace=FALSE,prob=NULL)
-            training=dataset(data=dataset.data(D)[in_training,,drop=FALSE],
-                             sample_meta=dataset.sample_meta(D)[in_training,,drop=FALSE],
-                             variable_meta=dataset.variable_meta(D),
-                             name=c(name(D),'(Training set)'),
-                             description=c(description(D),'A subset of the data has been selected as a training set'))
-            testing=dataset(data=dataset.data(D)[-in_training,,drop=FALSE],
-                            sample_meta=dataset.sample_meta(D)[-in_training,,drop=FALSE],
-                            variable_meta=dataset.variable_meta(D),
-                            name=c(name(D),'(Testing set)'),
-                            description=c(description(D),'A subset of the data has been selected as a test set'))
-            output.value(M,'training')=training
-            output.value(M,'testing')=testing
+    signature=c("split_data","dataset"),
+    definition=function(M,D)
+    {
+        opt=param.list(M)
+        # number of samples
+        nMax=nrow(dataset.data(D))
+        # number in the training set
+        n=floor(nMax*opt$p)
+        # select a random subset of the data for training
+        in_training=sample(x=1:nMax,size = n, replace=FALSE,prob=NULL)
+        training=dataset(data=dataset.data(D)[in_training,,drop=FALSE],
+            sample_meta=dataset.sample_meta(D)[in_training,,drop=FALSE],
+            variable_meta=dataset.variable_meta(D),
+            name=c(name(D),'(Training set)'),
+            description=c(description(D),'A subset of the data has been selected as a training set'))
+        testing=dataset(data=dataset.data(D)[-in_training,,drop=FALSE],
+            sample_meta=dataset.sample_meta(D)[-in_training,,drop=FALSE],
+            variable_meta=dataset.variable_meta(D),
+            name=c(name(D),'(Testing set)'),
+            description=c(description(D),'A subset of the data has been selected as a test set'))
+        output.value(M,'training')=training
+        output.value(M,'testing')=testing
 
-            return(M)
-          }
+        return(M)
+    }
 )
struct now searches for parameters labelled param. and output., so a list of them is no longer needed as a slot
@@ -14,8 +14,6 @@ split_data<-setClass(
                  description = 'Splits the data into a training and test set',
                  type = 'processing',
                  predicted = 'testing',
-                 params=c('p'),
-                 outputs=c('training','testing'),
 
                  params.p=entity(name = 'Proportion in training set',
                                  description = 'The proportion of samples selected for the training set. All other samples willbe in assigned to the test set.',
new file mode 100644
@@ -0,0 +1,65 @@
+#' split data into sets
+#'
+#' Splits the data into a training and test set
+#' @export split_data
+split_data<-setClass(
+  "split_data",
+  contains = c('method'),
+  slots=c(params.p='entity',
+          outputs.training='entity',
+          outputs.testing='entity'
+  ),
+
+  prototype=list(name = 'Split data',
+                 description = 'Splits the data into a training and test set',
+                 type = 'processing',
+                 predicted = 'testing',
+                 params=c('p'),
+                 outputs=c('training','testing'),
+
+                 params.p=entity(name = 'Proportion in training set',
+                                 description = 'The proportion of samples selected for the training set. All other samples willbe in assigned to the test set.',
+                                 value = 0.75,
+                                 type='numeric'),
+
+                 outputs.training=entity(name = 'A dataset of training data',
+                                         description = 'A dataset object containing samples selected for the training set.',
+                                         type='dataset',
+                                         value=dataset()
+                 ),
+                 outputs.testing=entity(name = 'A dataset of data for testing',
+                                        description = 'A dataset object containing samples selected for the testing set.',
+                                        type='dataset',
+                                        value=dataset()
+                 )
+  )
+)
+
+#' @export
+setMethod(f="method.apply",
+          signature=c("split_data","dataset"),
+          definition=function(M,D)
+          {
+            opt=param.list(M)
+            # number of samples
+            nMax=nrow(dataset.data(D))
+            # number in the training set
+            n=floor(nMax*opt$p)
+            # select a random subset of the data for training
+            in_training=sample(x=1:nMax,size = n, replace=FALSE,prob=NULL)
+            training=dataset(data=dataset.data(D)[in_training,,drop=FALSE],
+                             sample_meta=dataset.sample_meta(D)[in_training,,drop=FALSE],
+                             variable_meta=dataset.variable_meta(D),
+                             name=c(name(D),'(Training set)'),
+                             description=c(description(D),'A subset of the data has been selected as a training set'))
+            testing=dataset(data=dataset.data(D)[-in_training,,drop=FALSE],
+                            sample_meta=dataset.sample_meta(D)[-in_training,,drop=FALSE],
+                            variable_meta=dataset.variable_meta(D),
+                            name=c(name(D),'(Testing set)'),
+                            description=c(description(D),'A subset of the data has been selected as a test set'))
+            output.value(M,'training')=training
+            output.value(M,'testing')=testing
+
+            return(M)
+          }
+)