PLS updates (#64)

* add selectivity ratio

* replace vip_summary with feature_importance
renamed and now allows vip, sr and sr_pvalues to be plotted

* add equal_split model
random subsets to generate training sets with equal group numbers

* plot 1 - p-value
to conform with the "best" feature being a maximum value

* add resample iterator
subsample at random over a number of iterations. Option to use
different kinds of splitting methods. Corresponding chart.

* allow use of list() for factor_name

* force apply not to simplify output to guarantee returning a list

* update example

* add collect parameter
collect gathers the requested model output over all iterations into a list (WORK IN PROGRESS)

* add collection of multiple outputs of model sequence

* plot reg coeff on rhs

* match outputs of xval for use with grid search etc

* specify levels when converting predictions to factor

* change PLSDA to inherit from PLSR
rename some charts to be compatible with both PLSR and PLSDA

* allow y-block column selection

* re-assign y output after PLSR with factor

* update vignettes wrt PLS changes

* update documentation

* update R version to 4.1

* update documentation

* update documentation

* update scatter plot

- new scatter chart object
- used by PCA scores, PLSR/PLSDA scores
- other charts updated to reflect changes in scores plots where necessary
- added ycol param to plots for when y-block is a matrix

* add url to github

* add plsda scores alias

- plsda_scores_plot and pls_scores_plot do the same thing
Included for backwards compatibility
- added components back as parameter for scores plots for backwards compatibility

* fix broken example

* fix broken tests

- scores is now returned as a DatasetExperiment object not a data.frame

* Update data_analysis_omics_using_the_structtoolbox.Rmd

- wrt changes in scores plots

* update documentation

* fix colnames for Y matrix

Gavin Rhys Lloyd authored on 28/02/2022 12:38:08 • GitHub committed on 28/02/2022 12:38:08
Showing 1 changed files
... ...
@@ -14,7 +14,8 @@ split_data = function(p_train,...) {
14 14
 .split_data<-setClass(
15 15
     "split_data",
16 16
     contains = c('model'),
17
-    slots=c(p_train='entity',
17
+    slots=c(
18
+        p_train='entity',
18 19
         training='entity',
19 20
         testing='entity'
20 21
     ),
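
The "force apply not to simplify output" item in this commit message can be illustrated in base R. This is a minimal sketch, assuming R 4.1 or later (where `apply()` accepts a `simplify` argument); the matrix and the anonymous function are made up for illustration and are not taken from the package, which may achieve the same result differently.

```r
# Toy matrix: 2 rows, 3 columns (illustrative data only).
m <- matrix(1:6, nrow = 2)

# Default behaviour: equal-length results are simplified to a matrix.
simplified <- apply(m, 1, function(r) r * 2)

# simplify = FALSE (R >= 4.1) always returns a list, one element per row.
as_list <- apply(m, 1, function(r) r * 2, simplify = FALSE)

is.matrix(simplified)  # TRUE
is.list(as_list)       # TRUE
length(as_list)        # 2, one element per row of m
```

Returning a list unconditionally means downstream code does not need to handle the case where results happen to have equal lengths and collapse into a matrix.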

Release 3 12 candidate (#32)

* fix base=10 regardless of input (see #15)

class constructor was always setting base to 10 instead of the input value

* merge bug fix 1.01 into dev (#19)

* bug fix issue #7

Correctly re-order the sample_meta column for colouring samples in the dendrogram plot

* version bump

bug fix issue #7

* fix for https://github.com/computational-metabolomics/structToolbox/issues/18 (#20)

correctly reorder the factor labels so that the control group always ends up in the denominator for the fold change calculation.

* fix for https://github.com/computational-metabolomics/structToolbox/issues/18

fixed incorrect length check on matching class labels.

* Issue 17 ttest factor (#21)

* convert to factor if not one already

fix for issue #17

* update roxygen version

* fix for issue #9 (#22)

changed from lapply to vapply and used drop=FALSE to ensure compatibility with a single factor.

* allow user to set lambda (#24)

- lambda changed to input parameter. NULL = uses pmp optimisation
- model_predict now uses the set value of lambda, or lambda_opt if used.
- documentation updated

* Feature non parametric fold change (#26)

* add "median" method

based on DOI: 10.1080/00949650212140; can now calculate fold changes equivalent to using medians, with corresponding confidence intervals

* update documentation

* update median method

now correctly calculates ratio of medians

* use wilcox for paired median intervals

make use of wilcox.test to estimate intervals for the median when using median for paired samples

* Issue 23 filter by name (#27)

* fix for #23

moved all model_apply functionality to model_predict so that model_train and model_predict can be used as well as model_apply

* update documentation

* Update mean_of_medians.R (#29)

fix for #28
- correctly loop over all levels in the named factor

* Feature documentation 3 12 (#31)

* update documentation

Description and inputs now pulled from the object definitions for consistency.

* fix definition of label_features

allows NULL and description updated

* replace non ascii characters

* export mixed_effect object

* use correct object name to generate documentation

* export mixed_effect object

* remove non ascii characters

* update tests with new object name

* add import for capture.output

* add import for capture.output

* use pca_biplot in tests

chart was renamed

* add utils import

* update struct dependency version

* update documentation

* update news, version bump

Gavin Rhys Lloyd authored on 25/10/2020 08:50:13 • GitHub committed on 25/10/2020 08:50:13
Showing 1 changed files
... ...
@@ -1,9 +1,4 @@
1
-#' Split data into subsets
2
-#'
3
-#' Splits the data into a training and test set.
4
-#' @param p_train The proportion of samples in the training set.
5
-#' @param ... additional slots and values passed to struct_class
6
-#' @return struct object
1
+#' @eval get_description('split_data')
7 2
 #' @export split_data
8 3
 #' @examples
9 4
 #' M = split_data(p_train=0.75)
... ...
@@ -26,14 +21,17 @@ split_data = function(p_train,...) {
26 21
 
27 22
     prototype=list(
28 23
         name = 'Split data',
29
-        description = 'Splits the data into a training and test set',
24
+        description = paste0('The data matrix is divided into two subsets.',
25
+        'A predefined proportion of the samples are randomly selected for a ',
26
+        'training set, and the remaining samples are used for the test set.'),
30 27
         type = 'processing',
31 28
         predicted = 'testing',
32 29
         .params=c('p_train'),
33 30
         .outputs=c('training','testing'),
34 31
 
35 32
         p_train=entity(name = 'Proportion in training set',
36
-            description = 'The proportion of samples selected for the training set. All other samples will be in assigned to the test set.',
33
+            description = paste0('The proportion of samples selected for the ',
34
+            'training set.'),
37 35
             value = 0.75,
38 36
             type='numeric'),
39 37
 

add functionality related to gastric_cancer vignette (see description)

+ AUC metric
+ PLS charts (reg coeff, ROC, VIP scores)
+ stratified data set splitting

Gavin Rhys Lloyd authored on 07/04/2020 14:39:32
Showing 1 changed files
... ...
@@ -1,16 +1,16 @@
1 1
 #' Split data into subsets
2 2
 #'
3 3
 #' Splits the data into a training and test set.
4
-#' @param p The proportion of samples in the training set.
4
+#' @param p_train The proportion of samples in the training set.
5 5
 #' @param ... additional slots and values passed to struct_class
6 6
 #' @return struct object
7 7
 #' @export split_data
8 8
 #' @examples
9
-#' M = split_data(p=0.75)
9
+#' M = split_data(p_train=0.75)
10 10
 #'
11
-split_data = function(p,...) {
11
+split_data = function(p_train,...) {
12 12
     out=struct::new_struct('split_data',
13
-        p=p,
13
+        p_train=p_train,
14 14
         ...)
15 15
     return(out)
16 16
 }
... ...
@@ -19,19 +19,20 @@ split_data = function(p,...) {
19 19
 .split_data<-setClass(
20 20
     "split_data",
21 21
     contains = c('model'),
22
-    slots=c(p='entity',
22
+    slots=c(p_train='entity',
23 23
         training='entity',
24 24
         testing='entity'
25 25
     ),
26 26
 
27
-    prototype=list(name = 'Split data',
27
+    prototype=list(
28
+        name = 'Split data',
28 29
         description = 'Splits the data into a training and test set',
29 30
         type = 'processing',
30 31
         predicted = 'testing',
31
-        .params=c('p'),
32
+        .params=c('p_train'),
32 33
         .outputs=c('training','testing'),
33 34
 
34
-        p=entity(name = 'Proportion in training set',
35
+        p_train=entity(name = 'Proportion in training set',
35 36
             description = 'The proportion of samples selected for the training set. All other samples will be in assigned to the test set.',
36 37
             value = 0.75,
37 38
             type='numeric'),
... ...
@@ -58,7 +59,7 @@ setMethod(f="model_apply",
58 59
         # number of samples
59 60
         nMax=nrow(D$data)
60 61
         # number in the training set
61
-        n=floor(nMax*opt$p)
62
+        n=floor(nMax*opt$p_train)
62 63
         # select a random subset of the data for training
63 64
         in_training=sample(x=1:nMax,size = n, replace=FALSE,prob=NULL)
64 65
         training=DatasetExperiment(data=D$data[in_training,,drop=FALSE],
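
The hunk above computes the training-set size as `floor(nMax*opt$p_train)` and then draws that many row indices without replacement. A rough base-R sketch of the same idea on a plain data frame follows. The helper `split_rows` and the toy data are hypothetical stand-ins for the package's `DatasetExperiment` handling, and the use of `setdiff()` for the test rows is a substitution, not what the diff shows.

```r
# Hypothetical helper: split the rows of a data frame into training and
# test subsets, mirroring the floor()/sample() logic shown in the diff.
split_rows <- function(df, p_train = 0.75) {
    n_max <- nrow(df)
    # number of rows in the training set
    n <- floor(n_max * p_train)
    # random row indices for training, drawn without replacement
    in_training <- sample(seq_len(n_max), size = n, replace = FALSE)
    list(
        # drop = FALSE keeps a data frame even with a single column
        training = df[in_training, , drop = FALSE],
        # setdiff() avoids negative indexing, which selects no rows
        # at all when in_training is empty
        testing = df[setdiff(seq_len(n_max), in_training), , drop = FALSE]
    )
}

set.seed(1)  # for a reproducible illustration
toy <- data.frame(x = 1:10)
parts <- split_rows(toy, p_train = 0.75)
nrow(parts$training)  # floor(10 * 0.75) = 7
nrow(parts$testing)   # the remaining 3 rows
```

Because `floor()` rounds down, a small data set combined with a small proportion can give an empty training set, so the edge case is worth keeping in mind when choosing `p_train`.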

fix/update examples

Gavin Rhys Lloyd authored on 07/02/2020 17:02:22
Showing 1 changed files
... ...
@@ -6,7 +6,7 @@
6 6
 #' @return struct object
7 7
 #' @export split_data
8 8
 #' @examples
9
-#' M = split_data()
9
+#' M = split_data(p=0.75)
10 10
 #'
11 11
 split_data = function(p,...) {
12 12
     out=struct::new_struct('split_data',

update to use new struct class constructors

Gavin Rhys Lloyd authored on 06/02/2020 13:51:52
Showing 1 changed files
... ...
@@ -1,15 +1,17 @@
1
-#' split data into sets
1
+#' Split data into subsets
2 2
 #'
3
-#' Splits the data into a training and test set
3
+#' Splits the data into a training and test set.
4
+#' @param p The proportion of samples in the training set.
4 5
 #' @param ... additional slots and values passed to struct_class
5 6
 #' @return struct object
6 7
 #' @export split_data
7 8
 #' @examples
8 9
 #' M = split_data()
9 10
 #'
10
-split_data = function(...) {
11
-    out=.split_data()
12
-    out=struct::new_struct(out,...)
11
+split_data = function(p,...) {
12
+    out=struct::new_struct('split_data',
13
+        p=p,
14
+        ...)
13 15
     return(out)
14 16
 }
15 17
 
... ...
@@ -26,6 +28,8 @@ split_data = function(...) {
26 28
         description = 'Splits the data into a training and test set',
27 29
         type = 'processing',
28 30
         predicted = 'testing',
31
+        .params=c('p'),
32
+        .outputs=c('training','testing'),
29 33
 
30 34
         p=entity(name = 'Proportion in training set',
31 35
             description = 'The proportion of samples selected for the training set. All other samples will be in assigned to the test set.',
... ...
@@ -45,13 +49,11 @@ split_data = function(...) {
45 49
     )
46 50
 )
47 51
 
48
-#' @param ... additional slots and values passed to struct_class
49 52
 #' @export
50 53
 #' @template model_apply
51 54
 setMethod(f="model_apply",
52 55
     signature=c("split_data","DatasetExperiment"),
53
-    definition=function(M,D)
54
-    {
56
+    definition=function(M,D) {
55 57
         opt=param_list(M)
56 58
         # number of samples
57 59
         nMax=nrow(D$data)
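
This commit moves `p` into the constructor's argument list rather than relying on `...` alone. The general pattern, a hidden class generator wrapped by a plain constructor function with named arguments, can be sketched with the base `methods` package. The class `toy_split` is hypothetical, and `struct::new_struct` is deliberately not used, so this is only an analogy to what the package does.

```r
library(methods)

# Hidden generator: setClass() returns a function that creates objects.
.toy_split <- setClass(
    "toy_split",
    slots = c(p = "numeric"),
    prototype = list(p = 0.75)
)

# Public constructor: an ordinary function whose named argument can be
# documented and checked, with ... forwarded for any further slots.
toy_split <- function(p = 0.75, ...) {
    .toy_split(p = p, ...)
}

M <- toy_split(p = 0.8)
M@p  # 0.8
```

Naming the parameter in the function signature lets documentation tools and argument matching see it, which a bare `...` does not.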

incremental changes to use struct class constructors

Gavin Rhys Lloyd authored on 04/02/2020 17:18:11
Showing 1 changed files
... ...
@@ -1,7 +1,7 @@
1 1
 #' split data into sets
2 2
 #'
3 3
 #' Splits the data into a training and test set
4
-#' @param ... slots and values for the new object
4
+#' @param ... additional slots and values passed to struct_class
5 5
 #' @return struct object
6 6
 #' @export split_data
7 7
 #' @examples
... ...
@@ -45,7 +45,7 @@ split_data = function(...) {
45 45
     )
46 46
 )
47 47
 
48
-#' @param ... slots and values for the new object
48
+#' @param ... additional slots and values passed to struct_class
49 49
 #' @export
50 50
 #' @template model_apply
51 51
 setMethod(f="model_apply",

remove all params_ and outputs_ tags

also fix resulting duplicate slot name 'type' for mixed_effects

Gavin Rhys Lloyd authored on 04/02/2020 10:28:42
Showing 1 changed files
... ...
@@ -9,7 +9,7 @@
9 9
 #'
10 10
 split_data = function(...) {
11 11
     out=.split_data()
12
-    out=struct::.initialize_struct_class(out,...)
12
+    out=struct::new_struct(out,...)
13 13
     return(out)
14 14
 }
15 15
 
... ...
@@ -17,9 +17,9 @@ split_data = function(...) {
17 17
 .split_data<-setClass(
18 18
     "split_data",
19 19
     contains = c('model'),
20
-    slots=c(params_p='entity',
21
-        outputs_training='entity',
22
-        outputs_testing='entity'
20
+    slots=c(p='entity',
21
+        training='entity',
22
+        testing='entity'
23 23
     ),
24 24
 
25 25
     prototype=list(name = 'Split data',
... ...
@@ -27,17 +27,17 @@ split_data = function(...) {
27 27
         type = 'processing',
28 28
         predicted = 'testing',
29 29
 
30
-        params_p=entity(name = 'Proportion in training set',
30
+        p=entity(name = 'Proportion in training set',
31 31
             description = 'The proportion of samples selected for the training set. All other samples will be in assigned to the test set.',
32 32
             value = 0.75,
33 33
             type='numeric'),
34 34
 
35
-        outputs_training=entity(name = 'A DatasetExperiment of training data',
35
+        training=entity(name = 'A DatasetExperiment of training data',
36 36
             description = 'A DatasetExperiment object containing samples selected for the training set.',
37 37
             type='DatasetExperiment',
38 38
             value=DatasetExperiment()
39 39
         ),
40
-        outputs_testing=entity(name = 'A DatasetExperiment of data for testing',
40
+        testing=entity(name = 'A DatasetExperiment of data for testing',
41 41
             description = 'A DatasetExperiment object containing samples selected for the testing set.',
42 42
             type='DatasetExperiment',
43 43
             value=DatasetExperiment()

minor text fixes

Gavin Rhys Lloyd authored on 27/01/2020 10:22:02
Showing 1 changed files
... ...
@@ -28,7 +28,7 @@ split_data = function(...) {
28 28
         predicted = 'testing',
29 29
 
30 30
         params_p=entity(name = 'Proportion in training set',
31
-            description = 'The proportion of samples selected for the training set. All other samples willbe in assigned to the test set.',
31
+            description = 'The proportion of samples selected for the training set. All other samples will be in assigned to the test set.',
32 32
             value = 0.75,
33 33
             type='numeric'),
34 34
 

add @return to documentation

Gavin Rhys Lloyd authored on 19/12/2019 15:14:02
Showing 1 changed files
... ...
@@ -2,6 +2,7 @@
2 2
 #'
3 3
 #' Splits the data into a training and test set
4 4
 #' @param ... slots and values for the new object
5
+#' @return struct object
5 6
 #' @export split_data
6 7
 #' @examples
7 8
 #' M = split_data()

fix broken tests and...

...update some documentation

Gavin Rhys Lloyd authored on 17/12/2019 17:24:38
Showing 1 changed files
... ...
@@ -1,11 +1,19 @@
1 1
 #' split data into sets
2 2
 #'
3 3
 #' Splits the data into a training and test set
4
+#' @param ... slots and values for the new object
4 5
 #' @export split_data
5 6
 #' @examples
6 7
 #' M = split_data()
7 8
 #'
8
-split_data<-setClass(
9
+split_data = function(...) {
10
+    out=.split_data()
11
+    out=struct::.initialize_struct_class(out,...)
12
+    return(out)
13
+}
14
+
15
+
16
+.split_data<-setClass(
9 17
     "split_data",
10 18
     contains = c('model'),
11 19
     slots=c(params_p='entity',
... ...
@@ -36,6 +44,7 @@ split_data<-setClass(
36 44
     )
37 45
 )
38 46
 
47
+#' @param ... slots and values for the new object
39 48
 #' @export
40 49
 #' @template model_apply
41 50
 setMethod(f="model_apply",
... ...
@@ -51,14 +60,14 @@ setMethod(f="model_apply",
51 60
         in_training=sample(x=1:nMax,size = n, replace=FALSE,prob=NULL)
52 61
         training=DatasetExperiment(data=D$data[in_training,,drop=FALSE],
53 62
             sample_meta=D$sample_meta[in_training,,drop=FALSE],
54
-            variable_meta=dobj$variable_meta,
55
-            name=c(name(D),'(Training set)'),
56
-            description=c(description(D),'A subset of the data has been selected as a training set'))
63
+            variable_meta=D$variable_meta,
64
+            name=c(D$name,'(Training set)'),
65
+            description=c(D$description,'A subset of the data has been selected as a training set'))
57 66
         testing=DatasetExperiment(data=D$data[-in_training,,drop=FALSE],
58 67
             sample_meta=D$sample_meta[-in_training,,drop=FALSE],
59
-            variable_meta=dobj$variable_meta,
60
-            name=c(name(D),'(Testing set)'),
61
-            description=c(description(D),'A subset of the data has been selected as a test set'))
68
+            variable_meta=D$variable_meta,
69
+            name=c(D$name,'(Testing set)'),
70
+            description=c(D$description,'A subset of the data has been selected as a test set'))
62 71
         output_value(M,'training')=training
63 72
         output_value(M,'testing')=testing
64 73
 

use class constructors and...

...rename all functions from dot to underscore
replace dataset with DatasetExperiment

Gavin Rhys Lloyd authored on 17/12/2019 15:48:01
Showing 1 changed files
... ...
@@ -8,9 +8,9 @@
8 8
 split_data<-setClass(
9 9
     "split_data",
10 10
     contains = c('model'),
11
-    slots=c(params.p='entity',
12
-        outputs.training='entity',
13
-        outputs.testing='entity'
11
+    slots=c(params_p='entity',
12
+        outputs_training='entity',
13
+        outputs_testing='entity'
14 14
     ),
15 15
 
16 16
     prototype=list(name = 'Split data',
... ...
@@ -18,49 +18,49 @@ split_data<-setClass(
18 18
         type = 'processing',
19 19
         predicted = 'testing',
20 20
 
21
-        params.p=entity(name = 'Proportion in training set',
21
+        params_p=entity(name = 'Proportion in training set',
22 22
             description = 'The proportion of samples selected for the training set. All other samples willbe in assigned to the test set.',
23 23
             value = 0.75,
24 24
             type='numeric'),
25 25
 
26
-        outputs.training=entity(name = 'A dataset of training data',
27
-            description = 'A dataset object containing samples selected for the training set.',
28
-            type='dataset',
29
-            value=dataset()
26
+        outputs_training=entity(name = 'A DatasetExperiment of training data',
27
+            description = 'A DatasetExperiment object containing samples selected for the training set.',
28
+            type='DatasetExperiment',
29
+            value=DatasetExperiment()
30 30
         ),
31
-        outputs.testing=entity(name = 'A dataset of data for testing',
32
-            description = 'A dataset object containing samples selected for the testing set.',
33
-            type='dataset',
34
-            value=dataset()
31
+        outputs_testing=entity(name = 'A DatasetExperiment of data for testing',
32
+            description = 'A DatasetExperiment object containing samples selected for the testing set.',
33
+            type='DatasetExperiment',
34
+            value=DatasetExperiment()
35 35
         )
36 36
     )
37 37
 )
38 38
 
39 39
 #' @export
40 40
 #' @template model_apply
41
-setMethod(f="model.apply",
42
-    signature=c("split_data","dataset"),
41
+setMethod(f="model_apply",
42
+    signature=c("split_data","DatasetExperiment"),
43 43
     definition=function(M,D)
44 44
     {
45
-        opt=param.list(M)
45
+        opt=param_list(M)
46 46
         # number of samples
47
-        nMax=nrow(dataset.data(D))
47
+        nMax=nrow(D$data)
48 48
         # number in the training set
49 49
         n=floor(nMax*opt$p)
50 50
         # select a random subset of the data for training
51 51
         in_training=sample(x=1:nMax,size = n, replace=FALSE,prob=NULL)
52
-        training=dataset(data=dataset.data(D)[in_training,,drop=FALSE],
53
-            sample_meta=dataset.sample_meta(D)[in_training,,drop=FALSE],
54
-            variable_meta=dataset.variable_meta(D),
52
+        training=DatasetExperiment(data=D$data[in_training,,drop=FALSE],
53
+            sample_meta=D$sample_meta[in_training,,drop=FALSE],
54
+            variable_meta=dobj$variable_meta,
55 55
             name=c(name(D),'(Training set)'),
56 56
             description=c(description(D),'A subset of the data has been selected as a training set'))
57
-        testing=dataset(data=dataset.data(D)[-in_training,,drop=FALSE],
58
-            sample_meta=dataset.sample_meta(D)[-in_training,,drop=FALSE],
59
-            variable_meta=dataset.variable_meta(D),
57
+        testing=DatasetExperiment(data=D$data[-in_training,,drop=FALSE],
58
+            sample_meta=D$sample_meta[-in_training,,drop=FALSE],
59
+            variable_meta=dobj$variable_meta,
60 60
             name=c(name(D),'(Testing set)'),
61 61
             description=c(description(D),'A subset of the data has been selected as a test set'))
62
-        output.value(M,'training')=training
63
-        output.value(M,'testing')=testing
62
+        output_value(M,'training')=training
63
+        output_value(M,'testing')=testing
64 64
 
65 65
         return(M)
66 66
     }
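
`model_apply` is registered with `setMethod` for the signature `c("split_data","DatasetExperiment")`, so dispatch depends on the classes of both arguments. Below is a minimal sketch of two-argument S4 dispatch using only the base `methods` package. The generic `toy_apply` and both classes are hypothetical and only mimic the shape of the real method.

```r
library(methods)

setClass("toy_model", slots = c(p = "numeric"))
setClass("toy_data", slots = c(values = "numeric"))

# Generic with two dispatch arguments, M (model) and D (data).
setGeneric("toy_apply", function(M, D) standardGeneric("toy_apply"))

# Method selected only when M is a toy_model and D is a toy_data.
setMethod("toy_apply",
    signature = c("toy_model", "toy_data"),
    definition = function(M, D) {
        # number of values that would go to a training set
        floor(length(D@values) * M@p)
    }
)

M <- new("toy_model", p = 0.75)
D <- new("toy_data", values = 1:8 / 10)
toy_apply(M, D)  # floor(8 * 0.75) = 6
```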

change to use train/predict and apply

update due to removal of methods class from struct base

grlloyd authored on 14/10/2019 15:22:55
Showing 1 changed files
... ...
@@ -37,7 +37,7 @@ split_data<-setClass(
37 37
 )
38 38
 
39 39
 #' @export
40
-#' @template method_apply
40
+#' @template model_apply
41 41
 setMethod(f="model.apply",
42 42
     signature=c("split_data","dataset"),
43 43
     definition=function(M,D)

convert all methods to models

due to changes in base package struct

grlloyd authored on 14/10/2019 09:06:39
Showing 1 changed files
... ...
@@ -7,7 +7,7 @@
7 7
 #'
8 8
 split_data<-setClass(
9 9
     "split_data",
10
-    contains = c('method'),
10
+    contains = c('model'),
11 11
     slots=c(params.p='entity',
12 12
         outputs.training='entity',
13 13
         outputs.testing='entity'
... ...
@@ -38,7 +38,7 @@ split_data<-setClass(
38 38
 
39 39
 #' @export
40 40
 #' @template method_apply
41
-setMethod(f="method.apply",
41
+setMethod(f="model.apply",
42 42
     signature=c("split_data","dataset"),
43 43
     definition=function(M,D)
44 44
     {

update documentation for BiocCheck

grlloyd authored on 24/09/2019 10:02:55
Showing 1 changed files
... ...
@@ -2,6 +2,9 @@
2 2
 #'
3 3
 #' Splits the data into a training and test set
4 4
 #' @export split_data
5
+#' @examples
6
+#' M = split_data()
7
+#'
5 8
 split_data<-setClass(
6 9
     "split_data",
7 10
     contains = c('method'),

use method_apply roxygen template

grlloyd authored on 23/09/2019 12:35:05
Showing 1 changed files
... ...
@@ -34,6 +34,7 @@ split_data<-setClass(
34 34
 )
35 35
 
36 36
 #' @export
37
+#' @template method_apply
37 38
 setMethod(f="method.apply",
38 39
     signature=c("split_data","dataset"),
39 40
     definition=function(M,D)

change indentation for biocCheck

grlloyd authored on 24/05/2019 13:53:08
Showing 1 changed files
... ...
@@ -3,61 +3,61 @@
3 3
 #' Splits the data into a training and test set
4 4
 #' @export split_data
5 5
 split_data<-setClass(
6
-  "split_data",
7
-  contains = c('method'),
8
-  slots=c(params.p='entity',
9
-          outputs.training='entity',
10
-          outputs.testing='entity'
11
-  ),
6
+    "split_data",
7
+    contains = c('method'),
8
+    slots=c(params.p='entity',
9
+        outputs.training='entity',
10
+        outputs.testing='entity'
11
+    ),
12 12
 
13
-  prototype=list(name = 'Split data',
14
-                 description = 'Splits the data into a training and test set',
15
-                 type = 'processing',
16
-                 predicted = 'testing',
13
+    prototype=list(name = 'Split data',
14
+        description = 'Splits the data into a training and test set',
15
+        type = 'processing',
16
+        predicted = 'testing',
17 17
 
18
-                 params.p=entity(name = 'Proportion in training set',
19
-                                        description = 'The proportion of samples selected for the training set. All other samples willbe in assigned to the test set.',
20
-                                        value = 0.75,
21
-                                        type='numeric'),
18
+        params.p=entity(name = 'Proportion in training set',
19
+            description = 'The proportion of samples selected for the training set. All other samples willbe in assigned to the test set.',
20
+            value = 0.75,
21
+            type='numeric'),
22 22
 
23
-                 outputs.training=entity(name = 'A dataset of training data',
24
-                                            description = 'A dataset object containing samples selected for the training set.',
25
-                                            type='dataset',
26
-                                            value=dataset()
27
-                 ),
28
-                 outputs.testing=entity(name = 'A dataset of data for testing',
29
-                                         description = 'A dataset object containing samples selected for the testing set.',
30
-                                         type='dataset',
31
-                                         value=dataset()
32
-                 )
33
-  )
23
+        outputs.training=entity(name = 'A dataset of training data',
24
+            description = 'A dataset object containing samples selected for the training set.',
25
+            type='dataset',
26
+            value=dataset()
27
+        ),
28
+        outputs.testing=entity(name = 'A dataset of data for testing',
29
+            description = 'A dataset object containing samples selected for the testing set.',
30
+            type='dataset',
31
+            value=dataset()
32
+        )
33
+    )
34 34
 )
35 35
 
36 36
 #' @export
37 37
 setMethod(f="method.apply",
38
-          signature=c("split_data","dataset"),
39
-          definition=function(M,D)
40
-          {
41
-            opt=param.list(M)
42
-            # number of samples
43
-            nMax=nrow(dataset.data(D))
44
-            # number in the training set
45
-            n=floor(nMax*opt$p)
46
-            # select a random subset of the data for training
47
-            in_training=sample(x=1:nMax,size = n, replace=FALSE,prob=NULL)
48
-            training=dataset(data=dataset.data(D)[in_training,,drop=FALSE],
49
-                             sample_meta=dataset.sample_meta(D)[in_training,,drop=FALSE],
50
-                             variable_meta=dataset.variable_meta(D),
51
-                             name=c(name(D),'(Training set)'),
52
-                             description=c(description(D),'A subset of the data has been selected as a training set'))
53
-            testing=dataset(data=dataset.data(D)[-in_training,,drop=FALSE],
54
-                             sample_meta=dataset.sample_meta(D)[-in_training,,drop=FALSE],
55
-                             variable_meta=dataset.variable_meta(D),
56
-                             name=c(name(D),'(Testing set)'),
57
-                             description=c(description(D),'A subset of the data has been selected as a test set'))
58
-            output.value(M,'training')=training
59
-            output.value(M,'testing')=testing
38
+    signature=c("split_data","dataset"),
39
+    definition=function(M,D)
40
+    {
41
+        opt=param.list(M)
42
+        # number of samples
43
+        nMax=nrow(dataset.data(D))
44
+        # number in the training set
45
+        n=floor(nMax*opt$p)
46
+        # select a random subset of the data for training
47
+        in_training=sample(x=1:nMax,size = n, replace=FALSE,prob=NULL)
48
+        training=dataset(data=dataset.data(D)[in_training,,drop=FALSE],
49
+            sample_meta=dataset.sample_meta(D)[in_training,,drop=FALSE],
50
+            variable_meta=dataset.variable_meta(D),
51
+            name=c(name(D),'(Training set)'),
52
+            description=c(description(D),'A subset of the data has been selected as a training set'))
53
+        testing=dataset(data=dataset.data(D)[-in_training,,drop=FALSE],
54
+            sample_meta=dataset.sample_meta(D)[-in_training,,drop=FALSE],
55
+            variable_meta=dataset.variable_meta(D),
56
+            name=c(name(D),'(Testing set)'),
57
+            description=c(description(D),'A subset of the data has been selected as a test set'))
58
+        output.value(M,'training')=training
59
+        output.value(M,'testing')=testing
60 60
 
61
-            return(M)
62
-          }
61
+        return(M)
62
+    }
63 63
 )

remove params and outputs slots

struct now searches for parameters labelled param. and output., so a list of them is no longer needed as a slot

grlloyd authored on 01/04/2019 14:10:06
Showing 1 changed files
... ...
@@ -14,8 +14,6 @@ split_data<-setClass(
14 14
                  description = 'Splits the data into a training and test set',
15 15
                  type = 'processing',
16 16
                  predicted = 'testing',
17
-                 params=c('p'),
18
-                 outputs=c('training','testing'),
19 17
 
20 18
                  params.p=entity(name = 'Proportion in training set',
21 19
                                         description = 'The proportion of samples selected for the training set. All other samples willbe in assigned to the test set.',

initial commit

Gavin Rhys Lloyd authored on 20/03/2019 15:54:08
Showing 1 changed files
1 1
new file mode 100644
... ...
@@ -0,0 +1,65 @@
1
+#' split data into sets
2
+#'
3
+#' Splits the data into a training and test set
4
+#' @export split_data
5
+split_data<-setClass(
6
+  "split_data",
7
+  contains = c('method'),
8
+  slots=c(params.p='entity',
9
+          outputs.training='entity',
10
+          outputs.testing='entity'
11
+  ),
12
+
13
+  prototype=list(name = 'Split data',
14
+                 description = 'Splits the data into a training and test set',
15
+                 type = 'processing',
16
+                 predicted = 'testing',
17
+                 params=c('p'),
18
+                 outputs=c('training','testing'),
19
+
20
+                 params.p=entity(name = 'Proportion in training set',
21
+                                        description = 'The proportion of samples selected for the training set. All other samples willbe in assigned to the test set.',
22
+                                        value = 0.75,
23
+                                        type='numeric'),
24
+
25
+                 outputs.training=entity(name = 'A dataset of training data',
26
+                                            description = 'A dataset object containing samples selected for the training set.',
27
+                                            type='dataset',
28
+                                            value=dataset()
29
+                 ),
30
+                 outputs.testing=entity(name = 'A dataset of data for testing',
31
+                                         description = 'A dataset object containing samples selected for the testing set.',
32
+                                         type='dataset',
33
+                                         value=dataset()
34
+                 )
35
+  )
36
+)
37
+
38
+#' @export
39
+setMethod(f="method.apply",
40
+          signature=c("split_data","dataset"),
41
+          definition=function(M,D)
42
+          {
43
+            opt=param.list(M)
44
+            # number of samples
45
+            nMax=nrow(dataset.data(D))
46
+            # number in the training set
47
+            n=floor(nMax*opt$p)
48
+            # select a random subset of the data for training
49
+            in_training=sample(x=1:nMax,size = n, replace=FALSE,prob=NULL)
50
+            training=dataset(data=dataset.data(D)[in_training,,drop=FALSE],
51
+                             sample_meta=dataset.sample_meta(D)[in_training,,drop=FALSE],
52
+                             variable_meta=dataset.variable_meta(D),
53
+                             name=c(name(D),'(Training set)'),
54
+                             description=c(description(D),'A subset of the data has been selected as a training set'))
55
+            testing=dataset(data=dataset.data(D)[-in_training,,drop=FALSE],
56
+                             sample_meta=dataset.sample_meta(D)[-in_training,,drop=FALSE],
57
+                             variable_meta=dataset.variable_meta(D),
58
+                             name=c(name(D),'(Testing set)'),
59
+                             description=c(description(D),'A subset of the data has been selected as a test set'))
60
+            output.value(M,'training')=training
61
+            output.value(M,'testing')=testing
62
+
63
+            return(M)
64
+          }
65
+)