* add selectivity ratio
* replace vip_summary with feature_importance
renamed; the chart now allows vip, sr and sr_pvalues to be plotted
* add equal_split model
random subsets are drawn to generate training sets with equal group sizes (see the sketch below)
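A minimal base-R sketch of the idea (not the structToolbox implementation; `equal_split_idx` is a hypothetical helper):

```r
# Conceptual sketch only: draw the same number of samples from each group so
# that the training subset contains equal group sizes.
equal_split_idx = function(group, n_per_group) {
    unlist(lapply(split(seq_along(group), group), function(idx) {
        sample(idx, min(n_per_group, length(idx)))
    }))
}
# e.g. idx = equal_split_idx(iris$Species, 25); train = iris[idx, , drop = FALSE]
```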
* plot 1 - p-value
so that the "best" feature corresponds to the maximum value
* add resample iterator
subsample at random over a number of iterations, with the option to use
different kinds of splitting method; a corresponding chart has been added (see the sketch below)
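A minimal base-R sketch of the resampling idea (not the structToolbox iterator; `resample_metric` and the lm model are placeholders):

```r
# Conceptual sketch only: repeat a random training/test split a number of
# times and record a test-set metric for each iteration.
resample_metric = function(d, n_iter = 10, p_train = 0.75) {
    vapply(seq_len(n_iter), function(i) {
        idx  = sample(nrow(d), size = floor(p_train * nrow(d)))
        fit  = lm(y ~ x, data = d[idx, ])
        pred = predict(fit, newdata = d[-idx, ])
        sqrt(mean((d$y[-idx] - pred)^2))   # test-set RMSE for this iteration
    }, FUN.VALUE = numeric(1))
}
# e.g. rmse = resample_metric(data.frame(x = rnorm(50), y = rnorm(50)), n_iter = 20)
```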
* allow use of list() for factor_name
* force apply not to simplify its output, to guarantee that a list is returned (see the sketch below)
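For illustration only (base R, not the structToolbox code), the difference between a simplifying and a non-simplifying apply:

```r
# sapply() silently simplifies a list of equal-length vectors into a matrix;
# lapply(), or sapply(..., simplify = FALSE), always returns a list.
as_matrix = sapply(1:3, function(i) c(a = i, b = i^2))                    # matrix
as_list_1 = sapply(1:3, function(i) c(a = i, b = i^2), simplify = FALSE)  # list
as_list_2 = lapply(1:3, function(i) c(a = i, b = i^2))                    # list
```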
* update example
* add collect parameter
collect gathers the requested model output over all iterations into a list (WORK IN PROGRESS)
* add collection of multiple outputs of model sequence
* plot regression coefficients on the right-hand side
* match outputs of xval for use with grid search etc
* specify levels when converting predictions to a factor (see the sketch below)
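For illustration only (base R): passing the training levels explicitly keeps classes that are absent from the predictions, so confusion matrices and metrics stay aligned:

```r
# Without explicit levels, a class that never appears in the predictions
# would be dropped from the resulting factor.
y_train = factor(c("control", "treated", "treated"))
pred    = c("treated", "treated", "treated")
table(factor(pred, levels = levels(y_train)))  # keeps a zero count for "control"
```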
* change PLSDA to inherit from PLSR
rename some charts to be compatible with both PLSR and PLSDA
* allow y-block column selection
* re-assign y output after PLSR with factor
* update vignettes wrt PLS changes
* update documentation
* update R version to 4.1
* update documentation
* update documentation
* update scatter plot
- new scatter chart object
- used by PCA scores, PLSR/PLSDA scores
- other charts updated to reflect changes in scores plots where necessary
- added ycol param to plots for when y-block is a matrix
* add url to github
* add plsda scores alias
- plsda_scores_plot and pls_scores_plot do the same thing
Included for backwards compatibility
- added components back as a parameter for scores plots for backwards compatibility
* fix broken example
* fix broken tests
- scores is now returned as a DatasetExperiment object not a data.frame
* Update data_analysis_omics_using_the_structtoolbox.Rmd
- wrt changes in scores plots
* update documentation
* fix colnames for Y matrix
* fix base=10 regardless of input (see #15)
the class constructor was always setting base to 10 instead of using the input value
* merge bug fix 1.01 into dev (#19)
* bug fix issue #7
Correctly re-order the sample_meta column for colouring samples in the dendrogram plot
* version bump
bug fix issue #7
* fix for https://github.com/computational-metabolomics/structToolbox/issues/18 (#20)
Correctly reorder the factor labels so that the control group always ends up in the denominator of the fold change calculation (see the sketch below).
* fix for https://github.com/computational-metabolomics/structToolbox/issues/18
fixed incorrect length check on matching class labels.
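For illustration only (base R), forcing the control group to be the reference level so it ends up in the denominator of treated/control fold changes:

```r
# relevel() moves the chosen group to the first position in the factor levels.
g = factor(c("treated", "control", "treated", "control"))
g = relevel(g, ref = "control")
levels(g)  # "control" "treated"
```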
* Issue 17 ttest factor (#21)
* convert to factor if not one already
fix for issue #17
* update roxygen version
* fix for issue #9 (#22)
changed from lapply to vapply and used drop=FALSE to ensure compatibility with a single factor (see the sketch below).
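For illustration only (base R), the two behaviours this fix relies on:

```r
# With a single column, "[" drops a data.frame to a vector unless drop = FALSE
# is used; vapply() also enforces the expected return type and length.
sm = data.frame(batch = c("a", "a", "b"))
sm[, "batch"]                    # character vector (dimension dropped)
sm[, "batch", drop = FALSE]      # still a one-column data.frame
vapply(sm, function(x) length(unique(x)), FUN.VALUE = numeric(1))
```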
* allow user to set lambda (#24)
- lambda changed to an input parameter; NULL uses the pmp optimisation (see the sketch below)
- model_predict now uses the set value of lambda, or lambda_opt if used.
- documentation updated
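A minimal sketch of the intended parameter logic (`choose_lambda` is a hypothetical helper, not the structToolbox code):

```r
# A user-supplied lambda takes priority; NULL falls back to the value
# selected by the pmp optimisation during training.
choose_lambda = function(lambda = NULL, lambda_opt) {
    if (is.null(lambda)) lambda_opt else lambda
}
choose_lambda(NULL, 0.2)  # 0.2, the optimised value
choose_lambda(0.5, 0.2)   # 0.5, the user-supplied value
```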
* Feature non parametric fold change (#26)
* add "median" method
based on DOI: 10.1080/00949650212140; fold changes equivalent to using medians, and the corresponding confidence intervals, can now be calculated
* update documentation
* update median method
now correctly calculates the ratio of medians
* use wilcox for paired median intervals
make use of wilcox.test to estimate confidence intervals for the median when using the median method with paired samples (see the sketch below)
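For illustration only (base R, not the structToolbox implementation), a median-based fold change with a nonparametric interval for paired samples:

```r
# Ratio of medians as the fold change; wilcox.test() with conf.int = TRUE
# gives an interval for the paired differences on the log2 scale, which is
# transformed back to the fold-change scale.
set.seed(1)
a = rlnorm(20, meanlog = 1)
b = rlnorm(20, meanlog = 0)
fc = median(a) / median(b)
wt = wilcox.test(log2(a), log2(b), paired = TRUE, conf.int = TRUE)
2^wt$conf.int  # approximate interval on the fold-change scale
```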
* Issue 23 filter by name (#27)
* fix for #23
moved all model_apply functionality to model_predict so that model_train and model_predict can be used as well as model_apply
* update documentation
* Update mean_of_medians.R (#29)
fix for #28
- correctly loop over all levels in the named factor
* Feature documentation 3 12 (#31)
* update documentation
Description and inputs now pulled from the object definitions for consistency.
* fix definition of label_features
now allows NULL; description updated
* replace non-ASCII characters
* export mixed_effect object
* use correct object name to generate documentation
* export mixed_effect object
* remove non-ASCII characters
* update tests with new object name
* add import for capture.output
* add import for capture.output
* use pca_biplot in tests
chart was renamed
* add utils import
* update struct dependency version
* update documentation
* update news, version bump
@@ -1,9 +1,4 @@
-#' Split data into subsets
-#'
-#' Splits the data into a training and test set.
-#' @param p_train The proportion of samples in the training set.
-#' @param ... additional slots and values passed to struct_class
-#' @return struct object
+#' @eval get_description('split_data')
 #' @export split_data
 #' @examples
 #' M = split_data(p_train=0.75)
@@ -26,14 +21,17 @@ split_data = function(p_train,...) {
 
     prototype=list(
         name = 'Split data',
-        description = 'Splits the data into a training and test set',
+        description = paste0('The data matrix is divided into two subsets.',
+            'A predefined proportion of the samples are randomly selected for a ',
+            'training set, and the remaining samples are used for the test set.'),
         type = 'processing',
         predicted = 'testing',
         .params=c('p_train'),
         .outputs=c('training','testing'),
 
         p_train=entity(name = 'Proportion in training set',
-            description = 'The proportion of samples selected for the training set. All other samples will be in assigned to the test set.',
+            description = paste0('The proportion of samples selected for the ',
+                'training set.'),
             value = 0.75,
             type='numeric'),
 
+ AUC metric
+ PLS charts (reg coeff, ROC, VIP scores)
+ stratified data set splitting
@@ -1,16 +1,16 @@
 #' Split data into subsets
 #'
 #' Splits the data into a training and test set.
-#' @param p The proportion of samples in the training set.
+#' @param p_train The proportion of samples in the training set.
 #' @param ... additional slots and values passed to struct_class
 #' @return struct object
 #' @export split_data
 #' @examples
-#' M = split_data(p=0.75)
+#' M = split_data(p_train=0.75)
 #'
-split_data = function(p,...) {
+split_data = function(p_train,...) {
     out=struct::new_struct('split_data',
-        p=p,
+        p_train=p_train,
         ...)
     return(out)
 }
@@ -19,19 +19,20 @@ split_data = function(p,...) {
 .split_data<-setClass(
     "split_data",
     contains = c('model'),
-    slots=c(p='entity',
+    slots=c(p_train='entity',
         training='entity',
         testing='entity'
     ),
 
-    prototype=list(name = 'Split data',
+    prototype=list(
+        name = 'Split data',
         description = 'Splits the data into a training and test set',
         type = 'processing',
         predicted = 'testing',
-        .params=c('p'),
+        .params=c('p_train'),
         .outputs=c('training','testing'),
 
-        p=entity(name = 'Proportion in training set',
+        p_train=entity(name = 'Proportion in training set',
             description = 'The proportion of samples selected for the training set. All other samples will be in assigned to the test set.',
             value = 0.75,
             type='numeric'),
@@ -58,7 +59,7 @@ setMethod(f="model_apply",
         # number of samples
         nMax=nrow(D$data)
         # number in the training set
-        n=floor(nMax*opt$p)
+        n=floor(nMax*opt$p_train)
         # select a random subset of the data for training
         in_training=sample(x=1:nMax,size = n, replace=FALSE,prob=NULL)
         training=DatasetExperiment(data=D$data[in_training,,drop=FALSE],
@@ -1,15 +1,17 @@
-#' split data into sets
+#' Split data into subsets
 #'
-#' Splits the data into a training and test set
+#' Splits the data into a training and test set.
+#' @param p The proportion of samples in the training set.
 #' @param ... additional slots and values passed to struct_class
 #' @return struct object
 #' @export split_data
 #' @examples
 #' M = split_data()
 #'
-split_data = function(...) {
-    out=.split_data()
-    out=struct::new_struct(out,...)
+split_data = function(p,...) {
+    out=struct::new_struct('split_data',
+        p=p,
+        ...)
     return(out)
 }
 
@@ -26,6 +28,8 @@ split_data = function(...) {
         description = 'Splits the data into a training and test set',
         type = 'processing',
         predicted = 'testing',
+        .params=c('p'),
+        .outputs=c('training','testing'),
 
         p=entity(name = 'Proportion in training set',
             description = 'The proportion of samples selected for the training set. All other samples will be in assigned to the test set.',
@@ -45,13 +49,11 @@ split_data = function(...) {
     )
 )
 
-#' @param ... additional slots and values passed to struct_class
 #' @export
 #' @template model_apply
 setMethod(f="model_apply",
     signature=c("split_data","DatasetExperiment"),
-    definition=function(M,D)
-    {
+    definition=function(M,D) {
         opt=param_list(M)
         # number of samples
         nMax=nrow(D$data)
@@ -1,7 +1,7 @@
 #' split data into sets
 #'
 #' Splits the data into a training and test set
-#' @param ... slots and values for the new object
+#' @param ... additional slots and values passed to struct_class
 #' @return struct object
 #' @export split_data
 #' @examples
@@ -45,7 +45,7 @@ split_data = function(...) {
     )
 )
 
-#' @param ... slots and values for the new object
+#' @param ... additional slots and values passed to struct_class
 #' @export
 #' @template model_apply
 setMethod(f="model_apply",
also fix resulting duplicate slot name 'type' for mixed_effects
@@ -9,7 +9,7 @@
 #'
 split_data = function(...) {
     out=.split_data()
-    out=struct::.initialize_struct_class(out,...)
+    out=struct::new_struct(out,...)
     return(out)
 }
 
@@ -17,9 +17,9 @@ split_data = function(...) {
 .split_data<-setClass(
     "split_data",
     contains = c('model'),
-    slots=c(params_p='entity',
-        outputs_training='entity',
-        outputs_testing='entity'
+    slots=c(p='entity',
+        training='entity',
+        testing='entity'
     ),
 
     prototype=list(name = 'Split data',
@@ -27,17 +27,17 @@ split_data = function(...) {
         type = 'processing',
         predicted = 'testing',
 
-        params_p=entity(name = 'Proportion in training set',
+        p=entity(name = 'Proportion in training set',
             description = 'The proportion of samples selected for the training set. All other samples will be in assigned to the test set.',
             value = 0.75,
             type='numeric'),
 
-        outputs_training=entity(name = 'A DatasetExperiment of training data',
+        training=entity(name = 'A DatasetExperiment of training data',
            description = 'A DatasetExperiment object containing samples selected for the training set.',
            type='DatasetExperiment',
            value=DatasetExperiment()
        ),
-        outputs_testing=entity(name = 'A DatasetExperiment of data for testing',
+        testing=entity(name = 'A DatasetExperiment of data for testing',
            description = 'A DatasetExperiment object containing samples selected for the testing set.',
            type='DatasetExperiment',
            value=DatasetExperiment()
@@ -28,7 +28,7 @@ split_data = function(...) {
         predicted = 'testing',
 
         params_p=entity(name = 'Proportion in training set',
-            description = 'The proportion of samples selected for the training set. All other samples willbe in assigned to the test set.',
+            description = 'The proportion of samples selected for the training set. All other samples will be in assigned to the test set.',
             value = 0.75,
             type='numeric'),
 
...update some documentation
@@ -1,11 +1,19 @@
 #' split data into sets
 #'
 #' Splits the data into a training and test set
+#' @param ... slots and values for the new object
 #' @export split_data
 #' @examples
 #' M = split_data()
 #'
-split_data<-setClass(
+split_data = function(...) {
+    out=.split_data()
+    out=struct::.initialize_struct_class(out,...)
+    return(out)
+}
+
+
+.split_data<-setClass(
     "split_data",
     contains = c('model'),
     slots=c(params_p='entity',
@@ -36,6 +44,7 @@ split_data<-setClass(
     )
 )
 
+#' @param ... slots and values for the new object
 #' @export
 #' @template model_apply
 setMethod(f="model_apply",
@@ -51,14 +60,14 @@ setMethod(f="model_apply",
         in_training=sample(x=1:nMax,size = n, replace=FALSE,prob=NULL)
         training=DatasetExperiment(data=D$data[in_training,,drop=FALSE],
             sample_meta=D$sample_meta[in_training,,drop=FALSE],
-            variable_meta=dobj$variable_meta,
-            name=c(name(D),'(Training set)'),
-            description=c(description(D),'A subset of the data has been selected as a training set'))
+            variable_meta=D$variable_meta,
+            name=c(D$name,'(Training set)'),
+            description=c(D$description,'A subset of the data has been selected as a training set'))
         testing=DatasetExperiment(data=D$data[-in_training,,drop=FALSE],
             sample_meta=D$sample_meta[-in_training,,drop=FALSE],
-            variable_meta=dobj$variable_meta,
-            name=c(name(D),'(Testing set)'),
-            description=c(description(D),'A subset of the data has been selected as a test set'))
+            variable_meta=D$variable_meta,
+            name=c(D$name,'(Testing set)'),
+            description=c(D$description,'A subset of the data has been selected as a test set'))
         output_value(M,'training')=training
         output_value(M,'testing')=testing
 
...rename all functions with dot to underscore
replace dataset with DatasetExperiment
@@ -8,9 +8,9 @@
 split_data<-setClass(
     "split_data",
     contains = c('model'),
-    slots=c(params.p='entity',
-        outputs.training='entity',
-        outputs.testing='entity'
+    slots=c(params_p='entity',
+        outputs_training='entity',
+        outputs_testing='entity'
     ),
 
     prototype=list(name = 'Split data',
@@ -18,49 +18,49 @@ split_data<-setClass(
         type = 'processing',
         predicted = 'testing',
 
-        params.p=entity(name = 'Proportion in training set',
+        params_p=entity(name = 'Proportion in training set',
             description = 'The proportion of samples selected for the training set. All other samples willbe in assigned to the test set.',
             value = 0.75,
             type='numeric'),
 
-        outputs.training=entity(name = 'A dataset of training data',
-            description = 'A dataset object containing samples selected for the training set.',
-            type='dataset',
-            value=dataset()
+        outputs_training=entity(name = 'A DatasetExperiment of training data',
+            description = 'A DatasetExperiment object containing samples selected for the training set.',
+            type='DatasetExperiment',
+            value=DatasetExperiment()
         ),
-        outputs.testing=entity(name = 'A dataset of data for testing',
-            description = 'A dataset object containing samples selected for the testing set.',
-            type='dataset',
-            value=dataset()
+        outputs_testing=entity(name = 'A DatasetExperiment of data for testing',
+            description = 'A DatasetExperiment object containing samples selected for the testing set.',
+            type='DatasetExperiment',
+            value=DatasetExperiment()
         )
     )
 )
 
 #' @export
 #' @template model_apply
-setMethod(f="model.apply",
-    signature=c("split_data","dataset"),
+setMethod(f="model_apply",
+    signature=c("split_data","DatasetExperiment"),
     definition=function(M,D)
     {
-        opt=param.list(M)
+        opt=param_list(M)
         # number of samples
-        nMax=nrow(dataset.data(D))
+        nMax=nrow(D$data)
         # number in the training set
         n=floor(nMax*opt$p)
         # select a random subset of the data for training
         in_training=sample(x=1:nMax,size = n, replace=FALSE,prob=NULL)
-        training=dataset(data=dataset.data(D)[in_training,,drop=FALSE],
-            sample_meta=dataset.sample_meta(D)[in_training,,drop=FALSE],
-            variable_meta=dataset.variable_meta(D),
+        training=DatasetExperiment(data=D$data[in_training,,drop=FALSE],
+            sample_meta=D$sample_meta[in_training,,drop=FALSE],
+            variable_meta=dobj$variable_meta,
             name=c(name(D),'(Training set)'),
            description=c(description(D),'A subset of the data has been selected as a training set'))
-        testing=dataset(data=dataset.data(D)[-in_training,,drop=FALSE],
-            sample_meta=dataset.sample_meta(D)[-in_training,,drop=FALSE],
-            variable_meta=dataset.variable_meta(D),
+        testing=DatasetExperiment(data=D$data[-in_training,,drop=FALSE],
+            sample_meta=D$sample_meta[-in_training,,drop=FALSE],
+            variable_meta=dobj$variable_meta,
            name=c(name(D),'(Testing set)'),
            description=c(description(D),'A subset of the data has been selected as a test set'))
-        output.value(M,'training')=training
-        output.value(M,'testing')=testing
+        output_value(M,'training')=training
+        output_value(M,'testing')=testing
 
         return(M)
     }
update due to removal of methods class from struct base
due to changes in base package struct
@@ -7,7 +7,7 @@
 #'
 split_data<-setClass(
     "split_data",
-    contains = c('method'),
+    contains = c('model'),
     slots=c(params.p='entity',
         outputs.training='entity',
         outputs.testing='entity'
@@ -38,7 +38,7 @@ split_data<-setClass(
 
 #' @export
 #' @template method_apply
-setMethod(f="method.apply",
+setMethod(f="model.apply",
     signature=c("split_data","dataset"),
     definition=function(M,D)
     {
@@ -3,61 +3,61 @@
 #' Splits the data into a training and test set
 #' @export split_data
 split_data<-setClass(
-  "split_data",
-  contains = c('method'),
-  slots=c(params.p='entity',
-          outputs.training='entity',
-          outputs.testing='entity'
-  ),
+    "split_data",
+    contains = c('method'),
+    slots=c(params.p='entity',
+        outputs.training='entity',
+        outputs.testing='entity'
+    ),
 
-  prototype=list(name = 'Split data',
-                 description = 'Splits the data into a training and test set',
-                 type = 'processing',
-                 predicted = 'testing',
+    prototype=list(name = 'Split data',
+        description = 'Splits the data into a training and test set',
+        type = 'processing',
+        predicted = 'testing',
 
-                 params.p=entity(name = 'Proportion in training set',
-                                 description = 'The proportion of samples selected for the training set. All other samples willbe in assigned to the test set.',
-                                 value = 0.75,
-                                 type='numeric'),
+        params.p=entity(name = 'Proportion in training set',
+            description = 'The proportion of samples selected for the training set. All other samples willbe in assigned to the test set.',
+            value = 0.75,
+            type='numeric'),
 
-                 outputs.training=entity(name = 'A dataset of training data',
-                                         description = 'A dataset object containing samples selected for the training set.',
-                                         type='dataset',
-                                         value=dataset()
-                 ),
-                 outputs.testing=entity(name = 'A dataset of data for testing',
-                                        description = 'A dataset object containing samples selected for the testing set.',
-                                        type='dataset',
-                                        value=dataset()
-                 )
-  )
+        outputs.training=entity(name = 'A dataset of training data',
+            description = 'A dataset object containing samples selected for the training set.',
+            type='dataset',
+            value=dataset()
+        ),
+        outputs.testing=entity(name = 'A dataset of data for testing',
+            description = 'A dataset object containing samples selected for the testing set.',
+            type='dataset',
+            value=dataset()
+        )
+    )
 )
 
 #' @export
 setMethod(f="method.apply",
-          signature=c("split_data","dataset"),
-          definition=function(M,D)
-          {
-            opt=param.list(M)
-            # number of samples
-            nMax=nrow(dataset.data(D))
-            # number in the training set
-            n=floor(nMax*opt$p)
-            # select a random subset of the data for training
-            in_training=sample(x=1:nMax,size = n, replace=FALSE,prob=NULL)
-            training=dataset(data=dataset.data(D)[in_training,,drop=FALSE],
-                             sample_meta=dataset.sample_meta(D)[in_training,,drop=FALSE],
-                             variable_meta=dataset.variable_meta(D),
-                             name=c(name(D),'(Training set)'),
-                             description=c(description(D),'A subset of the data has been selected as a training set'))
-            testing=dataset(data=dataset.data(D)[-in_training,,drop=FALSE],
-                            sample_meta=dataset.sample_meta(D)[-in_training,,drop=FALSE],
-                            variable_meta=dataset.variable_meta(D),
-                            name=c(name(D),'(Testing set)'),
-                            description=c(description(D),'A subset of the data has been selected as a test set'))
-            output.value(M,'training')=training
-            output.value(M,'testing')=testing
+    signature=c("split_data","dataset"),
+    definition=function(M,D)
+    {
+        opt=param.list(M)
+        # number of samples
+        nMax=nrow(dataset.data(D))
+        # number in the training set
+        n=floor(nMax*opt$p)
+        # select a random subset of the data for training
+        in_training=sample(x=1:nMax,size = n, replace=FALSE,prob=NULL)
+        training=dataset(data=dataset.data(D)[in_training,,drop=FALSE],
+            sample_meta=dataset.sample_meta(D)[in_training,,drop=FALSE],
+            variable_meta=dataset.variable_meta(D),
+            name=c(name(D),'(Training set)'),
+            description=c(description(D),'A subset of the data has been selected as a training set'))
+        testing=dataset(data=dataset.data(D)[-in_training,,drop=FALSE],
+            sample_meta=dataset.sample_meta(D)[-in_training,,drop=FALSE],
+            variable_meta=dataset.variable_meta(D),
+            name=c(name(D),'(Testing set)'),
+            description=c(description(D),'A subset of the data has been selected as a test set'))
+        output.value(M,'training')=training
+        output.value(M,'testing')=testing
 
-            return(M)
-          }
+        return(M)
+    }
 )
struct now searches for parameters labelled param. and output., so a list of them is no longer needed as a slot
@@ -14,8 +14,6 @@ split_data<-setClass(
                  description = 'Splits the data into a training and test set',
                  type = 'processing',
                  predicted = 'testing',
-                 params=c('p'),
-                 outputs=c('training','testing'),
 
                  params.p=entity(name = 'Proportion in training set',
                                  description = 'The proportion of samples selected for the training set. All other samples willbe in assigned to the test set.',
new file mode 100644
@@ -0,0 +1,65 @@
+#' split data into sets
+#'
+#' Splits the data into a training and test set
+#' @export split_data
+split_data<-setClass(
+  "split_data",
+  contains = c('method'),
+  slots=c(params.p='entity',
+          outputs.training='entity',
+          outputs.testing='entity'
+  ),
+
+  prototype=list(name = 'Split data',
+                 description = 'Splits the data into a training and test set',
+                 type = 'processing',
+                 predicted = 'testing',
+                 params=c('p'),
+                 outputs=c('training','testing'),
+
+                 params.p=entity(name = 'Proportion in training set',
+                                 description = 'The proportion of samples selected for the training set. All other samples willbe in assigned to the test set.',
+                                 value = 0.75,
+                                 type='numeric'),
+
+                 outputs.training=entity(name = 'A dataset of training data',
+                                         description = 'A dataset object containing samples selected for the training set.',
+                                         type='dataset',
+                                         value=dataset()
+                 ),
+                 outputs.testing=entity(name = 'A dataset of data for testing',
+                                        description = 'A dataset object containing samples selected for the testing set.',
+                                        type='dataset',
+                                        value=dataset()
+                 )
+  )
+)
+
+#' @export
+setMethod(f="method.apply",
+          signature=c("split_data","dataset"),
+          definition=function(M,D)
+          {
+            opt=param.list(M)
+            # number of samples
+            nMax=nrow(dataset.data(D))
+            # number in the training set
+            n=floor(nMax*opt$p)
+            # select a random subset of the data for training
+            in_training=sample(x=1:nMax,size = n, replace=FALSE,prob=NULL)
+            training=dataset(data=dataset.data(D)[in_training,,drop=FALSE],
+                             sample_meta=dataset.sample_meta(D)[in_training,,drop=FALSE],
+                             variable_meta=dataset.variable_meta(D),
+                             name=c(name(D),'(Training set)'),
+                             description=c(description(D),'A subset of the data has been selected as a training set'))
+            testing=dataset(data=dataset.data(D)[-in_training,,drop=FALSE],
+                            sample_meta=dataset.sample_meta(D)[-in_training,,drop=FALSE],
+                            variable_meta=dataset.variable_meta(D),
+                            name=c(name(D),'(Testing set)'),
+                            description=c(description(D),'A subset of the data has been selected as a test set'))
+            output.value(M,'training')=training
+            output.value(M,'testing')=testing
+
+            return(M)
+          }
+)