update vignettes

grlloyd authored on 25/06/2019 15:40:06
Showing 5 changed files

... ...
@@ -1,3 +1,5 @@
1
+^docs$
2
+^_pkgdown\.yml$
1 3
 ^codecov\.yml$
2 4
 ^Meta$
3 5
 ^doc$
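
The entries in this hunk are regular expressions rather than shell globs, which is why the dot in `_pkgdown.yml` is escaped. As a rough sketch of how such patterns behave, the base R snippet below tests the patterns from the hunk against a few made-up top-level paths. The path list and the `is_ignored` helper are hypothetical, and the snippet only approximates what the R build tools do (Perl-style, case-insensitive matching is assumed here).

```r
# Patterns copied from the hunk above
patterns <- c("^docs$", "^_pkgdown\\.yml$", "^codecov\\.yml$", "^Meta$", "^doc$")

# Hypothetical top-level paths, for illustration only
paths <- c("docs", "doc", "documents", "_pkgdown.yml", "_pkgdownXyml", "R")

# A path counts as ignored if at least one pattern matches it
is_ignored <- vapply(
    paths,
    function(p) {
        any(vapply(patterns, grepl, logical(1),
                   x = p, perl = TRUE, ignore.case = TRUE))
    },
    logical(1)
)

print(is_ignored)
```

Because the patterns are anchored with `^` and `$`, `documents` is not caught by `^docs$` or `^doc$`, and the escaped dot stops `_pkgdownXyml` from matching.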
... ...
@@ -1,3 +1,5 @@
1
+Meta
2
+doc
1 3
 .Rproj.user
2 4
 .Rhistory
3 5
 .RData
... ...
@@ -12,5 +14,4 @@ inst/doc
12 14
 *.log
13 15
 structtoolbox.Rproj
14 16
 *.Rproj
15
-*.Rproj
16 17
 *.tiff
... ...
@@ -35,6 +35,7 @@ Collate:
35 35
     'forward_selection_by_rank_class.R'
36 36
     'ggplot_theme_pub.R'
37 37
     'glog_class.R'
38
+    'glog_transform_class.R'
38 39
     'grid_search_1d_class.R'
39 40
     'hca_class.R'
40 41
     'kfold_xval_class.R'
41 42
deleted file mode 100644
... ...
@@ -1,154 +0,0 @@
1
-title: 'Example 1: Dataset and model objects'
2
-output: rmarkdown::html_vignette
3
-vignette: >
4
-    %\VignetteEngine{knitr::rmarkdown}
5
-    %\VignetteEncoding{UTF-8}
6
-
7
-The dataset object is designed to hold information relevant for a dataset, such as the raw data, sample meta data and variable metadata. We will use Fisher's Iris dataset as an example.
8
-
9
-```{r}
10
-library('structToolbox')
11
-X=iris
12
-summary(X)
13
-```
14
-
15
-First, we create a dataset object to hold the data. This will make the data compatible with the rest of the struct package objects. Note that fields such as 'name' and 'description' can be set when the object is created.
16
-```{r}
17
-D=dataset(data=X[,1:4],
18
-    sample_meta=X[,5,drop=FALSE],
19
-    name='Iris data',
20
-    description='The data used in Fisher\'s paper')
21
-D
22
-```
23
-
24
-Alternatively fields can be set for the dataset object after object creation. Valid fields for dataset objects are *name*, *description*, *type*, *sample_meta*, *variable_meta* and *data*.
25
-```{r}
26
-name(D)='Fisher\'s Iris data'
27
-D
28
-```
29
-
30
-Fields in the dataset object can be retrieved using `$` notation:
31
-```{r}
32
-# name stored in object D
33
-name(D)
34
-# summary of sample meta data for D
35
-summary(D$sample_meta)
36
-```
37
-
38
-Model objects can be used to apply methods such as Principal Component Analysis to a dataset. First the object has to be created. Model objects are also used for preprocessing steps that need to be applied on a training/test set basis, such as Mean Centring.
39
-
40
-```{r}
41
-M=mean_centre()
42
-M
43
-```
44
-
45
-Model objects have some fields in common with dataset objects, such as 'name' and 'description', that can be accessed in the same way as for a dataset object.
46
-
47
-```{r}
48
-name(M)
49
-```
50
-
51
-Model objects also have 'param' and 'output' functions for getting/setting model specific parameters and outputs.
52
-
53
-```{r}
54
-# list the valid outputs from the mean centring object
55
-output.ids(M)
56
-```
57
-
58
-Model objects can be trained using dataset objects as input. For the mean_centre object this calculates the mean of each column and stores it in the 'mean_data' output.
59
-
60
-```{r}
61
-# train the model using the data in D
62
-M=model.train(M,D)
63
-output.value(M,'mean_data')
64
-```
65
-
66
-Model objects also have a 'predict' method that allows a trained model to be applied to, e.g., test data if required. We don't have test data for this example, so we'll just use the training data. The mean centred data is returned as the output 'centred'.
67
-
68
-```{r}
69
-M=model.predict(M,D)
70
-Dc=M$centred # a dataset object
71
-# verify the data is column centred (colMeans should be 0, or very close)
72
-colMeans(Dc$data)
73
-```
74
-
75
-
76
-Now that we have centred the data we can apply Principal Component Analysis (PCA). First we create the object. Note that we can create the object and set parameter values at the same time.
77
-
78
-```{r}
79
-P=PCA('number_components'=2)
80
-```
81
-
82
-
83
-The names of valid parameters for a model object can be retrieved as a list.
84
-
85
-```{r}
86
-param.ids(P)
87
-```
88
-
89
-Parameter values can be set and retrieved using the 'param' function combined with the parameter name.
90
-```{r}
91
-# set the number of components to 5
92
-param.value(P,'number_components')=5
93
-# get the number of components
94
-param.value(P,'number_components')
95
-```
96
-
97
-A list of all parameter - value pairs can be retrieved using the param.list function.
98
-```{r}
99
-L=param.list(P)
100
-L
101
-# change number of components to 4
102
-L$number_components=4
103
-param.list(P)=L
104
-param.value(P,'number_components')
105
-```
106
-
107
-The PCA model object can be trained in the same way as the mean_centre object, but this time we will input the mean centred dataset object.
108
-
109
-```{r, fig.height=5, fig.width=5}
110
-P=model.train(P,Dc) # train using the Iris data object
111
-```
112
-
113
-Outputs can be accessed in a similar way to parameters.
114
-
115
-```{r}
116
-# valid outputs for PCA model
117
-output.ids(P)
118
-```
119
-```{r}
120
-# get the PCA scores
121
-scores=output.value(P,'scores')
122
-summary(scores)
123
-```
124
-
125
-The `chart.names` function can be used to list charts for the input object, in this case a PCA object.
126
-
127
-```{r, fig.height=5, fig.width=5}
128
-chart.names(P)
129
-```
130
-
131
-Note that charts are objects in their own right within the `struct` framework. The `chart.plot` function can be used with a valid chart object to plot the chart.
132
-
133
-```{r, fig.height=5, fig.width=5}
134
-C=pca_scores_plot(groups=1) # chart object
135
-chart.plot(C,P)
136
-```
137
-
138
-The default values for chart title, axis labels etc. are used unless options are set on the chart object before calling chart.plot. Options for a specific chart can be obtained using the `param.list` function.
139
-
140
-```{r, fig.height=5, fig.width=5}
141
-# get options for the scores plot
142
-opt=param.list(C)
143
-# change the colouring to be related to the factor of interest
144
-opt$factor_name='Species'
145
-opt$groups=Dc$sample_meta$Species
146
-opt$points_to_label='none'
147
-param.list(C)=opt
148
-# plot the chart with the new options
149
-chart.plot(C,P)
150
-```
151
-
152
-
153 0
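
The removed vignette describes mean centring as a train/predict pair: training stores the column means, and prediction subtracts those stored means from whatever data is supplied. A minimal base R sketch of that idea follows. It does not use the structToolbox API, and the odd/even split of the iris rows and the `centre` helper are assumptions made purely for illustration.

```r
X <- iris[, 1:4]

# Hypothetical split: odd rows for training, even rows for testing
train_idx <- seq(1, nrow(X), by = 2)
train <- X[train_idx, ]
test <- X[-train_idx, ]

# "Train": learn the column means from the training data only
mu <- colMeans(train)

# "Predict": subtract the stored training means from any dataset
centre <- function(data, means) sweep(as.matrix(data), 2, means, "-")
train_c <- centre(train, mu)
test_c <- centre(test, mu)

# Training columns are centred (up to floating point error)
print(max(abs(colMeans(train_c))))

# Test columns are shifted by the training means, so they are
# generally close to, but not exactly, zero
print(colMeans(test_c))
```

Centring the test data with the training means, rather than its own means, keeps both sets in the same coordinate system, which is the reason the step is modelled as an object with separate train and predict methods.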
new file mode 100644
... ...
@@ -0,0 +1,148 @@
1
+---
2
+title: "Model objects"
3
+author: "Dr Gavin Rhys Lloyd"
4
+date: "25/06/2019"
5
+output: 
6
+    html_document:
7
+        df_print: paged
8
+        highlight: tango
9
+vignette: >
10
+  %\VignetteIndexEntry{Model objects}
11
+  %\VignetteEngine{knitr::rmarkdown}
12
+  %\VignetteEncoding{UTF-8}
13
+---
14
+
15
+```{r setup, include=FALSE}
16
+knitr::opts_chunk$set(
17
+    collapse = TRUE,
18
+    comment = "#>",
19
+    fig.align = 'center'
20
+)
21
+library(structToolbox)
22
+library(gridExtra)
23
+```
24
+
25
+</br></br>
26
+
27
+# Introduction
28
+PCA (Principal Component Analysis) is a commonly applied method for exploring multivariate datasets. We will use the iris dataset as an example, which is included in the package and already prepared as a dataset object.
29
+
30
+```{r}
31
+D = iris_dataset()
32
+D$data
33
+```
34
+ 
35
+</br></br>
36
+
37
+# PCA model
38
+Before we apply PCA we first need to create a PCA object. This object contains all the inputs, outputs and methods needed to apply PCA. We can set parameters such as the number of components when the PCA model is created, but we can also use dollar notation to change/view it later. 
39
+
40
+```{r}
41
+P = PCA(number_components=15)
42
+P$number_components=5
43
+P$number_components
44
+```
45
+  
46
+The inputs for a model can be listed using `param.ids(object)`:
47
+
48
+```{r}
49
+param.ids(P)
50
+```
51
+</br></br>
52
+
53
+# Model sequences
54
+Unless you have a very good reason not to, it is usually sensible to mean centre the columns of the data before PCA. Using the `STRUCT` framework we can create a model sequence that will mean centre and then apply PCA to the mean centred data.
55
+
56
+```{r}
57
+M = mean_centre() + PCA(number_components = 4)
58
+```
59
+  
60
+In `STRUCT` mean centring and PCA are both model objects, and therefore joining them creates a model.sequence object. The objects in the sequence can be accessed by indexing, and we can combine this with dollar notation. For example, the PCA object is the second object in our sequence and we can access the number of components like this:
61
+
62
+```{r}
63
+M[2]$number_components
64
+```
65
+</br></br>
66
+
67
+# Training/testing models
68
+Model and model.sequence objects need to be trained using a training dataset.
69
+
70
+```{r}
71
+M = model.train(M,D)
72
+```
73
+  
74
+Model objects can be used to generate predictions for test datasets. For this example we will just use the training data (sometimes called autoprediction).
75
+
76
+```{r}
77
+M = model.predict(M,D)
78
+```
79
+
80
+The available outputs for an object can be listed and accessed using dollar notation:
81
+  
82
+```{r}
83
+output.ids(M[2])
84
+M[2]$scores
85
+```
86
+</br></br>
87
+
88
+# Model charts
89
+The struct framework includes charts. Charts associated with a model object can be listed.
90
+
91
+```{r}
92
+chart.names(M[2])
93
+```
94
+  
95
+Like model objects, chart objects need to be created before they can be used. Here we will plot the PCA scores plot for our mean centred PCA model.
96
+
97
+```{r}
98
+C = pca_scores_plot(groups=D$sample_meta$Species,factor_name='Species') # colour by Species
99
+chart.plot(C,M[2])
100
+```
101
+  
102
+If we make changes to our chart object, we must call `chart.plot` again.
103
+
104
+```{r}
105
+C$groups = D$data$Petal.Width
106
+C$factor_name='Petal.Width'
107
+chart.plot(C,M[2])
108
+```
109
+  
110
+The `chart.plot` method can return a plot object (e.g. a ggplot object), so you can easily combine it with other plots, for example by using the gridExtra package.
111
+
112
+```{r,fig.width=10}
113
+C1 = pca_scores_plot(groups=D$sample_meta$Species,factor_name='Species') # colour by Species
114
+g1 = chart.plot(C1,M[2])
115
+C2 = PCA.scree()
116
+g2 = chart.plot(C2,M[2])
117
+grid.arrange(grobs=list(g1,g2),nrow=1)
118
+```
119
+</br></br>
120
+
121
+# STATO Integration
122
+Some model objects are also STATO objects. STATO is a general purpose statistics ontology (http://stato-ontology.org/). In the `STRUCT` framework we use it to provide standardised definitions for objects. The PCA model object is also a STATO object.
123
+
124
+```{r}
125
+is(PCA(),'stato')
126
+```
127
+
128
+We can access the STATO ontology using some methods specific to stato objects.
129
+
130
+```{r}
131
+# this is the stato id for PCA
132
+stato.id(P)
133
+
134
+# this is the stato name
135
+stato.name(P)
136
+
137
+# this is the stato definition
138
+stato.definition(P)
139
+```
140
+
141
+This information is more succinctly displayed using `stato.summary`. This method also scans over all inputs and outputs for those with STATO definitions and displays those as well. For PCA the number of components is present, but none of the outputs are STATO objects and therefore no definition is provided.
142
+
143
+```{r}
144
+stato.summary(P)
145
+```
146
+
147
+
148
+
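
For readers without the package installed, the `mean_centre() + PCA(...)` sequence in the new vignette can be approximated in base R. The sketch below is an assumption about what the sequence computes, not the package's implementation: it centres the four iris measurement columns with `scale()` and then runs `prcomp()` on the centred matrix. The variable names are illustrative.

```r
# Numeric columns of Fisher's iris data
X <- as.matrix(iris[, 1:4])

# Step 1: mean centre each column (no scaling to unit variance)
Xc <- scale(X, center = TRUE, scale = FALSE)

# Step 2: PCA on the already centred data
pca <- prcomp(Xc, center = FALSE, scale. = FALSE)

# Scores: one row per sample, one column per component
scores <- pca$x
print(dim(scores))

# Proportion of variance explained by each component,
# which is the quantity a scree plot displays
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(var_explained, 3))
```

The scores matrix here plays the role of `M[2]$scores` in the vignette, although the package returns its own object type and component signs may differ between implementations.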