Bioconductor Code: structToolbox

update vignettes

grlloyd authored on 25/06/2019 15:40:06
Showing 5 changed files

.Rbuildignore index a7b5bfc..ec93ec9 100644
.gitignore index 39fae77..b76877b 100644
DESCRIPTION index 783c8c6..1d2718d 100644
vignettes/example_1.Rmd index b3bad5d..0000000
vignettes/model_example.Rmd index 0000000..842ee54

.Rbuildignore

History View file @ 44d3bb0

@@ -1,3 +1,5 @@
                     +^docs$
                     +^_pkgdown\.yml$
                      ^codecov\.yml$
                      ^Meta$
                      ^doc$

.gitignore

@@ -1,3 +1,5 @@
                     +Meta
                     +doc
                      .Rproj.user
                      .Rhistory
                      .RData
@@ -12,5 +14,4 @@ inst/doc
                      *.log
                      structtoolbox.Rproj
                      *.Rproj
                     -*.Rproj
                      *.tiff

DESCRIPTION

@@ -35,6 +35,7 @@ Collate:
                          'forward_selection_by_rank_class.R'
                          'ggplot_theme_pub.R'
                          'glog_class.R'
                     +    'glog_transform_class.R'
                          'grid_search_1d_class.R'
                          'hca_class.R'
                          'kfold_xval_class.R'

vignettes/example_1.Rmd

                     deleted file mode 100644
@@ -1,154 +0,0 @@
                     -title: 'Example 1: Dataset and model objects'
                     -output: rmarkdown::html_vignette
                     -vignette: >
                     -    %\VignetteEngine{knitr::rmarkdown}
                     -    %\VignetteEncoding{UTF-8}
                     -
                     -The dataset object is designed to hold information relevant for a dataset, such as the raw data, sample meta data and variable metadata. We will use Fisher's Iris dataset as an example.
                     -
                     -```{r}
                     -library('structToolbox')
                     -X=iris
                     -summary(X)
                     -```
                     -
                     -First, we create a dataset object to hold the data. This will make the data compatible with the rest of the struct package objects. Note that fields such as 'name' and 'description' can be set when the object is created.
                     -```{r}
                     -D=dataset(data=X[,1:4],
                     -    sample_meta=X[,5,drop=FALSE],
                     -    name='Iris data',
                     -    description='The data used in Fisher\'s paper')
                     -D
                     -```
                     -
                     -Alternatively fields can be set for the dataset object after object creation. Valid fields for dataset objects are *name*, *description*, *type*, *sample_meta*, *variable_meta* and *data*.
                     -```{r}
                     -name(D)='Fisher\'s Iris data'
                     -D
                     -```
                     -
                     -Fields in the dataset object can be retrieved using `$` notation:
                     -```{r}
                     -# name stored in object D
                     -name(D)
                     -# summary of sample meta data for D
                     -summary(D$sample_meta)
                     -```
                     -
                     -Model objects can be used to apply methods such as Principal Component Analysis to a dataset. First the object has to be created. Model objects are also used for preprocessing steps that need to be applied on a training/test set basis, such as Mean Centring.
                     -
                     -```{r}
                     -M=mean_centre()
                     -M
                     -```
                     -
                     -Model objects have some fields in common with dataset objects, such as 'name' and 'description', which can be accessed in the same way as for the dataset object.
                     -
                     -```{r}
                     -name(M)
                     -```
                     -
                     -Model objects also have 'param' and 'output' functions for getting/setting model specific parameters and outputs.
                     -
                     -```{r}
                     -# list the valid outputs from the mean centring object
                     -output.ids(M)
                     -```
                     -
                     -Model objects can be trained using dataset objects as input. For the mean_centre object this calculates the mean of each column and stores it in the 'mean' output.
                     -
                     -```{r}
                     -# train the model using the data in D
                     -M=model.train(M,D)
                     -output.value(M,'mean_data')
                     -```
                     -
                     -Model objects also have a 'predict' method that allows a trained model to be applied to e.g. test data if required. We don't have test data for this example, so we'll just use the training data. The mean centred data is returned as output 'centred'.
                     -
                     -```{r}
                     -M=model.predict(M,D)
                     -Dc=M$centred # a dataset object
                     -# verify the data is column centred (colMeans should be 0, or very close)
                     -colMeans(Dc$data)
                     -```
                     -
                     -
                     -Now that we have centred the data we can apply Principal Component Analysis (PCA). First we create the object. Note that we can create the object and set parameter values at the same time.
                     -
                     -```{r}
                     -P=PCA('number_components'=2)
                     -```
                     -
                     -
                     -The names of valid parameters for a model object can be retrieved as a list.
                     -
                     -```{r}
                     -param.ids(P)
                     -```
                     -
                     -Parameter values can be set and retrieved using the 'param' function combined with the parameter name.
                     -```{r}
                     -# set the number of components to 5
                     -param.value(P,'number_components')=5
                     -# get the number of components
                     -param.value(P,'number_components')
                     -```
                     -
                     -A list of all parameter - value pairs can be retrieved using the param.list function.
                     -```{r}
                     -L=param.list(P)
                     -L
                     -# change number of components to 4
                     -L$number_components=4
                     -param.list(P)=L
                     -param.value(P,'number_components')
                     -```
                     -
                     -The PCA model object can be trained in the same way as the mean_centre object, but this time we will input the mean centred dataset object.
                     -
                     -```{r, fig.height=5, fig.width=5}
                     -P=model.train(P,Dc) # train using the Iris data object
                     -```
                     -
                     -Outputs can be accessed in a similar way to parameters.
                     -
                     -```{r}
                     -# valid outputs for PCA model
                     -output.ids(P)
                     -```
                     -```{r}
                     -# get the PCA scores
                     -scores=output.value(P,'scores')
                     -summary(scores)
                     -```
                     -
                     -The `chart.names` function can be used to list charts for the input object, in this case a PCA object.
                     -
                     -```{r, fig.height=5, fig.width=5}
                     -chart.names(P)
                     -```
                     -
                     -Note that charts are objects in their own right within the `struct` framework. The `chart.plot` function can be used with a valid chart object to plot the chart.
                     -
                     -```{r, fig.height=5, fig.width=5}
                     -C=pca_scores_plot(groups=1) # chart object
                     -chart.plot(C,P)
                     -```
                     -
                     -The default values for chart title, axis labels etc. are used unless a list of options is included in the chart.plot function. Options for a specific chart can be obtained using the `param.list` function.
                     -
                     -```{r, fig.height=5, fig.width=5}
                     -# get options for the scores plot
                     -opt=param.list(C)
                     -# change the colouring to be related to the factor of interest
                     -opt$factor_name='Species'
                     -opt$groups=Dc$sample_meta$Species
                     -opt$points_to_label='none'
                     -param.list(C)=opt
                     -# plot the chart with the new options
                     -chart.plot(C,P)
                     -```
                     -
                     -

vignettes/model_example.Rmd

                     new file mode 100644
@@ -0,0 +1,148 @@
                     +---
                     +title: "Model objects"
                     +author: "Dr Gavin Rhys Lloyd"
                     +date: "25/06/2019"
                     +output:
                     +    html_document:
                     +        df_print: paged
                     +        highlight: tango
                     +vignette: >
                     +  %\VignetteIndexEntry{Vignette Title}
                     +  %\VignetteEngine{knitr::rmarkdown}
                     +  %\VignetteEncoding{UTF-8}
                     +---
                     +
                     +```{r setup, include=FALSE}
                     +knitr::opts_chunk$set(
                     +    collapse = TRUE,
                     +    comment = "#>",
                     +    fig.align = 'center'
                     +)
                     +library(structToolbox)
                     +library(gridExtra)
                     +```
                     +
                     +</br></br>
                     +
                     +# Introduction
                     +PCA (Principal Component Analysis) is a commonly applied method for exploring multivariate datasets. We will use the iris dataset as an example, which is included in the package and already prepared as a dataset object.
                     +
                     +```{r}
                     +D = iris_dataset()
                     +D$data
                     +```
                     +
                     +</br></br>
                     +
                     +# PCA model
                     +Before we apply PCA we first need to create a PCA object. This object contains all the inputs, outputs and methods needed to apply PCA. We can set parameters such as the number of components when the PCA model is created, but we can also use dollar notation to change/view it later.
                     +
                     +```{r}
                     +P = PCA(number_components=15)
                     +P$number_components=5
                     +P$number_components
                     +```
                     +
                     +The inputs for a model can be listed using `param.ids(object)`:
                     +
                     +```{r}
                     +param.ids(P)
                     +```
                     +</br></br>
                     +
                     +# Model sequences
                     +Unless you have a very good reason not to, it is usually sensible to mean centre the columns of the data before PCA. Using the `STRUCT` framework we can create a model sequence that will mean centre and then apply PCA to the mean centred data.
                     +
                     +```{r}
                     +M = mean_centre() + PCA(number_components = 4)
                     +```
                     +
                     +In `STRUCT` mean centring and PCA are both model objects, and therefore joining them creates a model.sequence object. The objects in the sequence can be accessed by indexing, and we can combine this with dollar notation. For example, the PCA object is the second object in our sequence and we can access the number of components like this:
                     +
                     +```{r}
                     +M[2]$number_components
                     +```
                     +</br></br>
                     +
                     +# Training/testing models
                     +Model and model.sequence objects need to be trained using a training dataset.
                     +
                     +```{r}
                     +M = model.train(M,D)
                     +```
                     +
                     +Model objects can be used to generate predictions for test datasets. For this example we will just use the training data (sometimes called autoprediction).
                     +
                     +```{r}
                     +M = model.predict(M,D)
                     +```
                     +
                     +The available outputs for an object can be listed and accessed using dollar notation:
                     +
                     +```{r}
                     +output.ids(M[2])
                     +M[2]$scores
                     +```
                     +</br></br>
                     +
                     +# Model charts
                     +The struct framework includes charts. Charts associated with a model object can be listed.
                     +
                     +```{r}
                     +chart.names(M[2])
                     +```
                     +
                     +Like model objects, chart objects need to be created before they can be used. Here we will plot the PCA scores plot for our mean centred PCA model.
                     +
                     +```{r}
                     +C = pca_scores_plot(groups=D$sample_meta$Species,factor_name='Species') # colour by Species
                     +chart.plot(C,M[2])
                     +```
                     +
                     +If we make changes to our chart object, we must call `chart.plot` again.
                     +
                     +```{r}
                     +C$groups = D$data$Petal.Width
                     +C$factor_name='Petal.Width'
                     +chart.plot(C,M[2])
                     +```
                     +
                     +The `chart.plot` method can return e.g. a ggplot object so that you can easily combine it with other plots using the gridExtra package for example.
                     +
                     +```{r,fig.width=10}
                     +C1 = pca_scores_plot(groups=D$sample_meta$Species,factor_name='Species') # colour by Species
                     +g1 = chart.plot(C1,M[2])
                     +C2 = PCA.scree()
                     +g2 = chart.plot(C2,M[2])
                     +grid.arrange(grobs=list(g1,g2),nrow=1)
                     +```
                     +</br></br>
                     +
                     +# STATO Integration
                     +Some model objects are also STATO objects. STATO is a general purpose statistics ontology (http://stato-ontology.org/). In the `STRUCT` framework we use it to provide standardised definitions for objects. The PCA model object is also a STATO object.
                     +
                     +```{r}
                     +is(PCA(),'stato')
                     +```
                     +
                     +We can access the STATO ontology using some methods specific to stato objects.
                     +
                     +```{r}
                     +# this is the stato id for PCA
                     +stato.id(P)
                     +
                     +# this is the stato name
                     +stato.name(P)
                     +
                     +# this is the stato definition
                     +stato.definition(P)
                     +```
                     +
                     +This information is more succinctly displayed using `stato.summary`. This method also scans over all inputs and outputs for those with STATO definitions and displays those as well. For PCA the number of components is present, but none of the outputs are STATO objects and therefore no definition is provided.
                     +
                     +```{r}
                     +stato.summary(P)
                     +```
                     +
                     +
                     +
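
The new vignette chains mean centring and PCA into a model sequence, with a train step followed by a predict step. The sketch below is not part of the commit and does not use structToolbox. It only illustrates, in base R, what those two steps amount to numerically. `prcomp()` stands in for the `PCA` model object, and the names `mu`, `Xc`, `pca` and `scores` are illustrative assumptions.

```r
# Base-R sketch of "mean centre, then PCA" on the iris measurements.
# Assumption: prcomp() is used in place of the structToolbox PCA object.
X <- as.matrix(iris[, 1:4])

# "train" step for mean centring: estimate the column means
mu <- colMeans(X)

# "predict" step for mean centring: subtract the stored means
# (here applied to the training data itself, i.e. autoprediction)
Xc <- sweep(X, 2, mu, "-")

# PCA on the centred data; centring is already done, so disable it here
pca <- prcomp(Xc, center = FALSE, scale. = FALSE)

# keep 4 components, as in the vignette's model sequence
scores <- pca$x[, 1:4]

# the column means of the centred data should be numerically zero
max(abs(colMeans(Xc)))
```

Keeping the mean estimation separate from the subtraction mirrors the train/predict split described in the vignette: the means come from the training data and can then be applied to any other dataset with the same columns.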