In [None]:
library(structToolbox)
library(gridExtra)



# Introduction
Validation is an important aspect of chemometric modelling. The `STRUCT` framework enables this kind of iterative model testing through `iterator` objects. To demonstrate this we will first load the iris data set, which has been pre-prepared as a dataset object as part of the `STRUCT` package.



In [None]:
D = iris_dataset()
summary(D)



# Cross-validation
Cross-validation is a common technique for assessing the performance of classification models. For this example we will use a PLSDA model. Data should be mean centred prior to PLS, so we will first build a model sequence that combines both steps.
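The reason for making mean centring a step of the model, rather than a one-off preprocessing of the full data set, can be illustrated with a small base R sketch. It does not use `STRUCT`; the odd/even split and all variable names are assumptions made purely for illustration. The column means are estimated on the training rows only and then reused for the test rows.

```r
# Base R sketch (not STRUCT code): centre with training means only.
X <- as.matrix(iris[, 1:4])

# Hypothetical split: odd rows for training, even rows for testing.
train_idx <- seq(1, nrow(X), by = 2)
test_idx  <- setdiff(seq_len(nrow(X)), train_idx)

# Estimate the column means on the training rows.
train_means <- colMeans(X[train_idx, ])

# Subtract the training means from both subsets (sweep defaults to "-").
X_train_c <- sweep(X[train_idx, ], 2, train_means)
X_test_c  <- sweep(X[test_idx, ], 2, train_means)

# Training columns now average to zero; test columns generally do not.
round(colMeans(X_train_c), 10)
round(colMeans(X_test_c), 3)
```

Centring the test rows with their own means would leak information about the test set into the preprocessing step.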



In [None]:
# model sequence: mean centre the data, then fit a 2-component PLSDA model
M = mean_centre() + PLSDA(number_components=2,factor_name='Species')
M


Iterator objects like the k-fold cross-validation object can be created just like any other struct object. Parameters can be set at creation, and accessed or changed later using dollar notation.



In [None]:
XCV = kfold_xval(folds=5,factor_name='Species')
# change the number of folds
XCV$folds=10
XCV$folds


The model to be cross-validated can be set or accessed using the `models` method.



In [None]:
models(XCV)=M
models(XCV)


Alternatively, iterators can be combined with models using the multiplication symbol:


In [None]:
XCV = kfold_xval(folds=5,method='venetian',factor_name='Species') *
 (mean_centre()+PLSDA(number_components = 2,factor_name='Species'))


The `run` method can be used with any iterator object. The iterator will then run the model sequence multiple times. In our case the model sequence is trained and tested 5 times, once per fold, with a different split of the data into training and test sets each time. The `run` method also needs a `metric` to be specified. This metric may be calculated once after all iterations, or after each iteration, depending on the iterator type (resampling, permutation, etc.). For cross-validation we will calculate balanced accuracy after all iterations.
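As a rough illustration of the metric, the base R sketch below computes balanced accuracy using its common definition: the mean of the per-class recall. This is an assumption about the general definition, not a copy of the `balanced_accuracy` implementation in the package, and the function name `balanced_acc` and the toy labels are hypothetical.

```r
# Base R sketch: balanced accuracy as the mean of per-class recall.
balanced_acc <- function(truth, predicted) {
  truth <- factor(truth)
  predicted <- factor(predicted, levels = levels(truth))
  # recall for one class: fraction of that class predicted correctly
  recall <- sapply(levels(truth), function(cl) {
    mean(predicted[truth == cl] == cl)
  })
  mean(recall)
}

# Toy labels (made up): class "a" has 4 samples, class "b" has 2.
truth     <- c("a", "a", "a", "a", "b", "b")
predicted <- c("a", "a", "a", "b", "b", "a")

balanced_acc(truth, predicted)  # (3/4 + 1/2) / 2 = 0.625
mean(truth == predicted)        # plain accuracy, 4/6
```

Unlike plain accuracy, each class contributes equally to this score regardless of how many samples it contains.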



In [None]:
XCV = run(XCV,D,balanced_accuracy())
XCV$metric



Like other `STRUCT` objects, iterators can have chart objects associated with them. The `chart.names` function will list them for an object.



In [None]:
chart.names(XCV)


Charts for iterator objects can be plotted in the same way as charts for any other object.



In [None]:
C = kfoldxcv_grid()
chart.plot(C,XCV)


It is possible to combine multiple iterators by multiplying them together. This is equivalent to nesting one iterator inside the other. For example, we can repeat our cross-validation multiple times by permuting the sample order.
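The nesting can be pictured as two loops: an outer loop that shuffles the samples and an inner loop over the folds. The base R sketch below is an illustration under stated assumptions, not the `STRUCT` implementation: a nearest-centroid classifier (the hypothetical helper `predict_centroid`) stands in for PLSDA, folds are assigned in an interleaved fashion, and plain accuracy is reported, which coincides with balanced accuracy here because iris has 50 samples per class.

```r
# Base R sketch (not STRUCT code): permutations wrapped around k-fold CV.
set.seed(1)  # make the permutations reproducible
X <- as.matrix(iris[, 1:4])
y <- iris$Species
k <- 5       # number of folds
n_perm <- 3  # number of permutations of the sample order

# Nearest-centroid stand-in for the real model (an assumption, not PLSDA).
predict_centroid <- function(X_train, y_train, X_test) {
  # class centroids: one row per class, one column per feature
  centroids <- apply(X_train, 2, function(col) tapply(col, y_train, mean))
  # squared distance from every test row to every centroid (classes x samples)
  d <- apply(X_test, 1, function(x) colSums((t(centroids) - x)^2))
  factor(rownames(centroids)[apply(d, 2, which.min)], levels = levels(y_train))
}

acc <- numeric(n_perm)
for (p in seq_len(n_perm)) {
  ord <- sample(nrow(X))  # outer loop: permute the sample order
  Xp <- X[ord, ]
  yp <- y[ord]
  fold <- rep(seq_len(k), length.out = nrow(Xp))  # interleaved fold labels
  pred <- factor(rep(NA, nrow(Xp)), levels = levels(yp))
  for (f in seq_len(k)) {  # inner loop: k-fold cross-validation
    test <- fold == f
    mu <- colMeans(Xp[!test, ])  # centre with training means only
    pred[test] <- predict_centroid(sweep(Xp[!test, ], 2, mu), yp[!test],
                                   sweep(Xp[test, ], 2, mu))
  }
  acc[p] <- mean(pred == yp)  # metric on the pooled test predictions
}
acc  # one value per permutation
```

Because the fold assignment depends on the sample order, each permutation yields a different set of training and test splits, so the spread of the resulting values gives an indication of how sensitive the estimate is to the split.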



In [None]:
# outer iterator: permute sample order; inner iterator: 5-fold cross-validation
P = permute_sample_order(number_of_permutations = 10) *
 kfold_xval(folds=5,factor_name='Species')*
 (mean_centre() + PLSDA(factor_name='Species',number_components=2))
P = run(P,D,balanced_accuracy())
P$metric