{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": { "Rmd_chunk_options": "setup, include=FALSE", "autoscroll": false, "collapsed": true }, "outputs": [], "source": [ "library(structToolbox)\n", "library(gridExtra)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "# Introduction\n", "Validation is an important aspect of chemometric modelling. The `STRUCT` framework enables this kind of iterative model testing through `iterator` objects. To demonstrate this, we will first load the iris data set, which has been pre-prepared as a dataset object as part of the `STRUCT` package.\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "autoscroll": false, "collapsed": true }, "outputs": [], "source": [ "# load the iris data as a dataset object and summarise it\n", "D = iris_dataset()\n", "summary(D)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "# Cross-validation\n", "Cross-validation is a common technique for assessing the performance of classification models. For this example we will use a PLSDA model. Data should be mean centred prior to PLS, so we will build a model sequence first.\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "autoscroll": false, "collapsed": true }, "outputs": [], "source": [ "# model sequence: mean centre, then fit a 2-component PLSDA model\n", "M = mean_centre() + PLSDA(number_components=2,factor_name='Species')\n", "M" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "Iterator objects like the k-fold cross-validation object can be created just like any other struct object. 
Parameters can be set at creation, and accessed or changed later using dollar notation.\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "autoscroll": false, "collapsed": true }, "outputs": [], "source": [ "# create a 5-fold cross-validation iterator\n", "XCV = kfold_xval(folds=5,factor_name='Species')\n", "# change the number of folds\n", "XCV$folds=10\n", "XCV$folds" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "The model to be cross-validated can be set or accessed using the `models` method.\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "autoscroll": false, "collapsed": true }, "outputs": [], "source": [ "# assign the model sequence to the iterator, then retrieve it\n", "models(XCV)=M\n", "models(XCV)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "Alternatively, iterators can be combined with models using the multiplication symbol:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "autoscroll": false, "collapsed": true }, "outputs": [], "source": [ "# combine a 5-fold iterator (using the 'venetian' method) with the model sequence\n", "XCV = kfold_xval(folds=5,method='venetian',factor_name='Species') *\n", " (mean_centre()+PLSDA(number_components = 2,factor_name='Species'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "The `run` method can be used with any iterator object. The iterator will then run the model sequence multiple times. In our case, 5-fold cross-validation trains and tests the model sequence 5 times, splitting the data into different training and test sets each time. The `run` method also needs a `metric` to be specified. This metric may be calculated once after all iterations, or after each iteration, depending on the iterator type (resampling, permutation, etc.). 
For cross-validation we will calculate balanced accuracy after all iterations.\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "autoscroll": false, "collapsed": true }, "outputs": [], "source": [ "XCV = run(XCV,D,balanced_accuracy())\n", "XCV$metric" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "Like other `STRUCT` objects, iterators can have chart objects associated with them. The `chart.names` function will list them for an object.\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "autoscroll": false, "collapsed": true }, "outputs": [], "source": [ "chart.names(XCV)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "Charts for iterator objects can be plotted in the same way as charts for any other object.\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "Rmd_chunk_options": "warning=FALSE", "autoscroll": false, "collapsed": true }, "outputs": [], "source": [ "C = kfoldxcv_grid()\n", "chart.plot(C,XCV)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "It is possible to combine multiple iterators by multiplying them together. This is equivalent to nesting one iterator inside the other. 
For example, we can repeat our cross-validation multiple times by permuting the sample order.\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "autoscroll": false, "collapsed": true }, "outputs": [], "source": [ "P = permute_sample_order(number_of_permutations = 10) *\n", " kfold_xval(folds=5,factor_name='Species')*\n", " (mean_centre() + PLSDA(factor_name='Species',number_components=2))\n", "P = run(P,D,balanced_accuracy())\n", "P$metric" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n" ] } ], "metadata": { "Rmd_header": { "author": "Dr Gavin Rhys Lloyd", "output": { "html_document": { "keep_md": true }, "md_document": { "df_print": "kable", "html_preview": false, "variant": "markdown_github" } }, "package": "structToolbox", "title": "Iterator objects", "vignette": "\n%\\VignetteIndexEntry{Iterator objects}\n%\\VignetteEngine{knitr::rmarkdown}\n%\\VignetteEncoding{UTF-8}\n" }, "kernelspec": { "display_name": "R", "language": "R", "name": "ir" }, "language_info": { "codemirror_mode": "r", "file_extension": ".r", "mimetype": "text/x-r-source", "name": "R", "pygments_lexer": "r" } }, "nbformat": 4, "nbformat_minor": 0 }