Bioconductor Code: structToolbox

Raw Blame Patch Log History
{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "Rmd_chunk_options": "setup, include=FALSE",
    "autoscroll": false,
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "library(structToolbox)\n",
    "library(gridExtra)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "\n",
    "# Introduction\n",
    "Validation is an important aspect of chemometric modelling. The `STRUCT` framework enables this kind of iterative model testing through `iterator` objects. In order to demonstrate this we will first load the iris data set, which as been pre-prepared as a dataset object as part of the `STRUCT` package.\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "autoscroll": false,
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "D = iris_dataset()\n",
    "summary(D)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "\n",
    "# Cross-validation\n",
    "Cross validation is a common technique for assessing the performance of classification models. For this example we will use a PLSDA model. Data should be mean centred prior to PLS, so we will build a model sequence first.\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "autoscroll": false,
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "M = mean_centre() + PLSDA(number_components=2,factor_name='Species')\n",
    "M"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "Iterators objects like the k-fold cross-validation object can be created just like any other struct object. Parameters can be set at creation =, and accessed/changed later using dollar notation.\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "autoscroll": false,
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "XCV = kfold_xval(folds=5,factor_name='Species')\n",
    "# change the number of folds\n",
    "XCV$folds=10\n",
    "XCV$folds"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "The model to be cross-validated can be set/accessed used the `models` method.\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "autoscroll": false,
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "models(XCV)=M\n",
    "models(XCV)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "Alternatively,  iterators can be combined with models using the multiplication symbol:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "autoscroll": false,
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "XCV = kfold_xval(folds=5,method='venetian',factor_name='Species') *\n",
    "      (mean_centre()+PLSDA(number_components = 2,factor_name='Species'))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "The `run` method can be used with any iterator object. The iterator will then run the model sequence multiple times. In our case we will run cross-validation 5 times splitting the data into different training and test sets each time. The `run` method also needs a `metric` to be specified. This metric may be calculated once after all iterations, or after each iteration, depending on the iterator type (resampling, permutation etc). For cross-validation we will calculate balanced accuracy after all iterations.\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "autoscroll": false,
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "XCV = run(XCV,D,balanced_accuracy())\n",
    "XCV$metric"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "\n",
    "Like other `STRUCT` objects, iterators can have chart objects associated with them. The `chart.names` function will list them for an object.\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "autoscroll": false,
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "chart.names(XCV)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "Charts for iterator objects can be plotted in the same way as charts for any other object.\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "Rmd_chunk_options": "warning=FALSE",
    "autoscroll": false,
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "C = kfoldxcv_grid()\n",
    "chart.plot(C,XCV)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "It is possible to combine multiple iterators by multiplying them together. This is equivalent to nesting one iterator inside the other. For example, we can repeat our cross-validation multiple times by permuting the sample order.\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "autoscroll": false,
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "P = permute_sample_order(number_of_permutations = 10) *\n",
    "    kfold_xval(folds=5,factor_name='Species')*\n",
    "    (mean_centre() + PLSDA(factor_name='Species',number_components=2))\n",
    "P = run(P,D,balanced_accuracy())\n",
    "P$metric"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "\n",
    "\n",
    "\n",
    "\n",
    "\n",
    "\n"
   ]
  }
 ],
 "metadata": {
  "Rmd_header": {
   "author": "Dr Gavin Rhys Lloyd",
   "output": {
    "html_document": {
     "keep_md": true
    },
    "md_document": {
     "df_print": "kable",
     "html_preview": false,
     "variant": "markdown_github"
    }
   },
   "package": "structToolbox",
   "title": "Iterator objects",
   "vignette": "\n%\\VignetteIndexEntry{Iterator objects}\n%\\VignetteEngine{knitr::rmarkdown}\n%\\VignetteEncoding{UTF-8}\n"
  },
  "kernelspec": {
   "display_name": "R",
   "language": "R",
   "name": "ir"
  },
  "language_info": {
   "codemirror_mode": "r",
   "file_extension": ".r",
   "mimetype": "text/x-r-source",
   "name": "R",
   "pygments_lexer": "r"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}