/////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
// Welcome to the Directory containing all java files for source of paper “Lossless or Quantized Boosting with Integer Arithmetic” //
// + two example datasets used in the paper (UCI's breastwisc, qsar)                                                               //
//                                                                                                                                 //
// * THIS CODE IS PROVIDED WITH NO GUARANTEE WHATSOEVER: USE IT AT YOUR OWN RISK *                                                 //
//                                                                                                                                 //
// Feedback welcome: richard.nock@data61.csiro.au                                                                                  //
/////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////


////// COMPILING AND RUNNING FROM EXAMPLE RESOURCE FILE

To ease Java compilation, you may run compile.sh

To run, for example: 

java -Xmx10000m Experiments -R resource_qsar.txt



////// RESOURCE FILE

A resource file contains three parts:

** 1: where to find data

//Directory where to find domains
@DIRECTORY,Datasets

//Domain prefix name
@PREFIX,qsar

This indicates that all files related to this run must be in subdirectory Datasets/qsar/

Two files need to be there: a "feature" file containing the description of the features and the class, and a "data" file giving the examples

** 2: parameterisation of RatBoostE

//Full rational algo: params
@EPSILON_DELTA,10000
@EPSILON_OPERATOR,1000
@WARNING_LONG,1000000000

* EPSILON_OPERATOR gives the order of approximation of input values as fractions: a value x is stored as the fraction round(x * EPSILON_OPERATOR) / EPSILON_OPERATOR. With the setting above (1000 = 10^3), inputs are kept to roughly 3 digits after the decimal point.
* EPSILON_DELTA does the same as EPSILON_OPERATOR, but for the output classifier. With the setting above (10000 = 10^4), this amounts to storing reals up to the 4th digit after the decimal point, as fractions.
* WARNING_LONG is the maximal Java long allowed before quantisation warnings are triggered in RatBoostE (quantisation makes the algorithm faster; the larger this number, the closer the algorithm stays to lossless).

The larger these numbers, the slower RatBoostE runs, but the closer the encoding is to lossless.
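As an illustration, the fraction approximation described above can be sketched as follows (a minimal sketch, not the repository's implementation; the class and method names FractionSketch / toFraction are made up):

```java
// Sketch of the fraction approximation controlled by EPSILON_OPERATOR and
// EPSILON_DELTA: x is stored as round(x * eps) / eps. Names are illustrative.
public class FractionSketch {

    // Approximates x as a fraction; returns {numerator, denominator}.
    static long[] toFraction(double x, long eps) {
        return new long[] { Math.round(x * eps), eps };
    }

    public static void main(String[] args) {
        // @EPSILON_OPERATOR,1000 : inputs kept to ~3 digits after the point
        long[] in = toFraction(0.12346, 1000L);
        System.out.println(in[0] + "/" + in[1]);   // 123/1000

        // @EPSILON_DELTA,10000 : classifier values kept to ~4 digits
        long[] out = toFraction(0.12346, 10000L);
        System.out.println(out[0] + "/" + out[1]); // 1235/10000
    }
}
```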


////// OUTPUT

The output (command line) looks like the following, for * each * fold of a dataset:

 * @RatBoost_E[1023] > Starting fold 1/10 -- (Creating rational ops........... ok.) 0% [6/10000 Mb] 20% [143/10000 Mb] 40% [598/10000 Mb] 60% [1053/10000 Mb] 80% [57/10000 Mb] ok.     (perr err l2 sup) = (0.0543 0.0652 18.5270 0.7500)

or, for example:

 * @RatBoost_Q[15][true] > Starting fold 1/10 -- 0% [1211/10000 Mb] 20% [1211/10000 Mb] 40% [1211/10000 Mb] 60% [1211/10000 Mb] 80% [719/10000 Mb] ok.  (perr err l2 sup) = (0.0284 0.0145 4.8440 0.7500)

** interpretation:

1- Algorithm names match those in the paper: 

@RatBoost_E[1023] = RatBoost_E with N = 1023 different weight values (bitsize = log2(N+1), here 10); "Creating rational ops........... ok." means the algorithm converts the input to fractions
@AdaBoost_R, @AdaBoostSS = AdaBoost_R (Nock & Nielsen) or AdaBoost with Schapire & Singer's update (SS)
@RatBoost_Q[N][boolean] = RatBoost_Q; N = 2^{b}-1 = number of values for quantisation (must be an odd integer); boolean = true iff quantisation is stochastic (the algorithm called RatBoost_S in the paper)
@RatBoost_A[N][boolean] = same as RatBoost_Q but with adaptive quantisation; note: the implementation also allows stochastic assignment of weights when boolean = true (not reported in the paper)

2- For all algorithms, (perr err l2 sup) on a fold = (empirical error, estimated test error, l2 norm of the linear separator, L0 norm / dimension)
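To make the deterministic vs. stochastic quantisation distinction concrete, here is a hedged sketch (not the repository's code; the class and method names are made up, and it assumes purely for illustration that weights lie in [-1, 1] with the N levels equally spaced, so an odd N places a level at 0):

```java
import java.util.Random;

// Illustrative sketch of quantisation onto N equally spaced levels in [-1, 1].
// Deterministic mode rounds to the nearest level; stochastic mode rounds to
// one of the two neighbouring levels, unbiased in expectation.
public class StochasticQuant {

    static double quantize(double w, int n, boolean stochastic, Random rng) {
        double step = 2.0 / (n - 1);   // spacing between the n levels
        double k = (w + 1.0) / step;   // position of w in units of step
        long lo = (long) Math.floor(k);
        double frac = k - lo;          // fractional part, in [0, 1)
        long idx;
        if (stochastic) {
            // round up with probability equal to the fractional part
            idx = lo + (rng.nextDouble() < frac ? 1 : 0);
        } else {
            idx = Math.round(k);       // nearest level
        }
        idx = Math.max(0, Math.min(n - 1, idx)); // clamp to valid levels
        return -1.0 + idx * step;
    }

    public static void main(String[] args) {
        Random rng = new Random(0);
        // deterministic quantisation of 0.3 onto N = 15 levels
        System.out.println(quantize(0.3, 15, false, rng));
        // stochastic quantisation of the same value
        System.out.println(quantize(0.3, 15, true, rng));
    }
}
```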
