Table 1 Dataset overview. N is the dataset name, UT the number of unique target values, S the set size, UT/S the ratio of the two, NF:CF the numbers of nominal and continuous features (excluding the target), UF the number of unique feature values, R the reference, and ID the OpenML ID. “sk” indicates that the dataset is available in scikit-learn; IDs in parentheses refer to the original datasets
From: TCR: topologically consistent reweighting for XGBoost in regression tasks
N | UT | S | UT/S | NF:CF | UF | R | \({\textrm{ID}}^{1}\) |
|---|---|---|---|---|---|---|---|
Diabetes | 214 | 442 | 0.48 | 0:10 | 1135 | Efron et al. (2004) | \({\textrm{sk}}^{2}\) |
Treasury | 570 | 1049 | 0.54 | 0:15 | 8180 | –/–\({}^{3}\) | 42,367 |
Wine Quality -REG | 7 | 6497 | 0.001 | 0:11 | 2119 | Cortez et al. (2009) | 287 |
Topo21 | 1336 | 8885 | 0.15 | 0:266 | 350,916 | Feng et al. (2003) | 422 |
Bike Sharing Demand | 869 | 17,379 | 0.05 | 28:6 | 230 | Fanaee-T and Gama (2014) | 42,713 |
Mimic Los | 16,420 | 19,196 | 0.86 | 12:31 | 64,254 | (this work) | –/– |
California Housing | 3842 | 20,640 | 0.19 | 0:8 | 70,057 | Kelley Pace and Barry (1997) | \({\textrm{sk}}^{2}\) |
Online News Popularity | 1454 | 39,644 | 0.04 | 14:44 | 457,216 | Fernandes (2015) | 4545 |
Diamonds | 11,602 | 53,940 | 0.22 | 0:9 | 1137 | –/– | 42,225 |
BNG | 78,571 | 78,732 | 0.998 | 4:7 | 543,871 | –/– | 1213 |
Superconduct | 3007 | 21,263 | 0.14 | 0:79 | 573,105 | Grinsztajn et al. (2022) | 44,148 (43,174) |
House Sales | 4027 | 21,613 | 0.19 | 2:15 | 19,787 | Grinsztajn et al. (2022) | 44,066 (42,731) |
Miami Housing 2016 | 13,932 | 13,932 | 1.0 | 0:13 | 108,276 | Grinsztajn et al. (2022) | 44,147 (43,093) |
Mercedes Benz Gr. Manuf | 2545 | 4209 | 0.60 | 376:0 | 2 | Grinsztajn et al. (2022) | 44,061 (42,570) |
Yprop41 (\(\sim\) Topo21) | 1336 | 8885 | 0.15 | 20:42 | 18,896 | Grinsztajn et al. (2022) | 44,054 (416) |
Elevators | 61 | 16,599 | 0.004 | 0:16 | 1945 | Grinsztajn et al. (2022) | 44,134 (216) |
Isolet | 26 | 7797 | 0.003 | 0:613 | 10,001 | Grinsztajn et al. (2022) | 44,135 (300) |
CPU Act | 56 | 8192 | 0.007 | 0:21 | 33,328 | Grinsztajn et al. (2022) | 44,132 (197) |
Pol | 11 | 15,000 | 0.001 | 0:26 | 200 | Grinsztajn et al. (2022) | 44,133 (201) |
Ailerons | 35 | 13,750 | 0.003 | 0:33 | 1933 | Grinsztajn et al. (2022) | 44,137 (296) |
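
The column definitions in the caption can be made concrete with a short sketch. The Python snippet below computes UT, S, UT/S, NF:CF and UF for a pandas DataFrame; it is not the authors' code. The function name `dataset_overview`, the toy data, and two interpretation choices are assumptions: numeric columns are treated as continuous and all other columns as nominal, and UF is taken to be the number of distinct values pooled over all feature columns (a reading that is consistent with UF = 2 for the 376 nominal Mercedes Benz features, but is not confirmed by the source).

```python
import pandas as pd


def dataset_overview(df: pd.DataFrame, target: str) -> dict:
    """Compute the Table 1 columns UT, S, UT/S, NF:CF and UF for one dataset."""
    y = df[target]
    X = df.drop(columns=[target])

    ut = int(y.nunique())  # UT: number of unique target values
    s = len(df)            # S: set size (number of rows)

    # Assumption: numeric dtypes count as continuous, everything else as nominal.
    cf = X.select_dtypes(include="number").shape[1]
    nf = X.shape[1] - cf

    # Assumption: UF pools all feature columns and counts each distinct value once.
    uf = len(pd.unique(X.to_numpy(dtype=object).ravel()))

    return {"UT": ut, "S": s, "UT/S": ut / s, "NF:CF": f"{nf}:{cf}", "UF": uf}


# Hypothetical toy data, only to exercise the function.
toy = pd.DataFrame(
    {
        "colour": ["red", "blue", "red", "blue"],  # nominal feature
        "size": [1.0, 2.0, 2.0, 3.0],              # continuous feature
        "price": [10.0, 12.5, 10.0, 20.0],         # regression target
    }
)
overview = dataset_overview(toy, target="price")
print(overview)
```

For the toy frame this gives UT = 3, S = 4, UT/S = 0.75, NF:CF = 1:1 and UF = 5. A dataset from the table could presumably be loaded by its OpenML ID first, for example with scikit-learn's `fetch_openml(data_id=287, as_frame=True)` for Wine Quality, and the resulting frame passed to the function; whether that reproduces the reported values depends on preprocessing the source does not describe here.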