
Table 1 Dataset overview. N, UT, S, UT/S, NF:CF, UF, R and ID denote the dataset name, the number of unique target values, the set size, the ratio of unique targets to set size, the numbers of nominal and continuous features (excluding the target), the number of unique feature values, the reference, and the OpenML ID, respectively. “sk” indicates that the dataset is available in scikit-learn; IDs in parentheses refer to the original sets

From: TCR: topologically consistent reweighting for XGBoost in regression tasks

| N | UT | S | UT/S | NF:CF | UF | R | \({\textrm{ID}}^{1}\) |
|---|---|---|---|---|---|---|---|
| Diabetes | 214 | 442 | 0.48 | 0:10 | 1135 | Efron et al. (2004) | \({\textrm{sk}}^{2}\) |
| Treasury | 570 | 1049 | 0.54 | 0:15 | 8180 | –/\({-}^{3}\) | 42,367 |
| Wine Quality -REG | 7 | 6497 | 0.001 | 0:11 | 2119 | Cortez et al. (2009) | 287 |
| Topo21 | 1336 | 8885 | 0.15 | 0:266 | 350,916 | Feng et al. (2003) | 422 |
| Bike Sharing Demand | 869 | 17,379 | 0.05 | 28:6 | 230 | Fanaee-T and Gama (2014) | 42,713 |
| Mimic Los | 16,420 | 19,196 | 0.86 | 12:31 | 64,254 | (-by us-) | –/– |
| California Housing | 3842 | 20,640 | 0.19 | 0:8 | 70,057 | Kelley Pace and Barry (1997) | \({\textrm{sk}}^{2}\) |
| Online News Popularity | 1454 | 39,644 | 0.04 | 14:44 | 457,216 | Fernandes (2015) | 4545 |
| Diamonds | 11,602 | 53,940 | 0.26 | 0:9 | 1137 | –/– | 42,225 |
| BNG | 78,571 | 78,732 | 0.998 | 4:7 | 543,871 | –/– | 1213 |
| Superconduct | 3007 | 21,263 | 0.14 | 0:79 | 573,105 | Grinsztajn et al. (2022) | 44,148 (43,174) |
| House Sales | 4027 | 21,613 | 0.19 | 2:15 | 19,787 | Grinsztajn et al. (2022) | 44,066 (42,731) |
| Miami Housing 2016 | 13,932 | 13,932 | 1.0 | 0:13 | 108,276 | Grinsztajn et al. (2022) | 44,147 (43,093) |
| Mercedes Benz Gr. Manuf | 2545 | 4209 | 0.60 | 376:0 | 2 | Grinsztajn et al. (2022) | 44,061 (42,570) |
| Yprop41 (\(\sim\) Topo21) | 1336 | 8885 | 0.15 | 20:42 | 18,896 | Grinsztajn et al. (2022) | 44,054 (416) |
| Elevators | 61 | 16,599 | 0.004 | 0:16 | 1945 | Grinsztajn et al. (2022) | 44,134 (216) |
| Isolet | 26 | 7797 | 0.003 | 0:613 | 10,001 | Grinsztajn et al. (2022) | 44,135 (300) |
| CPU Act | 56 | 8192 | 0.007 | 0:21 | 33,328 | Grinsztajn et al. (2022) | 44,132 (197) |
| Pol | 11 | 15,000 | 0.001 | 0:26 | 200 | Grinsztajn et al. (2022) | 44,133 (201) |
| Ailerons | 35 | 13,750 | 0.003 | 0:33 | 1933 | Grinsztajn et al. (2022) | 44,137 (296) |

  1. \(^{1}\) Source: https://www.openml.org/
  2. \(^{2}\) Source: https://scikit-learn.org/stable/
  3. \(^{3}\) Source: https://sci2s.ugr.es/keel/dataset.php?cod=42
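
The UT, S and UT/S columns follow directly from a dataset's target column, so they can be recomputed once the targets are loaded. The following is a minimal sketch under that reading of the caption. The helper name `target_stats` and the toy target list are hypothetical and are not taken from the article.

```python
from typing import Hashable, Sequence, Tuple


def target_stats(y: Sequence[Hashable]) -> Tuple[int, int, float]:
    """Return (UT, S, UT/S) for a target column y.

    UT is the number of distinct target values, S is the number of samples,
    and UT/S is their ratio, following the column definitions in Table 1.
    """
    if len(y) == 0:
        raise ValueError("target column is empty")
    unique_targets = len(set(y))  # UT
    set_size = len(y)             # S
    return unique_targets, set_size, unique_targets / set_size


# Toy target column, made up for illustration (not one of the datasets above).
toy_y = [3.0, 3.0, 5.5, 7.0, 7.0, 7.0, 9.25, 1.0]
ut, s, ratio = target_stats(toy_y)
print(ut, s, round(ratio, 3))  # 5 8 0.625
```

For a real dataset, the target column could be obtained with scikit-learn, for example via `sklearn.datasets.fetch_openml(data_id=...)` using an ID from the last column, and passed to the same helper. As a sanity check against the table, the Diabetes row reports 214 unique targets over 442 samples, and 214/442 rounds to 0.48.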