Feature Subset Selection for Learning
Huge Configuration Spaces
The case of Linux Kernel Size
Mathieu Acher, Hugo Martin, Juliana Alves Pereira, Luc Lesoil, Arnaud
Blouin, Jean-Marc Jézéquel, Djamel Eddine Khelladi, Olivier Barais
Preprint: https://hal.inria.fr/hal-03720273
15,000+ options (Linux 5.2.8, arm)
(figure: % of options per type)
≈10^6000 variants (without constraints)
Linux Kernel: ≈10^6000 variants.
For comparison, ≈10^80 is the estimated number of atoms in the universe, and ≈10^40 is the estimated number of possible chess positions.
Dimensionality reduction with feature selection
Huge configuration space: ≈10^6000 configurations
Large option/feature* set: 9K+ options for x86_64
Hypothesis: only a subset of options matter when predicting properties of variants. Very few studies exist at this scale.
From p options to p' options, with p' << p, over n configurations.
*options (~Linux features) are encoded as features (~predictive variables in learning problems)
Hypothesis: only a subset of options matter when predicting properties of variants. Key results:
● Some state-of-the-art solutions do not scale due to "too many feature interactions" (think of the combinatorics with thousands of features!)
● Only ~300 features* (instead of 9K+) are sufficient to predict efficiently, and even outperform the accuracy of "learning over all features/options"
● Training time can be decreased
● Identification of influential options is consistent with, and can even improve, expert knowledge about Linux kernel configuration
*options (~Linux features) are encoded as features (~predictive variables in learning problems)
Configurable software system → Configurations → Variants → Quantitative property (e.g. related to performance, security, energy consumption)
For the Linux kernel: a .config (compile-time/Kconfig) → a kernel variant (binary) → its binary size
Different configurations yield very different sizes: 176.8Mb, 16.1Mb, 77.2Mb... and for a configuration not yet built: ?
Challenge: you cannot build ≈10^6000 configurations; sampling and learning to the rescue, but…
Is it accurate? Is it effective with p' features and feature selection? How many features*? Which options* matter? (7.1Mb, 176.8Mb, … what size for an unseen configuration?)
p' options with p' << p
*options (~Linux features) are encoded as features (~predictive variables in learning problems)
A challenging case
● Targeted non-functional, quantitative property: binary size
○ of interest to maintainers/users of the Linux kernel (embedded systems, cloud, etc.)
○ challenging to predict (cross-cutting options, interplay with compilers/build systems, etc.)
● Dataset: version 4.13.3, x86_64 arch, measurements of 95K+ random configurations
○ paranoid about deep variability since 2017: Docker to control the build environment and scale
○ build: 8 minutes on average
○ diversity: from 7Mb to 1.9Gb
TUXML: Sampling, Measuring, Learning
Most prior work considers a relatively low number of options (<50); Linux has 9K+ options for x86_64.
Feature subset selection vs. recursive feature elimination: do they scale? Are they accurate?
Docker for a reproducible environment, with the tools/packages needed and Python procedures inside.
Easy to launch a campaign: "python3 kernel_generator.py 10" builds/measures 10 random configurations (information sent to a database).
https://github.com/TuxML/
Data: version 4.13.3 (x86_64)
95K+ configurations for Linux 4.13.3 (and 15K hours of computation on a computing grid)
RQ1: How do state-of-the-art techniques perform on huge configuration spaces?
● Linear-based algorithms: high error rate (it's not additive!)
● Polynomial regression & performance-influence models: out of memory (too many interactions, and not designed for 9K+ options)
● Tree-based algorithms & neural networks: low error rate
Mean Absolute Percentage Error (MAPE): the lower the better
N: percentage of the dataset used for training
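MAPE can be sketched in a few lines (the sizes below are toy numbers, not from the dataset):

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean Absolute Percentage Error, in %: the lower the better."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))

# toy example: measured kernel sizes (Mb) vs. predicted sizes
print(mape([100, 50, 20], [110, 45, 22]))  # each prediction is 10% off -> ~10.0
```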
Dimensionality reduction with feature selection
Huge configuration space: ≈10^6000 configurations
Large option/feature set: 9K+ options for x86_64
Only a subset of options matter when predicting properties of variants.
RQ2: How accurate is the prediction model with and without feature selection?
From p options to p' options, with p' << p, over n configurations.
Dimensionality reduction with tree-based feature selection
● Learn a tree-based algorithm (Random Forest) on the full dataset: p = 8,743 options
● Derive a feature ranking list, based on feature importance, e.g.:
○ DEBUG_INFO (0.33)
○ active_options (0.19)
○ group_129 (0.14)
○ DEBUG_INFO_REDUCED (0.11)
○ DEBUG_INFO_SPLIT (0.08)
● Filter: keep only the top p' options, with p' << p, to form a reduced dataset
● Learn any algorithm on the reduced dataset
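The two-step pipeline above can be sketched with scikit-learn on synthetic data (the dataset shape, the target function, and p' = 10 are toy stand-ins, not the paper's setup):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor

# toy stand-in for the Linux dataset: 500 configurations, 200 binary options,
# where only a handful of options actually drive the "size"
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(500, 200)).astype(float)
y = 50 * X[:, 0] + 20 * X[:, 1] * X[:, 2] + rng.normal(0, 1, 500)

# step 1: learn a Random Forest on the full dataset, rank options by importance
rf = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
ranking = np.argsort(rf.feature_importances_)[::-1]

# step 2: filter, then learn any algorithm on the reduced dataset (p' << p)
p_prime = 10
X_reduced = X[:, ranking[:p_prime]]
gbt = GradientBoostingRegressor(random_state=0).fit(X_reduced, y)
```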
RQ2: Tree-based feature selection pays off!
● Tree-based algorithms & neural networks:
○ Lower error rate
○ Lower training time
■ Random Forest: 18x
■ Gradient Boosting Tree: 5x
● Simpler models, easier to train, and improved accuracy
● Bonus: interpretable and consistent with domain knowledge
RQ2: Optimal number of features/options when performing feature selection
● Depends on the algorithm:
○ Gradient Boosting Trees & neural networks: 1,500
○ Random Forest: 250 options
● Depends on the training set size
Sweet spot: only ~300 features are sufficient to efficiently train a Random Forest or a Gradient Boosting Tree into a prediction model that outperforms baselines operating over the full set of features (6% prediction error for 40K configurations).
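The search for that sweet spot can be sketched as a sweep over p' on synthetic data (the dataset, target function, and p' grid below are illustrative assumptions, not the paper's experiment):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# toy data: 1000 configurations, 300 options, only 20 options actually matter
rng = np.random.default_rng(0)
X = rng.integers(0, 2, (1000, 300)).astype(float)
w = np.zeros(300)
w[:20] = rng.uniform(5.0, 50.0, 20)
y = X @ w + rng.normal(0.0, 1.0, 1000)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
rf_full = RandomForestRegressor(n_estimators=50, random_state=0).fit(X_tr, y_tr)
ranking = np.argsort(rf_full.feature_importances_)[::-1]

# sweep the number of retained options and record the test error (MAPE, %)
errors = {}
for p_prime in (10, 20, 50, 300):
    cols = ranking[:p_prime]
    model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X_tr[:, cols], y_tr)
    pred = model.predict(X_te[:, cols])
    errors[p_prime] = 100 * np.mean(np.abs((y_te - pred) / y_te))
```

Plotting `errors` against p' exposes the elbow where adding more options stops helping.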
RQ3+4: Stability of influential options and training time reduction
Using an ensemble of Random Forests yields a far more stable list: more than 95% of the features in the top 300 are common across multiple lists.
Tree-based feature selection speeds up model training by at least 5x and up to 48x (since p' << p).
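The stability measure can be sketched as the overlap between top-k feature ranking lists (the rankings below are hypothetical, just to show the metric):

```python
def topk_overlap(rank_a, rank_b, k=300):
    """Fraction of features shared by the top-k of two feature ranking lists."""
    return len(set(rank_a[:k]) & set(rank_b[:k])) / k

run_a = list(range(9000))                 # hypothetical ranking from one forest
run_b = [1, 0] + list(range(2, 9000))     # another run, nearly identical ranking
print(topk_overlap(run_a, run_b))         # -> 1.0: same top-300 set, different order
```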
RQ5: How do feature ranking lists, as computed by tree-based feature selection, relate to Linux knowledge?
Documented options in Kconfig (147 in total), by rank in the feature ranking list:
0–50: 7
50–250: 6
250–500: 6
500–1500: 28
1500+: 69
The top 50 options in the feature ranking list represent 95% of the feature importance; beware of collinearity when interpreting!
Incompleteness of the Linux documentation:
● The vast majority of influential options are either not documented or not referring to size: only 7 options of the top 50 are documented as having a clear influence on size
● Leveraging all 147 options in the Linux documentation (and only them) leads to a prediction error of 23.6% (instead of <6% for our feature ranking list)
Relevance: investigations and exchanges with domain experts confirm the relevance of the top 50, yielding 6 categories of options.
Effective identification of important features:
● consistent with Linux knowledge (Kconfig documentation and expert insight)
● can be used to refine or augment the incomplete documentation of the Linux kernel
Kaggle competition using our dataset
https://www.kaggle.com/competitions/linux-kernel-size/overview
We can benefit from contributions of the machine learning community… and our dataset/problems are raising interest.
Conclusion: feature subset selection is effective over the huge configuration space of Linux:
● only ~300 features out of 9K+
● accuracy is better with tree-based feature selection than without
● training time is decreased
● interpretability: identification of influential options is consistent with, and can even improve, expert knowledge about Linux kernel configuration
Future work
● Replication on different versions of Linux
● Does the feature ranking list transfer to other versions?
https://www.kaggle.com/competitions/linux-kernel-size/overview
Computation time
Decision Tree
● Ability to handle interactions between features
● Low impact of combinatorial explosion
● Competitive accuracy
● Interpretability
○ Decision rules
○ Feature importance
● Ensembles: Random Forests, Gradient Boosting Trees...
○ More accurate, less interpretable
Kpredict
Python module for Python 3.8+ (https://github.com/HugoJPMartin/kpredict)
Works for many kernel versions and any x86_64 configuration
Error: ≈6.3%; 97% of the predictions are below 20% error
H. Martin, M. Acher, J. A. Pereira, L. Lesoil, J.-M. Jézéquel and D. E. Khelladi, "Transfer learning across variants and versions: The case of Linux kernel size," IEEE Transactions on Software Engineering (TSE), 2021.
Preprint: https://hal.inria.fr/hal-03358817
Linux Kernel
Backup / Draft slides
Transfer learning
"Inductive transfer refers to any algorithmic process by which structure or knowledge derived from a learning problem is used to enhance learning on a related problem." (Jeremy West, A theoretical foundation for inductive transfer)
● 100,000 configuration measurements, 15,000 hours of computation
● Mission Impossible: Saving Private Model 4.13
○ Budget: 5,000 configuration measurements (one night's worth of ISTIC computing power)
Model 4.13: Genesis
Train a Gradient Boosting Tree algorithm on the measured 4.13 dataset (features f1…fn = configuration options, target = binary size):

f1  f2  f3  ...  fn  |  Size
1   0   0   ...  1   |  16MB
0   1   0   ...  0   |  52MB
... ... ... ...  ... |  ...
1   1   1   ...  0   |  115MB

The resulting Model 4.13 then predicts the size of unseen 4.13 configurations (e.g. 18MB, 25MB, …, 228MB), and its predictions are accurate (✅ ✅ ✅).
Model Shifting
Naively training a Gradient Boosting Tree on the few measured 4.15 configurations (sizes 22MB, 68MB, …, 105MB) yields a Model 4.15 whose predictions on unseen configurations (19MB, 26MB, …, 298MB) are often wrong (❌ ❌ ✅).
Model shifting instead reuses Model 4.13: its predicted "Old Size" for each configuration is added as an extra feature, both when training the Shifting Model 4.15 and when predicting. The shifted predictions (21MB, 35MB, …, 298MB) are then accurate (✅ ✅ ✅).
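A minimal sketch of the shifting idea with scikit-learn, on synthetic data (version numbers, dataset sizes, and the drift between versions are illustrative assumptions, not the paper's setup):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(1)
p = 50

# source version: a large measured dataset (stand-in for Linux 4.13)
X_src = rng.integers(0, 2, (400, p)).astype(float)
y_src = 30 * X_src[:, 0] + 10 * X_src[:, 1] + rng.normal(0, 1, 400)
model_413 = GradientBoostingRegressor(random_state=0).fit(X_src, y_src)

# target version: only a small budget of measurements (stand-in for 4.15)
X_tgt = rng.integers(0, 2, (100, p)).astype(float)
y_tgt = 32 * X_tgt[:, 0] + 10 * X_tgt[:, 1] + 5 + rng.normal(0, 1, 100)

# shifting model: options + the source model's "Old Size" as an extra feature
old_size = model_413.predict(X_tgt).reshape(-1, 1)
shifting_415 = GradientBoostingRegressor(random_state=0).fit(
    np.hstack([X_tgt, old_size]), y_tgt)

# predicting a new target configuration chains the two models
X_new = rng.integers(0, 2, (10, p)).astype(float)
y_new = shifting_415.predict(
    np.hstack([X_new, model_413.predict(X_new).reshape(-1, 1)]))
```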
Results
Budget: 1,000 configurations
● Model shifting: from 6.7% to 10.6% error rate
● Scratch: from 14.9% to 16.7% error rate
Budget: 5,000 configurations
● Model shifting: from 5.6% to 7.1% error rate
● Scratch: from 8.2% to 9.2% error rate
Budget: 10,000 configurations
● Model shifting: from 5.2% to 6.1% error rate
● Scratch: from 7.1% to 7.7% error rate
Incremental Model Shifting
Simple model shifting always starts from the source model (Source + Shifting Model = Full Model):
Model 4.13 + Shifting Model 4.15 = Model 4.15
Model 4.13 + Shifting Model 4.20 = Model 4.20
Model 4.13 + Shifting Model 5.0 = Model 5.0
Model 4.13 + Shifting Model 5.4 = Model 5.4
Model 4.13 + Shifting Model 5.7 = Model 5.7
Model 4.13 + Shifting Model 5.8 = Model 5.8
Incremental model shifting chains the models, each version shifting from the previous one:
Model 4.13 + Shifting Model 4.15 = Model 4.15
Model 4.15 + Shifting Model 4.20 = Model 4.20
Model 4.20 + Shifting Model 5.0 = Model 5.0
Model 5.0 + Shifting Model 5.4 = Model 5.4
Model 5.4 + Shifting Model 5.7 = Model 5.7
Model 5.7 + Shifting Model 5.8 = Model 5.8
Results
Model 4.13 budget: 85,000 configurations
Budget: 1,000 configurations
● Model shifting: from 6.7% to 10.6% error rate
● Scratch: from 14.9% to 16.7% error rate
● Incremental shifting: from 6.7% to 13.3%
Budget: 5,000 configurations
● Model shifting: from 5.6% to 7.1% error rate
● Scratch: from 8.2% to 9.2% error rate
● Incremental shifting: from 5.6% to 7.5%
Budget: 10,000 configurations
● Model shifting: from 5.2% to 6.1% error rate
● Scratch: from 7.1% to 7.7% error rate
● Incremental shifting: from 5.2% to 6.5%
Model 4.13 budget: 20,000 configurations
Budget: 1,000 configurations
● Model shifting: from 8.5% to 11.6% error rate
● Scratch: from 14.9% to 16.7% error rate
● Incremental shifting: from 8.5% to 13.8%
Budget: 5,000 configurations
● Model shifting: from 6.7% to 7.9% error rate
● Scratch: from 8.2% to 9.2% error rate
● Incremental shifting: from 6.7% to 7.9%
Budget: 10,000 configurations
● Model shifting: from 6.2% to 6.7% error rate
● Scratch: from 7.1% to 7.7% error rate
● Incremental shifting: from 6.1% to 6.7%
Summary
● Model 4.13 is saved
○ Reusing the old model on a new version pays off, at a lower cost
○ Better than learning from scratch, even years of versions later
● Incremental shifting
○ More sensitive to the errors of previous models
○ Makes better use of a larger transfer budget
 
Software Variability and Artificial Intelligence
University of Rennes, INSA Rennes, Inria/IRISA, CNRS
 
Teaching Software Product Lines: A Snapshot of Current Practices and Challenges
University of Rennes, INSA Rennes, Inria/IRISA, CNRS
 
Exploiting the Enumeration of All Feature Model Configurations: A New Perspec...
University of Rennes, INSA Rennes, Inria/IRISA, CNRS
 
Assessing Product Line Derivation Operators Applied to Java Source Code: An E...
University of Rennes, INSA Rennes, Inria/IRISA, CNRS
 
Synthesis of Attributed Feature Models From Product Descriptions
University of Rennes, INSA Rennes, Inria/IRISA, CNRS
 
From Basic Variability Models to OpenCompare.org
University of Rennes, INSA Rennes, Inria/IRISA, CNRS
 
Pandoc: a universal document converter
University of Rennes, INSA Rennes, Inria/IRISA, CNRS
 
Metamorphic Domain-Specific Languages
University of Rennes, INSA Rennes, Inria/IRISA, CNRS
 
3D Printing, Customization, and Product Lines
University of Rennes, INSA Rennes, Inria/IRISA, CNRS
 
Ad

Recently uploaded (20)

PDF
Primordial Black Holes and the First Stars
Sérgio Sacani
 
PPTX
Vectors and applications of genetic engineering Pptx
Ashwini I Chuncha
 
PPTX
Pratik inorganic chemistry silicon based ppt
akshaythaker18
 
PDF
Adding Geochemistry To Understand Recharge Areas - Kinney County, Texas - Jim...
Texas Alliance of Groundwater Districts
 
PPT
Cell cycle,cell cycle checkpoint and control
DrMukeshRameshPimpli
 
PPTX
LESSON 2 PSYCHOSOCIAL DEVELOPMENT.pptx L
JeanCarolColico1
 
PDF
The-Origin- of -Metazoa-vertebrates .ppt
S.B.P.G. COLLEGE BARAGAON VARANASI
 
PPTX
How to write a research paper July 3 2025.pptx
suneeta panicker
 
PPTX
Animal Reproductive Behaviors Quiz Presentation in Maroon Brown Flat Graphic ...
LynetteGaniron1
 
PDF
Insect Behaviour : Patterns And Determinants
SheikhArshaqAreeb
 
PPTX
Lamarckism is one of the earliest theories of evolution, proposed before Darw...
Laxman Khatal
 
PDF
The role of the Lorentz force in sunspot equilibrium
Sérgio Sacani
 
PDF
Step-by-Step Guide: How mRNA Vaccines Works
TECNIC
 
PDF
High-speedBouldersandtheDebrisFieldinDARTEjecta
Sérgio Sacani
 
PDF
A young gas giant and hidden substructures in a protoplanetary disk
Sérgio Sacani
 
PDF
crestacean parasitim non chordates notes
S.B.P.G. COLLEGE BARAGAON VARANASI
 
PPTX
Q1 - W1 - D2 - Models of matter for science.pptx
RyanCudal3
 
PDF
GK_GS One Liner For Competitive Exam.pdf
abhi01nm
 
PPTX
Phage Therapy and Bacteriophage Biology.pptx
Prachi Virat
 
PPTX
Akshay tunneling .pptx_20250331_165945_0000.pptx
akshaythaker18
 
Primordial Black Holes and the First Stars
Sérgio Sacani
 
Vectors and applications of genetic engineering Pptx
Ashwini I Chuncha
 
Pratik inorganic chemistry silicon based ppt
akshaythaker18
 
Adding Geochemistry To Understand Recharge Areas - Kinney County, Texas - Jim...
Texas Alliance of Groundwater Districts
 
Cell cycle,cell cycle checkpoint and control
DrMukeshRameshPimpli
 
LESSON 2 PSYCHOSOCIAL DEVELOPMENT.pptx L
JeanCarolColico1
 
The-Origin- of -Metazoa-vertebrates .ppt
S.B.P.G. COLLEGE BARAGAON VARANASI
 
How to write a research paper July 3 2025.pptx
suneeta panicker
 
Animal Reproductive Behaviors Quiz Presentation in Maroon Brown Flat Graphic ...
LynetteGaniron1
 
Insect Behaviour : Patterns And Determinants
SheikhArshaqAreeb
 
Lamarckism is one of the earliest theories of evolution, proposed before Darw...
Laxman Khatal
 
The role of the Lorentz force in sunspot equilibrium
Sérgio Sacani
 
Step-by-Step Guide: How mRNA Vaccines Works
TECNIC
 
High-speedBouldersandtheDebrisFieldinDARTEjecta
Sérgio Sacani
 
A young gas giant and hidden substructures in a protoplanetary disk
Sérgio Sacani
 
crestacean parasitim non chordates notes
S.B.P.G. COLLEGE BARAGAON VARANASI
 
Q1 - W1 - D2 - Models of matter for science.pptx
RyanCudal3
 
GK_GS One Liner For Competitive Exam.pdf
abhi01nm
 
Phage Therapy and Bacteriophage Biology.pptx
Prachi Virat
 
Akshay tunneling .pptx_20250331_165945_0000.pptx
akshaythaker18
 
Ad

Feature Subset Selection for Learning Huge Configuration Spaces: The case of Linux Kernel Size