Trackster Pruning at the CMS High-Granularity Calorimeter

Yousef Fadila, Manu Agarwal
PDEng Software Technology & CERN

Goal and Deliverables
• Goal: For each layer cluster [2D object] of a Trackster [3D object], we would like to
assign a weight indicating the likelihood that the layer cluster belongs to the same
particle rather than being contamination from another particle. How this weight will or
can be used is a longer-term project.
• Deliverables:
1. Reproducible code. (Jupyter Notebooks)
2. Example of porting one trained model to C.
3. Final Report.
4. Final Presentation.
 All deliverables will be sent on e-mail by December 11, 2019 17:00 EST

Data Representation and Approaches

Data Description
• Number of layer clusters: 32274
• Pileup/Signal (purity = 0, 1): 25963
• Not pileup (purity = 2): 6311
• Number of events: 100
• Number of Tracksters: 225
• Number of particles per event: 1

Approaches for Data Representation
• Layer-cluster level
• Identify the pileup hits using layer-cluster data only. (ignore Event,
Trackster fields).
• Simple representation.
• Extended Layer-cluster level
• Representation as layer-cluster records with aggregation from
preceding/succeeding layers (Trackster level with locality focus).
• Trackster level
• Identify the pileup hits using full or partial Trackster information as
one input
• Representation as image, sequence or graph.

Data Representation I – Layer-cluster Level
• Represent as single records.
• Remove correlated fields, trackster and event level fields.
Pros:
• Simple and Fast.
• Can easily be used with traditional ML
Cons:
• Loss of information from nearby layers.
• Decision is made locally based on x, y, z, E, nHits only.

Data Representation II – Extended Layer-cluster Level
• Represent as single records.
• Aggregate information from previous/next layers on the same Trackster
Pros:
• Simple
• Can easily be used with traditional ML
Cons:
• Loss of information from distant layers.
• Requires feature engineering and domain knowledge.
Examples of newly generated features:
• RatioSiblingNHits
• RatioNextNHits
• RatioPrevNHits
• RatioE
• RatioNextE
• RatioPrevE
• RatioNext2E
• RatioPrev2E
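
A minimal sketch of how such ratio features could be computed for one Trackster. The discussion section defines the next-layer energy ratio as a layer cluster's E divided by the summed E of the layer clusters in the next layer; the mirrored formula for the previous layer, the record layout (dicts with 'layer' and 'E' keys) and the fallback value of 0.0 for an empty neighbouring layer are assumptions made for illustration.

```python
from collections import defaultdict


def add_energy_ratios(layer_clusters):
    """Add RatioNextE / RatioPrevE to each layer-cluster record of one Trackster.

    Assumed (hypothetical) record layout: dicts with 'layer' and 'E' keys.
    """
    # Total energy deposited in each layer of the Trackster.
    layer_energy = defaultdict(float)
    for lc in layer_clusters:
        layer_energy[lc["layer"]] += lc["E"]

    def ratio(energy, total):
        # Assumption: an empty neighbouring layer yields 0.0 instead of an error.
        return energy / total if total > 0 else 0.0

    enriched = []
    for lc in layer_clusters:
        row = dict(lc)
        row["RatioNextE"] = ratio(lc["E"], layer_energy.get(lc["layer"] + 1, 0.0))
        row["RatioPrevE"] = ratio(lc["E"], layer_energy.get(lc["layer"] - 1, 0.0))
        enriched.append(row)
    return enriched


trackster = [
    {"layer": 1, "E": 2.0},
    {"layer": 2, "E": 1.0},
    {"layer": 2, "E": 3.0},
    {"layer": 3, "E": 4.0},
]
features = add_energy_ratios(trackster)
```

The other listed features (for example the nHits ratios) would presumably follow the same pattern with a different column or neighbouring layer.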

Data Representation III – Sequence
• Represent as sequence of fixed/varying length
• Add layer clusters from previous/next layers to the current point of interest layer cluster
Pros:
• A recurrent neural network can take past and future layer-cluster information into
account before making the classification decision.
Cons:
• A sequence imposes an order, but there is no intrinsic order among layer clusters that
belong to the same layer.
• Complex representation.

Data Representation III – Sequence(2)
• Sequence Creation:
• For each layer_cluster, find its nearest neighbors
in the immediate predecessor layer
(layer_cluster.layer - 1), and in the immediate
successor layer (layer_cluster.layer + 1)
• For each layer_cluster, create a sequence of
length 'seq_len'.
• In such a sequence, current layer_cluster is
placed between the nearest neighbors from
the predecessor and successor layers.
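
The sequence creation steps above can be sketched as follows for a three-length sequence. This is an illustration under stated assumptions, not the project's code: "nearest" is taken to mean Euclidean distance in (x, y, z), records are dicts, and a missing neighbouring layer is zero-padded.

```python
import math

FEATURES = ("x", "y", "z", "E", "nHits")  # per-element features kept for the models


def nearest_in_layer(lc, candidates):
    """Return the candidate closest to lc in (x, y, z), or None if there are none."""
    if not candidates:
        return None
    pos = (lc["x"], lc["y"], lc["z"])
    return min(candidates, key=lambda c: math.dist(pos, (c["x"], c["y"], c["z"])))


def build_sequences(layer_clusters):
    """Build one [predecessor, current, successor] sequence per layer cluster."""
    by_layer = {}
    for lc in layer_clusters:
        by_layer.setdefault(lc["layer"], []).append(lc)

    padding = [0.0] * len(FEATURES)  # assumption: zero-pad when a layer is empty
    sequences = []
    for lc in layer_clusters:
        prev_lc = nearest_in_layer(lc, by_layer.get(lc["layer"] - 1, []))
        next_lc = nearest_in_layer(lc, by_layer.get(lc["layer"] + 1, []))
        sequences.append([
            [float(n[f]) for f in FEATURES] if n is not None else padding
            for n in (prev_lc, lc, next_lc)
        ])
    return sequences


clusters = [
    {"layer": 1, "x": 0, "y": 0, "z": 1, "E": 1.0, "nHits": 2},
    {"layer": 2, "x": 0, "y": 0, "z": 2, "E": 2.0, "nHits": 3},
    {"layer": 3, "x": 5, "y": 5, "z": 3, "E": 0.5, "nHits": 1},
    {"layer": 3, "x": 0, "y": 1, "z": 3, "E": 4.0, "nHits": 6},
]
sequences = build_sequences(clusters)
```

Each entry of `sequences` has shape (3, 5), which is the input shape a recurrent or 1D convolutional model would expect per sample.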

Data Representation IV – Trackster Level: Graph
• Represent each Trackster as a graph.
• Nodes are the layer clusters; edges connect layer clusters through the “layer” field.
Pros:
• No loss of information. All data on all layer clusters is available at the decision point.
• Unlike sequences, no order is imposed between layer clusters on the same layer.
Cons:
• Complex representation and slow convergence.
• Emerging field without a proven industrial track record.
• Many variations and tuning parameters.

Data Representation IV – Trackster Level: Graph (2)
• Graph creation:
• Represent each Trackster as a graph that is fully connected between layers.
• Embed all Tracksters into one matrix.
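
A small sketch of the graph creation step, assuming that "fully connected between layers" means each layer cluster is linked to every layer cluster in the immediately preceding layer, and that embedding all Tracksters into one matrix means giving nodes a global index so that edges never cross Trackster boundaries. The input layout (one list of layer numbers per Trackster) is hypothetical.

```python
def build_edges(tracksters):
    """Return (num_nodes, edges) for a list of Tracksters.

    Each Trackster is a list of layer numbers, one per layer cluster
    (hypothetical layout). Nodes get a global index across Tracksters.
    """
    edges = []
    offset = 0
    for layers in tracksters:
        for i, layer_i in enumerate(layers):
            for j, layer_j in enumerate(layers):
                # Connect node i to every node in the immediately preceding layer.
                if layer_j == layer_i - 1:
                    edges.append((offset + j, offset + i))
        offset += len(layers)
    return offset, edges


def to_adjacency(num_nodes, edges):
    """Dense symmetric adjacency matrix as nested lists (small examples only)."""
    adj = [[0] * num_nodes for _ in range(num_nodes)]
    for a, b in edges:
        adj[a][b] = adj[b][a] = 1
    return adj


trackster_q = [1, 2, 2, 3]   # layers of four layer clusters
trackster_v = [1, 2]         # layers of two layer clusters
num_nodes, edges = build_edges([trackster_q, trackster_v])
adjacency = to_adjacency(num_nodes, edges)
```

The resulting matrix is block-structured: one block per Trackster and zeros everywhere else. At the scale of the full dataset a sparse edge list would be the more practical format.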
[Figure: example Tracksters labeled Trackster Q and Trackster V]

Data Representation V – Trackster Level: 3D Image
• Uses a spherical coordinate system.
• Each layer is treated as a shell of a sphere.
• The points on these shells are labeled as pileup or not pileup.
Pros:
• Capture all topological and geometrical aspects of the Trackster.
Cons:
• Sparsity of Data – Hard to converge.
• Complexity grows exponentially with the number of added parameters.
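
The slides do not specify the exact mapping, so the following is only a sketch of one possible reading: the standard Cartesian-to-spherical conversion, followed by binning each layer cluster into a (shell, polar bin, azimuth bin) voxel. Using the layer number as the shell index and the bin counts are assumptions.

```python
import math


def to_spherical(x, y, z):
    """Standard Cartesian -> spherical conversion: radius, polar angle, azimuth."""
    r = math.sqrt(x * x + y * y + z * z)
    theta = math.acos(z / r) if r > 0 else 0.0   # polar angle in [0, pi]
    phi = math.atan2(y, x)                       # azimuth in [-pi, pi]
    return r, theta, phi


def voxel_index(layer, theta, phi, n_theta=8, n_phi=8):
    """Map a layer cluster to a (shell, theta_bin, phi_bin) voxel.

    Assumption: the layer number selects the shell; bin counts are arbitrary.
    """
    t_bin = min(int(theta / math.pi * n_theta), n_theta - 1)
    p_bin = min(int((phi + math.pi) / (2 * math.pi) * n_phi), n_phi - 1)
    return layer, t_bin, p_bin


r, theta, phi = to_spherical(1.0, 2.0, 2.0)
voxel = voxel_index(3, theta, phi)
```

With only a few layer clusters per shell, most voxels stay empty, which is one way to picture the sparsity listed under the cons.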

Data Representation - Summary
1. Layer-cluster level – Records
2. Extended Layer-cluster level – Records
3. Extended Layer-cluster/Trackster level – Sequence
4. Trackster level – Fully Connected Graph
5. Trackster level – 3D image.
• We did not implement a method that uses this
representation.

Methods applied to Layer-cluster Level Representation
1. Random Forest
2. Multi-layer Perceptron
3. Support-vector classifier
4. XGBClassifier (gradient boosting)
For each method:
• We ran a grid search to optimize the hyperparameters.
• We report accuracy, F1 score, confusion matrix, and ROC curve with AUC.
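
As an illustration of this per-method procedure, here is a minimal scikit-learn sketch for the Random Forest case. The data is a synthetic stand-in for the layer-cluster table, and the parameter grid, scoring choice and split sizes are assumptions; the grids used in the project are not listed in the slides.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score, roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in: 5 features (like x, y, z, E, nHits) and an imbalanced binary target.
X, y = make_classification(n_samples=400, n_features=5, n_informative=3,
                           n_redundant=0, weights=[0.8, 0.2], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Hypothetical search grid, kept tiny so the example runs quickly.
param_grid = {"n_estimators": [25, 50], "max_depth": [3, None]}
search = GridSearchCV(RandomForestClassifier(random_state=0),
                      param_grid, scoring="f1", cv=3)
search.fit(X_train, y_train)

# Evaluate the refitted best model on the held-out test split.
y_pred = search.predict(X_test)
y_score = search.predict_proba(X_test)[:, 1]   # probability of the positive class
report = {
    "accuracy": accuracy_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
    "auc": roc_auc_score(y_test, y_score),
    "confusion_matrix": confusion_matrix(y_test, y_pred),
}
```

Swapping the estimator and grid gives the same loop for the MLP, SVC and gradient-boosting models.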

Layer-cluster Level Methods Performance Summary
Method   Accuracy score   F1 score
RF       0.946            0.862
MLP      0.924            0.801
SVC      0.923            0.79
XGB      0.945            0.859
[Bar chart: Layer-cluster level Performance Summary for RF, MLP, SVC and XGB]

RF/MLP/SVC/XGB Stacking
• Generate a new dataset from the outputs of RF/MLP/SVC/XGB.
• Train XGB on the new dataset (the meta-model).
RF   MLP  SVC  XGB
1    1    0    1
1    0    0    1
0    0    1    0
0    0    0    0
1    0    0    0
1    1    1    1
0    0    0    0
• All 4 classifiers give the same result for 21649 records (93% of the training data).
• The 4 classifiers disagree on 1644 records (7% of the training data).
• Stacking using XGBClassifier as the meta-model:
• Accuracy score: 0.948
• F1 score: 0.867
• Confusion matrix:
[5028  152]
[ 182 1093]
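
The reported stacking scores can be cross-checked from the confusion matrix alone. The sketch below assumes the scikit-learn layout (rows are true classes, columns are predicted classes) and that F1 is computed for the second class; under those assumptions the numbers match the ones above.

```python
def scores_from_confusion(matrix):
    """Accuracy and F1 for the second (positive) class of a 2x2 confusion matrix.

    Assumed layout: rows are true classes, columns are predicted classes.
    """
    (tn, fp), (fn, tp) = matrix
    total = tn + fp + fn + tp
    accuracy = (tn + tp) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, f1


accuracy, f1 = scores_from_confusion([[5028, 152], [182, 1093]])
print(f"accuracy={accuracy:.3f}  f1={f1:.3f}")   # accuracy=0.948  f1=0.867
```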

Methods applied to Extended Layer-cluster Level Repr.
• Add new features and use Random Forest to rank them
• Remove low-rank features and run these methods:
1. Random Forest
2. Multi-layer Perceptron
3. Support-vector classifier
4. XGBClassifier (gradient boosting)

Extended Layer-cluster Level Methods Summary
Method     Accuracy score   F1 score
RF         0.917            0.776
MLP        0.923            0.795
SVC        0.901            0.749
XGB        0.933            0.827
Stacking   0.933            0.826
[Bar chart: Extended Layer-cluster level Methods Summary for RF, MLP, SVC, XGB and Stacking]

Methods Applied to Sequence Representation
1. LSTM on three-length sequences
2. 1D-CNN on three-length sequences
3. LSTM on five-length sequences
4. 1D-CNN on five-length sequences

LSTM on Sequences (Three-length Sequence)
• Setup:
• For each element in a sequence:
• Drop: ['eta', 'phi', 'layer', 'trckPhi', 'trckEn', 'trckEta', 'trckType', 'event', 'trackster']
• Keep the x, y, z, E, nHits features.
• 1 LSTM layer with 32 units, followed by a dense layer with sigmoid activation.
• Test accuracy: 0.91
• Test F1 score: 0.77
• Test confusion matrix:
Pile up (purity 0):    [5894  193]
No pile up (purity 1): [ 420 1020]
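
For a sense of scale, the size of the model described above can be worked out by hand. The sketch assumes a standard LSTM cell (four gates, each with input weights, recurrent weights and a bias), 5 input features per sequence element, and a single sigmoid output unit; none of these details beyond "32 units" and "sigmoid" are stated in the slides.

```python
def lstm_param_count(n_features, n_units):
    """Trainable parameters of a standard LSTM layer (four gates with bias)."""
    per_gate = n_units * (n_features + n_units) + n_units
    return 4 * per_gate


def dense_param_count(n_inputs, n_outputs):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_outputs + n_outputs


N_FEATURES = 5   # x, y, z, E, nHits
N_UNITS = 32

lstm_params = lstm_param_count(N_FEATURES, N_UNITS)   # 4 * (32 * 37 + 32) = 4864
dense_params = dense_param_count(N_UNITS, 1)          # 32 + 1 = 33
total_params = lstm_params + dense_params             # 4897
```

The count does not depend on the sequence length, so the three-length and five-length LSTM variants would have the same number of parameters under these assumptions.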

ROC Curve: LSTM on Sequences (Three-length Sequence)

1D-CNN on Sequences (Three-length Sequence)
• Setup:
• 1D CNN layer with 32 filters, kernel size 3, followed by a dense layer with sigmoid
activation.
• Test F1 score: 0.78
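
A short worked calculation shows what a kernel of size 3 means on such short sequences. The padding mode is not given in the slides; the sketch assumes stride 1 and "valid" padding by default, with 5 features per element.

```python
def conv1d_output_length(seq_len, kernel_size, padding="valid"):
    """Output length of a stride-1 1D convolution."""
    if padding == "same":
        return seq_len
    return seq_len - kernel_size + 1


def conv1d_param_count(n_features, n_filters, kernel_size):
    """Kernel weights plus one bias per filter."""
    return kernel_size * n_features * n_filters + n_filters


N_FEATURES = 5   # x, y, z, E, nHits
params = conv1d_param_count(N_FEATURES, n_filters=32, kernel_size=3)  # 3*5*32 + 32 = 512
len_three = conv1d_output_length(3, kernel_size=3)   # 1 position
len_five = conv1d_output_length(5, kernel_size=3)    # 3 positions
```

Under these assumptions the kernel covers a three-length sequence in a single position, so the convolution behaves like a 32-unit dense layer over the flattened 3 x 5 input; only the five-length sequences give the filters several positions to slide over.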

ROC Curve: 1D-CNN on Sequences (Three-length Sequence)

LSTM on Sequences (Five-length Sequence)
• Setup:
• 1 LSTM layer with 32 units, followed by a dense layer with sigmoid activation.
• Test F1 score: 0.76

ROC Curve: LSTM on Sequences (Five-length Sequence)

1D-CNN on Sequences (Five-length Sequence)
• Setup:
• 1D CNN layer with 32 filters, kernel size 3, followed by a dense layer with sigmoid
activation.
• Test F1 score: 0.77

ROC Curve: 1D-CNN on Sequences (Five-length Sequence)

Sequence Representation Methods Performance Summary
[Bar chart: Sequence Level Methods Summary for LSTM and 1D-CNN on three-length and five-length sequences]

Methods applied to Trackster Level – Graph Representation
• Different architectures serve different purposes: graph classification, node classification,
edge prediction, etc.
• Survey: https://paperswithcode.com/task/graph-classification
https://github.com/benedekrozemberczki/awesome-graph-classification
• For pile-up detection, we chose to try two architectures:
1. Graph Convolutional Neural Network
• https://arxiv.org/abs/1609.02907
2. Adaptive sampling for graph representation learning
• https://arxiv.org/abs/1809.05343
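
To make the first architecture concrete, the propagation rule of the Graph Convolutional Network paper (arXiv:1609.02907) is H' = σ(D̃^(-1/2) (A + I) D̃^(-1/2) H W), where D̃ is the degree matrix of A + I. Below is a dependency-free sketch of one such layer with ReLU as σ; the toy graph, the single feature per node and the identity weight are illustrative choices, not the project's configuration.

```python
import math


def gcn_layer(adjacency, features, weights):
    """One graph-convolution step: relu(D^-1/2 (A + I) D^-1/2 . H . W)."""
    n = len(adjacency)
    # Add self-loops so each node keeps its own features.
    a_hat = [[adjacency[i][j] + (1 if i == j else 0) for j in range(n)]
             for i in range(n)]
    degree = [sum(row) for row in a_hat]
    # Symmetric normalization by the square roots of the node degrees.
    norm = [[a_hat[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
            for i in range(n)]

    def matmul(a, b):
        return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
                 for j in range(len(b[0]))] for i in range(len(a))]

    hidden = matmul(matmul(norm, features), weights)
    return [[max(0.0, v) for v in row] for row in hidden]


# Toy graph: node 0 (layer 1) linked to nodes 1 and 2 (layer 2).
adjacency = [[0, 1, 1],
             [1, 0, 0],
             [1, 0, 0]]
features = [[1.0], [2.0], [4.0]]   # one feature per node, e.g. energy
weights = [[1.0]]                  # identity weight to expose the neighbour mixing
output = gcn_layer(adjacency, features, weights)
```

Each output row mixes a node's own feature with those of its neighbours, which is how information from adjacent layers reaches the classification of a single layer cluster.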

Graph Representation Methods Performance Summary
Setup
• 32274 nodes, representing the layer clusters that belong to the 225 Tracksters.
• 514,910 edges: every layer cluster in a Trackster is fully connected to all layer clusters
in the previous layer.
• Default hyperparameter values (1 hidden layer, batch size 32, etc.).
Result
• Test accuracy of Graph Convolutional Neural Network: 82.20%
• Test accuracy of Adaptive sampling for graph representation learning: 72.40%

Discussion, Recommendations and Future Work
1. We used the small dataset of the simple use case (<250 Tracksters; one particle per event).
The benefit of adding new features in the extended layer-cluster representation or of using a
more complex 3D representation can't be measured well in such a case. In our trials, the
performance was worse than with the simple 2D version. This could be due to the scarcity of
data, as the model needs more data points to compensate for the extra complexity introduced
by the new features.
2. In the Extended Layer-cluster level representation, we calculate ratios based on previous/next
layer aggregation. Instead of grouping by layer, try grouping by Euclidean distance on x, y, z
< ɛ, as we did in the sequence representation.
• For example, RatioNextE, which is defined as layercluster['E'] / SUM(E, layer clusters in the next layer),
would be defined as layercluster['E'] / SUM(E, layer clusters within radius ɛ in the next layer).
• The same idea applies to the graph representation: instead of connecting each layer cluster to all
layer clusters in the previous layer, we can connect it only to the nearest ones.
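
The radius-based variant of the next-layer energy ratio could look like the sketch below. The record layout, the function name and the fallback of 0.0 when no next-layer cluster lies within ɛ are assumptions made for illustration.

```python
import math


def ratio_next_e_within_radius(lc, layer_clusters, eps):
    """E of lc divided by the summed E of next-layer clusters within distance eps."""
    pos = (lc["x"], lc["y"], lc["z"])
    total = sum(
        other["E"]
        for other in layer_clusters
        if other["layer"] == lc["layer"] + 1
        and math.dist(pos, (other["x"], other["y"], other["z"])) < eps
    )
    # Assumption: return 0.0 when no next-layer cluster falls inside the radius.
    return lc["E"] / total if total > 0 else 0.0


clusters = [
    {"layer": 1, "x": 0.0, "y": 0.0, "z": 0.0, "E": 2.0},
    {"layer": 2, "x": 0.0, "y": 0.0, "z": 1.0, "E": 4.0},    # distance 1.0: inside
    {"layer": 2, "x": 10.0, "y": 0.0, "z": 1.0, "E": 6.0},   # distance ~10.05: outside
]
local_ratio = ratio_next_e_within_radius(clusters[0], clusters, eps=2.0)    # 2 / 4
global_ratio = ratio_next_e_within_radius(clusters[0], clusters, eps=1e9)   # 2 / (4 + 6)
```

With a very large ɛ the feature falls back to the per-layer definition, so ɛ acts as a locality knob that would need tuning.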

Discussion, Recommendations and Future Work II
3. A GNN tries to learn the classification based on neighbors, i.e. the layer clusters in the
previous and next layers. We hypothesize that using an ANN (an MLP, for example) with the
extended layer-cluster level representation can provide a faster and simpler alternative. The
major disadvantage is that a feature-engineering phase is required to add insightful features
based on how the neighbors' properties affect the classification of a layer cluster.
• Using a GNN does not require such a phase; the model will try to learn the relations by
itself.
4. The data we used has a discrete target type (pileup or pure) and not a purity percentage or
likelihood indicator. Therefore we solved a classification problem and not a regression
one. Under these constraints, the only way to assign a likelihood weight to a layer
cluster is to use the predicted probability of it being of type pure.

Discussion, Recommendations and Future Work III
5. During the project, we tried a stacking approach with the layer-cluster level representation.
Based on the distribution of pile-ups in 3D space [figure 1], we hypothesize that a pipeline
approach could combine the benefits of both worlds: the 1st step cleans up pile-ups with a
high-confidence layer-cluster level approach, and the 2nd step applies a Trackster level
approach to the remaining layer clusters. This would reduce the complexity of the Trackster
level representation and make the methods (e.g. GNN) faster to converge.
6. We recommend generating a list of at least 1000 Tracksters for training and 300 Tracksters
(using a different simulation setup) as a validation set, and re-running all methods.
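
The first step of the pipeline proposed in item 5 can be sketched as a confidence split on the per-cluster probability of being pure (the likelihood weight from item 4). The thresholds and the function name are hypothetical; in practice they would have to be tuned on validation data.

```python
def split_by_confidence(pure_probabilities, low=0.1, high=0.9):
    """Step 1 of the proposed pipeline: decide confident cases, defer the rest.

    pure_probabilities: per layer cluster, the step-1 model's probability of
    being pure. low / high: hypothetical confidence thresholds.
    """
    decided, deferred = {}, []
    for index, p in enumerate(pure_probabilities):
        if p >= high:
            decided[index] = "pure"
        elif p <= low:
            decided[index] = "pileup"
        else:
            deferred.append(index)   # left for the Trackster-level model (step 2)
    return decided, deferred


probabilities = [0.98, 0.03, 0.55, 0.91, 0.40]
decided, deferred = split_by_confidence(probabilities)
```

Only the deferred layer clusters would then enter the Trackster-level representation, which is where the expected reduction in complexity comes from.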
[Figure 1: distribution of pile-ups in 3D space]

Appendix I: Python Notebooks Descriptions
• All results can be reproduced using the delivered Python notebooks.
• AggregatedFeatures-RF-MLP-XGB-SVC-Stacking.ipynb: implementation of the Extended Layer-cluster level representation methods
• NoAggregatedFeatures-RF-MLP-XGB-SVC-Stacking.ipynb: implementation of the Layer-cluster level representation methods
• SequenceModels.ipynb: implementation of the Sequence representation methods (LSTM and 1D-CNN)
• AdaptiveSamplingGraphConvNN.ipynb: Adaptive Sampling Towards Fast Graph Representation Learning
• GraphConvNN.ipynb: implementation of the Graph Convolutional Neural Network
• Porting_to_c_example.ipynb: an example of porting a trained sklearn model to C

Appendix II: References
1. Graph Convolutional Neural Network (https://arxiv.org/abs/1609.02907)
2. Adaptive sampling for graph representation learning (https://arxiv.org/abs/1809.05343)
3. Pileup Mitigation with Machine Learning (PUMML) (https://arxiv.org/abs/1707.08600)
4. Deep learning in color: towards automated quark/gluon jet discrimination (https://arxiv.org/abs/1612.01551)
5. Jet-Images – Computer Vision Inspired Techniques for Jet Tagging (https://arxiv.org/abs/1407.5675)
6. Jet-Images – Deep Learning Edition (https://arxiv.org/abs/1511.05190)
7. Pileup mitigation at the Large Hadron Collider with Graph Neural Networks (https://arxiv.org/abs/1810.07988)
8. TensorFlow implementations of Graph Neural Networks (https://github.com/microsoft/tf-gnn-samples)
9. PyTorch implementations of Graph Neural Networks (https://github.com/dmlc/dgl)
10. Code Repository (https://bit.ly/38cJRQg)