Yousef Fadila, Manu Agarwal
PDEng Software Technology & CERN
Trackster Pruning at the CMS High-Granularity Calorimeter
Goal and Deliverables
• Goal: For each layer cluster [2D object] of a Trackster [3D object], we would like to
assign a weight that indicates how likely the layer cluster is to belong to the same
particle rather than being contamination from another particle. How this weight will or
can be used is a longer-term project.
• Deliverables:
1. Reproducible code. (Jupyter Notebooks)
2. Example of porting one trained model to C.
3. Final Report.
4. Final Presentation.
All deliverables will be sent by e-mail by December 11, 2019, 17:00 EST.
Data Representation and Approaches
Data Description
• Number of layer clusters: 32274
• Pileup/Signal (purity = 0, 1): 25963
• Not pileup (purity = 2): 6311
• Number of events: 100
• Number of Tracksters: 225
• Number of particles per event: 1
Approaches for Data Representation
• Layer-cluster level
  • Identify the pileup hits using layer-cluster data only (ignore Event and Trackster fields).
  • Simple representation.
• Extended Layer-cluster level
  • Representation as layer-cluster records with aggregation from preceding/succeeding layers (Trackster level with locality focus).
• Trackster level
  • Identify the pileup hits using full or partial Trackster information as one input.
  • Representation as image, sequence, or graph.
Data Representation I – Layer-cluster Level
• Represent as single records.
• Remove correlated fields, trackster and event level fields.
Pros:
• Simple and Fast.
• Can easily be used with traditional ML
Cons:
• Loss of information from nearby layers.
• Decision is made locally, based on x, y, z, E, and nHits only.
Data Representation II – Extended Layer-cluster Level
• Represent as single records.
• Aggregate information from previous/next layers on the same Trackster
Pros:
• Simple
• Can easily be used with traditional ML
Cons:
• Loss of information from distant layers.
• Requires feature engineering and domain knowledge.
Example of Newly Generated Features
RatioSiblingNHits
RatioNextNHits
RatioPrevNHits
RatioE
RatioNextE
RatioPrevE
RatioNext2E
RatioPrev2E
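The slide lists the feature names without defining them. The discussion slides define the next-layer energy ratio as the layer cluster's E divided by the sum of E over the layer clusters in the next layer. The sketch below assumes the previous-layer and nHits ratios follow the same pattern and that sums are taken within one Trackster. The record layout and the `ratio_features` helper are hypothetical and are not taken from the delivered notebooks.

```python
from collections import defaultdict

def ratio_features(clusters):
    """Add next/previous-layer ratio features to each layer-cluster record.

    `clusters` is a list of dicts with keys 'trackster', 'layer', 'E', 'nHits'
    (an assumed layout). A ratio is None when the neighbouring layer is empty.
    """
    # Sum E and nHits per (trackster, layer).
    totals = defaultdict(lambda: {"E": 0.0, "nHits": 0})
    for c in clusters:
        key = (c["trackster"], c["layer"])
        totals[key]["E"] += c["E"]
        totals[key]["nHits"] += c["nHits"]

    def ratio(value, key, field):
        total = totals[key][field] if key in totals else 0
        return value / total if total else None

    out = []
    for c in clusters:
        nxt = (c["trackster"], c["layer"] + 1)
        prv = (c["trackster"], c["layer"] - 1)
        out.append({
            **c,
            "RatioNextE": ratio(c["E"], nxt, "E"),
            "RatioPrevE": ratio(c["E"], prv, "E"),
            "RatioNextNHits": ratio(c["nHits"], nxt, "nHits"),
            "RatioPrevNHits": ratio(c["nHits"], prv, "nHits"),
        })
    return out

# Toy Trackster: one cluster in layer 1, two in layer 2, one in layer 3.
clusters = [
    {"trackster": 0, "layer": 1, "E": 2.0, "nHits": 4},
    {"trackster": 0, "layer": 2, "E": 1.0, "nHits": 2},
    {"trackster": 0, "layer": 2, "E": 3.0, "nHits": 6},
    {"trackster": 0, "layer": 3, "E": 8.0, "nHits": 8},
]
features = ratio_features(clusters)
```

Returning None for an empty neighbouring layer is a choice made for this sketch; a real pipeline would need to pick a fill value that the classifier can handle.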
Data Representation III – Sequence
• Represent as a sequence of fixed or varying length.
• Add layer clusters from the previous/next layers to the layer cluster of interest.
Pros:
• Using a recurrent neural network allows past and future layer-cluster information to be
taken into account before making the classification decision.
Cons:
• A sequence imposes an order, but there is no intrinsic order between layer clusters that
belong to the same layer.
• Complex representation.
Data Representation III – Sequence (2)
• Sequence creation:
  • For each layer_cluster, find its nearest neighbors in the immediate predecessor layer
    (layer_cluster.layer - 1) and in the immediate successor layer (layer_cluster.layer + 1).
  • For each layer_cluster, create a sequence of length 'seq_len'.
  • In such a sequence, the current layer_cluster is placed between the nearest neighbors
    from the predecessor and successor layers.
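The sequence-creation steps can be sketched roughly as follows. This is a minimal illustration, not the notebook code. It assumes Euclidean distance on (x, y, z), neighbors restricted to the same Trackster, `k` neighbors taken from each adjacent layer (so seq_len = 2*k + 1), and zero-padding when an adjacent layer is empty. The names `nearest`, `to_vector`, and `build_sequence` are hypothetical.

```python
import math

FEATURES = ("x", "y", "z", "E", "nHits")

def nearest(cluster, candidates, k):
    """Return the k candidates closest to `cluster` in (x, y, z), nearest first."""
    pos = (cluster["x"], cluster["y"], cluster["z"])
    ranked = sorted(candidates, key=lambda c: math.dist(pos, (c["x"], c["y"], c["z"])))
    return ranked[:k]

def to_vector(cluster):
    return [cluster[f] for f in FEATURES]

def build_sequence(cluster, clusters, k=1):
    """Build a sequence of length 2*k + 1 centred on `cluster`."""
    same = [c for c in clusters if c["trackster"] == cluster["trackster"]]
    prev_layer = [c for c in same if c["layer"] == cluster["layer"] - 1]
    next_layer = [c for c in same if c["layer"] == cluster["layer"] + 1]
    pad = [0.0] * len(FEATURES)

    before = [to_vector(c) for c in nearest(cluster, prev_layer, k)]
    after = [to_vector(c) for c in nearest(cluster, next_layer, k)]
    # Pad missing neighbours so every sequence has the same length, and keep
    # the nearest neighbours adjacent to the centre element.
    before = [pad] * (k - len(before)) + before[::-1]
    after = after + [pad] * (k - len(after))
    return before + [to_vector(cluster)] + after

clusters = [
    {"trackster": 0, "layer": 1, "x": 0.0, "y": 0.0, "z": 1.0, "E": 1.0, "nHits": 2},
    {"trackster": 0, "layer": 1, "x": 5.0, "y": 5.0, "z": 1.0, "E": 9.0, "nHits": 9},
    {"trackster": 0, "layer": 2, "x": 0.0, "y": 0.0, "z": 2.0, "E": 2.0, "nHits": 3},
]
seq = build_sequence(clusters[2], clusters, k=1)
```

With k=1 this yields the three-length sequences and with k=2 the five-length ones, under the assumption stated above about how the extra elements are chosen.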
Data Representation IV – Trackster Level: Graph
• Represent each Trackster as a graph.
• Nodes are the layer clusters, and edges connect layer clusters through the “layer” field.
Pros:
• No loss of information. All data on all layer clusters is available at the decision point.
• Unlike sequences, no order is imposed between layer clusters on the same layer.
Cons:
• Complex representation and slow convergence.
• Newly emerging field without a proven industrial track record.
• Many variations and tuning parameters.
Data Representation IV – Trackster Level: Graph (2)
• Graph creation:
  • Represent each Trackster as a fully connected graph between layers.
  • Embed all Tracksters into one matrix.
[Figure: Trackster Q and Trackster V]
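A rough sketch of this graph construction is shown below. It assumes that "fully connected between layers" means every layer cluster is linked to every layer cluster in layer - 1 of the same Trackster, and that "one matrix" is a single adjacency matrix with no edges between different Tracksters. The helpers `build_edges` and `adjacency` are hypothetical names.

```python
def build_edges(clusters):
    """Return undirected edges (i, j) between clusters in consecutive layers."""
    by_layer = {}
    for idx, c in enumerate(clusters):
        by_layer.setdefault((c["trackster"], c["layer"]), []).append(idx)

    edges = []
    for (trackster, layer), nodes in sorted(by_layer.items()):
        # Connect every node in the previous layer to every node in this layer.
        for j in by_layer.get((trackster, layer - 1), []):
            for i in nodes:
                edges.append((j, i))
    return edges

def adjacency(n_nodes, edges):
    """Dense symmetric 0/1 adjacency matrix covering all Tracksters."""
    adj = [[0] * n_nodes for _ in range(n_nodes)]
    for i, j in edges:
        adj[i][j] = adj[j][i] = 1
    return adj

clusters = [
    {"trackster": "Q", "layer": 1},
    {"trackster": "Q", "layer": 2},
    {"trackster": "Q", "layer": 2},
    {"trackster": "V", "layer": 1},
    {"trackster": "V", "layer": 2},
]
edges = build_edges(clusters)
adj = adjacency(len(clusters), edges)
```

A dense matrix is used only to keep the sketch short; at the scale reported later in the deck a sparse format would be the more practical choice.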
Data Representation V – Trackster Level: 3D Image
• Uses a spherical coordinate system.
• Each layer is treated as a shell of a sphere.
• The points on these shells correspond to pileup or not pileup.
Pros:
• Captures all topological and geometrical aspects of the Trackster.
Cons:
• Sparsity of data: hard to converge.
• Complexity grows exponentially with the number of added parameters.
Data Representation - Summary
1. Layer-cluster level – Records
2. Extended Layer-cluster level – Records
3. Extended Layer-cluster/Trackster level – Sequence
4. Trackster level – Fully Connected Graph
5. Trackster level – 3D image
   • We did not implement a method that uses this representation.
Methods and Results
Methods applied to Layer-cluster Level Representation
1. Random Forest
2. Multi-layer Perceptron
3. Support-vector classifier
4. XGBClassifier (gradient boosting)
For each method:
• We ran a grid search to optimize the hyperparameters.
• We report accuracy, F1 score, confusion matrix, and ROC curve with AUC.
Layer-cluster Level Methods Performance Summary
       Accuracy score   F1 score
RF     0.946            0.862
MLP    0.924            0.801
SVC    0.923            0.79
XGB    0.945            0.859
[Bar chart "Layer-cluster level Performance Summary": accuracy score and F1 score for RF, MLP, SVC, and XGB]
RF/MLP/SVC/XGB Stacking
• Generate a new dataset from the outputs of RF, MLP, SVC, and XGB.
• Train XGB on the new dataset (the meta-model).
Example rows of the new dataset:
RF   MLP  SVC  XGB
1    1    0    1
1    0    0    1
0    0    1    0
0    0    0    0
1    0    0    0
1    1    1    1
0    0    0    0
• All 4 classifiers give the same result for 21649 records (93% of the training data).
• The 4 classifiers disagree on 1644 records (7% of the training data).
• Stacking using XGBClassifier as the meta-model:
  • Accuracy score: 0.948
  • F1 score: 0.867
  • Confusion matrix:
    [5028  152]
    [ 182 1093]
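As a quick consistency check, the reported scores can be recomputed from the confusion matrix and the record counts. The sketch assumes the matrix is laid out as [[TN, FP], [FN, TP]] with the second class as the positive one. F1 does not change if FP and FN are swapped, so the row/column orientation does not affect the result.

```python
def metrics(confusion):
    """Accuracy and F1 (for the second class) from [[tn, fp], [fn, tp]]."""
    (tn, fp), (fn, tp) = confusion
    accuracy = (tn + tp) / (tn + fp + fn + tp)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, f1

accuracy, f1 = metrics([[5028, 152], [182, 1093]])

# Share of training records on which all four base classifiers agree.
agreement = 21649 / (21649 + 1644)

print(round(accuracy, 3), round(f1, 3), round(agreement, 3))
# 0.948 0.867 0.929
```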
Methods applied to Extended Layer-cluster Level Representation
• Add new features and use Random Forest to rank them.
• Remove low-ranked features and run these methods:
1. Random Forest
2. Multi-layer Perceptron
3. Support-vector classifier
4. XGBClassifier (gradient boosting)
Extended Layer-cluster Level Methods Summary
           Accuracy score   F1 score
RF         0.917            0.776
MLP        0.923            0.795
SVC        0.901            0.749
XGB        0.933            0.827
Stacking   0.933            0.826
[Bar chart "Extended Layer-cluster level Methods Summary": accuracy score and F1 score for RF, MLP, SVC, XGB, and Stacking]
Methods Applied to Sequence Representation
1. LSTM on three-length sequences
2. 1D-CNN on three-length sequences
3. LSTM on five-length sequences
4. 1D-CNN on five-length sequences
LSTM on Sequences (Three-length Sequence)
• Setup:
  • For each element in a sequence:
    • Drop: ['eta', 'phi', 'layer', 'trckPhi', 'trckEn', 'trckEta', 'trckType', 'event', 'trackster']
    • Keep the x, y, z, E, and nHits features.
  • 1 LSTM layer with 32 units, followed by a dense layer with sigmoid activation.
• Test accuracy: 0.91
• Test F1 score: 0.77
• Test confusion matrix:
  Pileup (purity 0):    [5894  193]
  No pileup (purity 1): [ 420 1020]
[Figure: ROC curve, LSTM on Sequences (Three-length Sequence)]
1D-CNN on Sequences (Three-length Sequence)
• Setup:
  • For each element in a sequence:
    • Drop: ['eta', 'phi', 'layer', 'trckPhi', 'trckEn', 'trckEta', 'trckType', 'event', 'trackster']
    • Keep the x, y, z, E, and nHits features.
  • 1D CNN layer with 32 filters and kernel size 3, followed by a dense layer with sigmoid activation.
• Test accuracy: 0.92
• Test F1 score: 0.78
• Test confusion matrix:
  Pileup (purity 0):    [5874  213]
  No pileup (purity 1): [ 371 1069]
[Figure: ROC curve, 1D-CNN on Sequences (Three-length Sequence)]
LSTM on Sequences (Five-length Sequence)
• Setup:
  • For each element in a sequence:
    • Drop: ['eta', 'phi', 'layer', 'trckPhi', 'trckEn', 'trckEta', 'trckType', 'event', 'trackster']
    • Keep the x, y, z, E, and nHits features.
  • 1 LSTM layer with 32 units, followed by a dense layer with sigmoid activation.
• Test accuracy: 0.92
• Test F1 score: 0.76
• Test confusion matrix:
  Pileup (purity 0):    [5650  269]
  No pileup (purity 1): [ 292  893]
[Figure: ROC curve, LSTM on Sequences (Five-length Sequence)]
1D-CNN on Sequences (Five-length Sequence)
• Setup:
  • For each element in a sequence:
    • Drop: ['eta', 'phi', 'layer', 'trckPhi', 'trckEn', 'trckEta', 'trckType', 'event', 'trackster']
    • Keep the x, y, z, E, and nHits features.
  • 1D CNN layer with 32 filters and kernel size 3, followed by a dense layer with sigmoid activation.
• Test accuracy: 0.92
• Test F1 score: 0.77
• Test confusion matrix:
  Pileup (purity 0):    [5670  249]
  No pileup (purity 1): [ 275  910]
[Figure: ROC curve, 1D-CNN on Sequences (Five-length Sequence)]
Sequence Representation Methods Performance Summary
[Bar chart "Sequence Level Methods Summary": accuracy score and F1 score for LSTM (3-length sequence), 1D-CNN (3-length sequence), LSTM (5-length sequence), and 1D-CNN (5-length sequence)]
Methods applied to Trackster Level – Graph Representation
• Different architectures serve different purposes: graph classification, node classification,
edge prediction, etc.
• Surveys: https://paperswithcode.com/task/graph-classification
https://github.com/benedekrozemberczki/awesome-graph-classification
• For pileup detection, we chose to try two architectures:
1. Graph Convolutional Neural Network
• https://arxiv.org/abs/1609.02907
2. Adaptive sampling for graph representation learning
• https://arxiv.org/abs/1809.05343
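For orientation, the graph convolutional network of arXiv:1609.02907 propagates node features with the rule H' = σ(D^(-1/2) (A + I) D^(-1/2) H W), where A is the adjacency matrix, I adds self-loops, and D is the degree matrix of A + I. The pure-Python sketch below shows one such layer on a toy graph. It is only an illustration of the rule, not the implementation used in the notebooks, and the choice of ReLU for σ is an assumption.

```python
import math

def normalize_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2 for a square 0/1 adjacency matrix."""
    n = len(adj)
    a_tilde = [[adj[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_tilde]
    return [[a_tilde[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def gcn_layer(adj, features, weights):
    """One propagation step: ReLU(A_hat @ H @ W)."""
    a_hat = normalize_adjacency(adj)
    hidden = matmul(matmul(a_hat, features), weights)
    return [[max(0.0, v) for v in row] for row in hidden]

# Two connected nodes with one feature each and a 1x1 weight matrix.
adj = [[0, 1], [1, 0]]
features = [[1.0], [3.0]]
weights = [[1.0]]
out = gcn_layer(adj, features, weights)
```

In the toy example each node ends up with the average of its own feature and its neighbor's, which is the smoothing effect the normalized adjacency is designed to produce.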
Graph Representation Methods Performance Summary
Setup
• 32274 nodes: the layer clusters belonging to the 225 Tracksters.
• 514,910 edges: every layer cluster in a Trackster is fully connected to all layer clusters in the
previous layer.
• Default hyperparameter values (1 hidden layer, batch size 32, etc.).
Result
• Test accuracy of Graph Convolutional Neural Network: 82.20%
• Test accuracy of Adaptive sampling for graph representation learning: 72.40%
Discussion, Recommendations and Future Work
1. We used the small dataset of the simple use case (<250 Tracksters; 1 particle per event).
The benefit of adding new features in the extended layer-cluster representation or of using
a more complex 3D representation cannot be measured well in such a case. In our trials,
the performance was worse than with the simple 2D version. This could be due to scarcity of
data, as the model needs more data points to compensate for the extra complexity
introduced by the new features.
2. In the Extended Layer-cluster level representation, we calculate ratios based on previous/next
layer aggregation. Instead of grouping by layer, try grouping by Euclidean distance on x, y, z
< ɛ, as we did in the sequence representation.
• For example, RatioNextE, which is defined as layercluster['E'] / SUM(E, layer clusters in the next layer), would
be defined as layercluster['E'] / SUM(E, layer clusters within radius ɛ in the next layer).
• The same idea is applicable to the graph representation. Instead of connecting each layer cluster to all
layer clusters in the previous layer, we can connect it only to the nearest ones.
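The radius-based variant of the next-layer energy ratio proposed in item 2 could look roughly like this. The sketch assumes Euclidean distance on (x, y, z) and that the caller passes only the layer clusters of one Trackster. The function name `ratio_next_e` and the record layout are hypothetical, and returning None when no neighbor falls inside the radius is a choice made for illustration.

```python
import math

def ratio_next_e(cluster, clusters, eps):
    """E of `cluster` divided by the summed E of next-layer clusters within eps."""
    pos = (cluster["x"], cluster["y"], cluster["z"])
    total = sum(
        c["E"]
        for c in clusters
        if c["layer"] == cluster["layer"] + 1
        and math.dist(pos, (c["x"], c["y"], c["z"])) < eps
    )
    return cluster["E"] / total if total else None

clusters = [
    {"layer": 1, "x": 0.0, "y": 0.0, "z": 0.0, "E": 2.0},
    {"layer": 2, "x": 0.0, "y": 0.0, "z": 1.0, "E": 4.0},    # close neighbor
    {"layer": 2, "x": 10.0, "y": 0.0, "z": 1.0, "E": 100.0},  # far neighbor
]
near_only = ratio_next_e(clusters[0], clusters, eps=2.0)
whole_layer = ratio_next_e(clusters[0], clusters, eps=20.0)
```

With a small radius the far, high-energy layer cluster no longer dominates the denominator, which is the locality effect the recommendation is after.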
Discussion, Recommendations and Future Work II
3. A GNN tries to learn the classification based on neighbors, i.e. layer clusters in the previous
and next layers. We hypothesize that using an ANN (an MLP, for example) with the extended layer-cluster
level representation can provide a faster and simpler alternative. The major disadvantage is
that a feature-engineering phase is required to add insightful features based on how
neighbors' properties affect the classification of a layer cluster.
• Using a GNN does not require such a phase; the model will try to learn the relations by
itself.
4. The data we used has a discrete target type (pileup or pure) and not a purity percentage or
likelihood indicator. Therefore we solved a classification problem and not a regression
one. Under these constraints, the only way to assign a likelihood weight to a layer
cluster is to use the predicted probability of its being of type pure.
Discussion, Recommendations and Future Work III
5. During the project, we tried a stacking approach with the layer-cluster level representation. Based
on the distribution of pileup in 3D space [Figure 1], we hypothesize that a pipeline approach
could combine the benefits of both worlds: the first step cleans up pileup with a high-confidence
layer-cluster level approach, and the second step applies a Trackster level approach to the
remaining layer clusters. This would reduce the complexity of the Trackster level
representation and make the methods (e.g. GNN) faster to converge.
6. We recommend generating a list of at least 1000 Tracksters for training and 300 Tracksters
(using a different simulation setup) as a validation set, and re-running all methods.
[Figure 1]
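The pipeline idea in item 5 can be sketched as a two-stage predictor. This is only an illustration of the control flow. The thresholds 0.1 and 0.9, the label convention (1 = pure, 0 = pileup), and the two callables standing in for the layer-cluster level model and the Trackster level model are all assumptions.

```python
def two_stage_predict(records, stage1_proba, stage2_predict, low=0.1, high=0.9):
    """Label records with stage 1 when it is confident, otherwise defer to stage 2.

    stage1_proba(record) -> probability that the layer cluster is pure.
    stage2_predict(records) -> list of 0/1 labels for the undecided records.
    """
    labels = [None] * len(records)
    undecided = []
    for i, record in enumerate(records):
        p = stage1_proba(record)
        if p >= high:
            labels[i] = 1          # confidently pure
        elif p <= low:
            labels[i] = 0          # confidently pileup
        else:
            undecided.append(i)    # leave for the Trackster level model
    if undecided:
        stage2 = stage2_predict([records[i] for i in undecided])
        for i, label in zip(undecided, stage2):
            labels[i] = label
    return labels

# Toy stand-ins: each record is already its stage-1 probability,
# and the stage-2 model labels everything it receives as pure.
records = [0.95, 0.5, 0.02]
labels = two_stage_predict(records, lambda r: r, lambda rs: [1 for _ in rs])
```

Only the middle record reaches the second stage here, which mirrors the intended reduction in the amount of data the more expensive Trackster level method has to process.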
Appendix I: Python Notebooks Descriptions
• All results can be reproduced using the delivered Python notebooks.
• AggregatedFeatures-RF-MLP-XGB-SVC-Stacking.ipynb: implementation of the Layer-cluster level representation methods
• NoAggregatedFeatures-RF-MLP-XGB-SVC-Stacking.ipynb: implementation of the Extended Layer-cluster representation methods
• SequenceModels.ipynb: implementation of the Sequence representation methods (LSTM and 1D-CNN)
• AdaptiveSamplingGraphConvNN.ipynb: Adaptive Sampling Towards Fast Graph Representation Learning
• GraphConvNN.ipynb: implementation of the Graph Convolutional Neural Network
• Porting_to_c_example.ipynb: an example of porting a trained sklearn model to C
Appendix II: References
1. Graph Convolutional Neural Network (https://arxiv.org/abs/1609.02907)
2. Adaptive sampling for graph representation learning (https://arxiv.org/abs/1809.05343)
3. Pileup Mitigation with Machine Learning (PUMML) (https://arxiv.org/abs/1707.08600)
4. Deep learning in color: towards automated quark/gluon jet discrimination (https://arxiv.org/abs/1612.01551)
5. Jet-Images – Computer Vision Inspired Techniques for Jet Tagging (https://arxiv.org/abs/1407.5675)
6. Jet-Images – Deep Learning Edition (https://arxiv.org/abs/1511.05190)
7. Pileup mitigation at the Large Hadron Collider with Graph Neural Networks (https://arxiv.org/abs/1810.07988)
8. TensorFlow implementations of Graph Neural Networks (https://github.com/microsoft/tf-gnn-samples)
9. PyTorch implementations of Graph Neural Networks (https://github.com/dmlc/dgl)
10. Code Repository (https://bit.ly/38cJRQg)
Thank You!
Trackster Pruning at the CMS High-Granularity Calorimeter

  • 1. Yousef Fadila, Manu Agarwal PDEng Software Technology & CERN Trackster Pruning at the CMS High-Granularity Calorimeter
  • 2. Goal and Deliverables • Goal: For the each layer cluster [2D object] of the Trackster [3D object] we would like to assign a weight which will indicate the likelihood for this layer cluster to belong to the same particle or is contaminated by another. How this weight will/can be used is kind of a longer term project. • Deliverables: 1. Reproducible code. (Jupyter Notebooks) 2. Example of porting one trained model to C. 3. Final Report. 4. Final Presentation.  All deliverables will be sent on e-mail by December 11, 2019 17:00 EST 2
  • 3. Data Representation and Approaches 3
  • 4. Data Description • Number of layer clusters: 32274 • Pileup/Signal (purity =0,1): 25963 • Not pileup (purity =2): 6311 • Number of Events: 100 • Number of Trackster: 225 • Number of particles per event: 1 4
  • 5. Approaches for Data Representation • Layer-cluster level • Identify the pileup hits using layer-cluster data only. (ignore Event, Trackster fields). • Simple representation. • Extended Layer-cluster level • Representation as layer-cluster records with aggregation from preceding/succeeding layers.(Trackster level with locality focus) • Trackster level • Identify the pileup hits using full or partial Trackster information as one input • Representation as image, sequence or graph. 5
  • 6. Data Representation I – Layer-cluster Level • Represent as single records. • Remove correlated fields, trackster and event level fields. Pros: • Simple and Fast. • Can easily be used with traditional ML Cons: • Lost of information from nearby layers • Decision is made locally based on x, y, z, E, nHits only. 6
  • 7. Data Representation II – Extended Layer-cluster Level • Represent as single records. • Aggregate information from previous/next layers on the same Trackster Pros: • Simple • Can easily be used with traditional ML Cons: • Lost of information from far layers • Require feature engineering and domain knowledge. 7 Example of Newly Generated Features RatioSiblingNHits RatioNextNHits RatioPrevNHits RatioE RatioNextE RatioPrevE RatioNext2E RatioPrev2E
  • 8. Data Representation III – Sequence • Represent as sequence of fixed/varying length • Add layer clusters from previous/next layers to the current point of interest layer cluster Pros: • Using recurrent neural network allow taking into account past and future layer-cluster information before making the classification decision. Cons: • Sequence impose order. There is no instinct order between layer clusters belong to the same layer. • Complex representation 8
  • 9. Data Representation III – Sequence(2) • Sequence Creation: • For each layer_cluster, find its nearest neighbors in the immediate predecessor layer (layer_cluster.layer - 1), and in the immediate successor layer (layer_cluster.layer + 1) • For each layer_cluster, create a sequence of length 'seq_len'. • In such a sequence, current layer_cluster is placed between the nearest neighbors from the predecessor and successor layers. 9
  • 10. Data Representation V – Trackster Level: Graph • Represent each Trackster as a graph. • Nodes are the layerclusters and edges connect layerclusters through “layer” field. Pros: • No lost of information. All data on all layer clusters are available on the decision point. • In different of sequences, No order is imposed between layer clusters on the same layer Cons: • Complex representation and slow convergence. • New emerging field without industrial proven record. • Many variation and tuning parameters 10
  • 11. Data Representation V – Trackster Level: Graph(2) • Graph creation: • Represent each Trackster as fully connected graph between layers. • Embed all Tracksters into one matrix. 11 Trackster Q Trackster V
  • 12. Data Representation VI – Trackster Level: 3D Image • Uses spherical coordinate system • Each layer is thought to be a shell of a sphere in. • The points on these shells correspond to a pileup or not pileup Pros: • Capture all topological and geometrical aspects of the Trackster. Cons: • Sparsity of Data – Hard to converge. • Complexity grows exponentially with respect to add parameters. 12
  • 13. Data Representation - Summary 1. Layer-cluster level – Records 2. Extended Layer-cluster level – Records 3. Extended Layer-cluster/Trackster level – Sequence 4. Trackster level – Fully Connected Graph 5. Trackster level – 3D image. • We did not implement a method that uses this representation. 13
  • 15. Methods applied to Layer-cluster Level Representation 1. Random Forest 2. Multi-layer Perceptron 3. Support-vector classifier 4. XGBClassifier (gradient boosting) For each method: • We run a grid-search to optimize hyper parameters • We report accuracy, F1, confusion matrix and AUC – ROC curve performance 15
  • 16. Layer-cluster Level Methods Performance Summary 16 Accuracy score F1 score RF 0.946 0.862 MLP 0.924 0.801 SVC 0.923 0.79 XGB 0.945 0.859 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 RF MLP SVC XGB Layer-cluster level Performance Summary Accuracy score F1 score
  • 17. RF/MLP/SVC/XGB Stacking • Generate a new dataset of the output of RF/MLP/SVC/XGB. • Train XGB on the new dataset (the meta model) 17 RF MLP SVC XGB 1 1 0 1 1 0 0 1 0 0 1 0 0 0 0 0 1 0 0 0 1 1 1 1 0 0 0 0 • All 4 classifiers give the same result for 21649 records.(93% of the train data) • The 4 classifiers result did not match for 1644 records (7% of the train data) • Stacking using XGBClassifier as meta model. • Accuracy Score: 0.948 • F1 Score: 0.867 • Confusion matrix: [5028 152] [ 182 1093]
  • 18. Methods applied to Extended Layer-cluster Level Representation • Add new features and use Random Forest to rank them. • Remove low-ranked features and run these methods: 1. Random Forest 2. Multi-layer Perceptron 3. Support-vector classifier 4. XGBClassifier (gradient boosting)
  • 19. Extended Layer-cluster Level Methods Summary (accuracy score / F1 score) • RF: 0.917 / 0.776 • MLP: 0.923 / 0.795 • SVC: 0.901 / 0.749 • XGB: 0.933 / 0.827 • Stacking: 0.933 / 0.826
  • 20. Methods Applied to Sequence Representation 1. LSTM on three-length sequences 2. 1D-CNN on three-length sequences 3. LSTM on five-length sequences 4. 1D-CNN on five-length sequences
  • 21. LSTM on Sequences (Three-length Sequence) • Setup: • For each element in a sequence: • Drop: ['eta', 'phi', 'layer', 'trckPhi', 'trckEn', 'trckEta', 'trckType', 'event', 'trackster'] • Keep the x, y, z, E, nHits features. • 1 LSTM layer with 32 units, followed by a dense layer with sigmoid activation. • Test accuracy: 0.91 • Test F1 score: 0.77 • Test confusion matrix: pileup (purity 0): [5894 193]; no pileup (purity 1): [420 1020]
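The per-element feature selection in the setup above can be sketched as follows. The column names come from the slide. The function name and the example values are invented, and how the three neighbouring layer clusters are chosen is not covered here.

```python
# Columns named on the slide.
DROP = ['eta', 'phi', 'layer', 'trckPhi', 'trckEn', 'trckEta',
        'trckType', 'event', 'trackster']
KEEP = ['x', 'y', 'z', 'E', 'nHits']

def sequence_to_matrix(sequence):
    """Turn a list of layer-cluster records into rows of kept features.

    The result has one row per sequence element and one column per kept
    feature, i.e. the (timesteps, features) layout that recurrent and
    1D convolutional layers usually expect for a single sample.
    """
    return [[float(record[name]) for name in KEEP] for record in sequence]

# Invented three-length sequence; dropped columns are simply ignored.
sequence = [
    {'x': 1.0, 'y': 2.0, 'z': 320.0, 'E': 0.4, 'nHits': 2, 'layer': 1, 'event': 7},
    {'x': 1.1, 'y': 2.1, 'z': 321.0, 'E': 0.9, 'nHits': 3, 'layer': 2, 'event': 7},
    {'x': 1.2, 'y': 2.2, 'z': 322.0, 'E': 0.6, 'nHits': 1, 'layer': 3, 'event': 7},
]
matrix = sequence_to_matrix(sequence)
print(len(matrix), len(matrix[0]))
```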
  • 22. ROC Curve: LSTM on Sequences (Three-length Sequence) [figure]
  • 23. 1D-CNN on Sequences (Three-length Sequence) • Setup: • For each element in a sequence: • Drop: ['eta', 'phi', 'layer', 'trckPhi', 'trckEn', 'trckEta', 'trckType', 'event', 'trackster'] • Keep the x, y, z, E, nHits features. • 1D CNN layer with 32 filters and kernel size 3, followed by a dense layer with sigmoid activation. • Test accuracy: 0.92 • Test F1 score: 0.78 • Test confusion matrix: pileup (purity 0): [5874 213]; no pileup (purity 1): [371 1069]
  • 24. ROC Curve: 1D-CNN on Sequences (Three-length Sequence) [figure]
  • 25. LSTM on Sequences (Five-length Sequence) • Setup: • For each element in a sequence: • Drop: ['eta', 'phi', 'layer', 'trckPhi', 'trckEn', 'trckEta', 'trckType', 'event', 'trackster'] • Keep the x, y, z, E, nHits features. • 1 LSTM layer with 32 units, followed by a dense layer with sigmoid activation. • Test accuracy: 0.92 • Test F1 score: 0.76 • Test confusion matrix: pileup (purity 0): [5650 269]; no pileup (purity 1): [292 893]
  • 26. ROC Curve: LSTM on Sequences (Five-length Sequence) [figure]
  • 27. 1D-CNN on Sequences (Five-length Sequence) • Setup: • For each element in a sequence: • Drop: ['eta', 'phi', 'layer', 'trckPhi', 'trckEn', 'trckEta', 'trckType', 'event', 'trackster'] • Keep the x, y, z, E, nHits features. • 1D CNN layer with 32 filters and kernel size 3, followed by a dense layer with sigmoid activation. • Test accuracy: 0.92 • Test F1 score: 0.77 • Test confusion matrix: pileup (purity 0): [5670 249]; no pileup (purity 1): [275 910]
  • 28. ROC Curve: 1D-CNN on Sequences (Five-length Sequence) [figure]
  • 29. Sequence Representation Methods Performance Summary [Bar chart: accuracy score and F1 score for LSTM (3-length sequence), 1D-CNN (3-length sequence), LSTM (5-length sequence), and 1D-CNN (5-length sequence)]
  • 30. Methods applied to Trackster Level – Graph Representation • Different architectures serve different purposes: graph classification, node classification, edge prediction, etc. • Surveys: https://paperswithcode.com/task/graph-classification and https://github.com/benedekrozemberczki/awesome-graph-classification • For pileup detection, we chose to try two architectures: 1. Graph Convolutional Neural Network • https://arxiv.org/abs/1609.02907 2. Adaptive sampling for graph representation learning • https://arxiv.org/abs/1809.05343
  • 31. Graph Representation Methods Performance Summary Setup • 32274 nodes: the layer clusters belonging to the 225 Tracksters • 514,910 edges: every layer cluster in a Trackster is fully connected to all layer clusters in the previous layer • Default hyperparameter values (hidden layers: 1, batch size: 32, etc.) Result • Test accuracy of Graph Convolutional Neural Network: 82.20% • Test accuracy of Adaptive sampling for graph representation learning: 72.40%
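The edge construction described in the setup can be sketched as below. This is an illustration, not the notebook code. It assumes "previous layer" means layer number minus one within the same Trackster, and the tuple layout and function name are hypothetical.

```python
from collections import defaultdict

def build_edges(nodes):
    """Connect each layer cluster to all clusters in the previous layer.

    nodes: list of (node_id, trackster_id, layer) tuples.
    Returns a list of (previous_node_id, node_id) edges.
    """
    by_layer = defaultdict(list)
    for node_id, trackster, layer in nodes:
        by_layer[(trackster, layer)].append(node_id)

    edges = []
    for node_id, trackster, layer in nodes:
        # Assumption: "previous" is layer - 1 in the same Trackster.
        for prev_id in by_layer.get((trackster, layer - 1), []):
            edges.append((prev_id, node_id))
    return edges

# Invented example: Trackster "A" spans layers 1 to 3, "B" has one cluster.
nodes = [(0, "A", 1), (1, "A", 1), (2, "A", 2), (3, "A", 3), (4, "B", 2)]
edges = build_edges(nodes)
print(edges)
```

Because every pair of clusters in adjacent layers is linked, the edge count grows with the product of the cluster counts in neighbouring layers, which is how 32274 nodes can lead to several hundred thousand edges.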
  • 33. Discussion, Recommendations and Future Work 1. We used a small dataset from the simple use case (<250 Tracksters; 1 particle per event). The benefit of adding new features in the extended layer-cluster representation, or of using a more complex 3D representation, cannot be measured well in such a case. In our trials, the performance was worse than with the simple 2D (layer-cluster) version. This could be due to scarcity of data, as the model needs more data points to compensate for the extra complexity introduced by the new features. 2. In the extended layer-cluster level representation, we calculate ratios based on aggregation over the previous/next layer. Instead of grouping by layer, try grouping by Euclidean distance on x, y, z < ɛ, as we did in the sequence representation. • For example, RatioENext, which is defined as layercluster['E'] / SUM(E, layer clusters in the next layer), would be defined as layercluster['E'] / SUM(E, layer clusters within radius ɛ in the next layer). • The same idea applies to the graph representation: instead of connecting each layer cluster to all layer clusters in the previous layer, we can connect it only to the nearest ones.
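The two RatioENext definitions in item 2 can be compared with a short sketch. The record layout, the function name, and the choice to return None when no neighbour qualifies are assumptions made for illustration. The slides do not say how an empty neighbourhood is handled.

```python
import math

def ratio_e_next(cluster, clusters, eps=None):
    """Energy of a layer cluster divided by the summed energy of the
    layer clusters in the next layer of the same Trackster.

    With eps set, only next-layer clusters closer than eps (Euclidean
    distance on x, y, z) are included in the sum.
    """
    neighbours = [
        c for c in clusters
        if c["trackster"] == cluster["trackster"]
        and c["layer"] == cluster["layer"] + 1
    ]
    if eps is not None:
        origin = (cluster["x"], cluster["y"], cluster["z"])
        neighbours = [
            c for c in neighbours
            if math.dist(origin, (c["x"], c["y"], c["z"])) < eps
        ]
    total = sum(c["E"] for c in neighbours)
    return cluster["E"] / total if total > 0 else None

# Invented layer clusters: one nearby and one distant next-layer neighbour.
clusters = [
    {"trackster": 1, "layer": 1, "x": 0.0, "y": 0.0, "z": 0.0, "E": 2.0},
    {"trackster": 1, "layer": 2, "x": 0.5, "y": 0.0, "z": 1.0, "E": 1.0},
    {"trackster": 1, "layer": 2, "x": 10.0, "y": 0.0, "z": 1.0, "E": 3.0},
]
by_layer = ratio_e_next(clusters[0], clusters)
by_radius = ratio_e_next(clusters[0], clusters, eps=2.0)
print(by_layer, by_radius)
```

In this toy case the distant, energetic neighbour dominates the layer-wide sum, so restricting the sum to a radius changes the feature from 0.5 to 2.0.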
  • 34. Discussion, Recommendations and Future Work II 3. A GNN tries to learn the classification based on neighbors: the layer clusters in the previous and next layers. We hypothesize that using an ANN (an MLP, for example) with the extended layer-cluster level representation can provide a faster and simpler alternative. The major disadvantage is that a feature-engineering phase is required to add insightful features based on how neighbors' properties affect the classification of a layer cluster. • Using a GNN does not require such a phase; the model will try to learn the relations by itself. 4. The data we used has a discrete target (pileup or pure) rather than a purity percentage or likelihood indicator. Therefore we solved a classification problem, not a regression one. Under these constraints, the only way to assign a likelihood weight to a layer cluster is to use the predicted probability of the pure class.
  • 35. Discussion, Recommendations and Future Work III 5. During the project, we tried a stacking approach with the layer-cluster level representation. Based on the distribution of pileup in 3D space [Figure 1], we hypothesize that a pipeline approach could combine the benefits of both worlds: the first step removes pileup with a high-confidence layer-cluster level approach, and the second step applies a Trackster level approach to the remaining layer clusters. This would reduce the complexity of the Trackster level representation and make the methods (e.g. GNN) faster to converge. 6. We recommend generating at least 1000 Tracksters for training and 300 Tracksters (using a different simulation setup) as a validation set, and re-running all methods. [Figure 1]
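The pipeline hypothesis in item 5 was not implemented in the project. The following is only a sketch of the control flow, with both models passed in as hypothetical callables and an arbitrary confidence threshold.

```python
def two_stage_pipeline(clusters, stage1_proba, stage2_predict, confidence=0.95):
    """Label layer clusters with a two-step pipeline.

    stage1_proba(cluster) returns the probability that a cluster is
    pileup according to a layer-cluster level model.
    stage2_predict(list_of_clusters) returns one label per cluster
    from a Trackster level model.
    Returns (labels, indices_sent_to_stage_two).
    """
    labels = [None] * len(clusters)
    remaining = []
    for index, cluster in enumerate(clusters):
        if stage1_proba(cluster) >= confidence:
            labels[index] = "pileup"  # removed with high confidence
        else:
            remaining.append(index)

    # Only the undecided clusters reach the more expensive model.
    stage2_labels = stage2_predict([clusters[i] for i in remaining])
    for index, label in zip(remaining, stage2_labels):
        labels[index] = label
    return labels, remaining

# Invented stand-ins for the two models.
clusters = [{"p_pileup": 0.99}, {"p_pileup": 0.60}, {"p_pileup": 0.10}]
stage1 = lambda c: c["p_pileup"]
stage2 = lambda cs: ["pileup" if c["p_pileup"] > 0.5 else "pure" for c in cs]

labels, remaining = two_stage_pipeline(clusters, stage1, stage2)
print(labels, remaining)
```

The second model sees a smaller input, which is the complexity reduction the item argues for.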
  • 36. Appendix I: Python Notebooks Descriptions • All results can be reproduced using the delivered Python notebooks. • AggregatedFeatures-RF-MLP-XGB-SVC-Stacking.ipynb: implementation of the Layer-cluster level representation methods • NoAggregatedFeatures-RF-MLP-XGB-SVC-Stacking.ipynb: implementation of the Extended Layer-cluster representation methods • SequenceModels.ipynb: implementation of the Sequence representation methods (LSTM and 1D-CNN) • AdaptiveSamplingGraphConvNN.ipynb: Adaptive Sampling Towards Fast Graph Representation Learning • GraphConvNN.ipynb: implementation of the Graph Convolutional Neural Network • Porting_to_c_example.ipynb: an example of porting a trained sklearn model to C
  • 37. Appendix II: References 1. Graph Convolutional Neural Network (https://arxiv.org/abs/1609.02907) 2. Adaptive sampling for graph representation learning (https://arxiv.org/abs/1809.05343) 3. Pileup Mitigation with Machine Learning (PUMML) (https://arxiv.org/abs/1707.08600) 4. Deep learning in color: towards automated quark/gluon jet discrimination (https://arxiv.org/abs/1612.01551) 5. Jet-Images – Computer Vision Inspired Techniques for Jet Tagging (https://arxiv.org/abs/1407.5675) 6. Jet-Images – Deep Learning Edition (https://arxiv.org/abs/1511.05190) 7. Pileup mitigation at the Large Hadron Collider with Graph Neural Networks (https://arxiv.org/abs/1810.07988) 8. TensorFlow implementations of Graph Neural Networks (https://github.com/microsoft/tf-gnn-samples) 9. PyTorch implementations of Graph Neural Networks (https://github.com/dmlc/dgl) 10. Code Repository (https://bit.ly/38cJRQg)