PRINS: Scalable Model Inference for
Component-based System Logs*
Donghwan Shin1), Domenico Bianculli2), and Lionel Briand2,3)
1) University of Sheffield
2) University of Luxembourg
3) University of Ottawa
* This presentation is for the Journal-First Track at ICSE 2023; the original paper was accepted in the Empirical Software Engineering (EMSE) journal.
[Figure: System Logs -> Model Inference Technique -> System Model. Each log is a list of timestamped entries for one execution, e.g. 20221101.001 A, 20221101.004 B, 20221101.011 Z, 20221101.013 B, 20221101.101 Y, …]
Log = A sequence of log entries representing a single execution flow
Problem: System Logs: Too large; Model Inference Technique: Not Scalable Enough; System Model: No Models
081111 090711 25010 INFO dfs.DataNode$DataXceiver: Receiving block blk_5652408071925555972 src: /10.251.65.203:38382 dest: /10.251.65.203:50010
081111 090711 25181 INFO dfs.DataNode$DataXceiver: Receiving block blk_5652408071925555972 src: /10.251.27.63:54730 dest: /10.251.27.63:50010
081111 090711 25487 INFO dfs.DataNode$DataXceiver: Receiving block blk_5652408071925555972 src: /10.251.65.203:40305 dest: /10.251.65.203:50010
081111 090711 00031 INFO dfs.FSNamesystem: BLOCK* NameSystem.allocateBlock: /user/root/rand8/_temporary/part-00156. blk_5652408071925555972
081111 090756 25011 INFO dfs.DataNode$PacketResponder: PacketResponder 2 for block blk_5652408071925555972 terminating
081111 090756 25011 INFO dfs.DataNode$PacketResponder: Received block blk_5652408071925555972 of size 67108864 from /10.251.65.203
081111 090756 25184 INFO dfs.DataNode$PacketResponder: PacketResponder 0 for block blk_5652408071925555972 terminating
081111 090756 25184 INFO dfs.DataNode$PacketResponder: Received block blk_5652408071925555972 of size 67108864 from /10.251.27.63
081111 090756 25488 INFO dfs.DataNode$PacketResponder: PacketResponder 1 for block blk_5652408071925555972 terminating
081111 090756 25488 INFO dfs.DataNode$PacketResponder: Received block blk_5652408071925555972 of size 67108864 from /10.251.65.203
081111 090756 00027 INFO dfs.FSNamesystem: BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.251.71.16:50010 is added to blk_5652408071925555972
081111 111345 00013 INFO dfs.DataBlockScanner: Verification succeeded for blk_5652408071925555972
Example HDFS Log
Component IDs
Observation: Systems are often composed of multiple components
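As a rough illustration of that observation, the sketch below pulls a component ID out of each HDFS log line shown above. It assumes the layout is date, time, process ID, level, component, message, and that the component ID is the logger name before the first colon (e.g. dfs.DataNode$DataXceiver); the slide only labels "Component IDs", so that reading is an assumption. The names LINE_RE and parse_line are hypothetical and not part of PRINS.

```python
import re
from collections import Counter

# Assumed field layout: date, time, pid, level, component, message.
LINE_RE = re.compile(
    r"^(?P<date>\d{6}) (?P<time>\d{6}) (?P<pid>\d+) (?P<level>[A-Z]+) "
    r"(?P<component>[^:\s]+): (?P<message>.*)$"
)


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    return match.groupdict() if match else None


# A few lines taken from the example HDFS log above.
sample = [
    "081111 090711 25010 INFO dfs.DataNode$DataXceiver: Receiving block blk_5652408071925555972 src: /10.251.65.203:38382 dest: /10.251.65.203:50010",
    "081111 090711 00031 INFO dfs.FSNamesystem: BLOCK* NameSystem.allocateBlock: /user/root/rand8/_temporary/part-00156. blk_5652408071925555972",
    "081111 090756 25011 INFO dfs.DataNode$PacketResponder: PacketResponder 2 for block blk_5652408071925555972 terminating",
    "081111 111345 00013 INFO dfs.DataBlockScanner: Verification succeeded for blk_5652408071925555972",
]

# Count how many entries each component contributed.
components = Counter(parse_line(line)["component"] for line in sample)
for name, count in components.items():
    print(name, count)
```

Even in this short excerpt, several distinct components write to the same log, which is what makes a per-component view possible.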
What if we infer INDIVIDUAL component
models and then stitch them together?
PRINS: PRojection-INference-Stitching
[Figure: PRINS workflow on example System Logs whose entries come from two components, x and y: ax bx dy dy, ax bx dy, ax cx ey]
• PRojection: split the System Logs into per-component logs (Component x: ax bx, ax bx, ax cx; Component y: dy dy, dy, ey)
• INference: infer one model per component (Model of x, Model of y)
• Stitching: combine the component models into the System Model
+ (optional) Heuristic Determinisation (HD)
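The projection and inference stages can be sketched as follows, using the three example logs read off the slide (grouping them as ax bx dy dy, ax bx dy, ax cx ey is an assumption). This is a minimal sketch, not the PRINS implementation: infer_transitions is a deliberately naive stand-in that only records which entry follows which, where the real pipeline would call a model inference tool, and the stitching stage is omitted. The names project and infer_transitions are hypothetical.

```python
from collections import defaultdict


def project(logs):
    """Projection: split each log into per-component sub-logs, keeping entry order."""
    projected = defaultdict(list)
    for log in logs:
        per_component = defaultdict(list)
        for event, component in log:
            per_component[component].append(event)
        for component, events in per_component.items():
            projected[component].append(events)
    return dict(projected)


def infer_transitions(component_logs):
    """Placeholder inference: collect the observed (previous, next) entry pairs.

    This is NOT the inference used by PRINS; it only stands in for a real
    model inference technique applied to one component's logs.
    """
    transitions = set()
    for events in component_logs:
        previous = "START"
        for event in events:
            transitions.add((previous, event))
            previous = event
        transitions.add((previous, "END"))
    return transitions


# Each entry is (event, component); the three logs mirror the slide's example.
logs = [
    [("a", "x"), ("b", "x"), ("d", "y"), ("d", "y")],
    [("a", "x"), ("b", "x"), ("d", "y")],
    [("a", "x"), ("c", "x"), ("e", "y")],
]

projected = project(logs)
# Each component is processed independently, so this loop could run in parallel.
models = {name: infer_transitions(sub_logs) for name, sub_logs in projected.items()}
print(projected)
print(models)
```

Because each component's logs are handled on their own, every inference task sees a much smaller input than the full system logs, which is the divide-and-conquer idea behind the approach.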
Research Questions
• RQ1: How does the execution time of PRINS change according to the parallel inference tasks in the inference stage?
• RQ2: How does the execution time of HD_u change according to parameter u?
• RQ3: How does the accuracy of the models (in the form of gFSMs) generated by HD_u change according to parameter u?
• RQ4: How fast is PRINS when compared to state-of-the-art model inference techniques?
• RQ5: How accurate are the models generated by PRINS compared to those generated by state-of-the-art model inference techniques?
RQ1 concerns parallel inference, RQ2 and RQ3 concern Heuristic Determinisation, and RQ4 and RQ5 concern PRINS as a whole (compared to MINT).
RQ4: Execution Time of PRINS compared to MINT
[Figure: eight panels (Hadoop, HDFS, Linux, Zookeeper, CoreSync, NGLClient, Oobelib, PDApp), each plotting Execution Time (s) against Duplication Factor for MINT, PRINS-N, and PRINS-P]
PRINS-N = PRINS with No parallel inference (HD is enabled to be fair with MINT)
PRINS-P = PRINS with Parallel inference (HD is enabled to be fair with MINT)
Duplication Factor = How many times each log is duplicated to increase the input log size systematically
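The Duplication Factor can be illustrated with a small sketch. It assumes that a factor of k means each log appears k times in the enlarged input (so a factor of 1 leaves the input unchanged); the slide does not say whether the original copy is counted, and duplicate_logs is a hypothetical helper, not part of the replication package.

```python
def duplicate_logs(logs, factor):
    """Return a new list in which every log appears `factor` times in a row.

    Assumption: factor counts total copies, so factor=1 returns the logs unchanged.
    """
    if factor < 1:
        raise ValueError("factor must be at least 1")
    return [list(log) for log in logs for _ in range(factor)]


logs = [["a", "b", "d", "d"], ["a", "b", "d"], ["a", "c", "e"]]
enlarged = duplicate_logs(logs, 2)
print(len(logs), len(enlarged))
```

Duplicating logs grows the input size in a controlled way without introducing new behaviour, so changes in execution time can be attributed to input size alone.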
RQ5: Accuracy of PRINS compared to MINT
Downside: Size of System Models
Contributions
• Tame the scalability issue of model
inference using divide-and-conquer.
• Present an empirical evaluation of
PRINS and its comparison with the
state-of-the-art model inference tool.
• It works especially well when the
components appearing in different
executions are similar.
• Provide a publicly available
implementation of PRINS.
Paper (Open Access) Replication Package
