Scalable and Order-robust Continual Learning
with Additive Parameter Decomposition
Jaehong Yoon¹, Saehoon Kim², Eunho Yang¹,², and Sung Ju Hwang¹,²
KAIST1, AITRICS2
Continual Learning of a Machine
Continual learning is often formulated as incremental or online multi-task learning,
where complex task-to-task relationships are modeled through the weights of a neural network.
[Figure: a learning model receives tasks ..., t-2, t-1, t one after another and accumulates learned knowledge.]
1) Tasks are received in a sequential order.
2) Knowledge is transferred from previously learned tasks.
3) New knowledge is stored for future use.
4) Existing knowledge is refined.
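As a rough illustration of this protocol, the loop below trains on tasks strictly in arrival order without revisiting earlier data; `model`, `tasks`, `train_one_task`, and `evaluate` are hypothetical placeholders, not code from the paper.

```python
def continual_learning(model, tasks, train_one_task, evaluate):
    """Minimal sketch of the sequential protocol: tasks arrive one by one,
    earlier data is not revisited, and all knowledge lives in the shared model."""
    results = []
    for dataset in tasks:                         # 1) tasks arrive in sequential order
        train_one_task(model, dataset)            # 2) + 4) transfer and refine knowledge in the shared weights
        results.append(evaluate(model, dataset))  # 3) newly acquired knowledge is kept for future tasks
    return results
```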
Challenges: Catastrophic Forgetting
Introducing new tasks can result in semantic drift, or catastrophic forgetting,
where the original meaning of the learned features changes as they are fit to later tasks.
Jaehong Yoon et al., “Scalable and Order-robust Continual Learning with Additive Parameter Decomposition”, ICLR 2020.
[Figure: layer weights W₁ and W₂ are overwritten as a new task is added.]
Challenges: Scalability
Even with well-defined regularizers, it is very hard to completely avoid catastrophic
forgetting, since in practice the model may encounter an unlimited number of tasks.
[Figure: from toy-sized continual learning to large-scale continual learning over many tasks.]
A continual learning model therefore needs to guarantee scalability to a large number of
tasks, in terms of both memory usage and training time.
Challenges: Task-order Sensitivity
[Figure: a disease classification model trained under two different task orders (Order A vs. Order B) produces different results for the same input.]
The order in which tasks are presented has a large impact on the resulting continual learning model,
because knowledge transfer is unidirectional, from earlier tasks to later ones.
Additive Parameter Decomposition (APD)
Conceptually, our model, APD, additively decomposes the model parameters
into task-shared parameters (σ) and highly sparse task-adaptive parameters (τ).
Further, we periodically regroup the task-adaptive parameters to obtain hierarchically
shared parameters, exploiting the varying degrees of knowledge sharing across tasks.
[Figure: the models ℳ₁:ₜ for tasks 1..t are built from the shared parameters σ plus the sparse task-adaptive parameters τ₁:ₜ.]
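The decomposition itself can be sketched in a few lines of PyTorch for a single fully connected layer; the layer shapes, the sigmoid mask over output units, and the variable names are illustrative assumptions rather than the paper's released implementation.

```python
import torch

out_dim, in_dim = 64, 32
sigma = torch.randn(out_dim, in_dim, requires_grad=True)   # task-shared parameters (one copy for all tasks)
tau_t = torch.zeros(out_dim, in_dim, requires_grad=True)   # sparse task-adaptive parameters for task t
v_t   = torch.zeros(out_dim, requires_grad=True)           # per-task mask logits

m_t = torch.sigmoid(v_t).unsqueeze(1)   # soft mask over output units, broadcast over the input dimension
theta_t = sigma * m_t + tau_t           # effective weights for task t: sigma ⊗ m_t + tau_t
```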
Task-order Robust (Reliable) Continual Learning
This is achieved by the following mechanisms:
• Decomposition of parameters into task-shared and task-adaptive parts.
• Sparsity-inducing regularization on the task-adaptive parameters.
The objective for task t is

$$\underset{\boldsymbol{\sigma},\ \boldsymbol{\tau}_{1:t},\ \boldsymbol{v}_{1:t}}{\text{minimize}}\ \ \mathcal{L}\big(\boldsymbol{\sigma}\otimes\boldsymbol{m}_t+\boldsymbol{\tau}_t;\ \mathcal{D}_t\big)+\lambda_1\sum_{i=1}^{t}\lVert\boldsymbol{\tau}_i\rVert_1+\lambda_2\sum_{i=1}^{t-1}\big\lVert\boldsymbol{\theta}_i^{*}-(\boldsymbol{\sigma}\otimes\boldsymbol{m}_i+\boldsymbol{\tau}_i)\big\rVert_2^2$$
where $\boldsymbol{\theta}_i^{*}$ denotes the approximated solution of previous task $i$, and the mask $\boldsymbol{m}_i$ is generated from the per-task parameters $\boldsymbol{v}_i$.
• The retroactive update of the previous task-adaptive parameters, which reflects changes in the
task-shared parameters, prevents earlier task solutions from drifting away from their original values.
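A hedged sketch of this objective for a single layer follows; the cross-entropy task loss, the λ values, and the container types (lists of per-task tensors) are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn.functional as F

def apd_objective(logits, targets, sigma, masks, taus, theta_star, t,
                  lam1=1e-4, lam2=1.0):
    """Sketch of the APD loss at task t (0-indexed).

    masks[i]: sigmoid mask m_i broadcast to sigma's shape.
    taus[i]:  sparse task-adaptive parameters of task i.
    theta_star[i]: stored approximate solution of previous task i.
    """
    task_loss = F.cross_entropy(logits, targets)
    # lambda_1 * sum_{i=1..t} ||tau_i||_1 : sparsity on task-adaptive parameters
    sparsity = sum(tau.abs().sum() for tau in taus[: t + 1])
    # lambda_2 * sum_{i=1..t-1} ||theta*_i - (sigma ⊗ m_i + tau_i)||_2^2 : retroactive drift penalty
    drift = sum(((theta_star[i] - (sigma * masks[i] + taus[i])) ** 2).sum()
                for i in range(t))
    return task_loss + lam1 * sparsity + lam2 * drift
```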
Experimental Results
APD variants outperform recent expansion-based continual learning baselines
with minimal capacity expansion and training time.
Experimental Results
APD shows remarkably superior and reliable performance in terms of task-order
fairness (robustness).
Large-scale Continual Learning (Number of Tasks)
We further validate the scalability of our model with large-scale continual learning
experiments on the Omniglot dataset, which consists of 100 tasks.
The plot shows that APD scales well, with logarithmic growth in network
capacity (the number of parameters), while PGN shows linear growth.
Models       Capacity    Accuracy
STL          10,000%     82.13 ± 0.08%
L2T           1,599%     64.65 ± 1.76%
EWC           1,599%     68.66 ± 1.92%
PGN-large     1,543%     79.35 ± 0.12%
PGN-small     1,045%     73.65 ± 0.27%
APD-large       943%     81.60 ± 0.53%
APD-small       649%     81.20 ± 0.62%
Preventing Catastrophic Forgetting
APD variants show no sign of catastrophic forgetting on earlier tasks, although
their performance changes marginally over the course of training.
Selective Task Forgetting
There is no performance degradation on non-target tasks, since dropping the
task-adaptive parameters of a specific task does not affect the remaining tasks.
[Plots: Forgetting (Training Step 3) and Forgetting (Training Step 5).]
This ability to selectively forget is another important advantage of our model that
makes it practical in lifelong learning scenarios.
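A minimal sketch of this operation, assuming the per-task parameters are stored in dictionaries keyed by task id (an assumption about bookkeeping, not the paper's code):

```python
def forget_task(taus: dict, mask_logits: dict, task_id) -> None:
    """Drop the task-adaptive parameters and mask logits of one task.
    The shared sigma and every other task's (tau_i, v_i) are untouched,
    so performance on the remaining tasks is unaffected."""
    taus.pop(task_id, None)
    mask_logits.pop(task_id, None)
```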
Conclusion
• We tackle practically important and novel problems in continual learning that have
been overlooked thus far, such as scalability and order-robustness.
• We introduce a novel CL framework based on decomposing the network parameters
into task-shared and sparse task-adaptive parameters.
• We perform extensive experimental validation of our model on multiple datasets
against recent continual learning methods. APD is significantly superior to them
in terms of accuracy, efficiency, scalability, and order-robustness.
Thanks