Multi-task learning improves generalization by pooling examples (which can be seen as soft constraints imposed on the parameters) arising from several tasks. When part of a model is shared across tasks, that part is constrained toward good values (assuming the sharing is justified), which often leads to better generalization.
The diagram below shows a common form of multi-task learning in which several supervised tasks (predicting \mathbf{y}^{(i)} given \mathbf{x}) share the same input \mathbf{x}, as well as an intermediate-level representation \mathbf{h}^{(shared)} that captures a common pool of factors. The model can be divided into two kinds of parts, each with its own set of parameters:
- Task-specific parameters - these benefit only from the examples of their own task to achieve good generalization. They correspond to the upper layers of the neural network in the diagram below.
- Generic parameters - these are shared across all tasks and benefit from the pooled data of all the tasks. They correspond to the lower layers of the neural network in the diagram below (a minimal code sketch of this shared/task-specific split follows this list).
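The sketch below illustrates the split described above using PyTorch: a shared trunk of lower layers (the generic parameters, producing \mathbf{h}^{(shared)}) feeds two task-specific heads (the task-specific parameters). The layer sizes, number of tasks, and class counts are illustrative assumptions, not values taken from the article.

```python
# Minimal sketch of hard parameter sharing for multi-task learning.
# Dimensions and task definitions are hypothetical.
import torch
import torch.nn as nn

class MultiTaskNet(nn.Module):
    def __init__(self, input_dim=64, shared_dim=128, head_dim=64,
                 n_classes_task1=10, n_classes_task2=5):
        super().__init__()
        # Generic parameters: lower layers shared by all tasks,
        # producing the shared representation h^(shared).
        self.shared = nn.Sequential(
            nn.Linear(input_dim, shared_dim),
            nn.ReLU(),
            nn.Linear(shared_dim, shared_dim),
            nn.ReLU(),
        )
        # Task-specific parameters: upper layers h^(1) and h^(2),
        # each trained only by its own task's examples.
        self.head1 = nn.Sequential(
            nn.Linear(shared_dim, head_dim), nn.ReLU(),
            nn.Linear(head_dim, n_classes_task1),
        )
        self.head2 = nn.Sequential(
            nn.Linear(shared_dim, head_dim), nn.ReLU(),
            nn.Linear(head_dim, n_classes_task2),
        )

    def forward(self, x):
        h_shared = self.shared(x)                           # h^(shared)
        return self.head1(h_shared), self.head2(h_shared)   # y^(1), y^(2)
```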
Multi-task learning can take many forms in deep learning frameworks, and this diagram illustrates a common scenario in which the tasks share a common input but have different target random variables. The lower layers of a deep network (whether supervised and feedforward, or including a generative component with downward arrows) can be shared across tasks, and task-specific parameters (associated with the weights into and out of \mathbf{h}^{(1)} and \mathbf{h}^{(2)}, respectively) can be learned on top of the shared representation \mathbf{h}^{(shared)}. The underlying idea is that the variations in the input \mathbf{x} are explained by a common pool of factors and that each task is associated with a subset of these factors. In this example, the top-level hidden units \mathbf{h}^{(1)} and \mathbf{h}^{(2)} are specialized for their respective tasks (predicting \mathbf{y}^{(1)} and \mathbf{y}^{(2)}), while the intermediate-level representation \mathbf{h}^{(shared)} is shared across all tasks. In the unsupervised learning setting, it makes sense for some of the top-level factors to be associated with none of the output tasks (\mathbf{h}^{(3)}): these are the factors that explain some of the variations in the input but are not relevant for predicting \mathbf{y}^{(1)} or \mathbf{y}^{(2)}.
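To make the sharing concrete, here is a sketch of joint training, assuming the MultiTaskNet defined above and batches that carry labels for both tasks. The total loss is simply the sum of the per-task losses, so gradients from both tasks flow into and update the shared lower layers; the optimizer, loss, and dummy data are assumptions for illustration.

```python
# Joint training sketch: both task losses update the shared parameters.
import torch
import torch.nn as nn

model = MultiTaskNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

def train_step(x, y1, y2):
    optimizer.zero_grad()
    out1, out2 = model(x)
    loss = criterion(out1, y1) + criterion(out2, y2)  # pooled training signal
    loss.backward()   # shared layers receive gradients from both tasks
    optimizer.step()
    return loss.item()

# Illustrative dummy batch (random tensors, purely for shape checking).
x = torch.randn(32, 64)
y1 = torch.randint(0, 10, (32,))
y2 = torch.randint(0, 5, (32,))
print(train_step(x, y1, y2))
```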
Learning curves show how the negative log-likelihood loss evolves over training (measured in the number of training iterations over the dataset, or epochs). In this scenario, a maxout network is trained on MNIST. The training objective decreases steadily over time, while the average loss on the validation set eventually begins to rise, producing an asymmetric U-shaped curve.
Improved generalization and generalization error bounds can be achieved thanks to the shared parameters, whose statistical strength can be greatly improved compared with single-task models because they see the pooled examples of all tasks. Of course, this only happens if the assumptions about the statistical relationship between the different tasks are valid, meaning that something really is shared across some of the tasks. From the point of view of deep learning, the underlying prior belief is that, among the factors that explain the variations observed in the data associated with the different tasks, some are shared across two or more tasks.