11-777 Lecture 1.1 Introduction: the behavioral era, the computational era

Article link: https://blog.csdn.net/liupeng19970119/article/details/109113884

Contents

background
O
KR

background

Recently I found a good course on multimodal machine learning. In this blog I will work through it and note down my understanding.

O

Master the basic work in multimodal machine learning.

KR

What is a modality?
The history of multimodal research
The main areas of multimodal research

1. What is a modality?

Modality:

The way in which something happens or is experienced.
It can be a sensory form (touch, feel) or a certain type of information (image, speech).

Medium:

A means for storing or communicating information.

Here are examples of modalities:

2. History of multimodal research

The “behavioral” era (1970s until late 1980s)
The McGurk Effect (1976)
The “computational” era (late 1980s until 2000)
Audio-Visual Speech Recognition (AVSR)
Affective Computing
The “interaction” era (2000 - 2010)
Modes of human multimodal interaction
The “deep learning” era (2010s until …)

3. Main areas of multimodal research

Multimodal research covers 5 core theories, 37 applications, and 235 related works.
Here are the five areas.

1. Representation

Definition: Learning how to represent and summarize multimodal data in a way
that exploits the complementarity and redundancy.
demo :

Main framework:
Coordinated representations aim to maximize the correlation between related vectors from different modalities while keeping unrelated vectors clearly apart.
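
To make the coordinated idea concrete, here is a minimal sketch in plain Python. It is not taken from the course: the cosine similarity, the hinge (margin) penalty, the function names, and the toy embeddings are all assumptions chosen for illustration. Each modality keeps its own embedding, and the loss is low only when matched pairs are more similar than mismatched ones.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def coordinated_loss(emb_a, emb_b, margin=0.2):
    """Average hinge penalty over all mismatched (i, j) pairs.

    Assumption: emb_a[i] and emb_b[i] describe the same item, and there
    are at least two items. Only the a -> b direction is scored.
    """
    n = len(emb_a)
    total = 0.0
    for i in range(n):
        pos = cosine(emb_a[i], emb_b[i])          # matched pair
        for j in range(n):
            if j != i:
                neg = cosine(emb_a[i], emb_b[j])  # mismatched pair
                # Penalize a mismatch that scores within `margin` of the match.
                total += max(0.0, margin + neg - pos)
    return total / (n * (n - 1))

# Hypothetical toy embeddings: same directions, different scales.
image_emb = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
text_emb = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 2.0]]

aligned = coordinated_loss(image_emb, text_emb)
shuffled = coordinated_loss(image_emb, text_emb[1:] + text_emb[:1])
print(aligned, shuffled)
```

With the pairs in the right order the loss is zero; after shuffling the text embeddings every image sits closer to a wrong caption than to its own, so the loss rises. In a real system the embeddings would come from trained encoders and this loss would drive their training.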

2. Alignment

Definition: Finding correspondences between elements of different modalities.
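
One classic way to find such correspondences between two sequences is dynamic time warping (DTW). The post does not name a specific method, so treat the sketch below as one possible illustration: the scalar features, the absolute-difference cost, and the function name `dtw_align` are assumptions.

```python
def dtw_align(seq_a, seq_b):
    """Return (total_cost, path) for the cheapest monotonic alignment.

    path is a list of (index_in_a, index_in_b) pairs; an element may be
    matched to several elements of the other sequence.
    """
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] = cheapest cost of aligning seq_a[:i] with seq_b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(seq_a[i - 1] - seq_b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the end to recover which elements were matched.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    return cost[n][m], path[::-1]

# Hypothetical 1-D features, e.g. one value per video frame and per audio frame.
video_feat = [1.0, 2.0, 3.0]
audio_feat = [1.0, 2.0, 2.0, 3.0]
total, path = dtw_align(video_feat, audio_feat)
print(total, path)
```

Here the second video frame is matched to two audio frames, which is how DTW absorbs the difference in length between the two streams.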
Demo :


3. Fusion

Definition: To join information from two or more modalities to perform a
prediction task.

This part is not about specific model names; it focuses on when, how, and what to fuse.
Model-Based (Intermediate) Approaches

Deep neural networks
Kernel-based methods
Graphical models
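
The "when to fuse" question can be illustrated by contrasting feature-level (early) and decision-level (late) fusion. These two terms do not appear in the post itself, so read this as an assumption about what "when" covers. The linear scorer stands in for any trained model, and all weights and feature values below are hand-picked, hypothetical numbers rather than learned ones.

```python
import math

def sigmoid(x):
    """Squash a score into a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def linear_score(features, weights, bias=0.0):
    """Weighted sum of features; stands in for any trained model."""
    return sum(f * w for f, w in zip(features, weights)) + bias

def early_fusion(audio, video, joint_weights):
    """Fuse at the feature level: concatenate first, then predict once."""
    return sigmoid(linear_score(audio + video, joint_weights))

def late_fusion(audio, video, audio_weights, video_weights):
    """Fuse at the decision level: predict per modality, then average."""
    p_audio = sigmoid(linear_score(audio, audio_weights))
    p_video = sigmoid(linear_score(video, video_weights))
    return 0.5 * (p_audio + p_video)

# Hypothetical features and weights, for illustration only.
audio_feat = [0.5, -1.0]
video_feat = [2.0, 0.0, 1.0]
audio_w = [1.0, 1.0]
video_w = [0.5, 0.5, -1.0]

p_early = early_fusion(audio_feat, video_feat, audio_w + video_w)
p_late = late_fusion(audio_feat, video_feat, audio_w, video_w)
print(p_early, p_late)
```

Early fusion lets one model see interactions between modalities, while late fusion keeps the unimodal models independent and only combines their outputs. The model-based approaches listed above fuse inside the model, between these two extremes.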

4. Translation

Definition: Process of changing data from one modality to another, where the
translation relationship can often be open-ended or subjective.