VQualA 2025 Challenge on Face Image Quality Assessment: Methods and Results
arXiv:2508.18445v1 [cs.CV] 25 Aug 2025
Sizhuo Ma*, Wei-Ting Chen*, Qiang Gao*, Jian Wang*, Chris Wei Zhou*,
Wei Sun, Weixia Zhang, Linhan Cao, Jun Jia, Xiangyang Zhu, Dandan Zhu,
Xiongkuo Min, Guangtao Zhai, Baoying Chen, Xiongwei Xiao, Jishen Zeng,
Wei Wu, Tiexuan Lou, Yuchen Tan, Chunyi Song, Zhiwei Xu,
MohammadAli Hamidi, Hadi Amirpour, Mingyin Bai, Jiawang Du, Zhenyu Jiang,
Zilong Lu, Ziguan Cui, Zongliang Gan, Xinpeng Li, Shiqi Jiang, Chenhui Li,
Changbo Wang, Weijun Yuan, Zhan Li, Yihang Chen, Yifan Deng, Ruting Deng,
Zhanglu Chen, Boyang Yao, Shuling Zheng, Feng Zhang, Zhiheng Fu,
Abhishek Joshi, Aman Agarwal, Rakhil Immidisetti, Ajay Narasimha Mopidevi,
Vishwajeet Shukla, Hao Yang, Ruikun Zhang, Liyuan Pan, Kaixin Deng,
Hang Ouyang, Fan Yang, Zhizun Luo, Zhuohang Shi, Songning Lai, Weilin Ruan,
Yutao Yue
Abstract
Face images play a crucial role in numerous applications; however, real-world conditions frequently introduce degradations such as noise, blur, and compression artifacts, affecting overall image quality and hindering subsequent tasks. To address this challenge, we organized the VQualA 2025 Challenge on Face Image Quality Assessment (FIQA) as part of the ICCV 2025 Workshops. Participants created lightweight and efficient models (limited to 0.5 GFLOPs and 5 million parameters) for the prediction of Mean Opinion Scores (MOS) on face images with arbitrary resolutions and realistic degradations. Submissions underwent comprehensive evaluation through correlation metrics on a dataset of in-the-wild face images. The challenge attracted 127 participants, who made 1519 submissions in total. This report summarizes the methodologies and findings for advancing the development of practical FIQA approaches.
1. Introduction
In recent years, face images have become integral to a wide variety of applications, including video communication, photography, augmented reality, and digital content creation.* However, real-world face images are frequently captured under non-ideal conditions due to environmental constraints and hardware limitations, resulting in common degradations such as noise, blur, compression artifacts, and poor lighting. These degradations not only diminish perceived image quality but also negatively impact downstream image processing tasks like enhancement, editing, and synthesis. Moreover, compromised image quality can adversely affect the performance and generalization ability of data-driven models, including large-scale vision systems and generative models, which depend on high-quality face image datasets for effective training [15, 34]. Thus, the development of robust generic FIQA methods capable of accurately quantifying perceptual degradation levels has become increasingly critical [2, 36].

*Sizhuo Ma (sma@snap.com), Wei-Ting Chen (weitingchen@microsoft.com), Qiang Gao (qgao@snap.com), Jian Wang (jwang4@snap.com), and Chris Wei Zhou (zhouw26@cardiff.ac.uk) are the challenge organizers. The other authors are participants of the VQualA 2025 Challenge on Face Image Quality Assessment.
To advance research in this area, we introduce the VQualA 2025 Challenge on Face Image Quality Assessment, held in conjunction with the ICCV 2025 Workshops. This challenge focuses specifically on evaluating the perceptual quality of face images at arbitrary resolutions affected by real-world degradations, emphasizing accuracy within stringent computational constraints. Participants are tasked with developing efficient and lightweight models capable of predicting the MOS of face images under conditions such as blur, noise, and low illumination. To reflect realistic deployment scenarios, submissions must adhere to computational constraints, including a maximum of 0.5 GFLOPs and fewer than 5 million parameters. Model performance is rigorously evaluated using no-reference image quality metrics and extensive subjective human studies to ensure alignment with human perceptual judgments.
The primary objective of this challenge is to encourage innovation in efficient and precise FIQA models suitable for real-time deployment on mobile and edge devices, ultimately advancing the broader field of perceptual quality assessment and enabling practical, real-world applications.

This challenge garnered significant interest, attracting 127 registered participants. Throughout the development phase, participants submitted 1058 entries, followed by 461 submissions during the final testing phase. Ultimately, 13 teams successfully submitted their final models and accompanying fact sheets, each providing detailed methodologies for face image quality assessment. Sec. 3 presents a comprehensive analysis and summary of the submitted methods. We anticipate that this challenge will contribute meaningfully to the ongoing progress of face image quality assessment methods, particularly in real-world scenarios under computational constraints.

This challenge is one of several associated with the VQualA Workshop at ICCV 2025, including: Image Super-Resolution Generated Content Quality Assessment [19], Visual Quality Comparison for Large Multimodal Models [44], GenAI-Bench AIGC Video Quality Assessment [3], Engagement Prediction for Short Videos [18], and Document Image Quality Assessment [11].
2. VQualA FIQA Challenge
2.1. Datasets and Evaluation
To ensure a fair evaluation of participant solutions, we curated distinct training, validation, and testing datasets for this challenge. Our training set comprises 27,686 images, and our validation set contains 1,000 images, all collected from CelebA [22] and Flickr. For the test set, we gathered 889 images exclusively from Flickr.
Variety in resolution. A key challenge of this competition was developing a Face Image Quality Assessment (FIQA) method capable of handling in-the-wild images with diverse resolutions. Unlike previous datasets, such as GFIQA [36], our collected face images are not normalized and exhibit a wide range of resolutions, with short-edge dimensions varying from 224 to 1024 pixels. To generate labels for all datasets, we employed the state-of-the-art FIQA method DSL-FIQA [2]: for each image, regardless of its resolution, 20 random patches were extracted, scored individually, and their scores averaged to obtain the image's quality label.
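As an illustration, the patch-based labeling could look like the sketch below. Here dsl_fiqa_score stands in for DSL-FIQA inference and is a hypothetical callable, and the 224 × 224 patch size is an assumption (the patch dimensions are not stated in the text).

    import random
    from PIL import Image

    def label_image(img: Image.Image, n_patches: int = 20, size: int = 224) -> float:
        """Average FIQA scores over random patches to obtain the image's label."""
        scores = []
        for _ in range(n_patches):
            left = random.randint(0, img.width - size)
            top = random.randint(0, img.height - size)
            patch = img.crop((left, top, left + size, top + size))
            scores.append(dsl_fiqa_score(patch))  # hypothetical DSL-FIQA call
        return sum(scores) / n_patches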
Evaluation. The challenge was structured into two distinct phases:
• Development Phase: During this phase, participants were provided with the training images and their corresponding labels, along with the validation images. They were tasked with developing their solutions and uploading prediction results for the validation set, which were then compared against the ground truth.
• Testing Phase: For the testing phase, participants were required to upload their model definitions and weights. The models then processed the unseen test images directly on our server, and the results were compared against the ground-truth labels. We intentionally did not release the test dataset due to the strict constraint of 0.5 GFLOPs and 5M parameters. Releasing the test images could have led to participants using larger models to generate pseudo-labels for these images, subsequently training smaller models that overfit, thereby compromising the fairness of the competition.
The awards were determined according to the testing-phase scores. We use the average of SROCC and PLCC as the overall score:

Score = (SROCC + PLCC) / 2    (1)
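For concreteness, Eq. (1) can be computed with SciPy's correlation functions, as in this minimal sketch; the array names are illustrative and not part of the challenge code.

    import numpy as np
    from scipy.stats import pearsonr, spearmanr

    def overall_score(pred: np.ndarray, mos: np.ndarray) -> float:
        """Average of SROCC and PLCC between predictions and ground-truth MOS."""
        srocc = spearmanr(pred, mos).correlation
        plcc = pearsonr(pred, mos)[0]
        return (srocc + plcc) / 2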
2.2. Baseline
To facilitate the development of solutions, a MobileNetV2-based [35] baseline was provided, which accepts 224 × 224 pixel image patches as input. During inference, multiple random patches were cropped from the original image and processed by the network; the resulting output scores were then averaged to yield a final prediction. The baseline was trained using the Adam [16] optimizer with a learning rate of 5 × 10⁻⁴ and a weight decay of 10⁻⁵. The loss function used was mean squared error. Training was conducted for 20 epochs with a batch size of 64.
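A minimal sketch of such a baseline is given below, assuming a single-output regression head on the torchvision MobileNetV2; the head layout and initialization are assumptions, as only the backbone, optimizer, and hyperparameters are stated.

    import torch
    import torch.nn as nn
    import torchvision

    # MobileNetV2 backbone with a one-unit regression head (head layout assumed).
    model = torchvision.models.mobilenet_v2(weights=None)  # pretraining unspecified
    model.classifier = nn.Sequential(
        nn.Dropout(p=0.2),
        nn.Linear(model.last_channel, 1),  # predicts a single MOS value
    )

    optimizer = torch.optim.Adam(model.parameters(), lr=5e-4, weight_decay=1e-5)
    criterion = nn.MSELoss()

    def train_step(patches: torch.Tensor, mos: torch.Tensor) -> float:
        """One optimization step on a batch of 224x224 patches and MOS labels."""
        optimizer.zero_grad()
        loss = criterion(model(patches).squeeze(1), mos)
        loss.backward()
        optimizer.step()
        return loss.item()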
While satisfactory performance was achieved with an ensemble of 20 random crops, adherence to the GFLOPs constraint necessitates the use of a single crop, which exhibits suboptimal performance (see Tab. 1). This constraint poses a challenge for participants, requiring the development of optimal input-handling strategies, including appropriate resizing and cropping techniques, to maximize performance under computational limitations.
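Since the 0.5 GFLOPs / 5M parameter budget applies per forward pass, participants could verify compliance with a profiler. The sketch below uses fvcore as one possible tool, reusing the model from the baseline sketch above; the organizers' exact measurement setup is not specified.

    import torch
    from fvcore.nn import FlopCountAnalysis

    x = torch.randn(1, 3, 224, 224)  # a single input crop
    flops = FlopCountAnalysis(model, x).total()  # fvcore counts multiply-adds
    params = sum(p.numel() for p in model.parameters())
    print(f"{flops / 1e9:.4f} GFLOPs, {params / 1e6:.4f} M params")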
2.3. Challenge Results
Table 1 summarizes the challenge results. A total of 13 teams submitted their solutions and accompanying fact sheets. The top-performing method attained a score of 0.9664, an improvement of more than 0.13 over the baseline, notably with comparable computational complexity (GFLOPs) and a reduced number of parameters.

The subsequent section provides a detailed description of each submitted solution. A list of team members and affiliations is included in Appendix A.

Table 1. Challenge results.

Rank  Team                  Score   SROCC   PLCC    GFLOPs  Params [M]
1     ECNU-SJTU VQA Team    0.9664  0.9692  0.9637  0.3313  1.1796
2     MediaForensics        0.9624  0.9624  0.9624  0.4687  1.5189
3     Next                  0.9583  0.9630  0.9535  0.4533  1.2224
4     ATHENAFace            0.9566  0.9600  0.9533  0.4985  2.0916
5     NJUPT-IQA-Group       0.9547  0.9530  0.9564  0.4860  3.7171
6     ECNU VIS Lab          0.9406  0.9397  0.9415  0.4923  3.2805
7     JNU620                0.9334  0.9413  0.9255  0.4097  3.2511
8     ISeeCV                0.9279  0.9282  0.9275  0.4890  0.9513
9     RegNet                0.9242  0.9262  0.9222  0.4895  4.0252
10    Conquerit             0.9038  0.9118  0.8958  0.2235  4.7795
11    BIT ssvgg             0.8727  0.8897  0.8557  0.5120  4.7242
12    2077Agent             0.8432  0.8529  0.8335  0.2852  1.3005
13    DERS                  0.6999  0.7098  0.6900  0.8980  6.0523
-     Baseline              0.8309  0.8334  0.8283  0.3139  3.2511
Figure 1. ECNU-SJTU VQA Team. (Pipeline: labeled data trains the teacher model; the teacher pseudo-labels unlabeled data to retrain an enhanced teacher; the enhanced teacher pseudo-labels further unlabeled data to train the student model used for inference.)
3. Teams and Methods
3.1. Efficient Face Image Quality Assessment via Self-training and Knowledge Distillation (by ECNU-SJTU VQA Team)

The ECNU-SJTU VQA Team proposes a framework comprising two main stages, as illustrated in Fig. 1. First, they trained a teacher model using a self-training approach. Specifically, the Swin Transformer Base (Swin-B) [23] was adopted; its classification head was removed and replaced with a two-layer multilayer perceptron (MLP) with a 128-unit hidden layer and a single output neuron, serving as the regression head. They began by training the teacher model on the labeled face image quality assessment (FIQA) dataset provided by the challenge organizers. Next, they collected a large-scale unlabeled face image dataset (approximately 200k images) from the Internet. The trained teacher model was then used to generate pseudo-labels for these images. They combined the labeled and pseudo-labeled images to retrain the teacher model, thereby enhancing its performance through self-training. After this, the enhanced teacher was used to generate pseudo-labels for an additional set of collected face images (approximately 200k images) for the second-stage training.
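A compact sketch of the pseudo-labeling step in such a self-training round might look as follows; treating plain teacher predictions as pseudo-labels is our reading of the description, and the data-loading details are assumptions.

    import torch

    @torch.no_grad()
    def pseudo_label(teacher: torch.nn.Module, unlabeled_loader) -> list:
        """Run the trained teacher over unlabeled images and keep its
        predicted MOS as the pseudo-label for each image."""
        teacher.eval()
        pairs = []
        for images in unlabeled_loader:
            mos = teacher(images).squeeze(1)
            pairs.extend(zip(images, mos.tolist()))
        return pairs

    # Round 1: the teacher trained on labeled data pseudo-labels ~200k images;
    # the same Swin-B architecture is then retrained on labeled + pseudo-labeled data.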
In the second stage, they trained a student model using the labeled images together with the first-round and second-round pseudo-labeled images. The student model employed EdgeNeXt-XX-Small [27] as the backbone, with its classification head replaced by the same two-layer MLP regression head. Through learning from the ground-truth data, the teacher-labeled data, and the enhanced-teacher-labeled data, the student model achieved competitive performance, closely matching that of the enhanced teacher.
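To make the teacher/student pairing concrete, the backbones and the shared 128-unit regression head could be assembled as below; the timm model names and the ReLU between the MLP layers are assumptions.

    import timm
    import torch.nn as nn

    def regression_head(in_features: int) -> nn.Module:
        # Two-layer MLP head: 128 hidden units, one output neuron (the MOS).
        return nn.Sequential(nn.Linear(in_features, 128), nn.ReLU(), nn.Linear(128, 1))

    # Teacher: Swin-B backbone (classification head removed via num_classes=0).
    backbone_t = timm.create_model("swin_base_patch4_window7_224", num_classes=0)
    teacher = nn.Sequential(backbone_t, regression_head(backbone_t.num_features))

    # Student: EdgeNeXt-XX-Small backbone with the same head design.
    backbone_s = timm.create_model("edgenext_xx_small", num_classes=0)
    student = nn.Sequential(backbone_s, regression_head(backbone_s.num_features))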
Their approach is inspired by the Self-training with Noisy Student framework [41] but differs in two key aspects. First, unlike the original method, which uses a larger model for iterative self-training, they retained the same architecture (i.e., Swin-B); additionally, since their goal was to assess visual quality, they avoided introducing noise to the input images during self-training, as it might degrade their perceptual fidelity. Second, they further distilled the enhanced teacher into a lightweight student model to enable efficient image quality assessment. More details can be found in the challenge paper [37].
Training details. They implemented their framework using PyTorch 2.4 [30]. For the two-round teacher training, they used the AdamW [25] optimizer with a learning rate of 1 × 10⁻⁴, a weight decay of 1 × 10⁻⁶, and a learning rate decay factor of 0.1 every 10 epochs. The model was trained
[The remaining 14 pages are not included in this excerpt.]