Image-level Supervision and Self-training for Transformer-based Cross-modality Tumor Segmentation | Literature Digest: Transformer Architectures in Medical Image Analysis

Title

Image-level supervision and self-training for transformer-based cross-modality tumor segmentation

Introduction

Deep learning has demonstrated impressive performance and potential across a wide range of medical image analysis applications (Chen et al., 2022). In medical image segmentation in particular, it has reached accuracy comparable to expert manual annotation (Minaee et al., 2021). However, these breakthroughs are held back by the drop in model performance on data from unseen domains (Torralba and Efros, 2011). This issue is especially critical in medical imaging, where distribution shifts are pervasive. Annotating data in every domain is inefficient and often infeasible, particularly for image segmentation, where expert-level pixel-wise labels are costly and difficult to obtain (Prevedello et al., 2019). Building models that generalize well across domains without additional annotation is therefore a pressing challenge.

Cross-modality generalization, in particular, is a key step toward reducing data dependency and broadening the applicability of deep neural networks. Such models have wide-ranging uses, since it is common for one imaging modality to lack annotated training samples. For example, contrast-enhanced T1-weighted (T1ce) MR imaging is the most widely used protocol for detecting vestibular schwannoma (VS). Accurate diagnosis and delineation of VS are essential to prevent the irreversible hearing loss caused by unchecked tumor growth. However, to shorten scan times and mitigate the risks of gadolinium contrast agents, high-resolution T2-weighted (hrT2) imaging has recently gained popularity in clinical workflows (Dang et al., 2020). Existing annotated T1ce databases can therefore be leveraged to compensate for the lack of training data for VS segmentation on hrT2 images.

Abstract

Deep neural networks are commonly used for automated medical image segmentation, but models will frequently struggle to generalize well across different imaging modalities. This issue is particularly problematic due to the limited availability of annotated data, both in the target as well as the source modality, making it difficult to deploy these models on a larger scale. To overcome these challenges, we propose a new semi-supervised training strategy called MoDATTS. Our approach is designed for accurate cross-modality 3D tumor segmentation on unpaired bi-modal datasets. An image-to-image translation strategy between modalities is used to produce synthetic but annotated images and labels in the desired modality and improve generalization to the unannotated target modality. We also use powerful vision transformer architectures for both image translation (TransUNet) and segmentation (Medformer) tasks and introduce an iterative self-training procedure in the latter task to further close the domain gap between modalities, thus also training on unlabeled images in the target modality. MoDATTS additionally allows the possibility to exploit image-level labels with a semi-supervised objective that encourages the model to disentangle tumors from the background. This semi-supervised methodology helps in particular to maintain downstream segmentation performance when pixel-level label scarcity is also present in the source modality dataset, or when the source dataset contains healthy controls. The proposed model achieves superior performance compared to other methods from participating teams in the CrossMoDA 2022 vestibular schwannoma (VS) segmentation challenge, as evidenced by its reported top Dice score of 0.87 ± 0.04 for the VS segmentation. MoDATTS also yields consistent improvements in Dice scores over baselines on a cross-modality adult brain gliomas segmentation task composed of four different contrasts from the BraTS 2020 challenge dataset, where 95% of a target supervised model performance is reached when no target modality annotations are available. We report that 99% and 100% of this maximum performance can be attained if 20% and 50% of the target data is additionally annotated, which further demonstrates that MoDATTS can be leveraged to reduce the annotation burden.
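
To make the two-stage idea in the abstract concrete, here is a minimal sketch of how such a pipeline could be wired up: a source-to-target image translator produces synthetic target-modality images that inherit the source pixel labels, a segmenter is trained on them, and pseudo-labels on real target images are refreshed over a few self-training rounds. All names and hyperparameters here (the placeholder networks, `training_round`, the number of rounds) are hypothetical stand-ins, not the paper's actual code.

```python
import torch
import torch.nn.functional as F

# Hypothetical stand-ins for the paper's networks: any source->target
# generator (e.g. TransUNet-style) and any segmentation network
# (e.g. Medformer-style) would slot in here.
translator = torch.nn.Conv3d(1, 1, kernel_size=3, padding=1)  # placeholder generator
segmenter = torch.nn.Conv3d(1, 2, kernel_size=3, padding=1)   # placeholder 2-class segmenter
optimizer = torch.optim.Adam(segmenter.parameters(), lr=1e-4)

def training_round(source_images, source_labels, target_images, pseudo_labels=None):
    """One illustrative training pass over synthetic + pseudo-labeled data."""
    # Stage 1: translate annotated source images into the target modality.
    # Translation leaves the pixel-level labels unchanged, so the synthetic
    # target-modality images come with annotations "for free".
    with torch.no_grad():
        synthetic_target = translator(source_images)

    logits = segmenter(synthetic_target)
    loss = F.cross_entropy(logits, source_labels)

    # Stage 2 (self-training): also fit pseudo-labels on real target images.
    if pseudo_labels is not None:
        target_logits = segmenter(target_images)
        loss = loss + F.cross_entropy(target_logits, pseudo_labels)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy data: tiny 3D volumes standing in for MR scans.
source_x = torch.randn(2, 1, 16, 32, 32)
source_y = torch.randint(0, 2, (2, 16, 32, 32))
target_x = torch.randn(2, 1, 16, 32, 32)

# Iterative self-training: after each round, re-label the unannotated
# target images with the current model and reuse them as training targets.
pseudo = None
for round_idx in range(3):  # the number of rounds is arbitrary here
    training_round(source_x, source_y, target_x, pseudo)
    with torch.no_grad():
        pseudo = segmenter(target_x).argmax(dim=1)  # refresh pseudo-labels
```

In MoDATTS the translator itself is also trained on the unpaired bi-modal data rather than fixed, but the loop above captures the self-training mechanics the abstract describes.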

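The image-level supervision mentioned in the abstract can be illustrated with a small auxiliary objective: when only a per-volume "tumor present / absent" label is available, the segmenter can still be penalized for predicting tumor voxels in healthy volumes and rewarded for predicting at least some in diseased ones. This is just one plausible form of such an objective, not the paper's actual disentanglement mechanism; `image_level_loss` below is a hypothetical helper.

```python
import torch

def image_level_loss(tumor_probs: torch.Tensor, has_tumor: torch.Tensor) -> torch.Tensor:
    """Weak-supervision loss from image-level labels only.

    tumor_probs: (N, D, H, W) per-voxel tumor probabilities in [0, 1].
    has_tumor:   (N,) binary volume-level labels (1 = tumor present).
    """
    # Aggregate voxel probabilities into one score per volume; max-pooling
    # is the simplest choice (log-sum-exp is a smoother common alternative).
    volume_score = tumor_probs.flatten(1).max(dim=1).values

    # Binary cross-entropy between the aggregated score and the image-level
    # label: healthy volumes push all voxel probabilities down, while
    # diseased volumes require at least one confident tumor voxel.
    eps = 1e-6
    volume_score = volume_score.clamp(eps, 1 - eps)
    return -(has_tumor * volume_score.log()
             + (1 - has_tumor) * (1 - volume_score).log()).mean()

# Toy usage: two volumes, the first labeled diseased, the second healthy.
probs = torch.rand(2, 16, 32, 32, requires_grad=True)
labels = torch.tensor([1.0, 0.0])
loss = image_level_loss(probs, labels)
loss.backward()
```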

Method

Let us consider the case where we want to segment images X in some target modality T, and obtaining ground-truth segmentation labels Y_T is too costly due to resource, time, or other constraints.
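
A compact way to restate this setup in notation (the symbols S, D_S, D_T and f_θ below are my own labels for the abstract's unpaired bi-modal setting, not necessarily the paper's):

```latex
% Assumed notation for the problem setup.
\[
\begin{aligned}
\mathcal{D}_S &= \bigl\{(x_S^{(i)},\, y_S^{(i)})\bigr\}_{i=1}^{N_S}
  && \text{annotated source-modality data,}\\
\mathcal{D}_T &= \bigl\{x_T^{(j)}\bigr\}_{j=1}^{N_T}
  && \text{unannotated target-modality data, unpaired with } \mathcal{D}_S,\\
f_\theta &: X \mapsto \hat{Y}_T
  && \text{segmenter trained without ground-truth } Y_T.
\end{aligned}
\]
```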
