Introduction

HAPE is a critical condition caused by hypoxia at high altitudes and is characterized by symptoms such as dry cough, dyspnea, pulmonary crackles, and exercise intolerance. The condition can progress rapidly, and studies report that the mortality rate among untreated patients may reach 50% [1, 2]. This is one of the most common causes of death related to altitude sickness. Common complications of HAPE include high-altitude cerebral edema, pulmonary embolism, pneumonia, and cerebral infarction [3,4,5,6,7]. As socioeconomic development and tourism thrive in high-altitude regions, an increasing number of individuals from low-altitude areas rapidly ascend to altitudes above 2,500 m via air and rail transport. A minority of these individuals may experience HAPE within hours or days after arriving at high altitude due to maladaptation [8, 9]. Timely diagnosis and quantification of pulmonary edema are priorities during treatment [10]. The vast and sparsely populated high-altitude regions, coupled with insufficient medical facilities and infrastructure, limit the widespread use of large imaging equipment such as CT scanners. Consequently, chest X-rays often serve as an accessible and crucial diagnostic tool for detecting pulmonary edema [11,12,13,14,15,16].

Deep learning based on big data is a rapidly advancing technology. Its integration with clinical practice has the potential to create a unified framework for clinical decision support, which could profoundly transform precision medicine [17, 18]. Technically, X-ray attenuation is proportional to the severity of pulmonary edema [19,20,21,22]. This study employs deep learning methodologies and X-ray technology to develop a model capable of early detection and grading of pulmonary edema through an in-depth analysis of cardiopulmonary imaging morphology and X-ray attenuation. The workflow is shown in Fig. 1.

Fig. 1

The study design and workflow of deep learning for lung field segmentation, HAPE identification, and edema grading

Acquisition of the chest radiographs, segmentation and classification

This retrospective study was approved by the Medical Ethics Committee of the General Hospital of Western Theater Command (Approval No: 2024EC7-ky037), which waived the requirement for written informed consent in accordance with national regulations for retrospective analyses of anonymized imaging data. All methods adhered to the Declaration of Helsinki and relevant institutional guidelines.

The hospital images were acquired via two Siemens YSIO MAX machines and one GE Definium 6000 X-ray machine (120 kV, automatic mAs, images extracted in PNG format). We randomly selected 1,000 images and corresponding labels from the ARXIV_V5_CHESTXRAY dataset [23], which includes a total of 15 categories: (1) Atelectasis; (2) Cardiomegaly; (3) Effusion; (4) Infiltration; (5) Mass; (6) Nodule; (7) Pneumonia; (8) Pneumothorax; (9) Consolidation; (10) Edema; (11) Emphysema; (12) Fibrosis; (13) Pleural Thickening; (14) Hernia; and (15) No finding. Additionally, we labeled 2,923 chest X-rays collected from January to December 2023, referencing the 15 categories in the ARXIV_V5_CHESTXRAY dataset. The resulting 3,923 images, 2,303 of which carried edema labels, were designated the pretraining dataset (pretrain_dataset). The training dataset comprised chest radiographs collected from the 950th Hospital of the Chinese People’s Liberation Army: 1,003 radiographs from patients with HAPE and 702 from normal controls, acquired between January 2007 and December 2023. For external validation, a distinct dataset was constructed from more recent cases, encompassing 679 HAPE and 436 normal chest X-rays obtained exclusively between January and December 2023 from the 950th Hospital and the General Hospital of Western Theater Command. Critically, no patient was represented in more than one dataset, ensuring the integrity of the validation process.

Inclusion criteria:

  1. Adults aged 18–94 years.

  2. Complete clinical, laboratory, and imaging data were available.

  3. HAPE cases met both the clinical and imaging criteria described in [24].

Clinical criteria included: acute onset of at least two of the following symptoms after rapid ascent to altitude >2500 m: dyspnea at rest, cough, cyanosis, or frothy sputum; accompanied by at least two of the following signs: tachypnea, tachycardia, central cyanosis, or pulmonary rales. Imaging criteria required the presence of radiographic findings consistent with pulmonary edema (e.g., patchy or diffuse alveolar infiltrates, Kerley B lines) that were not fully explained by cardiac failure or other pulmonary pathologies.

Exclusion criteria

Any subjects meeting the following conditions were excluded from this study.

  1. Poor image quality.

  2. Missing clinical or imaging data.

From the pretrain_dataset, we randomly selected 1,000 images to form the segmentation training dataset (seg_dataset). Three radiology experts (Yu, Jiang, and Du, each with more than ten years of experience in radiology) manually segmented the lung regions in the seg_dataset, achieving an intraclass correlation coefficient (ICC) of 0.93. The images in the pretrain_dataset labeled with edema were assigned a label of 1, whereas the others were assigned a label of 0, creating an edema pretraining dataset for the identification of pulmonary edema. Furthermore, the three radiologists graded the images in the training_dataset and val_dataset on the basis of clinical outcomes, imaging reports, and comprehensive assessments, categorizing the severity of pulmonary edema into four levels: class 0, no edema; class 1, thickened vascular markings with blurred margins; class 2, interstitial edema; and class 3, alveolar edema. Class 0 represents a normal chest X-ray, whereas class 3 represents severe pulmonary edema with diffuse blurred infiltrates, potentially accompanied by pleural effusion and Kerley B lines. In cases of initial disagreement among the radiologists regarding the grading of the same image, a voting process was employed to reach a consensus. The inter-rater reliability for the four-class severity grading, assessed using Fleiss’ Kappa before adjudication, was 0.75, indicating substantial agreement.
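
The pre-adjudication agreement statistic reported above can be computed with Fleiss’ Kappa. A minimal sketch, where the rating matrix is purely illustrative and not our actual annotation data:

```python
import numpy as np

def fleiss_kappa(counts: np.ndarray) -> float:
    """Fleiss' kappa for an (items x categories) matrix of rating counts.

    counts[i, j] = number of raters who assigned item i to category j.
    Every item must be rated by the same number of raters.
    """
    n_items, _ = counts.shape
    n_raters = counts[0].sum()
    # Proportion of all assignments falling in each category.
    p_j = counts.sum(axis=0) / (n_items * n_raters)
    # Per-item agreement: fraction of rater pairs that agree.
    p_i = ((counts ** 2).sum(axis=1) - n_raters) / (n_raters * (n_raters - 1))
    p_bar = p_i.mean()        # observed agreement
    p_e = (p_j ** 2).sum()    # chance agreement
    return (p_bar - p_e) / (1 - p_e)

# Illustrative example: 3 raters grading 4 images into 4 severity classes.
ratings = np.array([
    [3, 0, 0, 0],   # all raters agree: class 0
    [0, 2, 1, 0],   # two raters say class 1, one says class 2
    [0, 0, 3, 0],   # all agree: class 2
    [0, 0, 1, 2],   # split between classes 2 and 3
])
kappa = fleiss_kappa(ratings)
```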

Model development

Given the scarcity of annotated HAPE-specific datasets, we employed a transfer learning strategy. This involved pre-training models on a larger, more general dataset containing various pulmonary conditions to learn foundational radiographic features, followed by fine-tuning on our targeted HAPE dataset to adapt these features to the specific task of interest.

For image preprocessing, all the images were resized to 1024 × 1024 pixels, and the pixel values were normalized. Training and testing were conducted on an Nvidia RTX 3070. We report the accuracy and microaveraged area under the receiver operating characteristic curve in a “one vs. rest” format (AUROC), along with confidence intervals (CIs) as outcome measures.
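
The preprocessing step can be sketched as follows. This is a minimal illustration using a nearest-neighbour resize and min-max normalization in NumPy; the interpolation method and normalization constants of the actual pipeline are not specified here, so treat these choices as assumptions:

```python
import numpy as np

def preprocess(image: np.ndarray, size: int = 1024) -> np.ndarray:
    """Resize a grayscale radiograph to size x size (nearest neighbour)
    and normalize pixel values to [0, 1]."""
    h, w = image.shape
    rows = np.arange(size) * h // size   # source row index per output row
    cols = np.arange(size) * w // size   # source column index per output column
    resized = image[rows[:, None], cols].astype(np.float32)
    # Min-max normalization; fixed mean/std constants could be used instead.
    lo, hi = resized.min(), resized.max()
    return (resized - lo) / (hi - lo + 1e-8)

# Example: a synthetic 2000 x 1800 radiograph with 12-bit pixel depth.
rng = np.random.default_rng(0)
img = rng.integers(0, 4096, size=(2000, 1800), dtype=np.uint16)
x = preprocess(img)
```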

Deep learning model for chest radiograph segmentation

We developed a convolutional neural network, DeepLabV3_ResNet50, for the image segmentation task on the basis of the seg_dataset. The model was trained for 50 epochs with a learning rate of 0.001, a batch size of 4, a validation batch size of 1, and a stochastic gradient descent (SGD) optimizer with a momentum of 0.9 and a weight decay of 0.00001. An active learning strategy was adopted: samples with inaccurate model predictions were manually corrected and fed back into training to iteratively improve model performance.

Deep learning model for HAPE grading

Using masks generated by the automated segmentation model, we cropped the corresponding lung regions and constructed a binary classification deep learning model based on the VGG19 convolutional neural network to determine whether the images exhibited edema. The experiment utilized the pretrain_dataset with the following hyperparameter settings: a batch size of 32, 50 training epochs, an initial learning rate of 0.01, and the SGD optimizer. For normalization, we employed the ImageNet mean and standard deviation.
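
The mask-based cropping step reduces to a bounding-box operation on the binary lung mask. A minimal sketch, where the image and mask are synthetic stand-ins:

```python
import numpy as np

def crop_to_mask(image: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Crop an image to the tight bounding box of a binary lung mask
    and zero out pixels outside the mask."""
    rows = np.any(mask, axis=1)
    cols = np.any(mask, axis=0)
    r0, r1 = np.where(rows)[0][[0, -1]]
    c0, c1 = np.where(cols)[0][[0, -1]]
    return image[r0:r1 + 1, c0:c1 + 1] * mask[r0:r1 + 1, c0:c1 + 1]

# Synthetic example: a 100 x 100 image with a 40 x 30 "lung" region.
img = np.ones((100, 100), dtype=np.float32)
mask = np.zeros((100, 100), dtype=np.float32)
mask[20:60, 10:40] = 1
crop = crop_to_mask(img, mask)
```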

The MobileNet_V2 models were first pre-trained on the pretrain_dataset for multi-class edema grading. Subsequently, these pre-trained models were fine-tuned on the HAPE-specific train_dataset, and the validation set (val_dataset) was used to assess predictive performance. The optimal model was selected on the basis of the area under the ROC curve (AUC). MobileNet_V2 was used to grade edema on scales of 0–2 (classes 1 and 2 merged into a single class) and 0–3, with the same parameters as those used for edema recognition.

To address class imbalance, we applied weighted random sampling during training. Data augmentation techniques included random horizontal flipping, rotation (± 10°), and brightness/contrast adjustment. Training employed early stopping with a patience of 10 epochs based on validation loss. All datasets were split at the patient level to prevent data leakage. Hyperparameters (e.g., learning rate, batch size) were optimized on a held-out validation set separate from the final test set.
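
The weighted random sampling described above assigns each training image a weight inversely proportional to the frequency of its class. A minimal sketch of the weight computation, with illustrative labels:

```python
import numpy as np

def sample_weights(labels):
    """Per-sample weights inversely proportional to class frequency,
    so minority-class images are drawn more often during training."""
    labels = np.asarray(labels)
    classes, counts = np.unique(labels, return_counts=True)
    freq = dict(zip(classes.tolist(), counts.tolist()))
    return np.array([1.0 / freq[y] for y in labels.tolist()])

# Illustrative imbalanced label set: many class 0, few class 1 and 2.
labels = [0] * 8 + [1] * 2 + [2] * 1
w = sample_weights(labels)
```

Weights of this form can then be handed, for example, to PyTorch’s `WeightedRandomSampler`; each class contributes the same total weight, so minority classes are oversampled.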

Results

Patient and chest radiograph characteristics

Patient ages ranged from 20 to 94 years, as shown in Table 1. There were slightly fewer females than males.

Table 1 Demographic characteristics of patients: age and gender

Segmentation

The segmentation model deeplabv3_resnet50 was trained on the seg_dataset over 50 epochs. The results on its training set indicated a global accuracy of 99.21%, a mean intersection over union (mIoU) of 98.09%, and a mean Dice coefficient of 99.03%. Consequently, deeplabv3_resnet50 was adopted as the segmentation model. The training results are illustrated in Fig. 2.

Fig. 2

Demonstrates that the automated segmentation of lung regions by the deep learning model was effective (a row). Patient positioning and the presence of implants had minimal influence on segmentation outcomes (b row); however, the model’s proficiency in delineating challenging areas like the costophrenic angle, diaphragm, and fluid level was subpar (c row)
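
The reported segmentation metrics follow their standard definitions for binary masks; a minimal NumPy sketch on a synthetic example:

```python
import numpy as np

def seg_metrics(pred: np.ndarray, target: np.ndarray):
    """Global pixel accuracy, IoU, and Dice for binary masks."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    tp = np.logical_and(pred, target).sum()     # true-positive pixels
    union = np.logical_or(pred, target).sum()
    acc = (pred == target).mean()
    iou = tp / union if union else 1.0
    denom = pred.sum() + target.sum()
    dice = 2 * tp / denom if denom else 1.0
    return acc, iou, dice

# Synthetic 8 x 8 example with slight disagreement (one extra column).
gt = np.zeros((8, 8)); gt[2:6, 2:6] = 1     # 16 ground-truth pixels
pr = np.zeros((8, 8)); pr[2:6, 2:7] = 1     # 20 predicted pixels
acc, iou, dice = seg_metrics(pr, gt)
```

Note that Dice and IoU are deterministically related (Dice = 2·IoU / (1 + IoU)), which is why both scores track each other closely in Results.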

ROC curve analysis

The ROC curves of the HAPE detection model on the val_dataset are shown in Fig. 3. In the binary classification model for edema detection, VGG19 achieved an accuracy of 89.42%, with an AUC of 0.979 (95% CI: 0.975–0.982) on the train_dataset and 0.950 (95% CI: 0.939–0.960) on the val_dataset.

Fig. 3

Illustrates the ROC curve of the VGG-19 model for high-altitude pulmonary edema recognition with an AUC of 0.979 (95% CI 0.975–0.982) for training and an AUC of 0.950 (95% CI 0.939–0.960) for validation

The ROC curves for the edema grading model on the validation dataset are illustrated in Fig. 4. In the three-class classification task, MobileNet_V2 achieved an accuracy of 87.22% and a macroaverage ROC AUC of 0.92. The per-class AUCs were 0.96 (class 0), 0.84 (class 1), and 0.86 (class 2); per-class sensitivities were [0.96, 0.40, 0.80] and specificities [0.89, 0.40, 0.92]. The macroaverage precision and F1 score were both 0.72.

Fig. 4

Displays the ROC curves of the edema grading model. MobileNet_V2 yielded favorable results for the three-class classification task (a), achieving a macroaverage ROC curve AUC of 0.92. The ROC curve AUC values for each class were as follows: class 0: 0.96, class 1: 0.84, class 2: 0.86, and class 3: 0.96. In the four-class classification task (b), MobileNet_V2 achieved a macroaverage ROC curve AUC of 0.89, with individual AUC values for class 0 (0.95), class 1 (0.79), class 2 (0.86), and class 3 (0.96)

For the four-class classification task, MobileNet_V2 achieved an accuracy of 84.54% and a macroaverage ROC AUC of 0.89, with individual AUCs for class 0 (0.95), class 1 (0.79), class 2 (0.86), and class 3 (0.96). Per-class sensitivities were [0.91, 0.16, 0.37, 0.88]; specificities [0.88, 0.99, 0.97, 0.90]; precisions [0.92, 0.36, 0.45, 0.80]; and F1 scores [0.91, 0.22, 0.40, 0.84]. The macroaverage results comprised sensitivity (0.58), specificity (0.93), precision (0.63), and F1 score (0.59). These results indicate that the model’s capability is effectively binary: it performs well in distinguishing class 0 (normal) from class 3 (severe edema) but shows poor sensitivity for the intermediate classes 1 and 2.

Confusion matrix analysis

A confusion matrix was generated for the edema grading model on the validation dataset (Fig. 5). Each cell reports the proportion of cases in which the severity level predicted by the image model matched the true severity level from the consensus score. Predictions for classes 0 and 3 were more accurate than those for classes 1 and 2. Even when classes 1 and 2 were merged into a single category, the performance of the three-category model remained inferior to that of the two-category model.

Fig. 5

Displays the confusion matrix for the multiclass classification of pulmonary edema based on the validation dataset, encompassing HAPE radiographs
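
The merging of classes 1 and 2 described above can be reproduced by collapsing the corresponding rows and columns of the four-class confusion matrix. A sketch with illustrative labels, not our validation data:

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows = true class, columns = predicted class."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

def merge_classes(cm: np.ndarray, groups):
    """Collapse a confusion matrix by summing rows/columns per group,
    e.g. groups = [[0], [1, 2], [3]] merges classes 1 and 2."""
    k = len(groups)
    merged = np.zeros((k, k), dtype=int)
    for i, gi in enumerate(groups):
        for j, gj in enumerate(groups):
            merged[i, j] = cm[np.ix_(gi, gj)].sum()
    return merged

# Illustrative labels for a four-class grading task.
y_true = [0, 0, 1, 1, 2, 2, 3, 3]
y_pred = [0, 0, 2, 1, 1, 2, 3, 0]
cm4 = confusion_matrix(y_true, y_pred, 4)
cm3 = merge_classes(cm4, [[0], [1, 2], [3]])
```

Confusions between classes 1 and 2 land on the diagonal of the merged matrix, which is why merging can raise accuracy without improving the discrimination of intermediate grades.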

Grad-CAM (Gradient-weighted class activation mapping)

Grad-CAM is a visualization technique for interpreting the predictions of deep learning models. It generates heatmaps by computing the gradients of a target class score with respect to the feature maps of a convolutional layer, weighting each feature map by its spatially averaged gradient, and summing the result, thereby highlighting the image regions the model focuses on when making predictions.
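
Given the feature maps and gradients captured from the target convolutional layer, the core Grad-CAM computation reduces to a gradient-weighted sum followed by a ReLU. A NumPy sketch with synthetic arrays (in practice both arrays come from forward/backward hooks on the network):

```python
import numpy as np

def grad_cam(feature_maps: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """feature_maps, gradients: (channels, H, W) arrays from the target layer.

    Weights each channel by its global-average-pooled gradient, sums the
    weighted maps, applies ReLU, and rescales the heatmap to [0, 1]."""
    weights = gradients.mean(axis=(1, 2))              # alpha_k per channel
    cam = np.tensordot(weights, feature_maps, axes=1)  # sum_k alpha_k * A_k
    cam = np.maximum(cam, 0)                           # ReLU
    if cam.max() > 0:
        cam = cam / cam.max()
    return cam

# Synthetic layer activations: 4 channels of 7 x 7 feature maps.
rng = np.random.default_rng(0)
feats = rng.random((4, 7, 7))
grads = rng.random((4, 7, 7))
heatmap = grad_cam(feats, grads)
```

The low-resolution heatmap is then upsampled to the input size and overlaid on the radiograph, as in Fig. 6.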

Discussion

HAPE is a type of altitude-related noncardiogenic pulmonary edema caused primarily by acute exposure to high altitudes, which leads to excessive hypoxic pulmonary artery pressure, increased pulmonary vascular permeability, and impaired pulmonary fluid clearance [1, 3]. Clinically, it is characterized by symptoms such as dyspnea, cough, pink or white frothy sputum, and cyanosis, with auscultation revealing moist rales. In China, over 12 million people live and work at elevations above 2500 m, with more than 7.4 million residing at altitudes exceeding 3000 m. In 2023, Tibet received more than 55 million tourists. Therefore, in environments far from advanced medical facilities, the rapid use of portable X-ray imaging and offline-deployed artificial intelligence models to increase the early diagnostic efficiency and accuracy of pulmonary edema represents a significant exploration that can benefit the physical and mental health of people in sparsely populated high-altitude regions [25,26,27].

In 2018, Warren et al. [28] proposed the Radiographic Assessment of Lung Edema (RALE), which quantifies the severity of pulmonary lesions on chest X-rays and demonstrates a correlation between the RALE score, lung resection weight, severity of hypoxemia, and prognosis. Rajaraman et al. confirmed the ability of visualization and interpretation of convolutional neural network predictions to detect pneumonia in pediatric chest radiographs [29, 30]. Xiaole Fan developed a COVID-19 CT image recognition algorithm based on transformers and CNNs [31]. Guangyu Wang investigated a deep learning approach for diagnosing and differentiating viral, nonviral, and COVID-19 pneumonia via chest X-ray images [32]. Dominik Schulz’s deep learning model accurately predicts and quantifies pulmonary edema in chest X-rays [27].

HAPE differs fundamentally from cardiogenic pulmonary edema, ARDS, and COVID-19 pneumonia in its pathological causes and processes. Technically, however, X-ray attenuation is proportional to the severity of pulmonary edema. The aforementioned studies have demonstrated that deep learning models can be trained to identify and differentiate exudative lesions on X-rays, and theoretically this approach could also be applied to HAPE. To the best of our knowledge, this is currently the only study utilizing a model trained on various types of pulmonary edema images to identify HAPE. We employed transfer learning to address the issue of insufficient HAPE image data.

This experiment employed DeepLabV3 with the ResNet50 backbone for image segmentation on the seg_dataset. The model automatically generates masks from the input images, achieving a global accuracy of 99.21% on its training set, indicating strong performance. The model also demonstrated excellent intersection over union (IoU) and Dice coefficients across categories, measuring 98.09% and 99.03%, respectively. These results suggest that the selected model architecture and parameter settings effectively capture features within the images, resulting in high-precision segmentation.

We constructed a binary classification deep learning model using the VGG19 convolutional neural network to categorize images on the basis of the presence of edema. The model achieved an accuracy of 89.42%, with an area under the curve (AUC) of 0.979 (95% CI: 0.975–0.982) on the training_dataset and 0.950 (95% CI: 0.939–0.960) on the val_dataset. Overall, the model accurately predicted sample categories and effectively distinguished positive from negative samples, and the narrow confidence intervals for both datasets support its stability.

Timely and accurate quantification of pulmonary edema in chest X-ray images is crucial for the management of acute mountain sickness. A four-class deep learning model was developed on the basis of the train_dataset and MobileNet_V2. Through model training and evaluation, we observed an overall accuracy of 84.54%, indicating the model’s applicability in multiclass classification. The macroaverage ROC AUC was 0.89, with a sensitivity (recall) of 0.58, specificity of 0.93, precision of 0.63, and F1 score of 0.59. However, the discrepancies among these metrics reflect the model’s uneven ability to handle different categories.

Categories 0 and 3 showed robust performance, with sensitivities of 0.91 and 0.88, respectively, underscoring the model’s accuracy in identifying these categories. In contrast, categories 1 and 2 displayed lower sensitivities of 0.16 and 0.37, indicating areas for improvement. The limited number of images for grades 1 and 2 could be attributed to either case scarcity or the inherent complexity of manual classification in these grades. Merging classes 1 and 2 into a consolidated category did not significantly enhance overall performance compared with the two-category model: despite an accuracy of 87.22%, the three-category model’s per-class AUCs remained inferior to those of the two-category model, and the discernment of intermediate classifications did not substantially improve over the four-class model. While the model demonstrated favorable performance based on AUC and accuracy metrics, there is room for improvement in sensitivity and precision.

The most critical finding of our study is the model’s markedly reduced sensitivity for intermediate severity HAPE (classes 1 and 2). A model pre-trained on the broad and weakly-labeled concept of ‘edema’ from a general population struggles to master the specific radiographic nuances of early HAPE in a high-altitude cohort. This performance pattern, however, is not merely a failure but an important quantification of this domain shift challenge. It indicates that our model, in its current form, is not a reliable tool for grading early disease but rather serves as a proof-of-concept for distinguishing unequivocally normal from severe cases. This has significant implications for future AI research in rare diseases, emphasizing that fine-tuning alone may be insufficient to overcome large domain gaps.

Majkowska et al. utilized machine learning methods to automatically detect four abnormalities in X-ray images [33]. For the detection of airspace opacity, including pulmonary edema, the reported area under the receiver operating characteristic curve (AUROC) ranged from 0.91 to 0.94. They developed a model capable of detecting clinically relevant findings in chest X-rays at an expert level. Jarrel et al. employed deep learning techniques to diagnose congestive heart failure (CHF) via chest X-ray images. The authors used a BNP threshold of 100 ng/L as a biomarker for CHF, resulting in an AUROC of 0.82 [20]. Horng et al. not only diagnosed the presence of pulmonary edema but also quantified its severity via deep learning methods [34]. However, similar to the aforementioned studies, the efficiency of the deep learning models in recognizing and differentiating between Class 1 and Class 2 was relatively low. The highest recognition efficiency was achieved for Class 0 (no edema) and Class 3 (alveolar edema), which aligns with our findings. This may indicate a lack of sufficient sample data or training materials for these categories, thereby affecting the model’s learning performance.

The Grad-CAM analysis, as delineated in Fig. 6, reveals three critical insights into the model’s decision-making:

  1. Anatomic focus specificity. The model predominantly activates in the perihilar zones, anatomically corresponding to the pulmonary vasculature and cardiac silhouette. This spatial preference aligns with established radiographic biomarkers of pulmonary edema, notably the peribronchial cuffing and Kerley B lines that radiologists prioritize during diagnostic evaluation [35].

  2. Pathophysiological correlation. High-attention clusters colocalize with cardiomediastinal interface blurring and butterfly-pattern alveolar infiltrates [36]. These findings suggest the model’s capacity to capture the interstitial fluid redistribution patterns characteristic of hemodynamic pulmonary edema.

  3. Clinical interpretability validation. The heatmaps showed some concordance with radiologist diagnoses in our multicenter validation cohort, suggesting that the model’s “visual search” strategy emulates expert diagnostic reasoning. Such interpretability metrics are crucial for implementing AI-CAD systems in clinical workflows per the FDA’s SaMD guidelines [37].

Fig. 6

Interpretable visualization of pulmonary pathologies via Grad-CAM. Panels (a) and (c) display original chest X-ray images, while panels (b) and (d) show the corresponding Grad-CAM heatmaps. The color intensity in the heatmaps reflects the model’s attention level, with warmer colors (red and yellow) indicating higher attention to specific regions

Grad-CAM visualizations suggested that the model focused on clinically relevant regions, providing a preliminary level of interpretability and face validity for its predictions, which is a necessary step towards building trustworthy AI systems.

In this study, we developed a deep learning model that integrates image segmentation with the identification and grading of HAPE on the basis of subjective grading by radiologists. Our model exhibited excellent performance. We believe that our approach has several advantages. Typically, radiologists assess the severity of pulmonary edema through classification scoring, which requires experienced physicians. However, mountainous areas are often remote and distant from major cities, making access to large hospitals and experienced radiologists challenging.

Our work remains a preliminary proof-of-concept for binary edema detection and highlights the significant challenges in severity grading. It underscores that substantially more research and validation are required before any consideration of clinical deployment.

However, this study has several limitations. First, the domain shift between the weakly labeled pre-training dataset (general pulmonary edema) and the carefully adjudicated HAPE-specific fine-tuning dataset may have influenced feature learning, though fine-tuning was used to mitigate this effect. Second, the model showed reduced sensitivity in distinguishing intermediate severity grades (classes 1 and 2), which can be attributed to the inherent subtlety of radiographic findings in these categories and relatively lower sample sizes. Third, although lung segmentation performance was high, challenging anatomical variations—such as poor costophrenic angle visualization, subcutaneous emphysema, consolidation, or effusion—occasionally reduced segmentation accuracy. Future work will focus on advanced domain adaptation, expanding intermediate-class samples, and improving robustness to anatomical and pathological variations.

Furthermore, our segmentation model’s difficulty in handling pathologies like consolidation and effusion, traditionally considered limitations, can be reframed as a valuable insight. It reveals a systematic bias in models trained on healthy anatomies: they may inadvertently exclude the very pathological regions crucial for diagnosis. This suggests that for tasks like edema assessment, a pathology-aware segmentation model or an end-to-end network that jointly optimizes segmentation and classification might be necessary future directions, rather than the traditional segmented-then-classify pipeline we employed.

Conclusion

In conclusion, this feasibility study explored a transfer learning framework to address the acute data scarcity problem in HAPE diagnosis. Our results demonstrate that such an approach can facilitate the development of models with strong performance in binary tasks (e.g., normal vs. severe HAPE). However, the model’s inability to reliably grade intermediate severities provides a crucial cautionary tale about the limits of transferring knowledge from general to highly specific medical domains. The perceived limitations—domain shift, imperfect segmentation, and low intermediate-class sensitivity—are, in fact, the primary contributions of this work, as they map the uncharted territory and outline the specific challenges that must be overcome. Future efforts should focus on collecting larger, prospectively validated HAPE datasets, developing domain adaptation techniques explicitly designed for medical imaging, and creating integrated models that do not treat segmentation and diagnosis as separate problems. This study serves not as a presentation of a finished tool, but as a foundational investigation that defines the path forward for AI in high-altitude medicine.