Introduction

The thymus is a key organ in T lymphocyte development and immune regulation, making its accurate segmentation and quantification crucial in medical imaging applications. T lymphocytes, which are vital for immune responses to both pathogens and tumors, are generated within the thymus, and their maturation is tightly associated with immune competence [1, 2]. As individuals age, the thymus undergoes involution, resulting in reduced T cell output and increased susceptibility to immune dysfunction, autoimmune diseases, and infections [3,4,5]. Therefore, the accurate assessment of thymus structure and function is of great clinical significance for the diagnosis and treatment of immune-related diseases.

Traditional imaging modalities such as CT and MRI provide anatomical information about the thymus; however, their interpretation is often subjective and operator-dependent. Manual segmentation and measurement are time-consuming and prone to variability because they rely heavily on the operator’s expertise. Additionally, the complexity of anatomical structures and unclear boundaries further complicate the task of achieving consistent and precise thymic evaluations. These limitations emphasize the need for efficient and reliable automated evaluation methods.

Previous studies have explored various deep learning approaches for organ segmentation, including nnU-Net-based models [6,7,8,9,10], which have proven capable of accurately segmenting highly variable organs, such as the pancreas or breast, in CT or MR images. Similar frameworks have been widely applied to segmentation tasks in pathology [11, 12]. Research has shown that automatic neural network-based segmentation methods can effectively quantify thymic volume and capture thymic changes such as involution and hyperplasia. Previous approaches for thymic segmentation typically limited downstream analysis to simple metrics such as organ volume or the diameter of the largest inscribed circle. To our knowledge, our study is the first to propose a fully automated measurement pipeline that systematically extracts seven key thymic features from CT images: attenuation, anteroposterior (AP) diameter, transverse (TR) diameter, left lobe length and thickness, and right lobe length and thickness. These multi-dimensional anatomical metrics go beyond existing volumetric estimates and offer a richer representation of thymic morphology. Such detailed quantitative profiling may provide the foundation for future studies that link structural features to immunological function or disease progression.

While multi-stage segmentation frameworks have been explored for other organs, few studies have systematically applied or benchmarked a two-stage nnU-Net-based design for the thymus, an organ characterized by its anatomical variability and small size. In this work, we propose a two-stage segmentation framework named “Thy-uNET,” which leverages coarse-to-fine segmentation to enhance localization and boundary delineation of the thymic region. To validate its efficacy, we implemented and directly compared four competitive baseline models (3D U-Net, UNetR, TransUNet3D, and standard nnU-Net) under identical training conditions. We found that our proposed model achieved consistently higher Dice and IoU scores, demonstrating its practical advantage in segmenting this underexplored and challenging anatomical structure. Furthermore, using the thymus segmentation results, we devised an automated measurement algorithm that not only calculates the thymic volume but also extracts a range of crucial thymic indices, including thymic region CT attenuation, anteroposterior (AP) dimension, transverse (TR) diameter, left (LT) length, LT thickness, right (RT) length, and RT thickness. Figure 1 illustrates the overall workflow of our method, which begins with chest CT images as input for automatic thymus segmentation using the Thy-uNET model. Subsequently, the automated measurement procedure is used to extract key thymic indices. An accuracy assessment step is then applied to verify the accuracy and reliability of the obtained measurements. By furnishing a more extensive and detailed set of thymic measurements, this methodology marks an advance over existing approaches and provides a more thorough assessment of the thymus.
The performance of the proposed method was evaluated against manual segmentation performed by radiology experts, demonstrating significant improvements in both efficiency and the range of metrics calculated, while maintaining high accuracy and consistency. This automated approach provides a reliable and advanced tool for the structural evaluation of the thymus and the assessment of immune function.

Fig. 1

Workflow of the proposed approach. (a) Using manual segmentation and feature measurements of the thymic region performed by experienced radiologists, we trained the Thy-uNET model to develop an end-to-end automated deep learning approach. (b) The segmentation and measurement precision of Thy-uNET were validated using testing datasets from three different medical centers. (c) A reader study was conducted to compare the efficacy of Thy-uNET with that of physicians of varying levels of experience, and to evaluate the improvement in performance of radiology residents and junior radiologists when assisted by Thy-uNET

Materials and methods

Datasets

This study adhered to the Declaration of Helsinki and was approved by the ethics committees of Union Hospital, Tongji Medical College, Huazhong University of Science and Technology (WHUH, S0711) and the First Affiliated Hospital of Guangzhou Medical University (GYFYY, ES-2024-K173-01). This study included four cohorts: a training cohort (n = 500), an internal testing cohort (n = 100), external testing cohort 1 (n = 100), and external testing cohort 2 (n = 86). The training and internal testing cohorts were sourced from WHUH and primarily comprised patients undergoing routine health check-ups from January 2024 to April 2024. External testing cohort 1 was sourced from GYFYY and comprised lung cancer patients from January 2023 to May 2023. External testing cohort 2 was the NSCLC-Radiomics-Genomics cohort (https://www.cancerimagingarchive.net/collection/nsclc-radiomics-genomics/) from The Cancer Imaging Archive (TCIA), a publicly available database. Patients were included if they met the following three criteria: (1) the CT scan covered the entire potential thymic region; (2) there was no tumor invasion in the anterior mediastinum, prior chest surgery, or pneumonia in adjacent regions; and (3) the quality of the CT images was sufficient for manual and automatic segmentation of the thymic region. Thymic segmentation and the subsequent feature measurements were performed by two senior radiologists (Bo Liang and Lian Yang) with over 25 years of experience in chest disease diagnosis; these served both as the training data for Thy-uNET and as the ground truth for evaluating the testing cohorts.

Automatic segmentation algorithm for the thymic region

Because accurate and robust segmentation is needed for the subsequent feature measurements of the thymic region, we proposed a two-stage segmentation algorithm (i.e., the Thy-uNET model) based on the nnU-Net architecture to overcome the difficulties posed by the small size and variable appearance of the thymus in chest CT images. The framework of the Thy-uNET model is illustrated in Supplementary Fig. S1. In the first stage, we employed the nnU-Net architecture to perform a coarse segmentation of chest CT images. To address the class imbalance arising from the small size of the thymus, we adjusted the loss function by increasing the weight assigned to the thymus class. This modification enabled the model to better learn the thymic region and capture its approximate location. In the second stage, based on the segmentation results from the first stage, we cropped the original images to the region of interest (ROI) containing the thymus and performed a fine segmentation of the thymus within the cropped ROI. This two-stage approach is designed to let the model accurately identify and segment the thymus even when it is a relatively small and challenging target within the chest CT images. The Thy-uNET model was trained using a combination of Dice loss and cross-entropy loss, with the higher thymus-class weight described above, to balance segmentation accuracy against class imbalance. Optimization was performed using stochastic gradient descent (SGD) with a momentum of 0.99 and Nesterov accelerated gradient. The initial learning rate was set to 0.01 and dynamically adjusted using polynomial decay. To prevent overfitting, weight decay was set to 3 × 10⁻⁵.
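The coarse-to-fine idea and the training settings described above can be illustrated with a minimal Python sketch. It shows how a coarse mask might be turned into a padded ROI crop, how a polynomial learning-rate decay could be computed, and what a class-weighted Dice plus cross-entropy loss might look like for a binary thymus/background case. This is not the authors' implementation: the function names, the crop margin, the decay exponent of 0.9, the epoch count used in the example, and the foreground weight of 2.0 are all assumptions made for illustration. The paper itself states only the initial learning rate (0.01), the use of polynomial decay, and a higher weight for the thymus class.

```python
import math


def bounding_box(mask, margin=1):
    """Box (z0, z1, y0, y1, x0, x1), end-exclusive, around nonzero voxels.

    `mask` is a nested list indexed [z][y][x]; `margin` (an assumed value)
    pads the box on every side and is clipped to the volume.
    """
    coords = [(z, y, x)
              for z, sl in enumerate(mask)
              for y, row in enumerate(sl)
              for x, v in enumerate(row) if v]
    if not coords:
        raise ValueError("coarse mask is empty")
    shape = (len(mask), len(mask[0]), len(mask[0][0]))
    box = []
    for axis in range(3):
        lo = min(c[axis] for c in coords) - margin
        hi = max(c[axis] for c in coords) + 1 + margin
        box += [max(lo, 0), min(hi, shape[axis])]
    return tuple(box)


def crop(volume, box):
    """Crop a [z][y][x] nested-list volume to the given box."""
    z0, z1, y0, y1, x0, x1 = box
    return [[row[x0:x1] for row in sl[y0:y1]] for sl in volume[z0:z1]]


def poly_lr(epoch, max_epochs, initial_lr=0.01, exponent=0.9):
    """Polynomial decay; the exponent 0.9 is an assumption, not from the paper."""
    return initial_lr * (1 - epoch / max_epochs) ** exponent


def dice_ce_loss(probs, target, fg_weight=2.0, eps=1e-6):
    """Soft Dice loss plus foreground-weighted binary cross-entropy.

    `probs` are predicted foreground probabilities and `target` the 0/1
    labels, both as flat lists. `fg_weight` (assumed) up-weights the thymus.
    """
    inter = sum(p * t for p, t in zip(probs, target))
    dice = (2 * inter + eps) / (sum(probs) + sum(target) + eps)
    ce_terms = [
        -(fg_weight * t * math.log(max(p, eps))
          + (1 - t) * math.log(max(1 - p, eps)))
        for p, t in zip(probs, target)
    ]
    return (1 - dice) + sum(ce_terms) / len(ce_terms)
```

In such a setup, the second-stage network would see only `crop(ct_volume, bounding_box(coarse_mask))`, so the thymus occupies a much larger fraction of the input than in the full chest volume.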

To validate the benefit of our two-stage strategy, we trained and evaluated four competitive single-stage segmentation networks (3D U-Net, UNetR, TransUNet3D, and standard nnU-Net) using identical data splits and training settings. Their performance is reported alongside Thy-uNET in Supplementary Table S1 and visualized in Supplementary Fig. S2. These results demonstrate that the proposed two-stage framework yields superior segmentation accuracy compared with both classic single-stage and recent hybrid architectures. In addition, the overlap between the ground truth and the segmentation results of the Thy-uNET model is shown in Supplementary Fig. S3.

Automatic measurement algorithm for thymic features

It is reported that the thymus or thymic region has seven main features: thymic region CT attenuation, AP dimension, TR diameter, LT length, LT thickness, RT length, and RT thickness. To measure them, the slices are traversed along the Z-axis to identify the slice with the largest thymic area. On this slice, the Euclidean distance transform is applied to find the largest inscribed circle that is entirely contained within the thymic region, and its radius and average HU value are calculated to evaluate density uniformity. Next, based on the location of the highest point of the thymus, the thymus is classified as conical or as another type. Then, the TR diameter (width along the X-axis) and the AP diameter (height along the Y-axis) of the thymus, as well as the length and thickness of the left and right lobes, are calculated. These indicators are obtained through geometric calculations and distance measurements, leveraging the morphological features of the thymus and the mask data. The Supplementary method provides a detailed explanation of the calculation methods and formulas for these seven features.
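The first steps of this measurement procedure can be sketched in plain Python as follows. The sketch covers slice selection, the largest inscribed circle with its mean HU, and the TR/AP extents; the lobe length and thickness calculations and the conical/other classification are defined in the Supplementary method and are not reproduced here. All function names are hypothetical. For brevity, the Euclidean distance transform is replaced by a brute-force nearest-background search, which gives the same distances on small masks but is far slower. Treating pixels outside the image as background is also an assumption of this sketch.

```python
import math


def largest_area_slice(mask):
    """Index along Z of the slice with the most foreground pixels."""
    areas = [sum(1 for row in sl for v in row if v) for sl in mask]
    return max(range(len(mask)), key=areas.__getitem__)


def largest_inscribed_circle(sl):
    """Return ((y, x), radius) of the largest circle inside the mask.

    Brute-force stand-in for a Euclidean distance transform: for each
    foreground pixel, take the distance to the nearest background pixel,
    counting a one-pixel ring outside the image as background.
    """
    h, w = len(sl), len(sl[0])
    background = [(y, x)
                  for y in range(-1, h + 1)
                  for x in range(-1, w + 1)
                  if not (0 <= y < h and 0 <= x < w and sl[y][x])]
    best_radius, best_center = 0.0, None
    for y in range(h):
        for x in range(w):
            if sl[y][x]:
                d = min(math.hypot(y - by, x - bx) for by, bx in background)
                if d > best_radius:
                    best_radius, best_center = d, (y, x)
    return best_center, best_radius


def mean_hu_in_circle(hu_slice, center, radius):
    """Mean HU over pixels strictly inside the circle."""
    cy, cx = center
    values = [hu_slice[y][x]
              for y in range(len(hu_slice))
              for x in range(len(hu_slice[0]))
              if math.hypot(y - cy, x - cx) < radius]
    return sum(values) / len(values)


def extents_mm(sl, spacing_yx=(1.0, 1.0)):
    """(TR, AP) extents of a non-empty mask: width along X, height along Y."""
    ys = [y for y, row in enumerate(sl) for v in row if v]
    xs = [x for row in sl for x, v in enumerate(row) if v]
    ap = (max(ys) - min(ys) + 1) * spacing_yx[0]
    tr = (max(xs) - min(xs) + 1) * spacing_yx[1]
    return tr, ap
```

A typical call sequence would be `k = largest_area_slice(mask)`, then `largest_inscribed_circle(mask[k])` and `extents_mm(mask[k], spacing_yx)`, with the pixel spacing taken from the CT header.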

Reader study

Furthermore, we designed a reader study to make the evaluation more comprehensive. Initially, 40 patients were randomly selected from each of the three testing cohorts. We invited two radiology residents (Bingxin Gong [R1] and Yi Li [R2]), two junior radiologists (Chanyuan Liu [J1] and Dongyong Zhu [J2]), and two senior radiologists (Qianqian Fan [S1] and Qing Sun [S2]) to participate in the study. They independently measured the thymic regions of the 120 patients across the three cohorts and recorded the time required for each measurement. To demonstrate more directly the practical value of Thy-uNET as a clinical aid, one month after the initial measurements we provided the thymic measurements generated by Thy-uNET to the two radiology residents and the two junior radiologists. With the aid of Thy-uNET, these four doctors re-evaluated the cases, and the results were compared with their initial assessments. This approach aims to further validate the potential of Thy-uNET in improving doctors’ measurement efficiency and accuracy.

Statistical analysis

The continuous variable (age) was described using mean and standard deviation (SD), while categorical variables, including gender, CT section thickness, protocols, and CT scanner types, were described using counts and percentages. To evaluate the performance of Thy-uNET in measuring thymic features, we used Spearman’s R, intraclass correlation coefficient (ICC), mean absolute error (MAE), absolute error, coefficient of determination (R²), and Bland-Altman plots. Dice and Intersection over Union (IOU) were employed to assess the segmentation quality of Thy-uNET. Absolute errors were compared using the Kruskal-Wallis test, with Dunn’s test for pairwise comparisons between Thy-uNET and human assessments. A two-tailed P-value < 0.05 was considered statistically significant. Statistical analyses were conducted using R (Version 4.3).
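For readers less familiar with these metrics, the following minimal Python sketch shows how Dice, IOU, MAE, and the Bland-Altman bias with 95% limits of agreement are conventionally computed. The study itself used R; the function names here are hypothetical, and the 1.96 × SD limits follow the usual Bland-Altman convention rather than anything stated in the paper. Spearman’s R, ICC, and the Kruskal-Wallis and Dunn’s tests are not shown.

```python
from statistics import mean, stdev


def dice_iou(pred, truth):
    """Dice and IOU for two flat binary masks of equal length."""
    inter = sum(1 for p, t in zip(pred, truth) if p and t)
    n_pred = sum(1 for p in pred if p)
    n_truth = sum(1 for t in truth if t)
    union = n_pred + n_truth - inter
    if union == 0:
        # Both masks empty: treat as perfect agreement (a convention).
        return 1.0, 1.0
    return 2 * inter / (n_pred + n_truth), inter / union


def mae(auto, manual):
    """Mean absolute error between automatic and manual measurements."""
    return mean(abs(a - m) for a, m in zip(auto, manual))


def bland_altman(auto, manual):
    """Bias and 95% limits of agreement (bias +/- 1.96 * SD of differences)."""
    diffs = [a - m for a, m in zip(auto, manual)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd
```

For a single pair of masks the two overlap scores are linked by Dice = 2 × IOU / (1 + IOU), so a per-case IOU of 0.5 corresponds to a Dice of about 0.67. Cohort averages need not satisfy this identity exactly.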

Results

Population characteristics

A total of 786 patients were included in this study. The average ages in the training cohort, internal testing cohort, external testing cohort 1, and external testing cohort 2 were 51.1 (SD 9.1), 51.1 (SD 13.4), 56.8 (SD 8.4), and NA (age data missing in the NSCLC-Radiomics-Genomics cohort), respectively. The number of females in the four cohorts was 274 (54.8%), 44 (44%), 23 (23%), and 28 (32.6%), respectively. Table 1 summarizes the characteristics of the patients.

Table 1 Characteristics of patients in four cohorts

Segmentation and measurement assessment of Thy-uNET

The segmentation performance of Thy-uNET was validated in three testing cohorts. In the internal testing cohort, the model achieved a Dice of 0.83 and an IOU of 0.71. Thy-uNET also performed well in external testing cohorts 1 and 2, with a Dice of 0.83 and an IOU of 0.70 in both cohorts (Table 2). Figure 2 displays the manual segmentation image and the image segmented by Thy-uNET for one patient. Additionally, we analyzed the model’s segmentation performance across different subgroups. Thy-uNET’s segmentation performance remained consistent across subgroups based on age (Age > 60 or Age ≤ 60), sex, CT section thickness (thin-section or thick-section), and CT protocol (chest or chest + abdomen), with no significant differences observed. The segmentation performance was slightly higher in the internal testing cohort than in the external testing cohorts, and slightly better on images acquired with Philips scanners than on images from GE scanners (Supplementary Fig. S4). These results further support the broad applicability and robustness of Thy-uNET.

Table 2 Segmentation performance of Thy-uNET in testing cohorts
Fig. 2

Example images of the thymic regions segmented manually (a-c, the green region) and by the model (d-f, the red region) in axial, coronal, and sagittal views

Next, we evaluated the automatic measurement performance of the seven thymic features obtained by Thy-uNET. In the testing cohorts, for the measurement of thymic region CT attenuation, Thy-uNET achieved a Spearman R of 0.829 and an ICC of 0.841 (Fig. 3a). For the assessment of AP dimension, the Spearman R was 0.835 and the ICC was 0.839 (Fig. 3b). The Spearman R for measuring the TR diameter was 0.793, with an ICC of 0.819 (Fig. 3c). The Spearman R for measuring the LT length was 0.782, with an ICC of 0.780 (Supplementary Fig. S5a). The Spearman R for measuring the LT thickness was 0.615, with an ICC of 0.627 (Supplementary Fig. S5b). The Spearman R for measuring the RT length was 0.595, with an ICC of 0.597 (Supplementary Fig. S5c). The Spearman R for measuring the RT thickness was 0.491, with an ICC of 0.525 (Supplementary Fig. S5d). We also evaluated the seven thymic features in the three cohorts separately and obtained similar results (Supplementary Figs. S6-S8). Supplementary Fig. S9 displays the manual measurement and the automatic measurement by Thy-uNET. Table 3 summarizes the performance of Thy-uNET in measuring the seven thymic features.

Fig. 3

The scatterplots (left) and Bland-Altman plots (right) showing the correlation and agreement between expert readers and Thy-uNET in thymic region CT attenuation (a), AP dimension (b) and TR diameter (c)

Table 3 Measurement performance of Thy-uNET in testing cohorts

Reader study

A reader study cohort comprising 120 patients was assembled by randomly selecting 40 patients from each of the three testing cohorts. Two radiology residents, two junior radiologists, and two senior radiologists were invited to measure thymic features and record the time required. Supplementary Tables S2 and S3 provide the correlation coefficients and ICCs between the gold standard and the measurements made by each level of radiologist and by Thy-uNET. We found that the measurement performance of Thy-uNET, in terms of both Spearman R and ICC, was comparable to that of the residents (R1, R2) and junior radiologists (J1, J2). To further evaluate the measurement performance of Thy-uNET, we calculated the absolute errors between its measurement results and those of expert readers. There were no significant differences between Thy-uNET and any of the readers in measuring thymic region CT attenuation and TR diameter (Fig. 4a and c). For the measurements of AP dimension, LT length, and LT thickness, although the absolute errors of Thy-uNET were slightly higher than those of the senior radiologists, they were comparable to those of the residents and junior radiologists (Fig. 4b, Supplementary Fig. S10a and S10c). For RT length and RT thickness, the absolute errors of Thy-uNET were slightly higher than those of the senior radiologists, J2, and R2 (Supplementary Fig. S10b and S10d). Notably, the time required for Thy-uNET to obtain thymic features was significantly shorter than that of all participating radiologists (Fig. 4d).

Fig. 4

The absolute error of Thy-uNET and of each reader, relative to the expert readers, in thymic region CT attenuation (a), AP dimension (b) and TR diameter (c). (d) shows the difference in time cost between readers and Thy-uNET

We observed that there was room for improvement in the measurement of thymic features performed by the two radiology residents and the two junior radiologists. After one month, we provided each patient’s Thy-uNET measurements to these four readers, who then repeated their measurements. With Thy-uNET as a reference, the measurement time for all four doctors was reduced (Fig. 5a), and the residents’ performance improved in measuring thymic region CT attenuation, AP dimension, and TR diameter (Fig. 5b-d). However, it may not be helpful for LT length, LT thickness, RT length, and RT thickness (Supplementary Fig. S11).

Fig. 5

The time cost (a) and the absolute error of readers, relative to the expert readers, in thymic region CT attenuation (b), AP dimension (c) and TR diameter (d) before and after using AI

Discussion

We have developed and evaluated a novel deep learning algorithm, Thy-uNET, specifically designed for end-to-end automated segmentation and measurement of the thymus or thymic region. Using manual segmentation and measurement data from two senior chest radiologists with over 25 years of experience as the gold standard, Thy-uNET identifies and segments the thymus or thymic region from CT images, and then automatically calculates the thymic region CT attenuation and multiple dimensional parameters of the thymus or thymic region. The model has been validated in multi-center cohorts and a public database. The results demonstrate that Thy-uNET performs well in thymic region segmentation, achieving a Dice of 0.83. Additionally, when measuring seven key features, the algorithm not only significantly reduces the time required but also produces measurements comparable to those of junior radiologists, which could substantially lower costs. Although the MAE for CT attenuation measurement reaches 10.95, we believe it does not impact clinical practice, and this value is also significantly lower than the standard deviation reported in another study for manual CT attenuation calculations [13]. Importantly, when used as an auxiliary tool by clinicians, Thy-uNET shortens readers’ working time and improves performance on certain measurement features. Compared to the similar work by Okamura et al. [14], which achieved a Dice of 0.76 and did not provide thymic features beyond segmentation, Thy-uNET not only offers higher segmentation performance but also covers detailed feature measurements of the thymus or thymic region, and it performed well in both multi-center data validation and the reader study. This study establishes a foundation for multicenter research on body composition [15, 16]. Like other body composition components, the thymus may also play a role in various diseases [17, 18].

The thymus is a central lymphoid organ in the body, responsible for inducing the maturation of T cells [19, 20]. The majority of vertebrates undergo thymic involution [21], a process in which thymic epithelial tissue may be replaced by adipose tissue, ultimately leading to a decrease in thymic output [22, 23]. Consequently, an individual’s thymus may show no involution, partial involution, or complete involution [24]. Currently, thymic function is generally assessed by measuring the abundance of T-cell receptor excision circles (TRECs) in the blood, which serve as a biomarker for naïve T-cell populations. This biomarker has become the gold standard for evaluating thymic function over the past two decades [25, 26]. Its advantages include being minimally invasive and allowing for sampling at any time; however, it also has drawbacks such as economic costs, the inability to spatially visualize the structure and composition of the thymus, and the potential influence of peripheral TRECs, which may hinder accurate assessment of recent thymic function [27]. A study with a large cohort found that radiological thymic structure is correlated with the abundance of naïve CD8 T cells [28]. Therefore, by applying Thy-uNET to routine physical examinations or to chest imaging follow-ups for cancer patients, it may be possible to assess patients’ overall thymic function radiologically, without any additional costs or invasive procedures.

Therefore, segmenting the thymic region can encompass all possibilities of dynamic processes, including non-involution, involution, and partial involution of the thymus, rather than focusing on a specific stage. Currently, there are automatic segmentation algorithms specifically developed for thymic epithelial tumors and automatic classification of thymic tumor subtypes. However, these are often limited to subtype differentiation within specific tumors or binary classification for specific tumor diagnoses [29, 30]. They typically only segment the soft tissue regions of tumors, neglecting the residual reticular tissue or small soft tissue portions after incomplete thymic involution. Hence, the development of an algorithm that can segment and assess the entire thymic region is necessary. It is foreseeable that the Thy-uNET algorithm holds the following practical clinical significance: (1) It can assess the changes in human thymic characteristics with age in multi-center big data, which will contribute to the development of the field of autoimmunity and provide updated and reliable data references. (2) Thy-uNET can provide efficient and accurate information, focusing on individuals with premature thymic involution or persistent thymic tissue, and their association with autoimmune diseases or tumor incidence. (3) Thy-uNET can offer a comprehensive view of the anterior mediastinum, playing a crucial role in multi-class automatic diagnostic tasks.

Our study has several limitations. First, we did not perform automatic classification for thymic involution or thymic tumors, which may be more relevant to clinical practice. Second, there were significant differences between our automatic measurements and expert measurements for some features (such as the RT length and thickness). This discrepancy may arise because experts, following previously described methods for selecting the measurement plane, chose the section with the longest AP dimension [31], whereas our algorithm identifies the slice with the largest thymic area across all slices and performs the measurements there. Although we believe our logic for selecting the measurement slice may be more reasonable, it also resulted in lower consistency for some features. Third, we excluded patients with anterior mediastinal invasion by tumors, prior chest surgery, or pneumonia in adjacent regions, which may lead to poor performance of our algorithm in these specific cases. Therefore, future work will aim to include larger and more diverse populations, including pediatric cohorts and patients treated with chest surgery.

Overall, Thy-uNET is, to our knowledge, the first algorithm capable of providing thymic information from CT images quickly and accurately. It offers crucial insights into human immunology and holds significant value from both scientific research and clinical practice perspectives due to its high reliability and comprehensive functionality. Important future work includes integrating the classification and diagnosis of thymic involution and thymic diseases into this algorithm. Additionally, incorporating patients with anterior mediastinal invasion by tumors, prior chest surgery, or pneumonia in adjacent regions is necessary to enhance the algorithm’s generalizability.