Introduction

Stroke is the second leading cause of death globally and the third leading cause of combined death and disability worldwide [1,2,3]. In China, stroke has become the leading cause of death, with over 2 million new cases reported annually [4]. Acute Ischemic Stroke (AIS) accounts for approximately 87% of all stroke cases [5]. Studies have shown that targeted treatment within the 6-hour thrombolytic therapy window after AIS onset can reduce mortality by 20% and disability by 30% [6,7,8,9]. Therefore, rapid and accurate clinical and imaging diagnosis plays a crucial role in the evaluation and treatment of AIS [10], and early identification and timely treatment can substantially reduce morbidity and mortality [11].

Non-Contrast Computed Tomography (NCCT) is widely used as the initial imaging modality for patients with suspected acute stroke because of its low cost and rapid acquisition [12]. However, within 6 h of symptom onset, AIS typically causes no significant change in brain tissue density, making it difficult to visually identify hypodense infarcts on conventional NCCT images [13]. Previous studies have shown that radiologists identify no more than 10% of infarcts on NCCT [12]. In contrast, diffusion-weighted imaging (DWI) on Magnetic Resonance Imaging (MRI) offers higher sensitivity and accuracy for detecting AIS and can reveal ischemic lesions within minutes of onset [12, 14, 15]. However, the high cost of MRI equipment limits its availability in primary healthcare facilities. MRI scans are also time-consuming, and patients with large infarcts often have difficulty remaining still, hindering rapid diagnosis in emergency settings. In addition, the strong magnetic field restricts the use of essential life-support equipment during scanning. Together, these factors limit the use of MRI-DWI in the emergency management of AIS.

Therefore, for this high-mortality disease requiring timely treatment, rapid and accurate assessment of AIS infarcts on NCCT is of great significance for improving treatment outcomes. Radiomics, which converts medical images into quantitative features, can assist clinical decision-making [16]. It allows in-depth characterization of the phenotypes associated with different lesions and can generate new predictive indicators. Machine Learning (ML), a branch of Artificial Intelligence (AI), has been widely applied in neuroscience [17,18,19,20,21], and several studies have applied ML to the automated assessment of AIS infarcts. This study aims to develop and validate an NCCT-based radiomics imaging biomarker, combined with an ML model, to detect subtle early changes in AIS that are not visible on routine NCCT reading.

Materials and methods

Study design and patient cohort

The study protocol was approved by the Institutional Review Board of our institution. Because this retrospective study involved no patient intervention and all data were fully anonymized, the Medical Ethics Committee waived the requirement for informed consent. From June 2016 to July 2020, we retrieved the medical records of 1570 patients diagnosed with AIS. The inclusion criteria were as follows: (a) both NCCT and MRI-DWI scans were performed; (b) NCCT images showed no abnormal intensity change; (c) AIS lesions were visible on MRI-DWI; and (d) the AIS diagnosis was confirmed by a radiologist with 10 years of experience and reviewed by a senior radiologist with 20 years of experience. The exclusion criteria were: (a) severe artifacts on NCCT images; (b) cerebral hemorrhage on NCCT; or (c) image quality insufficient for analysis. A total of 153 AIS patients meeting these criteria were included as the internal dataset. From 2021 to the present, an additional 75 AIS patients were consecutively enrolled as an independent test dataset. A small number of patients had multiple lesions (no more than four per patient); each lesion was treated as a separate sample in the analysis. Patient characteristics, including sex, age, onset-to-NCCT time, onset-to-MR time, number of AIS volumes of interest (VOIs), number of non-AIS VOIs, arterial occlusion, and restricted diffusion, were collected.

CT and MRI image acquisition, preprocessing, delineation, and feature extraction

Detailed MRI and NCCT imaging protocols are summarized in Appendix Tables S1 and S2.

All patients underwent NCCT within 6 h of symptom onset. MRI, including DWI, was performed subsequently if the patient remained clinically stable; the median onset-to-DWI times were 42 h in the internal dataset and 32 h in the independent test dataset.

VOIs were delineated on DWI images because AIS lesions are not visible on NCCT. To correct spatial misalignment between the NCCT and MRI modalities, we performed multimodal image registration using the Elastix toolbox (17), bringing the NCCT and DWI datasets into spatial correspondence.

A junior radiologist (Mj. X; 5 years of experience in clinical radiology) manually contoured the VOIs, covering both lesional and normal tissue regions, using ITK-SNAP software (version 3.8.0) (18). All VOIs were independently reviewed and approved by a senior radiologist (20 years of experience in clinical radiology). Radiomics features were then extracted from the registered NCCT images within the MRI-derived VOIs.
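
The step of reading NCCT values through a DWI-drawn contour can be sketched as below. This is a minimal illustration, not the study's code: it assumes the NCCT volume and the label mask have already been registered onto the same grid and are given as nested lists indexed [slice][row][column], and the label values (1 = AIS, 2 = non-AIS) and function names are hypothetical.

```python
def voi_voxels(ct, mask, label):
    """Return NCCT values (HU) of all voxels whose registered mask equals `label`.

    `ct` and `mask` are nested lists indexed [slice][row][column] on the same grid.
    """
    return [
        hu
        for ct_slice, mask_slice in zip(ct, mask)
        for ct_row, mask_row in zip(ct_slice, mask_slice)
        for hu, m in zip(ct_row, mask_row)
        if m == label
    ]


def voi_volume_mm3(mask, label, spacing=(1.0, 1.0, 1.0)):
    """VOI volume = voxel count x voxel volume (1 mm^3 after isotropic resampling)."""
    count = sum(m == label for mask_slice in mask for row in mask_slice for m in row)
    return count * spacing[0] * spacing[1] * spacing[2]


# Hypothetical label convention: 1 = AIS VOI, 2 = non-AIS (normal tissue) VOI.
AIS, NON_AIS = 1, 2

ct = [[[30, 32], [40, 41]], [[29, 35], [38, 50]]]  # toy 2x2x2 NCCT volume (HU)
mask = [[[1, 1], [0, 2]], [[1, 0], [2, 2]]]        # toy DWI-derived labels on the NCCT grid

print(voi_voxels(ct, mask, AIS))      # [30, 32, 29]
print(voi_voxels(ct, mask, NON_AIS))  # [41, 38, 50]
print(voi_volume_mm3(mask, NON_AIS))  # 3.0
```

Because the mask was drawn on DWI, the registration accuracy directly determines which NCCT voxels end up in each VOI.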

Prior to feature extraction, all images were resampled to 1 × 1 × 1 mm³ using B-spline interpolation. Feature extraction was performed with the open-source package Pyradiomics version 3.0.1 (19). The feature types included shape features (describing the shape of the VOI), first-order statistics (describing the distribution of voxel grey-level intensities within the VOI), and high-order statistics (describing the grey-level texture of the VOI). First-order and high-order features were also computed on images processed by different filters, including Laplacian of Gaussian (LoG), Wavelet, Square, SquareRoot, Logarithm, Exponential, and Gradient; features derived from different filtered images may reflect different biological characteristics. Figure 1 shows the overall experimental design.
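
The grid arithmetic of isotropic resampling and a few first-order features can be sketched in plain Python. This sketch assumes the feature formulas as given in the Pyradiomics documentation (Energy as the sum of squared intensities with no voxel shift; Entropy and Uniformity computed on a histogram discretized with a fixed bin width, here the package default of 25, since the study does not report its bin width). It is not the Pyradiomics implementation, the helper names are hypothetical, and the B-spline interpolation itself is not shown.

```python
import math
import sys
from collections import Counter


def resampled_size(size, spacing, new_spacing=(1.0, 1.0, 1.0)):
    """Grid size that preserves the physical extent when voxel spacing (mm) changes."""
    return tuple(int(round(n * s / ns)) for n, s, ns in zip(size, spacing, new_spacing))


def discretize(values, bin_width=25):
    """Fixed-bin-width discretization: floor(x / W) - floor(min / W) + 1."""
    lowest = math.floor(min(values) / bin_width)
    return [math.floor(v / bin_width) - lowest + 1 for v in values]


def first_order(values, bin_width=25):
    """A few first-order features of the intensities (HU) inside one VOI."""
    n = len(values)
    counts = Counter(discretize(values, bin_width))
    p = [c / n for c in counts.values()]  # histogram probabilities of occupied bins
    eps = sys.float_info.epsilon          # epsilon term as in the documented entropy formula
    return {
        "Mean": sum(values) / n,
        "Energy": sum(v * v for v in values),
        "Entropy": -sum(pi * math.log2(pi + eps) for pi in p),
        "Uniformity": sum(pi * pi for pi in p),
    }


# A 512 x 512 x 30 NCCT with 0.45 x 0.45 x 5 mm voxels becomes a 1 mm isotropic grid:
print(resampled_size((512, 512, 30), (0.45, 0.45, 5.0)))  # (230, 230, 150)
print(first_order([0, 10, 30, 60]))
```

Entropy and Uniformity depend on the chosen bin width, which is one reason radiomics studies are encouraged to report their discretization settings.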

Fig. 1 The workflow of this study

Dataset split and radiomics feature screening

In this research, the 228 patients formed two datasets: an internal dataset (n = 153), which was split into training and validation sets at a ratio of 7:3, and an independent test set (n = 75). Radiomics feature screening was performed on the training set. First, a statistical test was applied to evaluate differences in each radiomics feature between the non-AIS and AIS groups. The Shapiro-Wilk test was used to assess the distribution of each feature in the two groups; the t test was used for normally distributed features, and the Mann-Whitney U test otherwise. P < 0.05 was considered significant. Second, Spearman correlation analysis was applied to the remaining features to reduce multicollinearity, and redundant features with a correlation coefficient greater than 0.9 were removed. Subsequently, the recursive partition tree (RPT) algorithm was used to select the final features. Before this step, the data were standardized by z score to fit the RPT model. Tenfold cross-validation repeated five times was used within the RPT model to identify the most stable features, with variable importance as the selection criterion. The importance of the final selected features is shown in Fig. 2.
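
The rank-based parts of this screening can be sketched without external libraries. The sketch covers only the non-parametric branch (a two-sided Mann-Whitney U test using the normal approximation, without tie or continuity correction, so p-values may differ slightly from exact tests on small samples) and a greedy Spearman filter that keeps the first feature of any pair with |ρ| > 0.9. The keep-first rule, the approximation, and all function names are assumptions for illustration; the source does not specify them, and the authors worked in R. In Python, `scipy.stats.shapiro`, `scipy.stats.ttest_ind`, `scipy.stats.mannwhitneyu` and `scipy.stats.spearmanr` provide these tests directly.

```python
import math


def average_ranks(values):
    """Ranks 1..n, with tied values receiving the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1  # extend over a run of ties
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney_p(x, y):
    """Two-sided Mann-Whitney U p-value (normal approximation, no tie/continuity correction)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    z = (u1 - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


def spearman(a, b):
    """Spearman's rho = Pearson correlation of the average ranks."""
    ra, rb = average_ranks(a), average_ranks(b)
    n = len(ra)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(ra, rb))
    var_a = sum((p - ma) ** 2 for p in ra)
    var_b = sum((q - mb) ** 2 for q in rb)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant feature has no defined rank correlation
    return cov / math.sqrt(var_a * var_b)


def drop_correlated(features, threshold=0.9):
    """Greedy filter over a {name: values} dict: keep a feature only if
    |rho| <= threshold against every feature already kept."""
    kept = []
    for name, values in features.items():
        if all(abs(spearman(values, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Toy example: f2 is a monotone copy of f1 and is dropped; f3 is kept.
features = {
    "f1": [1, 2, 3, 4, 5, 6],
    "f2": [2, 4, 6, 8, 10, 12],
    "f3": [6, 1, 5, 2, 4, 3],
}
print(drop_correlated(features))  # ['f1', 'f3']
print(mann_whitney_p([1, 2, 3], [4, 5, 6]))
```

With a greedy filter, the surviving set depends on feature order; ordering candidates by their univariate p-value first is one common way to keep the more discriminative member of each correlated pair.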

Fig. 2 Importance of the most valuable variables screened by the recursive partition tree (RPT) algorithm

Classification model establishment

Random forest (RF) is a classical ensemble algorithm frequently used in machine learning. It combines many decision trees, each grown on a bootstrap sample of the training data with a random subset of features considered at each split, and classifies a sample by majority vote across the trees. Averaging over many decorrelated trees makes RF less prone to overfitting and better at generalization than a single decision tree, and it performs well on classification tasks with moderate or large datasets.
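
To make the ensemble idea concrete, here is a deliberately small random forest in plain Python: bootstrap resampling, a random subset of `mtry` features at each split, Gini-based CART splits, and majority voting. It is a teaching sketch, not the software used in the study. The source reports neither the package nor its hyperparameters, so `n_trees`, `max_depth` and `mtry` (defaulting to the square root of the number of features, a common convention for classification) are assumptions.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y, feature_ids):
    """Return the (feature, threshold) with the lowest weighted Gini, or None."""
    best, best_score = None, gini(y)
    for f in feature_ids:
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint between consecutive distinct values
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(X, y, depth, max_depth, mtry, rng):
    """Grow a CART-style tree; every node looks at a random subset of mtry features."""
    majority = Counter(y).most_common(1)[0][0]
    if depth >= max_depth or len(set(y)) == 1:
        return majority
    split = best_split(X, y, rng.sample(range(len(X[0])), mtry))
    if split is None:
        return majority
    f, t = split
    go_left = [row[f] <= t for row in X]
    children = []
    for side in (True, False):
        Xs = [row for row, g in zip(X, go_left) if g == side]
        ys = [lab for lab, g in zip(y, go_left) if g == side]
        children.append(build_tree(Xs, ys, depth + 1, max_depth, mtry, rng))
    return (f, t, children[0], children[1])


def predict_tree(node, row):
    while isinstance(node, tuple):  # internal nodes are (feature, threshold, left, right)
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def fit_forest(X, y, n_trees=100, max_depth=5, mtry=None, seed=0):
    rng = random.Random(seed)
    mtry = mtry or max(1, int(len(X[0]) ** 0.5))  # sqrt(p) convention
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], 0, max_depth, mtry, rng))
    return forest


def predict_proba(forest, row, positive=1):
    """Share of trees voting for the positive (AIS) class."""
    return sum(predict_tree(tree, row) == positive for tree in forest) / len(forest)


# Toy usage: two features that both separate the classes.
X = [[i, 2 * i] for i in range(10)]
y = [0] * 5 + [1] * 5
forest = fit_forest(X, y, n_trees=50)
print(predict_proba(forest, [9, 18]), predict_proba(forest, [0, 0]))
```

The fraction of trees voting for AIS, as returned by `predict_proba`, plays the role of a per-VOI prediction score of the kind used for ROC, calibration and decision curves.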

Evaluating the performance of the classification model

We trained the RF model with the final selected features on the training set and evaluated its performance on the validation set and the independent test set. The area under the receiver operating characteristic curve (AUROC) with its 95% confidence interval, accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), kappa, and F1 score were used to assess model efficacy. Calibration curves were used to assess agreement between predicted and observed outcomes in the two groups, and decision curve analysis (DCA) was used to estimate the net benefit of the model across threshold probabilities. Data analyses were conducted in R software version 4.1.2 (https://www.r-project.org/).
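
The threshold-based metrics listed above all derive from the 2 × 2 confusion matrix, and the AUROC equals the probability that a randomly chosen AIS VOI receives a higher score than a randomly chosen non-AIS VOI. A minimal sketch follows; the function names are hypothetical, and the confidence-interval method (for example DeLong's method or bootstrapping) is not reported in the source and is not shown. As a usage example, the counts below are back-calculated from the reported independent test results (94 AIS and 75 non-AIS VOIs, sensitivity 68.085%, specificity 81.333%); they reproduce, up to rounding, the reported accuracy and F1 score.

```python
def binary_metrics(tp, fn, tn, fp):
    """Threshold-based metrics from a 2x2 confusion matrix (AIS = positive class)."""
    n = tp + fn + tn + fp
    accuracy = (tp + tn) / n
    # Chance agreement for Cohen's kappa, from the predicted/actual marginals.
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    return {
        "accuracy": accuracy,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "f1": 2 * tp / (2 * tp + fp + fn),
        "kappa": (accuracy - p_chance) / (1 - p_chance),
    }


def auroc(scores, labels):
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Counts back-calculated from the reported independent test set results.
m = binary_metrics(tp=64, fn=30, tn=61, fp=14)
print({k: round(m[k], 4) for k in ("accuracy", "sensitivity", "specificity", "f1")})
# {'accuracy': 0.7396, 'sensitivity': 0.6809, 'specificity': 0.8133, 'f1': 0.7442}
```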

Results

Patient characteristic information

We recruited 153 AIS patients (June 2016 to July 2020) with 179 AIS VOIs and 153 non-AIS VOIs as the internal dataset (training and validation). Another 75 patients (2021 to the present) with 94 AIS VOIs and 75 non-AIS VOIs formed the independent test dataset. The mean patient age was 65.6 years (range, 33–95 years) in the internal dataset and 62.1 years (range, 21–90 years) in the independent test dataset. There were 102 males and 51 females in the internal dataset and 53 males and 22 females in the independent test dataset. Further patient characteristics are presented in Table 1.

Table 1 Characteristic data for 228 patients

Feature type and selection

Each VOI yielded 1634 features, including first-order and high-order statistics and features computed on filtered images (LoG, wavelet, square, square root, logarithm, exponential, and gradient). The high-order statistics comprised grey level co-occurrence matrix (GLCM), grey level run length matrix (GLRLM), grey level size zone matrix (GLSZM), grey level dependence matrix (GLDM), and neighbouring grey tone difference matrix (NGTDM) features. The Shapiro-Wilk test showed that 28% of the features (458/1634) were normally distributed, and statistical testing identified 1196 of the 1634 features as significantly different between the AIS and non-AIS groups. After Spearman correlation analysis, 266 of these 1196 features were retained as non-redundant variables without strong multicollinearity. The RPT model then selected 10 features as the best variables for the classification model; their pairwise correlations are shown in Fig. 3. These optimal features comprised four wavelet-filtered textural features, three LoG-filtered textural features, two gradient-filtered textural features, and one textural feature from the original (unfiltered) image.
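
Since all ten selected features are textural, the GLCM idea is worth illustrating on a tiny 2D example: count how often grey level i sits next to grey level j at a fixed offset, normalize the counts, and summarize them, here with Contrast, the sum of (i - j)² p(i, j), and Joint Energy, the sum of p(i, j)². This is a simplified sketch assuming an already discretized 2D image and a single offset; Pyradiomics by default builds symmetric matrices over multiple 3D directions and averages each feature across them, which is not reproduced here. Function names are hypothetical.

```python
from collections import Counter


def glcm(image, dx=1, dy=0, levels=None):
    """Symmetric, normalized GLCM of a discretized 2D image (grey levels 1..levels)."""
    levels = levels or max(max(row) for row in image)
    counts = Counter()
    height, width = len(image), len(image[0])
    for y in range(height):
        for x in range(width):
            x2, y2 = x + dx, y + dy
            if 0 <= x2 < width and 0 <= y2 < height:
                i, j = image[y][x], image[y2][x2]
                counts[(i, j)] += 1
                counts[(j, i)] += 1  # count the reverse direction too (symmetric GLCM)
    total = sum(counts.values())
    return [[counts[(i, j)] / total for j in range(1, levels + 1)]
            for i in range(1, levels + 1)]


def glcm_contrast(p):
    """Contrast: large when neighbouring voxels differ sharply in grey level."""
    size = len(p)
    return sum((i - j) ** 2 * p[i][j] for i in range(size) for j in range(size))


def glcm_joint_energy(p):
    """Joint energy: high when a few grey-level pairs dominate (uniform texture)."""
    return sum(v * v for row in p for v in row)


image = [[1, 1, 2],
         [1, 2, 2]]
p = glcm(image)            # horizontal neighbours, offset (dx=1, dy=0)
print(glcm_contrast(p))    # 0.5
print(glcm_joint_energy(p))  # 0.25
```

Filters such as wavelet or LoG change the image before this matrix is built, which is why the same GLCM feature computed on different filtered images counts as a separate radiomics feature.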

Fig. 3 The correlation map among the 10 optimal features

Diagnosis performance of the RF model

The RF model with the 10 optimal features was used to discriminate AIS from non-AIS VOIs and achieved good diagnostic performance on the training, validation, and independent test sets (ROC curves in Fig. 4). On the training set, the AUC was 0.858 (95% CI: 0.808–0.908), accuracy 79.399%, sensitivity 81.679%, specificity 76.471%, and F1 score 0.817. On the validation set, the AUC was 0.829 (95% CI: 0.748–0.910), accuracy 77.778%, sensitivity 77.083%, specificity 78.431%, and F1 score 0.771. On the independent test set, the AUC was 0.789 (95% CI: 0.717–0.860), accuracy 73.965%, sensitivity 68.085%, specificity 81.333%, and F1 score 0.744. Detailed performance of the RF model is given in Table 2.

Fig. 4 Receiver operating characteristic (ROC) curves and area under the curve (AUC) of the random forest (RF) model on the training (a), validation (b) and independent test sets (c)

Table 2 Predictive performance of RF model on the training, validation and independent test sets

We also conducted subgroup analyses by gender. The ML model achieved an average AUROC of 0.861 (range: 0.816–0.862) in males and 0.842 (range: 0.780–0.885) in females, suggesting that gender did not materially affect the model’s predictive performance.

Validating the performance efficacy of the RF model by calibration and decision curve analysis

Decision curve analysis (Fig. 5) showed a relatively high net benefit from the model on the training, validation, and independent test sets. The calibration curves of the predicted AIS probability showed good agreement between predictions and observations in all three datasets (Fig. 6). The distribution of the RF prediction scores for each VOI is shown in Fig. 7.
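
Both analyses reduce to simple quantities. At a threshold probability pt, the net benefit of acting on the model is TP/n - (FP/n) x pt/(1 - pt); it is compared with treating everyone, prevalence - (1 - prevalence) x pt/(1 - pt), and with treating no one (net benefit 0). A calibration curve compares the mean predicted probability with the observed event rate within bins of predicted probability. The sketch below uses hypothetical function names and equal-width bins; the binning used for Fig. 6 is not reported in the source.

```python
def net_benefit(scores, labels, pt):
    """Net benefit of treating VOIs whose predicted AIS probability is >= pt."""
    n = len(labels)
    tp = sum(1 for s, lab in zip(scores, labels) if s >= pt and lab == 1)
    fp = sum(1 for s, lab in zip(scores, labels) if s >= pt and lab == 0)
    return tp / n - fp / n * pt / (1 - pt)


def net_benefit_treat_all(labels, pt):
    """Reference strategy: treat everyone (the 'treat none' line is always 0)."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1 - prevalence) * pt / (1 - pt)


def calibration_table(scores, labels, n_bins=5):
    """(mean predicted, observed rate, count) for each non-empty equal-width bin."""
    bins = [[] for _ in range(n_bins)]
    for s, lab in zip(scores, labels):
        bins[min(int(s * n_bins), n_bins - 1)].append((s, lab))
    return [
        (sum(s for s, _ in b) / len(b), sum(lab for _, lab in b) / len(b), len(b))
        for b in bins
        if b
    ]


scores = [0.9, 0.8, 0.4, 0.1]
labels = [1, 0, 1, 0]
for pt in (0.2, 0.5):
    print(pt, net_benefit(scores, labels, pt), net_benefit_treat_all(labels, pt))
print(calibration_table(scores, labels, n_bins=2))
```

A model is clinically useful at a given threshold when its net benefit curve lies above both the treat-all and treat-none lines; a well-calibrated model has points close to the diagonal where mean predicted probability equals the observed rate.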

Fig. 5 Decision curve analysis (DCA) plots for the training (a), validation (b) and independent test sets (c)

Fig. 6 Calibration curves evaluating the agreement between RF model predictions and observations on the three datasets

Fig. 7 Bar plots of the prediction score of each volume of interest (VOI) obtained from the RF model on the training (a), validation (b) and independent test sets (c)

Discussion

The prognosis of AIS patients depends on the timing of diagnosis and intervention, making timely and accurate early diagnosis critical. Because MRI involves long imaging times and numerous contraindications and is not always available in some stroke centers, NCCT is more commonly used to screen patients with acute cerebrovascular disease in China. However, NCCT has limitations. Although it is sensitive to acute cerebral hemorrhage, it is less sensitive to the mild cerebral edema and cellular damage caused by AIS, which typically do not produce significant changes in brain tissue density shortly after onset, making it difficult to distinguish infarcts from normal brain tissue on NCCT images [15]. In contrast, MRI-DWI is highly sensitive to the changes in water molecule diffusion caused by AIS and can clearly display infarcts within 6 h of onset, offering higher diagnostic sensitivity.

Although NCCT and MRI-DWI differ in imaging principles, paired NCCT and MRI-DWI images of the same patient are structurally consistent: the same infarct occupies the same anatomical location and has a consistent morphology in both modalities. This study therefore adopted a cross-modality guided learning strategy: AIS infarcts were delineated on MRI-DWI, where they are clearly visible, and the resulting VOIs were mapped onto the registered NCCT images, so that the model could learn the tissue differences between infarcts and normal brain tissue on NCCT.

This study proposes a radiomics approach combined with machine learning to distinguish AIS (< 6 h from onset) from non-AIS tissue on NCCT images, and validates and tests the approach on corresponding datasets. The results demonstrate that the NCCT-based model performs well in detecting AIS. The RF model showed robust diagnostic performance across the training, validation, and independent test sets, with areas under the receiver operating characteristic (ROC) curve of 0.858 (95% CI: 0.808–0.908), 0.829 (95% CI: 0.748–0.910), and 0.789 (95% CI: 0.717–0.860); accuracies of 79.399%, 77.778%, and 73.965%; sensitivities of 81.679%, 77.083%, and 68.085%; and specificities of 76.471%, 78.431%, and 81.333%, respectively.

The strength of this study lies in the NCCT-based model’s ability to identify acute ischemic stroke lesions that are not visible on routine NCCT reading, which may shorten the time from onset to treatment. Because the model is based on NCCT rather than MRI or CT Perfusion (CTP), it is particularly suited to primary healthcare settings. It could provide clinicians and radiologists with a valuable reference for early decision-making and intervention, potentially improving patient outcomes.

Compared with previous studies, the model in this study achieved higher accuracy in identifying AIS. For example, Guan et al. [22] used radiomics to extract image features and constructed a classifier to distinguish infarct areas from normal areas on CT, achieving an average classification accuracy of approximately 65%, whereas our model achieved an accuracy of 73.965% on the independent test set. In addition, this study included a larger cohort of 228 patients, compared with 56 patients in Guan et al.’s study. Qiu et al. [23] developed a machine learning model based on CT scans of 157 AIS patients (onset time < 6 h), using ischemic lesions manually segmented on MRI-DWI as the gold standard. Validated on an additional 100 CT scans, the model showed a high correlation (r = 0.76) between machine learning-calculated lesion volumes and the gold standard, indicating that machine learning can identify and quantify the ischemic core on baseline CT scans of AIS patients with accuracy close to that of MRI and can provide critical imaging biomarkers for treatment decisions. Our study reached similar conclusions, demonstrating the strong performance of an NCCT-based machine learning model in revealing acute stroke (< 6 h).

However, this study has several limitations. First, the gender imbalance in the dataset may affect the robustness of the classifier, and validation on larger datasets is needed. Second, because this was a retrospective study, the classification model learned only from existing data, and its clinical value has yet to be confirmed in prospective studies. Third, differences in image quality between CT and MR scanners, as well as the lack of MR scans immediately after NCCT acquisition, may affect the accuracy of the results. Fourth, the independent test set (n = 75) is relatively small; future studies should expand the external validation cohort through multicenter collaboration. Additionally, because CTP and MR perfusion (MRP) data were not available in the training set, the model cannot determine the extent of the ischemic penumbra, which will be a focus of future research. Future studies will also incorporate explainable AI frameworks (e.g., SHAP) to elucidate nonlinear interactions among radiomic features and will validate these findings in prospective multicenter cohorts.

Conclusion

The results of this study indicate that a radiomics-based machine learning model using NCCT can distinguish AIS lesions from non-AIS brain tissue within 6 h of onset. This model has the potential to serve as an auxiliary tool that provides valuable references for clinicians, and further validation of its clinical application is warranted.