Abstract
Background
Lumbar spinal stenosis is one of the most common surgery-requiring conditions of the spine in the aged population. As elderly patients often present with multiple comorbidities and limited physiological reserve, individualized risk assessment using comprehensive geriatric assessment is crucial for optimizing surgical outcomes.
Methods
Patients 65 years or older who underwent elective surgery for lumbar spinal stenosis between June 2015 and December 2018 were prospectively enrolled, resulting in 261 eligible patients with a mean age of 72.3 ± 4.8 years (108 male, 153 female). Twenty-seven experienced complications of Clavien-Dindo grade 2 or higher within 30 days, and 79 received a transfusion during the hospital stay. The cohort was split into train-validation (n = 208) and test (n = 53) sets. A total of 48 features, including demographics, comorbidity, nutrition, and perioperative status, were collected. Logistic regression, support vector machine (SVM), random forest, XGBoost, and LightGBM were trained using five-fold cross-validation. The area under the receiver-operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC) were considered the primary performance metrics, and the results were compared with those estimated with the ACS-NSQIP scoring system. A set of Compact models incorporating a smaller number of features was also trained, and SHAP analysis was conducted to evaluate the models’ interpretability.
Results
Reducing the number of features did not significantly lower AUROC or AUPRC for any machine learning algorithm (P > 0.05). When compared to the ACS-NSQIP scoring system, which achieved test AUROCs of 0.38 (95% confidence interval [CI], 0.13–0.73) and 0.22 (95% CI, 0.10–0.36) on the first two tasks (overall and general complications within 30 days), the Compact model showed significantly greater AUROC values nearing or surpassing 0.90 for all algorithms except logistic regression. Decision tree-based algorithms demonstrated larger test AUROC than logistic regression and generally agreed on the most influential features for each task.
Conclusions
Advanced machine learning models consistently showed greater predictive performance and interpretability than conventional methodologies, implying their potential for a more individualized risk assessment of the aged population.
Trial registration
Not applicable as this research is not a clinical trial.
Introduction
Lumbar spinal stenosis (LSS) is one of the most common degenerative diseases of the spine, often leading to chronic pain and gait disturbances that significantly impair quality of life [1,2,3]. When symptoms progress, surgical decompression with or without spinal fusion becomes necessary to alleviate neurological deficits and restore function [1,2,3,4]. LSS is particularly prevalent in the elderly population, with its incidence increasing with age due to progressive degenerative changes in the spine [1, 5]. Elderly patients often present with multiple comorbidities and reduced physiological reserves, posing a significant risk of perioperative complications, including surgical site infections, cardiopulmonary events, and prolonged recovery times [5,6,7,8,9,10]. Therefore, accurate risk stratification in this population is crucial for optimizing surgical decision-making and improving postoperative outcomes.
Various scoring systems have been utilized to comprehensively assess the health status of elderly patients and predict postoperative complications and transfusion risk. Commonly used tools include the American Society of Anesthesiologists Physical Status Classification System (ASA classification), the American College of Surgeons National Surgical Quality Improvement Program (ACS-NSQIP) surgical risk calculator, and the Charlson Comorbidity Index (CCI), all of which provide structured risk assessments based on preoperative patient characteristics [11,12,13]. Particularly for patients undergoing surgical intervention for LSS, comprehensive geriatric assessment (CGA) has been proposed to evaluate the risk of early postoperative complications [5]. Unlike traditional methods, CGA aims to integrate multiple dimensions of health, including frailty, activities of daily living (ADL), cognitive function as measured by the mini-mental state examination (MMSE), nutritional status, and medication burden [9, 10, 14,15,16].
However, despite their clinical effectiveness, the aforementioned methods rely on a limited number of predefined variables and simple statistical models, which may fail to adequately account for the complexity of health conditions in elderly patients. In contrast, machine learning (ML) offers a data-driven approach that can integrate numerous clinical factors and capture intricate, nonlinear relationships among them [17, 18]. Among various techniques, decision tree-based algorithms, including random forests and gradient boosting machines, have gained attention due to their robustness and interpretability [19,20,21,22].
Recently, a growing body of research has demonstrated the effectiveness and potential of ML in the context of geriatric surgery. For instance, in gastrointestinal cancer surgeries, ML models have outperformed traditional bedside scoring systems in predicting postoperative morbidity and mortality [23]. Similarly, several studies have reported the superior performance of ML algorithms in predicting delirium, a common postoperative complication among elderly patients [24, 25]. Furthermore, there have been efforts to apply ML to forecast cardiovascular events in non-cardiac surgeries, as well as to predict the onset of acute kidney injury [26, 27]. Collectively, these findings highlight the substantial potential of ML as a powerful tool for personalized risk assessment and decision-making in elderly patients, who often present with diverse and complex clinical profiles. In the context of spinal surgery, ML has been applied to various prediction tasks, including estimating prolonged length of hospital stay, predicting the likelihood of blood transfusion after adult spinal deformity surgery, and forecasting changes in patient-reported outcomes such as the SRS-22R questionnaire [28,29,30,31]. However, most of these earlier studies did not focus on the geriatric population; we therefore restricted our cohort to elderly patients, who are more vulnerable to adverse events after surgery.
Using a prospectively collected cohort of elderly patients undergoing surgery for LSS, we develop and evaluate ML models predicting early postoperative complications and the need for blood transfusion during the hospitalization period. By leveraging ML techniques, we seek to improve the accuracy of perioperative risk assessment compared to traditional methods, particularly the ACS-NSQIP scoring system. Furthermore, Shapley additive explanations (SHAP) analysis is performed to identify key risk factors contributing to early postoperative outcomes and, if possible, provide clinically actionable insights [32].
Materials and methods
Patient cohort
Patients aged 65 years or older who underwent any type of elective surgery for LSS between June 2015 and December 2018 were prospectively enrolled. Those treated for spinal deformities or stenosis at non-lumbar levels, as well as patients requiring emergency operations, were excluded to ensure a more homogeneous study population. The study was approved by the Institutional Review Board (IRB) of our organization, and informed consent was obtained from all participants.
Data acquisition
A wide range of patient characteristics were collected one day before surgery. First, basic demographic profiles, including age, sex, height, weight, and body mass index (BMI), were recorded. Additionally, various scoring systems for evaluating the medical and functional status of geriatric patients were employed. Activities of daily living (ADL), instrumental activities of daily living (IADL), and economic dependency were assessed, with nonzero values indicating functional dependency [14]. Short-form geriatric depression scale (GDS, range: 0–15) and mini-mental state examination (MMSE, range: 0–30) were used to assess the patient’s psychological state, with a GDS score of 5 or greater indicating depression and MMSE score of 23 or below suggesting cognitive impairment [15, 33]. Mini nutritional assessment (MNA, range: 0–30) was performed to assess the patient’s general nutritional status, with a higher score indicating more adequate nutrition [16]. As comorbidity status is crucial in evaluating elderly patients, CCI and the number of medications were also recorded [13]. Frailty measures, including the modified frailty index (mFI), timed-up-and-go test (TUGT) time in seconds, fall history within 6 months before surgery, and fracture due to fall within 6 months before surgery, were assessed [9].
Preoperative features such as ASA classification and laboratory test results were collected one day prior to surgery [12]. The laboratory tests included leukocyte count, hemoglobin, blood glucose, blood urea nitrogen (BUN), serum creatinine, estimated glomerular filtration rate (eGFR), total cholesterol, serum protein, serum albumin, aspartate aminotransferase (AST), and alanine aminotransferase (ALT). Furthermore, features relevant to surgical plans, including spinal fusion and the number of involved vertebral levels, were also considered as inputs. In total, 48 variables were collected from each patient. Comprehensive interviews and complete retrieval of test results were conducted for every individual, taking approximately one hour per patient, resulting in no missing data. Min-max scaling was applied to rescale all values to a range between 0 and 1.
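As an illustration of the min-max scaling step, each feature x can be rescaled as x' = (x − min) / (max − min). The following is a minimal sketch in plain Python; the function name, the handling of constant features, and the sample values are assumptions made for illustration and are not taken from the study. The text does not state whether the minimum and maximum were computed on the train-validation set only, so that choice is left open here.

```python
def min_max_scale(column):
    """Rescale a list of numbers linearly to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Assumption: a constant feature is mapped to 0.0 to avoid dividing by zero.
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


# Illustrative hemoglobin values in g/dL (made up, not study data).
hemoglobin = [10.0, 12.5, 15.0]
scaled = min_max_scale(hemoglobin)
print(scaled)  # [0.0, 0.5, 1.0]
```

In practice, the minimum and maximum are commonly estimated on the training data and reused for the test data, so that no information from the test set leaks into preprocessing.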
In this study, several outcome measures were predicted. First, postoperative complications occurring within 30 days were evaluated during the hospitalization period and at outpatient visits 1 month after surgery. These complications were categorized into general and surgical complications, with events classified as Clavien-Dindo grade 2 (complication requiring pharmacological treatment with drugs other than analgesics, antiemetics, antipyretics, diuretics, and electrolytes) or higher considered positive for postoperative complications [34]. Additionally, the need for transfusion during hospitalization was recorded as a binary variable, capturing any instance of red blood cell (RBC), fresh frozen plasma (FFP), or platelet transfusion. As a result, three outcome variables were studied in this research: (1) general or surgical postoperative complications within 30 days; (2) general postoperative complications within 30 days; and (3) the need for transfusion within the hospitalization period.
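The three binary outcome variables described above can be derived from per-patient records roughly as follows. This is a sketch only: the record layout and all field names are hypothetical, and a grade of 0 is assumed to denote the absence of a complication.

```python
from dataclasses import dataclass


@dataclass
class PatientOutcome:
    # All field names are hypothetical; grades follow the Clavien-Dindo scale (0 = none).
    general_grade: int       # worst general complication grade within 30 days
    surgical_grade: int      # worst surgical complication grade within 30 days
    rbc_units: int = 0       # red blood cell units transfused during hospitalization
    ffp_units: int = 0       # fresh frozen plasma units transfused
    platelet_units: int = 0  # platelet units transfused


def outcome_labels(p, min_grade=2):
    """Return (any complication, general complication, transfusion) as 0/1 labels."""
    general = int(p.general_grade >= min_grade)
    any_complication = int(general == 1 or p.surgical_grade >= min_grade)
    transfusion = int(p.rbc_units + p.ffp_units + p.platelet_units > 0)
    return any_complication, general, transfusion


# A patient with a grade 3 surgical complication only, and no transfusion.
print(outcome_labels(PatientOutcome(general_grade=1, surgical_grade=3)))  # (1, 0, 0)
```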
Model development
The patient cohort was randomly split into train-validation and test sets in a roughly 4:1 ratio, to ensure sufficient data for model training while preserving an independent set for final evaluation. Although the dataset was imbalanced, stratification and oversampling were not performed, as the same dataset was intended to be used for multiple tasks. Instead, to address the class imbalance during model training, a class weight ranging from 3 to 10 was assigned to the positive class, depending on the specific task and outcome distribution. To assess potential differences in input and output variables between the train-validation and test sets, statistical analyses were performed. Continuous variables were compared using independent t-tests, while categorical variables were analyzed using chi-square tests. P-values below 0.05 were considered statistically significant.
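To illustrate how a positive-class weight counteracts class imbalance, the sketch below computes a binary cross-entropy loss in which errors on positive cases are multiplied by a weight. It is a conceptual example in plain Python, not the authors' implementation: the function name and the normalization by the number of samples are assumptions, and individual libraries expose this idea through their own parameters and may normalize differently.

```python
import math


def weighted_bce(y_true, y_prob, pos_weight=1.0, eps=1e-12):
    """Mean binary cross-entropy with the positive class up-weighted."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        if y == 1:
            total += -pos_weight * math.log(p)
        else:
            total += -math.log(1.0 - p)
    return total / len(y_true)


# One missed positive (predicted 0.1) and one correct negative (predicted 0.1).
labels, probs = [1, 0], [0.1, 0.1]
print(weighted_bce(labels, probs, pos_weight=1.0))
print(weighted_bce(labels, probs, pos_weight=5.0))  # the missed positive costs five times more
```

With a weight in the reported range of 3 to 10, a missed complication contributes several times more to the loss than a false alarm, which pushes the model toward higher sensitivity for the rare class.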
In this study, two types of predictive models were developed. The Complex model utilized all available input features collected, whereas the Compact model was designed to use a simplified set of 28 clinically relevant features, selected to eliminate redundancy in the input space and promote applicability for future studies. The predictive performance of both models was compared against the probability of “serious complication” as estimated by the ACS-NSQIP score, a widely recognized benchmark for predicting complications in non-cardiac surgery [11].
For each of the three outcome variables, five different machine learning algorithms were trained: logistic regression (LogReg), support vector machine (SVM) with radial basis function kernels, random forest classifier (RF), extreme gradient boosting (XGBoost) classifier, and light gradient boosting machine (LightGBM) [20,21,22]. To maximize the utilization of available data, 5-fold cross-validation was applied to the train-validation set. Models were trained to minimize binary cross-entropy loss, and hyperparameter tuning was performed using Optuna (version 4.2.1), with 200 optimization trials conducted within a broad hyperparameter search space, as detailed in Table 1 [35]. The combination of an extensive search space and a large number of trials was intended to explore the parameter landscape thoroughly and to increase the likelihood of converging to a near-optimal solution. The outputs from the models trained in each fold were combined using soft voting to generate the final prediction probabilities.
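The combination of 5-fold cross-validation and soft voting can be sketched as follows. The fold assignment and the probability values are illustrative assumptions; the text does not specify how the folds were drawn, and the function names are hypothetical. Soft voting is shown here as the unweighted mean of the probabilities predicted by the fold models.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and deal them into k nearly equal folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def soft_vote(fold_probabilities):
    """Average the per-patient probabilities predicted by each fold model."""
    n_models = len(fold_probabilities)
    return [sum(col) / n_models for col in zip(*fold_probabilities)]


# Fold sizes for a train-validation set of 208 patients.
folds = k_fold_indices(208, k=5)
print([len(f) for f in folds])  # [42, 42, 42, 41, 41]

# Hypothetical probabilities from three fold models for two test patients.
probs = [[0.2, 0.9], [0.4, 0.7], [0.3, 0.8]]
print(soft_vote(probs))  # approximately [0.3, 0.8]
```

Each fold in turn serves as the validation set while the remaining folds are used for training, which yields one model per fold; averaging their outputs uses all five models at test time instead of discarding four of them.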
Evaluation of model performance
The area under the receiver-operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC) on the test set were used as the primary performance measures in this study, as these metrics are independent of the decision threshold. Additionally, several auxiliary metrics, including accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and F1-score, were evaluated. They were calculated at Youden’s J point, a decision threshold that maximizes the sum of sensitivity and specificity [36].
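The two quantities discussed above can be computed directly from labels and predicted scores. The sketch below is a plain-Python illustration, with hypothetical function names and made-up data: AUROC is obtained as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, and Youden's J point is found by scanning the observed scores as candidate thresholds, where J = sensitivity + specificity − 1.

```python
def auroc(y_true, scores):
    """AUROC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((1.0 if p > n else (0.5 if p == n else 0.0)) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def youden_threshold(y_true, scores):
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    best_t, best_j = None, float("-inf")
    for t in sorted(set(scores)):
        # A case is predicted positive when its score is at least the threshold.
        tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= t)
        tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < t)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


# Made-up labels and scores for four patients.
print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))            # 0.75
print(youden_threshold([0, 0, 1, 1], [0.1, 0.2, 0.7, 0.9]))  # (0.7, 1.0)
```

Because maximizing sensitivity + specificity and maximizing J differ only by a constant, both formulations select the same threshold. When several thresholds tie, this sketch keeps the lowest one, which is an arbitrary choice.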
First, the non-inferiority of the Compact model compared to the Complex model was statistically evaluated. Similarly, the performance of the Compact model was also compared against the ACS-NSQIP score, which served as a reference benchmark. In addition, performance differences among various machine learning algorithms within the Compact model framework were assessed. Since the dataset was relatively small, 95% confidence intervals for performance metrics, as well as their comparisons, were estimated using bootstrap resampling.
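A percentile bootstrap is one common way to obtain such confidence intervals. The sketch below resamples patients with replacement and recomputes AUROC on each resample; the number of replicates, the percentile method, and the rule of discarding single-class resamples are assumptions, since the text does not specify these details. The input must contain both classes.

```python
import random


def auroc(y_true, scores):
    """AUROC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((1.0 if p > n else (0.5 if p == n else 0.0)) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(y_true, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for AUROC."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # sample patients with replacement
        ys = [y_true[i] for i in idx]
        if 0 < sum(ys) < n:  # AUROC is undefined unless both classes are present
            stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Made-up labels and scores for eight patients.
y = [0, 1, 0, 1, 0, 1, 1, 0]
s = [0.2, 0.6, 0.7, 0.4, 0.1, 0.9, 0.5, 0.3]
print(auroc(y, s), bootstrap_ci(y, s, n_boot=500))
```

With a test set of 53 patients and few positive cases, intervals obtained this way are expected to be wide, as reflected in the confidence intervals reported for the ACS-NSQIP score.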
To assess the interpretability of the models, SHAP analysis was conducted on the Compact model. It evaluates the influence of each input feature on the final prediction, and a greater mean absolute SHAP value indicates a greater contribution to the model’s decision-making process [32]. For each trained model, the top 10 most influential features were identified, and their contributions were qualitatively examined using beeswarm plots generated from the test set.
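Given a matrix of SHAP values (one row per patient, one column per feature), the ranking by mean absolute SHAP value can be sketched as follows. The SHAP values themselves would come from an explainer applied to a trained model; here they are made-up numbers, and the function and feature names are hypothetical.

```python
def rank_by_mean_abs_shap(shap_values, feature_names, top_k=10):
    """Rank features by their mean absolute SHAP value across samples."""
    n = len(shap_values)
    mean_abs = [sum(abs(row[j]) for row in shap_values) / n
                for j in range(len(feature_names))]
    ranked = sorted(zip(feature_names, mean_abs), key=lambda t: t[1], reverse=True)
    return ranked[:top_k]


# Made-up SHAP values: rows are patients, columns are features.
names = ["hemoglobin", "tugt_time", "age"]
values = [[-0.5, 0.25, 0.0],
          [0.5, -0.25, 0.125],
          [-1.0, 0.25, -0.125]]

for name, importance in rank_by_mean_abs_shap(values, names):
    print(f"{name}: {importance:.3f}")
```

Taking the absolute value before averaging matters: a feature that pushes predictions up for some patients and down for others would otherwise appear unimportant because its contributions cancel.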
Results
Cohort characteristics
A total of 278 patients were enrolled in the study, and 17 were excluded due to the cancellation of surgery or a change of treatment plan, leaving 261 eligible patients. The mean age of the cohort was 72.3 ± 4.8 years, and 108 (41.3%) patients were male. In total, 27 (10.3%) patients experienced either general or surgical postoperative complications of Clavien-Dindo grade 2 or higher within 30 days, with 20 (7.7%) general and 7 (2.7%) surgical. General complications consisted of 5 (1.9%) delirium, 5 (1.9%) cardiovascular, 4 (1.5%) urinary, 3 (1.1%) respiratory, 2 (0.8%) gastrointestinal, and 1 (0.4%) sacral fracture complications. Among surgical complications, neurologic symptoms were the most common, with 4 (1.5%) cases, followed by infection, hematoma, and pneumoperitoneum, each with 1 (0.4%) case. Transfusion was performed in 79 (30.2%) patients within the hospitalization period.
After the dataset was split, the train-validation and test sets contained 208 (79.7%) and 53 (20.3%) patients, respectively. Among the collected variables, only age (72.0 ± 4.8 vs. 73.5 ± 4.8 years; P = 0.049) and TUGT time (21.8 ± 44.0 vs. 13.7 ± 5.6 s; P = 0.02) showed significant differences. No evidence of a significant difference was found in general or surgical complication rates (10.6% vs. 9.4%; P = 1.00), general complication rates (7.7% vs. 7.5%; P = 1.00), and transfusion rates during hospitalization (31.7% vs. 24.5%; P = 0.39). A detailed comparison of all features between the train-validation and test sets is provided in Tables 2 and 3. The distribution of the output variable across the five cross-validation folds was examined using Fisher’s exact test, and no statistically significant differences were observed (P = 0.99, P = 0.72, and P = 0.98, respectively).
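As a worked example of the categorical comparison between the two sets, the sketch below applies a chi-square test with Yates' continuity correction to the overall complication rates. The counts (22 of 208 and 5 of 53) are derived here from the reported percentages and the total of 27 events; whether a continuity correction was used in the original analysis is not stated, so this is an assumption. The function name is hypothetical, and the code assumes that no row or column total is zero.

```python
import math


def chi2_yates_2x2(table):
    """Chi-square test with Yates' continuity correction for a 2x2 table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = rows[i] * cols[j] / n
            # Yates' correction: shrink each deviation by 0.5, but not below zero.
            diff = max(0.0, abs(observed - expected) - 0.5)
            stat += diff * diff / expected
    # For 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2.0))


# Rows: train-validation, test. Columns: complication, no complication.
table = [[22, 186], [5, 48]]
stat, p = chi2_yates_2x2(table)
print(stat, p)  # 0.0 1.0
```

Every observed count lies within 0.5 of its expected value, so the corrected statistic is zero and the P value is 1, which is consistent with the reported P = 1.00 for this comparison.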
Model performance
The performance of the best models trained with each algorithm on each task is summarized in Tables 4, 5 and 6, and the receiver-operating characteristic (ROC) curves of the Compact model are plotted in Fig. 1.
Fig. 1 Receiver-operating characteristic curves on the test set (Compact model). Receiver-operating characteristic curves for predicting (a) early general and surgical complications, (b) early general complications, and (c) transfusion within hospitalization period. LogReg, logistic regression; SVM, support vector machine; AUC, area under the receiver-operating characteristic curve
Compared to the Complex model, the Compact model did not demonstrate statistically significant inferiority in terms of AUROC or AUPRC for any of the tested machine learning algorithms. Furthermore, when compared to the ACS-NSQIP scoring system, which achieved a test AUROC of 0.38 (95% confidence interval [CI], 0.13–0.73) and 0.22 (95% CI, 0.10–0.36) on the first two tasks, the Compact model showed significantly greater AUROC values nearing or surpassing 0.90 for all algorithms except for LogReg (P = 0.10 and P = 0.61, respectively for each task). Although the Compact model generally yielded higher AUPRC values as well, statistical significance was observed only in the case of the SVM trained on the first task (P = 0.044).
Overall, decision tree-based algorithms, including RF, XGBoost, and LightGBM, consistently demonstrated greater performance than LogReg, while the performance of SVMs fluctuated depending on the task. Compared with LogReg, only XGBoost on the second task (predicting general postoperative complications within 30 days) showed statistically significantly higher performance, in both AUROC (0.60 vs. 0.95; P = 0.03) and AUPRC (0.13 vs. 0.51; P = 0.04). In the other tasks, no statistically significant improvements over LogReg were observed. The superiority of tree-based models over SVM was shown in the task of predicting transfusion. In terms of AUROC, all tree-based models demonstrated statistically significantly higher performance: RF (0.76 vs. 0.93; P = 0.03), XGBoost (0.76 vs. 0.96; P = 0.005), and LightGBM (0.76 vs. 0.92; P = 0.049). For AUPRC, a statistically significant improvement was observed only with XGBoost (0.45 vs. 0.80; P = 0.04).
Explainability analysis
Table 7 lists the top 10 most influential features of the Compact model for each ML algorithm identified by SHAP analysis. The results indicate that the key contributing factors were generally consistent across different algorithms, particularly among decision tree-based models. For early postoperative complications, important predictors included frailty measures such as fall history within 6 months and TUGT time, as well as preoperative laboratory test results and fusion operation. In the case of transfusion prediction, hemoglobin level, nutrition, and involved spinal levels were identified as major contributing factors.
Figures 2, 3 and 4 present the beeswarm plots and mean absolute SHAP values obtained from each task, comparing LogReg and XGBoost. As demonstrated in Figs. 2a, 3a, and 4a, LogReg consistently exhibited poor group separation in the beeswarm plots, with SHAP values concentrated in only a few variables. In contrast, XGBoost demonstrated clear group separation across all tasks, with SHAP values more evenly distributed across multiple variables, as illustrated in Figs. 2b, 3b, and 4b. This suggests that XGBoost exhibits greater interpretability than LogReg. SHAP analysis results for SVM, random forest, and LightGBM are provided in Additional files 1, 2, and 3; the other decision tree-based models, random forest and LightGBM, showed patterns similar to those of XGBoost.
Discussion
In clinical research, statistical analysis methods have gradually shifted from traditional approaches, such as LogReg, to ML techniques due to their capability to capture nonlinear complex relationships in high-dimensional data [17, 18]. This shift is particularly relevant in the management of elderly patients, where multiple interrelated factors must be considered to achieve optimal outcomes [2, 5, 37, 38]. Accordingly, we applied ML to analyze our prospectively collected cohort, aiming to predict early postoperative complications and transfusion requirements in elderly patients undergoing surgery for LSS.
Comparison between models
As demonstrated earlier, the ML models consistently outperformed the reference method, ACS-NSQIP, in both AUROC and AUPRC. In particular, for the first two tasks involving the prediction of early postoperative complications, ACS-NSQIP yielded AUROC values below 0.5, indicating substantially limited predictive capability. It should be noted that AUROC and AUPRC were adopted as the primary evaluation metrics because they are independent of the decision threshold. Although auxiliary metrics, such as accuracy, sensitivity, and specificity, were measured at Youden’s J point, the decision threshold should be selected carefully in practice, with due consideration of the potential consequences of false-positive and false-negative predictions.
No evidence of statistically significant differences in AUROC and AUPRC was observed between the Complex and the Compact models. In other words, the exclusion of time- and labor-intensive variables such as ADL, IADL, MMSE, and CCI did not compromise predictive performance. These findings suggest that the Compact model may offer a more practical and scalable approach for future research and clinical implementation. Nevertheless, extensive validation and clinical studies must precede any real-world application.
When comparing the performance across ML algorithms, decision tree–based models such as XGBoost and LightGBM generally outperformed others, particularly in the task of predicting transfusion, where they showed superior performance compared to LogReg and SVM. Of particular note are the SHAP analysis results, which illustrate model interpretability: while LogReg tended to rely heavily on a single variable, tree-based models distributed importance more evenly across multiple variables, reflecting a more holistic consideration of diverse clinical features. Considering both predictive performance and interpretability, models such as XGBoost and LightGBM appear to be more suitable for clinical application.
Clinical implications
This section discusses the clinical implications and potentially actionable insights derived from the SHAP analysis of the Compact model based on RF, XGBoost, and LightGBM algorithms. By examining the contribution of individual features to the model’s predictions, we aim to identify clinically relevant patterns and risk factors that may inform preoperative decision-making and patient-specific care strategies.
First, frailty-related indicators, such as fall history within six months and TUGT time, were among the major contributing factors for early postoperative complications. This finding aligns with previous studies and highlights the importance of assessing frailty and muscle wasting in elderly patients before lumbar spine surgery [9, 10, 39, 40]. Dependency in daily living and poor nutritional status were also associated with a greater risk of early postoperative complications and transfusion, highlighting the need for perioperative fitness for better recovery [5, 41]. Therefore, in patients with multiple risk factors, it would be prudent to actively consider non-surgical treatment options first and reserve surgical intervention only for cases with absolute indications, such as the occurrence of neurological complications.
Interestingly, age and comorbidity burden were found to be less influential than expected, with only diabetes showing notable contributions to the outcome variables [42, 43]. Instead, preoperative cholesterol, kidney-related markers, and nutritional status played more significant roles, suggesting the importance of effective management of comorbidities in the geriatric population [41, 44]. Specifically, cholesterol levels may have emerged as a contributing factor probably due to their link with cardiovascular health [45]. This finding emphasizes the importance of lipid management, even in patients without a history of cardiovascular disease, as a potentially actionable strategy to mitigate surgical risks in elderly patients [46,47,48].
Regarding transfusion risk, preoperative hemoglobin level was identified as a primary determinant, which is consistent with contemporary clinical practices and literature [29, 49]. Furthermore, patients who underwent procedures involving multiple vertebral levels had a higher likelihood of requiring transfusion during hospitalization. Poor nutritional status and prolonged TUGT time were also associated with increased risk, underscoring the importance of preoperative nutritional management [34, 50, 51]. Thus, sufficient correction of nutritional status should be considered prior to elective LSS surgery in elderly patients to reduce the likelihood of transfusion events.
Limitations and future work
This section outlines the limitations of the study and directions for future work. First, although statistical analyses have demonstrated that the developed ML models outperformed the ACS-NSQIP score in terms of AUROC, the generalizability of the findings is limited by the relatively small sample size (n = 261). The single-center design without external validation and the class imbalance in the dataset may have introduced considerable bias. Furthermore, the study included only patients who underwent elective surgery, raising concerns about potential selection bias. It is also possible that clinical decisions made by healthcare providers based on ASA classification or surgical planning may have led to an underestimation of the actual risk, further limiting the objectivity of the outcome labels.
The limited dataset size was largely due to the prospective enrollment of elderly patients and the extensive time and resources required to collect a wide range of questionnaire-based features. In particular, constructing the input for the Complex model necessitated a minimum of one hour of thorough, face-to-face interviews per patient to obtain data such as ADL, IADL, MNA, MMSE, and CCI. To address the associated burden, we developed the Compact model using only routinely available variables such as blood test results and clinically essential features. Notably, statistical analyses revealed no significant inferiority in predictive performance compared to the Complex model.
Therefore, this study not only demonstrates the value of ML in the proposed tasks as a proof-of-concept but also lays the groundwork for scalable and cost-effective research in larger cohorts. Although immediate clinical application may be limited, we believe that, with the development of multi-institutional, multiethnic datasets and sufficient validation through clinical trials, this approach holds promise for enabling individualized care for the geriatric population. Integration into electronic health records and regulatory approval should follow.
Another downside of this study lies in the restricted scope of output variables. Due to the relatively small dataset, it was not feasible to develop models capable of predicting organ-specific complications. With a larger cohort in future research, such granularity may become attainable, allowing the model not only to assess the overall risk of complications but also to identify specific systems or organs that require heightened clinical attention. Additionally, in the geriatric population, long-term outcomes such as morbidity and mortality are of particular clinical relevance. The inability to evaluate these outcomes within the current study further limits its scope. Future investigations should therefore aim to expand not only the sample size but also the follow-up period to comprehensively assess both short- and long-term risks.
Conclusion
ML offers a data-driven approach that can integrate numerous factors and capture intricate, nonlinear relationships among them. In this study, we have developed ML algorithms that predict early postoperative complications and transfusion within the hospitalization period in elderly patients undergoing elective surgery for LSS and observed that they can provide higher predictive performance than well-established scoring systems such as ACS-NSQIP. Although the scope of the study is limited by the small cohort collected from a single institution, it may serve as a foundation for future larger-scale research and, eventually, clinical application.
Data availability
The dataset generated and analyzed during the current study is not publicly available due to patient privacy concerns and ethical restrictions but is available from the corresponding author upon reasonable request.
Abbreviations
- ALT: Alanine aminotransferase
- ASA: American Society of Anesthesiologists Physical Status Classification System
- AST: Aspartate aminotransferase
- AUPRC: Area under the precision-recall curve
- AUROC: Area under the receiver-operating characteristic curve
- ADL: Activities of daily living
- BMI: Body mass index
- BUN: Blood urea nitrogen
- CCI: Charlson comorbidity index
- eGFR: Estimated glomerular filtration rate
- GDS: Geriatric depression scale
- IADL: Instrumental activities of daily living
- LightGBM: Light gradient boosting machine
- LSS: Lumbar spinal stenosis
- mFI: Modified frailty index
- ML: Machine learning
- MMSE: Mini-mental state examination
- MNA: Mini nutritional assessment
- NPV: Negative predictive value
- PPV: Positive predictive value
- RF: Random forest classifier
- ROC: Receiver-operating characteristic
- SHAP: Shapley additive explanations
- SVM: Support vector machine
- TUGT: Timed-up-and-go test
- XGBoost: Extreme gradient boosting
Acknowledgements
Not applicable.
Funding
The authors declare that this research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
Author information
Contributions
SYC, HK, and BSC conceptualized the study, collected patient data, and performed formal analysis. SYC conducted the statistical analyses. WR developed and evaluated the machine learning models presented in this study. WR was a major contributor to writing the manuscript, and SYC, HK, and BSC contributed to reviewing and editing it. All authors have read and approved the final manuscript.
Ethics declarations
Ethics approval and consent to participate
All procedures performed in this study involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Declaration of Helsinki and its later amendments or comparable ethical standards. Approval was obtained from the Institutional Review Board of Seoul National University College of Medicine and Seoul National University Hospital (Approval No.: H-1504-122-668), and informed consent was obtained from each participant upon enrollment in the study.
Consent for publication
Not applicable.
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Rhee, W., Chang, S.Y., Chang, BS. et al. Prediction of early postoperative complications and transfusion risk after lumbar spinal stenosis surgery in geriatric patients: machine learning approach based on comprehensive geriatric assessment. BMC Med Inform Decis Mak 25, 279 (2025). https://doi.org/10.1186/s12911-025-03125-1
DOI: https://doi.org/10.1186/s12911-025-03125-1






