Background

Idiopathic nephrotic syndrome (INS) is a common childhood glomerular disease characterized by macroalbuminuria, hypoproteinemia, edema, and hyperlipidemia. Although the outcomes for most patients with INS are positive, a subset of cases may progress to kidney damage and end-stage renal disease, thereby endangering the health and lives of children [1]. Infection, thrombosis, electrolyte imbalance, and acute kidney injury (AKI) are common complications of INS, with AKI being the most serious [2]. Children with INS are at risk of chronic kidney disease (CKD), which can be promoted by AKI, leading to a worse prognosis [3]. However, as most existing studies on AKI focus on the adult population, accurate epidemiological and clinical analyses of AKI in children with INS are limited. In the United States, the incidence of AKI in hospitalized children with INS increased from 3.3% to 8.5% between 2000 and 2009 [4]. In 2020, a nationwide study in Korea indicated that the incidence of AKI in 814 hospitalized children with INS was 16.2% [5]. Furthermore, in 2024, a Chinese study reported that the incidence of AKI in 172 hospitalized children with steroid-resistant INS was 22.7% [6]. These studies suggest that the incidence of AKI is increasing and that early identification of AKI is urgently needed.

Both risk prediction and diagnostic prediction of AKI are crucial for assessing and improving the prognosis of INS. However, because the diagnosis of AKI in children with INS relies mostly on specialist pediatricians, accurate and effective diagnostic methods are urgently needed. Recently, numerous risk factors have been associated with AKI in INS [7], including dehydration [8], infection [9], posterior reversible encephalopathy [10], persistent proteinuria, and exposure to nephrotoxic medication [11,12,13]. However, because AKI in INS is complex and affected by numerous factors, effective biomarkers for AKI diagnosis are lacking.

Recently, machine learning (ML) algorithms have been developed into powerful tools for various healthcare fields, including disease screening, diagnosis, and prognosis, which can enhance patient safety and reduce medical costs [14]. Numerous ML methods have been used to create AKI prediction models, and these approaches exhibited strong predictive capabilities and are applicable to both adult [15] and pediatric [16] patients with critical illnesses. However, despite the strengths of ML approaches stemming from their sophisticated models, these methods fail to precisely predict the development of AKI in pediatric patients with INS over time, which could be attributed to the scarcity of research conducted with large sample sizes.

Therefore, this study aimed to develop an interpretable ML model from a multicenter retrospective cohort that enables early and precise prediction of AKI in children with INS and highlights the relative importance of its input features. The model was evaluated through both internal and external validation.

Methods

Patients and definitions

This study aimed to develop an ML model using data from a multicenter retrospective cohort. A flowchart of the study is shown in Fig. 1. This retrospective observational study reviewed 3,390 hospitalized patients diagnosed with INS at the Children’s Hospital of Chongqing Medical University, Chongqing, China, from January 2012 to December 2022 as the derivation and internal validation cohort. The inclusion criteria were as follows: (1) Meeting the diagnostic criteria for INS of the Subspecialty Group of Renal Diseases, the Society of Pediatrics, Chinese Medical Association: (a) a qualitative morning urine protein result of (+++) to (++++) on three or more occasions within one week, a morning urine protein-to-creatinine ratio (mg/mg) ≥ 2.0, or 24-hour urine protein excretion ≥ 50 mg/kg; (b) serum albumin ≤ 25 g/L; (c) serum cholesterol > 5.7 mmol/L; and (d) edema of varying degrees. The first two criteria, (a) and (b), had to be fulfilled simultaneously, and secondary and hereditary factors had to be ruled out [17]. (2) Age of 6 months to 17 years with complete clinical and laboratory information. (3) No congenital or secondary nephrotic syndrome. An external dataset of 356 patients admitted to three independent hospitals across China (Xuzhou Children’s Hospital, Chengdu Women’s and Children’s Central Hospital, and Chongqing University Three Gorges Hospital) between January 2022 and December 2022 was used for external validation. The inclusion and exclusion criteria were identical to those used for the internal cohort.

Fig. 1

Flowchart of the study cohort

Demographic characteristics, vital sign data, and laboratory test results collected within 48 h of admission were used to determine significant features and construct the predictive model. AKI was defined and classified in accordance with the 2012 Kidney Disease: Improving Global Outcomes (KDIGO) guidelines [18]. Given that oliguria is common in INS itself, serum creatinine (SCr) was used as the sole criterion for diagnosing AKI. The latest SCr value within the preceding 6 months was defined as the baseline SCr. If no such value was available, the lowest SCr value during hospitalization was defined as the baseline. AKI stage 1 was defined as mild AKI, whereas AKI stages 2 and 3 were defined as severe AKI. CKD was defined as kidney dysfunction lasting > 3 months and was classified into five stages according to the glomerular filtration rate [19]. CKD stages 3, 4, and 5 were defined as severe CKD.
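The baseline rule and the SCr-based staging described above can be illustrated with a minimal Python sketch. It is not the code used in this study. It assumes that staging relies only on the ratio of SCr to baseline (1.5, 2.0, and 3.0 times baseline for stages 1, 2, and 3), and it deliberately omits the other KDIGO criteria (absolute SCr rise, absolute SCr level, renal replacement therapy, and the pediatric eGFR criterion). The function names and the example values are hypothetical.

```python
from typing import List, Optional


def baseline_scr(prior_scr: Optional[float], inpatient_scr: List[float]) -> float:
    """Pick the baseline SCr as described in the text.

    prior_scr: latest SCr from the preceding 6 months, or None if unavailable.
    inpatient_scr: all SCr values measured during the hospitalization.
    """
    if prior_scr is not None:
        return prior_scr
    return min(inpatient_scr)


def kdigo_stage(scr: float, baseline: float) -> int:
    """Return 0 (no AKI) or stage 1-3 from the SCr-to-baseline ratio only.

    Simplification: the absolute-rise, absolute-level, dialysis, and
    eGFR criteria of KDIGO are intentionally left out of this sketch.
    """
    ratio = scr / baseline
    if ratio >= 3.0:
        return 3
    if ratio >= 2.0:
        return 2
    if ratio >= 1.5:
        return 1
    return 0


def severity(stage: int) -> str:
    """Map a stage to the grouping used in this study."""
    if stage == 0:
        return "no AKI"
    return "mild AKI" if stage == 1 else "severe AKI"


# Illustrative values in umol/L (not patient data).
base = baseline_scr(None, [55.0, 40.0, 62.0])
stage = kdigo_stage(90.0, base)
print(base, stage, severity(stage))
```

Because the ratio is unit-free, the same function works for SCr in mg/dL or µmol/L as long as both values share a unit.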

This study was performed in accordance with the Declaration of Helsinki. It was approved by the Institutional Review Boards of the Children’s Hospital of Chongqing Medical University (No. 2023-487), Xuzhou Children’s Hospital (No. 2024-05-57-H57), Chengdu Women’s and Children’s Central Hospital (No. 2025(63)), and Chongqing University Three Gorges Hospital (No. 2025(010)). All patients provided informed consent.

Data collection

Demographic characteristics and clinical and laboratory data were collected from electronic medical records. The demographic and clinical dataset included sex, age at onset, complications, clinical classification, pathological type, medication exposure, weight, height, admission times, length of hospital stay, and use of hemopurification. Nephrotoxic antibiotics included cefazolin, sulfonamides, levofloxacin, vancomycin, amphotericin, voriconazole, and rifampicin. The laboratory dataset consisted of white blood cell count, red blood cell count, platelet count (PLT), hemoglobin, total cholesterol, triglycerides, high-density lipoprotein (HDL), low-density lipoprotein (LDL), blood glucose, total bilirubin, serum albumin (ALB), serum globulin, alanine transaminase (ALT), aspartate transaminase (AST), lactate dehydrogenase (LDH), alkaline phosphatase, serum sodium, serum potassium, serum calcium, serum phosphate, serum chloride, serum magnesium, uric acid, SCr, baseline SCr, blood urea nitrogen (BUN), baseline BUN, urine pH, and 24-hour proteinuria. During the subsequent analysis, features with a missing value ratio of > 25% were removed to minimize the bias that incomplete data could introduce.
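The missing-value filter described above can be sketched in a few lines of Python. This is only an illustration and not the study's code. It assumes that each patient is a dictionary in which a missing measurement is stored as None; the function names, feature names, and values are hypothetical.

```python
def missing_ratio(records, feature):
    """Fraction of records in which `feature` is absent or None."""
    missing = sum(1 for r in records if r.get(feature) is None)
    return missing / len(records)


def drop_sparse_features(records, features, threshold=0.25):
    """Keep features whose missing ratio does not exceed the threshold.

    A feature is removed only when its ratio is strictly greater than
    the threshold, matching the "> 25%" rule in the text.
    """
    return [f for f in features if missing_ratio(records, f) <= threshold]


# Toy records (not patient data): "ldl" is missing in 2 of 4 rows.
records = [
    {"plt": 310, "ldl": 4.1, "urine_ph": 6.0},
    {"plt": None, "ldl": None, "urine_ph": 6.5},
    {"plt": 280, "ldl": None, "urine_ph": 5.5},
    {"plt": 295, "ldl": 3.8, "urine_ph": 6.0},
]
kept = drop_sparse_features(records, ["plt", "ldl", "urine_ph"])
print(kept)
```

In this toy example "plt" is missing in exactly 25% of rows and is therefore retained, whereas "ldl" (50% missing) is dropped.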

Model development, validation, and explanation

The dataset obtained from the Children’s Hospital of Chongqing Medical University cohort was divided into different portions after being balanced using SMOTE and Tomek links. Specifically, 70% of the data was allocated for training purposes, 10% for testing, and the remaining 20% was used for internal validation to prevent overfitting. Furthermore, an independent external dataset gathered from three hospitals across China was used for testing (external validation).
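A minimal Python sketch of a 70%/10%/20% partition is given below for illustration. It is not the code used in this study: the resampling step (SMOTE and Tomek links) is outside the sketch, the split is a simple seeded shuffle without stratification, and the function name and seed are assumptions.

```python
import random


def split_70_10_20(rows, seed=0):
    """Shuffle rows and split them into train (70%), test (10%),
    and internal validation (remaining ~20%) subsets."""
    rng = random.Random(seed)      # fixed seed for a reproducible split
    shuffled = list(rows)          # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = n * 7 // 10          # integer arithmetic avoids float rounding
    n_test = n // 10
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    valid = shuffled[n_train + n_test:]
    return train, test, valid


train, test, valid = split_70_10_20(range(100))
print(len(train), len(test), len(valid))
```

Every row ends up in exactly one subset, so the three parts can be recombined to recover the original cohort.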

All demographic characteristics, as well as clinical and laboratory data, were used as variables in the model. Missing data were handled using the multiple imputation method. Six methods (Chi-square test, Pearson correlation coefficient, LASSO regression, mutual information, recursive feature elimination, and Random Forest [RF]) were used to identify biomarkers closely related to AKI, thereby enhancing the predictive performance of the model and reducing the risk of overfitting. The selection frequency of each variable was computed using these six methods, and the top 15 variables with the highest frequencies were selected as the final key feature set.
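The frequency-based selection step can be sketched as follows. This is an illustrative Python sketch, not the study's implementation: it assumes that each of the selection methods has already produced its own list of chosen features, and it breaks ties alphabetically, which is an assumption because the text does not describe tie handling. The method and feature names in the example are hypothetical.

```python
from collections import Counter


def top_k_by_selection_frequency(selections, k):
    """Rank features by how many methods selected them and keep the top k.

    selections: dict mapping a method name to the features it selected.
    Ties are broken alphabetically (an assumption of this sketch).
    """
    counts = Counter()
    for chosen in selections.values():
        counts.update(set(chosen))   # each method votes at most once per feature
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:k]]


# Toy example with three methods instead of six.
selections = {
    "chi_square": ["urine_ph", "ldl", "plt"],
    "lasso": ["urine_ph", "ldl"],
    "random_forest": ["urine_ph", "sbp"],
}
print(top_k_by_selection_frequency(selections, 2))
```

With six methods and k = 15, the same function would return the final key feature set described above.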

Based on the 15 key features, five ML models were built: logistic regression (LR), RF, K-nearest neighbors (KNN), naïve Bayes (NB), and support vector machine (SVM). These were then combined into a stacking ensemble model that used RF, KNN, NB, and SVM as the base learners and LR as the meta-learner, which integrated the predictions of the base learners to enhance the overall predictive ability of the model. A diverse set of evaluation metrics was used to thoroughly assess the performance of the developed model. The sensitivity, specificity, accuracy, area under the curve (AUC), area under the precision-recall curve (AUCPR), positive predictive value (PPV), negative predictive value (NPV), and balanced accuracy were computed using R software (version 4.4.1, https://www.r-project.org) to evaluate predictive effectiveness. Additionally, during the verification process, both five- and ten-fold cross-validations were performed to confirm the reliability of the prediction model.
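For reference, the threshold-based metrics listed above all follow from the four cells of the confusion matrix. The Python sketch below shows the standard definitions; it is not the R code used in this study, the function names are hypothetical, and it assumes binary labels (1 = AKI, 0 = non-AKI) with at least one case in every row and column of the confusion matrix.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels coded as 1 and 0."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred):
    """Compute the threshold-based metrics named in the text."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


# Toy labels (not study data).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
for name, value in classification_metrics(y_true, y_pred).items():
    print(f"{name}: {value:.3f}")
```

AUC and AUCPR are not covered here because they depend on the predicted probabilities rather than on a single classification threshold.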

To enhance the transparency and reliability of the models in real-world applications, this study employed the SHapley Additive exPlanations (SHAP) algorithm to conduct an in-depth analysis of the internal workings of each ML model. An RF model was used to predict the occurrence of severe CKD in the AKI cohort.

Statistical analyses

All statistical analyses were performed using the R software. In descriptive statistics, continuous variables that adhered to a normal distribution were presented as the mean ± standard deviation, whereas those that did not follow a normal distribution were described using the median along with the first (Q1 or P25) and third quartile (Q3 or P75). Categorical variables were reported as frequencies and percentages. All statistical analyses were conducted using two-tailed tests, and a P-value < 0.05 indicated a significant difference between the groups.

Results

Baseline characteristics

Among the 3,390 patients with INS, 437 were diagnosed with AKI (incidence rate, 12.9%). The baseline demographic characteristics and clinical and laboratory data are shown in Tables 1 and 2, respectively. A total of 59 baseline features were collected, and the baseline levels of 47 features, including age at onset, infection status, clinical classification, pathological classification, medication exposure, neutrophil count, and serum creatinine level, were significantly different between the AKI and non-AKI cohorts. Respiratory tract infections were the most common infections in both groups, whereas serious infections occurred more frequently in the AKI group than in the non-AKI group. More cases in the AKI cohort presented as steroid-resistant nephrotic syndrome than in the non-AKI cohort. A total of 88 individuals underwent renal biopsy, of whom seven were in the AKI cohort. Within this subgroup, one patient exhibited minimal change disease (MCD), four were diagnosed with focal segmental glomerulosclerosis (FSGS), and two had mesangial proliferative glomerulonephritis. Among the 81 patients assigned to the non-AKI cohort, pathological evaluations identified 54 cases of MCD, 14 cases of FSGS, 13 cases of mesangial proliferative glomerulonephritis, two cases of IgA nephropathy, one case of membranous nephropathy, one case of lipoprotein glomerulopathy, and one case of C1q nephropathy. More patients in the AKI cohort underwent hemodialysis or hemodiafiltration than in the non-AKI cohort. Moreover, given the growing application of rituximab in INS, a higher utilization rate of rituximab was observed in the AKI group than in the non-AKI group. Most cases were stage 1 or stage 2 AKI, and patients with stage 3 AKI were older than those with stage 1 or 2 AKI (Supplementary Table S1).

Table 1 Clinical and demographic characteristics of the cases in the two groups
Table 2 Laboratory characteristics of the cases in the two groups

Selection of key features

Features were ranked by their contribution to the prediction of AKI in patients with INS, and 15 key features were selected, including systolic pressure, urine pH, exposure to nephrotoxic antibiotics, admission times, length of hospital stay, LDL, diastolic pressure (DBP), PLT, eosinophilia, exposure to cyclophosphamide, respiratory tract infection, exposure to furosemide, exposure to tacrolimus, and triglycerides (Supplementary Figure S1).

Identification of the final model and model performance

Among the six ML models built on the selected key features, the stacking model demonstrated the best performance in predicting AKI in patients with INS, reaching an accuracy of 0.858, a sensitivity of 0.789, a specificity of 0.815, an AUC of 0.888, an AUCPR of 0.892, a PPV of 0.816, and an NPV of 0.787 (Fig. 2, Supplementary Table S2).

Fig. 2

Performance of all machine learning models to predict AKI. Receiver operating characteristic (ROC) curve (a) and precision recall (PR) curve (b) of every model

Decision curve analysis was used to assess the clinical applicability of each model. Across a broad range of threshold probabilities, the net benefit of the stacking model exceeded that of the two extreme curves (treating all patients and treating none), indicating that the stacking model had the strongest clinical utility. A calibration curve was used to assess the accuracy of each prediction model. The curve of the stacking model was most closely aligned with the 45° diagonal line, indicating a satisfactory calibration level (Supplementary Figure S2).
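The quantity plotted in a decision curve is the net benefit at a given threshold probability, conventionally defined as TP/n − (FP/n) × pt/(1 − pt). The Python sketch below illustrates this standard definition together with the treat-all reference curve (the treat-none curve is zero everywhere). It is not the code used in this study; the function names and the toy values are hypothetical.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold."""
    n = len(y_true)
    tp = sum(1 for t, p in zip(y_true, y_prob) if p >= threshold and t == 1)
    fp = sum(1 for t, p in zip(y_true, y_prob) if p >= threshold and t == 0)
    odds = threshold / (1 - threshold)   # weight given to a false positive
    return tp / n - fp / n * odds


def net_benefit_treat_all(y_true, threshold):
    """Reference curve: every patient is treated regardless of the model."""
    prevalence = sum(y_true) / len(y_true)
    odds = threshold / (1 - threshold)
    return prevalence - (1 - prevalence) * odds


# Toy predictions (not study data).
y_true = [1, 1, 0, 0, 0]
y_prob = [0.9, 0.6, 0.4, 0.2, 0.1]
for pt in (0.1, 0.3, 0.5):
    print(pt, net_benefit(y_true, y_prob, pt), net_benefit_treat_all(y_true, pt))
```

A model is considered clinically useful over the range of thresholds in which its net benefit lies above both reference curves.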

The performance of the model was evaluated using both the internal validation dataset and the external validation dataset from three other hospitals in different regions of China. The stacking model continued to exhibit outstanding performance, achieving an AUC of 0.822 in the external validation set (Fig. 3).

Fig. 3

External validation of all machine learning models to predict AKI. Receiver operating characteristic (ROC) curve (a) and precision recall (PR) curve (b) of every model

Additional cross-validation was performed to establish a suitable sample size for this study and evaluate the resilience of the stacking model to variations across different sites. The final stacking model attained mean AUC scores of 0.893 (95% confidence interval [CI], 0.885–0.901) in the five-fold cross-validation and 0.894 (95% CI, 0.887–0.902) in the ten-fold cross-validation (Supplementary Figure S3).
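As an illustration of how k-fold results can be summarized, the Python sketch below partitions sample indices into k folds and reports the mean of the per-fold scores with a normal-approximation 95% CI (mean ± 1.96 × standard error). The text does not state how the reported intervals were derived, so the CI formula here is an assumption, and the fold scores in the example are made up for demonstration rather than taken from this study.

```python
import math
import statistics


def k_fold_indices(n, k):
    """Partition indices 0..n-1 into k contiguous folds whose sizes
    differ by at most one. Shuffle the data beforehand if needed."""
    folds = []
    start = 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def mean_with_ci(scores, z=1.96):
    """Mean of per-fold scores with a normal-approximation CI.

    Assumption: the interval is mean +/- z * (sample SD / sqrt(k)).
    """
    mean = statistics.mean(scores)
    se = statistics.stdev(scores) / math.sqrt(len(scores))
    return mean, mean - z * se, mean + z * se


# Hypothetical per-fold AUC values for a five-fold run.
fold_auc = [0.88, 0.89, 0.90, 0.89, 0.90]
mean, low, high = mean_with_ci(fold_auc)
print(f"mean AUC {mean:.3f} (95% CI {low:.3f}-{high:.3f})")
```

Each fold serves once as the held-out set while the remaining folds are used for training, which yields k scores per run.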

Model explanation

Given that clinicians struggle to accept predictive models that cannot be directly explained or understood, the SHAP method was used to interpret the final stacking model output. This involved calculating the contribution of each variable to the prediction, thereby providing clarity regarding the outcomes of the model.

The SHAP summary plot illustrates the relative impact of each feature on the model output within the internal validation set ranked in descending order according to the mean absolute SHAP value (Fig. 4a). Positive SHAP values indicate an increased risk of AKI, with higher SHAP values corresponding to a greater risk. The top five features that were positively correlated with AKI were exposure to nephrotoxic antibiotics, cyclophosphamide, respiratory tract infection, urine pH, and admission times (Fig. 4b).
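To make the meaning of a SHAP value concrete, the Python sketch below computes exact Shapley values for a toy two-feature risk function by averaging each feature's marginal contribution over all feature subsets. It is only an illustration: the toy model, its coefficients, and the single all-zero baseline are assumptions, whereas SHAP implementations typically use a background dataset and approximation algorithms rather than full enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of each feature for one sample.

    Features outside the coalition are set to their baseline value.
    The cost grows exponentially with the number of features.
    """
    n = len(x)

    def value(subset):
        mixed = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for s in combinations(others, size):
                coalition = set(s)
                total += weight * (value(coalition | {i}) - value(coalition))
        phi.append(total)
    return phi


def toy_risk(v):
    """Hypothetical risk score with an interaction between two exposures."""
    return 0.1 + 0.3 * v[0] + 0.2 * v[1] + 0.1 * v[0] * v[1]


x = [1, 1]          # both exposures present
baseline = [0, 0]   # reference patient with neither exposure
phi = shapley_values(toy_risk, x, baseline)
print(phi, sum(phi), toy_risk(x) - toy_risk(baseline))
```

The contributions sum to the difference between the prediction for the sample and the prediction for the baseline, which is the additivity property that force and waterfall plots visualize.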

Fig. 4

Global model explanation by the SHAP method. The mean absolute SHAP value summary plot indicates the mean impact of each feature on the prediction outcome for AKI (a). The SHAP summary plot of 15 features included in the Stacking model (b). A SHAP value above zero suggests an elevated likelihood of AKI, with a higher SHAP value corresponding to a greater risk of AKI. In the internal validation dataset, each sample is illustrated by a point for every feature. The color of each sample’s point is dictated by its feature value, while vertical density is represented through accumulation

The SHAP force and waterfall plots not only displayed the predicted probability of AKI but also showed how an individual’s characteristics influenced the model outputs, providing a personalized interpretation for AKI prediction. For example, Fig. 5a and c present a positive prediction of AKI (the predicted probability was 74%), indicating that exposure to tacrolimus, cyclophosphamide, nephrotoxic antibiotics, respiratory tract infection, and urine pH are key factors associated with a higher predicted risk of AKI. These indicators suggest that nephrotoxic drugs and infections may contribute significantly to a high-risk profile. Figure 5b and d show a negative prediction of AKI (the predicted probability was 7%).

Fig. 5

Local model explanation by the SHAP method. Force plots and waterfall plots of the risk contributed by each feature for an individual child at high (a, c) or low (b, d) risk of developing AKI

Web server of the model

To enhance the application of the stacking model, it was implemented on a website that provides AKI risk predictions in patients with INS. This tool was used to predict the probability of AKI occurrence in two distinct cases of INS. The predicted probabilities were 95% (Fig. 6a) and 4% (Fig. 6b), respectively. The website was created at http://94.191.23.246:5003.

Fig. 6

Convenient application for clinical utility. The stacking model, which incorporates 15 features for convenient use, can serve as a tool for predicting AKI. Upon entering the actual values of these 15 features, the application will automatically present the predicted probability at 95% (a) or 4% (b), corresponding to the outcomes of AKI (a) or non-AKI (b)

Morbidity prediction of severe CKD in the AKI cohort

After a 6-month follow-up, 15 and 29 patients in the AKI and non-AKI cohorts, respectively, developed severe CKD. Fifteen key feature variables were used to predict CKD in patients with AKI and investigate the risk factors associated with CKD development following AKI (Table 3). However, the baseline levels of three features (ALT, AST, and serum phosphate) differed significantly between the AKI-CKD and non-AKI-CKD cohorts. Compared with the non-AKI-CKD group, patients in the AKI-CKD group presented with higher serum lipid and phosphate levels and lower serum calcium, ALT, and AST levels.

Table 3 Key characteristics of the cases with severe CKD in the AKI cohort

An RF model was used to predict severe CKD in patients with INS and AKI, which reached a sensitivity of 0.884, specificity of 0.975, AUC of 0.976, and AUCPR of 0.949 in the testing set (Supplementary Table S3).

Discussion

The precise etiology and pathogenesis of INS remain unclear. INS involves immune-mediated inflammation, and AKI is its most severe clinical complication [20, 21]. The occurrence of AKI can also impede the alleviation of INS symptoms [22]. Following the onset of AKI, patients are at risk of developing chronic renal insufficiency and may require prolonged renal replacement therapy, irrespective of renal function normalization [23]. To the best of our knowledge, this is the first large-scale multicenter study of hospitalized Chinese children with INS to focus on a comprehensive analysis of AKI prediction. This study used six ML models targeted at the pediatric population with INS. A range of predictive risk factors was determined, and a prediction model for patients with INS was developed. Through the application of ML techniques, this model integrates clinical and laboratory information that can be readily obtained from an electronic medical record system.

Precise epidemiological data are currently unavailable, and the reported incidence of AKI in children with INS differs across studies. A study of 17 pediatric nephrology centers across the United States revealed that nearly 58.6% of 336 children with INS experienced AKI [10]. A nationwide study in Japan indicated that the incidence rate of severe AKI in children with INS at the first onset was 24% [24]. In this study, the incidence of AKI was 12.9%, which differs from the findings of other studies. This discrepancy may be attributed to variations in regional characteristics and ethnic factors.

The combination of AKI and INS may be associated with advancing age [10, 25], a finding that was corroborated in this study. Children in the AKI group exhibited a higher average age of onset than those in the non-AKI group. This study found that boys have a higher susceptibility to AKI, which is consistent with earlier findings [26]. Patients in the AKI group tended to remain in the hospital for a longer duration than those in the non-AKI group [27]. This prolonged hospitalization not only results in elevated medical expenses but also increases the risk of additional complications. Consistently, in this dataset, children diagnosed with AKI had an average hospital stay of 10 days and four hospital admissions, whereas those without AKI had an average hospital stay of 7 days and two hospital admissions. This finding could be attributed to the observation that patients in the AKI group are frequently steroid-resistant [28], with most pathological types being non-minimal change diseases [29].

Hypertension and infection may serve as significant risk factors for the development of AKI and could potentially lead to the advancement of severe CKD, including end-stage renal disease [8, 30], both of which are more commonly observed in patients with AKI, consistent with previous findings. Thus, managing infection and blood pressure plays a crucial role in decreasing the incidence of AKI. Exposure to nephrotoxic drugs, such as renin-angiotensin-aldosterone system inhibitors [31], diuretics [32], calcineurin inhibitors [33], and nephrotoxic antibiotics [34], is associated with the progression of AKI. As the use of biological agents for INS increases [35], several studies have indicated that rituximab can be safely administered to patients with INS and AKI [36]. However, this study revealed that the administration of rituximab was considerably more frequent in the AKI group than in the non-AKI group, despite the relatively small sample size. Additional investigations involving larger cohorts are necessary to confirm these findings.

Notably, compared to the non-AKI group, patients in the AKI group exhibited decreased levels of blood calcium and urine pH, along with increased levels of blood phosphorus, blood potassium, and blood uric acid. Persistent proteinuria and severe hypoalbuminemia occur more frequently in patients with AKI than in those without AKI [30]. In contrast, other reports have indicated no significant differences in serum albumin levels between individuals with and without AKI [6]. In this study, the AKI group exhibited lower serum albumin levels; however, these findings were not significant. The 24-hour urinary protein levels were significantly higher in the AKI group. A systematic evaluation revealed that higher serum uric acid concentrations are associated with an increased likelihood of AKI [22]. Moreover, patients with INS presenting with serum uric acid levels of 390 µmol/L or greater upon admission have a heightened risk of developing AKI [6]. A reduction in urine pH may be associated with AKI, which can result from increased serum uric acid levels and oxygen deficiency in kidney tissues [36].

AKI severity positively correlates with the duration of renal function recovery [37]. Notably, in this cohort, 15 patients progressed to severe CKD (stage 3 or higher), highlighting the crucial need for timely identification and management of AKI. ALT and AST levels were used as predictive factors for severe CKD in pediatric patients with AKI, which has been documented in adult patients. Clinical studies have revealed that serum ALT and AST levels progressively decline in patients with CKD as their condition advances [38, 39]. A study on patients with type 2 diabetes mellitus in China revealed that an elevated AST/ALT ratio serves as an independent risk factor for diabetic kidney disease, consistent with the positive correlation between this ratio and CKD observed in the present study. The underlying mechanism may involve the positive association between elevated AST/ALT ratio and pro-inflammatory cytokines such as tumor necrosis factor-alpha, interleukin-4, and interleukin-6, which potentially mediate the inflammatory pathway leading to CKD development [40]. Further in-depth research is warranted regarding these aspects in pediatric nephropathy-associated CKD.

The final model incorporated 15 easily accessible variables, which can be obtained from the electronic medical record systems of hospitals. This sophisticated yet comprehensible predictive tool can assist clinicians in identifying high-risk patients at an early stage while also providing personalized insights into significant risk factors. Consequently, it provides a vital opportunity for timely intervention and enhances informed decision-making. Furthermore, the model leverages the largest sample of Chinese children diagnosed with INS to date, comprising data from four hospitals throughout China, thereby enhancing its statistical power and reliability. However, one limitation of this model is that most samples originated from hospitals located in western China, which may lead to bias.

Conclusions

This study designed and evaluated a straightforward and comprehensible ML model for predicting AKI in real time among hospitalized children with INS. This model used routinely gathered data from electronic health record (EHR) systems and demonstrated strong predictive performance in identifying children with INS who had a high likelihood of developing AKI, as validated in a cohort sourced from four distinct hospitals. The broad implementation of this model in a clinical setting could serve as a practical, efficient, and economical method for preventing AKI. Additionally, it can be applied across a broader range of healthcare services and benefit a larger demographic.