Introduction

The HELLP syndrome, first reported in 1982, is a rare and sudden complication occurring in women during pregnancy or after childbirth [1,2,3]. The syndrome is defined by the triad of hemolysis, elevated liver enzymes, and low platelet count, and it affects 0.2–0.6% of pregnancies worldwide [1, 4]. Since the causes and pathogenesis of HELLP syndrome are not yet fully understood, distinguishing it from other pregnancy-related disorders is complicated [1, 5, 6]. Furthermore, late diagnosis delays treatment, which challenges disease management [5, 6]. In addition to complications such as intravascular coagulation, placental abruption, pulmonary edema, and retinal detachment, this syndrome causes perinatal complications, including death [2, 4]. Consequently, timely diagnosis of HELLP syndrome is vital to prevent complications in the mother and fetus [7].

Related studies have usually included a limited number of HELLP patients and did not provide comprehensive information on the prognostic factors for adverse outcomes [2]. While some scholars have described HELLP syndrome as an advanced form of preeclampsia, others have considered it a condition of a different nature [6, 8]. Given this discrepancy in the findings, HELLP syndrome needs to be recognized as an entity distinct from other related complications of pregnancy [9]. Although some algorithms use the mother’s biochemical and clinical changes in early pregnancy to predict preeclampsia, no accurate algorithm has been found to predict HELLP syndrome. Although its relationship with preeclampsia remains controversial, HELLP syndrome has been described as a hypertensive pregnancy disorder with a more severe inflammatory reaction than preeclampsia [8]. HELLP can also differ from preeclampsia because 15–20% of patients with HELLP do not have antecedent hypertension or proteinuria [10].

The prediction and early diagnosis of pregnancy-related diseases enable physicians to take preventive measures and to adopt more effective, risk-based pregnancy care pathways [11, 12]. Clinical decisions are often based on the intuition and experience of practitioners rather than on the rich information found in scientific databases. This practice leads to unwanted bias, errors, and excessive medical costs, which affect the quality of services provided to patients [13]. Patient records are among the main sources for conducting medical research [9]. Given the exponential growth of medical databases and resources, extracting knowledge from all available data using traditional processing and analysis methods is time-consuming. Informatics tools, such as data mining, play a significant role in analyzing these data for diagnosis, prognosis, and treatment purposes. Data mining uses various techniques to determine and analyze hidden information and extract useful knowledge from data, and these techniques can be applied to data from all fields of science, including medicine [14,15,16]. In this vein, discovering hidden patterns through data mining has significantly improved our understanding of disease diagnosis, progression, management, quality of care, and clinical decision-making by medical professionals (as the main factor of success in the healthcare process) [16,17,18,19]. Insights obtained from data mining indicate that maintaining a high level of care can affect cost, income, and operational efficiency [1]. The purpose of developing a model in data mining projects is to discover knowledge and achieve results that are practical in the future [18].

Since the causes of HELLP syndrome are not well understood, the present study employed a data mining process to discover the knowledge required to prevent and diagnose this syndrome in time. Nine data-mining algorithms were used to investigate the retrospective data (including the demographic, clinical, and molecular factors affecting the diagnosis of HELLP syndrome) collected from a population of mothers within 25–37 weeks of pregnancy who showed evidence of hemolysis, low platelets, and abnormal liver tests. In addition, the present study aimed to discover the patterns effective in the development of this syndrome and to compare them with those of preeclampsia. Given the difficulty of diagnosing HELLP syndrome, using machine learning (ML) for prediction can make this condition easier to detect and increase the accuracy of diagnoses.

Background

Moreira et al. proposed a model using artificial neural networks (ANNs) and fuzzy logic to predict HELLP syndrome in high-risk pregnancies. In the model, the learning capacity of ANNs was combined with the reasoning ability of fuzzy systems. The model was designed with mobile cloud computing in mind and avoided diffuse inference, which requires considerable computational effort. The study reported an F1 score of 70.5% [1].

In another study, Melinte-Popescu et al. (2023) predicted the severity of HELLP syndrome using machine learning (ML) algorithms. They evaluated and compared the predictive performances of four ML-based models (decision tree [DT], random forest [RF], K-nearest neighbor (KNN), Naïve Bayes [NB]) to predict HELLP syndrome and its subtypes according to the Mississippi classification. In this study, all clinical and paraclinical features, including mother’s age, number of pregnancies, being a smoker, mother’s history of unsuccessful pregnancy, constant blood pressure and chronic kidney diseases, edema, gender, infant death, mother’s death, headache, nausea, epigastric pain, platelet, aspartate aminotransferase, and lactate dehydrogenase were used. The results indicated that HELLP syndrome is better predicted by DT (F1 Score = 94%) and KNN (F1 Score = 94%) [20].

Moreover, Melinte-Popescu et al. published a paper in 2023 entitled “Predictive Performance of Machine Learning-Based Methods For the Prediction of Preeclampsia—A Prospective Study.” This study aimed to evaluate and compare the predictive performances of ML-based models for the prediction of preeclampsia and its subtypes, such as HELLP syndrome. This prospective case-control study evaluated pregnancies in women who attended a tertiary maternity hospital in Romania between November 2019 and September 2022. The patients’ clinical and paraclinical characteristics were evaluated in the first trimester and were included in four ML-based models: DT, NB, support vector machine (SVM), and RF, and their predictive performance was assessed. Early-onset preeclampsia (EO-PE) was best predicted by DT (accuracy: 94.1%) and SVM (accuracy: 91.2%) models, while NB (accuracy: 98.6%) and RF (accuracy: 92.8%) models had the highest performance when used to predict all types of preeclampsia. The predictive performance of these models was modest for moderate and severe types of preeclampsia, with accuracies ranging from 70.6 to 82.4% [4]. The ML-based models could be useful tools for EO-PE prediction and could differentiate patients who will develop preeclampsia as early as the first trimester of pregnancy [21].

Furthermore, Villalaín et al. published a paper on predicting delivery within seven days after diagnosis of EO-PE using ML models. They aimed to develop prediction models, using ML tools, for the need for delivery within seven days of diagnosis (model D) and for the risk of developing HELLP syndrome or abruptio placentae (model HA). Maternal basal characteristics and data gathered at EO-PE diagnosis (gestational age, blood pressure, platelets, creatinine, transaminases, angiogenesis biomarkers [soluble fms-like tyrosine kinase-1 and placental growth factor], and ultrasound data) were pooled for analysis. The most relevant variables were selected by bio-inspired algorithms. They developed basal models that solely included demographic characteristics of the patient (D1, HA1) and advanced models that added information available upon diagnosis of EO-PE (D2, HA2). Model D considered seven days as the window of the effect of antenatal corticosteroids on fetal maturation, whereas model HA calculated the risk of developing HELLP syndrome or abruptio placentae at any point after EO-PE diagnosis, as these are the most acute and hardest-to-predict complications. They tried SVM, KNN, Gaussian Naïve Bayes (GNB), and DT models and selected among them based on the F1 score. At the time of EO-PE diagnosis, SVM with an evolutionary feature selection process provided good predictive information on the need for delivery within seven days and the development of HELLP/abruptio placentae, using maternal characteristics and markers that can be obtained routinely [22].

In 2022, Zheng et al. conducted a retrospective study to compare ML and logistic regression (LR) as predictive models for adverse maternal and neonatal outcomes of preeclampsia. The objective of this study was to evaluate the performance of ML and LR in developing short-term predictive models for binary maternal or neonatal outcomes involving preeclampsia, such as HELLP syndrome. The models were generated by common clinical indicators. They employed LR and six ML methods as binary predictive models for a dataset containing 733 women diagnosed with preeclampsia. The participants were grouped by four different pregnancy outcomes. After the imputation of missing values, statistical description and comparison were conducted preliminarily to explore the characteristics of the documented 73 variables. Sequentially, correlation analysis and feature selection were performed as preprocessing steps to filter contributing variables for developing models. In addition, the models were evaluated by multiple criteria. The RF classifier, multi-layer perceptron (MLP), and SVM demonstrated better discriminative power for prediction by comparing the area under the receiver operating characteristic (ROC) curve. However, the DT classifier, RF, and LR yielded better calibration ability, as verified by the calibration curve [23].

Chen et al. published a paper to predict adverse outcomes in de novo hypertensive disorders of pregnancy (HDP). A multitude of ML statistical methods were employed to develop two prediction models, one for maternal complications, including HELLP syndrome, and the other for perinatal deterioration. The maternal model using the RF algorithm produced an area under the curve (AUC) of 0.984 (95% CI [0.978, 0.991]). The best predictor variables selected by the model were platelet count, fetal head/abdominal circumference ratio, and gestational age at the diagnosis of de novo HDP; the perinatal model using the boosted tree algorithm yielded an AUC of 0.925 (95% CI [0.907, 0.945]). The most robust predictor variables selected were gestational age at the diagnosis of de novo HDP, fetal femur length, and fetal head/abdominal circumference ratio. These prediction models can help identify de novo HDP patients at increased risk of complications who might need intense maternal or perinatal care [24].

Moreover, in 2022, Huang et al. conducted a study that predicted preeclampsia complicated by fetal growth restriction (FGR) and its perinatal outcome based on an ANN model. In this study, the authors adopted an ANN to assess the effect and predictive value of changes in maternal peripheral blood parameters and clinical indicators on pregnancy outcomes, such as HELLP syndrome, in patients with preeclampsia complicated by FGR. A total of 15 factors were correlated with preeclampsia complicated by FGR: maternal age, pre-pregnancy body mass index, inflammatory markers (neutrophil-to-lymphocyte ratio and platelet-to-lymphocyte ratio), coagulation parameters (prothrombin time and thrombin time), lipid parameters (high-density lipoprotein, low-density lipoprotein, and triglyceride counts), platelet parameters (mean platelet volume and plateletcrit), uric acid, lactate dehydrogenase, and total bile acids. A total of six ANNs were constructed using these parameters. The accuracy, sensitivity, and specificity of predicting the following diseases and adverse outcomes were, respectively, 84.3%, 97.7%, and 78% for preeclampsia complicated by FGR; 76.3%, 97.3%, and 68% for provider-initiated preterm births; 81.9%, 97.2%, and 51% for the severity of FGR; 80.3%, 92.9%, and 79% for premature rupture of membranes; 80.1%, 92.3%, and 79% for postpartum hemorrhage; and 77.6%, 92.3%, and 76% for fetal distress [25].

Ejiwale et al. published a paper in 2021 entitled “Prediction of Concurrent Hypertensive Disorders in Pregnancy and Gestational Diabetes Mellitus Using ML Techniques.” This retrospective study sought to investigate, construct, evaluate, compare, and isolate a supervised ML predictive model for the binary classification of co-occurring HDP and gestational diabetes mellitus (GDM) in a cohort of otherwise healthy pregnant women. A total of 33 models were constructed with the following six supervised ML algorithms: LR, RF, DT, SVM, Stacking Classifier (SC), and Keras Classifier (a deep learning [DL] classification algorithm). All the algorithms were evaluated using the Stratified K-fold cross-validation (k = 10) method. The findings of this study indicated that the use of readily available routine prenatal attributes and appropriate ML methods can reliably predict the co-existence of HDP and GDM [26].

In another investigation in 2020, Maric et al. aimed to predict preeclampsia through ML. This study sought to use all clinical and laboratory data available during prenatal visits in early pregnancy to develop a predictive model for preeclampsia, including HELLP syndrome. A total of 16,370 records were used in this study, and the 67 variables examined in different models included the mother’s characteristics, medical history, routine prenatal laboratory results, and drug use. A set of significant features for prediction was identified, and the risk of preeclampsia was predicted using routinely collected information. Two algorithms, gradient boosting and elastic net, were employed, and the highest performance was an AUC of 89% [27].

Furthermore, in 2019, Moreira et al. published a paper entitled “Data Analytics in Mobile Health Environments For High-risk Pregnancy Outcome Prediction.” They proposed the development, performance evaluation, and comparison of ML algorithms based on Bayesian networks capable of identifying at-risk pregnancies based on the symptoms and risk factors presented by the patients. The performance comparison of several Bayes-based ML algorithms determined the best-suited algorithm for predicting, identifying, and accompanying HDP. The contribution of this study focused on finding a smart classifier for the development of novel mobile devices, which presented reliable results in the identification of problems related to pregnancy. Through the well-known cross-validation method, this proposal was evaluated and compared with other recent approaches. The averaged one-dependence estimators presented better results on average than the other approaches. These findings are key to improving the health monitoring of women suffering from high-risk pregnancies around the world. Therefore, this study can contribute to a reduction in both maternal and fetal deaths [28].

Table 1 presents a summary of the reviewed literature.

Table 1 Summary of the reviewed literature

Materials and methods

The present research is a descriptive cross-sectional study conducted in four stages. In the first stage, data elements were identified. Then, patient records were gathered, and in the third stage, the dataset was preprocessed and prepared for modeling. Finally, ML models were implemented and evaluated.

Data element identification

Given the nature of this study, the first step was to identify the effective data elements in the diagnosis of HELLP syndrome, based on which data collection should be conducted. In order to identify the data elements, first, a literature review was conducted. According to the minimum clinical datasets, this literature review ensures that the set of data elements is considered for inclusion in the comprehensive set of elements [29]. The literature review was conducted using electronic databases, including the Scientific Information Database (SID, in Persian), PubMed, Scopus, Web of Science, Medline, and the Google Scholar search engine, to identify appropriate related resources. All the full texts were assessed, and data elements were extracted after excluding the irrelevant papers. Then, an interview was conducted with a group of experts in the field of obstetrics. These interviews were face-to-face, and each interview was conducted for one hour at a maximum. After five interviews and a thematic analysis of the data obtained using MAXQDA (version 12) software, data elements were extracted.

Data collection

In the second stage, patients’ records were collected at Shohadaye Tajrish Hospital in Tehran, Iran. These data were collected from patients between 2010 and 2021. In this study, the data of 384 pregnant mothers were analyzed, including 376 with pre-eclampsia (129 with severe pre-eclampsia, 175 with moderate pre-eclampsia, and 72 with mild pre-eclampsia), 2 with eclampsia, and 6 with HELLP syndrome. It is worth mentioning that the data were collected without identifying variables, such as name, surname, and national ID number.

Data preparation

In the third stage, the developed dataset was prepared for modeling. The dataset did not contain missing data; however, because it was imbalanced, the Synthetic Minority Oversampling Technique (SMOTE) was employed to balance it. The validity of the data and labels was rechecked by an expert, and any outlier was replaced with the correct value from the patient’s medical record. Considering that the main objective of the present study was the diagnosis of HELLP syndrome and determining the factors affecting the diagnosis of this disease, patients with HELLP syndrome were labeled as 1 and other cases were labeled as 0.
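The oversampling idea behind SMOTE can be illustrated with a minimal sketch. This is not the implementation used in the study: the function name `smote_like`, the two-feature toy values, and the neighbour count `k` are assumptions made purely for illustration. Each synthetic sample is placed on the line segment between a minority-class sample and one of its nearest minority-class neighbours.

```python
import random

def smote_like(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest minority neighbours of the chosen sample (excluding itself).
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Hypothetical two-feature minority class (values are illustrative only).
hellp = [[60.0, 250.0], [75.0, 310.0], [90.0, 280.0], [55.0, 400.0]]
new_points = smote_like(hellp, n_new=6)
labels = [1] * (len(hellp) + len(new_points))  # HELLP cases are labelled 1
```

Because every synthetic point is a convex combination of two real minority samples, it always stays inside the range spanned by the observed minority values.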

Modeling and evaluation

In the final stage, multiple algorithms were studied and compared based on their initial results on the data elements. Ultimately, the algorithms with acceptable initial results were chosen for use in this study. These nine ML algorithms included network-based algorithms (MLP and DL), ensemble algorithms (RF, XGBoost, and AdaBoost), and classic algorithms (DT, SVM, LR, and KNN). The ML models were implemented in the Python programming language. For validation, the holdout method was used by dividing the dataset into a 70% training set and a 30% test set, and k-fold cross-validation was used with k = 5 and k = 10. For each implemented model, accuracy, precision, sensitivity, F1 score, and AUC were calculated as evaluation indices. In order to select the best algorithm, the implemented models were compared based on the F1 score. Finally, according to the best model obtained, the importance of the variables in the model as effective features in the diagnosis of HELLP syndrome was reported.
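The evaluation indices named above follow directly from the counts of a binary confusion matrix. The sketch below shows the standard definitions; the function name and the toy label vectors are hypothetical and do not come from the study data.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, sensitivity (recall) and F1 for labels in {0, 1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "sensitivity": sensitivity, "f1": f1}

# Illustrative labels only: 1 = HELLP, 0 = other cases.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
metrics = classification_metrics(y_true, y_pred)
```

The F1 score is the harmonic mean of precision and sensitivity, which is why it is less flattering than accuracy when the positive class is rare.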

Multi-layer Perceptron (MLP)

The MLP algorithm is a type of feedforward ANN that consists of multiple layers of nodes, each connected to nodes in the adjacent layers. It is a supervised learning algorithm that uses a backpropagation algorithm to update the weights of the connections between nodes to minimize the error between the predicted output and the actual output. The MLP is a versatile algorithm that can be applied for various tasks, including classification, regression, and pattern recognition. It is known for its ability to learn complex patterns in data and is commonly used in applications such as image and speech recognition.
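As a rough illustration of the forward pass and the backpropagation update described above, the following sketch trains a network with one hidden layer on a single sample. The layer sizes, learning rate, squared-error loss, and input values are assumptions chosen for brevity; they are not the configuration used in this study.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w_hidden, w_out):
    """Forward pass: inputs -> sigmoid hidden layer -> sigmoid output."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)))
    return hidden, output

def train_step(x, target, w_hidden, w_out, lr=0.5):
    """One backpropagation update for squared error on a single sample."""
    hidden, output = forward(x, w_hidden, w_out)
    delta_out = (output - target) * output * (1.0 - output)
    # Hidden deltas use the output weights before they are updated.
    delta_hidden = [delta_out * w * h * (1.0 - h) for w, h in zip(w_out, hidden)]
    for j, h in enumerate(hidden):
        w_out[j] -= lr * delta_out * h
    for j, row in enumerate(w_hidden):
        for i, xi in enumerate(x):
            row[i] -= lr * delta_hidden[j] * xi
    return 0.5 * (output - target) ** 2  # loss measured before the update

rng = random.Random(0)
w_hidden = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(4)]
w_out = [rng.uniform(-1, 1) for _ in range(4)]
sample, target = [0.2, 0.7, 1.0], 1.0  # last input acts as a bias term
losses = [train_step(sample, target, w_hidden, w_out) for _ in range(200)]
final_output = forward(sample, w_hidden, w_out)[1]
```

Repeating the update drives the prediction toward the target, which is the error-minimizing behaviour the paragraph refers to.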

Deep learning (DL)

The DL is a subset of ML that involves ANNs with multiple layers (hence the term “deep”). These networks are capable of learning complex patterns and relationships in data by automatically extracting and transforming features at different levels of abstraction. The DL algorithms have been successfully applied to various tasks, such as image and speech recognition, natural language processing, and autonomous driving. Some popular DL architectures include convolutional neural networks (CNNs) for image recognition, recurrent neural networks (RNNs) for sequence prediction, and generative adversarial networks (GANs) for generating realistic images.

Random Forest (RF)

The RF is an ensemble learning algorithm that combines multiple DTs to create a more robust and accurate model. Each DT in the RF is built using a subset of the training data and a random selection of features, which helps to reduce overfitting and improve generalization. The final prediction is made by aggregating the predictions of all the individual trees, either through a majority voting mechanism for classification tasks or averaging for regression tasks. The RF is known for its high accuracy, scalability, and ability to handle large datasets with high dimensionality. It is also resistant to overfitting and noise in the data, making it a popular selection for various ML tasks.
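The two ingredients described above, bootstrap sampling with random feature selection and majority voting, can be sketched as follows. To keep the example short, each "tree" is reduced to a single-threshold stump, which is a simplification rather than a full decision tree; the toy data and all names are hypothetical.

```python
import random
from collections import Counter

def fit_stump(rows, feature):
    """Threshold at the midpoint of the class means (stand-in for a tree)."""
    pos = [x[feature] for x, y in rows if y == 1]
    neg = [x[feature] for x, y in rows if y == 0]
    if not pos or not neg:  # the bootstrap sample contained a single class
        constant = 1 if pos else 0
        return lambda x: constant
    m_pos, m_neg = sum(pos) / len(pos), sum(neg) / len(neg)
    threshold = (m_pos + m_neg) / 2
    if m_pos >= m_neg:
        return lambda x: 1 if x[feature] > threshold else 0
    return lambda x: 1 if x[feature] < threshold else 0

def fit_forest(rows, n_trees=25, seed=0):
    rng = random.Random(seed)
    n_features = len(rows[0][0])
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]  # bootstrap: draw with replacement
        feature = rng.randrange(n_features)        # random feature for this learner
        forest.append(fit_stump(sample, feature))
    return forest

def predict(forest, x):
    votes = Counter(tree(x) for tree in forest)
    return votes.most_common(1)[0][0]  # majority vote

# Illustrative rows: ([feature_a, feature_b], label); values are made up.
rows = [([60, 300], 1), ([80, 250], 1), ([70, 280], 1),
        ([220, 30], 0), ([250, 25], 0), ([200, 40], 0)]
forest = fit_forest(rows)
```

Calling `predict(forest, [65, 290])` aggregates the votes of all 25 learners; an individual learner trained on an unlucky bootstrap sample is outvoted by the rest.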

XGBoost

The XGBoost is a powerful and efficient ML algorithm known for its speed and performance in handling large datasets. It belongs to the ensemble learning method of boosting, where multiple weak learners are combined to create a strong learner. The XGBoost utilizes a gradient boosting framework, which focuses on minimizing the loss function by adding new models that complement the shortcomings of existing models. It is highly customizable, allowing users to tune parameters, such as learning rate, maximum depth of trees, and the number of boosting rounds to optimize performance. The XGBoost is frequently employed in various ML competitions and has been proven to achieve state-of-the-art results in various applications.
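The core boosting loop, in which each new model is fitted to the errors left by the current ensemble, can be sketched with plain gradient boosting on a squared loss. This is a generic illustration and not XGBoost itself, which additionally uses regularization and second-order gradient information; the data and parameter values are assumptions.

```python
def fit_stump(xs, residuals):
    """Best single-threshold split minimising squared error of the residuals."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        error = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or error < best[0]:
            best = (error, threshold, lv, rv)
    _, threshold, lv, rv = best
    return lambda x: lv if x <= threshold else rv

def gradient_boost(xs, ys, n_rounds=20, learning_rate=0.3):
    base = sum(ys) / len(ys)  # initial prediction: the mean target
    stumps = []
    preds = [base] * len(xs)
    for _ in range(n_rounds):
        # For squared loss the negative gradient is simply the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for x, p in zip(xs, preds)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)

xs = [1, 2, 3, 4, 5, 6]
ys = [0, 0, 0, 1, 1, 1]
model = gradient_boost(xs, ys)
mean_y = sum(ys) / len(ys)
mse_before = sum((y - mean_y) ** 2 for y in ys) / len(ys)
mse_after = sum((y - model(x)) ** 2 for x, y in zip(xs, ys)) / len(ys)
```

The learning rate, tree depth (fixed at one split here), and number of rounds correspond to the tunable parameters mentioned in the paragraph.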

AdaBoost

The AdaBoost is a popular ensemble learning algorithm that combines multiple weak classifiers to create a strong classifier. The algorithm works by sequentially training a series of weak learners on the training data, with each subsequent learner focusing on the instances that were misclassified by the previous learners. The predictions of each weak learner are then combined through a weighted sum to make the final prediction. The AdaBoost is particularly effective in dealing with complex classification problems and has been successfully applied in a wide range of domains, including computer vision, speech recognition, and bioinformatics.
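The reweighting step that makes each subsequent learner focus on misclassified instances can be written compactly. The sketch below shows one round of the classic discrete AdaBoost update for labels in {-1, +1}; the toy predictions are hypothetical.

```python
import math

def adaboost_round(weights, y_true, y_pred):
    """One AdaBoost round for labels in {-1, +1}: returns (alpha, new_weights)."""
    error = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
    error = min(max(error, 1e-10), 1 - 1e-10)  # avoid division by zero
    alpha = 0.5 * math.log((1 - error) / error)  # vote weight of this learner
    # Misclassified samples (t * p = -1) are up-weighted, correct ones down-weighted.
    updated = [w * math.exp(-alpha * t * p)
               for w, t, p in zip(weights, y_true, y_pred)]
    total = sum(updated)
    return alpha, [w / total for w in updated]

y_true = [1, 1, -1, -1, 1]
y_pred = [1, -1, -1, -1, 1]  # the weak learner misclassifies the second sample
weights = [0.2] * 5
alpha, new_weights = adaboost_round(weights, y_true, y_pred)
```

In this toy case the single misclassified sample ends up carrying half of the total weight, so the next weak learner is strongly pushed to classify it correctly.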

Decision tree (DT)

The DT algorithm is a popular supervised ML technique used for classification and regression tasks. It works by recursively splitting the dataset into subsets based on the value of a certain attribute, with the purpose of developing a tree-like structure where each internal node represents a decision based on an attribute, and each leaf node represents the class label or predicted value. The DTs are easy to interpret and visualize, making them useful for understanding the underlying patterns in the data. However, they can be prone to overfitting if the tree is too deep or complex and may not perform well on datasets with high dimensionality or noisy data. Various extensions and ensemble methods, such as RF and Gradient Boosting, have been developed to address these limitations and improve the performance of DTs.
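How a single split is chosen can be illustrated with the Gini impurity, one common splitting criterion (the text does not state which criterion was used in this study, so this choice is an assumption). The feature values below are invented for illustration.

```python
def gini(labels):
    """Gini impurity of a list of binary labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 1.0 - p * p - (1.0 - p) ** 2

def best_split(values, labels):
    """Threshold on one attribute that minimises the weighted Gini impurity."""
    best_threshold, best_impurity = None, float("inf")
    for threshold in sorted(set(values))[:-1]:
        left = [y for v, y in zip(values, labels) if v <= threshold]
        right = [y for v, y in zip(values, labels) if v > threshold]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if weighted < best_impurity:
            best_threshold, best_impurity = threshold, weighted
    return best_threshold, best_impurity

# Hypothetical attribute values and labels (1 = HELLP, 0 = other).
platelets = [45, 60, 80, 150, 210, 260]
is_hellp = [1, 1, 1, 0, 0, 0]
threshold, impurity = best_split(platelets, is_hellp)
```

A full tree applies this search recursively to each resulting subset until a stopping rule, such as a maximum depth, is reached.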

Support Vector Machine (SVM)

The SVM is a robust supervised learning algorithm commonly used for classification and regression tasks. The main objective of SVM is to find the hyperplane that best separates the data points into different classes by maximizing the margin between the classes. It works by mapping the input data into a higher-dimensional space and finding the optimal hyperplane separating the classes with the maximum margin. The SVM effectively handles high-dimensional data and is known for its ability to generalize well to unseen data. In addition, SVM can handle non-linear data by using kernel functions to map the data into a higher-dimensional space.
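The notions of decision function, margin, and kernel mentioned above can be made concrete with a few lines. The weights are fixed by hand rather than learned, and the points, the `gamma` value, and all names are assumptions for illustration only.

```python
import math

def decision(w, b, x):
    """Signed score of x relative to the hyperplane w.x + b = 0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def hinge_loss(w, b, points, labels):
    """Mean hinge loss; zero when every point lies on or outside the margin."""
    return sum(max(0.0, 1.0 - y * decision(w, b, x))
               for x, y in zip(points, labels)) / len(points)

def margin_width(w):
    """Distance between the two margin hyperplanes: 2 / ||w||."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))

def rbf_kernel(x, z, gamma=0.5):
    """Radial basis function kernel, a common choice for non-linear SVMs."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

points = [(2, 2), (3, 3), (-2, -2), (-3, -1)]
labels = [1, 1, -1, -1]
narrow = margin_width([0.5, 0.5])
wide = margin_width([0.25, 0.25])
loss_wide = hinge_loss([0.25, 0.25], 0.0, points, labels)
```

Both weight vectors separate the toy points, but the second one does so with a wider margin and still incurs zero hinge loss, which is the solution an SVM prefers.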

Logistic regression (LR)

The LR is a statistical model employed for binary classification tasks where the output variable is categorical and includes two classes. It works by estimating the probability that a given input belongs to a particular class, using a logistic function to map the input features to the output. The model calculates the log-odds of the probability that the input belongs to the positive class and then applies a sigmoid function to convert this into a probability score between 0 and 1. During training, the algorithm adjusts the weights of the input features to minimize the error between the predicted probabilities and the actual class labels. The LR is a simple yet powerful algorithm frequently used in various fields, such as healthcare, finance, and marketing, for its interpretability and ease of implementation.
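The mapping from log-odds to probability and the weight adjustment during training can be sketched as follows. The single standardized feature, the learning rate, and the number of epochs are hypothetical choices, and plain stochastic gradient descent stands in for whatever optimizer a library implementation would use.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(weights, bias, x):
    log_odds = bias + sum(w * xi for w, xi in zip(weights, x))
    return sigmoid(log_odds)  # probability of the positive class

def fit(points, labels, lr=0.1, epochs=500):
    weights, bias = [0.0] * len(points[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            # Gradient of the log-loss with respect to the log-odds.
            error = predict_proba(weights, bias, x) - y
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias

# One standardized feature; low values are labelled positive (illustrative only).
points = [[-2.0], [-1.5], [-1.0], [1.0], [1.5], [2.0]]
labels = [1, 1, 1, 0, 0, 0]
weights, bias = fit(points, labels)
p_low = predict_proba(weights, bias, [-1.8])
p_high = predict_proba(weights, bias, [1.8])
```

The sign of the fitted weight indicates the direction of the association, which is the source of the interpretability noted above.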

K-Nearest Neighbor (KNN)

The KNN algorithm is a simple and intuitive ML algorithm for classification and regression tasks. It classifies a new data point according to the majority class among its k nearest neighbors in the training dataset. The value of k is a hyperparameter that can be tuned to optimize the model’s performance. The KNN is a non-parametric algorithm, meaning it makes no assumptions regarding the underlying data distribution. This makes KNN a versatile algorithm that can be applied to a wide range of datasets and is particularly useful for datasets with non-linear relationships. However, KNN can be computationally expensive, especially with large datasets, as it requires calculating the distance between each new point and all data points in the training set.
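A minimal sketch of the voting rule is given below, assuming Euclidean distance and k = 3; the training points are invented for illustration.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Majority class among the k training points closest to the query."""
    distances = sorted((math.dist(p, query), label)
                       for p, label in zip(train_points, train_labels))
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

train_points = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                [3.0, 3.2], [3.1, 2.9], [2.8, 3.0]]
train_labels = [1, 1, 1, 0, 0, 0]
near_positive = knn_predict(train_points, train_labels, [1.1, 1.0])
near_negative = knn_predict(train_points, train_labels, [3.0, 3.0])
```

Every prediction sorts the distances to all training points, which is where the computational cost for large datasets comes from.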

Results

Identified data elements

Table 2 indicates the list of data elements that were identified in this study. Variables were classified into four categories: demographics, medical history, test results, and outcome. In this research, the diagnosis feature in the outcome category was the target feature.

Table 2 The dataset features and their descriptions

Data gathering and preparation

The number of samples gathered in this study was 384. After preliminary pre-processing, such as converting some variables from strings to numbers, the imbalance of the dataset was addressed by testing the Random Under-Sampling, Near Miss Under-Sampling, Adaptive Synthetic Sampling (ADASYN), and SMOTE methods. Among these methods, SMOTE performed best based on the final results. Using the SMOTE method, the number of samples increased from 384 to 757. Before sampling, the positive class with HELLP syndrome accounted for 1.5% of the samples, and after applying SMOTE, this class increased to 48%. Tables 3 and 4 present descriptions of the nominal and quantitative variables, respectively.

Table 3 Description table of nominal and rank variables of the study
Table 4 Description table of quantitative variables of the study

Modeling

Several algorithms were applied for modeling, among which nine were selected for reporting in this work based on the characteristics of the dataset and the evaluation results of the models. These algorithms included DT, MLP neural network, KNN, SVM, DL, RF, AdaBoost, XGBoost, and LR. Holdout and k-fold (k = 5, k = 10) modes were used for validation. Therefore, three different models were created and validated for each algorithm using these three validation methods. For a better comparison of the algorithms, the evaluation results are presented in Tables 5, 6, and 7.
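The three validation modes differ only in how sample indices are partitioned. The sketch below generates a 70/30 holdout split and k-fold splits for 757 samples, the post-SMOTE sample count reported above; the shuffling seed and function names are assumptions, and the study's own splitting code is not shown in the text.

```python
import random

def holdout_split(n_samples, test_fraction=0.3, seed=0):
    """Shuffle the indices once and reserve a fixed fraction for testing."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]

def k_fold_splits(n_samples, k, seed=0):
    """Yield (train, test) index lists; each sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]

train_idx, test_idx = holdout_split(757)
five_fold = list(k_fold_splits(757, k=5))
ten_fold = list(k_fold_splits(757, k=10))
```

With holdout, a model is scored once on the reserved 30%; with k-fold, the reported score is typically the average over the k test folds.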

Table 5 Performance of data mining models using holdout (70 train-30 test)
Table 6 Performance of data mining models using 5-fold
Table 7 Performance of data mining models using 10-fold

According to the results, the MLP algorithm achieved the best performance among all algorithms with the holdout validation method, with F1 Score = 0.994. The ROC diagram of this model is shown in Fig. 1. Moreover, Table 8 presents the confusion matrix for the best model.

Table 8 Confusion matrix for MLP model
Fig. 1 ROC diagram of MLP using holdout cross-validation
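The area under a ROC curve can be computed without plotting, using its interpretation as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The scores below are invented for illustration and are not the model outputs of this study.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of positive-negative pairs ranked correctly."""
    positives = [s for s, y in zip(scores, y_true) if y == 1]
    negatives = [s for s, y in zip(scores, y_true) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))

y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
auc = roc_auc(y_true, scores)
```

Here eight of the nine positive-negative pairs are ordered correctly, so the AUC is 8/9.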

In both 5-fold and 10-fold cross-validation methods, the DL algorithm outperformed the other algorithms, with F1 Score = 0.989 and F1 Score = 0.993, respectively. The ROC diagrams of these two models are shown in Figs. 2 and 3.

Fig. 2 ROC diagram of DL using 5-fold

Fig. 3 ROC diagram of DL using 10-fold

In Fig. 4, the F1 score comparison of all studied algorithms for three validation methods is presented.

Fig. 4 Comparison of the performance of models in different holdout, 5-fold, and 10-fold situations

Figure 5 indicates the importance of the included features in modeling. In this study, platelet count, gestational age, and aspartate transaminase (AST) were the most important features in the modeling, which is consistent with many medical studies, whereas the number of abortions, twins, and blood pressure were the least important features in the diagnosis of HELLP syndrome.

Fig. 5 Feature importance based on the modeling output

Discussion

In the present investigation, the medical records of 384 patients referred to Shohadaye Tajrish Hospital in Tehran, Iran, were analyzed, and after applying pre-processing, the diagnosis model of HELLP syndrome was implemented using ML algorithms. Among all the implemented algorithms, those based on neural networks outperformed the others. Although there was not much difference between the high-performance models, the best model in this study was implemented with the ANN algorithm and the holdout validation method.

The ANNs have not been frequently utilized in other studies, except for Huang [25], who reported an F1 score of 0.88, while our study achieved an F1 score of 0.995 for MLP. Moreover, Zheng [23] employed MLP, but it did not exhibit the best performance among the algorithms tested. Furthermore, in all the studies reviewed, only cross-validation was applied; specifically, 10-fold cross-validation was employed in all cases except for Melinte-Popescu [20], which utilized 5-fold cross-validation to predict the severity of HELLP syndrome. Holdout validation was not employed in any of these studies and has not been compared.

The results indicate that the ML models reported in this study for the diagnosis of HELLP syndrome demonstrated reliable performance compared to previous investigations, with the best-performing models achieving F1 scores of approximately 99% (0.989 to 0.994) in all validation modes.

For instance, in Moreira’s study, the F1 score for the proposed neuro-fuzzy algorithm was 0.705 [1]. In the Melinte-Popescu study, the highest F1 score for predicting the severity of HELLP syndrome, obtained with the DT, was 0.94 [20]. In addition, Melinte-Popescu’s study on preeclampsia prediction reported that the highest F1 score for predicting all cases of preeclampsia, obtained with the NB algorithm, was 0.98 [21].

In Moreira’s study, the F1 Scores for all Bayesian machine learning models predicting delivery outcomes for pregnant women and fetuses in cases of HDP, including HELLP syndrome, varied between 0 and 1, indicating different performance levels across the classes [28].

In addition, the effective and important features for the diagnosis of HELLP syndrome were determined in this study. Among the baseline features used in the modeling, gestational age was the most important, followed by the mother’s age. The other two characteristics in this category, namely the number of pregnancies and the mother’s BMI, were equally important but less significant than the previous two. According to the results presented in Fig. 5, the number of births is a contributing factor to HELLP syndrome. However, as noted previously [30], unlike in preeclampsia, nulliparity is not considered a risk factor for HELLP, and mothers with a history of childbirth account for 50% of affected patients. Finally, among the characteristics of the baseline category, twins and the number of miscarriages did not affect the prediction model of HELLP syndrome.

Among the clinical features, nausea and diastolic blood pressure had the most significant impact on modeling, but headache, epigastric pain, and systolic blood pressure did not affect the modeling results. According to previous medical studies [31,32,33], blood pressure abnormalities are present in nearly 85% of cases of HELLP syndrome; however, this symptom may not be observed in severe cases of HELLP syndrome.

Among the biomarker features, platelets were the most important in modeling and also ranked first in importance across all categories. This result is consistent with the definition of HELLP syndrome in all references. Other features in this category, including AST, FBS, Bili D, creatinine (CR), LDH, and Bili T, had a significant impact; only ALT had no impact on modeling. In general, the features of this category were much more effective in diagnosing the disease than those of the other categories.
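One common way to judge whether a feature has an impact on a model is permutation importance: shuffle one feature column and measure how much the model's accuracy drops. The sketch below is an illustration under stated assumptions and is not necessarily the method used in this study. The toy model, its threshold of 100, and the data rows are hypothetical.

```python
import random

def accuracy(model, rows, labels):
    """Fraction of rows for which the model output matches the label."""
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(rows)

def permutation_importance(model, rows, labels, n_features, seed=0):
    """Return the accuracy drop observed when each feature is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    drops = []
    for j in range(n_features):
        column = [r[j] for r in rows]
        rng.shuffle(column)  # break the link between feature j and the label
        shuffled = [r[:j] + (v,) + r[j + 1:] for r, v in zip(rows, column)]
        drops.append(baseline - accuracy(model, shuffled, labels))
    return drops

def toy_model(row):
    """Hypothetical rule: flag a case when feature 0 is below 100; ignore feature 1."""
    return int(row[0] < 100)

# Hypothetical rows of (feature 0, feature 1); labels follow the toy rule.
rows = [(50, 1), (60, 0), (70, 1), (80, 0), (150, 1), (200, 0), (250, 1), (300, 0)]
labels = [toy_model(r) for r in rows]
drops = permutation_importance(toy_model, rows, labels, n_features=2)
```

Because the toy model never reads feature 1, shuffling it leaves every prediction unchanged and its importance is exactly zero, which mirrors how a feature can be reported as having no impact on modeling.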

In Maric’s study [27], the blood pressure variable had the highest coefficient in the prediction model of preeclampsia (which included patients with HELLP syndrome) selected using an elastic net. This is not consistent with the results of our study, which considered HELLP syndrome separately from preeclampsia.

In Zheng’s study [23], feature selection identified 15 features among various factors, including demographics, complications, delivery characteristics, neonate features, physical examinations, and laboratory examinations. The largest number of these features, and the most significant ones, were found in the laboratory examination category, with FBS among them, which aligns with the results of the present study.

Some previous investigations have addressed HELLP syndrome using ML. Among them is the study of Melinte-Popescu et al., in which the severity of HELLP syndrome was predicted in three severity groups using the data of 81 patients. In that study, four machine learning algorithms were employed and compared, with DT showing the best performance. Other studies, such as the one conducted by Moya et al., have reported results that were significantly lower than those obtained in the present study.

The rarity of HELLP syndrome cases makes such studies challenging to conduct, and this issue is evident in all studies. What distinguishes this study from others, to some extent, is the number of patients included, the quality of the collected data, and the attempt to use the results, following a clinical approach, to determine the features that are effective in the diagnosis of HELLP syndrome.

Conclusion

Given the impact of data mining techniques on the diagnosis and prediction of diseases, the present study used them to develop a prediction model for HELLP syndrome. The evaluation of the models revealed that data mining algorithms can be used successfully for this purpose. Algorithms other than decision trees (DTs) also achieved F1 scores above 0.90, and performance was consistently high across all three evaluation methods: 5-fold cross-validation, 10-fold cross-validation, and hold-out with a 70/30 split. Despite some small differences among the algorithms, their performances were closely aligned. Moreover, this study indicated that biomarker features have the most significant impact on the diagnosis of HELLP syndrome. Although the accuracy of the obtained results was considerably high, more detailed investigations are necessary to assess the validity and generalizability of the findings and ultimately improve the quality of care for pregnant women.

Further studies involving larger groups of HELLP syndrome patients are recommended. Research focusing on the differential diagnosis of HELLP syndrome from other pregnancy-related conditions, such as preeclampsia and eclampsia, is also suggested, and clustering methods could be used for this purpose. Interpretable DL models could be applied to assess the significance of the features used in the present work, and external validation of the models on other datasets is suggested. Limited access to data on HELLP syndrome and restricted access to data from Shohadaye Tajrish Hospital are limitations of the present research; they arose because the research was carried out during the COVID-19 pandemic.