Predictive maintenance framework for assessing health state of centrifugal pumps

IAES International Journal of Artificial Intelligence (IJ-AI)
Vol. 13, No. 1, March 2024, pp. 850~862
ISSN: 2252-8938, DOI: 10.11591/ijai.v13.i1.pp850-862
Journal homepage: http://ijai.iaescore.com
Panagiotis Mallioris¹, Evangelos Diamantis², Christos Bialas¹, Dimitrios Bechtsis¹
¹Department of Industrial Engineering and Management, School of Engineering, International Hellenic University, Thessaloniki, Greece
²UTECO S.A. Industrial and Marine Automation Systems, Piraeus, Greece
Article Info
Article history:
Received May 10, 2023
Revised Jul 18, 2023
Accepted Aug 2, 2023

ABSTRACT
Through advances in sensing technologies and big data analytics, critical information can be extracted from continuous production processes for predicting the health state of equipment and safeguarding against upcoming failures. This research presents a methodology for applying predictive maintenance (PdM) solutions and showcases a PdM application for health state prediction and condition monitoring that increases the safety and productivity of centrifugal pumps, contributing to a sustainable and resilient PdM ecosystem. Measurements depicting the healthy and maintenance-prone stages of two centrifugal pumps were collected on the university campus. The dataset consists of 5,118 records and includes both running and standstill values. Additionally, Spearman statistical analysis was conducted to measure the correlation of the collected measurements with the predicted machine condition and to select the most appropriate features for model optimization. Several machine learning (ML) algorithms, namely random forest (RF), Naïve Bayes, support vector machines (SVM), and extreme gradient boosting (XGBoost), were analyzed and evaluated during the data mining process. The results indicated the effectiveness and efficiency of XGBoost for the health state prediction of centrifugal pumps. The contribution of this research is to propose an effective framework for collecting multistage health data for PdM applications and to showcase its effectiveness in a real-world use case.
Keywords:
Big data
Centrifugal pumps
Industry 5.0
Machine learning
Predictive maintenance
This is an open access article under the CC BY-SA license.
Corresponding Author:
Panagiotis Mallioris
Department of Industrial Engineering and Management
School of Engineering, International Hellenic University (IHU)
Thessaloniki 57400, Greece
Email: panmalliw@gmail.com
1. INTRODUCTION
Industry 5.0 is shaping technology-driven ecosystems into sustainable, resilient, human-centric, and value-driven ecosystems [1]. Predictive maintenance (PdM) exploits data characterized by high velocity, variability, veracity, volume, and value [2] to generate production forecasts, provide crucial information on equipment condition, and facilitate maintenance management [3]. Specifically, PdM can reduce maintenance and overtime costs by 20% while decreasing downtime by 5% [4]. Additionally, according to the findings of [5], a PdM solution can predict approximately 70% of failures and reduce scheduled repairs and maintenance costs by up to 12% and 30%, respectively. Therefore, further research on PdM in modern industrial environments is crucial for constructing innovative, sustainable, and resilient manufacturing processes.

Several maintenance approaches are applied, namely corrective, preventive, and predictive maintenance [6]. As defined by [7], corrective maintenance, also named run-to-failure, focuses on repairing equipment or individual system components following their malfunction. As argued in [8], in corrective maintenance, replacements or repairs are performed only when the critical part is entirely worn out and a failure occurs. Hence, system malfunctions can lead to unwanted events, jeopardizing the safety of operators and increasing production downtime. In contrast, preventive maintenance is performed periodically at specific intervals regardless of the system's health state [9].
PdM increases safety and productivity, decreases downtime, and reduces operational costs. Historical data containing key input features, statistical outputs, and data-driven algorithms enable the prediction and early detection of malfunctions [10]. PdM tools identify the need for a maintenance action either from sensor measurements, in condition-based maintenance (CBM) approaches [11], or from the prognosis of the remaining useful life (RUL) of industrial equipment [12]. Specifically, CBM uses historical or real-time data to diagnose the state of critical components and schedule maintenance prior to breakdown [11]. Moreover, the reliability of prognosis and health management systems depends on diagnosing the degradation state of critical components for RUL prediction [8]. The indicators that best depict a component's or machine's health state vary across manufacturing sectors; the most common vital measurements are vibrations, temperatures, acoustic emissions, currents, pressures, and rotational speed [13].
In the transition towards Industry 5.0, the advent of AI, namely machine and deep learning algorithms, and of big data analytics has provided researchers with valuable tools for predicting the condition of machinery and its linear degradation over time. AI algorithms take sensor measurements (input features) and produce classification or regression outputs (labels), such as the ‘healthy’ and ‘unhealthy’ stages, by employing mathematical functions such as activation and loss functions. A predictive algorithm is trained on historical data (input features) to accurately predict the target output, labelling the stage of the machinery to identify an impending failure [14]. Commonly used machine learning (ML) models include random forest (RF), Naïve Bayes, support vector machines (SVM), and extreme gradient boosting (XGBoost) [15].
There are many reports on the successful application of AI models in forecasting health states or upcoming failures in electric induction motors. An anomaly detection approach for an electrical motor-driven system connected to a gearbox was presented in [16] using the Simulink/MATLAB programming environment. Vibration, rotation axis, and current signals were proposed as input features for the artificial neural network (ANN) detection algorithm. Similarly, a deep convolutional neural network (CNN) model for machine state identification of conveyor motors was presented in [17]. Key input features were considered, namely vibration, temperature, pressure, acceleration, rotational speed, and torque, while accuracy, precision, and recall were the evaluation metrics. The authors' findings indicated that a vibration velocity above 4.5 mm/s constitutes unsatisfactory vibration severity. Focusing on centrifugal pumps, a failure classification approach was presented in [18]. Using a context-based multilayered Bayesian algorithm with vibration, temperature, and pressure as input features, the authors classified failure data into multiple classes based on the estimated fault magnitude with an F1-score of 98%. However, most research papers analyze PdM solutions on historical measurements of healthy induction motors, where malfunction data are sparse. This makes health state prediction challenging: the dataset is biased towards healthy-state records, which reduces the overall accuracy of real-time malfunction prediction. To overcome this issue, our study proposes a PdM solution in which the measurements collected during the healthy and maintenance-prone stages of centrifugal pumps have similar volumes. The novelty of our study in facing this challenge is to collect multistage health condition data from pumps of the same manufacturer, providing adequate information for the prediction algorithms to recognize and classify health state conditions. Additionally, our research proposes model optimization using Spearman statistical correlation for feature selection, enhancing the overall accuracy of the PdM approach. Overall, this contributes to RUL research, further supporting the safety and productivity of the system.
The main objective of this research is to develop a PdM model based on data collected from two different centrifugal pumps, referring to healthy and maintenance-prone stages, respectively. Moreover, descriptive statistics and Spearman statistical analysis will be conducted to identify the correlation between input measurements and the predicted health state label. Hence, a generic model will be developed for producing accurate outcomes based on historical input data. Furthermore, typical AI models, namely RF, Naïve Bayes, SVM, and XGBoost, will be evaluated based on “Accuracy”, “Precision”, “Recall”, “F1 score”, and “Cohen Kappa score”. The aim of the prediction model is to maximize the overall system's efficiency, increase reliability and productivity, and reduce maintenance costs and downtime. Moreover, state-of-the-art AI algorithms are compared to provide meaningful insights regarding the health state of the equipment based on crucial input features extracted from historical data. The training process will utilize two different datasets, each depicting either the healthy or the maintenance-prone stage of a centrifugal pump, overcoming the issue of analyzing measurements biased towards healthy outputs. Our solution provides the tested algorithms with informative data that assist in health state recognition and malfunction prediction of centrifugal pumps. The remainder of the paper is structured as follows: i) Section 2 presents the methodology applied in this research; ii) Section 3 describes the use case and presents the overall results; and iii) Section 4 provides the conclusions of this work and future research directions.
2. METHOD
In the proposed case study, two centrifugal pumps depicting different health stages, healthy and maintenance-prone, were studied using a PdM approach to compare various AI algorithms and optimize the selection of input features. Using the Mitsubishi Electric FA Smart Condition Monitoring kit provided by UTECO S.A., GX Works 3 software, and Python scripts, the authors collected 5,118 rows of measurements depicting key features: velocity, demodulation, acceleration, and temperature. Data analysis techniques, namely data cleaning and dimensionality reduction, were applied to handle and pre-process the heterogeneous sensor data. In real-world applications, the initial measurements frequently contain inconsistencies such as missing values, duplicated data, outliers, and structural errors. These inconsistencies must be addressed before the input features are fed to a predictive algorithm. Hence, a data analysis procedure that processes the raw heterogeneous measurements is essential before any decision-making step. The initial stage of the analysis procedure is data cleaning, which can be defined as detecting and correcting errors in the data [19]. As a first step, researchers and analysts handling raw data inputs should identify missing or not-a-number (NaN) values and take the necessary action, namely replacing or deleting them.
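As an illustration of this first cleaning step, a minimal pandas sketch is given below. The column names follow those used later in this paper, but the values are invented, and median imputation is only one possible replacement strategy (an assumption made here, not the authors' choice; the authors deleted incomplete records, as described in section 2.2).

```python
import numpy as np
import pandas as pd

# Hypothetical raw sensor readings (invented values) containing NaN gaps.
raw = pd.DataFrame({
    "velocity_ISO": [0.61, np.nan, 0.75, 0.70],
    "rms_demodulation": [0.10, 0.12, np.nan, 0.11],
    "temperature": [24.0, 24.5, 25.0, 24.8],
})

# Step 1: identify missing / NaN values in each column.
nan_per_column = raw.isna().sum()

# Option A: delete every record that contains at least one NaN.
dropped = raw.dropna()

# Option B: replace NaNs, here with each column's median (an illustrative choice).
imputed = raw.fillna(raw.median())

print(nan_per_column.to_dict())
print(len(raw), "->", len(dropped), "rows after deletion")
```

Deletion is the natural choice when, as in this study, only a handful of records are affected; imputation becomes attractive when dropping rows would discard a large share of the data.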
Moreover, dimensionality reduction is a common step in data analytics procedures. One of the challenges in predictive maintenance applications is the significant difference between the number of records depicting healthy and malfunctioning measurements. In most applications, embedded sensors mostly store measurements of the healthy system, creating an imbalanced class ratio and biasing the predictions towards a healthy system output. The novelty of our research in facing this issue is to handle and merge two different datasets from centrifugal pumps of the same manufacturer. The pumps differ in working hours and age; one dataset contains healthy and the other maintenance-prone data, providing adequate information for the training model to recognize and classify health state conditions. Following pre-processing, the topics discussed in this research are feature selection and data mining procedures. For feature selection, Spearman statistical analysis was conducted to assess the correlation of the collected measurements with the health state and to select the most appropriate features for model optimization. Regarding data mining, RF, Naïve Bayes, SVM, and XGBoost were evaluated based on “Accuracy”, “Precision”, “Recall”, “F1 score”, and “Cohen Kappa score” for health state prediction of centrifugal pumps. Figure 1 presents a comprehensive framework for PdM in centrifugal pumps.
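The merging step can be sketched with pandas as follows, assuming that each pump's log is loaded as a separate table and that the health label is attached per source before concatenation. The 0/1 label encoding and all values are hypothetical; the class count at the end is the quick balance check that motivates collecting data from both stages.

```python
import pandas as pd

# Hypothetical readings from each pump (invented values).
healthy = pd.DataFrame({"velocity_ISO": [0.41, 0.44, 0.40],
                        "rms_demodulation": [0.03, 0.04, 0.03]})
prone = pd.DataFrame({"velocity_ISO": [1.20, 1.35, 1.28],
                      "rms_demodulation": [0.25, 0.27, 0.26]})

# Label each source before merging: 0 = healthy, 1 = maintenance-prone (assumed encoding).
healthy["status"] = 0
prone["status"] = 1

# Concatenate into one dataset; ignore_index gives a fresh 0..n-1 index.
merged = pd.concat([healthy, prone], ignore_index=True)

# Balance check: both classes should have similar counts.
class_counts = merged["status"].value_counts()
print(class_counts.to_dict())
```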
Figure 1. Framework for PdM and health state prediction in centrifugal pumps with feature selection and
model optimization (adapted from [20])
2.1. System definition
In the proposed case study, sensor measurements were collected with the Mitsubishi Smart Condition Monitoring sensor kit. The dataset contains historical data from two centrifugal pumps and includes healthy and maintenance-prone measurements. Specifically, each pump produced five columns containing key features: temperature and the vibration parameters velocity_ISO, rms_demodulation, rms_acceleration, and peak-to-peak acceleration. A final column contained the status characterisation, i.e., the output used by the prediction algorithm, indicating whether the centrifugal pump was healthy or not at that exact moment [20]. Variable velocity_ISO denotes the vibration velocity of the centrifugal pumps, measured in mm/s. High velocity values usually indicate imbalance or misalignment in the monitored system. The work in [21] used rotational speed in a deep transfer learning approach for upcoming failure prediction on a rotor kit.
on a rotor kit. Furthermore, demodulation measurements can be vital for early bearing failure indication. The
output frequencies identified in the demodulation spectrum are helpful for damage detection in rolling element
bearings. Proposed a hybrid PdM approach combining data-driven algorithms and a physics-based model to
predict the optimal maintenance timeline of a computer numerically controlled (CNC) milling machine [22].
Time domain force X root mean square (RMS) and the frequency domain forces were considered some of the
features with the highest correlation, while the hybrid approach outputted an error ratio of 3.17%.
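The idea behind demodulation can be illustrated with a generic envelope-analysis sketch: a bearing defect makes a high-frequency resonance pulse at the defect frequency, and the envelope of the signal (obtained here with SciPy's Hilbert transform) reveals that frequency. This is a textbook technique, not necessarily the algorithm implemented in the Mitsubishi kit; the sampling rate, the 2 kHz resonance, and the 37 Hz defect frequency are assumptions, and the RMS of the envelope is only our illustrative analogue of a value such as rms_demodulation.

```python
import numpy as np
from scipy.signal import hilbert

fs = 10_000                       # sampling rate in Hz (assumed)
n = fs                            # one second of samples
t = np.arange(n) / fs
resonance_hz = 2_000              # structural resonance excited by impacts (assumed)
defect_hz = 37                    # hypothetical bearing defect frequency

# Amplitude-modulated vibration: the resonance amplitude pulses at defect_hz.
signal = (1 + 0.8 * np.cos(2 * np.pi * defect_hz * t)) * np.sin(2 * np.pi * resonance_hz * t)

# Demodulate: the envelope is the magnitude of the analytic signal.
envelope = np.abs(hilbert(signal))
envelope = envelope - envelope.mean()        # remove the DC component

# Envelope (demodulation) spectrum: its dominant line sits at the defect frequency.
spectrum = np.abs(np.fft.rfft(envelope))
freqs = np.fft.rfftfreq(n, d=1 / fs)
peak_hz = freqs[np.argmax(spectrum)]

rms_envelope = np.sqrt(np.mean(envelope ** 2))   # illustrative RMS of the demodulated signal
print(f"dominant envelope frequency: {peak_hz:.1f} Hz, RMS: {rms_envelope:.3f}")
```

The raw spectrum of `signal` would show lines around 2 kHz; only after demodulation does the low defect frequency appear directly, which is why envelope spectra are used for rolling element bearing diagnostics.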
Additionally, acceleration denotes the rate of change of velocity, in both speed and direction, over time, and can indicate gear defects. The fast Fourier transform (FFT) converts the time waveform into an acceleration spectrum, from which further mathematical operations produce the velocity and demodulation outputs. A health state classification approach for conveyor motors [17] used acceleration alongside temperature and rotational speed, achieving 100% correct failure classification.
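The conversion from an acceleration waveform to a velocity value can be sketched with NumPy: the FFT gives the acceleration spectrum, and dividing each spectral line by j2πf integrates it to velocity. The 50 Hz tone, 0.5 g amplitude, and sampling rate are assumptions, and real condition-monitoring devices may additionally restrict the computation to a standardized frequency band, which is omitted here.

```python
import numpy as np

G = 9.80665                       # standard gravity, m/s^2 per g
fs = 5_000                        # sampling rate in Hz (assumed)
n = fs                            # one second of samples
t = np.arange(n) / fs
f0 = 50.0                         # vibration component at 50 Hz (assumed)
acc_g = 0.5 * np.sin(2 * np.pi * f0 * t)   # acceleration waveform in g

# FFT: time waveform -> acceleration spectrum (in m/s^2).
acc_spectrum = np.fft.rfft(acc_g * G)
freqs = np.fft.rfftfreq(n, d=1 / fs)

# Integrate in the frequency domain: V(f) = A(f) / (j*2*pi*f); the DC bin is left at zero.
vel_spectrum = np.zeros_like(acc_spectrum)
nonzero = freqs > 0
vel_spectrum[nonzero] = acc_spectrum[nonzero] / (2j * np.pi * freqs[nonzero])

# Back to the time domain, converted from m/s to mm/s.
vel_mm_s = np.fft.irfft(vel_spectrum, n=n) * 1000.0
vel_rms = np.sqrt(np.mean(vel_mm_s ** 2))
print(f"RMS velocity: {vel_rms:.3f} mm/s")
```

For a single tone the result equals the closed form (A / 2πf₀) / √2, so the same acceleration produces a much smaller velocity at higher frequencies.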
Finally, this research also considered peak-to-peak acceleration and temperature measurements for the health state classification of centrifugal pumps. Peak-to-peak refers to the distance between the maximum positive and maximum negative peaks of the vibration waveform. This amplitude indicates the vibration's intensity and depicts the severity of the detected issue. Temperature (°C) can also be a vital indication, as it has been stated that a machine's temperature rises rapidly before total malfunction. In their analysis, the authors of [23] considered statistical values of vibration measurements for anomaly and RUL prediction in CNC milling machines.
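The difference between peak-to-peak and RMS values can be seen in a small NumPy sketch: adding a few short impulses (a hypothetical stand-in for impacts from a developing defect) to a clean 50 Hz vibration triples the peak-to-peak acceleration while barely changing the RMS value. All signal parameters are assumptions.

```python
import numpy as np

fs = 5_000                                  # sampling rate in Hz (assumed)
n = fs                                      # one second of samples
t = np.arange(n) / fs

clean = 0.3 * np.sin(2 * np.pi * 50 * t)    # acceleration in g
impacted = clean.copy()
impacted[::500] += 1.5                      # ten short impulses, e.g. from a local defect

def peak_to_peak(x):
    """Distance between the highest positive and the lowest negative sample."""
    return x.max() - x.min()

def rms(x):
    """Root mean square of the waveform."""
    return np.sqrt(np.mean(x ** 2))

for name, x in [("clean", clean), ("impacted", impacted)]:
    print(f"{name}: peak-to-peak = {peak_to_peak(x):.3f} g, RMS = {rms(x):.3f} g")
```

This is why the two features carry complementary information: RMS reflects the overall vibration energy, while peak-to-peak reacts strongly to short, severe events.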
The proposed use case denotes a condition-based maintenance approach with a binary classification output predicting the health state of centrifugal pumps. Variables collected from embedded sensors of the Mitsubishi Smart Condition Monitoring kit, namely machine, temperature, and vibration features, were considered for further processing and model optimization. Centrifugal pumps consist of mechanical parts and bearings whose condition degrades with long-term use and overall strain (Figure 2), making them suitable for PdM applications. Our research focuses on developing and optimizing a predictive model to accurately characterize the condition state of centrifugal pumps based on two specifically selected datasets depicting healthy and maintenance-prone measurements, respectively.
Figure 2. Machine condition degradation over time: Machine condition and time dimensions establish a
hierarchy of attributes: (i) out-of-order, smoke, temperature, noise, vibrations and state change, and
(ii) minutes, days, weeks, and months
2.2. Data analysis of input features
The initial data analysis step addressed missing or NaN values. As mentioned in section 2, missing data can either be replaced or deleted, depending on how many such instances the overall dataset contains. In our case, records containing NaN values were sparse (three instances), and the chosen approach was to remove each incomplete record entirely. The healthy and maintenance-prone measurements were then merged to compose a complete, unbiased dataset for data mining procedures and health state classification. Furthermore, the Spearman correlation method was used to assess the relevance of each feature, enabling the selection of the most informative features for the prediction algorithm.
Feature selection focuses on removing non-informative input variables from historical data that do not enhance the efficiency of the prediction model and may cause overfitting issues [24]. Spearman correlation is appropriate when handling continuous parameters, such as time series and sensor measurements, where input features and predicted labels do not have a linear (but possibly a monotonic) relationship, or where the predicted output is an ordinal value. Hence, in this research, Spearman is considered the most suitable method. The mathematical expression of the Spearman correlation analysis can be written as (1):
$$\rho = 1 - \frac{6\sum_{i=1}^{n} d_i^{2}}{n\left(n^{2}-1\right)} \qquad (1)$$

where $d_i$ = difference in paired ranks and $n$ = number of cases.
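Equation (1) can be checked numerically with a short worked example on hypothetical values. For five distinct readings whose ranks differ from the target ranks in two positions by one place each, $\sum d_i^2 = 2$ and $\rho = 1 - 6\cdot 2/(5\cdot 24) = 0.9$, which matches SciPy's `spearmanr`. Note that (1) is exact only when there are no tied ranks; with ties (for example, a binary health label), Spearman's ρ is computed as the Pearson correlation of the average ranks, which is what `spearmanr` does.

```python
import numpy as np
from scipy.stats import rankdata, spearmanr

def spearman_rho_no_ties(x, y):
    """Spearman's rho via Eq. (1); valid only when x and y contain no tied values."""
    d = rankdata(x) - rankdata(y)          # difference in paired ranks d_i
    n = len(x)
    return 1 - 6 * np.sum(d ** 2) / (n * (n ** 2 - 1))

x = np.array([0.41, 0.52, 0.47, 1.20, 1.35])   # hypothetical feature readings
y = np.array([10.0, 30.0, 20.0, 50.0, 40.0])   # hypothetical ordinal target

rho_formula = spearman_rho_no_ties(x, y)
rho_scipy, _ = spearmanr(x, y)
print(rho_formula, rho_scipy)
```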
2.3. AI algorithm selection
The advent of artificial intelligence (AI), namely ML and deep learning algorithms, alongside big data analysis, has provided researchers with valuable tools for predicting the condition of machinery or its RUL. This study examines and compares ML algorithms that can efficiently describe the linear degradation of machinery parts (bearings, gearboxes, and others) in centrifugal pumps. RF, Naïve Bayes, SVM, and XGBoost are state-of-the-art algorithms referred to in the literature [25].
For instance, a Naïve Bayes algorithm was selected for condition state prediction of machine components in a steel hot rolling mill process, yielding a root mean square error (RMSE) of 2.98 [26]. The authors considered steel type, weight, length, temperature, maintenance dates, and thickness parameters as input features. Furthermore, Tutivén et al. [27] proposed an SVM algorithm for bearing failure prediction on wind turbines. They highlighted that their model's accuracy was optimal owing to the high correlation between mean central shaft temperature and bearing failure.
2.3.1. Random forest
One of the most popular and efficient ML algorithms, RF, can be used for classification (RF classifier) or regression (RF regressor) outputs, making it a well-suited candidate for health state classification and RUL applications. It is built upon decision trees: a ‘forest’ consists of a certain number of CART-like decision trees, each grown during training and each outputting its own prediction. Decision trees resemble the structure of a tree containing roots, branches, and leaves, which represent properties, decision rules, and outcomes, respectively. The central concept of the algorithm is the strategic (hill-climbing) choice of the independent variable on which each branch will split. Information entropy in (2), which summarizes the impurity of the sample, is one of the most commonly applied criteria for selecting the independent variable.
$$\mathrm{Entropy}(S) = -p_{+}\log_{2}p_{+} - p_{-}\log_{2}p_{-} \qquad (2)$$
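where $S$ is the training sample at the node being split, $p_{+}$ is the fraction of positive examples in $S$, and $p_{-}$ is the fraction of negative examples in $S$. The final predicted output of the forest is obtained by aggregating the individual trees' outcomes (e.g., by majority vote); because each tree is trained on a bootstrap sample of the data, this is known as the ‘bagging’ (bootstrap aggregating) prediction method, which decreases the possibility of error.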
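Equation (2) and its use for choosing a split can be sketched in NumPy as follows. The information-gain helper (parent entropy minus the size-weighted entropy of the two child nodes) is the standard way the entropy criterion is applied; the feature values, labels, and thresholds are hypothetical. Note that scikit-learn's random forest uses Gini impurity by default; the entropy criterion is selected with `criterion="entropy"`.

```python
import numpy as np

def entropy(labels):
    """Eq. (2): -p+ log2 p+ - p- log2 p- for binary labels (1 = positive); 0*log2(0) is taken as 0."""
    labels = np.asarray(labels)
    if labels.size == 0:
        return 0.0
    p_pos = labels.mean()
    p_neg = 1.0 - p_pos
    return float(-sum(p * np.log2(p) for p in (p_pos, p_neg) if p > 0))

def information_gain(feature, labels, threshold):
    """Entropy reduction obtained by splitting the node at feature <= threshold."""
    left = labels[feature <= threshold]
    right = labels[feature > threshold]
    children = (left.size * entropy(left) + right.size * entropy(right)) / labels.size
    return entropy(labels) - children

labels = np.array([0, 0, 0, 1, 1, 1])                       # 0 = healthy, 1 = maintenance-prone
feature = np.array([0.40, 0.45, 0.50, 1.20, 1.30, 1.40])    # hypothetical velocity_ISO values

print(entropy(labels))                          # maximally impure node
print(information_gain(feature, labels, 0.80))  # split that separates the classes
print(information_gain(feature, labels, 0.45))  # weaker split
```

A tree grown with this criterion would pick the 0.80 threshold, because it yields pure child nodes and therefore the largest entropy reduction.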
2.3.2. Naïve Bayes
The Naïve Bayes algorithm, based on Bayes' theorem, calculates the probability of an event A occurring given that an event B has already occurred. In the prediction stage of the algorithm, the selected variables are assumed to be conditionally independent given the class. Thus, each input feature contributes independently to the classification output, regardless of any possible correlation among the given variables. In addition, the theorem overcomes the issue of calculating probabilities when valuable information is absent or many parameters must be considered for an accurate output, by calculating conditional probabilities and using estimates instead of event frequencies. The respective probabilities are calculated by Bayes' theorem (3), on which the Bayes classifier is built.
$$P(A\mid B) = \frac{P(B\mid A)\,P(A)}{P(B)} \qquad (3)$$

where $P(A)$ and $P(B)$ are the probabilities of events A and B occurring, respectively, $P(A\mid B)$ is the probability of A occurring given that B has already occurred, and $P(B\mid A)$ is the probability of B occurring given that A has occurred.
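A small worked example of (3), extended to several features under the naïve independence assumption, is sketched below. All probabilities are hypothetical: a prior of 0.2 for the maintenance-prone state, and likelihoods of observing a high vibration reading (and, in the second call, a high demodulation reading) under each state. The evidence $P(B)$ is obtained from the law of total probability.

```python
def naive_bayes_posterior(prior, likelihoods_pos, likelihoods_neg):
    """P(A | B1..Bk) via Eq. (3), assuming B1..Bk are conditionally independent given the class.

    prior           -- P(A), e.g. probability that a pump is maintenance-prone
    likelihoods_pos -- [P(Bi | A) for each observed feature]
    likelihoods_neg -- [P(Bi | not A) for each observed feature]
    """
    joint_pos = prior               # accumulates P(A) * prod_i P(Bi | A)
    joint_neg = 1.0 - prior         # accumulates P(not A) * prod_i P(Bi | not A)
    for lp, ln in zip(likelihoods_pos, likelihoods_neg):
        joint_pos *= lp
        joint_neg *= ln
    evidence = joint_pos + joint_neg    # P(B1..Bk) by the law of total probability
    return joint_pos / evidence

# One observation: high vibration (hypothetical likelihoods 0.9 vs 0.1).
print(naive_bayes_posterior(0.2, [0.9], [0.1]))
# Two observations: high vibration and high demodulation (0.8 vs 0.2).
print(naive_bayes_posterior(0.2, [0.9, 0.8], [0.1, 0.2]))
```

The second observation raises the posterior from about 0.69 to 0.90: under the independence assumption, each feature's likelihood ratio simply multiplies the prior odds.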

2.3.3. Support vector machines
The SVM, developed in [28], [29], classifies input features by constructing hyperplanes in a multi-dimensional space whose number of dimensions equals the number of features. The optimal solution is found when the separating hyperplane has the maximum distance (margin) to the closest data points of the classes. This hyperplane is called the maximum-margin hyperplane and, in linearly separable problems, is determined by the data points closest to it (the support vectors). The margin is obtained by solving a constrained quadratic optimization problem, which can equivalently be expressed with the hinge loss. An advantage of the SVM algorithm is that, through kernel functions, non-linearly separable problems can be converted into linearly separable ones by transforming the original feature space, and then solved. The overall cost function is given in (4):
$$\min_{w}\; \frac{1}{2}\sum_{i=1}^{n} w_i^{2} + C\sum_{j=1}^{m}\max\left(0,\; 1 - t_j\,y_j\right) \qquad (4)$$
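where $w$ denotes the weight vector normal to the separating hyperplane, $n$ is the number of features, $m$ is the number of samples, $t_j \in \{-1, +1\}$ is the true label of sample $j$, $y_j$ is the model's raw output for that sample (e.g., $y_j = w^{\top}x_j + b$), and $C$ is a tunable constant that controls the penalty for misclassifications.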
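The cost in (4) can be evaluated directly for a candidate weight vector, as in the NumPy sketch below. We assume labels $t_j \in \{-1, +1\}$ and raw outputs $y_j = w^{\top}x_j + b$; the bias $b$ and the toy data are our assumptions. Points on or beyond the margin contribute zero hinge loss, while points inside the margin or on the wrong side are penalized in proportion to $C$.

```python
import numpy as np

def svm_cost(w, b, X, t, C):
    """Soft-margin SVM cost following Eq. (4): 0.5 * sum(w_i^2) + C * sum_j max(0, 1 - t_j * y_j)."""
    y = X @ w + b                            # raw outputs y_j for every sample
    hinge = np.maximum(0.0, 1.0 - t * y)     # hinge loss per sample
    return 0.5 * np.dot(w, w) + C * hinge.sum()

X = np.array([[2.0, 0.0],     # positive sample exactly on the margin
              [0.0, 2.0],     # positive sample exactly on the margin
              [-2.0, 0.0],    # negative sample exactly on the margin
              [0.5, 0.5]])    # negative sample on the wrong side of the hyperplane
t = np.array([1, 1, -1, -1])
w = np.array([0.5, 0.5])

for C in (0.1, 1.0, 10.0):
    print(C, svm_cost(w, 0.0, X, t, C))
```

Increasing $C$ makes the single misclassified point dominate the cost, which is why a large $C$ pushes the optimizer towards fewer training errors at the expense of a narrower margin.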
2.3.4. XGBoost
Extreme gradient boosting (XGBoost) implements the gradient boosting algorithm on decision trees. Boosting builds the ensemble sequentially: at each iteration, the gradients of the loss function with respect to the current predictions are computed for every training sample, and a new tree is added that predicts the residual errors of the previous trees, thus correcting the weaker learners. Furthermore, an objective function is minimized and updated at each iteration, combining a convex cost function that quantifies the predictive model's accuracy and a penalty (regularization) term. XGBoost employs a weighted quantile sketch algorithm to find candidate split points efficiently among weighted instances, and it supports built-in cross-validation at each boosting iteration. The mathematical equation of XGBoost can be defined as (5):
$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in \mathcal{F} \qquad (5)$$
where $K$ denotes the number of trees, each $f_k$ is a function in the functional space $\mathcal{F}$, and $\mathcal{F}$ is the set of possible classification and regression trees (CART). Hence, the final predicted output is the sum of the outputs of all trees, each of which corrects the errors of the weaker preceding ones; this is known as the ‘boosting’ prediction method and increases overall accuracy.
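The additive structure of (5) can be illustrated with plain gradient boosting on regression trees, using scikit-learn's `DecisionTreeRegressor` as the base learner. This is a simplified sketch, not XGBoost itself: it uses the squared-error loss (whose negative gradient is simply the residual) and a fixed learning rate that shrinks each tree's contribution (which can be absorbed into $f_k$ in (5)), and it omits XGBoost's regularized objective, second-order statistics, and weighted quantile sketch. The sine-shaped toy target is hypothetical.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_boosted_trees(X, y, n_trees=50, learning_rate=0.3, max_depth=2):
    """Sequentially fit shallow trees, each to the residuals of the current ensemble."""
    trees = []
    prediction = np.zeros(len(y))
    for _ in range(n_trees):
        residuals = y - prediction               # negative gradient of 0.5 * (y - y_hat)^2
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
        tree.fit(X, residuals)
        prediction += learning_rate * tree.predict(X)
        trees.append(tree)
    return trees

def predict(trees, X, learning_rate=0.3):
    """Eq. (5): the output is the sum of the (shrunken) tree outputs f_k(x)."""
    return learning_rate * sum(tree.predict(X) for tree in trees)

X = np.linspace(0.0, 1.0, 50).reshape(-1, 1)
y = np.sin(2 * np.pi * X[:, 0])

trees = fit_boosted_trees(X, y)
for k in (1, 5, 50):
    mse = np.mean((y - predict(trees[:k], X)) ** 2)
    print(f"{k:>2} trees: training MSE = {mse:.4f}")
```

Each added tree can only lower the training error here, because it is a least-squares fit to the current residuals; in practice, the number of trees and the learning rate are tuned on validation data to avoid overfitting.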
2.3.5. Comparative analysis of the ML algorithms
Several factors must be considered to ensure the effective and efficient use of AI algorithms and to highlight their applicability to each domain. Ease of implementation, speed of execution, computational intensity, sensitivity to outliers, sensitivity to specific dataset characteristics, and overall accuracy are some of the critical factors identified in this research study. RF is a versatile algorithm that provides an efficient PdM solution when handling outliers and non-linear input features such as vibration, because tree splits depend only on threshold comparisons: outliers simply fall on one side of a split and have limited influence on the result. However, enhancing the overall accuracy of the RF algorithm may require increasing the number and depth of trees, which increases computational intensity and reduces the speed of execution. On the other hand, although Naïve Bayes is a fast and scalable algorithm capable of handling multiple classes simultaneously, its estimated probabilities are sensitive to outliers, leading to inaccuracies in classification outcomes. Moreover, despite its implementation simplicity and low computational intensity, the Naïve Bayes algorithm offers little flexibility for hyper-parameter fine-tuning, and it may produce inaccurate classification outputs because it assumes independence among all features.
Additionally, the main characteristics of the SVM algorithm are its robustness to unprocessed raw data, owing to its methodology of determining decision boundaries with support vectors, and its ability to generalize, providing efficient classification outputs on previously unseen input measurements. However, SVM can become computationally and memory intensive when handling high-volume input datasets due to extensive kernel computations, and it is highly sensitive to hyper-parameter fine-tuning, namely the regularization parameter and the choice of kernel, which can negatively affect overall performance. In contrast, an advantage of the XGBoost methodology is its robustness, reducing the need for extensive fine-tuning of the model parameters and decreasing the possibility of overfitting. Nevertheless, XGBoost can also become computationally and memory intensive when handling high-volume datasets [30]. XGBoost is characterized as a highly accurate and efficient state-of-the-art algorithm for classification and regression applications because its architecture optimizes and learns from previous outputs and weaker learners.
3. USE CASE DESCRIPTION
Embedded sensors of the Mitsubishi Smart Condition Monitoring kit collected 5,118 rows of measurements depicting key features, namely velocity_ISO, rms_demodulation, rms_acceleration, and peak-to-peak acceleration, from two centrifugal pumps of the same manufacturer, in a healthy and a maintenance-prone state, respectively. Both machines operate independently on the Alexander Campus of the International Hellenic University; an indicative dataset can be found on Kaggle (https://www.kaggle.com/datasets/panosmallioris/sensor-measurements-of-centrifugal-pumps). It is worth mentioning that the studied pumps operate approximately every 5 minutes for a period of 1 minute to fill a water tank, or, in extreme cases, when a large water supply is requested. Following the removal of NaN values during initial pre-processing, each pump individually yielded a sample of 2,557 rows for further analysis. Another step in handling the raw data was removing measurements recorded while the pumps were stationary. The remaining data, namely 857 rows depicting the running states of both pumps, were used for training (80%) and evaluation (20%) of the tested algorithms.
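The pre-processing steps described above (dropping NaN records, discarding standstill measurements, and an 80/20 split) can be sketched with pandas and scikit-learn. The paper does not state how stationary rows were identified, so the velocity_ISO threshold below is a hypothetical criterion, and the data are invented. Stratifying the split keeps the healthy/maintenance-prone ratio equal in both subsets; this is our choice rather than a documented setting.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

RUNNING_THRESHOLD = 0.3   # hypothetical velocity_ISO cut-off (mm/s) separating standstill from running

# Invented readings: 0 = healthy pump, 1 = maintenance-prone pump.
df = pd.DataFrame({
    "velocity_ISO": [0.05, 0.41, np.nan, 0.06, 1.25, 1.31, 0.44, 1.28, 0.04, 0.47, 1.22, 0.39],
    "status":       [0,    0,    0,      1,    1,    1,    0,    1,    0,    0,    1,    0],
})

df = df.dropna()                                        # drop records with missing values
running = df[df["velocity_ISO"] > RUNNING_THRESHOLD]    # keep running-state rows only

X = running[["velocity_ISO"]]
y = running["status"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=True, stratify=y, random_state=42)

print(len(df), "rows after NaN removal,", len(running), "running rows")
print(len(X_train), "training rows,", len(X_test), "evaluation rows")
```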
The Python programming language, the Scikit-learn library, and Jupyter Notebook were employed to develop the health state classification ML models. The descriptive statistics of the numerical input features in the training and validation dataset of the RF, Naïve Bayes, SVM, and XGBoost models are summarised in Table 1. The measurements of rms_demodulation and rms_acceleration have relatively low standard deviations (0.115 and 0.212, respectively), unlike velocity_ISO, peak-to-peak acceleration, and temperature, where a high deviation is observed. These significant deviations in velocity_ISO, peak-to-peak acceleration, and temperature indicate inconsistencies and instability in the output measurements of the studied centrifugal pumps (one in healthy and one in maintenance-prone condition), making them promising candidates for depicting the health state of the machinery and enhancing prediction performance.
Table 1. Descriptive statistics of numerical input features. The features of velocity, rms_demodulation,
rms_acceleration, peak-to-peak acceleration and temperature are highlighted as the most relevant
for further examination
| Variable | Unit | Min | Max | Average | Standard deviation |
|---|---|---|---|---|---|
| velocity_ISO | mm/s | 0.277 | 4.905 | 0.76 | 0.548 |
| rms_demodulation | g | 0.008 | 0.408 | 0.127 | 0.115 |
| rms_acceleration | g | 0.052 | 0.585 | 0.263 | 0.212 |
| peak-to-peak acceleration | g | 0.388 | 5.218 | 2.206 | 1.783 |
| temperature | °C | 19 | 28 | 24.505 | 1.805 |
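Summary statistics in the layout of Table 1 can be produced with pandas, as sketched below on invented values. Note that pandas' `std` is the sample standard deviation (ddof=1); whether the reported deviations were computed this way is not stated, so this is an assumption.

```python
import pandas as pd

# Hypothetical running-state measurements (invented values).
data = pd.DataFrame({
    "velocity_ISO": [0.4, 0.6, 1.2, 1.4],
    "temperature": [23.0, 24.0, 25.0, 26.0],
})

# One row per variable and one column per statistic, as in Table 1.
summary = data.agg(["min", "max", "mean", "std"]).T
summary.columns = ["Min", "Max", "Average", "Standard deviation"]
print(summary.round(3))
```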
3.1. Feature selection
In this section, Spearman statistical analysis was conducted using the Python programming language and the stats module of the SciPy library. Spearman correlation values were calculated to determine in advance which of the input features, namely velocity_ISO, rms_demodulation, rms_acceleration, peak-to-peak acceleration, and temperature, are the most informative and can further enhance the prediction algorithm's performance. As mentioned in section 2.2, Spearman correlation is selected when the dependent and independent variables do not have a linear relationship or the predicted label is an ordinal value. Therefore, Spearman statistical analysis was considered the most appropriate choice for feature selection and model optimization. The results are presented in Table 2.
The correlation outputs in Table 2 indicate that the velocity_ISO parameter is the most informative for health state classification, with a correlation of 0.865. Moreover, rms_demodulation (0.856), peak-to-peak acceleration (0.849), and rms_acceleration (0.832) also produce promising results, making them suitable candidates for feature selection and model optimization. However, although temperature is considered one of the most critical inputs for short-term anomaly detection (Figure 2), the high deviation of its measurements arose because the temperature sensor was affected more by environmental conditions (measurements were conducted around the winter solstice) than by the actual temperature of the machine. Hence, as indicated by its weaker, negative correlation (-0.435), the temperature input is expected to have a limited effect on, or not be relevant to, the health state classification output. Thus, in the specified use case, selecting temperature as an input feature is expected to introduce bias and negatively affect the predictive algorithm for health state classification.
Table 2. Spearman statistical analysis results
| Variable | Correlation |
|---|---|
| velocity_ISO | 0.865 |
| rms_demodulation | 0.856 |
| rms_acceleration | 0.832 |
| peak-to-peak acceleration | 0.849 |
| temperature | -0.435 |
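A correlation table like Table 2 and a simple selection rule can be sketched with SciPy's `spearmanr`. The data are invented, and the |ρ| ≥ 0.5 cut-off is a hypothetical rule (the paper does not state a numeric threshold). In this toy example, a feature that perfectly separates four healthy from four maintenance-prone readings reaches ρ ≈ 0.87 rather than 1, because the binary label contains only two tied rank values, while a feature unrelated to the label gives ρ = 0.

```python
import pandas as pd
from scipy.stats import spearmanr

# Invented running-state readings: 0 = healthy, 1 = maintenance-prone.
df = pd.DataFrame({
    "velocity_ISO":     [0.41, 0.44, 0.39, 0.47, 1.25, 1.31, 1.28, 1.22],
    "rms_demodulation": [0.03, 0.05, 0.04, 0.03, 0.24, 0.27, 0.25, 0.26],
    "temperature":      [25.1, 23.9, 24.6, 24.2, 24.4, 25.0, 23.8, 24.7],
    "status":           [0, 0, 0, 0, 1, 1, 1, 1],
})

MIN_ABS_RHO = 0.5   # hypothetical selection threshold
features = [col for col in df.columns if col != "status"]

rho = {}
for col in features:
    r, _ = spearmanr(df[col], df["status"])   # tie-aware Spearman correlation with the label
    rho[col] = r

selected = [col for col in features if abs(rho[col]) >= MIN_ABS_RHO]
print({col: round(r, 3) for col, r in rho.items()})
print("selected:", selected)
```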
3.2. Predictive model training
The four models, namely RF, Naïve Bayes, SVM, and XGBoost, were trained with the Scikit-learn default
parameter configurations. The collected dataset of 857 running-state measurements from both centrifugal pumps
was split into 80% for training and 20% for validation. It is worth mentioning that the proposed research
handles a balanced dataset with roughly equal numbers of measurements for the healthy and maintenance-prone
states of the machinery. If the input dataset were imbalanced, i.e., if it depicted one state much more
frequently, the ML model could learn to always predict that state, resulting in poor overall performance on new,
out-of-sample inputs. In such cases, trimming away samples from the more frequent labels (undersampling) or
using class weights that assign the more frequent classes a weight below 1 is preferred. Moreover, this approach
set the ‘shuffle’ parameter to ‘True’ when constructing the training set, which is an essential aspect of training optimization.
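The training setup above can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the authors' code: the data are synthetic (generated with `make_classification`) rather than the pump measurements, only three of the four models are shown because XGBoost lives in the separate xgboost package, and the imbalanced label vector at the end is hypothetical, since the study's dataset was balanced. With `test_size=0.2`, scikit-learn places ceil(0.2 × 857) = 172 records in the validation split and 685 in the training split.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.utils.class_weight import compute_class_weight

# Synthetic, roughly balanced stand-in for the 857 running-state records:
# four vibration features and a binary label (0 = healthy, 1 = maintenance-prone).
X, y = make_classification(n_samples=857, n_features=4, n_informative=4,
                           n_redundant=0, random_state=0)

# 80/20 split; shuffle=True is scikit-learn's default, stated explicitly here.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, shuffle=True, random_state=42)

# Default hyperparameters; random_state only fixes the seed for reproducibility.
# XGBoost's XGBClassifier (separate xgboost package) follows the same
# fit/score interface and could be added here; it is omitted to keep the
# sketch scikit-learn-only.
models = {
    "Random Forest": RandomForestClassifier(random_state=0),
    "Naive Bayes": GaussianNB(),
    "SVM": SVC(),
}
val_accuracy = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    val_accuracy[name] = model.score(X_val, y_val)  # accuracy on the 20% split
    print(f"{name}: {val_accuracy[name]:.4f}")

# Hypothetical imbalanced labels (80 healthy, 20 faulty): "balanced" weights are
# n_samples / (n_classes * class_count), i.e. 0.625 for class 0 and 2.5 for class 1,
# so the frequent class receives a weight below 1.
y_imbalanced = np.array([0] * 80 + [1] * 20)
weights = compute_class_weight("balanced", classes=np.array([0, 1]), y=y_imbalanced)
print(weights)
# RandomForestClassifier and SVC also accept class_weight="balanced" directly.
```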
A model is more likely to become biased towards a class if that label is seen more frequently towards the end
of training, even if the dataset is appropriately balanced. This occurs because the model learns that the
quickest way to reduce the overall loss is to predict the class that dominates the most recent training batches.
This can cause spikes in the training loss, and the model may cycle between local minima, predicting whichever
label is currently being repeated, instead of reaching the global minimum that corresponds to the optimal model.
Therefore, shuffling the training samples together with their target labels can enhance the performance of the
health state classification model.
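The joint shuffling described above, where samples and targets are permuted with the same permutation, can be sketched in plain Python as follows. The helper names `shuffle_together` and `positive_share_per_batch` are hypothetical, and the sorted recording order is an assumed worst case used only to show why ordering matters; for NumPy arrays, `sklearn.utils.shuffle(X, y, random_state=...)` performs the same consistent shuffle.

```python
import random


def shuffle_together(features, labels, seed=None):
    """Apply one random permutation to both lists so each sample keeps its label."""
    if len(features) != len(labels):
        raise ValueError("features and labels must have the same length")
    order = list(range(len(labels)))
    random.Random(seed).shuffle(order)
    return [features[i] for i in order], [labels[i] for i in order]


def positive_share_per_batch(labels, batch_size):
    """Fraction of label 1 (maintenance-prone) in each consecutive batch."""
    shares = []
    for start in range(0, len(labels), batch_size):
        batch = labels[start:start + batch_size]
        shares.append(sum(batch) / len(batch))
    return shares


# Hypothetical recording order: all healthy records first, then all faulty ones.
features = list(range(100))                  # stand-in sample IDs
labels = [0 if f < 50 else 1 for f in features]
print(positive_share_per_batch(labels, 10))  # first five batches 0.0, last five 1.0

shuffled_x, shuffled_y = shuffle_together(features, labels, seed=0)
print(positive_share_per_batch(shuffled_y, 10))  # classes now mixed within batches
```

Shuffling indices once and indexing both lists is what keeps each measurement paired with its own health label; shuffling the two lists independently would silently destroy the labels.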
3.3. Model evaluation
Several evaluation metrics quantify the performance of a model's predictions. The evaluation is performed on
the validation dataset, i.e., on a held-out part of the collected measurements. In our case, the predicted output
is the health state class, so binary classification metrics are appropriate for representing the overall system
performance. These metrics are built on the true negative (TN), true positive (TP), false negative (FN), and
false positive (FP) counts of the binary outcome. TN refers to negative instances correctly classified as
negative by the model and TP to positive instances correctly classified as positive, while FN refers to positive
instances incorrectly classified as negative and FP to negative instances incorrectly classified as positive [31].
Regarding the health state classification outcome, a TN is a healthy pump (condition output value 0) predicted as
healthy, whereas a TP is a maintenance-prone centrifugal pump (condition output value 1) predicted as
maintenance-prone. The metrics implemented in our research were accuracy, precision, recall, F1-score, and
Cohen Kappa score [32].
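$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (6)$$
$$\text{Precision} = \frac{TP}{TP + FP} \qquad (7)$$
$$\text{Recall} = \frac{TP}{TP + FN} \qquad (8)$$
$$\text{F1-score} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \qquad (9)$$
$$\text{Cohen Kappa score} = \frac{p_o - p_e}{1 - p_e} \qquad (10)$$
where $p_o$ is the relative observed agreement and $p_e$ is the hypothetical probability of chance agreement.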
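Equations (6) to (10) can be computed directly from the four confusion-matrix counts, as in the plain-Python sketch below. The counts in the example are hypothetical and are not the paper's confusion matrix; they assume a validation set of 172 samples (20% of 857, rounded up as scikit-learn's `train_test_split` does) with two healthy pumps flagged as maintenance-prone and no missed faults. The chance agreement $p_e$ is taken from the predicted and actual class totals, which is the usual construction for Cohen's kappa. `sklearn.metrics` provides `accuracy_score`, `precision_score`, `recall_score`, `f1_score`, and `cohen_kappa_score` for the same quantities computed from label vectors.

```python
def binary_metrics(tp, tn, fp, fn):
    """Compute Eqs. (6)-(10) from confusion-matrix counts (positive = maintenance-prone)."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total                                   # Eq. (6)
    precision = tp / (tp + fp) if tp + fp else 0.0                 # Eq. (7)
    recall = tp / (tp + fn) if tp + fn else 0.0                    # Eq. (8)
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)                          # Eq. (9)
    # Eq. (10): p_o is the observed agreement (equal to accuracy); p_e is the
    # chance agreement from the predicted and actual class totals.
    p_o = accuracy
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / total ** 2
    kappa = (p_o - p_e) / (1 - p_e) if p_e != 1 else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "f1": f1, "kappa": kappa}


# Hypothetical counts: 2 false positives, 0 false negatives.
metrics = binary_metrics(tp=80, tn=90, fp=2, fn=0)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With FN = 0, recall is exactly 1 and every error lowers precision only, which is the pattern the results section describes for most models.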
4. RESULTS AND DISCUSSION
RF, Naïve Bayes, SVM, and XGBoost models were developed for a CBM approach predicting the
health state of centrifugal pumps. Two approaches were compared: one using velocity_ISO,
rms_demodulation, rms_acceleration, peak-to-peak acceleration, and temperature as input features, and one
without the temperature parameter, following the Spearman correlation analysis. Five metrics (accuracy,
precision, recall, F1-score, and Cohen Kappa score) were used to evaluate the predictive output, which is
binary: healthy or maintenance-prone. Table 3 presents the results of each model with temperature as an
input feature. XGBoost (98.83% accuracy, 98.89% F1-score) and RF (98.25% accuracy, 98.33% F1-score) achieved
the best results in classifying the health state of the validation measurements. Similarly, in Table 4, XGBoost
(99.41% accuracy, 99.31% F1-score) and RF (98.83% accuracy, 98.59% F1-score) achieved the best overall results.
Moreover, in the second set of experiments, removing the temperature parameter as an input feature made the
predictive models more accurate, verifying the conclusions of the Spearman analysis. Figure 3 presents the
overall accuracy before and after feature selection.
In both experiments, all models achieved a recall of 100%, except for SVM with temperature as an input
parameter, which produced some FN predictions. Hence, most misclassifications are attributed to FP values
(errors in precision), meaning that the model classified the condition of the centrifugal pump as
maintenance-prone while the actual label was healthy. Another conclusion worth discussing is that in both
experiments the results were highly accurate (above 96%) without any hyperparameter fine-tuning of the individual models.
Table 3. Prediction outputs prior to feature selection, with velocity_ISO, rms_demodulation, rms_acceleration,
peak-to-peak acceleration and temperature as input features
Algorithm Accuracy Precision Recall F1-score Cohen Kappa score
Random Forest 98.25% 96.73% 100% 98.33% 96.50%
Support Vector Machines 96.51% 97.77% 95.50% 96.66% 93.02%
Naïve Bayes 96.51% 92% 100% 95.83% 92.84%
XGBoost 98.83% 97.82% 100% 98.89% 97.43%
Furthermore, trained on a balanced dataset, the machine learning models could recognize the critical
inputs differentiating healthy and maintenance-prone measurements and correctly identify the
condition of the centrifugal pumps for most input measurements. Moreover, the accuracy of the predictions further
verifies the importance of collecting vibration and temperature measurements in manufacturing processes.
Overall, the results confirmed the efficiency of the proposed framework with feature selection and XGBoost
implementation in industrial applications. Furthermore, our findings align with previous research in which
XGBoost was effectively applied in socioeconomic domains, namely medicine [33], [34], economy [35],
cybersecurity [36], language processing [37], and environmental applications [38]. Regarding medical
applications, feature selection combined with XGBoost was considered the most effective solution for heart
disease classification, with 99.6% accuracy [39], improving on [40], where the proposed decision trees achieved
97.75% accuracy. Additionally, a similar framework was implemented in [41] for diabetes prediction, resulting
in an area under the curve (AUC) of 82%. However, in [42] and [43], which addressed glucose level prediction,
XGBoost was not considered the optimal solution, and DNN and RF, respectively, were the selected algorithms.
Furthermore, XGBoost was selected for pregnancy risk monitoring with 96% accuracy [44], whereas an improved
approach combining CNN and XGBoost was proposed for renal stone diagnosis [45], breast cancer detection [46],
and image classification [47], with accuracies of 99.5%. Finally, feature selection combined with ensemble
learning was proposed for epileptic seizure detection and classification from electroencephalogram signals
with an effectiveness of 96% [48], [49]. Similarly, feature selection and ensemble learning were proposed in
the economic sector. A light gradient boosting algorithm for risk analysis [50] and a cost-sensitive XGBoost
approach for bankruptcy prediction [51] were developed with highly accurate results (around 95%), whilst [52]
improved on those outputs with an accuracy rate of 97%. On the other hand, principal component analysis (PCA)
for feature selection [53] and Bayesian hyper-parameter optimization [54] were proposed in conjunction with
XGBoost as optimal solutions for crowdfunding and creditworthiness prediction, respectively. In cybersecurity,
XGBoost has been applied extensively to denial-of-service attack detection, distinguishing malicious from
benign traffic requests. Both [36] and [55] highlighted the effectiveness of combining feature selection and
XGBoost, with an overall accuracy of 99%. However, in [56], a multilayer perceptron (MLP) slightly outperformed
XGBoost with 99.3% precision and was suggested by the authors as the optimal solution. Regarding language
processing, XGBoost was proposed for sentiment feature selection [37] and text similarity identification [57],
with F1-scores of 69% and 89%, respectively, whereas in [58] an ANN was selected for human speech recognition
with 77% precision.
Finally, in terms of environmental applications, [59] combined a grid search algorithm for hyperparameter
fine-tuning with an XGBoost model for electricity load prediction, and [60] showed that ensemble techniques
provide an efficient solution for solar radiation forecasting. Additionally, [61] and [62] highlighted the
effectiveness of ensemble methods combining XGBoost with other ML algorithms for land use and rice leaf
disease classification, respectively.
Nevertheless, collecting real-time sensor measurements and transforming the raw data into a machine-readable
format can be challenging. The proposed research integrated the proprietary GX Works software tool and
custom-built Python scripts to enable an interoperable real-time connection between the framework and
Mitsubishi's smart sensor kit. Mapping the appropriate input registers and correctly representing the sensor
values in the Jupyter Notebook was a demanding and time-consuming task, illustrating the complexity of real-time
data collection applications. Another challenge is applying the proposed method and selecting informative
features for health state prediction in other industrial applications and machinery besides centrifugal pumps.
In this work, vibration and temperature measurements depicting the condition and health state of the centrifugal
pumps were collected through Mitsubishi's smart sensor kit. The developed predictive model is expected to
predict the machine's condition accurately when handling future out-of-sample measurements. Therefore, it could
alert the maintenance personnel in a timely manner and increase the resilience and safety of the overall system.
Table 4. Prediction outputs following feature selection, with velocity_ISO, rms_demodulation,
rms_acceleration, peak-to-peak acceleration as input features
Algorithm Accuracy Precision Recall F1-score Cohen Kappa score
Random Forest 98.83% 97.22% 100% 98.59% 98.15%
Support Vector Machines 98.26% 95.83% 100% 97.87% 96.39%
Naïve Bayes 98.25% 96.20% 100% 98.06% 96.47%
XGBoost 99.41% 98.64% 100% 99.31% 98.88%
Figure 3. Overall performance of experimented output
5. CONCLUSION AND FUTURE DIRECTIONS
PdM constitutes one of the most promising concepts for a resilient and reliable industrial
environment. The rise of digitalization, big data analysis, and AI facilitates the implementation of PdM
applications that assess the health state of machinery. The proposed approach focuses on developing
a condition-based maintenance solution that predicts the health state of centrifugal pumps based on vibration and
temperature measurements. Our research proposes a robust PdM framework for health state prediction, using
feature selection and model optimization, that can be integrated into industrial facility layouts and
infrastructures. Several machine learning models, namely Random Forest, Support Vector Machines, Naïve Bayes,
and XGBoost, were implemented and evaluated. Additionally, feature selection was performed using Spearman
correlation analysis, leading to model optimization. XGBoost outperformed the other models, achieving up to
99.41% accuracy on the validation sample. This research verified that ML algorithms can efficiently assess the
health state of centrifugal pumps, increasing safety and productivity in a sustainable and resilient industrial
ecosystem, and provided a comprehensive framework for CBM and health state prediction applications.
Additionally, the experimental outputs highlighted the importance of feature selection for model optimization
and of training on a balanced dataset, i.e., unbiased measurements of both health state conditions, as
discussed in previous sections. Future researchers and field technicians can benefit from the proposed framework
and health state prediction methodology due to its resilience and viability in similar industrial applications.
Future work will focus on improving the system performance and determining the RUL of centrifugal pumps.
Additionally, the selection of the most appropriate AI algorithms is expected to remain a challenging issue and
a major topic of interest for future research. DL models and ANNs are expected to provide accurate predictions
due to their capability of handling high-volume data and regression tasks. Moreover, hyperparameter fine-tuning
is suggested to improve the classification, as mentioned above.
The ML algorithms applied (RF, Naïve Bayes, SVM, and XGBoost) were trained with the Scikit-learn
default parameter configurations. Thus, additional testing of various hyperparameter combinations can further
enhance the accuracy of the proposed system. Finally, a promising research direction is the automatic selection
of optimal hyperparameters using genetic algorithms. Guidelines for hyperparameter selection in each
application could provide researchers with a beneficial contribution to PdM approaches and promote the concept
of PdM in an innovative and resilient manufacturing environment.
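As a simpler starting point than genetic algorithms, hyperparameter combinations can be tested exhaustively with scikit-learn's `GridSearchCV`. The sketch below swaps in grid search for the genetic-algorithm direction named above (a genetic algorithm would instead evolve candidate settings rather than enumerate a fixed grid), uses synthetic data, and the grid values are hypothetical illustrations rather than recommendations for the pump data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data with four features and a binary health label.
X, y = make_classification(n_samples=857, n_features=4, n_informative=4,
                           n_redundant=0, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, shuffle=True, random_state=42)

# Hypothetical search space; the values are illustrative only.
param_grid = {
    "n_estimators": [50, 100, 200],
    "max_depth": [None, 5, 10],
}
# 5-fold cross-validation on the training split only; the validation split
# stays untouched until the final check.
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid,
                      cv=5, scoring="accuracy")
search.fit(X_train, y_train)
val_accuracy = search.score(X_val, y_val)  # uses the refitted best estimator
print(search.best_params_, round(val_accuracy, 4))
```

Keeping the validation split out of the search avoids reporting an accuracy that was itself used to pick the hyperparameters.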
ACKNOWLEDGEMENTS
We sincerely thank the UTECO SA team, who generously provided knowledge, expertise, and hardware and
software equipment for the implementation of this research. UTECO SA is a technical equipment provider in
the field of automation with vast expertise in industrial and marine applications and infrastructure projects.
Additionally, we would like to thank Mr. Alexandros Kolokas and Mr. Michael Koutsiantzis for their valuable
contribution in configuring the system during their undergraduate theses.
REFERENCES
[1] X. Xu, Y. Lu, B. Vogel-Heuser, and L. Wang, “Industry 4.0 and Industry 5.0—Inception, conception and perception,” Journal of
Manufacturing Systems, vol. 61, pp. 530–535, Oct. 2021, doi: 10.1016/j.jmsy.2021.10.006.
[2] S. Fosso Wamba, S. Akter, A. Edwards, G. Chopin, and D. Gnanzou, “How ‘big data’ can make big impact: Findings from a
systematic review and a longitudinal case study,” International Journal of Production Economics, vol. 165, pp. 234–246, Jul. 2015,
doi: 10.1016/j.ijpe.2014.12.031.
[3] J. Leukel, J. González, and M. Riekert, “Adoption of machine learning technology for failure prediction in industrial maintenance:
A systematic review,” Journal of Manufacturing Systems, vol. 61, pp. 87–96, Oct. 2021, doi: 10.1016/j.jmsy.2021.08.012.
[4] H. Nordal and I. El‐Thalji, “Modeling a predictive maintenance management architecture to meet industry 4.0 requirements: A case
study,” Systems Engineering, vol. 24, no. 1, pp. 34–50, Jan. 2021, doi: 10.1002/sys.21565.
[5] Y. Maher and B. Danouj, “Survey on Deep Learning applied to predictive maintenance,” International Journal of Electrical and
Computer Engineering (IJECE), vol. 10, no. 6, p. 5592, Dec. 2020, doi: 10.11591/ijece.v10i6.pp5592-5598.
[6] A. K. S. Jardine, D. Lin, and D. Banjevic, “A review on machinery diagnostics and prognostics implementing condition-based
maintenance,” Mechanical Systems and Signal Processing, vol. 20, no. 7, pp. 1483–1510, Oct. 2006, doi: 10.1016/j.ymssp.2005.09.012.
[7] J. Fernandes, J. Reis, N. Melão, L. Teixeira, and M. Amorim, “The Role of Industry 4.0 and BPMN in the Arise of Condition-Based
and Predictive Maintenance: A Case Study in the Automotive Industry,” Applied Sciences, vol. 11, no. 8, p. 3438, Apr. 2021, doi:
10.3390/app11083438.
[8] A. Mosallam, K. Medjaher, and N. Zerhouni, “Data-driven prognostic method based on Bayesian approaches for direct remaining useful
life prediction,” Journal of Intelligent Manufacturing, vol. 27, no. 5, pp. 1037–1048, Oct. 2016, doi: 10.1007/s10845-014-0933-4.
[9] T. Zonta, C. A. da Costa, R. da Rosa Righi, M. J. de Lima, E. S. da Trindade, and G. P. Li, “Predictive maintenance in the Industry 4.0:
A systematic literature review,” Computers & Industrial Engineering, vol. 150, p. 106889, Dec. 2020, doi: 10.1016/j.cie.2020.106889.
[10] T. P. Carvalho, F. A. A. M. N. Soares, R. Vita, R. da P. Francisco, J. P. Basto, and S. G. S. Alcalá, “A systematic literature review
of machine learning methods applied to predictive maintenance,” Computers and Industrial Engineering, vol. 137, 2019, doi:
10.1016/j.cie.2019.106024.
[11] Y. Peng, M. Dong, and M. J. Zuo, “Current status of machine prognostics in condition-based maintenance: a review,” The International
Journal of Advanced Manufacturing Technology, vol. 50, no. 1–4, pp. 297–313, Sep. 2010, doi: 10.1007/s00170-009-2482-0.
[12] Y. Lei, N. Li, L. Guo, N. Li, T. Yan, and J. Lin, “Machinery health prognostics: A systematic review from data acquisition to RUL
prediction,” Mechanical Systems and Signal Processing, vol. 104, pp. 799–834, May 2018, doi: 10.1016/j.ymssp.2017.11.016.
[13] E. Sanz, J. Blesa, and V. Puig, “BiDrac Industry 4.0 framework: Application to an Automotive Paint Shop Process,” Control
Engineering Practice, vol. 109, p. 104757, Apr. 2021, doi: 10.1016/j.conengprac.2021.104757.
[14] B. van Oudenhoven, P. Van de Calseyde, R. Basten, and E. Demerouti, “Predictive maintenance for industry 5.0: behavioural inquiries from
a work system perspective,” International Journal of Production Research, pp. 1–20, Dec. 2022, doi: 10.1080/00207543.2022.2154403.
[15] Z. Xu and J. H. Saleh, “Machine learning for reliability engineering and safety applications: Review of current status and future
opportunities,” Reliability Engineering & System Safety, vol. 211, p. 107530, Jul. 2021, doi: 10.1016/j.ress.2021.107530.
[16] F. Arellano-Espitia, M. Delgado-Prieto, V. Martinez-Viol, J. J. Saucedo-Dorantes, and R. A. Osornio-Rios, “Deep-Learning-Based
Methodology for Fault Diagnosis in Electromechanical Systems,” Sensors, vol. 20, no. 14, p. 3949, Jul. 2020, doi: 10.3390/s20143949.
[17] K. S. Kiangala and Z. Wang, “An Effective Predictive Maintenance Framework for Conveyor Motors Using Dual Time-Series
Imaging and Convolutional Neural Network in an Industry 4.0 Environment,” IEEE Access, vol. 8, pp. 121033–121049, 2020, doi:
10.1109/ACCESS.2020.3006788.
[18] S. Selvaraj, B. Prabhu Kavin, C. Kavitha, and W.-C. Lai, “A Multiclass Fault Diagnosis Framework Using Context-Based Multilayered
Bayesian Method for Centrifugal Pumps,” Electronics, vol. 11, no. 23, p. 4014, Dec. 2022, doi: 10.3390/electronics11234014.
[19] J. Osborne, Best Practices in Data Cleaning: A Complete Guide to Everything You Need to Do Before and After Collecting Your Data.
Thousand Oaks, CA, USA: SAGE Publications, Inc., 2013. doi: 10.4135/9781452269948.
[20] D. J. S. Chong, Y. J. Chan, S. K. Arumugasamy, S. K. Yazdi, and J. W. Lim, “Optimisation and performance evaluation of response
surface methodology (RSM), artificial neural network (ANN) and adaptive neuro-fuzzy inference system (ANFIS) in the prediction of
biogas production from palm oil mill effluent (POME),” Energy, vol. 266, p. 126449, Mar. 2023, doi: 10.1016/j.energy.2022.126449.
[21] Z. Li, E. Kristoffersen, and J. Li, “Deep transfer learning for failure prediction across failure types,” Computers & Industrial
Engineering, vol. 172, p. 108521, Oct. 2022, doi: 10.1016/j.cie.2022.108521.
[22] W. Luo, T. Hu, Y. Ye, C. Zhang, and Y. Wei, “A hybrid predictive maintenance approach for CNC machine tool driven by Digital
Twin,” Robotics and Computer-Integrated Manufacturing, vol. 65, p. 101974, Oct. 2020, doi: 10.1016/j.rcim.2020.101974.
[23] A. Lis, Z. Dworakowski, and P. Czubak, “An anomaly detection method for rotating machinery monitoring based on the most
representative data,” Journal of Vibroengineering, vol. 23, no. 4, pp. 861–876, Jun. 2021, doi: 10.21595/jve.2021.21622.
[24] D. Cardoso and L. Ferreira, “Application of Predictive Maintenance Concepts Using Artificial Intelligence Tools,” Applied
Sciences, vol. 11, no. 1, p. 18, Dec. 2020, doi: 10.3390/app11010018.
[25] M. Cakir, M. A. Guvenc, and S. Mistikoglu, “The experimental application of popular machine learning algorithms on predictive
maintenance and the design of IIoT based condition monitoring system,” Computers & Industrial Engineering, vol. 151, p. 106948,
Jan. 2021, doi: 10.1016/j.cie.2020.106948.
[26] J.-R. Ruiz-Sarmiento, J. Monroy, F.-A. Moreno, C. Galindo, J.-M. Bonelo, and J. Gonzalez-Jimenez, “A predictive model for the
maintenance of industrial machinery in the context of industry 4.0,” Engineering Applications of Artificial Intelligence, vol. 87, p.
103289, Jan. 2020, doi: 10.1016/j.engappai.2019.103289.
[27] C. Tutivén, Y. Vidal, A. Insuasty, L. Campoverde-Vilela, and W. Achicanoy, “Early Fault Diagnosis Strategy for WT Main Bearings
Based on SCADA Data and One-Class SVM,” Energies, vol. 15, no. 12, p. 4381, Jun. 2022, doi: 10.3390/en15124381.
[28] B. E. Boser, I. M. Guyon, and V. N. Vapnik, “A training algorithm for optimal margin classifiers,” in Proceedings of the fifth annual
workshop on Computational learning theory, Jul. 1992, pp. 144–152. doi: 10.1145/130385.130401.
[29] C. Cortes and V. Vapnik, “Support-vector networks,” Machine Learning, vol. 20, no. 3, pp. 273–297, Sep. 1995, doi:
10.1007/BF00994018.
[30] A. V. Dorogush, V. Ershov, and A. Gulin, “CatBoost: gradient boosting with categorical features support,” 2018, [Online].
Available: http://arxiv.org/abs/1810.11363
[31] Z. Allah Bukhsh, A. Saeed, I. Stipanovic, and A. G. Doree, “Predictive maintenance using tree-based classification techniques: A
case of railway switches,” Transportation Research Part C: Emerging Technologies, vol. 101, pp. 35–54, Apr. 2019, doi:
10.1016/j.trc.2019.02.001.
[32] J. Cohen, “A Coefficient of Agreement for Nominal Scales,” Educational and Psychological Measurement, vol. 20, no. 1, pp. 37–
46, Apr. 1960, doi: 10.1177/001316446002000104.
[33] R. D. Abdualjabar and O. A. Awad, “Parallel extreme gradient boosting classifier for lung cancer detection,” Indonesian Journal
of Electrical Engineering and Computer Science, vol. 24, no. 3, p. 1610, Dec. 2021, doi: 10.11591/ijeecs.v24.i3.pp1610-1617.
[34] S. Sengsri and K. Khunratchasana, “Comparison of machine learning algorithms with regression analysis to predict the COVID-19
outbreak in Thailand,” Indonesian Journal of Electrical Engineering and Computer Science, vol. 31, no. 1, p. 299, Jul. 2023, doi:
10.11591/ijeecs.v31.i1.pp299-304.
[35] M. Vasudevan, R. S. Narayanan, S. F. Nakeeb, and A. Abhishek, “Customer churn analysis using XGBoosted decision trees,” Indonesian
Journal of Electrical Engineering and Computer Science, vol. 25, no. 1, p. 488, Jan. 2022, doi: 10.11591/ijeecs.v25.i1.pp488-495.
[36] S. A. M. Al-Juboori, F. Hazzaa, Z. S. Jabbar, S. Salih, and H. M. Gheni, “Man-in-the-middle and denial of service attacks detection
using machine learning algorithms,” Bulletin of Electrical Engineering and Informatics, vol. 12, no. 1, pp. 418–426, Feb. 2023, doi:
10.11591/eei.v12i1.4555.
[37] A. Samih, A. Ghadi, and A. Fennan, “Enhanced sentiment analysis based on improved word embeddings and XGboost,” International
Journal of Electrical and Computer Engineering (IJECE), vol. 13, no. 2, p. 1827, Apr. 2023, doi: 10.11591/ijece.v13i2.pp1827-1836.
[38] A. Mbarek, M. Jiber, A. Yahyaouy, and A. Sabri, “Black spots identification on rural roads based on extreme learning machine,”
International Journal of Electrical and Computer Engineering (IJECE), vol. 13, no. 3, p. 3149, Jun. 2023, doi:
10.11591/ijece.v13i3.pp3149-3160.
[39] T. A. Assegie, A. O. Salau, C. O. Omeje, and S. L. Braide, “Multivariate sample similarity measure for feature selection with a
resemblance model,” International Journal of Electrical and Computer Engineering (IJECE), vol. 13, no. 3, p. 3359, Jun. 2023,
doi: 10.11591/ijece.v13i3.pp3359-3366.
[40] S. Molla et al., “A predictive analysis framework of heart disease using machine learning approaches,” Bulletin of Electrical
Engineering and Informatics, vol. 11, no. 5, pp. 2705–2716, Oct. 2022, doi: 10.11591/eei.v11i5.3942.
[41] T. A. Assegie, T. Karpagam, R. Mothukuri, R. L. Tulasi, and M. Fentahun Engidaye, “Extraction of human understandable insight
from machine learning model for diabetes prediction,” Bulletin of Electrical Engineering and Informatics, vol. 11, no. 2, pp. 1126–
1133, Apr. 2022, doi: 10.11591/eei.v11i2.3391.
[42] G. Alfian, Y. M. Saputra, L. Subekti, A. D. Rahmawati, F. T. D. Atmaji, and J. Rhee, “Utilizing deep neural network for web-based
blood glucose level prediction system,” Indonesian Journal of Electrical Engineering and Computer Science, vol. 30, no. 3, p. 1829,
Jun. 2023, doi: 10.11591/ijeecs.v30.i3.pp1829-1837.
[43] R. Mahzabin, F. H. Sifat, S. Anjum, A.-A. Nayan, and M. G. Kibria, “Blockchain associated machine learning and IoT based
hypoglycemia detection system with auto-injection feature,” Indonesian Journal of Electrical Engineering and Computer Science,
vol. 27, no. 1, p. 447, Jul. 2022, doi: 10.11591/ijeecs.v27.i1.pp447-455.
[44] M. Irfan, S. Basuki, and Y. Azhar, “Giving more insight for automatic risk prediction during pregnancy with interpretable machine
learning,” Bulletin of Electrical Engineering and Informatics, vol. 10, no. 3, Jun. 2021, doi: 10.11591/eei.v10i3.2344.
[45] N. H. Alkurdy, H. K. Aljobouri, and Z. K. Wadi, “Ultrasound renal stone diagnosis based on convolutional neural network and
VGG16 features,” International Journal of Electrical and Computer Engineering (IJECE), vol. 13, no. 3, p. 3440, Jun. 2023, doi:
10.11591/ijece.v13i3.pp3440-3448.
[46] E. Sugiharti, R. Arifudin, D. T. Wiyanti, and A. B. Susilo, “Integration of convolutional neural network and extreme gradient
boosting for breast cancer detection,” Bulletin of Electrical Engineering and Informatics, vol. 11, no. 2, pp. 803–813, Apr. 2022,
doi: 10.11591/eei.v11i2.3562.
[47] A. A. Aminu, N. N. Agwu, S. Adeshina, and M. K. Ahmed, “Detection of image manipulation with convolutional neural network
and local feature descriptors,” TELKOMNIKA (Telecommunication Computing Electronics and Control), vol. 20, no. 3, p. 629, Jun.
2022, doi: 10.12928/telkomnika.v20i3.23318.
[48] C. Rachappa, M. Kapanaiah, and V. Nagaraju, “Hybrid ensemble learning framework for epileptic seizure detection using
electroencephalograph signals,” Indonesian Journal of Electrical Engineering and Computer Science, vol. 28, no. 3, p. 1502, Dec.
2022, doi: 10.11591/ijeecs.v28.i3.pp1502-1509.
[49] M. Panigrahi, D. K. Behera, and K. C. Patra, “Epileptic seizure classification of electroencephalogram signals using extreme
gradient boosting classifier,” Indonesian Journal of Electrical Engineering and Computer Science, vol. 25, no. 2, p. 884, Feb. 2022,
doi: 10.11591/ijeecs.v25.i2.pp884-891.
[50] M. Munsarif, M. Sam’an, and S. Safuan, “Peer to peer lending risk analysis based on embedded technique and stacking ensemble learning,”
Bulletin of Electrical Engineering and Informatics, vol. 11, no. 6, pp. 3483–3489, Dec. 2022, doi: 10.11591/eei.v11i6.3927.
[51] W. Yotsawat, K. Phodong, T. Promrat, and P. Wattuya, “Bankruptcy prediction model using cost-sensitive extreme gradient
boosting in the context of imbalanced datasets,” International Journal of Electrical and Computer Engineering (IJECE), vol. 13,
no. 4, p. 4683, Aug. 2023, doi: 10.11591/ijece.v13i4.pp4683-4691.
[52] M. A. Muslim and Y. Dasril, “Company bankruptcy prediction framework based on the most influential features using XGBoost
and stacking ensemble learning,” International Journal of Electrical and Computer Engineering (IJECE), vol. 11, no. 6, p. 5549,
Dec. 2021, doi: 10.11591/ijece.v11i6.pp5549-5557.
[53] S. P. Raflesia, D. Lestarini, R. D. Kurnia, and D. Y. Hardiyanti, “Using machine learning approach towards successful crowdfunding
prediction,” Bulletin of Electrical Engineering and Informatics, vol. 12, no. 4, pp. 2438–2445, Aug. 2023, doi: 10.11591/eei.v12i4.5238.
[54] W. Yotsawat, P. Wattuya, and A. Srivihok, “Improved credit scoring model using XGBoost with Bayesian hyper-parameter
optimization,” International Journal of Electrical and Computer Engineering (IJECE), vol. 11, no. 6, p. 5477, Dec. 2021, doi:
10.11591/ijece.v11i6.pp5477-5487.
[55] M. K. Kareem, O. D. Aborisade, S. A. Onashoga, T. Sutikno, and O. M. Olayiwola, “Efficient model for detecting application layer
distributed denial of service attacks,” Bulletin of Electrical Engineering and Informatics, vol. 12, no. 1, pp. 441–450, Feb. 2023, doi:
10.11591/eei.v12i1.3871.
[56] S. Chimphlee and W. Chimphlee, “Machine learning to improve the performance of anomaly-based network intrusion detection in big
data,” Indonesian Journal of Electrical Engineering and Computer Science, vol. 30, no. 2, p. 1106, May 2023, doi:
10.11591/ijeecs.v30.i2.pp1106-1119.
[57] M. Hammad, M. Al-Smadi, Q. B. Baker, and S. A. Al-Zboon, “Using deep learning models for learning semantic text similarity of
Arabic questions,” International Journal of Electrical and Computer Engineering (IJECE), vol. 11, no. 4, p. 3519, Aug. 2021, doi:
10.11591/ijece.v11i4.pp3519-3528.
[58] Z. J. Mohammed Ameen and A. Abdulrahman Kadhim, “Machine learning for Arabic phonemes recognition using electrolarynx
speech,” International Journal of Electrical and Computer Engineering (IJECE), vol. 13, no. 1, p. 400, Feb. 2023, doi:
10.11591/ijece.v13i1.pp400-412.
[59] N. T. Tran, T. T. G. Tran, T. A. Nguyen, and M. B. Lam, “A new grid search algorithm based on XGBoost model for load forecasting,”
Bulletin of Electrical Engineering and Informatics, vol. 12, no. 4, pp. 1857–1866, Aug. 2023, doi: 10.11591/eei.v12i4.5016.
[60] D. P. Mishra, S. Jena, R. Senapati, A. Panigrahi, and S. R. Salkuti, “Global solar radiation forecast using an ensemble learning approach,”
International Journal of Power Electronics and Drive Systems (IJPEDS), vol. 14, no. 1, p. 496, Mar. 2023, doi:
10.11591/ijpeds.v14.i1.pp496-505.
[61] S. Swetanisha, A. R. Panda, and D. K. Behera, “Land use/land cover classification using machine learning models,” International
Journal of Electrical and Computer Engineering (IJECE), vol. 12, no. 2, p. 2040, Apr. 2022, doi: 10.11591/ijece.v12i2.pp2040-2046.
[62] M. A. Azim, M. K. Islam, M. M. Rahman, and F. Jahan, “An effective feature extraction method for rice leaf disease classification,”
TELKOMNIKA (Telecommunication Computing Electronics and Control), vol. 19, no. 2, p. 463, Apr. 2021, doi:
10.12928/telkomnika.v19i2.16488.
BIOGRAPHIES OF AUTHORS
Panagiotis Mallioris holds a B.Sc. degree from the Department of Industrial
Engineering and Management, School of Engineering, International Hellenic University
(IHU). Currently, he is a Ph.D. candidate conducting his research titled “Big Data
Visualization and Analysis Methodologies in Production Systems and Value Chains”. His
work focuses on big data analysis, artificial intelligence, data mining and predictive
maintenance. He has supervised numerous undergraduate thesis projects regarding predictive
maintenance and artificial intelligence algorithms in modern industry. Mr. Mallioris
participates as a research associate in the nationally funded research project Quality
Control of Production Processes by Using an Integrated Decision Support System (ProDSS),
National Strategic Reference Framework (ESPA) 2020-2022. He can be contacted at email:
panmalliw@gmail.com.
Evangelos Diamantis is an experienced Automation & Control Engineer,
working at UTECO S.A. In recent years, he has held several demanding roles
such as Sr. Automation Engineer, Technical Manager and Automation, Power and Motion
Business Unit Director, while participating in a significant number of research
projects across the EU. He holds a B.Eng. in Automation & Control Engineering from the
Department of Automation & Control Engineering, Central Greece University of Applied
Sciences, an M.Sc. in Informatics with a specialization in ML from the Department of
Informatics, University of Piraeus, an M.Sc. in Project Management with a specialization in
PdM from the Department of Industrial Management and Technology, University of Piraeus,
and an Executive M.B.A. from the Department of Business Administration, Athens
University of Economics and Business. His expertise area is the application of ML and DNN
models to autonomous robotic systems as well as the PdM models development. He can be
contacted via email: vagdiam@hotmail.com.
Christos Bialas is an Assistant Professor at the Industrial Engineering and
Management Department of the International Hellenic University, Greece. He holds a Ph.D.
in SC Management from the Department of Applied Informatics, University of Macedonia,
Greece, an M.Sc. degree in Industrial Engineering and Management from RWTH Aachen,
Germany, and a Diploma in Electrical and Computer Engineering from the Department of
Electrical and Computer Engineering, Aristotle University of Thessaloniki, Greece. He has
many years of experience in managing large-scale ERP Systems implementation projects in
Europe and the US with a focus on supply chain management. His scientific interests cover
the fields of SCM, ERP systems, and project management. He has published his research
work in recognized peer-reviewed scientific journals. He can be contacted via email:
cbialas@ihu.gr.
Dimitrios Bechtsis is an Associate Professor (Applied Industrial Informatics in
Supply Chain Management) in the Division of Industrial Information Systems and
Management, Department of Industrial Engineering and Management, International Hellenic
University (IHU). He holds a PostDoc (Digital SC and Industrial Informatics) and a Ph.D.
(Digital manufacturing methodologies and IT tools for the management and control of
autonomous vehicles in SC) from the Department of Mechanical Engineering, A.U.Th., an
M.Sc. degree in Medical Informatics from the Departments of Electrical and Computer
Engineering, Informatics and the School of Medicine, A.U.Th. and a Diploma in Electrical
and Computer Engineering from the Department of Electrical and Computer Engineering,
A.U.Th. He teaches information, database management and autonomous systems, and
industrial informatics. He can be contacted at email: dimbec@ihu.gr.