Computer Science and Information Technologies
Vol. 5, No. 3, November 2024, pp. 205~214
ISSN: 2722-3221, DOI: 10.11591/csit.v5i3.pp205-214
Journal homepage: http://iaesprime.com/index.php/csit
Optimizing classification models for medical image diagnosis: a
comparative analysis on multi-class datasets
Abdul Rachman Manga, Aulia Putri Utami, Huzain Azis, Yulita Salim, Amaliah Faradibah
Department of Computer Engineering, Faculty of Computer Science, Universitas Muslim Indonesia, Makassar, Indonesia
Article Info ABSTRACT
Article history:
Received Dec 29, 2023
Revised Jul 25, 2024
Accepted Jul 29, 2024
The surge in machine learning (ML) and artificial intelligence has revolutionized
medical diagnosis, utilizing image data on chest CT scans, COVID-19, lung cancer,
brain tumors, and Alzheimer's and Parkinson's diseases. However, the intricate
nature of medical data necessitates robust classification models. This study compares
support vector machine (SVM), naïve Bayes, k-nearest neighbors (K-NN),
artificial neural networks (ANN), and stochastic gradient descent (SGD) on multi-class
medical datasets, employing data collection, Canny image segmentation, Hu-moment
feature extraction, and oversampling/under-sampling for data balancing.
Classification algorithms are assessed via 5-fold cross-validation for accuracy,
precision, recall, and F-measure. Results show that model performance varies
with the dataset and the sampling strategy. SVM, K-NN, ANN, and SGD
demonstrate superior performance on specific datasets, achieving accuracies
between 0.49 and 0.57. Conversely, naïve Bayes exhibits limitations, achieving
precision levels of 0.46 to 0.47 on certain datasets. The effect of oversampling
and under-sampling on classification accuracy is inconsistent.
These findings aid medical practitioners and researchers in
selecting suitable models for diagnostic applications.
Keywords:
Balancing
Machine learning
Medical images
Multiclass
Performance
This is an open access article under the CC BY-SA license.
Corresponding Author:
Aulia Putri Utami
Department of Computer Engineering, Faculty of Computer Science, Universitas Muslim Indonesia
Jl. Urip Sumohardjo No.km.5, Makassar, Sulawesi Selatan, 90231, Indonesia
Email: auliaputriutami.iclabs@umi.ac.id
1. INTRODUCTION
In the realm of medical diagnostics and patient care, the significance of accurate and timely disease
detection cannot be overstated [1], [2]. One of the pivotal tools in modern medicine is medical imaging,
particularly in the context of identifying diseases such as lung cancer, brain tumors, and chest abnormalities
[3]–[5]. These life-threatening conditions, affecting millions worldwide, require early diagnosis for effective
treatment and improved patient outcomes. Medical imaging not only aids in disease identification but also
guides medical practitioners in formulating precise treatment plans [6], [7]. The quality of healthcare
provided is significantly influenced by the robustness of the algorithms used in classifying and diagnosing
these conditions [8]. It is within this context that this research is conducted.
Despite the advances in medical imaging and the availability of diverse datasets, the classification of
medical images remains a challenging task [9]. A major challenge arises from the imbalanced distribution of
data in multi-class medical datasets. The rare occurrence of certain diseases in comparison to others often
leads to skewed class distributions, potentially affecting the performance of classification algorithms. The
need to accurately diagnose and classify instances of lung cancer, brain tumors, and chest abnormalities has
motivated this study. Furthermore, addressing the issue of class imbalance in medical datasets is crucial to
ensure that classification algorithms provide reliable results.
The primary objective of this research is to conduct a comprehensive performance analysis of
classification algorithms on an imbalanced multi-class medical dataset. The study aims to evaluate the
suitability and effectiveness of various classification algorithms in diagnosing medical conditions based on
medical images. The research endeavors to identify the strengths and weaknesses of these algorithms, with
the ultimate goal of enhancing the accuracy and reliability of medical image classification.
This research seeks to answer the fundamental question of how different classification algorithms
perform when applied to an imbalanced multi-class medical dataset encompassing lung cancer, brain tumors,
and chest abnormalities [10], [11]. In addition to this central inquiry, it aims to identify the strengths and
weaknesses of the individual algorithms, namely support vector machine (SVM), naïve Bayes, k-nearest
neighbors (K-NN), artificial neural network (ANN), and stochastic gradient descent (SGD), in the context of
medical image classification, particularly addressing the challenges posed by imbalanced class
distributions [12]–[19]. Furthermore, the research explores the potential of 5-fold cross-validation
in mitigating class imbalance effects and enhancing algorithm
performance. By addressing these research questions, this study endeavors to offer valuable insights into the
performance of classification algorithms on imbalanced multi-class medical datasets, thus improving
diagnostic accuracy and healthcare quality.
The following sections detail the methodology of this study, including the data collection process, image
segmentation techniques, feature extraction methods, and model evaluation metrics. The results are then
analyzed for each algorithm, followed by conclusions and future implications.
2. METHOD
To provide a systematic and structured approach, this research adopts the methodological
framework illustrated in Figure 1. Figure 1 delineates the stages, starting from the collection of medical
image data to the classification performance evaluation. Detailed explanations for each stage are presented in
the following subsections.
Figure 1. Visualization of the research methodology flowchart
2.1. Medical issue data collection
The study used five multi-class medical image datasets taken from Kaggle.com, with
varying numbers of classes. The Chest CT-Scan dataset has four classes with a total of 613 images and
an imbalanced class distribution. The COVID-19 dataset has three classes with 251 images in total, also with an
imbalanced distribution. The IQ-OTH/NCC-Lung Cancer dataset has three classes with a total of
1097 images, similarly characterized by an imbalanced distribution. Furthermore, the Brain Tumor
Classification (MRI) dataset is composed of four classes, with a total of 2870 images and an imbalanced
data distribution. Finally, the Alzheimer Parkinson Diseases dataset consists of three classes with a total of 6477
images and an imbalanced distribution. Oversampling and undersampling were applied to
balance the classes in all datasets [20]–[22]. The research begins with a data exploration stage to understand the
characteristics of the image datasets used. This stage involves visualization as well as statistical
analysis to identify patterns, anomalies, and important information in the datasets. General information on the datasets
used in this study can be found in Table 1.
Table 1. Dataset information
| Dataset | Number of cases | Number of attributes | Number of classes | Number in each class | Attribute characteristics | Missing values |
|---|---|---|---|---|---|---|
| Chest CT-Scan | 613 | 7 | 4 | 195, 115, 148, 155 | Numeric | No |
| COVID-19 | 251 | 7 | 3 | 111, 70, 70 | Numeric | No |
| IQ-OTH/NCC-Lung Cancer | 1097 | 7 | 3 | 120, 561, 416 | Numeric | No |
| Brain Tumor Classification (MRI) | 2870 | 7 | 4 | 826, 822, 395, 827 | Numeric | No |
| Alzheimer Parkinson Diseases | 6477 | 7 | 3 | 2561, 3010, 906 | Numeric | No |
2.2. Pre-processing data
Preprocessing involves three stages: image segmentation, feature
extraction, and data balancing. The first stage is image segmentation using the
Canny method [23]. This step aims to separate objects from the image background, improve data
quality, and prepare the images for the feature extraction stage. The Canny algorithm is a popular
edge detection method in image processing, involving several stages: smoothing with a Gaussian filter,
gradient calculation, non-maximum suppression, and thresholding to produce sharper edges
[24]. The gradient magnitude underlying this method is given in (1).

$$E(x, y) = \sqrt{G_x(x, y)^2 + G_y(x, y)^2} \tag{1}$$

Here, $G_x(x, y)$ and $G_y(x, y)$ are the gradients of the image in the horizontal and vertical
directions, respectively. The results of image segmentation using the Canny method on the medical datasets are shown in Figure 2.
Figure 2. Canny image segmentation results on the medical datasets
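The gradient magnitude in (1) can be illustrated with a short sketch. This is not the authors' pipeline: it assumes central differences in place of the Sobel kernels that Canny implementations typically use, it omits smoothing, non-maximum suppression, and thresholding, and the helper name `gradient_magnitude` is hypothetical.

```python
import math


def gradient_magnitude(image):
    """Return E(x, y) = sqrt(Gx^2 + Gy^2) for a 2D list of pixel values.

    Gx and Gy are approximated with central differences (an assumption made
    for brevity); border pixels are left at 0.0.
    """
    rows, cols = len(image), len(image[0])
    edges = [[0.0] * cols for _ in range(rows)]
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            gx = (image[y][x + 1] - image[y][x - 1]) / 2.0  # horizontal gradient
            gy = (image[y + 1][x] - image[y - 1][x]) / 2.0  # vertical gradient
            edges[y][x] = math.hypot(gx, gy)
    return edges


# A 4x4 toy image with a vertical step edge between columns 1 and 2.
toy = [
    [0, 0, 10, 10],
    [0, 0, 10, 10],
    [0, 0, 10, 10],
    [0, 0, 10, 10],
]
edge_map = gradient_magnitude(toy)
```

In this toy image the interior pixels on either side of the step receive a nonzero magnitude, while the untouched border pixels stay at zero.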
After the segmentation process, the next stage is feature extraction using the Hu-moment
method. Hu moments are used to extract features describing the shapes or contours of objects in
images. These features are invariant to translation, rotation, and scaling, which makes them suitable for
shape recognition applications. They are built from normalized central moments, computed as in (2).

$$h_{ij} = \frac{M_{ij}}{M_{00}^{(i+j)/2+1}} \tag{2}$$

Where $M_{ij}$ is the central moment of order $i + j$, taken about the mass center $(x_c, y_c)$ of the image
and computed from the pixel values $f(x, y)$ at coordinates $(x, y)$, and $M_{00}$ is the zeroth-order moment.
Figure 3 shows a visualization of the extracted Hu-moment features using
scatter plots and heatmaps on each dataset.
Figure 3. Scatter plot of the extracted Hu-moment features on the Chest CT-Scan dataset
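As a rough illustration of (2), the sketch below computes normalized central moments for a tiny binary image and combines two of them into the first Hu invariant, which is the sum of the normalized moments of orders (2, 0) and (0, 2). The function names are hypothetical, only the first of the seven Hu moments is shown, and the toy images are made up to demonstrate translation invariance.

```python
def central_moment(image, p, q):
    """Central moment of order p + q about the image's mass center."""
    m00 = sum(sum(row) for row in image)
    xc = sum(x * v for row in image for x, v in enumerate(row)) / m00
    yc = sum(y * v for y, row in enumerate(image) for v in row) / m00
    return sum(
        ((x - xc) ** p) * ((y - yc) ** q) * v
        for y, row in enumerate(image)
        for x, v in enumerate(row)
    )


def normalized_moment(image, i, j):
    """Equation (2): M_ij / M_00 ** ((i + j) / 2 + 1)."""
    m00 = central_moment(image, 0, 0)
    return central_moment(image, i, j) / m00 ** ((i + j) / 2 + 1)


def first_hu_moment(image):
    """First Hu invariant: h_20 + h_02."""
    return normalized_moment(image, 2, 0) + normalized_moment(image, 0, 2)


# The same 2x2 blob placed at two different positions in a 5x5 binary image.
blob_a = [[0] * 5 for _ in range(5)]
blob_b = [[0] * 5 for _ in range(5)]
for dy in range(2):
    for dx in range(2):
        blob_a[dy][dx] = 1          # top-left corner
        blob_b[dy + 3][dx + 2] = 1  # shifted down and right

hu_a = first_hu_moment(blob_a)
hu_b = first_hu_moment(blob_b)
```

Both placements of the blob yield the same value, which is the translation invariance the text describes.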
Resampling refers to techniques that balance the
distribution among different classes or labels within a dataset. This is particularly important in
classification tasks involving imbalanced classes. Figure 4 visualizes data resampling
for under-sampling and oversampling.
Under-sampling addresses class imbalance by
reducing the number of samples in the majority class. Conversely, oversampling increases the
number of samples in the minority class to achieve a balanced dataset. This balancing is intended to
prevent the model from being biased towards the majority class or disregarding the minority class.
Table 2 lists the per-class sample counts after balancing; these strategies are intended to mitigate potential
biases and improve the model's overall performance.
Figure 4. Data resampling visualization: (a) under-sampling and (b) oversampling
Table 2. Data balancing (number of samples in each class after balancing)
| Balancing | Chest CT-Scan | COVID-19 | IQ-OTH/NCC-Lung Cancer | Brain Tumor Classification | Alzheimer Parkinson Diseases |
|---|---|---|---|---|---|
| Oversampling | 195, 195, 195, 195 | 111, 111, 111 | 561, 561, 561 | 827, 827, 827 | 3010, 3010, 3010 |
| Undersampling | 115, 115, 115, 115 | 70, 70, 70 | 120, 120, 120 | 395, 395, 395 | 906, 906, 906 |
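The balancing summarized in Table 2 can be sketched as follows. The source does not state which resampling variant was used, so this example assumes plain random oversampling (duplicating minority samples) and random undersampling (dropping majority samples). The function name `resample` and the class names are placeholders; the class sizes are those of the COVID-19 dataset in Table 1.

```python
import random
from collections import Counter, defaultdict


def resample(labels, strategy, seed=0):
    """Return indices that balance `labels` by random over- or under-sampling.

    strategy="over": every class is grown to the largest class size by
    sampling its indices with replacement.
    strategy="under": every class is cut to the smallest class size by
    sampling its indices without replacement.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    sizes = [len(indices) for indices in by_class.values()]
    selected = []
    for indices in by_class.values():
        if strategy == "over":
            extra = max(sizes) - len(indices)
            selected += indices + rng.choices(indices, k=extra)
        elif strategy == "under":
            selected += rng.sample(indices, min(sizes))
        else:
            raise ValueError("strategy must be 'over' or 'under'")
    return selected


# Class sizes of the COVID-19 dataset from Table 1; class names are placeholders.
labels = ["class_0"] * 111 + ["class_1"] * 70 + ["class_2"] * 70
over_counts = Counter(labels[i] for i in resample(labels, "over"))
under_counts = Counter(labels[i] for i in resample(labels, "under"))
```

With these inputs every class ends up with 111 samples after oversampling and 70 after undersampling, matching the COVID-19 column of Table 2.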
2.3. Classification
Classification is used to identify specific patterns or characteristics within data that distinguish each
class. By leveraging the information contained in the data, the classification function makes decisions
regarding the most appropriate class for new objects that have not been classified before. The classification
algorithms used in this study include SVM, naïve Bayes, K-NN, ANN, and SGD [25], [26].
SVM is an ML algorithm used for classification and regression tasks. Its goal is to construct a
hyperplane that has the maximum margin between the classes in the dataset [27]. The margin is the
distance between the hyperplane and the nearest points of each class. SVM handles binary problems
directly and can be applied to multi-class classification problems using
approaches such as one-versus-rest (OvR) or one-versus-one (OvO). The basic SVM decision rule for
multi-class classification with the OvR approach is given in (3).

$$y(x) = \arg\max_i \,(w_i \cdot x + b_i) \tag{3}$$

Where $y(x)$ is the predicted class or label for input $x$, $\arg\max_i$ is the maximum argument
operation, which returns the index $i$ that yields the largest value among the computed scores, $w_i$ is the
weight vector associated with class $i$, $x$ is the input vector to be classified, and $b_i$ is the bias
or shift associated with class $i$.
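The OvR decision rule in (3) amounts to scoring the input against one linear model per class and picking the best score. The sketch below shows only this prediction step, not the margin-maximizing training of an SVM; the weights, biases, and the name `ovr_predict` are hypothetical.

```python
def ovr_predict(x, weights, biases):
    """Equation (3): return the class index i that maximizes w_i . x + b_i."""
    scores = [
        sum(w_k * x_k for w_k, x_k in zip(w, x)) + b
        for w, b in zip(weights, biases)
    ]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical weights and biases for three classes over two features.
weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
biases = [0.0, 0.5, 0.0]
predicted = ovr_predict([2.0, 1.0], weights, biases)
```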
Naïve Bayes is a probabilistic classification algorithm based on Bayes' theorem. This algorithm
assumes that the features in the dataset are conditionally independent given the target class [28], [29]. Although
this assumption is very simple and may not always hold, naïve Bayes often performs well
in many classification tasks, especially on high-dimensional data such as text. The basic naïve Bayes
formula for classification is given in (4).

$$P(C \mid X) = \frac{P(X \mid C)\,P(C)}{P(X)} \tag{4}$$

Where $P(C \mid X)$ is the posterior probability of class $C$ given data $X$, $P(X \mid C)$ is the
likelihood, that is, the probability of observing data $X$ given class $C$, $P(C)$ is the prior
probability of class $C$ without additional information, and $P(X)$ is the probability of data $X$
occurring, also called the normalization factor.
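A small numeric sketch of (4) may help. It assumes the per-feature likelihoods are already known and multiplies them under the independence assumption; the class names, probabilities, and the function name `posterior` are invented for illustration.

```python
import math


def posterior(priors, feature_likelihoods):
    """Return P(C | X) for every class C via Bayes' theorem.

    priors: {class: P(C)}
    feature_likelihoods: {class: [P(x_1 | C), P(x_2 | C), ...]}; under the
    naive independence assumption P(X | C) is the product of these values.
    """
    joint = {
        c: priors[c] * math.prod(feature_likelihoods[c])
        for c in priors
    }
    evidence = sum(joint.values())  # P(X), the normalization factor
    return {c: value / evidence for c, value in joint.items()}


# Hypothetical numbers for a two-class problem with two features.
priors = {"normal": 0.6, "tumor": 0.4}
likelihoods = {"normal": [0.5, 0.2], "tumor": [0.5, 0.6]}
probs = posterior(priors, likelihoods)
```

Here the class with the smaller prior still receives the larger posterior (about 0.67) because its likelihood is three times higher.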
K-NN is a classification algorithm based on the distance between points in a feature space. To
classify a sample, this algorithm finds the nearest samples in the training set and takes the
majority class among those neighbors as the class prediction. The basic formula of K-NN for
classification is given in (5).

$$y(x) = \operatorname{mode}\left(\{\, y_i \mid x_i \text{ is one of the } k(x) \text{ nearest neighbors of } x \,\}\right) \tag{5}$$

Where $y(x)$ is the class prediction for the input $x$, $y_i$ is the class of the $i$-th neighbor of
$x$, $x_i$ is the $i$-th neighbor of $x$, $k(x)$ is the number of nearest neighbors used in
the prediction for $x$, and $\operatorname{mode}(\cdot)$ returns the most frequently occurring value in the set.
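The voting rule in (5) can be written in a few lines. This sketch assumes Euclidean distance, which the source does not specify, and uses made-up two-dimensional points; `knn_predict` is a hypothetical name.

```python
import math
from collections import Counter


def knn_predict(x, train_points, train_labels, k=3):
    """Equation (5): majority label among the k training points nearest to x."""
    nearest = sorted(
        range(len(train_points)),
        key=lambda i: math.dist(x, train_points[i]),  # Euclidean distance (assumed)
    )[:k]
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical 2D feature vectors with two classes.
train_points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1)]
train_labels = ["A", "A", "A", "B", "B"]
label = knn_predict((0.15, 0.15), train_points, train_labels, k=3)
```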
ANN is a computing model inspired by biological neural networks. It consists of interconnected layers of
artificial neurons [29]. Each neuron takes inputs, processes them, and passes its output to the next
neurons. ANN can be used for a variety of tasks, including classification. The basic ANN formula for
classification is given in (6).

$$y(x) = f(w \cdot x + b) \tag{6}$$

Where $y(x)$ is the output or prediction generated by the model for input $x$, $f(\cdot)$ is the activation
function, which transforms the weighted input into the neuron's output, $w$ is the weight vector that connects
input $x$ to output $y$, and $b$ is the bias or shift added to the product $w \cdot x$.
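For a single neuron, (6) reduces to a weighted sum followed by an activation. The sketch below assumes a sigmoid activation, which the source does not specify, and uses arbitrary weights; a full ANN would stack many such units in layers.

```python
import math


def sigmoid(z):
    """Logistic activation, chosen here only for illustration."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(x, w, b, activation):
    """Equation (6): y(x) = f(w . x + b) for a single artificial neuron."""
    z = sum(w_k * x_k for w_k, x_k in zip(w, x)) + b
    return activation(z)


# Hypothetical input, weights, and bias; the weighted sum here is exactly 0.
output = neuron([1.0, 2.0], [0.5, -0.25], 0.0, sigmoid)
```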
SGD is an optimization algorithm used to train ML models, including classification
models. The algorithm seeks the weights that minimize a loss function by iteratively updating the
weights using the gradient of that loss. The basic SGD update for multi-class
classification, in particular with the OvR approach, is given in (7).

$$w_{t+1} = w_t - \eta \cdot \nabla J_i(w_t) \tag{7}$$

Where $w_{t+1}$ is the updated weight vector at iteration $t + 1$, $w_t$ is the weight vector at the
current iteration $t$, $\eta$ is the learning rate, which controls the step size taken in
each iteration, and $\nabla J_i(w_t)$ is the gradient of the loss function $J_i(w)$ with respect to the weight vector $w$ for
training sample $i$.
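The update in (7) can be demonstrated on a toy problem. The sketch below assumes a squared-error loss for a linear model, which is a choice made for simplicity and not the loss reported in the source, and it revisits one made-up sample repeatedly to show the weights converging.

```python
def sgd_step(w, grad, learning_rate):
    """Equation (7): w_{t+1} = w_t - eta * grad J_i(w_t)."""
    return [w_k - learning_rate * g_k for w_k, g_k in zip(w, grad)]


def squared_error_gradient(w, x, y):
    """Gradient of J_i(w) = 0.5 * (w . x - y)^2 for one training sample."""
    error = sum(w_k * x_k for w_k, x_k in zip(w, x)) - y
    return [error * x_k for x_k in x]


# One hypothetical training sample, visited repeatedly for illustration.
w = [0.0, 0.0]
x_i, y_i = [1.0, 2.0], 1.0
for _ in range(50):
    w = sgd_step(w, squared_error_gradient(w, x_i, y_i), learning_rate=0.1)
prediction = sum(w_k * x_k for w_k, x_k in zip(w, x_i))
```

With this learning rate the prediction error halves at every step, so the prediction approaches the target value of 1.0.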
2.4. Evaluation metrics
Evaluating the performance of classification models relies on metrics that together
provide a comprehensive perspective. One such metric is balanced accuracy, which combines the true positive
rate (accurate positive identification) and the true negative rate (accurate negative identification), offering a
balanced view across classes [27], [30], [31]. Accuracy measures the proportion of correct predictions overall,
while precision measures how many predicted positives are truly positive. Recall, on the other hand, measures
how many actual positive cases are identified. The F-measure is the harmonic mean of precision and recall and
summarizes both in a single value. A strong understanding of these metrics is crucial for accurate interpretation and model
enhancement. The equations for accuracy, precision, recall, and F-measure are
given in (8) to (11), where $TP$, $TN$, $FP$, and $FN$ denote the numbers of true positives, true negatives,
false positives, and false negatives.

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{8}$$

$$\text{Precision} = \frac{TP}{TP + FP} \tag{9}$$

$$\text{Recall} = \frac{TP}{TP + FN} \tag{10}$$

$$\text{F-measure} = \frac{2\,(\text{Precision} \times \text{Recall})}{\text{Precision} + \text{Recall}} \tag{11}$$
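For multi-class data, these metrics are usually computed per class and then averaged. The sketch below assumes support-weighted averaging for precision, recall, and F-measure (suggested by the "weighted" labels in the result tables) and takes balanced accuracy as the mean of per-class recall; both are assumptions about how the reported values were obtained. The labels and the function name `evaluate` are made up.

```python
from collections import Counter


def evaluate(y_true, y_pred):
    """Accuracy, balanced accuracy, and support-weighted precision/recall/F1."""
    classes = sorted(set(y_true))
    support = Counter(y_true)
    total = len(y_true)
    recalls = []
    weighted = {"precision": 0.0, "recall": 0.0, "f1": 0.0}
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0  # equation (9)
        recall = tp / (tp + fn) if tp + fn else 0.0     # equation (10)
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)           # equation (11)
        recalls.append(recall)
        weight = support[c] / total  # share of samples belonging to class c
        weighted["precision"] += weight * precision
        weighted["recall"] += weight * recall
        weighted["f1"] += weight * f1
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / total
    balanced_accuracy = sum(recalls) / len(classes)  # mean per-class recall
    return {"accuracy": accuracy, "balanced_accuracy": balanced_accuracy, **weighted}


y_true = ["a", "a", "a", "a", "b", "b"]
y_pred = ["a", "a", "a", "b", "b", "a"]
report = evaluate(y_true, y_pred)
```

Under this averaging scheme, support-weighted recall is always equal to accuracy, because each class's recall is weighted by exactly the share of samples it contributes.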
3. RESULTS AND DISCUSSION
The research findings provide a comprehensive performance analysis of various machine learning
algorithms on an imbalanced multi-class medical dataset. Three distinct scenarios, each employing a different
data processing technique, were considered: no processing (Table 3), oversampling (Table 4), and
undersampling (Table 5). Here, we present the results and discuss their implications.
Table 3 presents the performance results of ML algorithms on the original dataset before any
processing. Notably, K-NN outperforms other algorithms across multiple metrics. It achieves the highest
balanced accuracy of 0.53, accuracy of 0.57, precision weighted of 0.56, recall weighted of 0.57, and F1
weighted of 0.56. This suggests that K-NN is well-suited for classifying lung cancer, brain tumors, and chest
abnormalities, showcasing its adaptability in a multi-class medical image classification context. On the other
hand, algorithms like SVM and naïve Bayes lag behind in performance. This may be attributed to their
limited ability to handle imbalanced datasets, resulting in suboptimal classification.
Table 3. Performance results before balancing the datasets
| Average | SVM | Naïve Bayes | K-NN | ANN | SGD |
|---|---|---|---|---|---|
| Balanced accuracy | 0.43 | 0.44 | 0.53 | 0.52 | 0.52 |
| Accuracy | 0.52 | 0.47 | 0.57 | 0.56 | 0.56 |
| Precision weighted | 0.43 | 0.48 | 0.56 | 0.56 | 0.56 |
| Recall weighted | 0.52 | 0.47 | 0.57 | 0.56 | 0.56 |
| F1 weighted | 0.42 | 0.40 | 0.56 | 0.55 | 0.55 |
Table 4 shows the performance results after applying oversampling to the dataset. K-NN maintains its top
position with the highest balanced accuracy (0.65), accuracy (0.65), and F1 weighted score (0.64). Oversampling
improved the balanced accuracy of all algorithms by addressing the class imbalance issue, although the accuracy
of SVM and naïve Bayes decreased slightly. K-NN
effectively leverages the oversampled data to enhance its classification accuracy. While all algorithms gain in
balanced accuracy from oversampling, K-NN continues to excel, highlighting its adaptability to changes in dataset characteristics.
Table 4. Performance results after oversampling the datasets
| Average | SVM | Naïve Bayes | K-NN | ANN | SGD |
|---|---|---|---|---|---|
| Balanced accuracy | 0.50 | 0.46 | 0.65 | 0.57 | 0.57 |
| Accuracy | 0.50 | 0.46 | 0.65 | 0.57 | 0.57 |
| Precision weighted | 0.45 | 0.45 | 0.66 | 0.57 | 0.57 |
| Recall weighted | 0.50 | 0.46 | 0.65 | 0.57 | 0.57 |
| F1 weighted | 0.42 | 0.39 | 0.64 | 0.56 | 0.56 |
Table 5 reveals the performance results after implementing undersampling. K-NN remains at the
forefront with an accuracy of 0.55 and an F1 weighted score of 0.54, values matched by ANN and SGD. SVM
and naïve Bayes show improved balanced accuracy compared to the original dataset, although their accuracy
decreases slightly. Despite reducing
the training data volume, undersampling raises the balanced accuracy of these algorithms. K-NN
remains among the best performers, emphasizing its adaptability to different dataset characteristics.
Table 5. Performance results after undersampling the datasets
| Average | SVM | Naïve Bayes | K-NN | ANN | SGD |
|---|---|---|---|---|---|
| Balanced accuracy | 0.49 | 0.46 | 0.55 | 0.55 | 0.55 |
| Accuracy | 0.49 | 0.46 | 0.55 | 0.55 | 0.55 |
| Precision weighted | 0.46 | 0.45 | 0.55 | 0.55 | 0.55 |
| Recall weighted | 0.49 | 0.46 | 0.55 | 0.55 | 0.55 |
| F1 weighted | 0.41 | 0.39 | 0.54 | 0.54 | 0.54 |
Overall, these results consistently position K-NN as the top-performing algorithm in various multi-
class medical image classification scenarios, regardless of the data processing technique applied.
Oversampling and undersampling techniques prove effective in addressing class imbalance and improving
overall performance. While K-NN stands out as the most reliable choice, the findings contribute to our
understanding of the impact of different data processing strategies in medical image analysis.
The findings of this research have significant practical implications for the healthcare sector,
underscoring the importance of algorithm selection and data processing techniques in enhancing disease
diagnosis and medical image analysis. However, it is important to note that the research findings are
constrained by the use of a specific dataset, which may impact the generalizability of the results to other
medical image datasets. Additionally, the utilization of oversampling and undersampling techniques may not
entirely address the challenges posed by class imbalance. Therefore, it is recommended that future research
explores more advanced oversampling and undersampling techniques or incorporates deep learning models
for medical image analysis. Furthermore, expanding the research to encompass a diverse range of medical
image datasets and integrating clinical validation will provide a more comprehensive understanding of
algorithm performance in real-world healthcare settings.
4. CONCLUSION
In concluding this study, we have conducted a comprehensive examination of classification
algorithms on a multi-class medical dataset marked by imbalances, specifically concentrating on lung cancer,
brain tumors, and chest abnormalities. Our findings underscore the pivotal role of algorithm selection in the
realm of medical image analysis, with K-NN consistently emerging as a robust performer, displaying the
highest balanced accuracy and accuracy scores across diverse scenarios. This implies that K-NN may offer a
more equitable trade-off between precision and recall, a crucial consideration in medical diagnostics. The
outcomes of our research significantly contribute to the evolving knowledge landscape in medical image
analysis, emphasizing the imperative of choosing appropriate algorithms for specific classification tasks. The
practical implications are substantial, as the insights gained hold the potential to enhance the accuracy and
reliability of disease diagnosis in the healthcare sector. However, it is imperative to acknowledge the study's
limitations, particularly those associated with dataset-specific findings. We strongly recommend further
research to explore advanced techniques and extend the investigation to encompass a variety of medical
image datasets, ensuring robust and clinically validated results. This research serves as a foundational step
for future endeavors aimed at elevating healthcare quality through the integration of advanced technology
and machine learning.
ACKNOWLEDGEMENTS
We express our profound gratitude to the faculty of computer science at Universitas Muslim Indonesia.
Their guidance, expertise, and steadfast support have been pivotal in bringing this research to fruition.
REFERENCES
[1] P. Bandi et al., “From detection of individual metastases to classification of Lymph node status at the patient level: the
CAMELYON17 challenge,” IEEE Transactions on Medical Imaging, vol. 38, no. 2, pp. 550–560, Feb. 2019, doi:
10.1109/TMI.2018.2867350.
[2] S. P. Pereira et al., “Early detection of pancreatic cancer,” The Lancet Gastroenterology & Hepatology, vol. 5, no. 7, pp. 698–710,
Jul. 2020, doi: 10.1016/S2468-1253(19)30416-9.
[3] A. C. Westphalen et al., “Variability of the positive predictive value of PI-RADS for prostate MRI across 26 centers: experience
of the society of abdominal radiology prostate cancer disease-focused panel,” Radiology, vol. 296, no. 1, pp. 76–84, Jul. 2020,
doi: 10.1148/radiol.2020190646.
[4] A. Vulli, P. N. Srinivasu, M. S. K. Sashank, J. Shafi, J. Choi, and M. F. Ijaz, “Fine-tuned DenseNet-169 for breast cancer
metastasis prediction using FastAI and 1-Cycle policy,” Sensors, vol. 22, no. 8, p. 2988, Apr. 2022, doi: 10.3390/s22082988.
[5] D. Q. Zeebaree, H. Haron, A. M. Abdulazeez, and D. A. Zebari, “Machine learning and region growing for breast cancer
segmentation,” in 2019 International Conference on Advanced Science and Engineering (ICOASE), IEEE, Apr. 2019, pp. 88–93.
doi: 10.1109/ICOASE.2019.8723832.
[6] O. Oren, B. J. Gersh, and D. L. Bhatt, “Artificial intelligence in medical imaging: switching from radiographic pathological data
to clinically meaningful endpoints,” The Lancet Digital Health, vol. 2, no. 9, pp. e486–e488, Sep. 2020, doi: 10.1016/S2589-
7500(20)30160-6.
[7] V. D. P. Jasti et al., “Computational technique based on machine learning and image processing for medical image analysis of
breast cancer diagnosis,” Security and Communication Networks, vol. 2022, pp. 1–7, Mar. 2022, doi: 10.1155/2022/1918379.
[8] J. Jose et al., “An image quality enhancement scheme employing adolescent identity search algorithm in the NSST domain for
multimodal medical image fusion,” Biomedical Signal Processing and Control, vol. 66, p. 102480, Apr. 2021, doi:
10.1016/j.bspc.2021.102480.
[9] C. Tchito Tchapga et al., “Biomedical image classification in a big data architecture using machine learning algorithms,” Journal
of Healthcare Engineering, vol. 2021, pp. 1–11, May 2021, doi: 10.1155/2021/9998819.
[10] O. Razeghi et al., “CemrgApp: an interactive medical imaging application with image processing, computer vision, and machine
learning toolkits for cardiovascular research,” SoftwareX, vol. 12, p. 100570, Jul. 2020, doi: 10.1016/j.softx.2020.100570.
[11] S. M. Beram, H. Pallathadka, I. Patra, and P. Prabhu, “A machine learning based framework for preprocessing and classification
of medical images,” ECS Transactions, vol. 107, no. 1, pp. 7589–7596, Apr. 2022, doi: 10.1149/10701.7589ecst.
[12] G. Battineni, N. Chintalapudi, and F. Amenta, “Machine learning in medicine: performance calculation of dementia prediction by
support vector machines (SVM),” Informatics in Medicine Unlocked, vol. 16, p. 100200, 2019, doi: 10.1016/j.imu.2019.100200.
[13] F. Demir and Y. Akbulut, “A new deep technique using R-CNN model and L1NSR feature selection for brain MRI
classification,” Biomedical Signal Processing and Control, vol. 75, p. 103625, May 2022, doi: 10.1016/j.bspc.2022.103625.
[14] S. Z. Salas-Pilco, K. Xiao, and X. Hu, “Correction: Salas-Pilco et al. Artificial intelligence and learning analytics in teacher
education: a systematic review. Educ. Sci. 2022, 12, 569,” Education Sciences, vol. 13, no. 9, p. 897, Sep. 2023, doi:
10.3390/educsci13090897.
[15] R. N. U. Mahesh and A. Nelleri, “Deep convolutional neural network for binary regression of three-dimensional objects using
information retrieved from digital Fresnel holograms,” Applied Physics B, vol. 128, no. 8, p. 157, Aug. 2022, doi:
10.1007/s00340-022-07877-w.
[16] H. C. M. Herath, “Performance evaluation of machine learning classifiers for hyperspectral images,” in 2021 IEEE 21st
International Conference on Communication Technology (ICCT), IEEE, Oct. 2021, pp. 1216–1220. doi:
10.1109/ICCT52962.2021.9657977.
[17] B. Kocak et al., “Radiogenomics of lower-grade gliomas: machine learning–based MRI texture analysis for predicting 1p/19q
codeletion status,” European Radiology, vol. 30, no. 2, pp. 877–886, Feb. 2020, doi: 10.1007/s00330-019-06492-2.
[18] K. Shirbandi et al., “Accuracy of deep learning model-assisted amyloid positron emission tomography scan in predicting
Alzheimer’s disease: a systematic review and meta-analysis,” Informatics in Medicine Unlocked, vol. 25, p. 100710, 2021, doi:
10.1016/j.imu.2021.100710.
[19] E. S. Durmaz et al., “Radiomics-based machine learning models in STEMI: a promising tool for the prediction of major adverse
cardiac events,” European Radiology, vol. 33, no. 7, pp. 4611–4620, Jan. 2023, doi: 10.1007/s00330-023-09394-6.
[20] S. P. Morozov et al., “MosMedData: data set of 1110 chest CT scans performed during the COVID-19 epidemic,” Digital
Diagnostics, vol. 1, no. 1, pp. 49–59, Dec. 2020, doi: 10.17816/DD46826.
[21] B. M. de Andrade et al., “Grid Search Optimised Artificial Neural Network for Open Stope Stability Prediction,” Chemical
Reviews, vol. 32, no. 2, pp. 600–617, 2020, doi: 10.1109/CONIT51480.2021.9498361.
[22] M. Berrimi, S. Hamdi, R. Y. Cherif, A. Moussaoui, M. Oussalah, and M. Chabane, “COVID-19 detection from X-ray and CT
scans using transfer learning,” in 2021 International Conference of Women in Data Science at Taif University (WiDSTaif ), IEEE,
Mar. 2021, pp. 1–6. doi: 10.1109/WiDSTaif52235.2021.9430229.
[23] G. Erdogan Erten, S. Bozkurt Keser, and M. Yavuz, “Grid search optimised artificial neural network for open stope stability
prediction,” International Journal of Mining, Reclamation and Environment, vol. 35, no. 8, pp. 600–617, Sep. 2021, doi:
10.1080/17480930.2021.1899404.
[24] I. D. Apostolopoulos and T. A. Mpesiana, “Covid-19: automatic detection from X-ray images utilizing transfer learning with
convolutional neural networks,” Physical and Engineering Sciences in Medicine, vol. 43, no. 2, pp. 635–640, Jun. 2020, doi:
10.1007/s13246-020-00865-4.
[25] O. Ozaltin, O. Coskun, O. Yeniay, and A. Subasi, “Classification of brain hemorrhage computed tomography images using OzNet
hybrid algorithm,” International Journal of Imaging Systems and Technology, vol. 33, no. 1, pp. 69–91, Jan. 2023, doi:
10.1002/ima.22806.
[26] L. K. Singh, Pooja, H. Garg, and M. Khanna, “Histogram of oriented gradients (HOG)-based artificial neural network (ANN)
classifier for Glaucoma detection,” International Journal of Swarm Intelligence Research, vol. 13, no. 1, pp. 1–32, Oct. 2022, doi:
10.4018/IJSIR.309940.
[27] L. Goel and J. Nagpal, “A systematic review of recent machine learning techniques for plant disease identification and
classification,” IETE Technical Review, vol. 40, no. 3, pp. 423–439, May 2023, doi: 10.1080/02564602.2022.2121772.
[28] A. T. Nagi, M. Javed Awan, R. Javed, and N. Ayesha, “A comparison of two-stage classifier algorithm with ensemble techniques
on detection of diabetic Retinopathy,” in 2021 1st International Conference on Artificial Intelligence and Data Analytics
(CAIDA), IEEE, Apr. 2021, pp. 212–215. doi: 10.1109/CAIDA51941.2021.9425129.
[29] X. Li et al., “Heart rate information-based machine learning prediction of emotions among pregnant women,” Frontiers in
Psychiatry, vol. 12, Jan. 2022, doi: 10.3389/fpsyt.2021.799029.
[30] H. Alquran, M. Alsalatie, W. A. Mustafa, R. Al Abdi, and A. R. Ismail, “Cervical Net: a novel cervical cancer classification using
feature fusion,” Bioengineering, vol. 9, no. 10, p. 578, Oct. 2022, doi: 10.3390/bioengineering9100578.
[31] R. C. Poonia et al., “Intelligent diagnostic prediction and classification models for detection of kidney disease,” Healthcare, vol.
10, no. 2, p. 371, Feb. 2022, doi: 10.3390/healthcare10020371.
BIOGRAPHIES OF AUTHORS
Abdul Rachman Manga is an educator who has served as Head Lecturer at the
Faculty of Computer Science, Universitas Muslim Indonesia since 2010. He earned his
Master's degree in Computer Science from Hasanuddin University, Makassar, Indonesia in
2017, and is currently pursuing his Doctorate in Computer Science at State University of
Malang, Malang, Indonesia. His areas of interest include natural language processing (NLP)
and artificial intelligence (AI). He is also actively involved in the editorial boards of national
journals. He can be contacted at email: abdulrachman.manga@umi.ac.id.
Aulia Putri Utami is an outstanding alumna of Universitas Muslim Indonesia,
graduating from the Department of Computer Science with a specialization in Informatics
Engineering in 2024. During college, she showed dedication to the field of computer science
and took part in several research projects. Her interests are focused on data science, particularly
personalized learning. Her undergraduate thesis demonstrated her proficiency in analyzing
complex data for innovative solutions in personalized learning. She can be contacted at email:
auliaputriutami.iclabs@umi.ac.id.
Huzain Azis is an educator and researcher in Informatics Engineering who has
been part of the Faculty of Computer Science, Universitas Muslim Indonesia since 2014. He
earned his Master's degree in Computer Science from Gadjah Mada University and is
currently pursuing his doctoral degree at MIIT University of Kuala Lumpur. As a
lecturer, Ir. Huzain Azis teaches various specialized courses in his field, such as data
structures, data mining, and computer system security. He can be contacted at email:
huzain.azis@umi.ac.id.
Yulita Salim is a lecturer in the Informatics Engineering Study Program at
Universitas Muslim Indonesia (UMI), Makassar. Her field is computing science,
specifically data science applied to personalized learning. Ir. Yulita Salim,
S.Kom., M.T., MTA, completed her Bachelor's degree in the Informatics Engineering Study
Program at UMI Makassar and her Master's degree in the Electrical Engineering Study Program,
specializing in Information Computer and Technology (ICT), at UNHAS Makassar. She is
currently pursuing her doctoral degree at MIIT-UniKL, with research on
recommender systems for personalized learning. She can be contacted at email:
yulita.salim@umi.ac.id.
Amaliah Faradibah is a computer science lecturer at a private
university in Eastern Indonesia with a special interest in data science. She
completed her Master's degree at the Sepuluh November Institute of Technology, Surabaya.
She has in-depth knowledge of information technology modeling and simulation, data and
database management, and information system development. She is also
interested in the social aspects and benefits of information technology in the urban traffic
industry. She can be contacted at email: amaliah.faradibah@umi.ac.id.

Optimizing classification models for medical image diagnosis: a comparative analysis on multi-class datasets

  • 1. Computer Science and Information Technologies Vol. 5, No. 3, November 2024, pp. 205~214 ISSN: 2722-3221, DOI: 10.11591/csit.v5i3.pp205-214  205 Journal homepage: http://iaesprime.com/index.php/csit Optimizing classification models for medical image diagnosis: a comparative analysis on multi-class datasets Abdul Rachman Manga, Aulia Putri Utami, Huzain Azis, Yulita Salim, Amaliah Faradibah Department of Computer Engineering, Faculty of Computer Science, Universitas Muslim Indonesia, Makassar, Indonesia Article Info ABSTRACT Article history: Received Dec 29, 2023 Revised Jul 25, 2024 Accepted Jul 29, 2024 The surge in machine learning (ML) and artificial intelligence has revolutionized medical diagnosis, utilizing data from chest ct-scans, COVID-19, lung cancer, brain tumor, and alzheimer parkinson diseases. However, the intricate nature of medical data necessitates robust classification models. This study compares support vector machine (SVM), naïve Bayes, k-nearest neighbors (K-NN), artificial neural networks (ANN), and stochastic gradient descent on multi-class medical datasets, employing data collection, Canny image segmentation, hu- moment feature extraction, and oversampling/under-sampling for data balancing. Classification algorithms are assessed via 5-fold cross-validation for accuracy, precision, recall, and F-measure. Results indicate variable model performance depending on datasets and sampling strategies. SVM, K-NN, ANN, and SGD demonstrate superior performance on specific datasets, achieving accuracies between 0.49 to 0.57. Conversely, naïve Bayes exhibits limitations, achieving precision levels of 0.46 to 0.47 on certain datasets. The efficacy of oversampling and under-sampling techniques in improving classification accuracy varies inconsistently. These findings aid medical practitioners and researchers in selecting suitable models for diagnostic applications. 
Keywords: Balancing Machine learning Medical images Multiclass Performance This is an open access article under the CC BY-SA license. Corresponding Author: Aulia Putri Utami Department of Computer Engineering, Faculty of Computer Science, Universitas Muslim Indonesia Jl. Urip Sumohardjo No.km.5, Makassar, Sulawesi Selatan, 90231, Indonesia Email: [email protected] 1. INTRODUCTION In the realm of medical diagnostics and patient care, the significance of accurate and timely disease detection cannot be overstated [1], [2]. One of the pivotal tools in modern medicine is medical imaging, particularly in the context of identifying diseases such as lung cancer, brain tumors, and chest abnormalities [3]–[5]. These life-threatening conditions, affecting millions worldwide, require early diagnosis for effective treatment and improved patient outcomes. Medical imaging not only aids in disease identification but also guides medical practitioners in formulating precise treatment plans [6], [7]. The quality of healthcare provided is significantly influenced by the robustness of the algorithms used in classifying and diagnosing these conditions [8]. It is within this context that this research is conducted. Despite the advances in medical imaging and the availability of diverse datasets, the classification of medical images remains a challenging task [9]. A major challenge arises from the imbalanced distribution of data in multi-class medical datasets. The rare occurrence of certain diseases in comparison to others often leads to skewed class distributions, potentially affecting the performance of classification algorithms. The need to accurately diagnose and classify instances of lung cancer, brain tumors, and chest abnormalities has motivated this study. Furthermore, addressing the issue of class imbalance in medical datasets is crucial to ensure that classification algorithms provide reliable results.
  • 2.  ISSN: 2722-3221 Comput Sci Inf Technol, Vol. 5, No. 3, November 2024: 205-214 206 The primary objective of this research is to conduct a comprehensive performance analysis of classification algorithms on an imbalanced multi-class medical dataset. The study aims to evaluate the suitability and effectiveness of various classification algorithms in diagnosing medical conditions based on medical images. The research endeavors to identify the strengths and weaknesses of these algorithms, with the ultimate goal of enhancing the accuracy and reliability of medical image classification. This research seeks to answer the fundamental question of how different classification algorithms perform when applied to an imbalanced multi-class medical dataset encompassing lung cancer, brain tumors, and chest abnormalities [10], [11]. In addition to this central inquiry, it aims to unravel the strengths and weaknesses of individual algorithms support vector machine (SVM), machine learning (ML) in medicine: Performance calculation of dementia prediction by SVM, k-nearest neighbors (K-NN), artificial neural network (ANN), and stochastic gradient. Descent (SGD) in the context of medical image classification, particularly addressing the challenges posed by imbalanced class distributions [12]–[19]. Furthermore, the research explores the potential of K-fold cross-validation with a value of 5 in mitigating class imbalance effects and enhancing algorithm performance. By addressing these research questions, this study endeavors to offer valuable insights into the performance of classification algorithms on imbalanced multi-class medical datasets, thus improving diagnostic accuracy and healthcare quality. The following details the methodology of this study, including the data collection process, image segmentation techniques, feature extraction methods, and model evaluation metrics. 
The results will be analyzed for each algorithm, followed by interesting conclusions and future implications. 2. METHOD To provide a systematic and structured approach, this research adopts the methodological framework illustrated in Figure 1. Figure 1 delineates the stages, starting from the collection of medical image data to the classification performance evaluation. Detailed explanations for each stage are presented in the following subsection. Figure 1. Visualization of the research methodology flowchart 2.1. Medical issue data collection The study used five medical image datasets with multiclass categories taken from Kaggle.com, with varying number of classes. The Chest CT-Scan dataset has four classes with a total of 613 data, each of which has an imbalanced data distribution. The COVID-19 data set has three classes with 251 data in total, as well as with a imbalance in the distribution of data. The IQ-OTH/NCC-Lung Cancer dataset features three classes with a total of 1097 data points, similarly characterized by data distribution imbalance. Furthermore, the Brain Tumor Classification (MRI) dataset is composed of four classes, including a total of 2870 data points with an imbalanced
  • 3. Comput Sci Inf Technol ISSN: 2722-3221  Optimizing classification models for medical image ... (Abdul Rachman Manga) 207 data distribution. Finally, the Alzheimer's Parkinson diseases dataset consists of three classes with a total of 6477 data, and an imbalanced distribution of data. In addition, the research applied oversampling and undersampling to balance the data on all datasets [20]–[22]. This research begins with the data exploration stage to understand the characteristics of the image datasets used. Medical issue data collection involves visualization as well as statistical analysis to identify patterns, anomalies, and important information in data sets. General information on the datasets used in this study can be found in Table 1. Table 1. Information datasets Datasets Number of cases Number of attribute Number Atribute characteristics Missing value Name of class Number in each class Chest CT-Scan 613 7 4 195 115 148 155 Numeric No COVID-19 251 7 3 111 70 70 Numeric No IQ- OTH/N CC -Lung Cancer 1097 7 3 120 561 416 Numeric No Brain Tumor Classification (MRI) 2870 7 4 826 822 395 827 Numeric No Alzheimer Parkins on Diseases 6477 7 3 2561 3010 906 Numeric No 2.2. Pre-processing data This research involves several stages of preprocessing, namely, feature segmentation, feature extraction, and data balancing. Early stages in data preprocesing involve image segmentation using the Canny method [23]. This step aims to separate objects from the background on the image, improve data quality, and prepare them for the feature extract stage. The Canny algorithm belongs as a popular method in edge detection on image processing, involving several stages such as smoothing with Gaussian filters, gradient calculation, non-maximum suppression, and the application of thresholds to produce sharper edges [24]. The mathematical formula underlying this method is listed in (1). 
𝐸 (𝑥, 𝑦) = √𝐺𝑥 (𝑥, 𝑦)2 + 𝐺𝑦 (𝑥, 𝑦)2 (1) Here, 𝐺𝑥 (𝑥, 𝑦)2 𝑎𝑛𝑑 𝐺𝑦(𝑥, 𝑦)2 espectively are the gradients of the image in the horizontal and vertical directions. The results of image segmentation using the Canny method on medical datasets are shown in Figure 2. Figure 2. Image segmentation results canny medical datasets
  • 4.  ISSN: 2722-3221 Comput Sci Inf Technol, Vol. 5, No. 3, November 2024: 205-214 208 After the segmentation process, the next stage is the extracting of features using the hu-moment method. Hu-moments is one of the methods used for extracting features of shapes or contours of objects in images. This feature has invarian properties to translation, rotation, and scaling, so it is suitable for use in shape recognition applications. The formula for calculating the center moment 𝜇𝑝𝑞 can be seen in (2). ℎ𝑖𝑗 = 𝑀𝑖𝑗 𝑀00 (𝑖+𝑗)/2+1 (2) Where 𝑥𝑐 and 𝑦𝑐 are the mass center of the image, 𝑝 + 𝑞 is the order of the moment, and 𝑓(𝑥, 𝑦) is the pixel value on the coordinate (𝑥, 𝑦). Figure 3 shows a visualization of extracting humoment features using Scatter Plot and Heatmap on each dataset. Figure 3. Plot scatter visualization output extraction feature: hu-moment on each chest ct-scan dataset Resampling, a concept in data science, refers to efforts aimed at maintaining a balance in the distribution among different classes or labels within a dataset. This is particularly crucial in the context of classification or data analysis involving imbalanced classes. You can observe the data resampling visualization for under-sampling and over-sampling in Figure 4. Under-sampling is a technique employed in machine learning to address class imbalance by reducing the number of samples from the majority class. Conversely, over-sampling involves increasing the number of samples in the minority class to achieve a balanced dataset. This balancing process is crucial to prevent the model from exhibiting bias towards the majority class or disregarding the minority class. As depicted in Table 2, implementing these strategies helps to mitigate potential biases and improve the model's overall performance.
  • 5. Comput Sci Inf Technol ISSN: 2722-3221  Optimizing classification models for medical image ... (Abdul Rachman Manga) 209 (a) (b) Figure 4. Data resampling visualization (a) Under-sampling and (b) Oversampling Table 2. Data balancing Balancing data in class Datasets CHEST CT-SCAN COVID-19 IQ-OTH/NCC-lung cancer Brain tumor classification Alzheimer parkinson diseases Oversampling 195 195 195 195 111 111 111 561 561 561 827 827 827 3010 3010 3010 Undersampling 115 115 115 115 70 70 70 120 120 120 395 395 395 906 906 906 2.3. Classification Classification is used to identify specific patterns or characteristics within data that distinguish each class. By leveraging the information contained in the data, the classification function makes decisions regarding the most appropriate class for new objects that have not been classified before. The classification algorithms used in this study include SVM, naïve Bayes, K-NN, ANN, and SGD [25], [26]. SVM is a ML algorithm used for classification and regression tasks. The goal is to construct a hyperplane that has the maximum margin between different classes in the dataset [27]. The margin is the distance between the hyperplanes and the nearest points of each class. SVM can be used for both binary and multi-class classification problems. SVM can also be applied to multi-class classification problems using approaches such as one-versus- rest (OvR) or one-versus-one (OvO). Here is the basic SVM formula for the problem of multiclass classification with the OvR approach can be seen in (3). 𝑦(𝑥) = 𝑎𝑟𝑔𝑚𝑎𝑥𝑖(𝑤𝑖 ∙ 𝑥 + 𝑏𝑖) (3) Where, 𝑦(𝑥) is the predicted class or label for 𝑥 data, 𝑎𝑟𝑔𝑚𝑎𝑥𝑖 is the maximum argument operation, which produces the index 𝑖 that produce the largest value among the calculated elements, 𝑤𝑖 are the weight vectors associated with class 𝑖, 𝑥 are the vectors of the input data that are to be forecast, 𝑏𝑖 is the bias or shift associated to class 𝑖. 
naïve Bayes is a probabilistic classification algorithm based on the Bayes theorem. This algorithm assumes that the features in the dataset are conditionally independent of the target class [28], [29]. Although these assumptions are very simple and may not always be true, naïve Bayes often provides good performance in many classification tasks, especially in the case of high-dimensional text and data. Naïve Bayes' basic formula for classification can be seen in (4). 𝑃(𝐶|𝑋) = (𝑃(𝑋|𝐶)𝑃(𝐶)) 𝑃(𝑋) (4) Where, 𝑃(𝐶 | 𝑋) is a posterior probability, a class 𝐶 probability occurs on 𝑋 𝑑𝑎𝑡𝑎, 𝑃(𝑋|𝐶) is the probability of the likelihood, that is, the probability of the data 𝑋 occurs if class 𝐶 occurs, 𝑃(𝐶) was a prior probability that class 𝐶 occurred without additional information, and 𝑃(𝑋) was the Probability of data 𝑋 occurring, also called a normalization factor.
  • 6.  ISSN: 2722-3221 Comput Sci Inf Technol, Vol. 5, No. 3, November 2024: 205-214 210 K-NN is a classification algorithm based on the distance between points in a feature space. To classify a sample, this algorithm searches for the nearest sample in the exercise data set and takes the majority of the class from those neighbours as a class prediction. The basic formula of K-NN for classification can be seen in (5). 𝑦(𝑥) = 𝑚𝑜𝑑𝑒 ({𝑦𝑖| 𝑥𝑖 𝑖𝑠 𝑎 𝑛𝑒𝑖𝑔ℎ𝑏𝑜𝑟 𝑘(𝑥)}) (5) Where, 𝑦(𝑥) is the class prediction created for the input data 𝑥, 𝑦𝑖 is a class of the 𝑖 -neighbor of the 𝑥 input data, 𝑥𝑖 is the data neighbor of 𝑖 of the data input 𝑥, 𝑘(𝑥) is the number of nearest neighbors to be used in the prediction for the 𝑥 entry data, and 𝑚𝑜𝑑𝑒(.) refers to the most frequently appearing value in the assembly. ANN is a computing model inspired by biological neural tissue. It consists of layers of artificial neurons that are interconnected [29]. Each neuron takes input, processes it, and gives its output to the next neuron. ANN can be used for a variety of tasks, including classification. The basic ANN formula for classification can be seen in (6). 𝑦(𝑥) = 𝑓(𝑤 ∙ 𝑥 + 𝑏) (6) Where, 𝑦(𝑥) is the output or prediction generated by the model for input data 𝑥, 𝑓(.) is the activation function, which transforms the input value into a more structured output, 𝑤 is the weight vector that connects input 𝑥 to output 𝑦, and 𝑏 is the bias or shift added to the multiplication result 𝑤 ∙ 𝑥. SGD is an optimization algorithm used to train a machine learning model, including a classification model. This algority seeks a weight that minimizes a loss function through repeated iterations by updating the weight using the gradient of a cost function. The basic SGD formula for the problem of multiclass classification, in particular with the OvR approach, can be seen in (7). 
𝑤𝑡+1 = 𝑤𝑡 − 𝜂 ∙ 𝛻𝐽𝑖(𝑤𝑡) (7) Where, 𝑤𝑡+1 is the weight vector that is updated on iteration 𝑡 + 1 , 𝑤𝑡 is the weights vector on the current iterations (iteration 𝑡) 𝜂 is the learning rate, which controls how much learning step is taken in each iterated, and 𝛻𝐽𝑖(𝑤𝑡) is the gradient of the 𝐽𝑖(𝑤) loss function against the w -weight vector in the training system 𝑖. 2.4. Evaluation matrics Evaluating the performance of classification models heavily relies on evaluation metrics that provide a comprehensive perspective. One such metric is Balanced Accuracy, which combines True Positive Rate (accurate positive identification) and True Negative Rate (accurate negative identification), offering a balanced view between both classes [27], [30], [31]. Additionally, Accuracy measures overall predictions, while Precision emphasizes accurate positive identification. Recall, on the other hand, assesses the overall identification of positive cases. Likewise, F-measure, by harmonizing Precision and Recall, provides a holistic perspective. A strong understanding of these metrics is crucial for accurate interpretation and model enhancement. The equations for Balanced Accuracy, Accuracy, Precision, Recall, and F-measure can be found in (8) to (11). 𝐴𝑐𝑐𝑢𝑟𝑎𝑐𝑦 = (𝑇𝑃+𝑇𝑁) (𝑇𝑃+𝑇𝑁+𝐹𝑃+𝐹𝑁) (8) 𝑃𝑒𝑟𝑖𝑐𝑖𝑠𝑖𝑜𝑛 = 𝑇𝑃 (𝑇𝑃+𝐹𝑃) (9) 𝑅𝑒𝑐𝑎𝑙𝑙 = 𝑇𝑃 (𝑇𝑃+𝐹𝑁) (10) 𝐹 − 𝑚𝑒𝑎𝑠𝑢𝑟𝑒 = 2(𝑝𝑟𝑒𝑠𝑖𝑠𝑖×𝑟𝑒𝑐𝑎𝑙𝑙) (𝑝𝑟𝑒𝑠𝑖𝑠𝑖+𝑟𝑒𝑐𝑎𝑙𝑙) (11) 3. RESULTS AND DISCUSSION The research findings provide a comprehensive performance analysis of various machine learning algorithms on an imbalanced multi-class medical dataset. Three distinct scenarios, each employing a different
  • 7. Comput Sci Inf Technol ISSN: 2722-3221  Optimizing classification models for medical image ... (Abdul Rachman Manga) 211 data processing technique, were considered: no processing (Table 3), oversampling (Table 4), and undersampling (Table 5). Here, we present the results and discuss their implications. Table 3 presents the performance results of ML algorithms on the original dataset before any processing. Notably, K-NN outperforms other algorithms across multiple metrics. It achieves the highest balanced accuracy of 0.53, accuracy of 0.57, precision weighted of 0.66, recall weighted of 0.57, and F1 weighted of 0.56. This suggests that K-NN is well-suited for classifying lung cancer, brain tumors, and chest abnormalities, showcasing its adaptability in a multi-class medical image classification context. On the other hand, algorithms like SVM and naïve Bayes lag behind in performance. This may be attributed to their limited ability to handle imbalanced datasets, resulting in suboptimal classification. Table 3. Performance results before balancing the datasets ∑ Rata-rata SVM Naïve Bayes K-NN ANN SGD Balencend accuracy Accuracy Precision weighted Recall weighted F1 weighted 0.43 0.52 0.43 0.52 0.42 0.44 0.47 0.48 0.47 0.4 0.53 0.57 0.56 0.57 0.56 0.52 0.56 0.56 0.56 0.55 0.52 0.56 0.56 0.56 0.55 Table 4 shows the performance results after applying oversampling to the dataset. K-NN maintains its top position with the highest balanced accuracy (0.65), accuracy (0.65), and F1 weighted score (0.64). Oversampling has significantly improved the performance of all algorithms by addressing the class imbalance issue. K-NN effectively leverages the oversampled data to enhance its classification accuracy. While all algorithms benefit from oversampling, K-NN continues to excel, highlighting its adaptability to changes in dataset characteristics. Table 4. 
Performance results after oversampling the datasets

| Metric (average) | SVM | Naïve Bayes | K-NN | ANN | SGD |
|---|---|---|---|---|---|
| Balanced accuracy | 0.50 | 0.46 | 0.65 | 0.57 | 0.57 |
| Accuracy | 0.50 | 0.46 | 0.65 | 0.57 | 0.57 |
| Precision weighted | 0.45 | 0.45 | 0.66 | 0.57 | 0.57 |
| Recall weighted | 0.50 | 0.46 | 0.65 | 0.57 | 0.57 |
| F1 weighted | 0.42 | 0.39 | 0.64 | 0.56 | 0.56 |

Table 5 reveals the performance results after implementing undersampling. K-NN remains among the top performers with an accuracy of 0.55 and an F1 weighted score of 0.54, a result matched by ANN and SGD. SVM and naïve Bayes improve in balanced accuracy compared with the original dataset (0.43 to 0.49 and 0.44 to 0.46, respectively), although their accuracy decreases slightly. Thus, despite reducing the training data volume, undersampling yields a more balanced treatment of the classes for these algorithms.

Table 5. Performance results after undersampling the datasets

| Metric (average) | SVM | Naïve Bayes | K-NN | ANN | SGD |
|---|---|---|---|---|---|
| Balanced accuracy | 0.49 | 0.46 | 0.55 | 0.55 | 0.55 |
| Accuracy | 0.49 | 0.46 | 0.55 | 0.55 | 0.55 |
| Precision weighted | 0.46 | 0.45 | 0.55 | 0.55 | 0.55 |
| Recall weighted | 0.49 | 0.46 | 0.55 | 0.55 | 0.55 |
| F1 weighted | 0.41 | 0.39 | 0.54 | 0.54 | 0.54 |

Overall, these results position K-NN as the top-performing or joint top-performing algorithm in every multi-class medical image classification scenario, regardless of the data processing technique applied. Oversampling and undersampling both help address class imbalance, as reflected in higher balanced accuracy, although their effect on the other metrics varies by algorithm. While K-NN stands out as the most reliable choice, the findings also contribute to our understanding of how different data processing strategies affect medical image analysis. The findings of this research have practical implications for the healthcare sector, underscoring the importance of algorithm selection and data processing techniques in disease diagnosis and medical image analysis.
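
The metrics reported in Tables 3 to 5 can be computed from a list of true and predicted labels. The following is a minimal sketch in Python, not the authors' implementation. It applies (8) to (11) per class in a one-vs-rest fashion and then averages the per-class scores by class support, which is the usual meaning of the "weighted" variants. It takes balanced accuracy as the unweighted mean of per-class recall, a common multi-class generalization. The function names, the class labels, and the toy predictions are hypothetical and chosen only for illustration.

```python
from collections import Counter


def safe_div(num, den):
    """Divide, returning 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def evaluate(y_true, y_pred):
    """Compute accuracy, balanced accuracy, and support-weighted scores."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)  # number of true samples per class
    n = len(y_true)
    pairs = list(zip(y_true, y_pred))

    precision, recall, f1 = {}, {}, {}
    for c in labels:
        # One-vs-rest counts for class c.
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        precision[c] = safe_div(tp, tp + fp)              # equation (9)
        recall[c] = safe_div(tp, tp + fn)                 # equation (10)
        f1[c] = safe_div(2 * precision[c] * recall[c],
                         precision[c] + recall[c])        # equation (11)

    def weighted(scores):
        # Average per-class scores, weighting each class by its support.
        return sum(scores[c] * support[c] for c in labels) / n

    present = [c for c in labels if support[c] > 0]
    return {
        "accuracy": sum(1 for t, p in pairs if t == p) / n,
        # Assumption: balanced accuracy = unweighted mean of per-class recall.
        "balanced_accuracy": sum(recall[c] for c in present) / len(present),
        "precision_weighted": weighted(precision),
        "recall_weighted": weighted(recall),
        "f1_weighted": weighted(f1),
    }


# Hypothetical imbalanced toy data: 6 lung, 3 brain, 1 chest sample.
y_true = ["lung"] * 6 + ["brain"] * 3 + ["chest"]
y_pred = ["lung"] * 5 + ["brain"] + ["brain"] * 2 + ["lung"] + ["lung"]

scores = evaluate(y_true, y_pred)
for name, value in scores.items():
    print(f"{name}: {value:.2f}")
```

On this toy data, accuracy is 0.70 while balanced accuracy is 0.50, because the single misclassified chest sample counts as a full class in the unweighted mean. The sketch also shows why the "Recall weighted" row equals the "Accuracy" row in Tables 3 to 5: weighting per-class recall by support gives the fraction of correctly classified samples.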
However, it is important to note that the research findings are constrained by the use of a specific dataset, which may impact the generalizability of the results to other medical image datasets. Additionally, the utilization of oversampling and undersampling techniques may not entirely address the challenges posed by class imbalance. Therefore, it is recommended that future research
explores more advanced oversampling and undersampling techniques or incorporates deep learning models for medical image analysis. Furthermore, expanding the research to encompass a diverse range of medical image datasets and integrating clinical validation will provide a more comprehensive understanding of algorithm performance in real-world healthcare settings.

4. CONCLUSION
This study examined classification algorithms on an imbalanced multi-class medical dataset covering lung cancer, brain tumors, and chest abnormalities. Our findings underscore the role of algorithm selection in medical image analysis, with K-NN consistently emerging as a robust performer that achieves the highest balanced accuracy and accuracy scores across the scenarios considered. This implies that K-NN may offer a more equitable trade-off between precision and recall, a crucial consideration in medical diagnostics. The results contribute to the knowledge base in medical image analysis by emphasizing the need to choose appropriate algorithms for specific classification tasks. The practical implications are substantial, as the insights gained have the potential to enhance the accuracy and reliability of disease diagnosis in the healthcare sector. However, the study has limitations, particularly those associated with dataset-specific findings. We recommend further research that explores advanced techniques and extends the investigation to a variety of medical image datasets, so that results are robust and clinically validated. This research serves as a foundational step for future work aimed at improving healthcare quality through the integration of advanced technology and machine learning.
ACKNOWLEDGEMENTS
We express our profound gratitude to the Faculty of Computer Science at Universitas Muslim Indonesia. Their guidance, expertise, and steadfast support have been pivotal in bringing this research to fruition.

REFERENCES
[1] P. Bandi et al., “From detection of individual metastases to classification of lymph node status at the patient level: the CAMELYON17 challenge,” IEEE Transactions on Medical Imaging, vol. 38, no. 2, pp. 550–560, Feb. 2019, doi: 10.1109/TMI.2018.2867350.
[2] S. P. Pereira et al., “Early detection of pancreatic cancer,” The Lancet Gastroenterology & Hepatology, vol. 5, no. 7, pp. 698–710, Jul. 2020, doi: 10.1016/S2468-1253(19)30416-9.
[3] A. C. Westphalen et al., “Variability of the positive predictive value of PI-RADS for prostate MRI across 26 centers: experience of the society of abdominal radiology prostate cancer disease-focused panel,” Radiology, vol. 296, no. 1, pp. 76–84, Jul. 2020, doi: 10.1148/radiol.2020190646.
[4] A. Vulli, P. N. Srinivasu, M. S. K. Sashank, J. Shafi, J. Choi, and M. F. Ijaz, “Fine-tuned DenseNet-169 for breast cancer metastasis prediction using FastAI and 1-Cycle policy,” Sensors, vol. 22, no. 8, p. 2988, Apr. 2022, doi: 10.3390/s22082988.
[5] D. Q. Zeebaree, H. Haron, A. M. Abdulazeez, and D. A. Zebari, “Machine learning and region growing for breast cancer segmentation,” in 2019 International Conference on Advanced Science and Engineering (ICOASE), IEEE, Apr. 2019, pp. 88–93. doi: 10.1109/ICOASE.2019.8723832.
[6] O. Oren, B. J. Gersh, and D. L. Bhatt, “Artificial intelligence in medical imaging: switching from radiographic pathological data to clinically meaningful endpoints,” The Lancet Digital Health, vol. 2, no. 9, pp. e486–e488, Sep. 2020, doi: 10.1016/S2589-7500(20)30160-6.
[7] V. D. P. Jasti et al., “Computational technique based on machine learning and image processing for medical image analysis of breast cancer diagnosis,” Security and Communication Networks, vol. 2022, pp. 1–7, Mar.
2022, doi: 10.1155/2022/1918379.
[8] J. Jose et al., “An image quality enhancement scheme employing adolescent identity search algorithm in the NSST domain for multimodal medical image fusion,” Biomedical Signal Processing and Control, vol. 66, p. 102480, Apr. 2021, doi: 10.1016/j.bspc.2021.102480.
[9] C. Tchito Tchapga et al., “Biomedical image classification in a big data architecture using machine learning algorithms,” Journal of Healthcare Engineering, vol. 2021, pp. 1–11, May 2021, doi: 10.1155/2021/9998819.
[10] O. Razeghi et al., “CemrgApp: an interactive medical imaging application with image processing, computer vision, and machine learning toolkits for cardiovascular research,” SoftwareX, vol. 12, p. 100570, Jul. 2020, doi: 10.1016/j.softx.2020.100570.
[11] S. M. Beram, H. Pallathadka, I. Patra, and P. Prabhu, “A machine learning based framework for preprocessing and classification of medical images,” ECS Transactions, vol. 107, no. 1, pp. 7589–7596, Apr. 2022, doi: 10.1149/10701.7589ecst.
[12] G. Battineni, N. Chintalapudi, and F. Amenta, “Machine learning in medicine: performance calculation of dementia prediction by support vector machines (SVM),” Informatics in Medicine Unlocked, vol. 16, p. 100200, 2019, doi: 10.1016/j.imu.2019.100200.
[13] F. Demir and Y. Akbulut, “A new deep technique using R-CNN model and L1NSR feature selection for brain MRI classification,” Biomedical Signal Processing and Control, vol. 75, p. 103625, May 2022, doi: 10.1016/j.bspc.2022.103625.
[14] S. Z. Salas-Pilco, K. Xiao, and X. Hu, “Correction: Salas-Pilco et al. Artificial intelligence and learning analytics in teacher education: a systematic review. Educ. Sci. 2022, 12, 569,” Education Sciences, vol. 13, no. 9, p. 897, Sep. 2023, doi: 10.3390/educsci13090897.
[15] R. N. U. Mahesh and A. Nelleri, “Deep convolutional neural network for binary regression of three-dimensional objects using information retrieved from digital Fresnel holograms,” Applied Physics B, vol. 128, no.
8, p. 157, Aug. 2022, doi:
10.1007/s00340-022-07877-w.
[16] H. C. M. Herath, “Performance evaluation of machine learning classifiers for hyperspectral images,” in 2021 IEEE 21st International Conference on Communication Technology (ICCT), IEEE, Oct. 2021, pp. 1216–1220. doi: 10.1109/ICCT52962.2021.9657977.
[17] B. Kocak et al., “Radiogenomics of lower-grade gliomas: machine learning–based MRI texture analysis for predicting 1p/19q codeletion status,” European Radiology, vol. 30, no. 2, pp. 877–886, Feb. 2020, doi: 10.1007/s00330-019-06492-2.
[18] K. Shirbandi et al., “Accuracy of deep learning model-assisted amyloid positron emission tomography scan in predicting Alzheimer’s disease: a systematic review and meta-analysis,” Informatics in Medicine Unlocked, vol. 25, p. 100710, 2021, doi: 10.1016/j.imu.2021.100710.
[19] E. S. Durmaz et al., “Radiomics-based machine learning models in STEMI: a promising tool for the prediction of major adverse cardiac events,” European Radiology, vol. 33, no. 7, pp. 4611–4620, Jan. 2023, doi: 10.1007/s00330-023-09394-6.
[20] S. P. Morozov et al., “MosMedData: data set of 1110 chest CT scans performed during the COVID-19 epidemic,” Digital Diagnostics, vol. 1, no. 1, pp. 49–59, Dec. 2020, doi: 10.17816/DD46826.
[21] B. M. de Andrade et al., “Grid search optimised artificial neural network for open stope stability prediction,” Chemical Reviews, vol. 32, no. 2, pp. 600–617, 2020, doi: 10.1109/CONIT51480.2021.9498361.
[22] M. Berrimi, S. Hamdi, R. Y. Cherif, A. Moussaoui, M. Oussalah, and M. Chabane, “COVID-19 detection from X-ray and CT scans using transfer learning,” in 2021 International Conference of Women in Data Science at Taif University (WiDSTaif), IEEE, Mar. 2021, pp. 1–6. doi: 10.1109/WiDSTaif52235.2021.9430229.
[23] G. Erdogan Erten, S. Bozkurt Keser, and M.
Yavuz, “Grid search optimised artificial neural network for open stope stability prediction,” International Journal of Mining, Reclamation and Environment, vol. 35, no. 8, pp. 600–617, Sep. 2021, doi: 10.1080/17480930.2021.1899404.
[24] I. D. Apostolopoulos and T. A. Mpesiana, “Covid-19: automatic detection from X-ray images utilizing transfer learning with convolutional neural networks,” Physical and Engineering Sciences in Medicine, vol. 43, no. 2, pp. 635–640, Jun. 2020, doi: 10.1007/s13246-020-00865-4.
[25] O. Ozaltin, O. Coskun, O. Yeniay, and A. Subasi, “Classification of brain hemorrhage computed tomography images using OzNet hybrid algorithm,” International Journal of Imaging Systems and Technology, vol. 33, no. 1, pp. 69–91, Jan. 2023, doi: 10.1002/ima.22806.
[26] L. K. Singh, Pooja, H. Garg, and M. Khanna, “Histogram of oriented gradients (HOG)-based artificial neural network (ANN) classifier for glaucoma detection,” International Journal of Swarm Intelligence Research, vol. 13, no. 1, pp. 1–32, Oct. 2022, doi: 10.4018/IJSIR.309940.
[27] L. Goel and J. Nagpal, “A systematic review of recent machine learning techniques for plant disease identification and classification,” IETE Technical Review, vol. 40, no. 3, pp. 423–439, May 2023, doi: 10.1080/02564602.2022.2121772.
[28] A. T. Nagi, M. Javed Awan, R. Javed, and N. Ayesha, “A comparison of two-stage classifier algorithm with ensemble techniques on detection of diabetic retinopathy,” in 2021 1st International Conference on Artificial Intelligence and Data Analytics (CAIDA), IEEE, Apr. 2021, pp. 212–215. doi: 10.1109/CAIDA51941.2021.9425129.
[29] X. Li et al., “Heart rate information-based machine learning prediction of emotions among pregnant women,” Frontiers in Psychiatry, vol. 12, Jan. 2022, doi: 10.3389/fpsyt.2021.799029.
[30] H. Alquran, M. Alsalatie, W. A. Mustafa, R. Al Abdi, and A. R. Ismail, “Cervical Net: a novel cervical cancer classification using feature fusion,” Bioengineering, vol. 9, no.
10, p. 578, Oct. 2022, doi: 10.3390/bioengineering9100578.
[31] R. C. Poonia et al., “Intelligent diagnostic prediction and classification models for detection of kidney disease,” Healthcare, vol. 10, no. 2, p. 371, Feb. 2022, doi: 10.3390/healthcare10020371.

BIOGRAPHIES OF AUTHORS
Abdul Rachman Manga is an educator who has served as Head Lecturer at the Faculty of Computer Science, Universitas Muslim Indonesia since 2010. He earned his Master's degree in Computer Science from Hasanuddin University, Makassar, Indonesia in 2017, and is currently pursuing his Doctorate in Computer Science at State University of Malang, Malang, Indonesia. His areas of interest include natural language processing (NLP) and artificial intelligence (AI). He also serves on the editorial boards of national journals. He can be contacted at email: [email protected].

Aulia Putri Utami is an outstanding alumna of Universitas Muslim Indonesia, graduating from the Department of Computer Science with a specialization in Informatics Engineering in 2024. During college, she showed dedication to the field of computer science and was active in several research projects. Her interests are focused on data science, particularly in personalized learning. Her undergraduate thesis demonstrated her proficiency in analyzing complex data for innovative solutions in personalized learning. She can be contacted at email: [email protected].
Huzain Azis is an educator and researcher in Informatics Engineering who has been part of the Faculty of Computer Science, Universitas Muslim Indonesia since 2014. He earned his Master's degree in Computer Science from Gadjah Mada University and is currently pursuing his Doctoral degree at MIIT University of Kuala Lumpur. In his role as a Lecturer, Ir. Huzain Azis teaches various specialized courses in his field, such as data structure, data mining, and computer system security. He can be contacted at email: [email protected].

Yulita Salim is a Lecturer in the Informatics Engineering Study Program at Universitas Muslim Indonesia (UMI) Makassar. Her field is computing science, specifically data science for personalized learning. Ir. Yulita Salim, S.Kom., M.T., MTA, completed her Bachelor's Program in the Informatics Engineering Study Program at UMI Makassar and her Master's Program in the Electrical Engineering Study Program, with a specialization in Information Computer and Technology (ICT), at UNHAS Makassar. Currently, Yulita Salim is continuing her Doctoral Program at MIIT-UniKL, conducting research in the field of Recommender System-Personalized Learning. She can be contacted at email: [email protected].

Amaliah Faradibah is a computer science lecturer at one of the Private Universities in Eastern Indonesia with a special interest in the field of data science. She completed her Master's Degree at the Sepuluh November Institute of Technology Surabaya. She has in-depth knowledge of information technology modeling and simulation, data and database management, and information system development. In addition, she is also very interested in the social aspects and benefits of information technology in the urban traffic industry. She can be contacted at email: [email protected].