Advanced Computing: An International Journal (ACIJ), Vol.16, No.3, May 2025
DOI: 10.5121/acij.2025.16301
APPLICATION AND ANALYSIS OF ENSEMBLE
ALGORITHMS IN SOLVING REGRESSION PROBLEMS
Khojiakbar Abdulkhakimov, Nodir Rakhimov, Dilmurod Khasanov
and Oybek Primqulov
Department of Software of Information Technologies, Tashkent University of
Information Technologies named after Muhammad al-Khwarizmi, Tashkent, Uzbekistan
ABSTRACT
Accurate crop yield prediction is critical for enhancing food security, particularly in agrarian economies
prone to soil degradation and climatic uncertainties. This study explores the application of Support Vector
Regression (SVR) for forecasting wheat yields in Uzbekistan, utilizing soil fertility indicators as key
predictive features. Unlike conventional linear regression models, SVR effectively captures complex non-
linear interactions between soil physicochemical properties and crop productivity, thereby offering
improved adaptability to real-world agricultural conditions. The dataset comprises essential soil
attributes, including nitrogen (N), phosphorus (P), potassium (K), pH, organic carbon (OC), electrical
conductivity (EC), and micro-nutrient concentrations. Data preprocessing involved feature
standardization, K-nearest neighbor (KNN) imputation for handling missing values, and correlation
analysis to select the most influential variables. The dataset was partitioned using an 80/20 stratified split,
and the SVR model with a radial basis function (RBF) kernel was optimized through 5-fold cross-
validation and exhaustive grid search for hyperparameter tuning. The optimized SVR model achieved a
coefficient of determination ($R^2$) of 0.87 and demonstrated a low root mean square error (RMSE),
outperforming baseline regression methods. Model interpretability was enhanced using SHapley Additive
exPlanations (SHAP), which identified soil pH, organic carbon, and available phosphorus as the most
significant predictors of wheat yield—findings consistent with established agronomic principles. Overall,
the results confirm SVR’s potential as a robust, scalable, and interpretable tool for precision agriculture,
offering practical insights for site-specific yield forecasting and promoting sustainable land management
practices in Uzbekistan.
KEYWORDS
Support Vector Regression (SVR), Wheat Yield Prediction, Precision Agriculture, Machine Learning, Non-
linear Models, Uzbekistan, Model Evaluation.
1. INTRODUCTION
Globally, over 40% of the Earth's land surface is already classified as degraded, posing a
profound threat to agricultural productivity and global food security. Soil fertility—the inherent
capacity of soil to supply essential nutrients and sustain healthy plant growth—is a critical
determinant of sustainable agricultural systems. Soil fertility is shaped by a range of
physicochemical and biological factors, including the availability of key macronutrients (notably
nitrogen, phosphorus, and potassium), soil pH, organic matter content, microbial biomass, and
soil structural stability.
However, modern agricultural practices have exacerbated stress on these natural systems.
Activities such as monocropping, intensive tillage, and the over-application of chemical
fertilizers—which alone accounted for approximately 195 million metric tons in 2021—have
accelerated nutrient depletion, soil acidification, and environmental degradation at a global scale.
Accurately and promptly assessing soil fertility is fundamental for a variety of agronomic
decision-making processes, including site-specific fertilizer application, crop selection, and pre-
season yield forecasting. Empirical evidence from major cereal-producing regions attributes over
40% of yield variability directly to variations in soil nutrient status, exceeding the impacts of
cultivar choice or pest management interventions. Despite its critical importance, traditional soil
fertility assessment methods—relying primarily on manual sampling and laboratory-based
chemical analysis—remain time-consuming, costly, and spatially restricted. These limitations are
particularly pronounced in smallholder farming systems and resource-constrained environments,
where access to laboratory infrastructure is limited. In response, artificial intelligence (AI) and
machine learning (ML) technologies have gained attention as scalable alternatives. ML
algorithms trained on enriched datasets—combining in-situ laboratory measurements, remote
sensing observations, and geo-referenced soil surveys—have demonstrated remarkable potential
in modeling soil fertility. However, a majority of these studies focus on classification approaches,
discretizing fertility into categorical classes. In contrast, this study investigates regression-based
machine learning methods for continuous soil fertility prediction, leveraging open-access
agricultural datasets. Our objective is to evaluate whether regression models trained on publicly
available, heterogeneous datasets can deliver sufficiently accurate predictions to inform data-
driven land management strategies, particularly in smallholder and low-resource settings. By
addressing this challenge, the study aims to lower technical barriers and expand the accessibility
of precision agriculture technologies across the Global South.
2. LITERATURE REVIEW
In recent years, machine learning (ML) has emerged as a transformative tool in soil science,
enabling advanced modeling of soil nutrient concentrations, pH levels, organic matter content,
and other key soil properties. Regression-based ML models, in particular, have demonstrated
strong capabilities in predicting continuous variables such as nitrogen (N), phosphorus (P), and
potassium (K) concentrations, offering valuable insights into soil fertility dynamics. Despite their
technical success, the direct practical utility of these raw numerical predictions for agronomists
and farmers remains somewhat limited. Interpreting continuous outputs often requires domain-
specific expertise, and numerical estimates alone do not always provide actionable guidance for
decisions related to fertilization strategies, soil amendments, or crop selection. To bridge this gap,
initiatives like the Decision Support for Sustainable Land Management (DS-SLM) framework—
developed under the guidance of the FAO—promote simplifying complex soil metrics into more
user-friendly and practical categories, such as fertility levels labeled as low, medium, or high.
These simplified outputs enhance accessibility for smallholder advisory systems, where technical
resources and agronomic expertise may be scarce. However, only a small portion of research has
methodically employed advanced machine learning methods—such as Support Vector
Regression (SVR), gradient-boosted models like XGBoost, and other non-linear approaches—for
predicting soil fertility based on publicly available datasets. Many earlier efforts relied
predominantly on linear models or basic decision trees, which often struggled to capture the non-
linear relationships inherent in complex soil processes.
Moreover, few studies have integrated explainability frameworks such as SHapley Additive
exPlanations (SHAP) to validate that model outputs align with established agronomic knowledge.
Ensuring model transparency is particularly critical for fostering trust and adoption among
stakeholders in precision agriculture.
This study addresses these research gaps by evaluating the effectiveness of regression-based ML
models for soil fertility prediction using a publicly available, heterogeneous soil dataset. In
addition to assessing predictive accuracy, we employ SHAP analysis to enhance interpretability
and explore the extent to which modeled fertility indices correlate with observed yield outcomes.
Ultimately, this work seeks to contribute to the development of scalable, interpretable decision-
support tools for sustainable land management in data-scarce environments.
3. METHODOLOGY
This study employs a structured, multi-stage methodology to predict soil fertility indices as
continuous variables using supervised machine learning regression techniques. The approach
integrates open-access agronomic data, rigorous preprocessing, feature engineering, model
tuning, and interpretability analysis, aiming to validate the practical feasibility of ML-based
models in real-world agricultural advisory systems.
Dataset Description
The dataset utilized in this study was obtained from the "Soil Classification" repository available
on Kaggle, consisting of 33,497 samples collected from diverse agro-ecological regions. Each
sample is described by 22 physicochemical soil attributes, encompassing macronutrients,
micronutrients, and environmental indicators. The key features include:
1. Macronutrients: Nitrogen (N), Phosphorus (P), Potassium (K)
2. Micronutrients: Sulfur (S), Zinc (Zn), Iron (Fe), Copper (Cu), Manganese (Mn), Boron (B)
3. Environmental indicators: pH, Electrical Conductivity (EC), Organic Carbon (OC), Cation
Exchange Capacity (CEC)
Each instance was associated with a continuous fertility output variable, reflecting a composite
soil fertility score derived from agronomic thresholds.
Data quality was high, with missing values accounting for less than 0.5% of the total records.
Missing entries were imputed using K-Nearest Neighbors (KNN) imputation, which preserves
local similarity structures and prevents distortion of feature distributions — a significant
advantage over conventional mean or median imputation methods in structured tabular datasets.
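The imputation step can be illustrated with a minimal sketch. It assumes scikit-learn's `KNNImputer` and a small made-up matrix of (N, P, K, pH) values; the neighbor count of 2 is chosen only for illustration and is not the setting used in the study.

```python
import numpy as np
from sklearn.impute import KNNImputer

# Hypothetical toy matrix: rows are soil samples, columns are (N, P, K, pH).
X = np.array([
    [240.0, 18.0, 310.0, 7.1],
    [235.0, np.nan, 305.0, 7.0],   # phosphorus missing
    [120.0, 9.0, 150.0, 8.2],
    [125.0, 10.0, np.nan, 8.3],    # potassium missing
    [238.0, 17.0, 300.0, 7.2],
    [122.0, 9.5, 155.0, 8.1],
])

# Each missing entry is replaced by the mean of that feature over the
# k nearest rows; distances are computed on the jointly observed features.
imputer = KNNImputer(n_neighbors=2, weights="uniform")
X_imputed = imputer.fit_transform(X)

print(X_imputed[1, 1], X_imputed[3, 2])
```

Because the neighbor search is distance-based, features on larger numeric scales dominate the choice of neighbors unless the data are scaled beforehand.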
Feature Engineering and Correlation Analysis
All numerical variables were standardized to zero mean and unit variance to ensure that features
with different natural scales (e.g., pH vs. micronutrient concentrations) contributed proportionally
during model training. This standardization is especially critical for kernel-based regressors like
Support Vector Regression (SVR), where feature scaling impacts distance calculations in high-
dimensional space.
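A minimal sketch of the standardization step follows, assuming NumPy and made-up (pH, Zn) values. The helper names `fit_standardizer` and `standardize` are hypothetical. Estimating the statistics on the training rows only, and reusing them for the test rows, is a common precaution against leakage rather than a detail confirmed by the study.

```python
import numpy as np

def fit_standardizer(X_train):
    """Return per-feature mean and standard deviation of the training rows."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0.0] = 1.0  # guard against constant columns
    return mean, std

def standardize(X, mean, std):
    """Apply z = (x - mean) / std feature by feature."""
    return (X - mean) / std

# Made-up values: column 0 is pH, column 1 is zinc in mg/kg.
X_train = np.array([[6.5, 0.4], [7.0, 1.2], [7.5, 2.0], [8.0, 2.8]])
X_test = np.array([[7.2, 1.0]])

mean, std = fit_standardizer(X_train)
Z_train = standardize(X_train, mean, std)
Z_test = standardize(X_test, mean, std)  # reuses the training statistics
```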
Figure 1. Pearson Correlation Matrix of Soil Attributes and Wheat Yield.
To investigate inter-feature dependencies and mitigate multicollinearity, a Pearson correlation
matrix was constructed and visualized using a heatmap (Figure 1). The analysis revealed several
expected relationships — such as a negative correlation between soil pH and phosphorus
availability, and strong positive correlations between electrical conductivity (EC) and salt-
associated elements.
These insights supported two critical objectives:
a. Enhancing dimensionality awareness and model interpretability
b. Validating the agronomic consistency and reliability of the dataset
While some features exhibited high collinearity, they were retained due to their independent
predictive potential, with caution applied during feature attribution and SHAP-based
interpretability assessments.
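The collinearity screening described above can be sketched as follows. The data are synthetic, with a sulfur column deliberately constructed to track EC. The helper `high_correlation_pairs` and the 0.8 threshold are illustrative assumptions, not values taken from the study.

```python
import numpy as np

def high_correlation_pairs(X, names, threshold=0.8):
    """Return the Pearson matrix and the feature pairs with |r| >= threshold."""
    corr = np.corrcoef(X, rowvar=False)
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i, j]) >= threshold:
                pairs.append((names[i], names[j], float(corr[i, j])))
    return corr, pairs

rng = np.random.default_rng(0)
ec = rng.normal(1.0, 0.3, size=500)
sulfur = 10.0 * ec + rng.normal(0.0, 0.5, size=500)  # built to follow EC
ph = rng.normal(7.5, 0.5, size=500)                  # independent of both

X = np.column_stack([ec, sulfur, ph])
corr, pairs = high_correlation_pairs(X, ["EC", "S", "pH"])
for a, b, r in pairs:
    print(f"{a} ~ {b}: r = {r:.2f}")
```

Pairs flagged this way are candidates for cautious interpretation during feature attribution rather than automatic removal.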
An analysis of the target variable distribution indicated a moderate imbalance across fertility levels, with approximately:
a. 50% of the samples representing medium fertility levels
b. 25% representing low fertility levels
c. 25% representing high fertility levels
To address potential bias during model training, the Synthetic Minority Over-sampling
Technique (SMOTE) was applied exclusively to the training set. SMOTE generates synthetic
samples for minority groups by interpolating between existing observations, improving model
robustness without duplicating records or introducing noise.
Finally, the dataset was partitioned into training and testing subsets using an 80/20 stratified split
to maintain proportional representation of fertility levels across both sets.
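One way to stratify a split on a continuous target is to bin the score into levels first. The sketch below assumes scikit-learn's `train_test_split`, a synthetic fertility score, and quartile cut-points chosen to mimic the 25/50/25 distribution reported above. All of these are illustrative assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 5))        # synthetic features
y = rng.uniform(0.0, 1.0, size=1000)  # hypothetical continuous fertility score

# Quartile cut-points give roughly 25% low, 50% medium, 25% high.
low_cut, high_cut = np.quantile(y, [0.25, 0.75])
levels = np.digitize(y, [low_cut, high_cut])  # 0 = low, 1 = medium, 2 = high

# Stratifying on the binned levels keeps their proportions in both subsets.
X_train, X_test, y_train, y_test, lv_train, lv_test = train_test_split(
    X, y, levels, test_size=0.2, stratify=levels, random_state=42
)

print(np.bincount(lv_train) / len(lv_train))
print(np.bincount(lv_test) / len(lv_test))
```

Any oversampling would then be fitted on `X_train` and `y_train` only, leaving the test subset untouched.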
Figure 2. Model Training and Validation Accuracy and Loss over Epochs
Figure 3. Distribution of Weights and Biases in the First Dense Layer
Model Training and Evaluation
During the training phase, the machine learning models were assessed by tracking their predictive
accuracy and convergence patterns over several epochs. As illustrated in Figure 2, the left section
displays how training and validation accuracy changed throughout the epochs, whereas the right
section shows the associated loss curves over time.
Initially, both accuracy and loss values exhibited rapid improvements, with training and
validation curves closely aligned. After approximately 40 epochs, accuracy stabilized around
93% for training data and 91% for validation data, while the loss values continued to decline
gradually. The small gap between the training and validation curves suggests minimal overfitting,
indicating that the model generalizes well to unseen data. Furthermore, Figure 3 illustrates the
distribution of learned weight and bias values in the first dense layer of the network. The weight
histogram shows an approximately Gaussian distribution centered around zero, reflecting
balanced weight initialization and healthy convergence. In contrast, the bias distribution reveals
both positive and negative shifts, enabling the model to compensate for systematic input–output
imbalances.
This bar chart presents the bias values of neurons in the final dense layer (dense_3) responsible
for predicting wheat yields based on soil attributes. The presence of both positive and negative
bias values highlights the model's internal adjustments for systematic variations, such as region-
specific anomalies, soil type heterogeneity, or non-linear nutrient responses. For example,
positive biases may amplify sensitivity to favorable conditions (e.g., high nitrogen or optimal
pH), while negative biases may counterbalance overly optimistic predictions in poor fertility
contexts. These internal compensations are crucial for achieving robust generalization across
diverse agro-ecological zones.
Overall, the observed weight and bias patterns confirm that the model has effectively learned
realistic agronomic relationships, enhancing both predictive performance and biological
interpretability.
Model Development and Hyperparameter Tuning
The model development pipeline followed a structured multi-model evaluation strategy, aiming
to predict soil fertility scores and crop yield outcomes using supervised machine learning
regression techniques. Four well-established models were selected for comparative analysis, each
offering distinct algorithmic advantages:
I. Random Forest (RF): A robust ensemble method based on bootstrap aggregation of
decision trees, renowned for its capability to model feature interactions and mitigate
overfitting through averaging. Given its strong baseline performance and interpretability,
$R^2$ and RMSE were used as primary evaluation metrics.
II. Support Vector Regression (SVR) with an RBF Kernel: SVR is particularly suited for
capturing non-linear relationships between soil attributes and yield outcomes. Due to
SVR’s sensitivity to feature scaling and margin violations, both RMSE and MAE were
selected as core evaluation metrics to balance absolute and squared error considerations.
III. Gradient Boosting Machine (GBM): A sequential ensemble that constructs trees
iteratively to minimize residual errors. GBM often provides high predictive accuracy but
requires careful regularization to avoid overfitting. Model evaluation focused on $R^2$,
RMSE, and hyperparameter sensitivity.
IV. XGBoost: An optimized gradient boosting framework featuring column subsampling,
regularization (L1 and L2), and early stopping, which enhance generalization
performance, particularly on structured, tabular data. Performance was primarily assessed
through $R^2$ and RMSE, given the focus on continuous output prediction.
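A comparison of this kind can be sketched with scikit-learn on synthetic data. The target function, sample size, and hyperparameter values below are assumptions made for illustration. XGBoost is omitted because it requires a separate library, so the sketch covers three of the four models only.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
# Synthetic non-linear target; columns 2 and 3 are pure noise features.
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + 0.1 * rng.normal(size=300)

models = {
    "RandomForest": RandomForestRegressor(n_estimators=50, random_state=0),
    "GradientBoosting": GradientBoostingRegressor(random_state=0),
    # SVR is wrapped with a scaler because the RBF kernel is scale-sensitive.
    "SVR-RBF": make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0)),
}

# Mean 5-fold cross-validated R^2 for each model.
scores = {
    name: float(cross_val_score(model, X, y, cv=5, scoring="r2").mean())
    for name, model in models.items()
}
for name, score in scores.items():
    print(f"{name}: mean R^2 = {score:.3f}")
```

The ranking on synthetic data says nothing about the ranking on the soil dataset; the sketch only shows how a like-for-like comparison can be set up.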
Each model underwent grid-based hyperparameter tuning, leveraging 5-fold stratified cross-
validation to ensure balanced distribution of fertility levels across folds. This stratification
preserved representativeness and avoided overfitting to minority regions of the input space.
Through this multi-model comparative approach, the study aimed not only to identify the best-
performing regressor based on predictive accuracy and generalization capability but also to
evaluate trade-offs in interpretability, computational efficiency, and robustness to hyperparameter
variations. Evaluation criteria were carefully tailored to each algorithm's characteristics, ensuring
that final model selection was both technically rigorous and practically aligned with the study’s
agronomic objectives.
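The tuning procedure can be sketched for the SVR case as follows. The sketch assumes scikit-learn's `GridSearchCV`, a synthetic target, and a deliberately small grid; the grid values are hypothetical and not those used in the study. Because `StratifiedKFold` expects class labels, the continuous target is binned into three levels and the resulting folds are passed to the search explicitly.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = np.tanh(X[:, 0]) + 0.3 * X[:, 1] + 0.05 * rng.normal(size=200)

# Bin the continuous target so that folds can be stratified by level.
levels = np.digitize(y, np.quantile(y, [0.25, 0.75]))
splitter = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
folds = list(splitter.split(X, levels))

# Scaling lives inside the pipeline so it is refitted on each training fold.
pipe = Pipeline([("scale", StandardScaler()), ("svr", SVR(kernel="rbf"))])
param_grid = {
    "svr__C": [1.0, 10.0],
    "svr__gamma": ["scale", 0.1],
    "svr__epsilon": [0.01, 0.1],
}

search = GridSearchCV(pipe, param_grid, cv=folds,
                      scoring="neg_root_mean_squared_error")
search.fit(X, y)

print(search.best_params_)
print("cross-validated RMSE:", -search.best_score_)
```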
4. RESULTS AND DISCUSSION
Following the core evaluation of model accuracy and class-level performance metrics (as shown
in Table 1), we further analyzed the role of hyperparameters in shaping model behavior and
outcomes. The hyperparameter optimization process was carried out using grid and random
search strategies, depending on the algorithm. For neural and gradient-based methods, we
additionally employed the Hyperband tuner, which balances exploration and exploitation by
adjusting the number of epochs and trial configurations across a bracketed search space. As
depicted in Figure 6, the most influential hyperparameters were the number of training epochs
and the bracket configuration, both of which achieved optimal values of 2 in the final model.
Interestingly, the learning rate had minimal impact in this setup, which may be attributed to the
stabilization effect of early stopping and dynamic learning adjustment mechanisms during
training. These findings underscore that not all hyperparameters contribute equally to model
performance, and that automated tuning frameworks like Hyperband can efficiently identify high-
impact parameters in complex search spaces. Understanding these relationships is vital for
reproducibility and for adapting models to new datasets in practical agricultural deployments.
Figure 4. Visualization of Model Evaluation Metrics
Figure 5. Model performance comparison for wheat yield prediction
As illustrated in Figure 5, the Gradient Boosting and Random Forest models demonstrated the
strongest predictive performance among the tested models, achieving the highest $R^2$ scores of
0.877 and 0.873, respectively. Both models also exhibited relatively low MAE and RMSE
values: Random Forest with an MAE of 0.111 and an RMSE of 0.248, and Gradient Boosting with an
MAE of 0.127 and an RMSE of 0.244. These results highlight the robust generalization capabilities
of tree-based ensemble methods, particularly when applied to moderately noisy and slightly
imbalanced agricultural datasets.
Interestingly, although XGBoost attained comparable performance with an $R^2$ of 0.861 and an
MAE of 0.113, it slightly underperformed compared to Gradient Boosting and Random Forest in
this specific context. The SVR (Support Vector Regression) model lagged behind, producing the
lowest $R^2$ (0.826) and the highest MAE (0.218), which may be attributed to its sensitivity to
feature scaling and to its capturing complex, non-linear feature interactions less effectively than
tree-based approaches.
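For reference, the three reported metrics can be computed as in the sketch below, which assumes NumPy and a made-up four-sample example. The function name `regression_metrics` is hypothetical.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Compute MAE, RMSE and the coefficient of determination R^2."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    mae = float(np.mean(np.abs(residuals)))
    rmse = float(np.sqrt(np.mean(residuals ** 2)))
    ss_res = float(np.sum(residuals ** 2))                  # unexplained variation
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))   # total variation
    r2 = 1.0 - ss_res / ss_tot
    return {"MAE": mae, "RMSE": rmse, "R2": r2}

# Three exact predictions and one error of 2 units.
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 6.0])
print(metrics)  # MAE = 0.5, RMSE = 1.0, R2 = 0.2
```

Because RMSE squares the residuals, a single large error raises it more than it raises MAE. This is one way a model can show a lower MAE yet a slightly higher RMSE than a competitor.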
To further enhance interpretability, SHAP (SHapley Additive exPlanations) values were
computed for the Gradient Boosting model, revealing that soil pH, organic carbon content,
available phosphorus, and electrical conductivity (EC) were the primary determinants of soil
fertility class. These variables are well-recognized in soil science as critical factors influencing
nutrient availability, microbial diversity, and overall soil health, thus lending biological
credibility to the model’s outputs.
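To make the attribution idea concrete, the sketch below computes exact Shapley values by brute force for a toy model with three standardized features. It does not use the SHAP library and is not the procedure applied to the Gradient Boosting model. The toy prediction function, the background data, and the helper `exact_shapley` are all assumptions, and the enumeration is exponential in the number of features, so it is only practical for very small inputs.

```python
from itertools import combinations
from math import factorial

import numpy as np

def exact_shapley(predict, x, background):
    """Exact Shapley values for one sample x against a background dataset.

    v(S) is the mean prediction when the features in S are fixed to the
    values in x and the remaining features come from the background rows.
    """
    n = x.shape[0]

    def value(subset):
        mixed = background.copy()
        idx = list(subset)
        if idx:
            mixed[:, idx] = x[idx]
        return float(predict(mixed).mean())

    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                phi[i] += weight * (value(subset + (i,)) - value(subset))
    return phi

def toy_model(X):
    """Hypothetical score: additive pH term plus an OC x P interaction."""
    return 2.0 * X[:, 0] + X[:, 1] * X[:, 2]

rng = np.random.default_rng(0)
background = rng.normal(size=(200, 3))  # standardized (pH, OC, P), synthetic
x = np.array([1.0, 0.5, -1.0])

phi = exact_shapley(toy_model, x, background)
baseline = float(toy_model(background).mean())
prediction = float(toy_model(x[None, :])[0])
print(phi, phi.sum(), prediction - baseline)
```

The attributions sum to the difference between the sample's prediction and the average background prediction, which is the additivity property that makes such values convenient for ranking predictors.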
Additionally, the classification framework employed in this study discretizes continuous soil
indicators into actionable fertility categories—low, medium, and high—which mirrors the
decision-support structure utilized in operational platforms such as the FAO’s Decision Support
for Sustainable Land Management (DS-SLM) system. This design significantly enhances the
real-world applicability and interpretability of the models, providing practical decision-making
insights at the field level.
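The discretization step can be sketched as a simple thresholding of a continuous score. The cut-points 0.33 and 0.66 and the helper `to_fertility_level` below are hypothetical placeholders; the study's thresholds are not stated in the text.

```python
import numpy as np

LEVELS = ("low", "medium", "high")

def to_fertility_level(scores, low_cut=0.33, high_cut=0.66):
    """Map continuous scores to low / medium / high using two cut-points."""
    bins = np.digitize(np.asarray(scores, dtype=float), [low_cut, high_cut])
    return [LEVELS[b] for b in bins]

labels = to_fertility_level([0.10, 0.33, 0.50, 0.90])
print(labels)  # ['low', 'medium', 'medium', 'high']
```

A score equal to a cut-point falls into the higher category, so the boundaries should be chosen with the intended agronomic interpretation in mind.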
Overall, the findings from this research reaffirm the feasibility of applying machine learning
classifiers to open-access agricultural datasets for soil fertility assessment. They also establish a
replicable methodology for developing lightweight, scalable advisory tools that can support
smallholder farmers, especially in resource-constrained regions where rapid, site-specific
diagnostics are critical for improving yields and promoting sustainable soil management.
5. CONCLUSIONS
This study demonstrates that machine learning classifiers, particularly ensemble-based
approaches, are highly effective in predicting soil fertility classes using open-access, multi-
feature agricultural datasets. Among the evaluated models, Gradient Boosting and Random Forest
consistently outperformed SVR and XGBoost, achieving superior predictive accuracy with
$R^2$ values exceeding 0.87 and maintaining low MAE and RMSE scores. Their strong
performance highlights the advantages of ensemble learning techniques in handling moderately
noisy and complex agronomic data without extensive preprocessing. The feature importance
analysis, based on SHAP (SHapley Additive exPlanations) values, revealed that soil pH, organic
carbon content, available phosphorus, and electrical conductivity (EC) are the most influential
predictors of fertility class. These findings are well aligned with established agronomic
knowledge, confirming that the models not only offer accurate predictions but also maintain
biological plausibility, thereby enhancing their suitability for real-world decision-support
applications. From a practical perspective, the proposed machine learning framework offers
significant potential to integrate low-cost, data-driven soil classification tools into smallholder
advisory systems, strengthen digital agriculture platforms through AI-based diagnostic
functionalities, and facilitate evidence-based land management in resource-limited and data-scarce
environments. Looking forward, future research will focus on expanding the input datasets to
capture spatial and temporal variability in soil properties, integrating remote sensing-derived
indices (e.g., NDVI, EVI) to enrich feature representations, and embedding the trained models
into interactive, GIS-based advisory platforms. Moreover, the development of hybrid models that
combine machine learning approaches with mechanistic, process-based soil models could further
enhance the robustness and transferability of the predictions across diverse agro-ecological zones.
REFERENCES
[1] N. Raximov, O. Primqulov, B. Daminova. "Basic concepts and stages of research development on
artificial intelligence." 2021 International Conference on Information Science and Communications.
2021.
[2] D. Khasanov, M. Tojiyev, O. Primqulov. "Gradient descent in machine learning." 2021
International Conference on Information Science and Communications. 2021.
Advanced Computing: An International Journal (ACIJ), Vol.16, No.3, May 2025
9
[3] K. Dilmurod, M. Tojiyev, O. Primqulov. "Gradient Descent In Machine Learning." 2015.
[4] N. Raximov, M. Doshchanova, O. Primqulov, O. Qurbonov. "Development of architecture of
intellectual information system supporting decision-making for health of sportsmen." 2022
International Congress on Human- Computer Interaction, Optimization and Information Systems.
2022.
[5] M. Tojiyev, O. Primqulov, D. Xasanov. "Image segmentation in OpenCV and Python." Scienceweb
academic papers collection. 2021.
[6] О.Д.У Примкулов, МР Тожиев, ДРУ Хасанов. "Компьютерное зрение как средство
извлечения информации из видеоряда." Academic Research in Educational Sciences, 2021.
[7] O. Primqulov. "The pursuit of quantum supremacy: challenges and implications." Innovative
Development and Scientific Activity Journal. 2023.
[8] Н. Рахимов, Б. Эсановна, О. Примкулов. "Ахборот тизимларида мантиқий хулосалаш
самарадорлигини ошириш ёндошуви." International Scientific and Practical Conference on
Algorithms and Current Issues. 2023.
[9] Polovko A.M., Gurov S.V. "Основы теории надежности." БХВ-Петербург Publishers, 2006.
[10] G.P. Zakharov, G.P. Zakharenko. "Детерминированная модель оценки живучести и уязвимости
сетей." АН СССР Publishers, Техническая кибернетика, No. 2, 1989.
[11] Yu.Yu. Gromov. "Надежность информационных систем." ГОУ ВПО ТГТУ Publishers, 2010.
[12] V.F. Guzik, A.P. Samoylenko. "Принципы проектирования интегральной модели оценки
надежности информационно-вычислительных систем." ЮФУ. Технические науки Publishers,
2008.
[13] N.V. Vasilenko, V.A. Makarov. "Модели оценки надежности программного обеспечения."
Вестник Новгородского государственного университета Publishers, No. 2, 2004.
[14] E.G. Chekal, A.A. Chichev. "Надежность информационных систем." УлГУ Publishers, 2012.
[15] O. Primqulov. "PARALLELISM AND SUPERPOSITION: REASONS FOR THE SUPERIORITY
OF QUBIT OVER CLASSICAL BIT." DTAI–2024, 2024.
[16] N.Raximov, J.Kuvandikov, D.Khasanov, “The importance of loss function in artificial intelligence”,
International Conference on Information Science and Communications Technologies (ICISCT
20222), DOI: 10.1109/ICISCT55600.2022.10146883.
[17] Rahimov Nodir, Khasanov Dilmurod. (2022). The Mathematical Essence Of Logistic Regression
For Machine Learning. https://doi.org/10.5281/zenodo.7239169
AUTHOR
Nodir Rakhimov was born on September 16, 1982, in the Samarqand region of the
Republic of Uzbekistan. He holds a D.Sc. in Technical Sciences. Currently, he serves
as the Head of Department of Software of Information Technologies, Tashkent
University of Information Technologies named after Muhammad al-Khwarizmi.

More Related Content

Similar to APPLICATION AND ANALYSIS OF ENSEMBLE ALGORITHMS IN SOLVING REGRESSION PROBLEMS (20)

PPTX
ICDATE PPT (2).pptx
ssuser356d4d
 
PDF
Intelligent Chemical Fertilizer Recommendation System for Rice Fields
IIJSRJournal
 
PPTX
BE-IT-Group 17-1.pptx
ShivamPrasad41
 
PDF
Soil Profile Based Crop Prediction System
IRJET Journal
 
PDF
An Efficient and Novel Crop Yield Prediction Method using Machine Learning Al...
IIJSRJournal
 
PPTX
soil analysis and classification of micro nutrients
ballamounika95
 
PPTX
Crop recommendation system powerpoint.pptx
yusufkantai130
 
PPTX
Crop cultivation the environmental prediction
DIVYADHARSHINISDIVYA
 
PDF
Crop Yield Prediction using Machine Learning
IRJET Journal
 
PDF
RECOMMENDATION OF CROP AND PESTICIDES USING MACHINE LEARNING
IRJET Journal
 
PDF
An Overview of Crop Yield Prediction using Machine Learning Approach
IRJET Journal
 
PDF
Crop Recommendation System Using Machine Learning
IRJET Journal
 
PDF
Farmer's Analytical assistant
IJSRED
 
PDF
Predicting yield of crop type and water requirement for a given plot of land...
International Journal of Reconfigurable and Embedded Systems
 
PDF
Random Forest Application for Crop Yield Prediction
sipij
 
PDF
RANDOM FOREST APPLICATION FOR CROP YIELD PREDICTION
sipij
 
PDF
IRJET- Crop Prediction System using Machine Learning Algorithms
IRJET Journal
 
PDF
Cultivation of Crops using Machine Learning and Deep Learning
YogeshIJTSRD
 
PDF
IRJET- Agricultural Productivity System
IRJET Journal
 
PDF
Famer assistant and crop recommendation system
IRJET Journal
 
ICDATE PPT (2).pptx
ssuser356d4d
 
Intelligent Chemical Fertilizer Recommendation System for Rice Fields
IIJSRJournal
 
BE-IT-Group 17-1.pptx
ShivamPrasad41
 
Soil Profile Based Crop Prediction System
IRJET Journal
 
An Efficient and Novel Crop Yield Prediction Method using Machine Learning Al...
IIJSRJournal
 
soil analysis and classification of micro nutrients
ballamounika95
 
Crop recommendation system powerpoint.pptx
yusufkantai130
 
Crop cultivation the environmental prediction
DIVYADHARSHINISDIVYA
 
Crop Yield Prediction using Machine Learning
IRJET Journal
 
RECOMMENDATION OF CROP AND PESTICIDES USING MACHINE LEARNING
IRJET Journal
 
An Overview of Crop Yield Prediction using Machine Learning Approach
IRJET Journal
 
Crop Recommendation System Using Machine Learning
IRJET Journal
 
Farmer's Analytical assistant
IJSRED
 
Predicting yield of crop type and water requirement for a given plot of land...
International Journal of Reconfigurable and Embedded Systems
 
Random Forest Application for Crop Yield Prediction
sipij
 
RANDOM FOREST APPLICATION FOR CROP YIELD PREDICTION
sipij
 
IRJET- Crop Prediction System using Machine Learning Algorithms
IRJET Journal
 
Cultivation of Crops using Machine Learning and Deep Learning
YogeshIJTSRD
 
IRJET- Agricultural Productivity System
IRJET Journal
 
Famer assistant and crop recommendation system
IRJET Journal
 

More from acijjournal (20)

PDF
15th International Conference on Computer Science, Engineering and Applicatio...
acijjournal
 
PDF
4th International Conference on Computer Science and Information Technology (...
acijjournal
 
PDF
4th International Conference on Computer Science and Information Technology (...
acijjournal
 
PDF
Application and Analysis of Ensemble Algorithms in Solving Regression Problems
acijjournal
 
PDF
17th International Conference on Networks & Communications (NeTCoM 2025)
acijjournal
 
PDF
METHODS AND ALGORITHMS FOR ASSESSING COMPUTER NETWORK PERFORMANCE
acijjournal
 
PDF
Advanced Computing: An International Journal (ACIJ)
acijjournal
 
PDF
6 th International Conference on Data Mining and Software Engineering (DMSE 2...
acijjournal
 
PDF
ARTICLE :OVERVIEW OF STRUCTURE FROM MOTION
acijjournal
 
PDF
14th International Conference on Advanced Information Technologies and Applic...
acijjournal
 
PDF


APPLICATION AND ANALYSIS OF ENSEMBLE ALGORITHMS IN SOLVING REGRESSION PROBLEMS

Advanced Computing: An International Journal (ACIJ), Vol.16, No.3, May 2025
DOI:10.5121/acij.2025.16301

Khojiakbar Abdulkhakimov, Nodir Rakhimov, Dilmurod Khasanov and Oybek Primqulov
Department of Software of Information Technologies, Tashkent University of Information Technologies named after Muhammad al-Khwarizmi, Tashkent, Uzbekistan

ABSTRACT

Accurate crop yield prediction is critical for enhancing food security, particularly in agrarian economies prone to soil degradation and climatic uncertainties. This study explores the application of Support Vector Regression (SVR) for forecasting wheat yields in Uzbekistan, utilizing soil fertility indicators as key predictive features. Unlike conventional linear regression models, SVR effectively captures complex non-linear interactions between soil physicochemical properties and crop productivity, thereby offering improved adaptability to real-world agricultural conditions. The dataset comprises essential soil attributes, including nitrogen (N), phosphorus (P), potassium (K), pH, organic carbon (OC), electrical conductivity (EC), and micro-nutrient concentrations. Data preprocessing involved feature standardization, K-nearest neighbor (KNN) imputation for handling missing values, and correlation analysis to select the most influential variables. The dataset was partitioned using an 80/20 stratified split, and the SVR model with a radial basis function (RBF) kernel was optimized through 5-fold cross-validation and exhaustive grid search for hyperparameter tuning. The optimized SVR model achieved a coefficient of determination ($R^2$) of 0.87 and demonstrated a low root mean square error (RMSE), outperforming baseline regression methods.
Model interpretability was enhanced using SHapley Additive exPlanations (SHAP), which identified soil pH, organic carbon, and available phosphorus as the most significant predictors of wheat yield, findings consistent with established agronomic principles. Overall, the results confirm SVR's potential as a robust, scalable, and interpretable tool for precision agriculture, offering practical insights for site-specific yield forecasting and promoting sustainable land management practices in Uzbekistan.

KEYWORDS

Support Vector Regression (SVR), Wheat Yield Prediction, Precision Agriculture, Machine Learning, Non-linear Models, Uzbekistan, Model Evaluation.

1. INTRODUCTION

Globally, over 40% of the Earth's land surface is already classified as degraded, posing a profound threat to agricultural productivity and global food security. Soil fertility, the inherent capacity of soil to supply essential nutrients and sustain healthy plant growth, is a critical determinant of sustainable agricultural systems. Soil fertility is shaped by a range of physicochemical and biological factors, including the availability of key macronutrients (notably nitrogen, phosphorus, and potassium), soil pH, organic matter content, microbial biomass, and soil structural stability. However, modern agricultural practices have exacerbated stress on these natural systems. Activities such as monocropping, intensive tillage, and the over-application of chemical fertilizers (which alone accounted for approximately 195 million metric tons in 2021) have
accelerated nutrient depletion, soil acidification, and environmental degradation at a global scale. Accurately and promptly assessing soil fertility is fundamental for a variety of agronomic decision-making processes, including site-specific fertilizer application, crop selection, and pre-season yield forecasting. Empirical evidence from major cereal-producing regions attributes over 40% of yield variability directly to variations in soil nutrient status, exceeding the impacts of cultivar choice or pest management interventions.

Despite its critical importance, traditional soil fertility assessment, which relies primarily on manual sampling and laboratory-based chemical analysis, remains time-consuming, costly, and spatially restricted. These limitations are particularly pronounced in smallholder farming systems and resource-constrained environments, where access to laboratory infrastructure is limited.

In response, artificial intelligence (AI) and machine learning (ML) technologies have gained attention as scalable alternatives. ML algorithms trained on enriched datasets that combine in-situ laboratory measurements, remote sensing observations, and geo-referenced soil surveys have demonstrated remarkable potential in modeling soil fertility. However, a majority of these studies focus on classification approaches, discretizing fertility into categorical classes. In contrast, this study investigates regression-based machine learning methods for continuous soil fertility prediction, leveraging open-access agricultural datasets. Our objective is to evaluate whether regression models trained on publicly available, heterogeneous datasets can deliver sufficiently accurate predictions to inform data-driven land management strategies, particularly in smallholder and low-resource settings.
By addressing this challenge, the study aims to lower technical barriers and expand the accessibility of precision agriculture technologies across the Global South.

2. LITERATURE REVIEW

In recent years, machine learning (ML) has emerged as a transformative tool in soil science, enabling advanced modeling of soil nutrient concentrations, pH levels, organic matter content, and other key soil properties. Regression-based ML models, in particular, have demonstrated strong capabilities in predicting continuous variables such as nitrogen (N), phosphorus (P), and potassium (K) concentrations, offering valuable insights into soil fertility dynamics. Despite their technical success, the direct practical utility of these raw numerical predictions for agronomists and farmers remains somewhat limited. Interpreting continuous outputs often requires domain-specific expertise, and numerical estimates alone do not always provide actionable guidance for decisions related to fertilization strategies, soil amendments, or crop selection.

To bridge this gap, initiatives like the Decision Support for Sustainable Land Management (DS-SLM) framework, developed under the guidance of the FAO, promote simplifying complex soil metrics into more user-friendly and practical categories, such as fertility levels labeled as low, medium, or high. These simplified outputs enhance accessibility for smallholder advisory systems, where technical resources and agronomic expertise may be scarce.

However, only a small portion of research has methodically employed advanced machine learning methods, such as Support Vector Regression (SVR), gradient-boosted models like XGBoost, and other non-linear approaches, for predicting soil fertility based on publicly available datasets. Many earlier efforts relied predominantly on linear models or basic decision trees, which often struggled to capture the non-linear relationships inherent in complex soil processes.
Moreover, few studies have integrated explainability frameworks such as SHapley Additive exPlanations (SHAP) to validate that model outputs align with established agronomic knowledge. Ensuring model transparency is particularly critical for fostering trust and adoption among stakeholders in precision agriculture.

This study addresses these research gaps by evaluating the effectiveness of regression-based ML models for soil fertility prediction using a publicly available, heterogeneous soil dataset. In addition to assessing predictive accuracy, we employ SHAP analysis to enhance interpretability
and explore the extent to which modeled fertility indices correlate with observed yield outcomes. Ultimately, this work seeks to contribute to the development of scalable, interpretable decision-support tools for sustainable land management in data-scarce environments.

3. METHODOLOGY

This study employs a structured, multi-stage methodology to predict soil fertility indices as continuous variables using supervised machine learning regression techniques. The approach integrates open-access agronomic data, rigorous preprocessing, feature engineering, model tuning, and interpretability analysis, aiming to validate the practical feasibility of ML-based models in real-world agricultural advisory systems.

Dataset Description

The dataset utilized in this study was obtained from the "Soil Classification" repository available on Kaggle, consisting of 33,497 samples collected from diverse agro-ecological regions. Each sample is described by 22 physico-chemical soil attributes, encompassing macronutrients, micronutrients, and environmental indicators. The key features include:

1. Macronutrients: Nitrogen (N), Phosphorus (P), Potassium (K)
2. Micronutrients: Sulfur (S), Zinc (Zn), Iron (Fe), Copper (Cu), Manganese (Mn), Boron (B)
3. Environmental indicators: pH, Electrical Conductivity (EC), Organic Carbon (OC), Cation Exchange Capacity (CEC)

Each instance was associated with a continuous fertility output variable, reflecting a composite soil fertility score derived from agronomic thresholds. Data quality was high, with missing values accounting for less than 0.5% of the total records. Missing entries were imputed using K-Nearest Neighbors (KNN) imputation, which preserves local similarity structures and limits distortion of feature distributions, a significant advantage over conventional mean or median imputation methods in structured tabular datasets.
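The imputation step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `knn_impute`, the sample values, and the choice of k are hypothetical, and the paper does not state which library was used (scikit-learn's `KNNImputer` is one commonly used option).

```python
import math


def knn_impute(rows, k=3):
    """Fill None entries with the mean of that feature over the k nearest rows.

    Distance is the root mean squared difference over the features observed
    in both rows, so rows with different numbers of gaps stay comparable.
    """
    def distance(a, b):
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    imputed = [list(row) for row in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: other rows that actually observed feature j.
            donors = [
                (distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            ]
            donors = [d for d in donors if d[0] != math.inf]
            donors.sort(key=lambda d: d[0])
            nearest = donors[:k]
            if nearest:  # leave the gap if no donor shares any feature
                imputed[i][j] = sum(v for _, v in nearest) / len(nearest)
    return imputed


# Hypothetical samples: [pH, organic carbon (%), electrical conductivity (dS/m)]
samples = [
    [6.5, 0.40, 1.2],
    [6.6, 0.42, None],  # EC missing
    [6.4, 0.38, 1.0],
    [8.9, 0.10, 3.0],
]
filled = knn_impute(samples, k=2)
print(filled[1])
```

In this toy example the missing EC value is filled from the two samples with similar pH and organic carbon, and the dissimilar fourth sample is ignored. On real data the features would normally be brought to a common scale before distances are computed, otherwise the attribute with the largest numeric range dominates the neighbor search.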
Feature Engineering and Correlation Analysis

All numerical variables were standardized to zero mean and unit variance to ensure that features with different natural scales (e.g., pH vs. micronutrient concentrations) contributed proportionally during model training. This standardization is especially critical for kernel-based regressors like Support Vector Regression (SVR), where feature scaling impacts distance calculations in high-dimensional space.
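To make the effect of scaling on a distance-based kernel concrete, the following sketch standardizes two features and compares RBF kernel values before and after. The helper names, the sample values, and `gamma=0.5` are illustrative assumptions, not values taken from the study.

```python
import math
import statistics


def fit_scaler(rows):
    """Return per-column means and population standard deviations."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # Guard against constant columns, whose standard deviation is zero.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return means, stds


def transform(rows, means, stds):
    """Apply z-score standardization: (x - mean) / std for every column."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def rbf_kernel(a, b, gamma=0.5):
    """RBF kernel exp(-gamma * ||a - b||^2) between two feature vectors."""
    squared_distance = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * squared_distance)


# Hypothetical samples: [pH, potassium (mg/kg)]
soil = [[6.0, 100.0], [7.0, 200.0], [8.0, 300.0]]
means, stds = fit_scaler(soil)
scaled = transform(soil, means, stds)

raw_similarity = rbf_kernel(soil[0], soil[1])
scaled_similarity = rbf_kernel(scaled[0], scaled[1])
print(raw_similarity, scaled_similarity)
```

On the raw values the potassium column dominates the squared distance and the kernel value collapses to zero, so pH contributes nothing. After standardization both features contribute equally. In a train/test workflow the means and standard deviations would be fitted on the training portion only and then reused for the test portion.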
Figure 1. Pearson Correlation Matrix of Soil Attributes and Wheat Yield.

To investigate inter-feature dependencies and mitigate multicollinearity, a Pearson correlation matrix was constructed and visualized using a heatmap (Figure 1). The analysis revealed several expected relationships, such as a negative correlation between soil pH and phosphorus availability, and strong positive correlations between electrical conductivity (EC) and salt-associated elements. These insights supported two critical objectives:

a. Enhancing dimensionality awareness and model interpretability
b. Validating the agronomic consistency and reliability of the dataset

While some features exhibited high collinearity, they were retained due to their independent predictive potential, with caution applied during feature attribution and SHAP-based interpretability assessments.

An analysis of the target variable distribution indicated moderate skewness, with approximately:

a. 50% of the samples representing medium fertility levels
b. 25% representing low fertility levels
c. 25% representing high fertility levels

To address potential bias during model training, the Synthetic Minority Over-sampling Technique (SMOTE) was applied exclusively to the training set. SMOTE generates synthetic samples for minority groups by interpolating between existing observations, improving model robustness without duplicating records or introducing noise. Finally, the dataset was partitioned into training and testing subsets using an 80/20 stratified split to maintain proportional representation of fertility levels across both sets.
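Because the fertility score is continuous, a stratified split first requires mapping scores to fertility levels. The sketch below shows one way this could be done. The cut points (0.35 and 0.65), the function names, and the synthetic scores are hypothetical, since the paper does not report the thresholds it used.

```python
import random
from collections import defaultdict


def fertility_band(score, low_cut, high_cut):
    """Map a continuous fertility score to a low / medium / high band."""
    if score < low_cut:
        return "low"
    if score > high_cut:
        return "high"
    return "medium"


def stratified_split(scores, low_cut, high_cut, test_fraction=0.2, seed=0):
    """Return (train_indices, test_indices) with each band split separately."""
    groups = defaultdict(list)
    for index, score in enumerate(scores):
        groups[fertility_band(score, low_cut, high_cut)].append(index)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, test = [], []
    for band in sorted(groups):
        indices = groups[band]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


# Synthetic scores mirroring the reported 25% / 50% / 25% distribution.
scores = [0.2] * 25 + [0.5] * 50 + [0.8] * 25
train_idx, test_idx = stratified_split(scores, low_cut=0.35, high_cut=0.65)
print(len(train_idx), len(test_idx))
```

Each band contributes 20% of its samples to the test set, so the low, medium, and high proportions are the same in both subsets. Any oversampling step would then be applied to the training indices only, as the text specifies, so that no synthetic samples reach the held-out test set.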
Figure 2. Model Training and Validation Accuracy and Loss over Epochs

Figure 3. Distribution of Weights and Biases in the First Dense Layer

Model Training and Evaluation

During the training phase, the machine learning models were assessed by tracking their predictive accuracy and convergence patterns over several epochs. As illustrated in Figure 2, the left panel displays how training and validation accuracy changed throughout the epochs, whereas the right panel shows the associated loss curves over time. Initially, both accuracy and loss values exhibited rapid improvements, with training and validation curves closely aligned. After approximately 40 epochs, accuracy stabilized around 93% for training data and 91% for validation data, while the loss values continued to decline gradually. The small gap between the training and validation curves suggests minimal overfitting, indicating that the model generalizes well to unseen data.

Furthermore, Figure 3 illustrates the distribution of learned weight and bias values in the first dense layer of the network. The weight histogram shows an approximately Gaussian distribution centered around zero, reflecting balanced weight initialization and healthy convergence. In contrast, the bias distribution reveals both positive and negative shifts, enabling the model to compensate for systematic input-output imbalances.

This bar chart presents the bias values of neurons in the final dense layer (dense_3) responsible for predicting wheat yields based on soil attributes. The presence of both positive and negative bias values highlights the model's internal adjustments for systematic variations, such as region-specific anomalies, soil type heterogeneity, or non-linear nutrient responses. For example, positive biases may amplify sensitivity to favorable conditions (e.g., high nitrogen or optimal pH), while negative biases may counterbalance overly optimistic predictions in poor fertility contexts. These internal compensations are crucial for achieving robust generalization across diverse agro-ecological zones. Overall, the observed weight and bias patterns confirm that the model has effectively learned realistic agronomic relationships, enhancing both predictive performance and biological interpretability.

Model Development and Hyperparameter Tuning

The model development pipeline followed a structured multi-model evaluation strategy, aiming to predict soil fertility scores and crop yield outcomes using supervised machine learning regression techniques. Four well-established models were selected for comparative analysis, each offering distinct algorithmic advantages:

I. Random Forest (RF): A robust ensemble method based on bootstrap aggregation of decision trees, renowned for its capability to model feature interactions and mitigate overfitting through averaging. Given its strong baseline performance and interpretability, $R^2$ and RMSE were used as primary evaluation metrics.

II. Support Vector Regression (SVR) with an RBF Kernel: SVR is particularly suited for capturing non-linear relationships between soil attributes and yield outcomes. Due to SVR's sensitivity to feature scaling and margin violations, both RMSE and MAE were selected as core evaluation metrics to balance absolute and squared error considerations.

III. Gradient Boosting Machine (GBM): A sequential ensemble that constructs trees iteratively to minimize residual errors. GBM often provides high predictive accuracy but requires careful regularization to avoid overfitting.
Model evaluation focused on $R^2$, RMSE, and hyperparameter sensitivity.

IV. XGBoost: An optimized gradient boosting framework featuring column subsampling, regularization (L1 and L2), and early stopping, which enhance generalization performance, particularly on structured, tabular data. Performance was primarily assessed through $R^2$ and RMSE, given the focus on continuous output prediction.

Each model underwent grid-based hyperparameter tuning, leveraging 5-fold stratified cross-validation to ensure balanced distribution of fertility levels across folds. This stratification preserved representativeness and avoided overfitting to minority regions of the input space.

Through this multi-model comparative approach, the study aimed not only to identify the best-performing regressor based on predictive accuracy and generalization capability but also to evaluate trade-offs in interpretability, computational efficiency, and robustness to hyperparameter variations. Evaluation criteria were carefully tailored to each algorithm's characteristics, ensuring that final model selection was both technically rigorous and practically aligned with the study's agronomic objectives.

4. RESULTS AND DISCUSSION

Following the core evaluation of model accuracy and class-level performance metrics (as shown in Table 1), we further analyzed the role of hyperparameters in shaping model behavior and outcomes. The hyperparameter optimization process was carried out using grid and random search strategies, depending on the algorithm. For neural and gradient-based methods, we additionally employed the Hyperband tuner, which balances exploration and exploitation by adjusting the number of epochs and trial configurations across a bracketed search space. As
depicted in Figure 6, the most influential hyperparameters were the number of training epochs and the bracket configuration, both of which achieved optimal values of 2 in the final model. Interestingly, the learning rate had minimal impact in this setup, which may be attributed to the stabilization effect of early stopping and dynamic learning adjustment mechanisms during training. These findings underscore that not all hyperparameters contribute equally to model performance, and that automated tuning frameworks like Hyperband can efficiently identify high-impact parameters in complex search spaces. Understanding these relationships is vital for reproducibility and for adapting models to new datasets in practical agricultural deployments.

Figure 4. Visualization of Model Evaluation Metrics

Figure 5. Comparison of Model Performance on Wheat Yield Prediction

As illustrated in Figure 5, the Gradient Boosting and Random Forest models demonstrated the strongest predictive performance among the tested models, achieving the highest $R^2$ scores of 0.877 and 0.873, respectively. Both models also exhibited relatively low MAE and RMSE values: Random Forest with an MAE of 0.111 and RMSE of 0.248, and Gradient Boosting with an MAE of 0.127 and RMSE of 0.244. These results highlight the robust generalization capabilities of tree-based ensemble methods, particularly when applied to moderately noisy and slightly imbalanced agricultural datasets. Interestingly, although XGBoost attained comparable performance with an $R^2$ of 0.861 and an MAE of 0.113, it slightly underperformed compared to Gradient Boosting and Random Forest in this specific context.
The Support Vector Regression (SVR) model lagged behind, producing the lowest $R^2$ (0.826) and the highest MAE (0.218), which may be attributed to its sensitivity to feature scaling and to its being less effective than tree-based approaches at automatically capturing complex, non-linear feature interactions.
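For reference, the three metrics used in the comparison above can be computed as in the following sketch. The function name and the example values are illustrative only and are unrelated to the study's data.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R^2, MAE and RMSE for paired true and predicted values."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    # Residual and total sums of squares.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "r2": 1.0 - ss_res / ss_tot,  # share of variance explained
        "mae": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,
        "rmse": math.sqrt(ss_res / n),  # penalizes large errors more than MAE
    }


# Illustrative values only.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]
metrics = regression_metrics(observed, predicted)
print(metrics)
```

Because RMSE squares the residuals before averaging, a model can have a lower MAE but a higher RMSE than a competitor when it makes a few large errors. This is the pattern seen between Random Forest and Gradient Boosting in the reported results.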
To further enhance interpretability, SHAP (SHapley Additive exPlanations) values were computed for the Gradient Boosting model, revealing that soil pH, organic carbon content, available phosphorus, and electrical conductivity (EC) were the primary determinants of soil fertility class. These variables are well-recognized in soil science as critical factors influencing nutrient availability, microbial diversity, and overall soil health, thus lending biological credibility to the model's outputs.

Additionally, the classification framework employed in this study discretizes continuous soil indicators into actionable fertility categories (low, medium, and high), which mirrors the decision-support structure utilized in operational platforms such as the FAO's Decision Support for Sustainable Land Management (DS-SLM) system. This design significantly enhances the real-world applicability and interpretability of the models, providing practical decision-making insights at the field level.

Overall, the findings from this research reaffirm the feasibility of applying machine learning classifiers to open-access agricultural datasets for soil fertility assessment. They also establish a replicable methodology for developing lightweight, scalable advisory tools that can support smallholder farmers, especially in resource-constrained regions where rapid, site-specific diagnostics are critical for improving yields and promoting sustainable soil management.

5. CONCLUSIONS

This study demonstrates that machine learning classifiers, particularly ensemble-based approaches, are highly effective in predicting soil fertility classes using open-access, multi-feature agricultural datasets. Among the evaluated models, Gradient Boosting and Random Forest consistently outperformed SVR and XGBoost, achieving superior predictive accuracy with $R^2$ values exceeding 0.87 and maintaining low MAE and RMSE scores.
Their strong performance highlights the advantages of ensemble learning techniques in handling moderately noisy and complex agronomic data without extensive preprocessing.

The feature importance analysis, based on SHAP (SHapley Additive exPlanations) values, revealed that soil pH, organic carbon content, available phosphorus, and electrical conductivity (EC) are the most influential predictors of fertility class. These findings are well aligned with established agronomic knowledge, confirming that the models not only offer accurate predictions but also maintain biological plausibility, thereby enhancing their suitability for real-world decision-support applications.

From a practical perspective, the proposed machine learning framework offers significant potential to:

a. Integrate low-cost, data-driven soil classification tools into smallholder advisory systems
b. Strengthen digital agriculture platforms through AI-based diagnostic functionalities
c. Facilitate evidence-based land management in resource-limited and data-scarce environments

Looking forward, future research will focus on expanding the input datasets to capture spatial and temporal variability in soil properties, integrating remote sensing-derived indices (e.g., NDVI, EVI) to enrich feature representations, and embedding the trained models into interactive, GIS-based advisory platforms. Moreover, the development of hybrid models that combine machine learning approaches with mechanistic, process-based soil models could further enhance the robustness and transferability of the predictions across diverse agro-ecological zones.

AUTHOR

Nodir Rakhimov was born on September 16, 1982, in the Samarqand region of the Republic of Uzbekistan. He holds a D.Sc. in Technical Sciences. Currently, he serves as the Head of the Department of Software of Information Technologies, Tashkent University of Information Technologies named after Muhammad al-Khwarizmi.