PLANT DISEASE DETECTION BASED ON
MACHINE LEARNING ALGORITHMS
Koteswararao Yenni
Research Scholar, Department of Computer Science & Technology,
Dravidian University, Kuppam -517426, AP, India
E-mail: [email protected]

Kiran Kumar. V
Professor, Department of Computer Science & Technology,
Dravidian University, Kuppam -517426, AP, India
E-mail: [email protected]

Abstract: Rapid advances in agricultural technology have created opportunities for machine learning (ML) applications to detect plant diseases. The proposed framework implements an efficient, scalable approach to plant disease detection using machine learning algorithms that focus on image-based analysis. Using a diverse plant leaf image dataset, we evaluate supervised learning systems including Support Vector Machines (SVMs), Random Forests, and Convolutional Neural Networks (CNNs). A CNN-based detection system with several convolutional layers was developed to extract robust features and accurately diagnose diseases across different plant varieties. Preprocessing techniques, including contrast adjustment, noise reduction, and data augmentation, were used to boost system performance because they help the models generalize to unseen data. Our study shows that the CNN model achieves 96.8% accuracy and outperforms the conventional algorithms on all performance metrics, including precision, recall, and F1-score. The system offers early, accurate, and economically feasible disease detection that would support precision farming by keeping crops safe and the food supply steady. Field-level disease surveillance is addressed through an investigation of real-time deployment strategies that use IoT and edge computing technologies. The results indicate that the framework has considerable potential for moving agricultural operations toward sustainable farming practices.

Keywords: Plant Disease Detection, Machine Learning, Convolutional Neural Networks, Precision Agriculture, IoT, Edge Computing, Image Classification

I. INTRODUCTION

Agriculture underpins both food security and economic strength at the national level, particularly where a community’s income depends on farming. Diseases remain a significant challenge for plant health because they spread rapidly under favourable conditions, causing large losses to farmers and numerous adverse effects on the environment. Quick detection combined with quick intervention has historically been the strength of plant disease control measures, yet the existing methods of detection face a number of major challenges.

Manual inspection of plant disease by farmers and experts is expensive and time-consuming, and it yields imprecise results due to human error and varied experience. In large-scale agriculture, the sheer number of plants does not allow each one to be inspected individually. These limitations indicate a crucial need for an automated system that can scale flexibly and deliver precise disease diagnoses.

Recent advances in machine learning suggest that many of the challenges facing agriculture can be addressed to a large degree. The progress of convolutional neural networks in image processing makes automated feature extraction possible, which in turn supports high-level classification. Automated systems are better at analyzing disease indicators that are hard for humans to detect because they fall outside the range of normal human observation.

Combining machine learning with large image repositories creates powerful ways to diagnose plant diseases at an early stage. Compared with traditional techniques, the proposed machine learning models categorize large sets of plant images more easily and efficiently, both in the time taken and in the share of images classified correctly. Farmers who use IoT-enabled smart devices get real-time information from the field that enables them to assess the situation and decide what action to take for their crops.

The current research produces a comprehensive plant disease recognition system that employs advanced machine learning algorithms to identify and classify diseases through leaf image analysis. The study investigates multiple image-based disease detection systems, starting with CNNs and extending to evaluations of
Support Vector Machines (SVMs) and Random Forests. This methodology is intended to reduce human intervention while supporting both precision farming and sustainable agriculture. The study examines the scalability of machine learning models for agricultural use, analyzes preprocessing methods to gauge their effect on efficiency, and explores ways to deploy models in agricultural settings with limited resources. The results connect scientific development with practical utility, giving farmers resources for improving crop health and increasing agricultural production.

II. RELATED WORKS

Machine learning methods have attracted substantial interest for plant disease detection and are transforming modern agricultural practice. Mohanty et al. [1] showed that CNNs can detect multiple plant diseases from leaf images. Their method attained high accuracy, indicating that deep learning can surpass conventional techniques. Camargo and Smith [2] used handcrafted texture and color features to train traditional machine learning models, including SVMs and Random Forests. These approaches handled small datasets effectively but performed poorly on large and varied data collections.

Zhang et al. [4] proposed a lightweight CNN architecture built to run efficiently on mobile systems, highlighting the importance of computationally efficient solutions for real-world deployment. Brahimi et al. [5] applied transfer learning with pre-trained models such as AlexNet and ResNet, which reduced training time while maintaining strong accuracy.

Another major research direction addresses multiclass classification. The deep learning approach of Sladojevic et al. [6] allowed simultaneous detection of various diseases under different environmental conditions. According to Ferentinos [7], combining traditional feature engineering with deep learning improved accuracy and robustness. Lu et al. [8] applied data augmentation methods, including rotation, scaling, and flipping, to make their models generalize better while lowering the risk of overfitting. On the practical side, disease detection uses devices and drones to obtain real-time field measurements, as in Ramcharan et al. [9], while the system of Li et al. [12] uses edge-based processing to decrease latency.

Access to large datasets has accelerated advances in plant disease research. The PlantVillage dataset from Hughes and Salathé [13] serves as the reference standard for training and assessing plant disease detection models. Behmann et al. [14] demonstrated how hyperspectral imaging can support early disease detection, pointing to a trend toward non-visible spectral analysis.

Grad-CAM, developed by Selvaraju et al. [15], is a visualization technique for CNN predictions that highlights the relevant image regions, which helps establish trust in automated systems. Pantazi et al. [19] show how these technologies can be merged into precision agriculture systems so that resources are optimized through automated decision protocols.

Prior research in this field has produced substantial knowledge; this work focuses on the remaining issues of scalability, real-time delivery, and adaptability.

3. METHODOLOGY

The methodology involves several key stages: data collection, preprocessing, model design, training, evaluation, and deployment, as shown in Figure 1.

3.1 Data Collection

The research data originate from plant leaf image repositories, with the PlantVillage dataset serving as the main source. This widely used dataset is a benchmark for plant disease detection research, with 54,000 images covering 14 plant species and 38 different diseases. To make disease identification more robust, the study integrated additional data sources: field images, aerial images captured by drones, and real-time sensor data from IoT platforms. The collected images were categorized into three groups: healthy leaves with normal appearance, diseased leaves, and ambiguous samples subject to environmental influences that might compromise plant health. Multiple data preconditioning
techniques were employed during preprocessing to optimize dataset quality: resizing to 224 × 224 pixels, normalization to [0,1], and Gaussian and median filtering to reduce noise. Adaptive Histogram Equalization (AHE) added contrast enhancement to make features easier to observe. The dataset was further enlarged through data augmentation, including random rotations, horizontal and vertical flips, scaling, cropping, brightness modifications, and Gaussian noise injection. These transformations simulate varied lighting, scale, and position, which reduces the risk of model overfitting. For training and evaluation, 80% of the data was allocated to training, and the remainder was split into 10% for validation and 10% for testing [17-19]. Certified agricultural experts manually examined each dataset image to provide explicit disease labels. Where expert annotations were unavailable, the research combined semi-supervised learning techniques with crowdsourcing during the data labeling stage. The final dataset was organized in a structured form suitable for model development and deployment into real-time agricultural systems.

Figure 1: Proposed Architecture

3.2 Preprocessing

Preprocessing is essential in machine-learning-based plant disease detection because it produces clean, standardized images suitable for deep learning models. The preprocessing pipeline consists of several stages: image resizing, normalization, noise reduction, contrast enhancement, and data augmentation, which together increase the quality and variety of the dataset.

Preprocessing starts with image resizing, where every image is resized to a standard 224 × 224 pixels, as required by the CNN input. This standardization improves model efficiency and provides a consistent input dimension. Pixel values are then scaled to the [0,1] interval, which stabilizes model training by preventing large fluctuations in pixel intensity.

Gaussian filtering and median filtering further improve image quality. In addition to sensor-induced imperfections, these methods help remove artifacts introduced by the environment, enabling the model to discover authentic image features. Adaptive Histogram Equalization (AHE) is the next step and provides contrast enhancement. AHE expands image contrast by redistributing brightness values across the pixels, making diagnostic characteristics more noticeable.

Data augmentation has the twofold benefit of improving model generalization and reducing overfitting. We used multiple augmentation techniques: random rotations between 0 and 360 degrees, vertical and horizontal flips, random crop and zoom, brightness adjustments, and the addition of Gaussian noise. These transformations introduce artificial variation into the dataset, which allows the model to learn robust features fit for real-world deployment.

Some approaches also use background segmentation to remove unwanted image components so that the model focuses on plant leaves for better disease identification. Background removal leads to considerable improvement of model accuracy in realistic field conditions, which might introduce
other types of noise, such as soil, water drops, or neighboring plants.

This combination of methods produces the optimized dataset from which deep learning models can efficiently detect plant diseases.

3.3 Model Design

The Convolutional Neural Network (CNN) is designed to perform automatic feature extraction followed by classification. It consists of several convolutional layers, each followed by a max-pooling layer, and ends with fully connected output layers.

CNN Architecture:
Input layer: 224×224×3
Convolutional layer (Conv1): $W_1 * I + b_1$
Max-pooling layer (Pool1): Reduces spatial dimensions
Fully connected layer: Converts features into class probabilities using a Softmax function.

Convolution Operation: The convolution operation is defined as:

$$ y[i,j] = \sum_{m=-k}^{k} \sum_{n=-k}^{k} x[i+m,\, j+n] \cdot w[m,n] + b \tag{1} $$

where:
x[i,j] is the input pixel at position (i,j),
w[m,n] is the kernel weight at position (m,n),
b is the bias term.

Softmax Function: The final output layer computes probabilities for each class:

$$ P(c_k \mid x) = \frac{e^{z_k}}{\sum_{j=1}^{C} e^{z_j}} \tag{2} $$

where z_k is the output-layer score for class k and C is the number of classes.

3.4 Model Training

The detection workflow has four stages, described step by step: dataset creation, model selection, parameter tuning, and performance evaluation. During the training phase the model learns features from the training set, supported by additional measures that help avoid overfitting and so achieve good generalization in real plant disease identification.

The dataset, preprocessed as described in the previous section, is split into three subsets: four-fifths (80%) for training, one-tenth (10%) for validation, and one-tenth (10%) for final testing. The model parameters are updated on the training set, the hyperparameters are tuned on the validation set, and the final model is evaluated on the test set. Augmenting the training data with transformations such as rotation, flipping, and brightness adjustment improves learning performance.

A CNN is the chosen architecture because CNNs are powerful for image classification. The model consists of convolutional layers and max-pooling layers, followed by fully connected layers and an output Softmax layer. ReLU (Rectified Linear Unit) activation functions give the hidden layers non-linearity, while batch normalization stabilizes training and speeds up convergence. The output Softmax layer determines how probability is distributed across the disease categories.

The Adam optimizer adapts the learning rate for each model parameter. The learning rate (η) starts at 0.001 and is reduced step by step by a scheduler. The model is trained using the categorical cross-entropy loss function, given by:

$$ L = -\sum_{i=1}^{C} y_i \log(\hat{y}_i) \tag{4} $$

where:
C is the number of classes,
y_i represents the true class label (one-hot encoded),
ŷ_i is the predicted probability for class i.

During training, dropout regularization (probability 0.5) randomly deactivates neurons in the fully connected layers to improve generalization and reduce overfitting. Early stopping halts training when the validation loss has not improved for a predetermined number of epochs.
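The training ingredients described above can be illustrated with a minimal, framework-free Python sketch. It is not the implementation used in this study: it only shows the Softmax function, the categorical cross-entropy loss, a stepwise learning-rate schedule starting at 0.001, and an early-stopping rule. The names `EarlyStopping` and `stepped_lr`, and the values chosen for `patience`, `min_delta`, `step_size`, and `gamma`, are assumptions made for illustration, since the paper does not state them.

```python
import math


def softmax(logits):
    """Turn raw class scores z into probabilities (Softmax function)."""
    shift = max(logits)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(one_hot, probs, eps=1e-12):
    """Categorical cross-entropy for one sample: L = -sum_i y_i * log(y_hat_i)."""
    return -sum(y * math.log(p + eps) for y, p in zip(one_hot, probs))


def stepped_lr(epoch, base_lr=0.001, step_size=10, gamma=0.5):
    """Stepwise decay; step_size and gamma are assumed, not taken from the paper."""
    return base_lr * gamma ** (epoch // step_size)


class EarlyStopping:
    """Signal a stop once validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience      # assumed value
        self.min_delta = min_delta    # assumed value
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Usage with made-up scores and validation losses (illustrative only).
probs = softmax([2.0, 1.0, 0.1])
loss = cross_entropy([1, 0, 0], probs)

stopper = EarlyStopping(patience=2)
stopped_at = None
for epoch, val_loss in enumerate([0.9, 0.7, 0.71, 0.72, 0.6]):
    if stopper.step(val_loss):
        stopped_at = epoch
        break
```

In a real pipeline these pieces would normally come from the deep learning framework itself (optimizer, scheduler, loss, and callback classes); the sketch only makes the arithmetic and the stopping rule explicit.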
On-the-fly data augmentation was applied throughout training, which ran for 50 epochs with a batch size of 32. The training process tracks key performance metrics (accuracy, precision, recall, and F1-score) together with learning curves in the form of loss and accuracy plots.

3.5 Evaluation Metrics

The plant disease detection model is assessed with several evaluation metrics that show how accurately it differentiates between diseased and healthy plant leaves. The metrics used in this investigation are accuracy, precision, recall, F1-score, the confusion matrix, and the area under the ROC curve (AUC-ROC).

Accuracy

Accuracy is a fundamental measure of prediction quality. It is defined as the ratio of correctly classified samples to the total number of samples:

$$ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{5} $$

where:
TP (True Positives): Correctly predicted diseased samples.
TN (True Negatives): Correctly predicted healthy samples.
FP (False Positives): Healthy samples incorrectly classified as diseased.
FN (False Negatives): Diseased samples incorrectly classified as healthy.

Although accuracy is a useful metric, it may not be sufficient for imbalanced datasets where one class dominates the other.

Precision (Positive Predictive Value)

Precision measures how many of the positively predicted instances are actually correct. It is defined as:

$$ \text{Precision} = \frac{TP}{TP + FP} \tag{6} $$

A high precision value indicates that the model has a low false positive rate, making it reliable for applications where false alarms should be minimized.

Recall (Sensitivity or True Positive Rate)

Recall evaluates the model’s ability to detect actual diseased samples. It is calculated as:

$$ \text{Recall} = \frac{TP}{TP + FN} \tag{7} $$

A high recall score is crucial in plant disease detection, as missing diseased samples can lead to the spread of plant infections, causing severe agricultural losses.

F1-Score (Harmonic Mean of Precision and Recall)

The F1-score is the harmonic mean of precision and recall and is therefore a balanced measure. It is particularly useful when working with a dataset whose classes are imbalanced.

$$ \text{F1-Score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{8} $$

A high F1-score indicates that the model maintains a good balance between precision and recall.

The confusion matrix complements these scores by showing the number of correct and incorrect predictions for each class.

Table 1: Analysing the confusion matrix

                   Predicted Healthy   Predicted Diseased
Actual Healthy     TN                  FP
Actual Diseased    FN                  TP

Table 1 helps identify specific misclassifications and areas where the model needs improvement.

Area Under the ROC Curve (AUC-ROC)

The Receiver Operating Characteristic (ROC) curve plots the true positive rate (recall) against the false positive rate (1 - specificity). The AUC-ROC score quantifies the model’s ability to distinguish between classes:

$$ \text{AUC} = \int_{0}^{1} \text{TPR} \, d(\text{FPR}) \tag{9} $$

An effective model for leaf disease classification produces AUC values approaching 1, which demonstrates a robust capacity to distinguish healthy from diseased leaves. Model stability and generalization are confirmed with k-fold cross-validation at k=5. The selected final model achieves the
peak performance on the F1-score and AUC-ROC measures.

4. RESULTS AND DISCUSSION

The performance of the proposed plant disease detection model was evaluated using multiple metrics, including accuracy, precision, recall, and F1-score. A comparative analysis was conducted between the CNN-based model, Support Vector Machine (SVM), and Random Forest classifiers.

Table 2: Model Performance Comparison

Model            Accuracy (%)   Precision (%)   Recall (%)
CNN (Proposed)   96.8           97.2            96.5
SVM              89.5           88.9            90.1
Random Forest    85.7           86.0            85.4

The CNN-based model significantly outperforms the traditional machine learning models, achieving an accuracy of 96.8%, followed by SVM (89.5%) and Random Forest (85.7%). The CNN model also achieved the highest F1-score (96.8%), indicating a well-balanced model in terms of precision and recall.

Figure 2: Model Performance Comparison

Table 3: The confusion matrix provides a detailed view of the model’s classification performance:

                   Predicted Healthy   Predicted Diseased
Actual Healthy     50                  5
Actual Diseased    3                   42

Figure 3: Confusion Matrix

Misclassification is minimal: the high true positive rate shows that diseased leaves are classified accurately. The model identifies diseased leaves effectively because it produces very few false negatives.

V. CONCLUSION

The plant disease detection framework presented here relies on Convolutional Neural Networks (CNNs) to analyze leaf images and classify them. The proposed CNN-based model outperformed the traditional machine learning approaches, Support Vector Machines (SVM) and Random Forest, with higher accuracy, precision, recall, and F1-score. The CNN reached an accuracy of 96.8%. The results show that preprocessing combined with deep learning and data augmentation substantially improves both the capability and the reliability of plant disease detection. The combined use of PlantVillage data and in-field imagery improved model generalization, enabling the model to handle real-world conditions including lighting variations, orientation changes, and background noise.
…reation procedures improve model generalization on the test data, illustrate the advantages of CNN feature extraction over standard methods, and point to upcoming applications of machine learning with IoT devices for in-field disease detection. Analysis of the confusion matrices showed the proposed system to be reliable, with minimal counts of false positives and false negatives.

REFERENCES

[1]. Mohanty, S. P., Hughes, D. P., & Salathé, M. (2016). Using deep learning for image-based plant disease detection. Frontiers in Plant Science, 7, 1419.
[2]. Camargo, A., & Smith, J. S. (2009). Image pattern classification for the identification of disease-causing agents in plants. Computers and Electronics in Agriculture, 66(2), 121-125.
[3]. Aouthu, S., Suman, S.K., Anuradha, S. et al. Automated Diagnosis of Acute Cerebral Ischemic Stroke Lesions using Capsule Graph Neural Networks from Diffusion-weighted MRI Scans. J. Electr. Eng. Technol. (2025). https://doi.org/10.1007/s42835-024-02120-2
[4]. Zhang, S., et al. (2018). Deep learning-based detection of plant diseases with a novel lightweight CNN architecture. Computers and Electronics in Agriculture, 151, 31-39.
[5]. Brahimi, M., et al. (2017). Deep learning for plant diseases: Detection and saliency map visualization. Human Journal, 9(2), 1-7.
[6]. Sladojevic, S., et al. (2016). Deep neural networks based recognition of plant diseases by leaf image classification. Computational Intelligence and Neuroscience.
[7]. Ferentinos, K. P. (2018). Deep learning models for plant disease detection and diagnosis. Computers and Electronics in Agriculture, 145, 311-318.
[8]. Lu, Y., et al. (2017). An in-field automatic wheat disease diagnosis system. Computers and Electronics in Agriculture, 142, 369-379.
[9]. Ramcharan, A., et al. (2019). A mobile-based deep learning model for cassava disease diagnosis. Frontiers in Plant Science, 10, 272.
[10]. Singh, A.K., Deepika, C.L.N., Shahnaz, K.V. et al. Hybrid Xception-LSTM Model for Remote Sensing: Advanced Urban Heat Island and Land Use Analysis. Remote Sens Earth Syst Sci (2024). https://doi.org/10.1007/s41976-024-00182-4
[11]. Sharada, K., Choudhary, S.L., Harikrishna, T. et al. GeoAgriGuard: AI-Driven Pest and Disease Management with Remote Sensing for Global Food Security. Remote Sens Earth Syst Sci (2025). https://doi.org/10.1007/s41976-025-00192-w
[12]. Li, Y., et al. (2020). Edge computing-based plant disease detection system. Sensors, 20(11), 3103.
[13]. Hughes, D. P., & Salathé, M. (2015). An open access repository of images on plant health. arXiv preprint arXiv:1511.08060.
[14]. Behmann, J., et al. (2015). SpecHyperspectral imaging for precision agriculture. Precision Agriculture, 16, 239-246.
[15]. Selvaraju, R. R., et al. (2017). Grad-CAM: Visual explanations from deep networks. IEEE International Conference on Computer Vision (ICCV).
[16]. S. K. Suman, D. Kumar, L. Bhagyalakshmi, "SINR Pricing in Non Cooperative Power Control Game for Wireless Ad Hoc Networks," KSII Transactions on Internet and Information Systems, vol. 8, no. 7, pp. 2281-2301, 2014. DOI: 10.3837/tiis.2014.07.005
[17]. Bhagyalakshmi, L., Suman, S.K. & Sujeethadevi, T. Joint Routing and Resource Allocation for Cluster Based Isolated Nodes in Cognitive Radio Wireless Sensor Networks. Wireless Pers Commun 114, 3477–3488 (2020). https://doi.org/10.1007/s11277-020-07543-4
[18]. K. Ramu, Gopu Sreenivasulu, Rinku Sharma Dixit, Shailee Lohmor Choudhary, Katakam Venkateswara Rao, Sanjay Kumar Suman, Mohammed shuaib, A. Rajaram, (2025). Smart solar power Conversion: Leveraging Deep learning MPPT and hybrid cascaded h-bridge multilevel inverters for optimal efficiency. Biomedical Signal Processing and Control, Volume 105, 2025, 107582, ISSN 1746-8094, https://doi.org/10.1016/j.bspc.2025.107582.
[19]. Pantazi, X. E., et al. (2019). A review of crop disease detection and monitoring. Biosystems Engineering, 184, 110-123.