International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 06 Issue: 09 | Sep 2019 www.irjet.net p-ISSN: 2395-0072
© 2019, IRJET | Impact Factor value: 7.34 | ISO 9001:2008 Certified Journal | Page 448
Agricultural Data Modeling and Yield Forecasting using Data
Mining Techniques
Rithesh Pakkala P.1, Akhila Thejaswi R2
1,2Assistant Professor, Department of Information Science of Engineering, Sahyadri College of Engineering &
Management, Mangaluru, Karnataka, India
---------------------------------------------------------------------***----------------------------------------------------------------------
Abstract - Agriculture has a great impact on the economy of
developing countries. Global changes in climatic conditions
and the cost of investment in agriculture are major obstacles
for smallholder farmers. The proposed work designs a
predictive model that provides farmers with a cultivation
plan for obtaining a high yield of paddy crop using data
mining techniques. Data mining techniques extract hidden
knowledge through data analysis, unlike statistical
approaches. The dataset is collected from the agricultural
department. K-means clustering and classifiers such as
Support Vector Machine and Naïve Bayes are applied to
meteorological and agronomic data for the paddy crop. The
performance of the classifiers is validated and compared.
The result of the work is the accurate prediction of crop
yield. The final rules extracted by this work are useful for
farmers to make proactive and knowledge-driven decisions
before harvest.
Key Words: Data mining, Predictive Model, K - Means
Clustering, Support Vector Machine, Naïve Bayes
classifiers
1. INTRODUCTION
Data mining is the process of analysing hidden patterns in
data from different perspectives and categorizing them into
useful information. The data are collected from a common
source, such as an agriculture department, for efficient
analysis with data mining algorithms, supporting business
decision making and other information requirements in order
to ultimately reduce costs and increase revenue.
Data mining techniques aim to extract hidden, useful and
interesting patterns from raw data. Data mining tools predict
future trends and behaviours, allowing businesses to make
proactive, knowledge-based decisions. Data mining uses
sophisticated statistical and analysis tools to find
previously unknown structures and interactions in large
datasets. Here it helps to develop a predictive model that
provides a cultivation plan for farmers to get a high yield
of paddy crops.
Descriptive data mining tasks characterize the general
properties of the data in the database, while predictive data
mining is used to predict explicit values based on patterns
determined from known results. Prediction involves using
some fields or variables in the database to predict unknown
or future values of other variables of interest. In most
agricultural applications, predictive data mining approaches
are used, for example to predict future crops, pesticide and
fertilizer use, weather, and the revenue to be generated.
Crop productivity forecasting is the scientific practice of
predicting crop yield before harvest. Data mining techniques
such as clustering and classification are applied in order to
improve the accuracy of crop yield prediction. A final
prediction model is developed and implemented that protects
farmers from agricultural risks by providing a framework
that helps them make scientific decisions in agriculture.
Using this predictive model, farmers can plan the cultivation
process well in advance. To prevent loss, farmers can
identify suitable combinations of varying factors like seed
quality, rainfall, temperature and sowing procedure. It is a
scientific model that provides suitable cultivation plans to
farmers in accordance with the changing agronomic factors.
Paddy is a pivotal crop in south India. The yield of the
paddy crop depends on various meteorological and agronomic
factors such as seed quality, rainfall, temperature and
sowing procedure. In order to evaluate the relationship
between these factors and crop yield, and to identify the
input variables affecting the output of the paddy crop, a
real-time dataset collected from farmers cultivating paddy is
used in this research.
Raw agricultural data are pre-processed and only the
necessary factors are retained by filtering. The major data
mining techniques used in this research are K-means
clustering and classifiers such as Support Vector Machine
and Naïve Bayes. The performance of these classifiers is
compared based on classifier accuracy measures. The final
knowledge regarding the cultivation plan is discovered,
evaluated, and presented. The results of the desired models
will help agribusiness associations equip agriculturists with
the necessary information as to which factors contribute to
high yield.
2. RELATED WORK
This section describes the various works carried out in the
relevant fields.
Jharna Majumdar et al. [1] proposed a data mining model
that is applied to an agriculture dataset using different
clustering algorithms such as DBSCAN, PAM and CLARA.
Clustering is considered an unsupervised classification
process. Clustering techniques can be divided into
partitioning clustering, hierarchical clustering, density-
based methods, grid-based methods and model-based clustering
methods. The clustering methods are compared using quality
metrics. According to the analysis of clustering quality
metrics, DBSCAN gives better clustering quality than PAM and
CLARA, and CLARA gives better clustering quality than PAM.
Finally, the methods are compared on different measures,
including Root Mean Squared Error (RMSE) and Mean Absolute
Error (MAE).
A predictive model that provides a cultivation plan for
farmers to get a high yield of paddy crops using data mining
techniques is proposed by Anitha Arumugam [2]. Data are
collected from farmers cultivating paddy along the
Thamirabarani river basin. K-means and various decision
tree classifiers are applied, and the performance of the
classifiers is validated and compared. In this research,
K-means clustering is integrated with decision tree
classifiers in order to improve classification accuracy. Even
with the worst-case performance using a decision stump,
integration of K-means clustering has improved the accuracy
from 63.5.
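K-means is used both in [2] and in the present work, so a minimal sketch of the standard (Lloyd's) algorithm may help. It alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its members. The sample (rainfall, temperature) values and the choice of seeding the centroids with the first k points are assumptions made for illustration; they are not taken from either paper.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iterations=100):
    """Cluster points with Lloyd's algorithm; returns (labels, centroids)."""
    # Assumption: seed with the first k points so the run is deterministic.
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


# Hypothetical (rainfall in mm, temperature in deg C) observations.
observations = [(100, 20), (110, 22), (300, 30), (310, 31), (105, 21), (305, 29)]
labels, centroids = kmeans(observations, k=2)
```

The cluster label of each record can then be appended as an extra attribute before classification, which is one common way of combining clustering with a classifier.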
Shruti Mishra et al. [3] describe the use of data mining in
crop yield prediction. Classification is a data mining
technique that assigns the items in a collection to target
categories or classes. Four classifiers, namely J48, LWL, LAD
Tree and IBK, are used for prediction and the performance of
each is then compared using the WEKA tool. The classifiers
are compared on accuracy, root mean squared error (RMSE),
mean absolute error (MAE) and relative absolute error
(RAE). The lower the error, the more accurate the
algorithm.
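The error measures named above have simple closed forms. The sketch below implements them in plain Python, assuming the usual definitions: RAE is taken as the total absolute error divided by the total absolute error of a predictor that always outputs the mean of the actual values. The sample numbers are made up for illustration.

```python
import math


def mae(actual, predicted):
    """Mean absolute error: average of |actual - predicted|."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error: square root of the average squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def rae(actual, predicted):
    """Relative absolute error against a predict-the-mean baseline."""
    mean_actual = sum(actual) / len(actual)
    baseline = sum(abs(a - mean_actual) for a in actual)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / baseline


def accuracy(actual_labels, predicted_labels):
    """Fraction of class labels predicted correctly."""
    hits = sum(a == p for a, p in zip(actual_labels, predicted_labels))
    return hits / len(actual_labels)


# Hypothetical yields (e.g. tonnes per hectare) and model predictions.
actual = [2.0, 4.0, 6.0, 8.0]
predicted = [3.0, 4.0, 5.0, 10.0]
print(mae(actual, predicted), rmse(actual, predicted), rae(actual, predicted))
```

An RAE below 1 means the model does better than always predicting the mean; values near or above 1 suggest it adds little.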
Akanksha Verma et al. [4] proposed a crop yield prediction
framework using fuzzy clustering techniques and a neural
network. In the proposed approach, the raw dataset is taken
first and clustering and classification are then performed
in MATLAB to obtain the results. Clustering is performed
with the Fuzzy C Means approach, which the authors describe
as a learning algorithm that gives a low error rate and
arranges the data in a hierarchical manner. The framework
encourages farmers to do the right things at the right
time. The model will help farmers increase their
productivity by choosing the proper crop for their land,
soil temperature, humidity and a few other conditions.
Pooja M C et al. [5] designed a model to forecast crop yield
using the J48 implementation of the C4.5 algorithm. The model
uses past information such as soil type, soil pH, ESP and EC
of a particular region to give a better crop yield estimate
for that region. It can be used to select the most suitable
crops for the region and to estimate their yield, thereby
improving the value and profit of farming. This helps
farmers decide on the crop they would like to plant in the
forthcoming year. The prediction will also help the
associated industries plan the logistics of their business.
Rupinder Singh and Gurpreeth Singh [6] proposed a model
using a decision tree algorithm to predict crop yield. The
work detects the influence of rainfall and relative humidity
on wheat crop yield. The decision tree results indicate that
rainfall and relative humidity have more influence on wheat
crop yield during the vegetative period than during the
reproduction and maturation periods. Rules generated from
the decision tree analysis will help users predict the
conditions responsible for variable wheat crop yield under
given meteorological parameters.
Ramesh Babu Palepu and Rajesh Reddy Muley [7] used data
mining for forecasting future trends of agricultural
processes. The paper presents the role of data mining in
soil analysis in the field of agriculture and reviews
several data mining techniques and related work by
different authors in the domain of soil analysis. Data
mining techniques are a very recent development in the area
of soil analysis. In this model, a fuzzy algorithm is
applied for managing crops, the K-means algorithm is used to
classify the soils and the Support Vector Machine technique
is applied to predict the crop yield.
U. Kumar Dey et al. [8] analyzed crop yield prediction using
Support Vector Machine (SVM), Multiple Linear Regression
(MLR), AdaBoost and Modified Non Linear Regression for
regions in Bangladesh. Rice is divided into three
categories: Aman, Aus and Boro. Prediction is done for each
of these seasons. The different training techniques were
judged on their Root Mean Square Error (RMSE) and Mean
Absolute Error (MAE) values.
R. Sujatha and Dr P. Isakki [9] proposed a model to predict
crop yield using classification techniques. The paper
describes how agricultural efficiency and yields can be
improved by forecasting from previous agricultural
information. The model can also be used by farmers to select
the best crop to plant depending on the weather, and it
provides the information required to choose a suitable
season for farming.
Monali Paul et al. [10] proposed a model to predict crop
yield using two classification techniques, K-Nearest
Neighbor and Naive Bayes. These algorithms are applied to a
soil dataset taken from the soil testing laboratory in
Jabalpur, M.P., and their accuracy is obtained by evaluating
the datasets. Soil is classified into low, medium and high
categories using data mining techniques in order to predict
the crop yield from the available dataset. This study can
help soil analysts and farmers decide which land to sow for
better crop production.
Ramesh and Vishnu Vardhan [11] analysed the agriculture data
for the years 1965–2009 in the East Godavari district of
Andhra Pradesh, India. Rainfall data are clustered into 4
clusters using the K-means clustering method. Multiple
linear regression (MLR) is a data mining technique used to
model the linear relationship between a dependent variable
and one
or more independent variables. The dependent variable is
rainfall and the independent variables are year, area of
sowing and production. The purpose of the work is to obtain
suitable data models that achieve high accuracy in terms of
yield prediction capabilities.
Geraldin B Dela Cruz et al. [12] implement an efficient data
mining mechanism based on the combination of Principal
Component Analysis (PCA) as a pre-processing method and a
modified Genetic Algorithm (GA) as the learning algorithm,
in order to reduce computational cost and time by keeping
the set of features as discriminating and as small as
possible. In doing so, generating classification models for
agricultural crops becomes efficient and characterization is
improved. The Principal Component Analysis-Genetic Algorithm
(PCA-GA) data mining mechanism is implemented on an
agricultural crops dataset to identify the key attribute
combinations and characteristics that determine crop
performance.
Anshu Bharadwaj et al. [13] attempt to extend the boundaries
of discretization and to evaluate its effect on other machine
learning techniques for classification, namely support vector
machines with and without discretization of the datasets. On
comparing the results obtained by these algorithms, it was
inferred that the Discretization based Support Vector Machine
(D-SVM) produced the model with the highest accuracy for all
the datasets. The results indicate that the accuracies of the
discretization based SVM are better than those of the SVM
applied to the same datasets without discretization.
Shruthi Ramdas and R. Pakkala [14] discuss how Web mining
makes use of data mining classification techniques to
automatically discover Web documents and services, extract
information from Web resources, and uncover general
patterns on the Web for analyzing visitor activities in
network-based systems. The objective of that study is to
classify users as interested or not interested in a
particular website, based on their access patterns in the
web server logs, using a decision tree classification
technique.
3. PROPOSED METHODOLOGY
The main objectives of the proposed work are as follows:
• To perform classification using different algorithms
like Support Vector Machine and Naïve Bayes
• To predict the crop yield.
The agricultural dataset, involving attributes such as seed
quality, rainfall, sowing procedure and temperature, is
collected from different sources. The raw data collected are
pre-processed using techniques that remove outliers and
irrelevant data. This is followed by data mining techniques
such as clustering and classification in order to determine
the yield of the crop. The proposed framework is shown below.
Fig -1: Schematic representation of Proposed Framework
Pre-processing is a technique to improve the quality of data
to be presented to the mining process. In the proposed
system, data pre-processing is done in 3 major ways: (i) data
cleaning (ii) attribute selection (iii) transformation.
Data cleaning is a method of handling incomplete,
inconsistent, and noisy data. Some attributes have null or
missing data. To obtain high quality knowledge, cleaning is
done by eliminating null values. Attributes that contribute
more to the mining process are then identified: attribute
selection is done by eliminating irrelevant and redundant
attributes. The process of converting the data to a form
suitable for the mining task is called transformation.
Attributes with numeric values are converted to categorical
attributes.
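The three pre-processing steps can be sketched in a few lines of plain Python. This is an illustration only: the attribute names, the sample records and the cut points used to turn numeric values into categories are all hypothetical, since the paper does not list them.

```python
def clean(records):
    """Data cleaning: drop every record that contains a null (None) value."""
    return [r for r in records if all(v is not None for v in r.values())]


def select_attributes(records, keep):
    """Attribute selection: keep only the attributes named in `keep`."""
    return [{name: r[name] for name in keep} for r in records]


def discretize(value, low_cut, high_cut):
    """Map a numeric value to one of three categories."""
    if value < low_cut:
        return "low"
    if value < high_cut:
        return "medium"
    return "high"


def transform(records, cut_points):
    """Transformation: convert the numeric attributes to categorical ones."""
    transformed = []
    for r in records:
        row = dict(r)
        for name, (low_cut, high_cut) in cut_points.items():
            row[name] = discretize(row[name], low_cut, high_cut)
        transformed.append(row)
    return transformed


# Hypothetical raw records; farmer_id is irrelevant for mining.
raw = [
    {"farmer_id": 1, "rainfall_mm": 1200.0, "temperature_c": 28.0, "seed_quality": "good"},
    {"farmer_id": 2, "rainfall_mm": None, "temperature_c": 31.0, "seed_quality": "poor"},
    {"farmer_id": 3, "rainfall_mm": 650.0, "temperature_c": 35.0, "seed_quality": "good"},
]
# Hypothetical cut points: (upper bound of "low", upper bound of "medium").
CUT_POINTS = {"rainfall_mm": (800.0, 1500.0), "temperature_c": (25.0, 32.0)}

selected = select_attributes(clean(raw), ["rainfall_mm", "temperature_c", "seed_quality"])
prepared = transform(selected, CUT_POINTS)
```

Here the second record is dropped because its rainfall value is missing, and the remaining numeric values are replaced by low, medium or high.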
Data mining techniques are used to predict the crop yield.
The farmer supplies different attributes or parameters as
input, and the support vector machine then predicts the
possible crop yield as low, medium or high. The output of
the support vector machine is passed to the Naïve Bayes
classifier, which gives the accuracy of the prediction.
Classification is a supervised learning process by which data
objects are grouped into classes with known labels.
Classification algorithms use classifiers to group similar
objects under one type, and when a new object is introduced,
a prediction is made so as to put that object into one of
the classes. It involves two phases: a learning phase and a
classification phase. In the learning phase, training data
are analyzed and a classifier model is built. In the
classification phase, the test data are used to estimate the
accuracy of classification.
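The two phases can be illustrated with a simple hold-out evaluation. The classifier below is a deliberately trivial placeholder (it predicts the most common yield label seen for each seed quality value); it is not the classifier used in this work, and the labelled rows are hypothetical.

```python
from collections import Counter


def split(rows, train_fraction=0.75):
    """Hold out the last part of the data as a test set."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


def learn(training_rows):
    """Learning phase: build a model from labelled training rows."""
    counts_by_value = {}
    for features, label in training_rows:
        counts_by_value.setdefault(features["seed_quality"], Counter())[label] += 1
    # Fall back to the overall most common label for unseen attribute values.
    default = Counter(label for _, label in training_rows).most_common(1)[0][0]
    table = {value: c.most_common(1)[0][0] for value, c in counts_by_value.items()}
    return table, default


def classify(model, features):
    """Classification phase: assign a class label to a new object."""
    table, default = model
    return table.get(features["seed_quality"], default)


def estimate_accuracy(model, test_rows):
    """Fraction of held-out rows whose label is predicted correctly."""
    hits = sum(classify(model, features) == label for features, label in test_rows)
    return hits / len(test_rows)


# Hypothetical labelled rows: (attributes, yield class).
rows = [
    ({"seed_quality": "good"}, "high"),
    ({"seed_quality": "poor"}, "low"),
    ({"seed_quality": "good"}, "high"),
    ({"seed_quality": "poor"}, "low"),
    ({"seed_quality": "good"}, "medium"),
    ({"seed_quality": "poor"}, "low"),
    ({"seed_quality": "good"}, "high"),
    ({"seed_quality": "poor"}, "medium"),
]
training_rows, test_rows = split(rows)
model = learn(training_rows)
test_accuracy = estimate_accuracy(model, test_rows)
```

Keeping the test rows out of the learning phase is what makes the accuracy estimate meaningful; evaluating on the training rows would overstate it.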
Support vector machines (SVM) are supervised learning models
with associated learning algorithms that analyze data for
classification and regression analysis. Given a set of
training examples, each marked as belonging to one or the
other of two categories, an SVM training algorithm builds a
model that assigns new examples to one category or the
other, making it a non-probabilistic binary linear
classifier. An SVM model is a representation of the given
examples as points in space, mapped so that the examples of
the separate categories are divided by a clear gap that is
as wide as possible. New examples are then mapped into the
same space and predicted to belong to a category based on
which side of the gap they fall.
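To make the idea of a separating gap concrete, the following sketch trains a linear SVM by sub-gradient descent on the regularised hinge loss. This is one common training method, chosen here for brevity; the paper does not state which solver was used, and the learning rate, regularisation strength, epoch count and sample points are all assumptions.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def train_linear_svm(samples, labels, lam=0.01, lr=0.01, epochs=200):
    """Fit (w, b) for the hyperplane w.x + b = 0; labels must be +1 or -1."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (dot(w, x) + b)
            # The regularisation term always shrinks w slightly.
            w = [wi * (1.0 - lr * lam) for wi in w]
            if margin < 1.0:
                # Point is misclassified or inside the margin:
                # move the hyperplane towards classifying it correctly.
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on the side of the hyperplane x lies on."""
    return 1 if dot(w, x) + b >= 0 else -1


# Hypothetical, linearly separable two-feature data (already centred).
samples = [(-2, -1), (-1, -2), (-2, -2), (2, 1), (1, 2), (2, 2)]
labels = [-1, -1, -1, 1, 1, 1]
w, b = train_linear_svm(samples, labels)
print(predict(w, b, (3, 3)), predict(w, b, (-3, -3)))
```

Because a single SVM separates only two categories, predicting three yield classes (low, medium, high) needs a multi-class scheme such as one-vs-rest, in which one binary SVM is trained per class.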
Naive Bayes classifiers are a family of simple probabilistic
classifiers in data mining that are based on applying Bayes'
theorem with strong (naive) independence assumptions between
the features. Naive Bayes classifiers are highly scalable,
requiring a number of parameters that is linear in the
number of variables (features/predictors) in a learning
problem. Maximum likelihood training can be done by
evaluating a closed-form expression, which takes linear
time, rather than by the expensive iterative approximation
used for many other types of classifiers.
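The closed-form training mentioned above amounts to counting. The sketch below is a minimal categorical Naive Bayes classifier: training is a single pass that counts classes and attribute values, and prediction multiplies the class prior by the per-attribute likelihoods (in log space). Add-one (Laplace) smoothing is an assumption added here to avoid zero probabilities, and the training rows are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """One counting pass yields the class counts and value counts."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute index, class) -> value counts
    values_seen = defaultdict(set)       # attribute index -> distinct values
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            values_seen[i].add(value)
    return class_counts, value_counts, values_seen


def predict(model, row):
    """Return the class with the highest posterior score for `row`."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # log prior
        for i, value in enumerate(row):
            # Add-one smoothing (an assumption, not stated in the paper).
            numerator = value_counts[(i, label)][value] + 1
            denominator = count + len(values_seen[i])
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical rows of (seed quality, rainfall category) with yield classes.
rows = [("good", "high"), ("good", "medium"), ("poor", "low"),
        ("poor", "medium"), ("good", "high"), ("poor", "low")]
labels = ["high", "high", "low", "low", "high", "low"]
model = train_naive_bayes(rows, labels)
print(predict(model, ("good", "high")), predict(model, ("poor", "low")))
```

The number of stored counts grows linearly with the number of attributes, which is the scalability property described in the paragraph above.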
Pseudo Code:
1. Preprocess the raw data and prepare it for
classification
2. Input the preprocessed data to the Support Vector
Machine
if the dataset is separable
then create a hyperplane using equation
ax + b = 0
else
create a hyperplane using equation
y(ax + b) >= 1
3. Use the classes formed from the hyperplane for further
classification
4. Input the data for classification
5. Classify it with SVM into different classes
6. Predict the output
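For reference, the two expressions in step 2 can be related to the standard SVM formulation. The block below is the textbook optimisation problem, written with the symbols a, b, x and y from the pseudo code; the slack variables ξ_i and the penalty C are standard notation introduced here and do not appear in the paper, which does not state how non-separable data were handled.

```latex
% Decision boundary (the hyperplane itself):
a \cdot x + b = 0

% Separable data (hard margin): widest gap with every point outside it
\min_{a,\,b} \; \tfrac{1}{2}\lVert a \rVert^{2}
\quad \text{subject to} \quad
y_i \,(a \cdot x_i + b) \ge 1 \quad \text{for all } i

% Non-separable data (soft margin): slack \xi_i permits violations at cost C
\min_{a,\,b,\,\xi} \; \tfrac{1}{2}\lVert a \rVert^{2} + C \sum_i \xi_i
\quad \text{subject to} \quad
y_i \,(a \cdot x_i + b) \ge 1 - \xi_i, \qquad \xi_i \ge 0
```

In this formulation the condition y(ax + b) >= 1 is the margin constraint for correctly separated points, and the relaxed form with slack is what is normally used when the classes overlap.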
4. RESULTS
A pre-processing operation is carried out on the raw data
collected from various agricultural departments. The
classification method assigns each record to one of three
yield classes: low (depicted as 0), medium (depicted as 1)
and high (depicted as 2).
Fig -2: Classification of Data based on Temperature and
Seed Quality
Fig -3: Clustering of Data based on Rainfall and Seed
Quality
Fig -4: Classification of Data based on Temperature and
Rainfall
The attributes that affect the crop yield are taken on
different axes of the graphs, and the respective crop yield
results are shown in Figures 2, 3 and 4. Figure 2 plots seed
quality on the horizontal axis and temperature on the
vertical axis. Figure 3 plots seed quality on the horizontal
axis and rainfall on the vertical axis. Figure 4 plots
rainfall on the horizontal axis and temperature on the
vertical axis.
5. CONCLUSION
The major outcome of this work is the prediction of the rice
crop yield based on the input given by the farmers. Hidden
knowledge about a cultivation plan involving major agronomic
and meteorological factors is also extracted. The knowledge
thus obtained is useful for getting a high yield of rice
crop. Since the project uses both clustering and
classification, the accuracy attained is high. This accuracy
is measured using the Naïve Bayes algorithm. The results show
the different kinds of estimations that can be done on a
crop.
REFERENCES
[1] Jharna Majumdar, Sneha Naraseeyappa and Shilpa
Ankalaki, “Analysis of Agriculture Data using Data
Mining Techniques: Application of Big Data”, Journal of
Big Data, Springer, 2017.
[2] Anitha Arumugam, “A Predictive Modeling Approach for
Improving Paddy Crop Productivity using Data Mining
Techniques”, Turkish Journal of Electrical Engineering
& Computer Sciences, 2017.
[3] Shruti Mishra, Priyanka Paygude, Snehal Chaudhary and
Sonali Idate, “Use of Data Mining in Crop Yield
Prediction”, IEEE International Conference on Inventive
Systems and Control (ICISC 2018).
[4] Akanksha Verma, Aman Jatain and Shalini Bajaj, “Crop
Yield Prediction of Wheat using Fuzzy C Means
Clustering and Neural Network”, International Journal of
Applied Engineering Research, 2017.
[5] Pooja M C, Sangeetha M, Shreyaswi J Salian, Veena
Kamath and Mithun Naik, “Implementation of Crop Yield
Forecasting using Data Mining”, International Research
Journal of Engineering and Technology, 2017.
[6] Rupinder Singh and Gurpreeth Singh, “Wheat Crop Yield
Assessment using Decision Tree Algorithms”,
International Journal of Advanced Research in Computer
Science, 2017.
[7] Ramesh Babu Palepu and Rajesh Reddy Muley, “An
Analysis of Agricultural Soils by using Data Mining
Techniques”, International Journal of Engineering
Science and Computing, 2017.
[8] Umid Kumar Dey, Abdulla Hassan Masud and
Mohammed Nazim Uddin, “Rice Yield Prediction Model
using Data Mining”, International Conference on
Electrical, Computer and Communication Engineering
(ECCE), 2017.
[9] R. Sujatha and Dr. P. Isakki, “A Study on Crop Yield
Forecasting using Classification Techniques”, IEEE
International Conference on Computing Technologies
and Intelligent Data Engineering (ICCTIDE'16), 2016.
[10] Monali Paul, Santosh K. Vishwakarma and Ashok Verma,
“Analysis of Soil Behaviour and Prediction of Crop Yield
Using Data Mining Approach”, IEEE International
Conference on Computational Intelligence and
Communication Networks (CICN), 2015.
[11] D Ramesh and B Vishnu Vardhan, “Analysis of Crop Yield
Prediction using Data Mining Techniques”, International
Journal of Research in Engineering and Technology,
2015.
[12] Geraldin B Dela Cruz, Bobby D Gerardo, Bartolome T,
“Agricultural Crops Classification Models Based on PCA-
GA Implementation in Data Mining”, International
Journal of Modeling and Optimization, 2014.
[13] Anshu Bharadwaj, Shashi Dahiya and Shashi Dahiya,
“Discretization based Support Vector Machine (D-SVM)
for Classification of Agricultural Datasets”, International
Journal of Computer Applications, 2014.
[14] Shruthi Ramdas, Rithesh Pakkala P., Akhila Thejaswi R,
“Determination and Classification of Interesting Visitors
of Websites using Web Logs”, International Journal of
Computer Science and Mobile Computing, Vol. 5, Issue 1,
January 2016, pp. 01-09.

 
PPTX
MT Chapter 1.pptx- Magnetic particle testing
ABCAnyBodyCanRelax
 
PDF
Zero carbon Building Design Guidelines V4
BassemOsman1
 
PDF
Introduction to Data Science: data science process
ShivarkarSandip
 
PPTX
Victory Precisions_Supplier Profile.pptx
victoryprecisions199
 
PPT
SCOPE_~1- technology of green house and poyhouse
bala464780
 
PPTX
Civil Engineering Practices_BY Sh.JP Mishra 23.09.pptx
bineetmishra1990
 
PPTX
business incubation centre aaaaaaaaaaaaaa
hodeeesite4
 
PDF
dse_final_merit_2025_26 gtgfffffcjjjuuyy
rushabhjain127
 
PDF
2010_Book_EnvironmentalBioengineering (1).pdf
EmilianoRodriguezTll
 
PPTX
22PCOAM21 Session 1 Data Management.pptx
Guru Nanak Technical Institutions
 
PPTX
Tunnel Ventilation System in Kanpur Metro
220105053
 
PDF
Natural_Language_processing_Unit_I_notes.pdf
sanguleumeshit
 
PDF
Top 10 read articles In Managing Information Technology.pdf
IJMIT JOURNAL
 
FLEX-LNG-Company-Presentation-Nov-2017.pdf
jbloggzs
 
67243-Cooling and Heating & Calculation.pdf
DHAKA POLYTECHNIC
 
Advanced LangChain & RAG: Building a Financial AI Assistant with Real-Time Data
Soufiane Sejjari
 
Zero Carbon Building Performance standard
BassemOsman1
 
67243-Cooling and Heating & Calculation.pdf
DHAKA POLYTECHNIC
 
flutter Launcher Icons, Splash Screens & Fonts
Ahmed Mohamed
 
EVS+PRESENTATIONS EVS+PRESENTATIONS like
saiyedaqib429
 
MT Chapter 1.pptx- Magnetic particle testing
ABCAnyBodyCanRelax
 
Zero carbon Building Design Guidelines V4
BassemOsman1
 
Introduction to Data Science: data science process
ShivarkarSandip
 
Victory Precisions_Supplier Profile.pptx
victoryprecisions199
 
SCOPE_~1- technology of green house and poyhouse
bala464780
 
Civil Engineering Practices_BY Sh.JP Mishra 23.09.pptx
bineetmishra1990
 
business incubation centre aaaaaaaaaaaaaa
hodeeesite4
 
dse_final_merit_2025_26 gtgfffffcjjjuuyy
rushabhjain127
 
2010_Book_EnvironmentalBioengineering (1).pdf
EmilianoRodriguezTll
 
22PCOAM21 Session 1 Data Management.pptx
Guru Nanak Technical Institutions
 
Tunnel Ventilation System in Kanpur Metro
220105053
 
Natural_Language_processing_Unit_I_notes.pdf
sanguleumeshit
 
Top 10 read articles In Managing Information Technology.pdf
IJMIT JOURNAL
 
