PROJECT REPORT
STUDENT PERFORMANCE
(DATA MINING)
BS (SE) 2017
GROUP MEMBER(S):
NAME: HAFSA HABIB 2017/COMP/BS(SE)-21597
NAME: MUNIBA JAVIAD 2017/COMP/BS(SE)-21621
SUPERVISOR:
MISS SADIA JAVED
29TH APRIL, 2019
DEPARTMENT OF COMPUTER SCIENCE AND INFORMATION TECHNOLOGY
JINNAH UNIVERSITY FOR WOMEN
5-C NAZIMABAD, KARACHI 74600
Table of Contents
1. Introduction
2. Description of the problem and problem domain
3. Description of implemented data mining techniques/methods
3.1. Naïve Bayes Classifier
4. Data Set
4.1. Exploring the Data Set
4.1.1. General Distribution of Exam Scores
4.1.2. Exam scores based on the gender
4.1.3. Exam scores based on the Parent Level of Education
4.1.4. Exam scores based on the Lunch Type
4.1.5. Exam scores based on the Test Preparation Course
5. Implementation
5.1. Operators
6. Results and evaluation/discussion of the results
7. Future directions/ideas how to extend and enhance the technique
8. Conclusion
9. References
1. Introduction
Using the Students Performance in Exams dataset, we try to understand which factors affect exam scores. The data is limited, but visualizing it is enough to spot the main relationships. We first explore the data and then apply the Naïve Bayes classification technique to predict each student's overall performance and evaluate the results.
2. Description of the problem and problem domain
The aim is to understand how the parents’ background, test preparation and similar factors influence students’ performance.
Objectives
• Check the dataset and tidy the data if needed.
• Visualize the data to understand the effects of different factors on student performance.
• Check the effectiveness of the test preparation course.
• Identify the major factors influencing the test scores.
3. Description of implemented data mining techniques/methods
3.1. Naïve Bayes Classifier
Bayesian classifiers are statistical classifiers that predict class membership probabilities, such as the probability that a given sample belongs to a particular class. Naïve Bayes algorithms assume that the effect of an attribute value on a given class is independent of the values of the other attributes (class-conditional independence). In practice, dependencies often exist among attributes; Bayesian belief networks, which are graphical models that can describe joint conditional probability distributions, are used to represent such dependencies. Bayesian classifiers are popular because of their simplicity, computational efficiency and good performance on real-world problems: the models are fast to train and to evaluate, and achieve high accuracy in many domains.
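To make the independence assumption concrete, the following is a minimal from-scratch sketch of a Naïve Bayes classifier for categorical attributes such as Lunch or Test preparation course. It estimates the class prior P(C) and each conditional probability P(x_i | C) from counts, then predicts the class that maximizes P(C) · ∏ P(x_i | C), summing log-probabilities to avoid underflow. The add-one (Laplace) smoothing, the function names and the toy rows are assumptions made for illustration; this is not the RapidMiner operator used in this project, and numeric attributes such as the scores would need binning or a Gaussian likelihood, which this sketch omits.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and attribute values per class.

    rows   -- list of tuples of categorical attribute values
    labels -- list of class labels, one per row
    """
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute index, class) -> Counter of values
    attr_values = defaultdict(set)       # attribute index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            attr_values[i].add(value)
    return class_counts, value_counts, attr_values, len(labels)


def predict(model, row):
    """Return the class with the highest log posterior (up to a constant)."""
    class_counts, value_counts, attr_values, n = model
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / n)  # log P(C)
        for i, value in enumerate(row):
            # log P(x_i | C) with add-one smoothing, so unseen values never give log(0)
            numerator = value_counts[(i, label)][value] + 1
            denominator = count + len(attr_values[i])
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up toy rows: (lunch, test preparation course) -> performance class
rows = [("standard", "completed"), ("standard", "none"),
        ("free/reduced", "none"), ("free/reduced", "none")]
labels = ["Good", "Good", "Bad", "Bad"]
model = train_naive_bayes(rows, labels)
print(predict(model, ("standard", "completed")))  # prints "Good"
```

Working in log space keeps the product of many small probabilities from underflowing, and smoothing prevents a single unseen attribute value from forcing a class probability to zero.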
4. Data Set
The data set has the following attributes:
• Gender: gender of the student (Male, Female)
• Ethnicity: ethnic group to which the student belongs (group A, B, C, D, E)
• Parent level of Education: education level of the student’s parents/guardian (high school, bachelor’s degree, master’s degree, some college, associate’s degree)
• Lunch: type of lunch provided to the student at school (standard, free/reduced)
• Test preparation course: whether the student took the preparation course (none, completed)
• Math score: mathematics score of the student (0 to 100)
• Reading score: reading score of the student (0 to 100)
• Writing score: writing score of the student (0 to 100)
• Student Performance: overall performance of the student (Good, Average, Bad, Worst)
4.1. Exploring the Data Set
First, we import the dataset from the repository and display its first few rows.
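In the project this step uses RapidMiner's Retrieve operator. An equivalent minimal sketch in Python, using only the standard library, is shown below. It assumes the Kaggle file has been saved locally as `StudentsPerformance.csv` (the name is taken from the reference URL), and the helper names are hypothetical. Values are read as strings, so scores must be converted with `float()` before computing statistics.

```python
import csv
import io


def parse_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def load_rows(path):
    """Read a whole CSV file from disk and parse it."""
    with open(path, encoding="utf-8") as f:
        return parse_rows(f.read())


def head(rows, n=5):
    """Return the first n rows, similar to pandas' DataFrame.head()."""
    return rows[:n]
```

For example, `rows = load_rows("StudentsPerformance.csv")` followed by `for row in head(rows): print(row)` prints the first five records.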
4.1.1. General Distribution of Exam Scores
There are 5 features that might affect the score in each exam. The first thing to analyse is how the scores are distributed within each exam (Math, Reading and Writing). We plot histograms to see whether there are any differences between the score distributions.
The scores appear to be approximately normally (Gaussian) distributed. It is hard to draw any further conclusion from the histograms: they all look very similar, and there is not enough data for the plots to look smoother.
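The binning behind each histogram can be sketched in plain Python: every score from 0 to 100 is assigned to a fixed-width bucket, and the bucket counts give the bar heights. The 10-point bucket width and the function names are assumptions; a plotting library such as matplotlib would normally do this binning itself.

```python
def score_histogram(scores, width=10):
    """Count scores (0 to 100) per bucket [0, w), [w, 2w), ..., last bucket closed.

    width should divide 100; a score of exactly 100 is placed in the
    last bucket so that no extra bucket is created for it.
    """
    n_buckets = 100 // width
    counts = [0] * n_buckets
    for s in scores:
        if not 0 <= s <= 100:
            raise ValueError(f"score out of range: {s}")
        idx = min(int(s // width), n_buckets - 1)
        counts[idx] += 1
    return counts


def print_histogram(counts, width=10):
    """Print a simple text histogram, one row per bucket."""
    for i, c in enumerate(counts):
        print(f"{i * width:3d}-{(i + 1) * width:3d} | {'#' * c}")
```

Running `score_histogram` on the math, reading and writing columns separately gives three count lists that can be compared side by side.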
4.1.2. Exam scores based on the gender
Graphical representation of the exam scores by gender (Male, Female).
4.1.3. Exam scores based on the Parent Level of Education
The mean exam scores for each parental level of education can be displayed as a table or as a heat map.
The means suggest that a lower parental level of education is associated with lower exam scores. Children whose parents’ highest education level is college or high school have noticeably lower exam scores than their peers. Conversely, children of parents with a master’s or bachelor’s degree score much better in the exams.
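A minimal sketch of how the table of means can be computed is shown below. It assumes each record is a dict as produced by `csv.DictReader`, with the column names of the Kaggle file (`parental level of education`, `math score`, `reading score`, `writing score`); these names and the function names are assumptions, not taken from the report.

```python
from collections import defaultdict

SUBJECTS = ("math score", "reading score", "writing score")  # assumed Kaggle column names


def mean_scores_by(rows, group_col, subjects=SUBJECTS):
    """Return {group value: {subject: mean score}} for each value of group_col."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for row in rows:
        key = row[group_col]
        counts[key] += 1
        for subject in subjects:
            sums[key][subject] += float(row[subject])  # CSV values are strings
    return {
        key: {subject: sums[key][subject] / counts[key] for subject in subjects}
        for key in counts
    }


def print_table(means, subjects=SUBJECTS):
    """Print one row per group, ordered by the group's average over all subjects."""
    order = sorted(means, key=lambda k: sum(means[k].values()) / len(subjects))
    for key in order:
        cells = "  ".join(f"{means[key][s]:6.1f}" for s in subjects)
        print(f"{key:<20} {cells}")
```

Calling `mean_scores_by(rows, "parental level of education")` yields one row of the table per education level; the same helper works for other categorical columns such as `lunch` or `gender`.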
4.1.4. Exam scores based on the Lunch Type
It might seem odd that the type of lunch a student has could be correlated with their exam scores. However, the dataset has only two types of lunch, standard and free/reduced, so this attribute reflects the parents’ financial situation rather than the type of dish. There may well be a correlation here, so we visualize it.
The visualization shows a large gap between the exam scores of students who have a free/reduced lunch and those of students who have a standard lunch.
4.1.5. Exam scores based on the Test Preparation Course
Finally, we use a heat map to determine how completing the test preparation course affects the exam scores. This attribute has only two categories: none and completed.
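The comparison the heat map shows can also be summarized numerically as the difference in mean score between students who completed the course and those who did not, for each exam. This is a minimal sketch; it assumes records are dicts with the Kaggle column names `test preparation course` (values `completed` and `none`) and the three score columns, and the function name is hypothetical. A positive difference indicates an association with higher scores, not proof that the course causes them.

```python
from statistics import mean

SUBJECTS = ("math score", "reading score", "writing score")  # assumed Kaggle column names


def prep_course_effect(rows, col="test preparation course", subjects=SUBJECTS):
    """Return {subject: mean(completed) - mean(none)} for each exam.

    Raises statistics.StatisticsError if either group is empty.
    """
    effect = {}
    for subject in subjects:
        done = [float(r[subject]) for r in rows if r[col] == "completed"]
        skipped = [float(r[subject]) for r in rows if r[col] == "none"]
        effect[subject] = mean(done) - mean(skipped)
    return effect
```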
5. Implementation
This dataset is clean and free of unwanted data, so no data-cleaning step is needed. We apply the Naïve Bayes classification technique to our data set, Student Performance. The Naïve Bayes classifier is a well-known supervised learning approach: the model is trained on labelled training data and then used to classify test data. The data set has 8 features and 1 label, Student Performance.
Student Performance is the class label to be predicted. Because no separate test data is provided, we split the data set into training and test sets in a 70:30 ratio.
We train the Naïve Bayes model on 70% of the data set and then classify the remaining 30%. Finally, we measure accuracy, precision and recall to show how accurate the model is on this data set.
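A 70:30 split can be sketched as a seeded shuffle followed by a cut. Stratifying by the label, so that each of Good, Average, Bad and Worst keeps roughly its share in both parts, is a design choice assumed here rather than something the report states; the function name `stratified_split` and the seed value are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(rows, labels, train_ratio=0.7, seed=42):
    """Split (row, label) pairs into train and test lists, class by class.

    Each class is shuffled with a fixed seed and cut at train_ratio,
    so class proportions are preserved up to rounding.
    """
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append((row, label))
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    train, test = [], []
    for label in sorted(by_class):  # sorted so the shuffle order is deterministic
        items = by_class[label]
        rng.shuffle(items)
        cut = round(len(items) * train_ratio)
        train.extend(items[:cut])
        test.extend(items[cut:])
    return train, test
```

Stratifying keeps less frequent classes such as Worst represented in the 30% test set; a plain shuffle-and-cut would also satisfy the 70:30 ratio but could leave a rare class under-represented.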
5.1. Operators
The operators used to build the process are as follows:
• Retrieve
This operator accesses data stored in the repository and loads it into the process.
• Set Role
This operator changes the role of one or more attributes; here it marks Student Performance as the label.
• Split Data
This operator divides the given data set into the desired number of subsets; here, a 70% training set and a 30% test set.
• Naïve Bayes
This operator generates a Naïve Bayes classification model.
• Apply Model
This operator applies a trained model to the given data set.
• Performance
This operator is used for performance evaluation. It delivers a list of performance criteria values, which are determined automatically to fit the type of learning task.
6. Results and evaluation/discussion of the results
Confusion Matrix
The result of the process on the Student Performance data set is shown below in the form of a confusion matrix, together with the accuracy, class precision and class recall.
Because the label has four classes (Good, Average, Bad, Worst), the following criteria are reported:
• Accuracy
• Class precision
• Class recall
Accuracy is the percentage of correct predictions over the total number of examples; a prediction is correct when the value of the prediction attribute equals the value of the label attribute. Class precision is the fraction of examples predicted as a given class that actually belong to it, and class recall is the fraction of examples of a given class that are predicted as that class.
The accuracy of the model on the Student Performance data set is 92.64%.
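These criteria follow directly from the confusion matrix. The sketch below builds the matrix from true and predicted labels and derives accuracy plus per-class precision and recall. The function names are hypothetical, and the labels in the example are made up; they are not the project's predictions.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred):
    """Count (true label, predicted label) pairs."""
    return Counter(zip(y_true, y_pred))


def accuracy(y_true, y_pred):
    """Fraction of examples whose prediction equals the label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def class_precision_recall(y_true, y_pred, cls):
    """Precision and recall for one class.

    precision = TP / (TP + FP): share of predictions of cls that are right
    recall    = TP / (TP + FN): share of true cls examples that were found
    Returns 0.0 for a criterion whose denominator is zero.
    """
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    predicted = sum(p == cls for p in y_pred)
    actual = sum(t == cls for t in y_true)
    precision = tp / predicted if predicted else 0.0
    recall = tp / actual if actual else 0.0
    return precision, recall


# Made-up labels for illustration only
y_true = ["Good", "Good", "Average", "Bad", "Worst", "Average"]
y_pred = ["Good", "Average", "Average", "Bad", "Bad", "Average"]
print(f"accuracy = {accuracy(y_true, y_pred):.2%}")
for cls in ("Good", "Average", "Bad", "Worst"):
    p, r = class_precision_recall(y_true, y_pred, cls)
    print(f"{cls}: precision={p:.2f} recall={r:.2f}")
```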
7. Future directions/ideas how to extend and enhance the technique
The same process or model can be used to predict student performance more broadly and to study the factors that influence it.
In future, this process could be adopted by a university to predict a student’s GPA in advance from their previous GPA.
In schools, the model could identify the students likely to perform worst, so that teachers can focus more on those students.
8. Conclusion
The main insights from the data are summarized below:
• 135 students failed in mathematics, 90 in reading and 114 in writing; overall, 103 students failed the examination.
• Reading and writing scores are strongly positively linearly correlated, with a correlation coefficient of approximately 0.95.
• Students belonging to ethnic group D performed very well.
• The test preparation course appears very effective: fewer of the students who completed it failed.
• Students who have a standard lunch performed better than the others.
• Children of parents with a master’s or bachelor’s degree score much better in the exams.
• The Naïve Bayes classifier process achieves an accuracy of 92.64% on the Student Performance data set.
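A linear correlation coefficient such as the one in the summary is normally Pearson's r, although the report does not name the method. A minimal sketch of r = Σ(x − x̄)(y − ȳ) / √(Σ(x − x̄)² · Σ(y − ȳ)²) is shown below; the function name is hypothetical. Python 3.10 and later also provide `statistics.correlation`, which computes the same coefficient.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists of scores.

    The 1/n factors of the covariance and the standard deviations cancel,
    so plain sums of deviations are enough.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant data")
    return sxy / math.sqrt(sxx * syy)
```

Applied to the reading and writing score columns (converted to numbers), this gives the coefficient reported in the summary.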
9. References
Students Performance in Exams dataset, Kaggle: https://www.kaggle.com/spscientist/students-performance-in-exams#StudentsPerformance.csv
