HOME WORK TITLE - DESIGN PROBLEM
COURSE CODE - CAP 617T
COURSE INSTRUCTOR - Lect. Neha Malhotra Mam
COURSE TUTOR - DO
ALLOCATION DATE - 01/09/12
SUBMISSION DATE - 01/11/12
STUDENT ROLL NO. - A19
SECTION - D1R17

DECLARATION

I declare that this design problem is my individual work. I have not copied from
any other student's work or from any other source except where due
acknowledgement is made explicitly in the text, nor has any part been written
for me by another person.

EVALUATOR COMMENT………                          STUDENT SIGN

MARKS OBTAINED……….                            PRIYA RANJAN

Q.1 What kind of data mining algorithms are used for mining the database?
Justify the importance of the tool.

Ans: The data mining algorithms used for mining the database, and the importance
of the tool, are described below.

When student mistakes are recorded, association rules algorithms can be used to
find mistakes often associated together. Combined with a genetic algorithm,
concepts mastered together can be identified using student scores. The teacher may
use these findings to reflect on his/her teaching and re-design the course material.
Data mining here also comprises data exploration and visualization, which
present results to users in a convenient way. New variables may be calculated
and used in algorithms, such as the average number of mistakes made per
attempted exercise.
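
A derived variable of this kind could be computed as in the sketch below. The
record layout (student, exercise, mistake count), the sample values and the
function name are assumptions made for illustration; the source does not
describe how the data is actually stored.

```python
from collections import defaultdict

def mistakes_per_attempted_exercise(log):
    """Return {student: average number of mistakes per attempted exercise}.

    `log` is a list of (student, exercise, mistake_count) records.
    This layout is hypothetical, not the tool's actual schema.
    """
    totals = defaultdict(int)     # student -> total mistakes
    attempted = defaultdict(set)  # student -> distinct exercises attempted
    for student, exercise, mistakes in log:
        totals[student] += mistakes
        attempted[student].add(exercise)
    return {s: totals[s] / len(attempted[s]) for s in totals}

# Hypothetical log: s2 worked on ex1 in two sessions.
log = [
    ("s1", "ex1", 4), ("s1", "ex2", 2),
    ("s2", "ex1", 0), ("s2", "ex1", 1), ("s2", "ex3", 2),
]
averages = mistakes_per_attempted_exercise(log)
```

Counting distinct exercises, rather than log rows, keeps repeated sessions on
the same exercise from lowering the average.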

Tools: We used a range of tools. Initially we worked with Excel and Access to
perform simple SQL queries and visualization.

Data exploration and visualization: Raw data and algorithm results can be
visualized through tables and graphics such as graphs and histograms, as well as
through more specific techniques such as symbolic data analysis. The aim is to
display data along certain attributes and make extreme points, trends and
clusters obvious to the human eye.

Clustering algorithms aim at finding homogeneous groups in data: the task is to
identify groups of records that are similar to each other but different from the
rest of the data. Often the variables providing the best clustering should be
identified as well. We used k-means clustering and its combination with
hierarchical clustering. Both methods rest on a concept of distance between
individuals. We used Euclidean distance.
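
As a rough illustration of k-means with Euclidean distance, here is a minimal
sketch in plain Python. It is not the TADA-Ed or Clementine implementation; the
two attributes (mistakes made, correct steps entered), the sample values and the
initial centroids are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def assign(points, centroids):
    """Put each point in the cluster of its nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)),
                      key=lambda i: euclidean(p, centroids[i]))
        clusters[nearest].append(p)
    return clusters

def k_means(points, centroids, max_iterations=100):
    """Alternate assignment and centroid update until centroids stop moving."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iterations):
        clusters = assign(points, centroids)
        updated = [
            tuple(sum(col) / len(members) for col in zip(*members))
            if members else centroids[i]  # an empty cluster keeps its centroid
            for i, members in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(points, centroids)

# Hypothetical students described by (mistakes made, correct steps entered).
students = [(1, 9), (2, 8), (9, 1), (8, 2)]
centroids, clusters = k_means(students, centroids=[(1, 9), (9, 1)])
```

The initial centroids are passed in explicitly so that the run is repeatable; in
practice they are often chosen at random, which can change the result.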

Classification is used to predict values for some variable: the task is to find
a function that maps records into one of several discrete classes. For example,
given all the work done by a student, one may want to predict whether the
student will perform well in the final exam. We used the C4.5 decision tree from
TADA-Ed, which relies on the concept of entropy. The tree can be represented by
a set of rules such as: if x = v1 and y > v2 then t = v3. Thus, depending on the
values an individual takes for, say, the variables x and y, one can predict its
value for t. The tree is built from a representative population and is then used
to predict values for new individuals.
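
To make the role of entropy concrete, the sketch below computes the entropy of a
class variable and the information gain of splitting on one attribute. It is
only an illustration: C4.5 itself ranks splits by gain ratio, a normalized form
of information gain, and the attribute names and records here are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Reduction in entropy of `target` after splitting rows on `attribute`."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder

# Hypothetical records: exercises done (few/many) and exam outcome.
rows = [
    {"exercises": "many", "exam": "pass"},
    {"exercises": "many", "exam": "pass"},
    {"exercises": "few", "exam": "fail"},
    {"exercises": "few", "exam": "fail"},
]
gain = information_gain(rows, "exercises", "exam")
```

In this toy data the split separates the classes perfectly, so the gain equals
the full entropy of the exam outcome (1 bit).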
Association rules find relations between items. Rules have the following form:
X -> Y, support 40%, confidence 66%, which could mean 'if students get X
incorrectly, then they also get Y incorrectly', with a support of 40% and a
confidence of 66%. Support is the frequency in the population of individuals
that contain both X and Y. Confidence is the percentage of the instances that
contain Y amongst those which contain X. We implemented a variant of the
standard Apriori algorithm in TADA-Ed that takes temporality into account:
it produces a rule X -> Y only if exercise X occurred before Y.
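
The definitions of support and confidence above can be sketched as follows. This
is not the TADA-Ed variant of Apriori: it only evaluates one given rule, and the
reading of "X occurred before Y" as "some X precedes some Y in the student's
sequence" is an assumption, as are the sample sequences.

```python
def occurs_before(sequence, x, y):
    """True if some occurrence of x precedes some occurrence of y."""
    if x not in sequence or y not in sequence:
        return False
    first_x = sequence.index(x)
    last_y = len(sequence) - 1 - sequence[::-1].index(y)
    return first_x < last_y

def rule_stats(sequences, x, y, temporal=False):
    """Return (support, confidence) of the rule x -> y.

    Each element of `sequences` is one individual's ordered list of items.
    With temporal=True, an individual counts only if x occurred before y.
    """
    with_x = [s for s in sequences if x in s]
    if temporal:
        with_both = [s for s in with_x if occurs_before(s, x, y)]
    else:
        with_both = [s for s in with_x if y in s]
    support = len(with_both) / len(sequences)
    confidence = len(with_both) / len(with_x) if with_x else 0.0
    return support, confidence

# Hypothetical ordered records, one list per student.
sequences = [["X", "Y"], ["Y", "X"], ["X"], ["Z"], ["X", "Z", "Y"]]
plain = rule_stats(sequences, "X", "Y")
ordered = rule_stats(sequences, "X", "Y", temporal=True)
```

The second student got Y before X, so that record supports the plain rule but
not the temporal one.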

Q.2 What preprocessing steps are carried out for effective mining? Comment on
the same.
Ans: The preprocessing steps carried out for effective mining are described
below.


TADA-Ed provides a pre-processing facility that makes the data minable.
We need to specify two aspects:
(1) Which elements do we want to cluster or classify: students, exercises or
mistakes?

(2) Which attributes and which distance do we want to retain to compare these
elements?



Example: One could cluster students using the number of mistakes they made and
the number of correct steps they entered.

Data Preparation:
The data, which was maintained in different tables, was joined into a single
table. After integrating the data into one file, we discretized the attributes
into categorical ones to improve interpretability and comprehensibility. For
example, we grouped all grades into five groups: excellent, very good, good,
average and poor. In this step the fields used in the study were determined and
transformed where necessary. Using the normal distribution method, we
categorized the value of each questionnaire item as High, Medium or Low.
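
The discretization step could look like the sketch below. Both mappings are
assumptions: the source gives no grade cut-offs, and "normal distribution
method" is read here as splitting values at one standard deviation either side
of the mean. The sample scores are hypothetical.

```python
import statistics

def grade_group(mark):
    """Map a 0-100 mark to one of five groups (cut-offs are hypothetical)."""
    if mark >= 85:
        return "excellent"
    if mark >= 70:
        return "very good"
    if mark >= 55:
        return "good"
    if mark >= 40:
        return "average"
    return "poor"

def level(value, mean, stdev):
    """Assumed rule: more than one standard deviation from the mean is
    High or Low; everything else is Medium."""
    if value > mean + stdev:
        return "High"
    if value < mean - stdev:
        return "Low"
    return "Medium"

# Hypothetical questionnaire scores for one item.
scores = [2, 4, 4, 4, 5, 5, 6, 10]
mean = statistics.mean(scores)
stdev = statistics.pstdev(scores)  # population standard deviation
levels = [level(v, mean, stdev) for v in scores]
```

Other cut-off choices (for example equal-width or equal-frequency bins) would be
just as defensible; the point is only that each numeric value ends up with a
categorical label.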
Q.3 In which way are association rule mining algorithms and clustering
algorithms useful during analysis and interpretation?

Ans: Association Rules:
Spatial and non-spatial association rules take the form X --> Y (c%), where X
and Y are sets of spatial or non-spatial predicates and c% is the confidence of
the rule. For example:
- Is_a (X, origin of CSU student) --> close to (X, railway stations) (70%):
this rule states that 70% of CSU students originally live close to railway
stations.
- Is_a (X, CSU student) --> from (X, middle education level areas) (80%):
this rule states that 80% of CSU students are from areas with a middle level of
education.
For a large database with a large set of objects and attributes, there may exist
a large number of associations between them (Koperski, 1998). Some rules apply
only to a small number of objects: for example, less than 5% of CSU's Park
Recreation and Heritage students are associated with a disability code, so this
association may not be of interest for further study. By contrast, 75% of
students live within 10 km of railway stations, so this association attracted
further study. A minimum support threshold needs to be specified along with a
minimum confidence threshold to filter out the uninteresting associations.
We used association rules to find mistakes that often occur together while
solving exercises. The purpose of looking for these associations is for the
teacher to ponder and, maybe, to review the course material or emphasize
subtleties while explaining concepts to students. Thus, it makes sense to have a
support that is not too low.

Association rules for Year 2004.

M11 ==> M12 [sup: 77%, conf: 89%]
M12 ==> M11 [sup: 77%, conf: 87%]
M11 ==> M10 [sup: 74%, conf: 86%]
M10 ==> M12 [sup: 78%, conf: 93%]
M12 ==> M10 [sup: 78%, conf: 89%]
M10 ==> M11 [sup: 74%, conf: 88%]
M10: Premise set incorrect
M11: Rule can be applied, but deduction incorrect
M12: Wrong number of line reference given
The first association rule says that if students make the mistake 'Rule can be
applied, but deduction incorrect' while solving an exercise, then they also make
the mistake 'Wrong number of line reference given' while solving the same
exercise.
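
Filtering by minimum support and minimum confidence can be sketched as below,
using the first five rules listed above. The threshold values are hypothetical
and chosen only to show the effect; the source does not state which thresholds
were used.

```python
# (antecedent, consequent, support, confidence), taken from the 2004 rule list.
rules = [
    ("M11", "M12", 0.77, 0.89),
    ("M12", "M11", 0.77, 0.87),
    ("M11", "M10", 0.74, 0.86),
    ("M10", "M12", 0.78, 0.93),
    ("M12", "M10", 0.78, 0.89),
]

def filter_rules(rules, min_support, min_confidence):
    """Keep only rules that meet both thresholds."""
    return [r for r in rules
            if r[2] >= min_support and r[3] >= min_confidence]

# Hypothetical thresholds.
strong = filter_rules(rules, min_support=0.75, min_confidence=0.88)
strong_pairs = [(x, y) for x, y, _sup, _conf in strong]
```

Raising either threshold shrinks the list the teacher has to inspect, at the
cost of hiding rarer associations.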

Clustering
Clustering groups similar data items together and provides a high-level view of
the database. It combines techniques from different disciplines such as
mathematics, physics, mathematical programming, statistics, computer science,
artificial intelligence and databases. A variety of clustering algorithms exist,
belonging to several different categories. Clustering helps to create new groups
and classes based on the study of patterns and relationships between data values
in a data bank. In our case study we consider student data that exists in three
separate clusters: the student academic record, the student residence record and
the student personal record. Each of these in turn consists of several
attributes of its own. By applying this technique we can categorize student
records in detail. For example, the CGPA grade shows the total marks in the
student academic record cluster, and on the basis of these marks we can identify
the actual performance level of the student. This is shown in the figure below.
Different cluster numbers were tried, and successful partitioning was achieved
with 5 clusters. In our case study, Figure 3 gives the cluster graph, a picture
of student groups according to their performance. The RapidMiner software was
used for the graphs. Figure 4 gives a deviation plot of the five clusters of
students. Using these results we can divide students into five groups and guide
them according to their behavior.




We performed clustering using this subpopulation, using both
(i) k-means in TADA-Ed, and
(ii) a combination of k-means and hierarchical clustering in Clementine.
Because there is neither a fixed number nor a fixed set of exercises on which to
compare students, determining a distance between individuals was not obvious. We
calculated and used a new variable: the total number of mistakes made per
student in an exercise. As a result, students with a similar frequency of
mistakes were put in the same group. Histograms showing the different clusters
revealed interesting patterns. Consider the histogram shown in Figure 1,
obtained with TADA-Ed. There are three clusters: 0 (red, on the left), 1 (green,
in the middle) and 4 (purple, on the right). From other windows (not shown) we
know that students in cluster 0 made many mistakes per unfinished exercise,
students in cluster 1 made few mistakes and students in cluster 4 made an
intermediate number of mistakes. Students who make many mistakes also use many
different logic rules while solving exercises; this is shown by the vertical,
almost solid lines.




Figure 1. Histogram showing, for each cluster of students, the rules incorrectly
used per student.

Q.4 How were classification rule mining algorithms helpful to the teachers?
Justify your actions.
Ans: How classification rule mining algorithms helped the teachers, and the
justification for our actions, are described below.

Classification:
We built decision trees to try to predict exam marks (for the question related
to formal proofs). A decision tree is a procedure on the basis of which we can
decide whether a specific value is accepted or rejected. It provides a mapping
from the current state to the future state and in this way helps to take
decisions efficiently. This follows the theory of dynamic optimization. Decision
trees use If-Then statements. The major advantage of this technique is that
results can be displayed graphically, so that the user understands them easily.
Decision trees can be evaluated on factors such as dataset, data type,
scalability, accuracy and robustness.

[Figure: Diagrammatical representation of classification rule mining algorithm.
Description of teacher activity, with the labels Mistakes, Rules and Learn.]

The information extracted greatly assisted us as teachers to better understand
the cohort of learners. Whilst SQL queries and various histograms were used
during the course of the teaching semester to focus the following lecture on
problem areas, the more complex mining was left for reflection between
semesters.
Symbolic data analysis revealed that if students attempt at least two exercises,
they are more likely to do more (probably overcoming the initial barrier of use)
and complete their exercises. In subsequent years we required students to do at
least 2 exercises as part of their assessment.
Mistakes that were associated together indicated to us that the very concept of
formal proofs was a problem. In 2003, that portion of the course was redesigned
to take this problem into account and the role of each part of the proof was
emphasized. After the end of the semester, mining for mistakes associations was
conducted again. Surprisingly, results did not change much (a slight decrease in
support and confidence levels in 2003 followed by a slight increase in 2004).

Q.5 Based on your analysis, list out the 5 best DM queries for the given
context.

Ans: The 5 best DM queries for the given context are as follows:
1. Symbolic data analysis revealed that if students attempt at least two
exercises, they are more likely to do more and complete their exercises. In
subsequent years we required students to do at least 2 exercises as part of
their assessment.

2. Mistakes that were associated together indicated to us that the very concept
of formal proofs was a problem. However, marks in the final exam continued
increasing. This leads us to think that making mistakes, especially while using
a training tool, is simply part of the learning process, a view supported by the
number of completed exercises.

3. The level of prediction seems to be much better when the prediction is based
on exercises (number, length, variety of rules) rather than on mistakes made.
This also supports the idea that mistakes are part of the learning process,
especially in a practice tool where mistakes are not penalised.

4. Using data exploration and results from the decision tree, one can infer that
if students successfully do 2 to 3 exercises for the topic, then they seem to
have grasped the concept of formal proof and are likely to perform well in the
exam question related to that topic. This finding is coherent with correlations
calculated between marks in the final exam and activity with the Logic Tutor,
and with the general, human perception of tutors in this course. Therefore, a
sensible warning system could look as follows. Report to the lecturer in charge
students who have successfully completed fewer than 3 exercises. For those
students, display the histogram of rules used. Be proactive towards these
students, distinguishing those who use the pop-up menu for logic rules from the
others.

5. Feedback functionality is available for association rules: the teacher can
design his/her own proactive feedback for a particular sequence of mistakes. The
content of the feedback page is up to the teacher. For instance, for the pattern
of mistakes A, B -> C, the teacher may want to provide explanations about
mistakes A and B (which the current student has made) and review the underlying
concepts of mistake C. This rule has a support and a confidence.
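
The warning system of item 4 and the rule-based feedback of item 5 could be
sketched together as below. All names, the data layout, the rule labels R1 and
R2 and the page identifier are hypothetical; only the threshold of 3 completed
exercises and the pattern A, B -> C come from the text.

```python
from collections import Counter

def students_to_report(completed, minimum=3):
    """Students who successfully completed fewer than `minimum` exercises."""
    return sorted(s for s, n in completed.items() if n < minimum)

def triggered_feedback(mistakes_made, feedback_rules):
    """Feedback pages whose rule antecedent the student has fully matched."""
    made = set(mistakes_made)
    return [page for antecedent, _consequent, page in feedback_rules
            if set(antecedent) <= made]

# Hypothetical data: exercises completed and logic rules used per student.
completed = {"s1": 5, "s2": 1, "s3": 2, "s4": 3}
rules_used = {"s2": ["R1", "R1", "R2"], "s3": ["R2"]}

flagged = students_to_report(completed)
# Histogram of rules used, for flagged students only.
histograms = {s: Counter(rules_used.get(s, [])) for s in flagged}

# One teacher-authored feedback page attached to the rule A, B -> C.
feedback_rules = [(("A", "B"), "C", "page_AB_C")]
pages = triggered_feedback(["B", "A", "D"], feedback_rules)
```

A student who has made only mistake A would get no page from this rule, since
the antecedent requires both A and B.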
Q.6 From the given context, suggest a few more areas where we can apply data
mining in a University.
Ans: Data mining can be applied in a University as described below:

Student Academic Record

Attributes            Description                    Selected Attributes
Std_Reg_No            Student Registration Number
Enroll_Yr             Year of Enrollment
Completion_Yr         Year of Completion
CGPA_PG               Post-Graduation CGPA           yes
CGPA_G                Graduation CGPA
CGPA_HS               High School CGPA
CGPA_T                Tenth CGPA
E_ID                  E-Mail ID
MOB_NO                Mobile Number
Date_of_Birth         Date of Birth
Performanc_Grade      Overall Performance            yes

Student Residence Record

Attributes            Description                    Selected Attributes
Permanent Add         Permanent Address
Correspondance_Add    Address For Correspondence
City_C                City
Location_L            Location
State_S               State

Student Personal Record

Attributes            Description                    Selected Attributes
Gender_G              Male-M, Female-F
Place_of_Birth_POB    Student Birth Place
Nationality_N         Nationality
Rd1 r17a19 datawarehousing and mining_cap617t_cap617

  • 1. HOME WORK TITLE-DESIGN PROBLEM COURSE CODE-CAP 617T COURSE INSTRUCTOR-Lect.Neha Malhotra Mam COURSE TUTOR-DO ALLOCATION DATE-01/09/12 SUBMITION DATE-01/11/12 STUDENT ROLL NO. – A19 SECTION-D1R17 DECLARATION, I declare that this design problem is individual work. I have not copied from any student work or from any other source except where due acknowledgement is made explicitly in the text, nor has any part been written from me by another person. EVALUATOR COMMENT……… STUDENT SIGN MARKS OBTAINED………. PRIYA RANJAN
  • 2. Q.1What kind of data mining algorithms are used for mining the database and justify the importance of the tool. Ans:Kind of data mining algorithms are used for mining the database and we justified the importance of the tool as described below:- When student mistakes are recorded, association rules algorithms can be used to find mistakes often associated together. Combined with a genetic algorithm, concepts mastered together can be identified using student scores. The teacher may use these findings to reflect on his/her teaching and re-design the course material. It also comprises data exploration and visualization to present results in a convenient way to users. New variables may be calculated and used in algorithms, such as the average number of mistakes made per attempted exercise. Tools: We used a range of tools. Initially we worked with Excel and Access to perform simple SQL queries and visualization. Data exploration and visualization: Raw data and algorithm results can be visualized Through tables and graphics such as graphs and histograms as well as through more specific Techniques such as symbolic data analysis. The aim is to display data along certain attributes and make extreme points, trends and clusters obvious to human eye. Clustering algorithms aim at finding homogeneous groups in data. The task of identifying groups of records that are similar between themselves but different from the reset of the data. Often the variables providing the best clustering should be identified as well. We used k-means clustering and its combination with hierarchic clustering. Both methods rest on a distance concept between individuals. We used Euclidian distance. Classification is used to predict values for some variable. The task of finding a function that maps records into one of several discrete classes. For example, given all the work done by a student, one may want to predict whether the student will perform well in the final exam. 
We used C4.5 decision tree from TADA-Ed which relies on the concept of entropy. The tree can be represented by a set of rules such as: if x=v1 and y> v2 then t= v3.Thus, depending on the values an individual takes for, say the variables x and y, one can predict its value for t. The tree is built taking a representative population and is used to Predict values for new individuals.
  • 3. Association rules find relations between items. Rules have the following form: X ->Y, support 40%, confidence 66%, which could mean 'if students get X incorrectly, then they get also Y incorrectly', with a support of 40% and a confidence of 66%. Support is the frequency in the population of individuals that contains both X and Y. Confidence is the percentage of the instances that contains Y amongst those which contain X. We implemented a variant of the standard Apriori algorithm in TADA-Ed that takes temporality into account. Taking temporality into account produces a rule X->Y only if exercise X occurred before Y. Q.2What preprocessing steps are lcarried out for effective mining and comment on the same. Ans:Tada-ed provides a pre-processing facility steps are lcarried out for effective mining as described below:- Tada-ed provides a pre-processing facility which allows making the data minable. We need to specify two aspects: (1) What element we want to cluster or classify: students, exercises, mistakes? (2) Which attributes and distance do we want to retain to compare these elements? Example:-An example could be to cluster students, using the number of mistakes they made and the number of correct steps they entered. The data was maintained in different tables was joined in a single table. After we integrated the data into one files, to increase interpretation and comprehensibility; we discretized the attributes to categorical ones. For examples, we grouped all grades into five groups’ excellent, very good, good, average, and poor. In this step the fields used in the study were determined and transformed if necessary. By using normal distribution method, we categorized the value of each item in questionnaire with High, Medium and Low Data Preparation:-
  • 4. Q.3In which way the Association rule mining algorithms and clustering algorithms are useful during the analysis and interpretation? Ans:-Association Rules: Spatial and non-spatial association rules are in the form of X-->Y(c%), where X and Y are sets of spatial or non-spatial predicates and c% is the confidence of the rule. For examples:- Is_a (X, origin of CSU student) --> close to (X,railway stations) (70%), this rule states that 60% of CSU students originally live close to highway.- Is_a (X, CSU student) --> from (X, middle education level areas), (80%), this rule states that 80% of CSU students are from the area which have middle level of education.For a large database where there are a large set of objects and attributes, there may exist a large number of associations between them (Koperski, 1998). Some rules may only apply to a small number of objects, for example, less than 5% of CSU's Park Recreation and Heritage students are associated with a disability code, therefore it may not be of interest for the further study.While 75% of students live within 10 km of railway stations, therefore it attracted further study. A minimum support threshold needs to be specified along with minimum confidence threshold to filter out the uninterested associations. We used association rules to find mistakes often occurring together while solving exercises. The purpose of looking for these associations is for the teacher to ponder and, may be, to
  • 5. review the course material or emphasize subtleties while explaining concepts to students. Thus, it makes sense to have a support that is not too low. Association rules for Year 2004. M11 ==> M12 [sup: 77%, conf: 89%] M12 ==> M11 [sup: 77%, conf: 87%] M11 ==> M10 [sup: 74%, conf: 86%] M10 ==> M12 [sup: 78%, conf: 93%] M12 ==> M10 [sup: 78%, conf: 89%] M10 ==> M12 [sup: 74%, conf: 88%] M10: Premise set incorrect M11: Rule can be applied, but deduction incorrect M12: Wrong number of line reference given The first association rule says that if students make mistake Rule can be applied, but deduction incorrect while solving an exercise, then they also made the mistake Wrong number of line references given while solving the same exercise. Clustering Clustering provides grouping together of similar data items. This technique provides a high level view of the database. Clustering technique is a technique that merges and combines techniques from different disciplines such as mathematics, physics, math-programming, statistics, computer sciences, artificial intelligence and databases etc. Variety of clustering algorithms exists and belongs to several different categories. This technique helps to create new groups and classes based on the study of patterns and relationships between values of data in a data bank. In our case study we are considering students data which exists in three separate clusters named as student academic record, student residence record, student personal record etc.These clusters are again consisting of several its own attributes. By applying this technique we will easily categorize the record of the student in detail manner. For example:- CGPA Grade shows the total marks of the student academic record cluster. On the basis of marks we can easily identify the actual performance or the level of the student. This can be shown in below figure.
  • 6. Different cluster numbers were tried, and successful partitioning was achieved with 5 clusters. In our case study, the cluster graph as a picture of students group according to their performance on figure 3 gives. For graphs, the Rapidminer software was used. The graphs are given in figure 4 is deviation plot five clusters of students. Using these results we can divide students into five groups and guide them according to their behavior. We performed clustering using this subpopulation, both using (i) k-means in TADA-Ed, and (ii) a combination of k-means and hierarchical clustering of Clementine. Because there is neither a fixed number nor a fixed set of exercises to compare students, determining a distance between individuals was not obvious. We calculated and used a new variable: the total number of mistakes made per student in an exercise.
  • 7. As a result, students with similar frequency of mistakes were put in the same group. Histograms showing the different clusters revealed interesting patterns. Consider the histogram shown in Figure 1, obtained with TADA-Ed. There are three clusters: 0 (red, on the left), 1 (green, in the middle) and 4 (purple, on the right). From other windows (not shown) we know that students in cluster 0 made many mistakes per exercise not finished, students in cluster 1 made few mistakes and students in cluster 4 made an intermediate number of mistakes. Students making many mistakes also use many different logic rules while solving exercises; this is shown by the vertical, almost solid lines.

Histogram showing, for each cluster of students, the rules incorrectly used per student

Q.4 How were classification rule mining algorithms helpful to the teachers? Justify your actions.

Ans: Classification rule mining algorithms were helpful to the teachers, as described below:-

Classification – We built decision trees to try to predict exam marks (for the question related to formal proofs). A decision tree is a procedure that decides, through a sequence of tests, whether a specific value is accepted or rejected. It maps the current state to a future state, and in this way helps to take decisions efficiently. This follows the theory of dynamic optimization. Decision trees are expressed as If-Then statements. The major advantage of this technique is that results can be displayed graphically, so that users understand them easily. Decision trees can be evaluated on factors such as dataset, data type, scalability, accuracy and robustness.

Description of teacher activity
  • 8. [Figure: Diagrammatical representation of classification rule mining algorithm. Labels: Mistakes, Rules, Learn]

The information extracted greatly assisted us as teachers to better understand the cohort of learners. Whilst SQL queries and various histograms were used during the course of the teaching semester to focus the following lecture on problem areas, the more complex mining was left for reflection between semesters. Symbolic data analysis revealed that if students attempt at least two exercises, they are more likely to do more (probably overcoming the initial barrier of use) and complete their exercises. In subsequent years we required students to do at least 2 exercises as part of their assessment. Mistakes that were associated together indicated to us that the very concept of formal proofs was a problem. In 2003, that portion of the course was redesigned to take this problem into account and the role of each part of the proof was emphasized. After the end of the semester, mining for mistake associations was conducted again. Surprisingly, results did not change much (a slight decrease in support and confidence levels in 2003 followed by a slight increase in 2004).

Q.5 Based on your analysis, list out 5 best DM queries for the given context.

Ans: The 5 best DM queries for the given context are as follows:-

1. Symbolic data analysis revealed that if students attempt at least two exercises, they are
  • 9. more likely to do more and complete their exercises. In subsequent years we required students to do at least 2 exercises as part of their assessment.

2. Mistakes that were associated together indicated to us that the very concept of formal proofs was a problem. However, marks in the final exam continued increasing. This leads us to think that making mistakes, especially while using a training tool, is simply part of the learning process, a view supported by the number of completed exercises.

3. The level of prediction seems to be much better when the prediction is based on exercises (number, length, variety of rules) rather than on mistakes made. This also supports the idea that mistakes are part of the learning process, especially in a practice tool where mistakes are not penalised.

4. Using data exploration and results from the decision tree, one can infer that if students successfully complete 2 to 3 exercises for the topic, then they seem to have grasped the concept of formal proof and are likely to perform well in the exam question related to that topic. This finding is coherent with the correlations calculated between marks in the final exam and activity with the Logic Tutor, and with the general, human perception of tutors in this course. Therefore, a sensible warning system could look as follows. Report to the lecturer in charge the students who have successfully completed fewer than 3 exercises. For those students, display the histogram of rules used. Be proactive towards these students, distinguishing those who use the pop-up menu for logic rules from the others.

5. Feedback functionality is available for association rules, where the teacher can design his/her own proactive feedback for a particular sequence of mistakes. The content of the page is up to the teacher.
For instance, for the pattern of mistakes A, B -> C, the teacher may want to provide explanations about mistakes A and B (which the current student has made) and review the underlying concepts of mistake C. This rule has an associated support and confidence.
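Items 4 and 5 above can be sketched together in a few lines. The code below is a hedged illustration only: the function names `students_to_report` and `feedback_for`, the dictionary layout of a rule and the sample data are assumptions made for this example and do not describe the actual Logic Tutor implementation.

```python
def students_to_report(completed, threshold=3):
    """Students who successfully completed fewer than `threshold` exercises."""
    return sorted(name for name, count in completed.items() if count < threshold)


def feedback_for(mistakes_made, rules):
    """Teacher-written feedback for every rule whose whole premise the student has made."""
    made = set(mistakes_made)
    # A rule fires only when all premise mistakes are among the student's mistakes
    return [rule["feedback"] for rule in rules if rule["premise"] <= made]


# Hypothetical data: number of exercises each student completed successfully
completed = {"alice": 5, "bob": 2, "carol": 0, "dave": 3}

# Hypothetical rule A, B -> C with feedback text chosen by the teacher
rules = [
    {
        "premise": {"A", "B"},
        "consequence": "C",
        "feedback": "Explain mistakes A and B; review the concepts behind mistake C.",
    },
]

print(students_to_report(completed))    # ['bob', 'carol']
print(feedback_for(["A", "B"], rules))  # feedback for the rule A, B -> C
print(feedback_for(["A"], rules))       # []
```

The threshold of 3 follows the warning rule stated in item 4, and it is kept as a parameter so that a lecturer could adjust it.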
  • 10. Q.6 From the given context, suggest a few more areas where we can apply data mining in a University.

Ans: Data mining can be applied in a University as described below:-

Student Academic Record

Attributes          Description                   Selected Attributes
Std_Reg_No          Student Registration Number
Enroll_Yr           Year of Enrollment
Completion_Yr       Year of Completion
CGPA_PG             Post-Graduation CGPA          yes
CGPA_G              Graduation CGPA
CGPA_HS             High School CGPA
CGPA_T              Tenth CGPA
E_ID                E-Mail ID
MOB_NO              Mobile Number
Date_of_Birth       Date of Birth
Performanc_Grade    Overall Performance           yes

Student Residence Record

Attributes          Description                   Selected Attributes
Permanent Add       Permanent Address
Correspondance_Add  Address For Correspondence
City_C              City
Location_L          Location
State_S             State

Student Personal Record

Attributes          Description                   Selected Attributes
Gender_G            Male-M, Female-F
Place_of_Birth_POB  Student Birth Place
Nationality_N       Nationality