This document discusses data mining and the architecture of data mining systems. It describes data mining as extracting knowledge from large amounts of data. The architecture of a data mining system is important, with a good system facilitating efficient and timely data mining tasks. Different levels of coupling between data mining systems and database/data warehouse systems are described, including no coupling, loose coupling, semi-tight coupling, and tight coupling. Tight coupling provides the most integrated and optimized system but is also the most complex to implement.
Introduction of S. Nandhini from Nadar Saraswathi Arts and Science College, Theni.
Data mining is the extraction of knowledge from large data sets, also known as knowledge discovery in databases. The slide also stresses the importance of data mining system architecture.
Diagram illustrating components of a data mining system, including data selection, integration, and evaluation processes.
Different coupling schemes for integrating data mining systems with databases/data warehouses: No Coupling, Loose Coupling, Semi-Tight Coupling, Tight Coupling.
No coupling indicates no interaction with DB/DW systems, leading to inefficiencies in data processing and poor design.
Loose coupling uses some DB/DW facilities to fetch data and store mining results, which adds flexibility, but it struggles to reach high scalability and good performance on large data sets.
Semi-tight coupling implements a few essential data mining primitives, such as sorting, indexing, and aggregation, inside the DB/DW system.
Tight coupling ensures smooth integration of data mining with DB/DW systems for optimized performance and efficient data processing.
Comparison of coupling types: loose coupling is more efficient than no coupling; tight coupling is ideal but complex, requiring further research.
DATA MINING:
Data mining refers to extracting or “mining” knowledge from large amounts of data. It is also referred to as knowledge discovery in databases.
ARCHITECTURE OF DATA MINING SYSTEM:
* The architecture and design of a data mining system are critically important.
* A good system architecture helps the system make the best use of the software environment and accomplish data mining tasks in an efficient and timely manner.
ARCHITECTURE OF DATA MINING DIAGRAM
[Figure: components of a data mining system]
Data sources: database, data warehouse, World Wide Web, other information repositories
Data selection and data integration
Database/data warehouse server
Database engine
Data/pattern evaluation
Graphical user interface
Knowledge base
A data mining (DM) system can be integrated with a database (DB) or data warehouse (DW) system using the following coupling schemes:
NO COUPLING
LOOSE COUPLING
SEMI-TIGHT COUPLING
TIGHT COUPLING
NO COUPLING:
No coupling means that a DM system will not utilize any function of a DB or DW system.
It may fetch data from a particular source (such as a file system), process the data using some data mining algorithms, and then store the mining results in another file.
Without DB/DW support, the DM system may spend a substantial amount of time finding, collecting, cleaning, and transforming data.
No coupling represents a poor design.
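A minimal sketch of the no-coupling pattern, assuming a hypothetical flat file of transactions and a trivial item-frequency count standing in for a real mining algorithm. The function name `mine_item_counts` and the file names are illustrative only. Note that all cleaning is done by hand, because no DB/DW facility is involved.

```python
import csv
import os
import tempfile
from collections import Counter


def mine_item_counts(in_path, out_path):
    """Read transactions from a flat file, count items, write results to another file."""
    counts = Counter()
    with open(in_path, newline="") as src:
        for row in csv.reader(src):
            # Manual cleaning: trim whitespace, normalize case, drop empty cells.
            items = {cell.strip().lower() for cell in row if cell.strip()}
            counts.update(items)
    with open(out_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        for item, n in sorted(counts.items()):
            writer.writerow([item, n])
    return counts


# Hypothetical input: one transaction per line, items separated by commas.
workdir = tempfile.mkdtemp()
in_path = os.path.join(workdir, "transactions.csv")
out_path = os.path.join(workdir, "item_counts.csv")
with open(in_path, "w", newline="") as f:
    f.write("milk,bread\nBread, eggs\nmilk,bread,eggs\n")

counts = mine_item_counts(in_path, out_path)
```

Every step here (locating the file, parsing, cleaning, storing) is the DM system's own responsibility, which is the cost the slide describes.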
LOOSE COUPLING:
* Loose coupling means that a DM system will use some facilities of a DB or DW system.
* It fetches data from a data repository managed by these systems, performs data mining, and then stores the mining results either in a file or in a designated place in a database or data warehouse.
* It takes advantage of the flexibility, efficiency, and other features provided by such systems.
* However, because the mining itself does not exploit the data structures and query optimization methods of the DB/DW system, it is difficult for loose coupling to achieve high scalability and good performance with large data sets.
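The fetch, mine, and store steps of loose coupling can be sketched as follows. This is an illustration under assumptions: an in-memory SQLite database stands in for the DB/DW system, the table names `purchases` and `mined_pairs` are hypothetical, and counting co-occurring item pairs stands in for a real mining algorithm.

```python
import sqlite3
from collections import Counter
from itertools import combinations

conn = sqlite3.connect(":memory:")  # stand-in for a DB/DW system
conn.execute("CREATE TABLE purchases (txn_id INTEGER, item TEXT)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?)",
    [(1, "milk"), (1, "bread"), (2, "bread"), (2, "eggs"),
     (3, "milk"), (3, "bread"), (3, "eggs")],
)

# Step 1: fetch data through the database's query facility.
baskets = {}
query = "SELECT txn_id, item FROM purchases ORDER BY txn_id, item"
for txn_id, item in conn.execute(query):
    baskets.setdefault(txn_id, []).append(item)

# Step 2: mine outside the database, in main memory.
pair_counts = Counter()
for items in baskets.values():
    pair_counts.update(combinations(sorted(items), 2))

# Step 3: store the mining results in a designated table.
conn.execute("CREATE TABLE mined_pairs (item_a TEXT, item_b TEXT, support INTEGER)")
conn.executemany(
    "INSERT INTO mined_pairs VALUES (?, ?, ?)",
    [(a, b, n) for (a, b), n in pair_counts.items()],
)
conn.commit()
stored = conn.execute("SELECT COUNT(*) FROM mined_pairs").fetchone()[0]
```

Step 2 holds every basket in memory, which hints at why this style of coupling can struggle to scale to large data sets.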
SEMI-TIGHT COUPLING:
Semi-tight coupling means that besides linking a DM system to a DB/DW system, efficient implementations of a few essential data mining primitives are provided in the DB/DW system.
These primitives can include sorting, indexing, aggregation, histogram analysis, multiway join, and precomputation of some essential statistical measures, such as sum, count, max, min, and standard deviation.
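As a rough illustration, the statistical primitives named above can be pushed down to the database rather than computed row by row in the DM system. The sketch assumes an in-memory SQLite database and a hypothetical `sales` table. SQLite has no built-in standard deviation, so the (population) standard deviation is derived from sums the database computes.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for a DB/DW system
conn.execute("CREATE TABLE sales (amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?)",
    [(2.0,), (4.0,), (4.0,), (4.0,), (5.0,), (5.0,), (7.0,), (9.0,)],
)
# Indexing is another primitive the database provides.
conn.execute("CREATE INDEX idx_sales_amount ON sales (amount)")

# Aggregation runs inside the database; only five numbers come back.
total, count, largest, smallest, sum_sq = conn.execute(
    "SELECT SUM(amount), COUNT(*), MAX(amount), MIN(amount), "
    "SUM(amount * amount) FROM sales"
).fetchone()

# Population standard deviation from the precomputed sums.
mean = total / count
std_dev = math.sqrt(sum_sq / count - mean ** 2)
```

The DM system receives a handful of aggregates instead of the full table, which is the benefit of providing such primitives in the DB/DW system.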
TIGHT COUPLING:
Tight coupling means that a DM system is smoothly integrated into the DB/DW system.
Data mining queries and functions are optimized based on mining query analysis, data structures, indexing schemes, and query processing methods of a DB or DW system.
This provides efficient implementations of data mining functions, high system performance, and an integrated information processing environment.
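A loose analogy for tight coupling, not a full implementation: a mining-style function is registered inside the database so that it runs as part of query processing and composes with `GROUP BY`. The sketch assumes SQLite's user-defined aggregate mechanism; the aggregate name `most_frequent` and the `purchases` table are hypothetical. A genuinely tightly coupled system would also involve the query optimizer and index structures, which this sketch does not show.

```python
import sqlite3


class MostFrequent:
    """User-defined aggregate: returns the most frequent value in a group."""

    def __init__(self):
        self.counts = {}

    def step(self, value):
        # Called by the database engine once per row in the group.
        self.counts[value] = self.counts.get(value, 0) + 1

    def finalize(self):
        if not self.counts:
            return None
        # Break ties alphabetically so the result is deterministic.
        return min(self.counts, key=lambda v: (-self.counts[v], v))


conn = sqlite3.connect(":memory:")  # stand-in for a DB/DW system
conn.create_aggregate("most_frequent", 1, MostFrequent)
conn.execute("CREATE TABLE purchases (region TEXT, item TEXT)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?)",
    [("north", "milk"), ("north", "bread"), ("north", "milk"),
     ("south", "eggs"), ("south", "eggs"), ("south", "bread")],
)

# The mining function is invoked from within an ordinary SQL query.
rows = conn.execute(
    "SELECT region, most_frequent(item) FROM purchases "
    "GROUP BY region ORDER BY region"
).fetchall()
```

Here the user issues one query and the database drives the mining function, so data never has to be exported to a separate DM system first.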
Loose coupling, though not efficient, is better than no coupling since it makes use of both data and system facilities of a DB/DW system.
Tight coupling is highly desirable but
its implementation is nontrivial and more
research is needed in this area.
Semi-tight coupling is a compromise between loose and tight coupling.