Data mining query language

BY
G.GOWRILATHA,M.Sc(Info Tech)
Department Of CS & IT
Nadar Saraswathi College Of Arts and Science,
Theni.

A desired feature of data mining systems is the ability to support ad hoc and interactive data mining in order to facilitate flexible and effective knowledge discovery. Data mining query languages can be designed to support such a feature.

The standardization of relational query languages, which occurred at the early stages of relational database development, is widely credited for the success of the relational database field. Recent standardization activities in database systems, such as the work on SQL-3, further illustrate the importance of having a standard database language for the successful development and commercialization of database systems.

A data mining query is defined in terms of the following primitives:
• The set of task-relevant data to be mined
• The kind of knowledge to be mined
• The background knowledge to be used in the discovery process
• The interestingness measures and thresholds for pattern evaluation
• The expected representation for visualizing the discovered patterns
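
To make the five primitives concrete, the following Python sketch bundles them into one object. The class name MiningTask, its field names and the sample sales query are hypothetical and chosen only for illustration. DMQL itself expresses these primitives as clauses of a query, not as a data structure.

```python
from dataclasses import dataclass, field


@dataclass
class MiningTask:
    """Hypothetical container for the five primitives of a data mining task."""
    task_relevant_data: str                # query selecting the data to be mined
    kind_of_knowledge: str                 # e.g. "characteristics", "association"
    background_knowledge: dict = field(default_factory=dict)  # attribute -> hierarchy name
    interestingness: dict = field(default_factory=dict)       # measure -> threshold
    presentation: str = "rules"            # expected form of the discovered patterns

    def describe(self) -> str:
        """Summarize the task in one line."""
        thresholds = ", ".join(f"{m} >= {t}" for m, t in self.interestingness.items())
        return (f"mine {self.kind_of_knowledge} from [{self.task_relevant_data}] "
                f"with {thresholds or 'no thresholds'}, display as {self.presentation}")


task = MiningTask(
    task_relevant_data="SELECT * FROM sales WHERE year = 2023",
    kind_of_knowledge="association",
    background_knowledge={"location": "by_province"},
    interestingness={"support": 0.05, "confidence": 0.7},
)
print(task.describe())
```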

The first step in defining a data mining task is the specification of the task-relevant data, that is, the data on which mining is to be performed. This involves specifying the database and tables or data warehouse containing the relevant data, the conditions for selecting the relevant data, the relevant attributes or dimensions for exploration, and instructions regarding the ordering or grouping of the data retrieved.
use database ⟨database_name⟩ or use data warehouse ⟨data_warehouse_name⟩: the use clause directs the mining task to the specified database or data warehouse.

from ⟨relation(s)/cube(s)⟩ [where ⟨condition⟩]: the from and where clauses respectively specify the database tables or data cubes involved and the conditions defining the data to be retrieved.
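
This is a rough analogy that assumes the task-relevant data sits in a relational table. The use, from and where clauses, together with the relevant attributes and ordering described above, then correspond to an ordinary SQL query. The sketch uses Python's standard sqlite3 module. The student table, its columns and its rows are hypothetical.

```python
import sqlite3

# Hypothetical student table standing in for the database named in the use clause.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (name TEXT, major TEXT, gpa REAL, status TEXT)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?)",
    [("Ann", "CS", 3.8, "graduate"),
     ("Bob", "Physics", 3.1, "undergraduate"),
     ("Cid", "Math", 3.5, "graduate")],
)

# Relevant attributes, source relation (from), selection condition (where)
# and ordering of the retrieved data.
relevant_attributes = ["name", "major", "gpa"]
query = (f"SELECT {', '.join(relevant_attributes)} FROM student "
         "WHERE status = ? ORDER BY gpa DESC")
task_relevant_rows = conn.execute(query, ("graduate",)).fetchall()
print(task_relevant_rows)
```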

The overall structure of a DMQL query is given by the following grammar:
⟨DMQL⟩ ::= ⟨DMQL_Statement⟩; {⟨DMQL_Statement⟩}
⟨DMQL_Statement⟩ ::= ⟨Data_Mining_Statement⟩
    | ⟨Concept_Hierarchy_Definition_Statement⟩
    | ⟨Visualization_and_Presentation⟩
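
Under this grammar, a DMQL query is a sequence of statements separated by semicolons. The Python sketch below splits a script at each semicolon and labels every statement with one of the three alternatives. The labelling rule is a simplifying assumption made for this sketch: "define hierarchy" marks a hierarchy definition, and display, roll up, drill down, add and drop mark presentation. Semicolons inside quoted strings are not handled. The database name and the hierarchy statement in the sample script are illustrative only.

```python
def classify_statements(script: str) -> list:
    """Split a DMQL script at ';' and label each statement by its leading keywords.

    The keyword rules are a simplifying assumption; semicolons inside
    quoted strings are not handled.
    """
    labelled = []
    for raw in script.split(";"):
        words = raw.lower().split()
        if not words:
            continue  # empty piece, e.g. after the final ';'
        if words[:2] == ["define", "hierarchy"]:
            kind = "Concept_Hierarchy_Definition_Statement"
        elif words[0] in {"display", "add", "drop"} or words[:2] in (["roll", "up"], ["drill", "down"]):
            kind = "Visualization_and_Presentation"
        else:
            kind = "Data_Mining_Statement"
        labelled.append((kind, " ".join(raw.split())))  # statement with whitespace collapsed
    return labelled


script = """
use database Big_University_DB
mine characteristics as "Science_Students"
in relevance to name, major, gpa
from student
where status = "graduate";
define hierarchy time_hierarchy on date as [date, month, quarter, year];
display as table;
"""
for kind, stmt in classify_statements(script):
    print(kind, "|", stmt)
```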

Syntax for Specifying the Kind of Knowledge to be Mined
The ⟨Mine_Knowledge_Specification⟩ statement is used to specify the kind of knowledge to be mined. In other words, it indicates the data mining functionality to be performed. Its syntax can be defined for characterization, discrimination, association, and classification; the form for characterization is:

⟨Mine_Knowledge_Specification⟩ ::= mine characteristics [as ⟨pattern_name⟩]
    analyze ⟨measure(s)⟩
This specifies that characteristic descriptions are to be mined. The analyze clause, when used for characterization, specifies aggregate measures, such as count, sum, or count% (percentage count, i.e., the percentage of tuples in the relevant data set with the specified characteristics). These measures are computed for each data characteristic found.
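
A minimal Python sketch of what analyze count% computes. It assumes the task-relevant tuples have already been reduced to their characteristic values. The (major, gpa_band) data below is hypothetical. A sum measure would follow the same pattern by adding up a numeric attribute within each group.

```python
from collections import Counter


def characterize(tuples):
    """Return {characteristic: (count, count%)} over the task-relevant tuples."""
    counts = Counter(tuples)            # count: tuples sharing each characteristic
    total = len(tuples)
    return {c: (n, 100.0 * n / total) for c, n in counts.items()}


# Hypothetical task-relevant data: (major, gpa_band) per graduate student.
students = [
    ("CS", "excellent"), ("CS", "good"), ("Math", "excellent"),
    ("CS", "excellent"), ("Physics", "good"),
]
for characteristic, (count, pct) in characterize(students).items():
    print(characteristic, count, f"{pct:.1f}%")
```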

Concept hierarchies allow the mining of knowledge at multiple levels of abstraction. To accommodate the different viewpoints of users with regard to the data, there may be more than one concept hierarchy per attribute or dimension. For instance, some users may prefer to organize branch locations by provinces and states, while others may prefer to organize them according to the languages used. In such cases, a user can indicate which concept hierarchy is to be used with the statement
use hierarchy ⟨hierarchy_name⟩ for ⟨attribute_or_dimension⟩
Otherwise, a default hierarchy per attribute or dimension is used.
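
The branch-location example can be sketched in Python as two alternative mappings for the same attribute. Choosing a hierarchy by name plays the role of use hierarchy, and omitting the name falls back to a default. The hierarchy names, cities and sales figures are hypothetical.

```python
from collections import Counter

# Two hypothetical concept hierarchies for the same attribute (branch city).
HIERARCHIES = {
    "by_province": {"Vancouver": "British Columbia", "Victoria": "British Columbia",
                    "Montreal": "Quebec", "Toronto": "Ontario"},
    "by_language": {"Vancouver": "English", "Victoria": "English",
                    "Montreal": "French", "Toronto": "English"},
}
DEFAULT_HIERARCHY = "by_province"


def roll_up(sales_by_city, hierarchy_name=None):
    """Aggregate city-level sales one level up the chosen (or default) hierarchy."""
    mapping = HIERARCHIES[hierarchy_name or DEFAULT_HIERARCHY]
    totals = Counter()
    for city, amount in sales_by_city.items():
        totals[mapping[city]] += amount
    return dict(totals)


sales = {"Vancouver": 120, "Victoria": 30, "Montreal": 80, "Toronto": 200}
print(roll_up(sales))                  # default hierarchy: by province
print(roll_up(sales, "by_language"))   # user-selected hierarchy
```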

The user can help control the number of uninteresting patterns returned by the data mining system by specifying measures of pattern interestingness and their corresponding thresholds. Interestingness measures and thresholds can be specified by the user with the statement
with [⟨interest_measure_name⟩] threshold = ⟨threshold_value⟩
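
As an illustration, statements such as with support threshold = 0.05 and with confidence threshold = 0.7 act as a filter on the mined patterns. The Python sketch below assumes each rule carries precomputed support and confidence values. The rules themselves are hypothetical.

```python
# Hypothetical mined rules: (rule text, support, confidence).
rules = [
    ("buys(X, 'computer') => buys(X, 'software')", 0.12, 0.80),
    ("buys(X, 'printer') => buys(X, 'paper')", 0.02, 0.90),
    ("buys(X, 'mouse') => buys(X, 'pad')", 0.07, 0.40),
]


def filter_rules(rules, min_support, min_confidence):
    """Keep only rules that meet both interestingness thresholds."""
    return [r for r in rules if r[1] >= min_support and r[2] >= min_confidence]


interesting = filter_rules(rules, min_support=0.05, min_confidence=0.7)
for text, support, confidence in interesting:
    print(text, support, confidence)
```

Raising either threshold shrinks the output. This is how thresholds let the user limit the number of uninteresting patterns returned.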

Our data mining query language needs syntax that allows users to specify the display of discovered patterns in one or more forms, including rules, tables, crosstabs, pie or bar charts, decision trees, cubes, curves, or surfaces. We define the DMQL display statement for this purpose:
display as ⟨result_form⟩
where ⟨result_form⟩ could be any of the knowledge presentation or visualization forms listed above. Interactive mining should allow the discovered patterns to be viewed at different concept levels or from different angles.
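
One possible result form is the crosstab. The Python sketch below builds a plain-text crosstab of discovered counts, roughly what display as crosstab might render. The attribute names and records are hypothetical, and the fixed-width layout is only one possible rendering.

```python
from collections import Counter


def crosstab(records, row_attr, col_attr):
    """Count records for every (row value, column value) combination."""
    counts = Counter((r[row_attr], r[col_attr]) for r in records)
    rows = sorted({row for row, _ in counts})
    cols = sorted({col for _, col in counts})
    grid = [[counts[(row, col)] for col in cols] for row in rows]  # missing cells count 0
    return rows, cols, grid


def render(rows, cols, grid):
    """Lay the crosstab out as fixed-width plain text."""
    width = max(len(str(x)) for x in rows + cols) + 2
    lines = ["".ljust(width) + "".join(str(c).ljust(width) for c in cols)]
    for row, values in zip(rows, grid):
        lines.append(str(row).ljust(width) + "".join(str(v).ljust(width) for v in values))
    return "\n".join(line.rstrip() for line in lines)


records = [
    {"major": "CS", "gender": "F"}, {"major": "CS", "gender": "M"},
    {"major": "Math", "gender": "F"}, {"major": "CS", "gender": "F"},
]
print(render(*crosstab(records, "major", "gender")))
```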

The user can alternately view the patterns at different levels of abstraction with the following DMQL syntax:
⟨Multilevel_Manipulation⟩ ::= roll up on ⟨attribute_or_dimension⟩
    | drill down on ⟨attribute_or_dimension⟩
    | add ⟨attribute_or_dimension⟩
    | drop ⟨attribute_or_dimension⟩
An attribute introduced with add must be one of the attributes listed in the in relevance to clause of the task-relevant data specification.
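
The four operations can be sketched in Python as edits to the list of attributes used to group the discovered patterns. In this sketch the relevance list, the city-to-province hierarchy and the records are hypothetical. The add method enforces the rule that only attributes from the relevance list may be added.

```python
from collections import Counter

RELEVANT = {"major", "gender", "city", "province"}  # hypothetical 'in relevance to' list
PARENT = {"city": "province"}                       # hypothetical hierarchy: city below province
CHILD = {parent: child for child, parent in PARENT.items()}


class PatternView:
    """Counts of discovered patterns grouped by the currently shown attributes."""

    def __init__(self, records, attributes):
        self.records = records
        self.attributes = list(attributes)

    def summary(self):
        return Counter(tuple(r[a] for a in self.attributes) for r in self.records)

    def roll_up(self, attr):      # replace attr by its more general parent
        self.attributes[self.attributes.index(attr)] = PARENT[attr]

    def drill_down(self, attr):   # replace attr by its more specific child
        self.attributes[self.attributes.index(attr)] = CHILD[attr]

    def add(self, attr):          # only attributes from the relevance list may be added
        if attr not in RELEVANT:
            raise ValueError(f"{attr!r} is not in the 'in relevance to' list")
        if attr not in self.attributes:
            self.attributes.append(attr)

    def drop(self, attr):
        self.attributes.remove(attr)


records = [
    {"major": "CS", "gender": "F", "city": "Vancouver", "province": "BC"},
    {"major": "CS", "gender": "M", "city": "Victoria", "province": "BC"},
    {"major": "Math", "gender": "F", "city": "Toronto", "province": "ON"},
]
view = PatternView(records, ["major", "city"])
view.roll_up("city")      # roll up on city
print(view.summary())     # now grouped by (major, province)
```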

A graphical user interface (GUI) for a data mining system may consist of the following functional components:
• Data collection and data mining query composition:
This component allows the user to specify task-relevant data sets and to compose data mining queries. It is similar to GUIs used for the specification of relational queries.
• Presentation of discovered patterns:
This component allows the display of the discovered patterns in various forms, including tables, graphs, charts, curves, and other visualization techniques.
• Hierarchy specification and manipulation:
This component allows for concept hierarchy specification, either manually by the user or automatically (based on analysis of the data at hand). In addition, this component should allow concept hierarchies to be modified by the user or adjusted automatically based on the given data set distribution.

• Manipulation of data mining primitives:
This component may allow the dynamic adjustment of data mining thresholds, as well as the selection, display, and modification of concept hierarchies. It may also allow the modification of previous data mining queries or conditions.
• Interactive multilevel mining:
This component should allow roll-up or drill-down operations on discovered patterns.
• Other miscellaneous information:
This component may include online help manuals, indexed search, debugging, and other interactive graphical facilities.
The design of a graphical user interface should also take into consideration the different classes of users of a data mining system.

No coupling:
No coupling means that a DM system does not use any function of a DB or DW system. It may fetch data from a particular source (such as a file system), process the data using some data mining algorithms, and then store the mining results in another file. Such a system, though simple, suffers from several drawbacks. First, a DB system provides a great deal of flexibility and efficiency in storing, organizing, accessing, and processing data. Without using a DB/DW system, a DM system may spend a substantial amount of time finding, collecting, cleaning, and transforming data. In DB and/or DW systems, data tend to be well organized, indexed, cleaned, integrated, or consolidated, so that finding the task-relevant, high-quality data becomes an easy task. Second, there are many tested, scalable algorithms and data structures implemented in DB and DW systems, which a DM system with no coupling cannot take advantage of.
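
The no-coupling pattern looks roughly like the Python sketch below. Transactions are read from a flat CSV file, items are counted in application code, and the frequent items are written to another file. File names, items and the minimum count are hypothetical. Each transaction is assumed to list an item at most once.

```python
import csv
import os
import tempfile
from collections import Counter


def mine_frequent_items(in_path, out_path, min_count):
    """No coupling: read a flat file, mine in application code, write another file."""
    with open(in_path, newline="") as f:
        counts = Counter(item for row in csv.reader(f) for item in row)
    frequent = sorted((item, c) for item, c in counts.items() if c >= min_count)
    with open(out_path, "w", newline="") as f:
        csv.writer(f).writerows(frequent)
    return frequent


workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "transactions.csv")   # hypothetical input file
dst = os.path.join(workdir, "frequent_items.csv")  # hypothetical output file
with open(src, "w", newline="") as f:
    csv.writer(f).writerows([["bread", "milk"], ["bread", "butter"], ["milk"]])
result = mine_frequent_items(src, dst, min_count=2)
print(result)
```

Every step here, from parsing to selection to storage, is done by hand. Those are the jobs a DB/DW system would otherwise handle.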

Loose coupling:
Loose coupling means that a DM system uses some facilities of a DB or DW system: it fetches data from a data repository managed by these systems, performs data mining, and then stores the mining results either in a file or in a designated place in a database or data warehouse. Loose coupling is better than no coupling because it can fetch any portion of the data stored in databases or data warehouses by using query processing, indexing, and other system facilities. It thus gains some of the flexibility, efficiency, and other features provided by such systems.
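
A loosely coupled version of the same idea, sketched in Python with the standard sqlite3 module as a stand-in DB. The data is fetched through an SQL query, item pairs are counted in application memory, and the results are stored in a designated table of the same database. Table names, items and the support cutoff are hypothetical.

```python
import sqlite3
from collections import Counter
from itertools import combinations

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE basket (tid INTEGER, item TEXT)")
conn.executemany("INSERT INTO basket VALUES (?, ?)", [
    (1, "bread"), (1, "milk"), (2, "bread"), (2, "milk"), (2, "butter"), (3, "milk"),
])

# 1. Fetch the task-relevant data through the DB's query processor.
transactions = {}
for tid, item in conn.execute("SELECT tid, item FROM basket ORDER BY tid, item"):
    transactions.setdefault(tid, []).append(item)

# 2. Mine in application memory: count co-occurring item pairs.
pair_counts = Counter(
    pair for items in transactions.values() for pair in combinations(items, 2)
)

# 3. Store the results in a designated table of the same database.
conn.execute("CREATE TABLE frequent_pairs (item1 TEXT, item2 TEXT, support INTEGER)")
conn.executemany(
    "INSERT INTO frequent_pairs VALUES (?, ?, ?)",
    [(a, b, n) for (a, b), n in pair_counts.items() if n >= 2],
)
print(conn.execute("SELECT * FROM frequent_pairs").fetchall())
```

Ordering by item inside each transaction keeps every pair in a canonical order. As a result, (bread, milk) and (milk, bread) are counted together.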

Semi tight coupling:
Semi tight coupling means that, besides linking a DM system to a DB/DW system, efficient implementations of a few essential data mining primitives (identified by analyzing frequently encountered data mining functions) can be provided in the DB/DW system. These primitives can include sorting, indexing, aggregation, histogram analysis, multiway join, and precomputation of some essential statistical measures, such as sum, count, max, min, standard deviation, and so on. Moreover, some frequently used intermediate mining results can be precomputed and stored in the DB/DW system. Since these intermediate mining results are either precomputed or can be computed efficiently, this design enhances the performance of a DM system.
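
The Python sketch below uses the standard sqlite3 module as a stand-in DB/DW system. It pushes aggregation primitives (count, sum, min, max) into the database and stores the result as a precomputed summary table. The mining code then reads that summary instead of the raw tuples. The sales table and figures are hypothetical. Standard deviation is omitted because SQLite has no built-in aggregate for it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (branch TEXT, item TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    ("Vancouver", "TV", 500.0), ("Vancouver", "PC", 900.0),
    ("Toronto", "TV", 450.0), ("Toronto", "TV", 550.0), ("Toronto", "PC", 1000.0),
])

# Precompute an intermediate result inside the DB: aggregates per branch.
conn.execute("""
    CREATE TABLE branch_summary AS
    SELECT branch, COUNT(*) AS n, SUM(amount) AS total,
           MIN(amount) AS lo, MAX(amount) AS hi
    FROM sales GROUP BY branch
""")

# The mining code reads the small precomputed table instead of the raw tuples.
summary = conn.execute("SELECT * FROM branch_summary ORDER BY branch").fetchall()
print(summary)
```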

Tight coupling:
Tight coupling means that a DM system is smoothly integrated into the DB/DW system. The data mining subsystem is treated as one functional component of an information system. Data mining queries and functions are optimized based on mining query analysis, data structures, indexing schemes, and query processing methods of a DB/DW system. With further technology advances, DM, DB, and DW systems will evolve and integrate together as one information system with multiple functionalities. This will provide a uniform information processing environment.
