2. Data Mining
Definition:
Data mining in databases is the process of analyzing large sets of data stored in a database to find patterns, trends, and useful information.
Why Is It Important?
Without data mining, large amounts of data stored in databases might go underused, because the patterns and trends they contain would remain undiscovered.
3. Key Concepts in Data Mining
Data Warehousing:
Centralized storage of data collected from different sources for analysis.
Database Queries:
Using SQL to retrieve specific data for mining.
Data Preprocessing:
Cleaning and preparing data so it is usable for mining (e.g., removing errors or irrelevant information).
Mining Algorithms:
Techniques used to analyze data, such as classification, clustering, and association rules.
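The Database Queries concept can be illustrated with a query that retrieves only the rows a mining task needs. This is a minimal sketch using Python's built-in sqlite3 module with an in-memory database; the `orders` table, its columns, the sample rows, and the 40.0 threshold are hypothetical.

```python
import sqlite3

def fetch_purchases(conn: sqlite3.Connection, min_amount: float) -> list[tuple[int, str, float]]:
    """Retrieve only the rows relevant to the mining task (hypothetical schema)."""
    query = (
        "SELECT customer_id, product, amount "
        "FROM orders WHERE amount >= ? ORDER BY customer_id, product"
    )
    # A parameterized query (?) avoids building SQL by string concatenation
    return conn.execute(query, (min_amount,)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "laptop", 950.0), (1, "mouse", 25.0), (2, "keyboard", 45.0), (3, "laptop", 1100.0)],
)
rows = fetch_purchases(conn, 40.0)
print(rows)
```

Filtering in SQL means the database does the selection work, and only the reduced result set is passed on to the mining step.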
4. Data Mining Techniques in Databases
1. Classification: Assigning data items to predefined categories, usually with a model learned from labeled examples.
Example: Categorizing customers into predefined segments (e.g., age groups).
2. Clustering: Grouping data into clusters based on similarity.
Example: Grouping customers by purchasing behavior without predefined labels.
3. Association Rule Mining: Discovering relationships between items that frequently occur together.
Example: If a customer buys a laptop, they are also likely to buy a mouse.
4. Regression: Predicting numeric values based on existing data.
Example: Predicting the future sales of a product based on past sales data.
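Classification can be sketched with k-nearest neighbours, one of many possible classification algorithms and chosen here only because it is short. This sketch assumes customers are described by (age, monthly spend) and that a few labeled examples already exist; the data, segment names, and k = 3 are hypothetical.

```python
import math
from collections import Counter

# Hypothetical labeled training data: (age, monthly_spend) -> segment
TRAINING = [
    ((22, 40.0), "young"),
    ((25, 55.0), "young"),
    ((29, 60.0), "young"),
    ((45, 120.0), "middle-aged"),
    ((50, 110.0), "middle-aged"),
    ((67, 80.0), "senior"),
    ((72, 70.0), "senior"),
]

def classify(point, training=TRAINING, k=3):
    """k-nearest-neighbour classification: majority label among the k closest examples."""
    nearest = sorted(training, key=lambda item: math.dist(point, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(classify((27, 50.0)))
print(classify((70, 75.0)))
```

In practice the features would usually be scaled first, since otherwise the feature with the larger range (spend here) dominates the distance.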
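Clustering can be sketched with k-means, a common clustering algorithm (other methods exist). This sketch assumes each customer is described by (visits per month, average basket value) and that the initial centroids are supplied by hand; the data and starting centroids are hypothetical. No labels are used: the groups emerge from similarity alone.

```python
import math

def kmeans(points, centroids, max_iter=20):
    """Plain k-means: alternate between assigning points to the nearest
    centroid and moving each centroid to the mean of its assigned points."""
    for _ in range(max_iter):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # New centroid = mean of its members; an empty cluster keeps its old centroid
        new_centroids = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else c
            for members, c in zip(clusters, centroids)
        ]
        if new_centroids == centroids:  # assignments have stabilised
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical customers: (visits_per_month, avg_basket_value)
customers = [(1, 20), (2, 25), (1, 22), (8, 90), (9, 95), (10, 85)]
centroids, clusters = kmeans(customers, [(1, 20), (8, 90)])
print(clusters)
```

Real implementations typically choose starting centroids randomly or with k-means++ and run several restarts, because the result depends on the starting point.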
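The laptop-and-mouse example of association rule mining is usually quantified with two measures: support (how often the items appear together) and confidence (how often the rule holds when its left-hand side appears). This is a minimal sketch over a hypothetical list of transactions; it computes the measures for one given rule rather than searching for all rules as Apriori-style algorithms do.

```python
# Hypothetical transactions: each set is the items in one purchase
TRANSACTIONS = [
    {"laptop", "mouse"},
    {"laptop", "mouse", "bag"},
    {"laptop"},
    {"mouse", "keyboard"},
    {"laptop", "mouse", "keyboard"},
]

def count(itemset, transactions):
    """Number of transactions containing every item in itemset (subset test)."""
    return sum(itemset <= t for t in transactions)

def support(itemset, transactions):
    """Fraction of all transactions that contain the itemset."""
    return count(itemset, transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent) = count(A and B) / count(A)."""
    return count(antecedent | consequent, transactions) / count(antecedent, transactions)

s = support({"laptop", "mouse"}, TRANSACTIONS)
c = confidence({"laptop"}, {"mouse"}, TRANSACTIONS)
print(f"laptop -> mouse: support={s:.2f}, confidence={c:.2f}")
```

Here 3 of 5 purchases contain both items (support 0.6), and 3 of the 4 laptop purchases also include a mouse (confidence 0.75).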
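The sales-prediction example of regression can be sketched with simple linear regression: fit a straight line to past monthly sales by ordinary least squares and extend it one month ahead. This is a minimal sketch assuming sales follow a roughly linear trend; the monthly figures are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x for a single input variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope b = covariance(x, y) / variance(x); the intercept makes the line pass through the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

months = [1, 2, 3, 4, 5]
units_sold = [100, 125, 138, 162, 178]  # hypothetical past sales
a, b = fit_line(months, units_sold)
forecast = a + b * 6  # predicted sales for month 6
print(f"slope={b:.1f} units/month, month-6 forecast={forecast:.1f}")
```

A straight-line trend is the simplest choice; real forecasts often add more inputs (price, season) or use models that capture seasonality.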
5. Data Mining Process in Databases
Data Selection: Choosing relevant data from the database.
Data Cleaning: Removing errors, inconsistencies, or irrelevant records from the selected data.
Data Transformation: Converting data into a format suitable for mining (e.g., normalizing or aggregating it).
Data Mining: Applying mining techniques to discover patterns.
Pattern Evaluation: Assessing how useful or interesting the discovered patterns are.
Knowledge Representation: Presenting the results of data mining in an understandable way (e.g., reports, charts, or rules).
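The six steps of the process can be chained into one small pipeline. This is a sketch under stated assumptions: a hypothetical `order_items` table in an in-memory SQLite database, frequent product pairs as the mining technique, and a hypothetical minimum support of 0.5 as the evaluation criterion. Each step is marked with a comment.

```python
import sqlite3
from collections import Counter
from itertools import combinations

def run_pipeline(conn, min_support=0.5):
    # 1. Data selection: pull only the columns relevant to the task
    rows = conn.execute("SELECT order_id, product FROM order_items").fetchall()
    # 2. Data cleaning: drop rows with missing or blank product names
    rows = [(oid, p) for oid, p in rows if p and p.strip()]
    # 3. Data transformation: normalize names and build one basket (set of items) per order
    baskets = {}
    for oid, p in rows:
        baskets.setdefault(oid, set()).add(p.strip().lower())
    # 4. Data mining: count how often each pair of products is bought together
    pair_counts = Counter()
    for items in baskets.values():
        pair_counts.update(combinations(sorted(items), 2))
    # 5. Pattern evaluation: keep pairs whose support meets the threshold
    n = len(baskets)
    patterns = {pair: cnt / n for pair, cnt in pair_counts.items() if cnt / n >= min_support}
    # 6. Knowledge representation: readable summary, strongest pattern first
    ranked = sorted(patterns.items(), key=lambda kv: -kv[1])
    return [f"{a} + {b}: support {s:.0%}" for (a, b), s in ranked]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE order_items (order_id INTEGER, product TEXT)")
conn.executemany("INSERT INTO order_items VALUES (?, ?)", [
    (1, "Laptop"), (1, "mouse"),
    (2, "laptop"), (2, "Mouse "), (2, "bag"),
    (3, "laptop"), (3, None),
    (4, "keyboard"), (4, "mouse"),
])
report = run_pipeline(conn, min_support=0.5)
for line in report:
    print(line)
```

The cleaning and transformation steps matter here: without trimming and lowercasing, "Mouse " and "mouse" would be counted as different products.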
6. Challenges of Data Mining
Data Quality Issues:
Incomplete, noisy, or inconsistent data can lead to wrong results.
Scalability:
Mining large databases with millions of records can be resource-intensive.
Privacy Concerns:
Mining personal data requires ethical handling and user consent.
Complexity:
Finding meaningful patterns in huge datasets can be difficult.
8. What is Data Warehousing?
■ A data warehouse is a system that collects and stores large amounts of data from different sources in one place. It is organized to make the data easy to analyze and report on, helping businesses make better decisions.
9. Key Characteristics
■ Organized for Fast Answers
Data is structured to make searching and analyzing quick and easy.
■ Handles Large Data
Can store and process huge amounts of data from many sources.
■ Consistent Data
Ensures all data is cleaned and formatted the same way, so everyone gets reliable results.
■ Non-Volatile
Once data is loaded into the warehouse, it is generally not updated or deleted; new data is added in periodic loads, so historical data is preserved.
10. Components of a Data Warehouse
■ Data Sources
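■ ETL/ELT Process
Extract: Pull data from the source systems.
Transform: Clean and convert the data into a consistent format.
Load: Write the data into the warehouse storage.
(In ELT, data is loaded first and then transformed inside the warehouse.)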
■ Data Storage
■ Metadata
■ Access Tools
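The ETL process can be sketched with one function per step. This is a minimal sketch assuming two hypothetical sources with inconsistent formats: a CSV export with DD/MM/YYYY dates and an online shop that reports amounts in cents. The warehouse is an in-memory SQLite table named `sales`, also hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical source 1: CSV export from an in-store system (dates as DD/MM/YYYY)
STORE_CSV = "date,product,amount\n05/01/2024,laptop,950.00\n06/01/2024,mouse,25.50\n"
# Hypothetical source 2: records from an online shop (ISO dates, amounts in cents)
WEB_ORDERS = [{"day": "2024-01-06", "item": "Laptop", "cents": 110000}]

def extract():
    """Extract: read raw records from each source without changing them."""
    store = list(csv.DictReader(io.StringIO(STORE_CSV)))
    return store, WEB_ORDERS

def transform(store, web):
    """Transform: map both sources to one schema (ISO date, lowercase product, amount in currency units)."""
    rows = []
    for r in store:
        d, m, y = r["date"].split("/")
        rows.append((f"{y}-{m}-{d}", r["product"].lower(), float(r["amount"]), "store"))
    for r in web:
        rows.append((r["day"], r["item"].lower(), r["cents"] / 100, "web"))
    return rows

def load(conn, rows):
    """Load: write the transformed rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (sale_date TEXT, product TEXT, amount REAL, channel TEXT)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)
    conn.commit()

warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(*extract()))
totals = warehouse.execute(
    "SELECT product, SUM(amount) FROM sales GROUP BY product ORDER BY product"
).fetchall()
print(totals)
```

In an ELT variant, the raw rows would be loaded first and the date and currency conversions would run afterwards as SQL inside the warehouse.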
11. Data Warehouse Architectures
■ Single-Tier Architecture
Storage, processing, and user access are all combined in a single layer.
■ Two-Tier Architecture
One layer stores and manages the data.
Another layer is for users to access and analyze the data.
■ Three-Tier Architecture
Bottom Tier: Data storage (the warehouse database server)
Middle Tier: Processing layer (typically an OLAP server)
Top Tier: User interface (query, reporting, and analysis tools)
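The three-tier architecture can be sketched as three classes in which each tier talks only to the tier directly below it. This is an illustrative sketch, not a real deployment: an in-memory SQLite table stands in for the warehouse server, a single aggregation method stands in for the processing layer, and a text report stands in for the user interface. The class names, table, and data are hypothetical.

```python
import sqlite3

class StorageTier:
    """Bottom tier: holds the warehouse data (in-memory SQLite as a stand-in)."""
    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE sales (region TEXT, year INTEGER, amount REAL)")
        self.conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
            ("north", 2023, 100.0), ("north", 2024, 150.0), ("south", 2024, 80.0),
        ])

    def query(self, sql, params=()):
        return self.conn.execute(sql, params).fetchall()

class ProcessingTier:
    """Middle tier: turns analytical questions into queries and aggregates."""
    def __init__(self, storage):
        self.storage = storage

    def total_by_region(self, year):
        return self.storage.query(
            "SELECT region, SUM(amount) FROM sales WHERE year = ? GROUP BY region ORDER BY region",
            (year,),
        )

class PresentationTier:
    """Top tier: formats results for users (a plain-text report here)."""
    def __init__(self, processing):
        self.processing = processing

    def report(self, year):
        rows = self.processing.total_by_region(year)
        return "\n".join(f"{region}: {total:.2f}" for region, total in rows)

ui = PresentationTier(ProcessingTier(StorageTier()))
print(ui.report(2024))
```

Because the user interface never touches the storage directly, the bottom tier could be swapped for a different database without changing the report code.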