Data Warehousing and Data Mining (CS601)
Unit I: Data Warehouse and OLAP
Introduction to data Warehouse:
A Data Warehouse is kept separate from the operational DBMS. It stores a huge amount of data, which is
typically collected from multiple heterogeneous sources such as files, DBMSs, etc. The goal is to
produce statistical results that may help in decision-making.
Data Warehouse is a central place where data is stored from different data sources and
applications. A Data Warehouse is always kept separate from an Operational Database.
A data warehouse can also be viewed as a database for historical data from different functions
within a company. The term Data Warehouse was coined by Bill Inmon in 1990, which he
defined in the following way: "A warehouse is a subject-oriented, integrated, time-variant and
non-volatile collection of data in support of management's decision making process".
A Data Warehouse is used for reporting and analysis of information and stores both historical
and current data. The data in a DW system is used for analytical reporting, which is later used by
business analysts, sales managers, or knowledge workers for decision-making.
Data flows from multiple heterogeneous data sources into a Data Warehouse. Common data
sources for a data warehouse include:
 Operational databases
 SAP and non-SAP Applications
 Flat Files (xls, csv, txt files)
Data in the data warehouse is accessed by BI (Business Intelligence) users for analytical reporting,
data mining, and analysis. It is used for decision-making by business users, sales managers, and
analysts to define future strategy.
Difference between Operational Database System and Data Warehouse:
Operational Database | Data Warehouse
Designed to support high-volume transaction processing. | Typically designed to support high-volume analytical processing (i.e., OLAP).
Usually concerned with current data. | Usually concerned with historical data.
Data is updated regularly according to need. | Non-volatile: new data may be added regularly, but once added it is rarely changed.
Designed for day-to-day business transactions and processes. | Designed for analysis of business measures by subject area, categories, and attributes.
Optimized for a simple set of transactions, generally adding or retrieving a single row at a time per table. | Optimized for bulk loads and large, complex, unpredictable queries that access many rows per table.
Optimized for validation of incoming information during transactions; uses validation data tables. | Loaded with consistent, valid information; requires no real-time validation.
Supports thousands of concurrent clients. | Supports a few concurrent clients relative to OLTP.
Widely process-oriented. | Widely subject-oriented.
Usually optimized to perform fast inserts and updates of relatively small volumes of data. | Usually optimized to perform fast retrievals of relatively high volumes of data.
Data in. | Data out.
A small amount of data is accessed per query. | A large amount of data is accessed per query.
Relational databases are created for online transaction processing (OLTP). | Data warehouses are designed for online analytical processing (OLAP).
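To make the contrast concrete, the following is a minimal sketch using Python's built-in sqlite3 module (the sales table and its columns are invented for illustration): an OLTP-style transaction adds or retrieves a single row, while an OLAP-style query scans and aggregates many rows by subject area.

```python
import sqlite3

# In-memory database with a hypothetical sales table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, region TEXT, "
    "product TEXT, sale_date TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales (region, product, sale_date, amount) VALUES (?, ?, ?, ?)",
    [("West", "Car", "2024-01-10", 900.0),
     ("West", "Bus", "2024-02-11", 1500.0),
     ("East", "Car", "2024-01-15", 700.0)],
)

# OLTP-style work: insert or retrieve a single row per transaction.
conn.execute(
    "INSERT INTO sales (region, product, sale_date, amount) VALUES (?, ?, ?, ?)",
    ("East", "Bus", "2024-03-01", 1200.0),
)
one_row = conn.execute("SELECT * FROM sales WHERE sale_id = ?", (1,)).fetchone()

# OLAP-style work: a complex query that scans many rows and aggregates
# them by subject area (region and product).
summary = conn.execute(
    "SELECT region, product, SUM(amount), COUNT(*) "
    "FROM sales GROUP BY region, product"
).fetchall()
print(one_row)
print(summary)
```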
Data Warehouse Characteristics:
Integrated Data:
One of the key characteristics of a data warehouse is that it contains integrated data. This means
that the data is collected from various sources, such as transactional systems, and then cleaned,
transformed, and consolidated into a single, unified view. This allows for easy access and
analysis of the data, as well as the ability to track data over time.
Subject-Oriented:
A data warehouse is also subject-oriented, which means that the data is organized around
specific subjects, such as customers, products, or sales. This allows for easy access to the data
relevant to a specific subject, as well as the ability to track the data over time.
Non-Volatile:
Another characteristic of a data warehouse is that it is non-volatile. This means that the data in
the warehouse is never updated or deleted, only added to. This is important because it allows for
the preservation of historical data, making it possible to track trends and patterns over time.
Time-Variant:
A data warehouse is also time-variant, which means that the data is stored with a time dimension.
This allows for easy access to data for specific time periods, such as last quarter or last year. This
makes it possible to track trends and patterns over time.
Data Warehouse architecture and its components:
Data Warehouse architecture:
A data warehouse architecture is a method of defining the overall architecture of data
communication, processing, and presentation that exists for end-client computing within the
enterprise. Each data warehouse is different, but all are characterized by standard vital
components.
Data Warehouse applications are designed to support users' ad-hoc data requirements, an
activity commonly called online analytical processing (OLAP). These include applications such as
forecasting, profiling, summary reporting, and trend analysis.
Three common architectures are:
o Data Warehouse Architecture: Basic
o Data Warehouse Architecture: With Staging Area
o Data Warehouse Architecture: With Staging Area and Data Marts
Data Warehouse Architecture: Basic
Operational System
In data warehousing, an operational system refers to a system that is used to process the
day-to-day transactions of an organization.
Flat Files
A Flat file system is a system of files in which transactional data is stored, and every file in the
system must have a different name.
Meta Data
Metadata is a set of data that defines and gives information about other data. Metadata is used in a Data
Warehouse for a variety of purposes, including:
Metadata summarizes necessary information about data, which can make it easier to find and work with
particular instances of data. For example, author, date created, date modified, and file size are
examples of very basic document metadata.
Metadata is used to direct a query to the most appropriate data source.
Lightly and highly summarized data
This area of the data warehouse stores all the predefined lightly and highly summarized
(aggregated) data generated by the warehouse manager.
The goal of the summarized information is to speed up query performance. The summarized
records are updated continuously as new information is loaded into the warehouse.
End-User access Tools
The principal purpose of a data warehouse is to provide information to the business managers for
strategic decision-making. These customers interact with the warehouse using end-client access
tools.
Examples of end-user access tools include:
o Reporting and Query Tools
o Application Development Tools
o Executive Information Systems Tools
o Online Analytical Processing Tools
o Data Mining Tools
Data Warehouse Architecture: With Staging Area
We must clean and process operational data before putting it into the warehouse.
We can do this programmatically, although most data warehouses use a staging area (a place where
data is processed before entering the warehouse).
A staging area simplifies data cleansing and consolidation for operational data coming from
multiple source systems, especially for enterprise data warehouses where all relevant data of an
enterprise is consolidated.
The Data Warehouse staging area is a temporary location to which records from source systems are
copied.
Data Warehouse Architecture: With Staging Area and Data Marts
We may want to customize our warehouse's architecture for multiple groups within our
organization.
We can do this by adding data marts. A data mart is a segment of a data warehouse that
provides information for reporting and analysis on a section, unit, department, or operation in the
company, e.g., sales, payroll, production, etc.
The figure illustrates an example where purchasing, sales, and stocks are separated. In this
example, a financial analyst wants to analyze historical data for purchases and sales or mine
historical information to make predictions about customer behavior.
Components of Data Warehouse:
Architecture is the proper arrangement of the elements. We build a data warehouse with software
and hardware components. To suit the requirements of our organization, we arrange these
building blocks, and we may strengthen a particular part with extra tools and services. All of this depends
on our circumstances.
The figure shows the essential elements of a typical warehouse. The Source Data
component is shown on the left. The Data Staging element serves as the next building block. In the
middle, we see the Data Storage component that holds the data warehouse data. This element
not only stores and manages the data; it also keeps track of the data using the metadata repository.
The Information Delivery component, shown on the right, consists of all the different ways of
making the information from the data warehouse available to the users.
Source Data Component
Source data coming into the data warehouse may be grouped into four broad categories:
Production Data: This type of data comes from the various operational systems of the
enterprise. Based on the data requirements in the data warehouse, we choose segments of the
data from the various operational systems.
Internal Data: In each organization, the client keeps their "private" spreadsheets, reports,
customer profiles, and sometimes even department databases. This is the internal data, part of
which could be useful in a data warehouse.
Archived Data: Operational systems are mainly intended to run the current business. In every
operational system, we periodically take the old data and store it in archived files.
External Data: Most executives depend on information from external sources for a large
percentage of the information they use. They use statistics relating to their industry produced
by external agencies.
Data Staging Component
After we have extracted data from various operational systems and external sources, we
have to prepare the data for storing in the data warehouse. The extracted data coming from
several different sources needs to be changed, converted, and made ready in a format that is
suitable for querying and analysis.
We will now discuss the three primary functions that take place in the staging area.
1) Data Extraction: This method has to deal with numerous data sources. We have to employ
the appropriate techniques for each data source.
2) Data Transformation: As we know, data for a data warehouse comes from many different
sources. If data extraction for a data warehouse poses big challenges, data transformation
presents even more significant challenges. We perform several individual tasks as part of data
transformation.
First, we clean the data extracted from each source. Cleaning may be the correction of
misspellings or may deal with providing default values for missing data elements, or elimination
of duplicates when we bring in the same data from various source systems.
Standardization of data elements forms a large part of data transformation. Data
transformation includes many forms of combining pieces of data from different sources. We
combine data from a single source record or from related data elements in many source records.
On the other hand, data transformation also includes purging source data that is not useful and
separating out source records into new combinations. Sorting and merging of data take place on a
large scale in the data staging area. When the data transformation function ends, we have a
collection of integrated data that is cleaned, standardized, and summarized.
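As a small illustration of these cleaning and standardization tasks, here is a sketch in plain Python; the record layout, the spelling fix, and the default values are assumptions made only for this example. It corrects a known misspelling, fills missing elements with defaults, and eliminates duplicates arriving from two source systems.

```python
# Hypothetical customer records extracted from two source systems.
source_a = [
    {"cust_id": 1, "city": "Puen", "segment": None},
    {"cust_id": 2, "city": "Mumbai", "segment": "Retail"},
]
source_b = [
    {"cust_id": 2, "city": "Mumbai", "segment": "Retail"},   # duplicate of a source_a record
    {"cust_id": 3, "city": "Kolhapur", "segment": None},
]

SPELLING_FIXES = {"Puen": "Pune"}     # correction of known misspellings
DEFAULTS = {"segment": "Unknown"}     # default values for missing data elements

def clean(record):
    """Return a cleaned copy of one extracted record."""
    fixed = dict(record)
    fixed["city"] = SPELLING_FIXES.get(fixed["city"], fixed["city"])
    for field, default in DEFAULTS.items():
        if fixed.get(field) is None:
            fixed[field] = default
    return fixed

# Clean both extracts, then eliminate duplicates keyed on cust_id.
merged = {}
for record in map(clean, source_a + source_b):
    merged.setdefault(record["cust_id"], record)

cleaned = sorted(merged.values(), key=lambda r: r["cust_id"])
print(cleaned)
```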
3) Data Loading: Two distinct categories of tasks form the data loading function. When we
complete the structure and construction of the data warehouse and go live for the first time, we
do the initial loading of the data into the data warehouse storage. The initial load moves
high volumes of data and consumes a substantial amount of time. After that, ongoing incremental
loads keep the warehouse storage up to date with data that arrives later.
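The two categories of loading tasks might be sketched as follows (a minimal illustration with Python's sqlite3 module; the table and column names are invented): the initial load populates the empty warehouse table in bulk, while subsequent incremental loads append only the records extracted since the last run.

```python
import sqlite3

warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE sales_fact (sale_id INTEGER PRIMARY KEY, sale_date TEXT, amount REAL)"
)

def initial_load(conn, history):
    """One-time bulk load of all historical records into the empty warehouse."""
    conn.executemany(
        "INSERT INTO sales_fact (sale_id, sale_date, amount) VALUES (?, ?, ?)", history
    )
    conn.commit()

def incremental_load(conn, new_records):
    """Recurring load that appends only records extracted since the last run."""
    conn.executemany(
        "INSERT OR IGNORE INTO sales_fact (sale_id, sale_date, amount) VALUES (?, ?, ?)",
        new_records,
    )
    conn.commit()

initial_load(warehouse, [(1, "2024-01-10", 900.0), (2, "2024-01-11", 700.0)])
incremental_load(warehouse, [(3, "2024-02-01", 1200.0)])
print(warehouse.execute("SELECT COUNT(*) FROM sales_fact").fetchone())
```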
Data Storage Components
Data storage for the data warehouse is a separate repository. The data repositories for the
operational systems generally include only the current data. Also, these data repositories hold
the data structured in highly normalized forms for fast and efficient processing.
Information Delivery Component
The information delivery element enables the process of subscribing to data warehouse
information and having it delivered to one or more destinations according to some user-specified
scheduling algorithm.
Metadata Component
Metadata in a data warehouse is similar to the data dictionary or the data catalog in a database
management system. In the data dictionary, we keep the data about the logical data structures,
the data about the records and addresses, the information about the indexes, and so on.
Data Marts
A data mart includes a subset of corporate-wide data that is of value to a specific group of users. Its scope
is confined to particular selected subjects. Data in a data warehouse should be fairly current,
but not necessarily up to the minute, although developments in the data warehouse industry have made
frequent and incremental data loads more achievable. Data marts are smaller than data
warehouses and usually contain data for a single department or subject area. The current trend in
data warehousing is to develop a data warehouse with several smaller related data marts for
particular kinds of queries and reports.
Management and Control Component
The management and control elements coordinate the services and functions within the data
warehouse. These components control the data transformation and the data transfer into the data
warehouse storage. They also moderate the data delivery to the clients, work with
the database management systems, and ensure that data is correctly stored in the repositories. They
monitor the movement of data into the staging area and from there into the data
warehouse storage itself.
Extract-Transform-Load (ETL):
The mechanism of extracting data from source systems and bringing it into the data
warehouse is commonly called ETL, which stands for Extraction, Transformation and
Loading.
The ETL process requires active inputs from various stakeholders, including developers,
analysts, testers, and top executives, and is technically challenging.
To maintain its value as a tool for decision-makers, the data warehouse needs to change
with business changes. ETL is a recurring activity (daily, weekly, monthly) of a data warehouse
system and needs to be agile, automated, and well documented.
1. Extraction:
The first step of the ETL process is extraction. In this step, data is extracted from various source
systems, which can be in various formats such as relational databases, NoSQL stores,
XML, and flat files, into the staging area. It is important to extract the data from the various
source systems and store it in the staging area first, rather than directly in the data
warehouse, because the extracted data is in various formats and can also be corrupted.
Loading it directly into the data warehouse may damage the warehouse, and rollback would be
much more difficult. Therefore, this is one of the most important steps of the ETL process.
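A minimal extraction sketch is shown below (assumptions for illustration only: an orders.csv flat-file export, an operational SQLite database ops.db with an orders table, and a staging/ directory as the staging area). Each source is pulled with a technique appropriate to its format and parked in the staging area rather than written straight into the warehouse.

```python
import csv
import sqlite3
from pathlib import Path

STAGING_DIR = Path("staging")            # hypothetical staging area on disk
STAGING_DIR.mkdir(exist_ok=True)

def extract_csv(path):
    """Extract rows from a flat-file source (CSV)."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle))

def extract_operational_db(db_path):
    """Extract rows from a relational operational source."""
    conn = sqlite3.connect(db_path)
    conn.row_factory = sqlite3.Row
    rows = conn.execute("SELECT order_id, customer_id, amount FROM orders").fetchall()
    conn.close()
    return [dict(row) for row in rows]

def stage(records, name):
    """Write extracted records to the staging area as a CSV snapshot."""
    if not records:
        return
    target = STAGING_DIR / f"{name}.csv"
    with open(target, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=records[0].keys())
        writer.writeheader()
        writer.writerows(records)

# Hypothetical usage, assuming the source files above exist:
# stage(extract_csv("exports/orders.csv"), "orders_flatfile")
# stage(extract_operational_db("ops.db"), "orders_operational")
```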
2. Transformation:
The second step of the ETL process is transformation. In this step, a set of rules or
functions is applied to the extracted data to convert it into a single standard format. It
may involve the following processes/tasks, illustrated in the sketch after this list:
 Filtering – loading only certain attributes into the data warehouse.
 Cleaning – filling up the NULL values with some default values, mapping U.S.A,
United States, and America into USA, etc.
 Joining – joining multiple attributes into one.
 Splitting – splitting a single attribute into multiple attributes.
 Sorting – sorting tuples on the basis of some attribute (generally key-attribute).
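The sketch below walks through each of these tasks on a toy pandas DataFrame; the column names and the country mapping are assumptions made only for this example.

```python
import pandas as pd

raw = pd.DataFrame({
    "cust_id": [1, 2, 3],
    "full_name": ["Asha Rao", "John Smith", "Mei Lin"],
    "country": ["U.S.A", None, "America"],
    "amount": [120.0, 80.0, 200.0],
    "internal_note": ["x", "y", "z"],        # attribute not needed in the warehouse
})

# Filtering: load only certain attributes into the warehouse.
df = raw[["cust_id", "full_name", "country", "amount"]].copy()

# Cleaning: fill NULLs with defaults and map country variants onto one value.
df["country"] = df["country"].fillna("Unknown")
df["country"] = df["country"].replace({"U.S.A": "USA", "United States": "USA", "America": "USA"})

# Splitting: split a single attribute into multiple attributes.
df[["first_name", "last_name"]] = df["full_name"].str.split(" ", n=1, expand=True)

# Joining: combine multiple attributes into one.
df["display_name"] = df["last_name"] + ", " + df["first_name"]

# Sorting: order tuples on the key attribute.
df = df.sort_values("cust_id").reset_index(drop=True)
print(df)
```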
3. Loading:
The third and final step of the ETL process is loading. In this step, the transformed data is
finally loaded into the data warehouse. Sometimes the data is loaded into the data warehouse
very frequently, and sometimes it is loaded at longer but regular intervals.
The rate and period of loading depend solely on the requirements and vary from system
to system.
Data Modeling:
Data warehouse modeling is the process of designing the schemas of the detailed and
summarized data of the data warehouse. The goal of data warehouse modeling is to
develop a schema describing the reality, or at least a part of it, that the data warehouse is
needed to support.
Data Modeling Life Cycle:
In this section, we define a data modeling life cycle. It is a straightforward process of
transforming the business requirements to fulfill the goals for storing, maintaining, and accessing
the data within IT systems. The result is a logical and physical data model for an enterprise data
warehouse.
The objective of the data modeling life cycle is primarily the creation of a storage area for
business information. That area comes from the logical and physical data modeling stages, as
shown in Figure:
Logical Data Model
A logical data model defines the information in as much structure as possible, without regard to
how it will be physically implemented in the database. The primary objective of logical data
modeling is to document the business data structures, processes, rules, and relationships in a
single view - the logical data model.
Physical Data Model
A physical data model describes how the model will be realized in the database. A physical
database model shows all table structures, column names, data types, constraints, primary
keys, foreign keys, and relationships between tables. The purpose of physical data modeling is the
mapping of the logical data model to the physical structures of the RDBMS system hosting the
data warehouse. This includes defining physical RDBMS structures, such as tables and data
types to use when storing the information. It may also include the definition of new data
structures for enhancing query performance.
Logical (Multidimensional) Data Model:
A multidimensional model views data in the form of a data-cube. A data cube enables data to be
modeled and viewed in multiple dimensions. It is defined by dimensions and facts.
The multidimensional data model is a method used for organizing data in the
database, with good arrangement and assembly of the contents of the database.
The multidimensional data model allows users to pose analytical questions
associated with market or business trends, unlike relational databases, which allow users
to access data in the form of queries. It allows users to rapidly receive answers to their
requests by creating and examining the data comparatively fast.
OLAP (online analytical processing) and data warehousing use multidimensional databases,
which are used to show multiple dimensions of the data to users.
The model represents data in the form of data cubes. Data cubes allow the data to be modeled and
viewed from many dimensions and perspectives. A cube is defined by dimensions and facts and is
represented by a fact table. Facts are numerical measures, and fact tables contain the measures of
the related dimension tables or the names of the facts.
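A tiny illustration of a fact table referencing a dimension table is sketched below with pandas (the item dimension and the sales facts are invented for this example): each fact row holds dimension keys plus numerical measures, and resolving a key against the dimension table yields readable attributes for analysis.

```python
import pandas as pd

# Hypothetical dimension table describing items.
dim_item = pd.DataFrame({
    "item_key": [1, 2],
    "item_name": ["Car", "Bus"],
    "category": ["Vehicle", "Vehicle"],
})

# Hypothetical fact table: numeric measures keyed by dimension keys.
fact_sales = pd.DataFrame({
    "time_key": ["Q1", "Q1", "Q2"],
    "item_key": [1, 2, 1],
    "location_key": ["Delhi", "Delhi", "Kolkata"],
    "units_sold": [120, 45, 150],     # facts: numerical measures
    "revenue": [600.0, 900.0, 750.0],
})

# Resolving a dimension key gives the analyst readable dimension attributes.
report = fact_sales.merge(dim_item, on="item_key")
print(report[["time_key", "location_key", "item_name", "units_sold", "revenue"]])
```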
(Figure: Multidimensional Data Representation)
OLAP:
At its core, an OLAP cube is a data structure designed for fast analysis of data based on multiple
dimensions.
OLAP cubes support various analytical operations that enhance data
exploration. These include:
 Slicing enables the selection of specific subsets of data based on one or more
dimensions.
 Dicing allows for the selection of specific combinations of dimension values.
 Drill-down enables users to explore data at a more granular level by navigating hierarchies.
 Roll-up aggregates data to higher levels of summarization, facilitating broader analysis.
 Pivoting reorients the cube to view data from different dimensions, providing alternate
perspectives.
Let's take an example to understand how an OLAP cube works. Imagine you are managing a
chain of retail stores, and you want to analyze sales data to gain insights into your business
performance. You have data about sales revenue, products, stores, and time periods (e.g., months
or quarters).
To create an OLAP cube, you would start by identifying the dimensions of your data. In this
case, the dimensions could be:
 Time (e.g., months, quarters, years)
 Product (e.g., categories, brands, individual products)
 Store (e.g., locations, regions, individual stores)
The cube would then be structured with these dimensions forming the axes of the cube. Each
intersection point within the cube represents a specific combination of dimension values. For
example, one intersection point might represent the sales revenue for a particular product in a
specific store during a specific month.
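One rough way to emulate such a cube is sketched below with pandas (the sales rows are invented, and a real OLAP server would precompute and index these aggregates rather than build them on the fly): each (store, product, month) intersection holds the aggregated sales revenue.

```python
import pandas as pd

# Hypothetical sales records for the retail-chain example.
sales = pd.DataFrame({
    "month":   ["Jan", "Jan", "Feb", "Feb", "Feb"],
    "product": ["Shoes", "Shirts", "Shoes", "Shirts", "Shoes"],
    "store":   ["Pune", "Pune", "Pune", "Mumbai", "Mumbai"],
    "revenue": [1000.0, 400.0, 1200.0, 500.0, 900.0],
})

# The "cube": store and product on one axis, month on the other,
# each intersection holding total sales revenue for that combination.
cube = sales.pivot_table(
    index=["store", "product"], columns="month", values="revenue",
    aggfunc="sum", fill_value=0,
)
print(cube)
# e.g. cube.loc[("Pune", "Shoes"), "Feb"] is the revenue for that
# product, in that store, during that month.
```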
OLAP Operations:
OLAP stands for Online Analytical Processing. An OLAP server is a software technology that
allows users to analyze information from multiple database systems at the same time. It is
based on a multidimensional data model and allows the user to query multi-dimensional data
(e.g., Delhi -> 2018 -> Sales data). OLAP databases are divided into one or more cubes, and
these cubes are known as hypercubes.
There are five basic analytical operations that can be performed on an OLAP cube; a combined code sketch follows their descriptions below:
1. Drill down:
In the drill-down operation, less detailed data is converted into more detailed data. It can
be done by:
 Moving down in the concept hierarchy
 Adding a new dimension
In the cube given in the overview section, the drill-down operation is performed by moving
down in the concept hierarchy of the Time dimension (Quarter -> Month).
2. Roll up:
It is just the opposite of the drill-down operation. It performs aggregation on the OLAP cube.
It can be done by:
 Climbing up in the concept hierarchy
 Reducing the dimensions
In the cube given in the overview section, the roll-up operation is performed by climbing
up in the concept hierarchy of the Location dimension (City -> Country).
3. Dice:
It selects a sub-cube from the OLAP cube by selecting two or more dimensions. In the
cube given in the overview section, a sub-cube is selected by selecting the following
dimensions with criteria:
 Location = “Delhi” or “Kolkata”
 Time = “Q1” or “Q2”
 Item = “Car” or “Bus”
4. Slice:
It selects a single dimension from the OLAP cube, which results in the creation of a new
sub-cube. In the cube given in the overview section, the slice is performed on the dimension
Time = “Q1”.
5. Pivot:
It is also known as rotation operation as it rotates the current view to get a new view of
the representation. In the sub-cube obtained after the slice operation, performing pivot
operation gives a new view of it.
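The five operations can be imitated on a small pandas DataFrame as sketched below; the dimension values follow the Delhi/Kolkata, Q1/Q2, Car/Bus example used above, and this is only an approximation of what a real OLAP engine does internally.

```python
import pandas as pd

sales = pd.DataFrame({
    "country": ["India"] * 8,
    "city":    ["Delhi", "Delhi", "Delhi", "Delhi",
                "Kolkata", "Kolkata", "Kolkata", "Kolkata"],
    "quarter": ["Q1", "Q1", "Q2", "Q2", "Q1", "Q1", "Q2", "Q2"],
    "month":   ["Jan", "Feb", "Apr", "May", "Jan", "Feb", "Apr", "May"],
    "item":    ["Car", "Bus", "Car", "Bus", "Car", "Bus", "Car", "Bus"],
    "units":   [30, 12, 25, 10, 40, 15, 35, 18],
})

# Roll-up: climb the Location hierarchy (City -> Country) by aggregating.
roll_up = sales.groupby(["country", "quarter", "item"])["units"].sum()

# Drill-down: move down the Time hierarchy (Quarter -> Month) for finer detail.
drill_down = sales.groupby(["city", "quarter", "month", "item"])["units"].sum()

# Slice: fix a single dimension value (Time = "Q1") to obtain a sub-cube.
slice_q1 = sales[sales["quarter"] == "Q1"]

# Dice: select value combinations on two or more dimensions
# (mirrors the Delhi/Kolkata, Q1/Q2, Car/Bus criteria from the text).
dice = sales[sales["city"].isin(["Delhi", "Kolkata"])
             & sales["quarter"].isin(["Q1", "Q2"])
             & sales["item"].isin(["Car", "Bus"])]

# Pivot: rotate the view, e.g. items as rows and cities as columns.
pivot = slice_q1.pivot_table(index="item", columns="city", values="units", aggfunc="sum")
print(roll_up, drill_down, pivot, sep="\n\n")
```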
OLAP Servers:
Online Analytical Processing (OLAP) refers to a set of software tools used for data analysis in
order to make business decisions. OLAP provides a platform for gaining insights from data
retrieved from multiple database systems at the same time. It is based on a
multidimensional data model, which enables users to extract and view data from various
perspectives. A multidimensional database is used to store OLAP data. Many Business
Intelligence (BI) applications rely on OLAP technology.
Type of OLAP servers:
The three major types of OLAP servers are as follows:
 ROLAP
 MOLAP
 HOLAP
Relational OLAP (ROLAP):
Relational On-Line Analytical Processing (ROLAP) is primarily used for data stored in a
relational database, where both the base data and dimension tables are stored as relational
tables. ROLAP servers are used to bridge the gap between the relational back-end server and
the client’s front-end tools. ROLAP servers store and manage warehouse data using RDBMS,
and OLAP middleware fills in the gaps.
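A crude sketch of the ROLAP idea follows, using Python with sqlite3 (the star-schema tables and the tiny "middleware" helper are invented for illustration): base data and dimensions live in ordinary relational tables, and the middleware translates an analytical request into a GROUP BY query that the RDBMS executes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_item  (item_key INTEGER PRIMARY KEY, item_name TEXT);
    CREATE TABLE dim_city  (city_key INTEGER PRIMARY KEY, city_name TEXT);
    CREATE TABLE fact_sales (item_key INTEGER, city_key INTEGER, quarter TEXT, units INTEGER);

    INSERT INTO dim_item VALUES (1, 'Car'), (2, 'Bus');
    INSERT INTO dim_city VALUES (1, 'Delhi'), (2, 'Kolkata');
    INSERT INTO fact_sales VALUES (1, 1, 'Q1', 30), (2, 1, 'Q1', 12),
                                  (1, 2, 'Q1', 40), (1, 2, 'Q2', 35);
""")

def rolap_aggregate(dimensions):
    """Middleware-style helper: turn requested dimensions into a GROUP BY query."""
    allowed = {"item": "i.item_name", "city": "c.city_name", "quarter": "f.quarter"}
    cols = [allowed[d] for d in dimensions]
    sql = (
        f"SELECT {', '.join(cols)}, SUM(f.units) AS total_units "
        "FROM fact_sales f "
        "JOIN dim_item i ON f.item_key = i.item_key "
        "JOIN dim_city c ON f.city_key = c.city_key "
        f"GROUP BY {', '.join(cols)}"
    )
    return conn.execute(sql).fetchall()

# Aggregate over the relational tables, one query per analytical request.
print(rolap_aggregate(["city", "item"]))
```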
Multidimensional OLAP (MOLAP):
Through array-based multidimensional storage engines, Multidimensional On-Line Analytical
Processing (MOLAP) supports multidimensional views of data. Storage utilization in
multidimensional data stores may be low if the data set is sparse.
MOLAP stores data on disks in the form of a specialized multidimensional array structure. It
is used for OLAP because of the arrays’ random-access capability. Dimension
instances determine array elements, and the data or measured value associated with each cell
is typically stored in the corresponding array element. The multidimensional array is typically
stored in MOLAP in a linear allocation based on nested traversal of the axes in some
predetermined order.
However, unlike ROLAP, which stores only records with non-zero facts, all array elements
are defined in MOLAP, and as a result, the arrays tend to be sparse, with empty elements
occupying a larger portion of them. MOLAP systems typically include provisions such as
advanced indexing and hashing to locate data while performing queries for handling sparse
arrays, because both storage and retrieval costs are important when evaluating online
performance. MOLAP cubes are ideal for slicing and dicing data and can perform complex
calculations. When the cube is created, all calculations are pre-generated.
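The array-based storage idea can be sketched with NumPy as below (the dimension orderings and values are invented): dimension instances map to array indices, each cell stores the measured value, the array is laid out linearly by nested traversal of the axes, and cells without facts leave the array sparse.

```python
import numpy as np

# Dimension instances determine the array coordinates.
items    = ["Car", "Bus", "Bike"]
cities   = ["Delhi", "Kolkata"]
quarters = ["Q1", "Q2", "Q3", "Q4"]

# Every cell of the cube exists, even if no fact was recorded (hence sparsity).
cube = np.zeros((len(items), len(cities), len(quarters)))

def store(item, city, quarter, units):
    """Place a measured value into the cell addressed by its dimension instances."""
    cube[items.index(item), cities.index(city), quarters.index(quarter)] = units

store("Car", "Delhi", "Q1", 30)
store("Bus", "Delhi", "Q1", 12)
store("Car", "Kolkata", "Q2", 35)

# Direct (random-access) lookup of a cell by its dimension coordinates.
print(cube[items.index("Car"), cities.index("Kolkata"), quarters.index("Q2")])

# Linear allocation: the array is stored row-major, i.e. by nested
# traversal of the axes in a predetermined order.
print(cube.ravel()[:8])

# Sparsity: most cells are empty because only a few facts were recorded.
print(f"non-empty cells: {np.count_nonzero(cube)} of {cube.size}")
```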
Hybrid OLAP (HOLAP):
Hybrid On-Line Analytical Processing (HOLAP) combines ROLAP and MOLAP. HOLAP offers
the greater scalability of ROLAP and the faster computation of MOLAP. HOLAP servers are
capable of storing large amounts of detailed data. On the one hand, HOLAP benefits from
ROLAP’s greater scalability; on the other hand, it makes use of cube technology for faster
performance and summary-type information. Because detailed data is stored in a relational
database, the cubes are smaller than in MOLAP.