Overview of Distributed Data Warehouse
Introduction
Most companies develop and maintain a single, centralised data warehouse
system, and there are several reasons for this. The data in the warehouse is
integrated across the entire organisation, and only corporate headquarters uses
that integrated view. The corporation employs a centralised business model.
Given the amount of data in the data warehouse, a single, central store of data
makes sense. Even if the data could be integrated, it would be challenging to
access if it were dispersed among multiple local sites. In short, politics,
economics, and technology all strongly favour a single, central data warehouse.
Distributed Warehouse
When creating a data warehouse, there are two options: a basic (centralised)
data warehouse or a distributed data warehouse. Several companies have chosen
to create small, flexible data marts that are tailored to particular business
sectors. Distributed data warehousing is a data warehousing architecture in
which data is stored and managed across a network of decentralized, independent
computer systems. This architecture is often used in organizations with large,
distributed databases, and it is becoming more popular and more useful as the
supporting technology improves. Many corporations, institutes, and other
organizations deal with volumes of data that are not feasible to manage in one
place, so they distribute the data for further processing. The data stored
locally at the different data warehouse sites should operate together, so that
the sites form one large data processing unit.
Framework for Distributed Data Warehouses:
Inmon's Approach
Inmon's method assumes that data stored in the global and local data warehouses
are mutually exclusive. Data from a local data warehouse is pre-staged at each
local site before being sent to the central global data warehouse, which offers
the global DSS (Decision Support System) functionality.
Fig-1: Inmon’s Approach to Distributed
Data Warehouses
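The flow described for Inmon's approach can be sketched roughly as follows.
This is only an illustration under stated assumptions, not Inmon's own
specification: the class names, the "scope" field, and the site name are
hypothetical, and "mutually exclusive" is modelled simply as each row living
either in a local warehouse or in the global one, never in both.

```python
from dataclasses import dataclass, field


@dataclass
class LocalWarehouse:
    site: str
    local_rows: list = field(default_factory=list)  # data that stays at the site
    staging: list = field(default_factory=list)     # data pre-staged for the global DW

    def load(self, row: dict) -> None:
        # Assumed rule: rows marked "corporate" are destined for the global
        # warehouse; everything else remains local.
        if row.get("scope") == "corporate":
            self.staging.append(row)
        else:
            self.local_rows.append(row)

    def ship_staged(self) -> list:
        # Hand over the pre-staged rows and empty the staging area.
        shipped, self.staging = self.staging, []
        return shipped


@dataclass
class GlobalWarehouse:
    rows: list = field(default_factory=list)

    def receive(self, site: str, batch: list) -> None:
        # Tag each row with its originating site before storing it.
        self.rows.extend({**row, "site": site} for row in batch)


paris = LocalWarehouse("paris")
paris.load({"id": 1, "scope": "local", "amount": 10})
paris.load({"id": 2, "scope": "corporate", "amount": 25})

global_dw = GlobalWarehouse()
global_dw.receive(paris.site, paris.ship_staged())
```

After the transfer, row 1 exists only at the local site and row 2 only in the
global warehouse, which is the mutual exclusivity the approach assumes.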
White's Approach
White's method, commonly referred to as a "Two-Tier Data Warehouse," combines a
centralised data warehouse with decentralised data marts. White's central data
warehouse houses cleansed and normalised detailed data that is periodically
pulled from operational systems. Data collections derived from this detailed
base data are also kept up to date in the central data warehouse. These
collections, which can include both summarised and denormalised detailed data,
provide the user's perspective of the warehouse data. A decentralised data mart
holds denormalised and summarised data that is of value to a particular user or
user group.
Fig-2: White's Approach to Distributed
Data Warehouses.
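To make the two tiers concrete, the sketch below derives a summarised,
denormalised data collection from normalised detailed data. The tables, column
names, and the build_region_mart function are hypothetical examples, not part
of White's description.

```python
from collections import defaultdict

# Tier 1: cleansed, normalised detailed data in the central warehouse
# (hypothetical sample rows; customers and sales are kept in separate tables).
customers = {
    1: {"name": "Acme", "region": "North"},
    2: {"name": "Birch", "region": "South"},
}
sales = [
    {"customer_id": 1, "amount": 100.0},
    {"customer_id": 1, "amount": 50.0},
    {"customer_id": 2, "amount": 70.0},
]


def build_region_mart(sales, customers):
    """Tier 2: total sales per region, summarised and denormalised."""
    totals = defaultdict(float)
    for sale in sales:
        # The customer lookup (a join) is folded into the result, so the
        # mart user never has to touch the normalised tables.
        region = customers[sale["customer_id"]]["region"]
        totals[region] += sale["amount"]
    return dict(totals)


region_mart = build_region_mart(sales, customers)
```

Rebuilding region_mart after each periodic load from the operational systems
is one simple way to keep such a collection up to date.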
- The Distributed Warehouse Architecture
The distributed data warehouse system architecture is founded on the ANSI/SPARC
design, which comprises three levels of schemas: internal, conceptual, and
external. The architecture itself has four tiers [1]. The first is the data
integration layer, which contains the source database systems and the processes
necessary to integrate their data. The data staging layer merges
subject-oriented, current-value detailed data using a homogeneous model. The
data distribution layer fragments, allocates, and updates the distributed data
warehouse as adjustments are made to the operational systems. The distributed
data warehouse manager layer is in charge of interacting with the decision
support environment, giving it a corporate-wide view of the data dispersed
throughout the network.
- Data Integration Layer
The data integration layer consists of the source databases available across
the sites, together with the integration and transformation tools. Each source
at each site has its own Local Internal Schema (LIS) and Local Conceptual
Schema (LCS). The LIS defines the physical data organization on the source
database.
- The Data Staging Layer
The data staging layer stores the integrated, subject-oriented, current-value,
detailed data. The underlying model for the staging layer is a canonical data
model. Under the data warehouse model, the staging layer is transformed into
the Global Conceptual Schema (GCS).
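As a rough illustration of how sources with different local schemas could be
brought into one canonical model in the staging layer, consider the sketch
below. The site names, column names, and mapping table are all hypothetical;
real integration and transformation tools do considerably more (cleansing, key
resolution, type conversion).

```python
# Assumed per-site mappings: local column name -> canonical column name.
SITE_MAPPINGS = {
    "site_a": {"cust_no": "customer_id", "amt": "amount", "dt": "date"},
    "site_b": {"CustomerID": "customer_id", "Total": "amount", "OrderDate": "date"},
}


def to_canonical(site, row):
    """Rename a source row's columns to the canonical model."""
    mapping = SITE_MAPPINGS[site]
    record = {canonical: row[local] for local, canonical in mapping.items()}
    record["source_site"] = site  # keep track of where the row came from
    return record


def stage(batches):
    """Merge rows from every site into one list of canonical records."""
    staged = []
    for site, rows in batches.items():
        staged.extend(to_canonical(site, row) for row in rows)
    return staged


batches = {
    "site_a": [{"cust_no": 7, "amt": 12.5, "dt": "2024-01-03"}],
    "site_b": [{"CustomerID": 9, "Total": 30.0, "OrderDate": "2024-01-04"}],
}
staged = stage(batches)
```

Every record in staged now has the same column names regardless of which
source it came from.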
- The Data Distribution Layer
The data distribution layer provides the following processes: fragmentation,
allocation, and updating of the distributed data warehouse. The main objective
of the fragmentation and allocation processes is to minimize the total
transaction processing cost for a given set of transactions.
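The sketch below shows one simple way to think about fragmentation and
allocation. It assumes horizontal fragmentation on a single attribute, no
replication, and a deliberately naive cost model in which every access from a
site other than the fragment's home site costs one unit. The function names,
access frequencies, and cost model are assumptions for illustration, not the
method of the cited architecture.

```python
def fragment_by(rows, key):
    """Horizontal fragmentation: group rows by the value of `key`."""
    fragments = {}
    for row in rows:
        fragments.setdefault(row[key], []).append(row)
    return fragments


def allocation_cost(home, access_freq, remote_cost=1.0):
    """Cost of storing a fragment at `home`: remote accesses pay remote_cost."""
    return sum(freq * remote_cost
               for site, freq in access_freq.items() if site != home)


def allocate(access, sites):
    """For each fragment, pick the site with the lowest cost (no replication)."""
    return {
        frag: min(sites, key=lambda s: allocation_cost(s, freq))
        for frag, freq in access.items()
    }


rows = [
    {"region": "north", "amount": 1},
    {"region": "south", "amount": 2},
    {"region": "north", "amount": 3},
]
fragments = fragment_by(rows, "region")

# Assumed access frequencies: how often each site queries each fragment.
access = {
    "north": {"site_a": 90, "site_b": 10},
    "south": {"site_a": 5, "site_b": 40},
}
placement = allocate(access, ["site_a", "site_b"])
total_cost = sum(allocation_cost(placement[f], access[f]) for f in access)
```

Because fragments are placed independently and never replicated, choosing the
cheapest site per fragment also minimizes the total under this cost model.
Replication or update costs would make the problem considerably harder.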
- Manager Layer
The distributed data warehouse manager layer manages the fragments at each site. Fragments hold the integrated, subject-oriented, non-volatile, time-variant, detailed data. End users at each local site are supported by an External Schema (ES), which allows them to execute DSS applications.
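A minimal sketch of the manager layer's role might look like the following. It
assumes fragments are plain lists of rows held per site and treats an external
schema as nothing more than a choice of columns; the WarehouseManager class and
its methods are hypothetical.

```python
class WarehouseManager:
    def __init__(self, fragments_by_site):
        # Mapping of site name -> list of rows stored in that site's fragment.
        self.fragments_by_site = fragments_by_site

    def global_view(self):
        """Yield every row from every site, tagged with its site."""
        for site, rows in self.fragments_by_site.items():
            for row in rows:
                yield {**row, "site": site}

    def external_view(self, columns):
        """Project the global view onto the columns a user group may see."""
        return [{c: row[c] for c in columns} for row in self.global_view()]


manager = WarehouseManager({
    "site_a": [{"customer_id": 7, "amount": 12.5}],
    "site_b": [{"customer_id": 9, "amount": 30.0}],
})
report = manager.external_view(["site", "amount"])
```

A DSS application working with report sees one set of rows and does not need
to know which site holds which fragment.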
Types of Distributed Data Warehouse (DDW)
There are three types of distributed data warehouse, as follows:
1. Local and Global Data Warehouses
A local data warehouse holds data that is unique to the local operating site,
while the global data warehouse holds integrated data.
For example (Fig-1), SBI is a local DW and RBI is a global DW.
2. Technologically distributed data warehouse
A DW that is logically a single data warehouse but is physically a combination
of multiple data warehouses.
3. Independently evolving distributed data warehouse
Data warehouses that are built in an uncoordinated manner. When the storage of
the first data warehouse is full, a second DW is created, and further
warehouses are added one by one in the same way.
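The second type, a warehouse that is logically single but physically spread
over several stores, can be sketched as below. Hash partitioning on the row key
is an assumption chosen for brevity, and the LogicalWarehouse class is
hypothetical; actual products use a variety of placement schemes.

```python
import zlib


class LogicalWarehouse:
    """One logical warehouse backed by several physical stores."""

    def __init__(self, node_count):
        # Each dict stands in for one physical data warehouse.
        self.nodes = [dict() for _ in range(node_count)]

    def _node_for(self, key):
        # crc32 gives the same result on every run, so a key always maps
        # to the same physical store.
        index = zlib.crc32(key.encode("utf-8")) % len(self.nodes)
        return self.nodes[index]

    def put(self, key, row):
        self._node_for(key)[key] = row

    def get(self, key):
        return self._node_for(key).get(key)

    def __len__(self):
        return sum(len(node) for node in self.nodes)


dw = LogicalWarehouse(node_count=3)
dw.put("order-1", {"amount": 10})
dw.put("order-2", {"amount": 20})
first = dw.get("order-1")
```

Callers use put and get as if there were a single warehouse; which physical
store holds a given row is an internal detail.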