Saturday, 10 September 2022

    Overview of Distributed Data Warehouse 

 

Introduction

Most companies build and maintain a single, centralised data warehouse. This suits a corporation with a centralised business model, in which only corporate headquarters uses the view of the data that is integrated across the entire organisation. Given the amount of data in the warehouse, a single, central store makes sense: even if the data could be integrated, it would be challenging to access if it were dispersed among multiple local sites. In short, politics, economics, and technology all strongly favour a single, central data warehouse.

 

Distributed Warehouse

When creating a data warehouse, there are two options: a basic (centralised) data warehouse or a distributed data warehouse. Rather than building one central store, several companies have chosen to create small, flexible data marts tailored to particular business areas. Distributed data warehousing is a data warehousing architecture in which data is stored and managed across a network of decentralized, independent computer systems. This architecture is often used in organizations with large, distributed databases, and it has become more popular and more practical as technology has improved. Corporations, institutes, and other organizations deal with volumes of data that are not feasible to manage in one place, so they distribute the data for further processing. The data is stored locally at different data warehouse sites, which operate together to form one large data processing unit.

 

Framework For Distributed Data Warehouses:

Inmon's Approach

Inmon's approach assumes that the data stored in the global data warehouse and the data stored in the local data warehouses are mutually exclusive. Data is pre-staged at each local site before being sent to the central global data warehouse, which provides the global DSS (Decision Support System) functionality.

Fig-1: Inmon’s Approach to Distributed Data Warehouses
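The flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `scope` field that marks a record as local or global, the site names, and the helper functions are all hypothetical and do not come from Inmon's description.

```python
def stage_at_site(records):
    """Split one site's records into data kept locally and data staged for the global warehouse."""
    local, staged = [], []
    for rec in records:
        # Assumption: a "scope" field decides where each record belongs.
        (staged if rec["scope"] == "global" else local).append(rec)
    return local, staged


def load_global(staged_by_site):
    """Merge the staged records from every site, tagging each with its site of origin."""
    merged = []
    for site, recs in staged_by_site.items():
        for rec in recs:
            merged.append({**rec, "site": site})
    return merged


site_data = {
    "site_a": [
        {"id": 1, "scope": "local", "amount": 10},
        {"id": 2, "scope": "global", "amount": 25},
    ],
    "site_b": [
        {"id": 3, "scope": "global", "amount": 40},
    ],
}

local_warehouses = {}
staged_by_site = {}
for site, records in site_data.items():
    local_warehouses[site], staged_by_site[site] = stage_at_site(records)

global_warehouse = load_global(staged_by_site)
```

Because every record goes to exactly one of the two lists, the local and global stores never overlap, which mirrors the mutual-exclusivity assumption.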




White's Approach

White's approach, commonly referred to as a "Two-Tier Data Warehouse", combines decentralised data marts with a centralised data warehouse. A data mart holds denormalised and summarised data that is of value to a particular user or user group. The central data warehouse holds cleansed, normalised, detailed data that is periodically extracted from the operational systems. It also maintains data collections derived from this detailed base data. These collections, which can include both summarised data and denormalised detailed data, provide the user's view of the warehouse data.

Fig-2: White's Approach to Distributed Data Warehouses.
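As a rough illustration of the two tiers, the sketch below keeps normalised detail rows in a central store and derives a denormalised, summarised collection for one data mart. The table contents, the region-level summary, and the `build_data_mart` helper are assumptions made for the example, not part of White's description.

```python
from collections import defaultdict

# Central warehouse: cleansed, normalised detail (hypothetical tables).
stores = {1: {"region": "North"}, 2: {"region": "South"}}
sales = [
    {"store_id": 1, "amount": 100.0},
    {"store_id": 1, "amount": 50.0},
    {"store_id": 2, "amount": 70.0},
]


def build_data_mart(sales_rows, store_dim):
    """Denormalise and summarise detail rows into a total per region."""
    totals = defaultdict(float)
    for row in sales_rows:
        # Look up the region in the store table, then aggregate.
        region = store_dim[row["store_id"]]["region"]
        totals[region] += row["amount"]
    return dict(totals)


regional_mart = build_data_mart(sales, stores)
```

The detail rows stay in the central warehouse; only the derived collection is handed to the user group that needs it.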






  • The Distributed Warehouse Architecture

The distributed data warehouse system architecture is based on the ANSI/SPARC design, which comprises three levels of schemas: internal, conceptual, and external. The architecture has four layers [1]. The first is the data integration layer, which contains the source database systems and the processes needed to integrate their data. The data staging layer uses a homogeneous model to merge subject-oriented, current, detailed data. The data distribution layer fragments and allocates the distributed data warehouse and updates it as changes are made to the operational systems. The distributed data warehouse manager layer interacts with the decision support environment by giving a corporate-wide view of the data dispersed throughout the network.

  • Data Integration Layer

The data integration layer consists of the source databases available across the sites, together with the integration and transformation tools. Each source at each site has its own Local Internal Schema (LIS) and Local Conceptual Schema (LCS). The LIS defines the physical organization of the data in the source database.

  • The Data Staging Layer

The data staging layer stores the integrated, subject-oriented, current-value, detailed data. The underlying model for the staging layer is a canonical data model. Under the data warehouse model, the staging layer is transformed into the Global Conceptual Schema (GCS).
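A canonical model can be pictured as a single set of field names that every source is mapped onto. The following sketch assumes two hypothetical sources, `crm` and `billing`, whose local field names differ; the mappings and the `to_canonical` helper are illustrative only.

```python
# Hypothetical mappings from each source's local field names to the canonical model.
FIELD_MAPS = {
    "crm": {"cust_no": "customer_id", "full_name": "name"},
    "billing": {"customerId": "customer_id", "customerName": "name"},
}


def to_canonical(source, record):
    """Rename a source record's fields to the canonical field names."""
    mapping = FIELD_MAPS[source]
    return {canonical: record[local] for local, canonical in mapping.items()}


staged = [
    to_canonical("crm", {"cust_no": 7, "full_name": "Asha"}),
    to_canonical("billing", {"customerId": 8, "customerName": "Ravi"}),
]
```

Once every source is expressed in the same shape, the staged records can be integrated without any knowledge of the local schemas they came from.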

  • The Data Distribution Layer

The data distribution layer handles three processes: fragmentation, allocation, and updating of the distributed data warehouse. The main objective of the fragmentation and allocation processes is to minimize the total transaction processing cost for a given set of transactions.
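To make the cost objective concrete, here is a minimal sketch of horizontal fragmentation followed by allocation. It rests on a deliberately simple, assumed cost model: every access costs 1 unit when the fragment is local to the requesting site and 10 units when it is remote, fragments are not replicated, and sites have unlimited capacity. The cost constants, site names, and workload figures are hypothetical.

```python
LOCAL_COST, REMOTE_COST = 1, 10  # assumed cost units per access


def fragment(rows, key):
    """Horizontal fragmentation: group rows by the value of one attribute."""
    fragments = {}
    for row in rows:
        fragments.setdefault(row[key], []).append(row)
    return fragments


def allocate(fragment_names, sites, workload):
    """Place each fragment at the site that minimises its total access cost.

    workload maps (requesting_site, fragment) -> access frequency.
    """
    placement = {}
    for frag in fragment_names:
        def cost(candidate):
            return sum(
                freq * (LOCAL_COST if site == candidate else REMOTE_COST)
                for (site, f), freq in workload.items()
                if f == frag
            )
        placement[frag] = min(sites, key=cost)
    return placement


rows = [
    {"order": 1, "region": "east"},
    {"order": 2, "region": "west"},
    {"order": 3, "region": "east"},
]
fragments = fragment(rows, "region")
workload = {
    ("site_1", "east"): 90, ("site_2", "east"): 10,
    ("site_1", "west"): 5, ("site_2", "west"): 60,
}
placement = allocate(fragments, ["site_1", "site_2"], workload)
```

Under these assumptions each fragment can be placed independently, so choosing the cheapest site per fragment also minimises the total cost. Real allocation problems add replication, storage limits, and update costs, which make the optimisation considerably harder.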

  • Manager Layer

The distributed data warehouse manager layer manages the fragments at each site. Fragments hold the integrated, subject-oriented, non-volatile, time-variant, detailed data. End users at each local site are supported by an External Schema (ES) that allows them to run DSS applications.


Types of Distributed Data Warehouse [DDW]–

There are three types of distributed data warehouse, as follows:

1.     Local and Global Data Warehouses-

A local data warehouse holds data that is unique to the local operating site, while the global data warehouse holds the integrated data.

For example, in Fig-1, SBI is a local DW and RBI is a global DW.

2.     Technologically distributed data warehouse-

A data warehouse that is logically a single warehouse but is physically a combination of multiple data warehouses.

3.     Independently evolving distributed data warehouse-

These data warehouses grow in an uncoordinated manner. When the storage of the first data warehouse is full, a second one is created, and further warehouses are added one by one in the same way.
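The second type in the list above, a warehouse that is logically single but physically split, can be sketched as a thin facade over several stores. The class below is only an illustration: the in-memory dictionaries stand in for physical warehouses, and the modulo placement on integer keys is an assumed scheme chosen for brevity.

```python
class LogicalWarehouse:
    """Presents several physical stores as one warehouse (illustrative only)."""

    def __init__(self, node_count):
        # Each dict stands in for one physical data warehouse.
        self.nodes = [{} for _ in range(node_count)]

    def _node_for(self, key):
        # Assumed placement rule: integer key modulo the number of nodes.
        return self.nodes[key % len(self.nodes)]

    def put(self, key, value):
        self._node_for(key)[key] = value

    def get(self, key):
        return self._node_for(key)[key]

    def count(self):
        return sum(len(node) for node in self.nodes)


dw = LogicalWarehouse(node_count=3)
for order_id in range(6):
    dw.put(order_id, {"order_id": order_id})
```

Callers use `put` and `get` without knowing which physical store holds a given key, which is the sense in which the warehouse is logically one.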

 

Advantages of Distributed Data Warehouse –

One of the key advantages of a DDW is that it distributes data and analytic processing across many locations. This can improve performance and scalability and provide a more robust disaster recovery solution. A DDW can also allow finer security and privacy controls, as each location can have its own security measures in place. The main advantages are:

1.     Scalability and flexibility – A distributed data warehouse can be scaled up or down as needed, making it easier to handle fluctuating needs. For example, if more data needs to be stored, additional nodes can be added to the system.

2.     Improved performance – A distributed data warehouse can provide faster query processing times than a traditional data warehouse. This is because data is spread across multiple nodes, which can be processed in parallel.

3.     Availability – By storing data in multiple locations, a distributed data warehouse can remain available during an outage at one location. If the warehouse replicates data at more than one site, a crash or the failure of a communication link at one or more sites does not necessarily make the warehouse data inaccessible.

4.     Reduced costs – A distributed data warehouse can, in some cases, be less expensive to maintain than a centralized data warehouse since it may require fewer resources (e.g., hardware, software, and personnel).
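The performance point in the list above, that data spread across nodes can be processed in parallel, follows a scatter-gather pattern: each node computes a partial result over its own partition and a coordinator combines them. The sketch below uses Python's standard `concurrent.futures.ThreadPoolExecutor`; the threads only stand in for separate nodes, and the partition contents are made up for the example.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical partitions of a sales table, one per node.
partitions = [
    [120, 80, 40],
    [200, 10],
    [55, 45, 100],
]


def partial_sum(partition):
    """Work done independently on one node."""
    return sum(partition)


# Scatter: each partition is aggregated by its own worker.
with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
    partials = list(pool.map(partial_sum, partitions))

# Gather: the coordinator combines the partial results.
total = sum(partials)
```

This works for aggregates such as sums and counts because partial results can be combined. Other aggregates, such as medians, need more than a simple merge step.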


Disadvantages of Distributed Data Warehouse[4] –

1.     Security – A DDW relies on a network, which widens the attack surface and can weaken security.

2.     Complexity – In a distributed data warehouse, data is stored at multiple sites, so it can be more difficult to manage, access, and maintain than a traditional data warehouse.

3.     Cost – A DDW may be distributed worldwide, so each site must have staff to maintain the system.

4.     Data integrity – There is a potential for data inconsistencies if different parts of the data warehouse are not updated at the same time. Performance can also decrease as data is spread across multiple locations, and communication between multiple sites can generate excessive network traffic.
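One simple way to detect the kind of inconsistency described in the last item is to compare a fingerprint of each site's copy of a fragment. The sketch below is an assumption-laden illustration rather than a prescribed method: it treats one site as the reference copy, hashes the rows with SHA-256 from Python's standard `hashlib`, and uses made-up site names and data.

```python
import hashlib
import json


def fingerprint(rows):
    """Order-independent hash of a fragment's rows."""
    canonical = json.dumps(sorted(rows, key=lambda r: r["id"]), sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def stale_sites(replicas, reference_site):
    """Return the sites whose copy differs from the reference copy."""
    expected = fingerprint(replicas[reference_site])
    return sorted(
        site for site, rows in replicas.items()
        if fingerprint(rows) != expected
    )


replicas = {
    "site_a": [{"id": 1, "qty": 5}, {"id": 2, "qty": 9}],
    "site_b": [{"id": 2, "qty": 9}, {"id": 1, "qty": 5}],  # same data, different order
    "site_c": [{"id": 1, "qty": 5}, {"id": 2, "qty": 7}],  # missed an update
}
out_of_sync = stale_sites(replicas, "site_a")
```

Comparing hashes instead of full fragments keeps the check cheap on the network, which matters given the traffic concern raised in the same item.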


Conclusion

This article reviews and summarizes distributed data warehouses, which are used to retrieve information and run queries quickly. It covers local and global data warehouses, illustrated with Figure 1, along with the benefits and drawbacks of the distributed approach. According to the reviewed literature, a distributed data warehouse is an effective approach to data warehousing.


References:

[1] Sagar Yeruva, Dr. P. V. Kumar, Dr. P. Padmanabham, “Distributed Warehouses: A Review on Design Methods and Recent Trends”, International Journal of Computers and Distributed Systems, Vol. 1, Issue 3, October 2012

[2] Shaweta, “A Review on Designing of Distributed Data Warehouse and New Trends in Distributed Data Warehousing”, International Journal of Computer Science and Information Technologies, Vol. 5 (2), 2014

[3] Abhay Kumar Agarwal, Neelendra Badal, “Parallel and Distributed, Data Warehouse and Data Mining: A Walk Through”, Journal of Information and Computational Science, Volume 9, September 2019

[4] Bindia, Jaspreet Kaur Sahiwal, “Agent Based Architecture in Distributed Data Warehousing”, International Journal of Scientific and Research Publications, Volume 2, Issue 5, May 2012

[5] Pujari, N., Day, J., Huq, F., and Hale, T. S., (2008) ‘A framework for an integrated distribution system optimization model’, Int. J. Log. Sys. And Mgmt., Vol. 6, No. 4.

[6] Ileana ŞTEFAN and Maricel POPA, “Distributed Database Design – Top-Down Design”, Volume 48, Number 1, 2007

 

 

 
