FOR COMPUTER SCIENCE ONLY

profilecftins69
ProjectDeliverable3DatabaseandDataWarehousingDesign.docx

Running head: DATABASE AND DATA WAREHOUSING DESIGN

DATABASE AND DATA WAREHOUSING DESIGN 11

Project Deliverable 3: Database and Data Warehousing Design

Section 1: Design Document

Charles Tinsley

Dr. Mark Cohen

CIS 599

August 13, 2019

Introduction

Data warehousing and relational databases have become very popular these days for corporations and businesses. It has become a standard for almost every company that intends to compete effectively in the current world to own this unique technology. Data from different sources such as sales and marketing departments, finance department, and human resources department in a company is collected, stored, and processed in a database. The operational systems used in these sources help to gather and disseminate data to the database. This data is then analyzed to produce reports that may assist the management in making strategic decisions. When a company makes decisions based on real data, then it can stay a step ahead of the competitors, among other benefits. It is important to note that database design plays a crucial role in determining the effectiveness of the flow of data in a system (Beynon-Davies, 2013). In our context, a relational database and data warehousing will be very effective in the storage and processing of data. Incorporating this technological solution into the information network will enable consolidation of all data from within and outside the company into one place hence making it easier to manage.

Need for the Relational Database and Data Warehousing

Since the majority of the business activities and operations in our company are online-based, then data warehouse will be one of the best options to establish. Information from throughout the organization will be aggregated into a single repository. The data collected can then be utilized later to analyze various business activities. Analysis of data in the data warehouse will help in revealing the status quo of the company since it based on facts (Van Aken et al., 2017).

A data warehouse is a central repository area for all the company's data which helps the management to access unique data that may not be found in other areas. The sources of data include the Enterprise Resource Planning (ERP) system, operating system, and the database will give distinct data in different formats. A data warehouse will help in the processing of this data into simple formats that will be easy to comprehend. The good thing about a data warehouse is that processing of data will be done in one centralized location instead of different operational systems.

Since our company receives data from many clients, sometimes making it hard and costly to manage it, then the data warehouse will provide a solution to this problem. The data warehouse will aggregate it to avail a database that is reportable and searchable. The data warehouse will also enable consistency of the data by establishing a similar process for all varieties of data. One of the major purposes for the establishment of a data warehouse will be to give more insight into the data aggregated into it. The insight from the information generated will be used in strategic planning, decision making, and supporting other activities of the management (Beynon-Davies, 2013). By taking into consideration the importance of the decision-making process in a company, then this project must be implemented as soon as possible. The insight from the data warehouse will be easy to understand by all people in the company from juniors to senior managers. Decisions made by the management will touch every department of the company; hence, information derived from the data warehouse will be reliable since the raw data will have been from all departments. When a data warehouse is not available, and every department is expected to generate its report, the data is independent of other departments hence not entirely reliable. This sometimes confuses the management during the interpretation of reports since different departments use distinct approaches and formats to process and analyze data. Data warehouse eliminates such confusion and all possible errors by generating consistent data in a centralized processing location.

A data warehouse enables storage and easy analysis of historical data based on past activities within or outside the company. In most cases, past performance enables accurate prediction of the future. Hence historical data should be treated as a vital part of the company. Many big organizations thrive on analysis of past data, which enables them to dominate in the market. Having access to prior data is one of the key aspects that allow big organizations to outdo startups. Such organizations use past data to analyze risks; hence, they know which direction to take at all times. This help to avoid or reduce the effects of certain unfavorable scenarios before they happen. Reports from past data enable organization management to make effective decisions. To ensure that the data that is received and stored in the database is reliable, then the right kind of information should be fed to it at all times (Cuzzocrea et al., 2017). All the data sources should be verified to ensure data integrity and avoid contamination.

A data warehouse will help in data mining; this involves processing and analysis of big amounts of data to produce patterns and establish the relationship between different sets of unrelated data. The data in the data warehouse originate from different sources and have different formats. The data also arrives in a raw and unstructured state, which is hard to interpret. The data in its original state cannot provide the expected insight; hence, it needs to be processed to be meaningful. Data mining process enables sorting out and processing of large chunks of data in the data warehouse to identify patterns and relationship among different sets of data (Cuzzocrea et al., 2017). For example, the relationship between sales and prices can be established by the use of a graph, table, or pattern. The main goal of data mining is to provide insights to assist in making accurate predictions. Data mining also helps identify problems in a company based on analyzed data. Various parameters are utilized to produce results during the data mining process enabled by data mining tools and techniques.

Data warehousing come in handy with data visualization tools. This is in addition to its data analytic capabilities. The data visualization tools enable a better presentation of the analyzed reports. After analyzing data, users need to view it to understand it. The process of presentation is, therefore, very important; hence, it should be made easy. In the past presentation of data was text-based data, but with the advancement in technology, this has changed. Data visualization tools are now used in the data warehouse to present data in a more easily understandable format. There is a dashboard where users can see the data information in various detailed versions such as infographics, geographic maps, spark lines, and heat maps (Beynon-Davies, 2013). Initially, texts were presented in simple bar graphs and charts, but today, more detailed and better visualization tools are continuing to dominate the market. The modern visualization tools offer a more effective solution for data analytics as they easily identify trends, patterns, and correlations in data. The data is manipulated by the tools within a short period to provide real-time graphical reports.

Data warehouses process data within a short period hence proves to be one of the best technology solutions to be preferred by any Internet-based companies that need to process a large amount of data. Speed will help to save time and cost that would otherwise have been incurred if data processing was done manually or using other slower means. The data warehouse will analyze large amounts of data at a faster rate than when human users are used. This technological solution will help in quick processing of queries during emergency meetings when the management wants a quick data processing. Most times in businesses emergencies do not have set times hence require quick feedback which can be facilitated by shorter data processing of the warehouse. The data warehousing has higher data processing capability than most of the ordinary operational systems; it should be a priority. The data warehouse also processes data without impartiality associated with human beings hence making reports from the more reliable and accurate.

A data warehouse is highly secure to store the company’s data. Confidential data such as personal data of clients, financial data, and sales data can be safeguarded in the data warehouse without unauthorized access that can lead to a data breach. With the high data breaches in the world resulting from hacking and illegal entry, every company should put in place security measures to protect its data (Elmasri, & Navathe, 2017). A data breach can be detrimental to the involved company in a variety of ways such as financial loss, brand tarnishing, and getting sued. Due to the centralized data processing state of the data warehouse, the security also becomes centralized. This is contrary to a situation where there are many entry and exit points in the database, making it difficult to monitor and control the access. The data warehouse can also enable encryption of the data stored in hence preventing unauthorized access.

Database Schema

A database schema helps to illustrate the logical design of the database. A database schema indicates a part of the database to be implemented. Components of the database, descriptions on data organization, attributes, and the relationship between different components are indicated on the database schema. Below is a schema for part of the database that will be ideal for our company.

Diagram 1: Database Schema

C:\Users\Bones\Documents\Mark\Write\employees-schema.png

The schema above is for the employee records for one of the company’s customers. The schema indicates the relationship between employees’ information and various components of the organization. The components of the organization indicated to be relating closely with employees’ records include titles, departments, salaries, and department managers.

Entity Relationship Diagram (ERD)

Diagram 2: Entity Relationship Diagram (ERD)

C:\Users\Bones\Documents\Mark\Write\emp.gif

The entity-relationship diagram above indicates the available tables of employees, projects, departments, and salaries. The table "works at" indicates a one-to-many relationship while table "works on" indicates a many-to-many relationship (Cuzzocrea et al., 2017).

Data Flow Diagram

The Data Flow Diagram (DFD) above indicates the setup for the system functionalities.

Diagram 3: Data Flow Diagram

Diagram 4: Flow of Data used in the Data Warehouse

Operational System

Sales

ETL

Extraction,

Transformation, and Loading

ERP Marketing

Data Warehouse

OLAP Server

CRM Procurement

Human Resources

SCM

R&D

Flat Files

Senior Management

Conclusion

In the current world, data warehouses and relational databases have become a standard requirement for all organizations that want to compete effectively. The market has become very competitive place hence the need for relying upon data analytics. Data analytics have experienced massive growth in recent years with high expectations of this continued growth in the coming years. Data warehousing and relational databases avail a good opportunity for organizations to benefit from data analytics, which will give them a competitive advantage and a chance for scalability in the future. Therefore, the implementation of these technologies should be prioritized by the management of any company.

References

Beynon-Davies, P. (2013). Database systems. Basingstoke: Macmillan.

Cuzzocrea, A., Darmont, J., & Mahboubi, H. (2017). Fragmenting very large XML data warehouses via k-means clustering algorithm. arXiv preprint arXiv:1701.02190.

Elmasri, R., & Navathe, S. (2017). Fundamentals of database systems. Pearson.

Van Aken, D., Pavlo, A., Gordon, G. J., & Zhang, B. (2017, May). Automatic database management system tuning through large-scale machine learning. In Proceedings of the 2017 ACM International Conference on Management of Data (pp. 1009-1024). ACM.