Technical Proposal

harleyteam34
TechnicalProposalPartII.docx

Running head: DATA WAREHOUSING 1

DATA WAREHOUSING 4

Final Project Part II Technical Proposal

Jamie Vaughan

Southern New Hampshire University

25 June 2018

DATA WAREHOUSING

Introduction

A lot of things need to be done for data consolidation to take place as it is not a simple process. It becomes more challenging when many organizations are involved. Factors such as timeline and resources must be dearly considered. Consolidation of a data warehouse is not a simple process. There is a lot which needs to be done in order to see the process bear fruits. The consolidation may be harder especially where we have more than one organization involved (Rainardi, 2013). Factors like resources, timeline and resources must be considered in the implementation plan.

Timeline

The time required for the implementation will depend on the created warehouse plan. The deliverables may include user and system documentation, interface document, system test sign off, staffing costs, test plan document, design documents, data warehouse reports, and test plan document. All of the deliverables serves a different function and is vital. The end user experience can also be considered as some of the deliverable because it is the most vital and it ascertains if the process was effective or not (Ariyachandra & Watson, 2014).

Resources

Various resources are required in consolidation of the data warehouse.

Sybase

A totally new site would need to be composed, created, and actualized to deal with the new debacle recuperation design and have the capacity to process a lot of information and increment the execution to profit the end client. Arrangement: An adaptable, versatile, and dependable information-rich distribution center creating speedier load times, quicker question results, and more far-reaching data sees all at a lower cost (Mannino & Walter, 2013).

IBM

These structural objectives address fundamental difficulties concerning database extensibility, flexibility, scalability, and portability. Each requires key decisions in striking a harmony between contending progression.

Resource

Associations today rely upon consistent access to corporate data assets for compelling business basic leadership. Be that as it may, for the IT work, adequately dealing with the sheer volume of information produced by the association consistently is demonstrating a difficult task. Numerous organizations confront mounting expenses and unpredictability guaranteeing the accessibility, security, and reinforcement of information held in various sources over the undertaking (Kimball, 2011).

Money (Crista)

This depends on the size of the business, which determines the approach that will be used in BI. Large companies can afford large amounts to fund a BI initiative as compared to the small or middle-sized ones.

Training

Training is highly required among the employees before the consolidated warehouse commences functioning. Firstly, they should be trained about SQL servers. Furthermore, they need to be training about Warehousing Data and BI techniques that covers significant concepts like Data visualization, data modeling, DW Architecture, ETL fundamentals and Erwin (Silverston et al., 2014). This will ensure smooth daily operations and the employees a better position to deliver as per the set goals. Failure to offer the required training may cause complexities and much inefficiency.

Security Policy

Access to the warehouse is restricted except only to the data warehouse managers and other key staffs in the field. Employees should not have access to records because this may act as a breach on privacy. I will work with the project team to determine the rights for the users. No employee should have access to the servers at all costs.

Data integration

Data integration involves coordination of different databases to ensure that they work together smoothly. Some data issues arise while trying to achieve homogenization of data. Combining date fields with various formats is one of the significant issues associated with data integration. It can be done by using the DATEADD() function (Jonathan, 2017). This function increases performance time and also one can go from integer to Date data type. While integrating data, one can be faced with an issue such as internal legacy systems varying in data format due to vast quantities of the data. This issue may be as a result of combining hierarchical databases with relational databases. In such a case, various actions are required in to cope with data heterogeneity. To tackle this issue of moving and fusing highly incompatible databases, one can use software that facilitates a series of data access routines that enable structured query languages to access DBM data files (Federal Highway Administration, 2017). These data file can be either relational or non-relational.

Another issue that could happen in the process of integration is bad data. The quality of data that is transferred is usually a common issue during data integration. Failing to clean up data before integration may result in serious data issues afterwards. It is essential to identify the wrong data and fix it in its source (Todd, 2016). According to (Todd, 2016), integrating bad data will cause the database to be corrupted and produce results that are of low quality.

Primary Keys

A primary key is an integral part of the database as it uniquely specifies a row in a relation table. The preferable primary key for this database will be the identity card number as everyone has a different one. Another appropriate primary key is the customers’ member numbers. The foreign key is used to refer to a primary key in another table (Ben, 2017). The appropriate foreign key will be the members’ work id. The members’ identity card number is related to their work id. An index helps a query in swiftly and efficiently accessing data from a database. The indexes for this database could be the sex of the members. Members can be separated according to their sex as it will make it easier for the query to scan through the database.

After merging the databases, it is essential to eliminate the duplicate data. The elimination is done by utilising software that will help in identifying the duplicate data and deleting it (Towerdata, 2013). Such software is Dedupe software.

References

Adendorff, M., & Armstrong, M. (2002). U.S. Patent Application No. 09/987,905.

Ariyachandra, T., & Watson, H. J. (2014). Which data warehouse architecture is most successful? Business Intelligence Journal

Ben (2017). What are the Difference Between a Primary Key vs Foreign Key? Retrieved from https://www.databasestar.com/primary-key-vs-foreign-key/

Berson, A., & Smith, S. J. (1997). Data warehousing, data mining, and OLAP. McGraw-Hill, Inc.

Federal Highway Administration (2017). Challenges to Data Integration. U.S. Department of

Transportation.

Jonathan Drummey (2017). Combine Two Data Fields into Date Format. Retrieved from

https://community.tableau.com/message/288410#288410

Jonsson, A., &Malmgren, R. Security and security controls in operating systems.

Kimball, R., & Ross, M. (2011). The data warehouse toolkit: the complete guide to dimensional modeling. John Wiley & Sons.

Mannino, M. V., & Walter, Z. (2013). A framework for data warehouse refreshes policies. Decision Support Systems

Rainardi, V. (2013). Data warehouse architecture. Building a Data Warehouse: With Examples in SQL Server,

Silverston, L., Inmon, W. H., & Graziano, K. (2014). The data model resource book: a library of logical data models and data warehouse designs. John Wiley & Sons

Todd Hinton (2016). 4 Ways to Solve Data Quality Issues. Retrieved from

https://www.redpointglobal.com/blog/4-ways-solve-data-quality-issues/

Towerdata (2013). Top Reasons for Duplicate Data (And 3 Techniques to fix it). Retrieved from

https://www.towerdata.com/blog/bid/114711/Top-Reasons-for-Duplicate-And-3-

Techniques-to-Fix-It

Widom, J. (1995, December). Research problems in data warehousing. In Proceedings of the fourth international conference on Information and knowledge management (pp. 25-30). ACM.