Advanced SQL
1
RUNNING HEAD: Normalization
2
Normalization
NORMALIZATION
Charles Williams
CS352 Unit 3 IP
Professor Jeffery Karlberg
Table of Contents Table of Contents………………………………………………………………………………………………………………………………………………………………………………………………..….2 The Database Models, Languages, and Architecture 3 Database System Development Life Cycle 5 Database Management Systems 6 Advanced SQL 11 Web and Data Warehousing and Mining in the Business World 12 References 13
Database management
It is important that a formal design methodology is used as it provides a mathematical approach to coming up with a reliable database that consolidates all the environments that use the database. A design methodology helps as it provides a way in which the whole designing and development can be done with minimum errors. The design methodology helps in identifying the requirements, the specifications and design levels of the database and data warehouse up for development. The planning stage of the consolidated data base is very important as it involves the coming up with plans that will guide the development of the database (Mabogunje, 2015). The plans help in managing quality, time, risks and other related issues that might affect the design and development of the database and eventually the data warehouse.
The three layers of the 3-level ANSI-SPARC architecture include; a physical schema which is responsible for defining how data is to be stored, a conceptual schema which is responsible for indexing and relating data, and the external schema which is responsible for showing how information was presented. The 3-level ANSI-SPARC type of architecture is designed to guard and guide data change. The primary function of the first layer is to define how data is stored. It is important to note that there can be changes in the physical schema and the changes will not affect how external applications will interact with the stored data (Pokorný, 2018). The second layer’s primary function is to provide a consolidated view of a database. The third layer’s primary function is to define richer APIs and it can do so without necessarily having to change the underlying storage mechanisms in place. The 3-level ANSI-SPARC architecture helps in promoting data independence which in turn helps save time in the long run through the conceptual schema which emphasizes data mapping.
Data administrator and database administrator
A data administrator is an individual whose function is to gather data requirements, analyze data as well as design data and classify data types. They two primary roles of a data administrator include; coming up with data standards that will be applied in databases, and coming up with policies that will dictate on data security, data access, data usage, dataflow well as data authorization in an organization. Other minor duties of data administrator include; playing an assistance role by coming up with data resources and allowing for the sharing of data across applications as well as across departments (Tenopir et al., 2015). Basically, Data administrators are primary deal with data flows and data needs in an organization.
On the other hand, a database administrator is an IT officer like individual whose primary responsibilities include; configuring and installing databases and database management systems, linking databases to information systems to help in application operations, ensuring that the database management systems in place are running optimally, checking and monitoring the database performance. The minor duties of a database administrator include ensuring data integrity and ensuring that appropriate data backups are made. (Yilmaz, O'Connor & Clarke, 2015). Basically, a database administrator is primarily supposed to oversee running and operations of database management systems.
I would recommend that my company hires two people, one as a data administrator and the other as a database administrator. The role of the data administrator will be to analyse and organize the data while the role of the database administrator will be to oversee the operations and functions of the database.
Database System Development Life Cycle
Un-normalized form.
|
Charity ID |
Charity Name |
Charity Location |
POC Name |
POC ID |
Tel Extn. |
Customer ID |
Customer Name |
Date Contribution Started |
No of Month |
Date Places |
Expected Contribution End |
The above table is an un-normalized relation. First I will list all the attributes, identify the repeating groups then identify the key attributes.
The attributes are; charity ID, charity name, charity location, POC ID, Tel Extn, customer ID, customer name, date contribution started, no of month, date places and expected contribution end.
The key attributes are the POC ID, the customer ID and the charity ID.
The repeating groups are Date Contribution started, Date Places, No of Month, Charity Location and Expected Contribution End.
1st Normal form
Any relation can be said to be in the first (basic) normal form only if it has no repeating groups. To make a relation to be on the first normal form, first you remove all the repeating groups and then do a partitioning to the un-normalized relation above as shown below;
|
Charity ID |
Charity Name |
Charity Location |
POC Name |
Tel Extn. |
Customer Name |
Date Contribution Started |
No of Month |
Date Places |
Expected Contribution End |
|
POC ID |
Charity Name |
Charity Location |
POC Name |
Tel Extn. |
Customer Name |
Date Contribution Started |
No of Month |
Date Places |
Expected Contribution End |
|
Customer ID |
Charity Name |
Charity Location |
POC Name |
Tel Extn. |
Customer Name |
Date Contribution Started |
No of Month |
Date Places |
Expected Contribution End |
Before moving to the second normal form, one has to understand the dependencies in the relations, the functional, the partial and the transitive dependencies.
2nd Normal form.
A relation can be on the 2nd normal form only if it is in the first normal form. Each and every non-key attribute is dependent on the key attribute.
To achieve the second normal form for a relation, one should remove any partial dependencies that are segregating the relation.
N.B any relation that is in First Normal For and does not have a composite key is definitely in Second Normal Form.
|
Charity ID |
Charity Name |
Charity Location |
Date Contribution Started |
No of Month |
Date Places |
Expected Contribution End |
|
POC ID |
POC Name |
Tel Extn. |
Date Contribution Started |
No of Month |
Date Places |
Expected Contribution End |
|
Customer ID |
Customer Name |
Date Contribution Started |
No of Month |
Date Places |
Expected Contribution End |
3rd Normal form.
Any relation is said to be on the third normal form if it is in the second normal form and each and every non-key attributes is not dependent on all the other non-key attributes.
To achieve this. One has to remove any non-key attribute that are dependent on other non-key attributes. Any transitive dependencies should be removed (Gaikwad et.al, 2017).
Charity_Details.
|
Charity ID |
Charity Name |
Charity Location |
POC_Details.
|
POC ID |
POC Name |
Tel Extn. |
Customer_Details
|
Customer ID |
Customer Name |
Date_Details
|
Date Contribution Started |
No of Month |
Date Places
|
Expected Contribution End |
Charity ID |
POC ID |
Customer ID |
Boyce-Codd normal form.
We all know that Boyce-Codd Normal Form is an extension of the Third Normal Form and it states that for each non-trivial functional dependency, Y->A, Y has to be a super-key, for example in the relation below;
|
Customer ID |
Customer Name |
In the customer relation, customer ID is the super key in the Customer_Details relation.
In rationalization of the results achieved, the Charity ID, the POC ID and the Customer_ ID are the foreign keys.
Normalization is through and ready for designing of a high-quality database that has a better degree of independence (Eessar, 2016).
|
Vendor |
|
|
|
|
|
|
Vendor |
|
|
|
|
|
|
Vendor |
|
|
|
|
|
|
Product Version |
MySQL |
|
|
|
|
|
Product Version |
Oracle 12c, Enterprise |
|
|
|
Product Version |
Microsoft SQL Server |
|
|
||||
|
Desired Feature |
Comments |
Rating |
Weight |
Score |
|
|
Desired Feature |
Comments |
Rating |
Weight |
Score |
|
|
Desired Feature |
Comments |
Rating |
Weight |
Score |
|
|
Vendor Stability |
stable |
3.5 |
0.1 |
0.3 |
|
|
Vendor Stability |
stable |
3.6 |
0.1 |
0.36 |
|
|
Vendor Stability |
stable |
4.5 |
0.1 |
0.45 |
|
|
Operating System Choice and Requirements |
IntelCore |
4 |
0.05 |
0.2 |
|
|
Operating System Choice and Requirements |
Linux |
4.5 |
0.05 |
0.225 |
|
|
Operating System Choice and Requirements |
Linux, Microsoft Windows Server, Microsoft Windows |
4 |
0.05 |
0.2 |
|
|
Ease of Use/Training Requirements |
Easy |
4.5 |
0.2 |
0.9 |
|
|
Ease of Use/Training Requirements |
Hard |
4 |
0.2 |
0.8 |
|
|
Ease of Use/Training Requirements |
Easy |
3 |
0.2 |
0.6 |
|
|
Ease of Data Model Creation |
Easy |
3 |
0.15 |
0.45 |
|
|
Ease of Data Model Creation |
Easy |
4.5 |
0.15 |
0.675 |
|
|
Ease of Data Model Creation |
Easy |
4.5 |
0.15 |
0.675 |
|
|
Distributed Database Support |
No |
2 |
0.1 |
0.2 |
|
|
Distributed Database Support |
Yes |
4.5 |
0.1 |
0.45 |
|
|
Distributed Database Support |
Yes |
4.5 |
0.1 |
0.45 |
|
|
Administration Ease |
Easy |
4 |
0.2 |
0.8 |
|
|
Administration Ease |
Moderate |
4 |
0.2 |
0.8 |
|
|
Administration Ease |
Easy |
3 |
0.2 |
0.6 |
|
|
Cost |
$5,500 |
4.5 |
0.2 |
0.9 |
|
|
Cost |
$47,500 |
3.5 |
0.2 |
0.7 |
|
|
Cost |
$28,000 |
3 |
0.2 |
0.6 |
|
|
Total |
|
|
|
3.75 |
|
|
Total |
|
|
|
4.01 |
|
|
Total |
|
|
|
3.575 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
3.75 out of 5 |
|
|
|
|
|
4.01 out of 5 |
|
|
|
|
|
3.575 out of 5 |
|
|
|
|
|||
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Advanced SQL(TBD)
Web and Data Warehousing and Mining in the Business World(TBD)
References
Eessaar, E. (2016). The database normalization theory and the theory of normalized systems: finding a common ground. Baltic Journal of Modern Computing, 4(1), 5.
Gaikwad, A. S., Kadri, F. A., Khandagle, S. S., & Tava, N. I. (2017). Review on Automation Tool for ERD Normalization. International Research Journal of Engineering and Technology (IRJET)[Online], 4(2), 1323-1325.
Yilmaz, M., O'Connor, R. V., & Clarke, P. (2015). Software development roles: a multi-project empirical investigation. ACM SIGSOFT Software Engineering Notes, 40(1), 1-5.
Tenopir, C., Hughes, D., Allard, S., Frame, M., Birch, B., Baird, L., ... & Lundeen, A. (2015). Research data services in academic libraries: Data intensive roles for the future?. Journal of eScience Librarianship, 4(2), 4.
Pokorný, J. (2018, March). Integration of Relational and NoSQL Databases. In Asian Conference on Intelligent Information and Database Systems (pp. 35-45). Springer, Cham.
Mabogunje, A. (2015). The development process: A spatial perspective. Routledge.