paper
Chapter 9
Data Design
1
Learning Objectives
After this chapter, you will be able to:
Explain basic data design concepts, including data structures, DBMSs, and the evolution of the relational database model
Explain the main components of a DBMS
Define data design terminology
Draw entity-relationship diagrams
Apply data normalization
Explain data storage tools and techniques, including logical versus physical storage
Explain data control measures
IST 603 - Chapter 9
2
2
Datatype: elements
Every field in a database or a record of a file has a datatype
We called these data elements
Data elements can have a number of data types
IST 603 - Chapter 9
3
| Datatype | Description |
| Text or Char | Hold alphanumeric data. Are not usually involved in calculations |
| Numeric | Holds numeric data such as integers, singles and doubles |
| Currency | Used for money fields |
| Date | Holds date and time data |
| Boolean | Hold only two values: true, false |
3
Datatype: Structures or Records
Holds all information about one item or subject
Table shows one record for an account receivable file (the column headings are for illustration only)
If there are 50 account receivables, there will be 50 records in the file
Each record (or row) represents a different account receivable
The file will be a collection of such records
IST 603 - Chapter 9
4
| Customer Number | Due Date | Balance Due |
| 109823497 | 11-31-2019 | 86.52 |
4
Potential Issues with File Storage
IST 603 - Chapter 9
5
Storing data in files can lead to redundant data storage as well as anomalies in the data
Example discussed using Mario’s data files
Mario’s auto shop uses two separate file-oriented systems
MECHANIC SYSTEM uses the MECHANIC file to store data about shop employees
JOB SYSTEM uses the JOB file to store data about work performed at the shop
Certain data must be entered twice.: examples – Mechanic no., Name, and Pay Rate
This redundancy is inefficient and can produce data errors
Example: a mechanic changed his/her address, the change has to be implemented in all files where mechanic data exists
5
Database Storage
IST 603 - Chapter 9
.
6
Danica uses a database to store her shop operation’s data
Database design avoids duplication.
The data can be viewed as if it were one large table, regardless of where the data is stored physically
6
DBMS, Database, Interaction with End users
IST 603 - Chapter 9
7
7
DBMS – Database Management System
A collection of programs that manages the database structure and controls access to the data stored in the database
A database system is composed of five major parts
Hardware – all physical devices including computers, storage devices, printers, network devices, and other devices as necessary. These are outside our scope of discussion.
Software
Operating System such as MS Windows, UNIX, LINUX – not in our discussion scope
DBMS Software – manages the database within the database system. Examples: MS SQL Server, Oracle, MySqL, DB2
Application Programs and utility software
IST 603 - Chapter 9
8
8
DBMS – Database Management System
People:
System Administrators: responsible for underlying operating systems, network, and physical devices
Database Administrators: Responsible for DBMS management and support including security, access policies, and space allocation for data
Database Designers, End Users
Procedures – All instructions and rules that govern the design and use of the database system
Data – collection of data stored in the database
Schema - Descriptions of all fields, tables, and relationships
Subschema: database portions that a particular system or user needs or is allowed to access
IST 603 - Chapter 9
9
9
Database : An example logical view
IST 603 - Chapter 9
In this example, a sales database can support four separate business systems.
10
10
Database Types
Database can be organized using different structures
Hierarchical model – data is organized as a hierarchical tree
Network model – data is organized as a set of interconnected network
Relational model – data is organized into relational tables with relationships among them established through key attributes
Entity Relationship model – discussed later in this chapter
Object-oriented model – uses classes and their relationships to organize data
Big Data - very large data collection. Also called NoSQL since execution of SQL (Structured Query Language) queries prove too slow for very large databases
IST 603 - Chapter 9
11
11
Database Types: Hierarchical database
IST 603 - Chapter 9
12
A hierarchical database is a DBMS that represents data in a tree-like form. The relationship between records is one-to-many. That means, one parent node can have many child nodes. A hierarchical database model is a data model where data is stored as records but linked in a tree-like structure with the help of a parent and level. Each record has only one parent. The first record of the data model is a root record
12
Database Types: Hierarchical database
IST 603 - Chapter 9
13
| <>Person Table | |||
| <>ID | Role | Description | ParentID |
| 1 | Author | Writes, speaks, trains | 0 |
| Authors Table | |||
| ID | Name | Description | ParentID |
| 101 | Allen O'Neill | Author writes on AI | 1 |
| 102 | Mahesh Chand | Author writes on C# | 1 |
| 103 | David McCarter | Author writes on .NET | 1 |
| 104 | Raj Kumar | Author writes on AWS | 1 |
| Books Table | |||
| ID | Topic | Title | ParentID |
| 1001 | ADO.NET | ADO.NET Programming | 102 |
| 1002 | GDI+ | Programming GDI+ | 102 |
| 1003 | C# | Learn C# 8.0 | 102 |
| 1004 |
13
Database Types: Network database
IST 603 - Chapter 9
14
The diagram represents a network data model that shows that the Stores entity has relationships with multiple child entities
The Transactions entity has relationships with multiple parent entities.
A network database model allows one parent to have multiple child record sets and each record set can be linked to multiple nodes (parents) and children.
14
Entity Relationship Model
Entity: a person, place, thing, or event for which data is collected and maintained
An entity has a set of attributes that describe particular characteristics of an entity
Employee: attributes are Social Security Number, Name, Gender
A Table or file contains a set of related records that store data about a specific entity
An Employee table has records about all employees
Field (i.e., attribute): single characteristic or fact about an entity
Record (i.e., tuple): set of related fields that describes one instance of an entity
IST 603 - Chapter 9
15
15
Entity-Relationship Diagrams
Drawing an ERD
List the entities that were identified during the systems analysis phase
Consider the nature of the relationships that link them
IST 603 - Chapter 9
In an entity-relationship diagram, entities are labeled with singular nouns and relationships are labeled with verbs. The relationship is interpreted as a simple English sentence.
16
16
Entity-Relationship Diagrams
Types of relationships
One-to-one (1:1) relationship: exactly one of the second entity occurs for each instance of the first entity
IST 603 - Chapter 9
Examples of one-to-one (1:1) relationships.
17
17
Entity-Relationship Diagrams
One-to-many relationship (1:M): one occurrence of the first entity can relate to many instances of the second entity
Each instance of the second entity can associate with only one instance of the first entity
IST 603 - Chapter 9
Examples of one-to-many (1:M) relationships.
18
18
Entity-Relationship Diagrams
Many-to-many relationship (M:N): one instance of the first entity can relate to many instances of the second entity, and vice versa
IST 603 - Chapter 9
Examples of many-to-many (M:N) relationships. Notice that the event or transaction that links the two entities is an associative entity with its own set of attributes and characteristics.
19
19
Entity-Relationship Diagrams
IST 603 - Chapter 9
An entity-relationship diagram for SALES REP, CUSTOMER, ORDER, PRODUCT, and WAREHOUSE. Notice that the ORDER and PRODUCT entities are joined by an associative entity named ORDER LINE.
20
20
Entity-Relationship Diagrams
Cardinality
Describes numeric relationship between two entities
Shows how instances of one entity relate to instances of another entity
Crow’s foot notation indicates various possibilities using circles, bars, and symbols
IST 603 - Chapter 9
Crow’s foot notation is a common method of indicating cardinality. The four examples show how you can use various symbols to describe the relationships between entities.
21
21
Entity-Relationship Diagrams
IST 603 - Chapter 9
In the first example of cardinality notation, one and only one CUSTOMER can place anywhere from zero to many of the ORDER entity. In the second example, one and only one ORDER can include one ITEM ORDERED or many. In the third example, one and only one EMPLOYEE can have one SPOUSE or none. In the fourth example, one EMPLOYEE, or many employees, or none, can be assigned to one PROJECT, or many projects, or none.
22
22
Entity-Relationship Diagrams
IST 603 - Chapter 9
An ERD for a library system drawn with Visible Analyst. Notice that crow’s foot notation has been used and relationships are described in both directions.
23
23
Relational Database Model
A relational model is based on three well-defined components:
A logical data structure represented by a relation
A set of integrity rules to enforce that data remain consistent over time
A set of operations that define how data are manipulated
The concept of relation is based on predicate logic
End user think of a relation as a table – a two-dimensional structure composed of rows and columns
IST 603 - Chapter 9
24
24
Characteristics of a Relational Table
| 1 | A table is a two-dimensional structure composed of rows and columns |
| 2 | Each table row (tuple) represents a single entity occurrence within the entity set |
| 3 | Each table column represents an attribute, and each column has a distinct name |
| 4 | Each intersection of a row and column represents a single data value |
| 5 | Each column has a specific range of values known as the attribute domain |
| 6 | All values in a column must conform to the same data format |
| 7 | The order of the rows and columns is immaterial to the DBMS |
| 8 | Each table must have an attribute or combination of attributes that uniquely identifies each row |
IST 603 - Chapter 9
25
25
Not a Relational Table
The table above shows vendor data for a company
This table does not conform to a relational structure
All column and row intersection do not have the same data format
Phone information is a repeated value violating the rule of single data value for a row and column intersection
IST 603 - Chapter 9
26
| VEND_CODE | VEND_CONTACT | VEND_AREACODE | VEND_PHONE |
| 230 | Shelly K. Smithson | 606 | 5551234 |
| 231 | James Johnson | 615 1234536 619 7612231 | 1234536 |
| 619 | 7612231 | ||
| 232 | Candice Wallace | 904 | 3426567 |
| 233 | Arthur Jones | 615 | 1233324 |
26
A Relational Table
This vendor table now conforms to a relational structure
VEND_CODE is an unique identifier and uniquely determines the rest of the record
VEND_CODE is the Primary Key to this relation
VEND_CODE (VEND_CONTACT, VEND_AREACODE, VEND_PHONE)
VEND_CODE functionally determines the values of other attributes
IST 603 - Chapter 9
27
| VEND_CODE | VEND_CONTACT | VEND_AREACODE | VEND_PHONE |
| 230 | Shelly K. Smithson | 606 | 5551234 |
| 231 | James Johnson | 615 | 1234536 |
| 232 | Candice Wallace | 904 | 3426567 |
| 233 | Arthur Jones | 615 | 1233324 |
27
Data Design Terms
Key fields
Primary key: field or fields that uniquely and minimally identifies a member of an entity
Candidate key: field that could serve as a primary key
Foreign key: field in one table that must match a primary key value in another table for a relationship between the two tables to exist
Secondary key: field or combination of fields that can be used to access or retrieve records
Entity Integrity – Primary Key value must be unique and no key attribute in the primary key can contain a null
IST 603 - Chapter 9
28
28
Data Design Terms
Referential integrity
Set of rules that avoids data inconsistency and quality problems
The integrity requirement can be specified when creating the database structure
IST 603 - Chapter 9
Microsoft Access allows a user to specify that referential integrity rules will be enforced in a relational database design.
29
29
Data Normalization
A good relational database software is not enough to avoid data redundancy
Normalization is a process for evaluating and correcting table structure to minimize data redundancy
Normalization stages
Unnormalized design
First normal form
Second normal form
Third normal form
IST 603 - Chapter 9
30
30
Data Normalization
Standard notation format
Used to show a table’s structure, fields, and primary key
The primary key field(s) is underlined
NAME (FIELD 1, FIELD 2, FIELD 3)
Recognition of repeating group fields is important
Repeating group: set of one or more fields that can occur any number of times in a single record
Each occurrence would possess different values
IST 603 - Chapter 9
31
31
Unnormalized Data
IST 603 - Chapter 9
In the ORDER table design, two orders have repeating groups that contain several products. ORDER is the primary key for the ORDER table, and PRODUCT NUMBER serves as a primary key for the repeating group. Because it contains repeating groups, the ORDER table is unnormalized.
32
32
First Normal Form (1NF)
First Normal Form (1NF)
Table format
Does not contain a repeating group
Primary Key (PK) identified
Converting an unnormalized design to 1NF requires expansion of the table’s primary key to include the primary key of the repeating group
IST 603 - Chapter 9
33
33
First Normal Form (1NF)
IST 603 - Chapter 9
The ORDER table as it appears in 1NF. The repeating groups have been eliminated. Notice that the repeating group for order 86223 has become three separate records, and the repeating group for order 86390 has become two separate records. The 1NF primary key is a combination of ORDER and PRODUCT NUMBER, which uniquely identifies each record.
34
34
Second Normal Form (2NF)
A table (relation) is automatically in the Second Normal Form if it has a single key attribute
A table needs to be in 1NF before being considered for 2NF
If a table has a composite key, there may be partial dependency as in the following example
The order table has a composite key: ORDER + PRODUCT NUMBER
Using ORDER, the dependent value of ORDER DATE can be determined.
Using PRODUCT NUMBER, DESCRIPTION can be determined
To convert this table to 2NF, the partial dependency need to be removed as shown in the next slide
IST 603 - Chapter 9
35
35
Data Normalization
IST 603 - Chapter 9
ORDER, PRODUCT, and ORDER LINE tables in 2NF. All fields are functionally dependent on the primary key.
36
36
Third Normal Form (3NF)
A table (relation) is in the Third Normal Form if it has no transitive dependency and satisfies the requirements of 2NF
Suppose the following three tables exist
PROJECT ( PROJ_NUM, PROJ_NAME)
EMPLOYEE( EMP_NUM, EMP_NAME, JOB_CLASS, CHG_HOUR)
ASSIGNMENT( PROJ_NUM, EMP_NUM, ASSIGN_HOURS)
Dependencies
PROJ_NUM PROJ_NAME
EMP_NUM EMP_NAME, JOB_CLASS
JOB_CLASS CHG_HOUR
EMPLOYEE table has transitive dependency, hence not in 3NF
IST 603 - Chapter 9
37
37
Third Normal Form (3NF)
The set of tables shown in the previous slide can be converted to 3NF by removing transitive dependency
PROJECT ( PROJ_NUM, PROJ_NAME)
EMPLOYEE( EMP_NUM, EMP_NAME, JOB_CLASS)
JOB( JOB_CLASS, CHG_HOUR)
ASSIGNMENT( PROJ_NUM, EMP_NUM, ASSIGN_HOURS)
All tables are now in 3NF
IST 603 - Chapter 9
38
38
Data Normalization
IST 603 - Chapter 9
When the PRODUCT table is transformed from 2NF to 3F
The result is two separate tables:
PRODUCT and SUPPLIER.
Note that in 3NF, all fields depend on the key, the whole key, and nothing but the key!
The shown structure works so long as there is exactly one supplier for a product
39
39
Creating Databases and Accessing Data
Databases are created and managed by using languages and commands that have nothing to do with HTML
For example, MS SQL Server comes with a tool called SQL Sever Management Studio (SSMS)
This tool can be used to create databases, tables and retrieve data
CREATE DATABASE company; Creates a database CREATE TABLE employee (EMPNO int PRIMARY KEY, FIRSTNAME varchar(20), HIREDATE date, GENDER char(1), SALARY decimal);
Two statements shown here are generally called DDL (Data Definition Language) statements.
Retrieval of data can be done using DML (Data Manipulation Language) statements.
SQL (Structured Query Language) – industry standard DDL and DML
SELECT * FROM employee; Get all records with all fields from the employee table
IST 603 - Chapter 9
40
40
Web-Based Design
Databases can be created and managed through web pages
Web interaction with data is not limited to HTML
Databases can be created using programming languages such as C#, Java
Middleware is used to integrate different applications and allow them to exchange data
Web-based data must be secure, yet easily accessible to authorized users
IST 603 - Chapter 9
41
41
Web-Based Design
IST 603 - Chapter 9
When a client workstation requests a Web page (1), the Web server uses middleware to generate a data query to the database server (2). The database server responds (3), and middleware translates the retrieved data into an HTML page that can be sent by the Web server and displayed by the user’s browser (4).
42
42
Benefits of using a DBMS
Economy of Scale
Better utilization of data
Can use enterprise-wide databases as in Enterprise Systems such as SAP, and Oracle
Can be deployed using powerful servers and fast networks
Enterprise-wide application – The DBA maintains and manages the database for all departments, not one department
Stronger Standards – Standards for accessing, updating, and deleting data can be specified upfront with all users expected to comply
Better Security – Different level of access can and are often defined
Data Independence – Systems that interact with a DBMS are relatively independent of how the physical data is stored
IST 603 - Chapter 9
43
43
Data Storage and Access
Tools and techniques
Data warehouse
An integrated collection of data that can include seemingly unrelated information, no matter where it is stored in the company
Data mart
Designed to serve needs of a specific department
IST 603 - Chapter 9
44
44
Data Storage and Access
IST 603 - Chapter 9
A data warehouse stores data from several systems. By selecting data dimensions, a user can retrieve specific information without having to know how or where the data is stored.
45
45
Data Storage and Access
Data mining
Looks for meaningful data patterns and relationships
Suggested goals for data mining
Increase the number of pages viewed per session and referred customers
Reduce clicks to close
Increase checkouts per visit and average profit per checkout
IST 603 - Chapter 9
46
46
Data Storage and Access
Logical versus physical storage
Logical storage: data that a user can view, understand, and access, regardless of how or where that information actually is organized or stored
Physical storage: strictly hardware related
Involves the process of reading and writing binary data to physical media
IST 603 - Chapter 9
47
47
Data Storage and Access
Data coding
EBCDIC (Extended Binary Coded Decimal Interchange Code)
Mainframe computers and high-capacity servers
ASCII (American Standard Code for Information Interchange)
Used on most personal computers
Binary storage format
Represents numbers as actual binary values
Unicode: uses two bytes per character
IST 603 - Chapter 9
48
48
Data Storage and Access
IST 603 - Chapter 9
Unicode is an international coding format that represents characters as integers, using 16 bits per character. The Unicode Consortium maintains standards and support for Unicode.
49
49
Data Storage and Access
Storing dates
International Organization for Standardization (ISO) requires a format of four digits for the year, two for the month, and two for the day (YYYYMMDD)
Absolute date: total number of days from some specific base date
IST 603 - Chapter 9
50
50
Data Control
Well-designed DBMS
Must provide built-in control and security
Forms of data protection
Limited access to files and databases
User ID and passwords
Permissions and encryption
Backup copies of databases must be retained for a specified period of time
Recovery procedures can be used to restore
IST 603 - Chapter 9
51
51
Summary
A database consists of linked tables that form an overall data structure
DBMS enable users to add, update, manage, access, and analyze data in a database
DBMS designs are more powerful and flexible than traditional file-oriented systems
Components include interfaces for users, database administrators, and related systems
IST 603 - Chapter 9
52
52
Summary
Key fields include primary keys, candidate keys, foreign keys, and secondary keys
An entity-relationship diagram (ERD) is a graphic representation of all system entities and the relationships among them
Normalization is a process for avoiding problems in data design
IST 603 - Chapter 9
53
53
Summary
Logical storage is information seen through a user’s eyes, regardless of how or where that information actually is organized or stored
File and database control measures include limiting access to the data, data encryption, backup/recovery procedures, audit-trail files, and internal audit fields
IST 603 - Chapter 9
54
54
DBMS (Database management system)MetadataCustomersInvoicesProductsDatabase structureEnd-user dataSingleView of DataIntegratedApplication requestDataDataApplication request
Carlos CoronelAuthorSteven MorrisAlan DennisDatabase SystemsSystems Analysis
ORDER ORDER DATE PRODUCT NUMBER DESCRIPTION SUPPLIER NUMBER SUPPLIER NAME