paper

profilesand
Chapter09.pptx

Chapter 9

Data Design

1

Learning Objectives

After this chapter, you will be able to:

Explain basic data design concepts, including data structures, DBMSs, and the evolution of the relational database model

Explain the main components of a DBMS

Define data design terminology

Draw entity-relationship diagrams

Apply data normalization

Explain data storage tools and techniques, including logical versus physical storage

Explain data control measures

IST 603 - Chapter 9

2

2

Datatype: elements

Every field in a database or a record of a file has a datatype

We called these data elements

Data elements can have a number of data types

IST 603 - Chapter 9

3

Datatype Description
Text or Char Hold alphanumeric data. Are not usually involved in calculations
Numeric Holds numeric data such as integers, singles and doubles
Currency Used for money fields
Date Holds date and time data
Boolean Hold only two values: true, false

3

Datatype: Structures or Records

Holds all information about one item or subject

Table shows one record for an account receivable file (the column headings are for illustration only)

If there are 50 account receivables, there will be 50 records in the file

Each record (or row) represents a different account receivable

The file will be a collection of such records

IST 603 - Chapter 9

4

Customer Number Due Date Balance Due
109823497 11-31-2019 86.52

4

Potential Issues with File Storage

IST 603 - Chapter 9

5

Storing data in files can lead to redundant data storage as well as anomalies in the data

Example discussed using Mario’s data files

Mario’s auto shop uses two separate file-oriented systems

MECHANIC SYSTEM uses the MECHANIC file to store data about shop employees

JOB SYSTEM uses the JOB file to store data about work performed at the shop

Certain data must be entered twice.: examples – Mechanic no., Name, and Pay Rate

This redundancy is inefficient and can produce data errors

Example: a mechanic changed his/her address, the change has to be implemented in all files where mechanic data exists

5

Database Storage

IST 603 - Chapter 9

.

6

Danica uses a database to store her shop operation’s data

Database design avoids duplication.

The data can be viewed as if it were one large table, regardless of where the data is stored physically

6

DBMS, Database, Interaction with End users

IST 603 - Chapter 9

7

7

DBMS – Database Management System

A collection of programs that manages the database structure and controls access to the data stored in the database

A database system is composed of five major parts

Hardware – all physical devices including computers, storage devices, printers, network devices, and other devices as necessary. These are outside our scope of discussion.

Software

Operating System such as MS Windows, UNIX, LINUX – not in our discussion scope

DBMS Software – manages the database within the database system. Examples: MS SQL Server, Oracle, MySqL, DB2

Application Programs and utility software

IST 603 - Chapter 9

8

8

DBMS – Database Management System

People:

System Administrators: responsible for underlying operating systems, network, and physical devices

Database Administrators: Responsible for DBMS management and support including security, access policies, and space allocation for data

Database Designers, End Users

Procedures – All instructions and rules that govern the design and use of the database system

Data – collection of data stored in the database

Schema - Descriptions of all fields, tables, and relationships

Subschema: database portions that a particular system or user needs or is allowed to access

IST 603 - Chapter 9

9

9

Database : An example logical view

IST 603 - Chapter 9

In this example, a sales database can support four separate business systems.

10

10

Database Types

Database can be organized using different structures

Hierarchical model – data is organized as a hierarchical tree

Network model – data is organized as a set of interconnected network

Relational model – data is organized into relational tables with relationships among them established through key attributes

Entity Relationship model – discussed later in this chapter

Object-oriented model – uses classes and their relationships to organize data

Big Data - very large data collection. Also called NoSQL since execution of SQL (Structured Query Language) queries prove too slow for very large databases

IST 603 - Chapter 9

11

11

Database Types: Hierarchical database

IST 603 - Chapter 9

12

A hierarchical database is a DBMS that represents data in a tree-like form. The relationship between records is one-to-many. That means, one parent node can have many child nodes. A hierarchical database model is a data model where data is stored as records but linked in a tree-like structure with the help of a parent and level. Each record has only one parent. The first record of the data model is a root record

12

Database Types: Hierarchical database

IST 603 - Chapter 9

13

<>Person Table      
<>ID Role Description ParentID
1 Author Writes, speaks, trains 0
Authors Table      
ID Name Description ParentID
101 Allen O'Neill Author writes on AI 1
102 Mahesh Chand Author writes on C# 1
103 David McCarter Author writes on .NET 1
104 Raj Kumar Author writes on AWS 1
Books Table      
ID Topic Title ParentID
1001 ADO.NET ADO.NET Programming 102
1002 GDI+ Programming GDI+ 102
1003 C# Learn C# 8.0 102
1004      

 

 

 

 

13

Database Types: Network database

IST 603 - Chapter 9

14

 

 

 

 

The diagram represents a network data model that shows that the Stores entity has relationships with multiple child entities

The Transactions entity has relationships with multiple parent entities.

A network database model allows one parent to have multiple child record sets and each record set can be linked to multiple nodes (parents) and children.

14

Entity Relationship Model

Entity: a person, place, thing, or event for which data is collected and maintained

An entity has a set of attributes that describe particular characteristics of an entity

Employee: attributes are Social Security Number, Name, Gender

A Table or file contains a set of related records that store data about a specific entity

An Employee table has records about all employees

Field (i.e., attribute): single characteristic or fact about an entity

Record (i.e., tuple): set of related fields that describes one instance of an entity

IST 603 - Chapter 9

15

15

Entity-Relationship Diagrams

Drawing an ERD

List the entities that were identified during the systems analysis phase

Consider the nature of the relationships that link them

IST 603 - Chapter 9

In an entity-relationship diagram, entities are labeled with singular nouns and relationships are labeled with verbs. The relationship is interpreted as a simple English sentence.

16

16

Entity-Relationship Diagrams

Types of relationships

One-to-one (1:1) relationship: exactly one of the second entity occurs for each instance of the first entity

IST 603 - Chapter 9

Examples of one-to-one (1:1) relationships.

17

17

Entity-Relationship Diagrams

One-to-many relationship (1:M): one occurrence of the first entity can relate to many instances of the second entity

Each instance of the second entity can associate with only one instance of the first entity

IST 603 - Chapter 9

Examples of one-to-many (1:M) relationships.

18

18

Entity-Relationship Diagrams

Many-to-many relationship (M:N): one instance of the first entity can relate to many instances of the second entity, and vice versa

IST 603 - Chapter 9

Examples of many-to-many (M:N) relationships. Notice that the event or transaction that links the two entities is an associative entity with its own set of attributes and characteristics.

19

19

Entity-Relationship Diagrams

IST 603 - Chapter 9

An entity-relationship diagram for SALES REP, CUSTOMER, ORDER, PRODUCT, and WAREHOUSE. Notice that the ORDER and PRODUCT entities are joined by an associative entity named ORDER LINE.

20

20

Entity-Relationship Diagrams

Cardinality

Describes numeric relationship between two entities

Shows how instances of one entity relate to instances of another entity

Crow’s foot notation indicates various possibilities using circles, bars, and symbols

IST 603 - Chapter 9

Crow’s foot notation is a common method of indicating cardinality. The four examples show how you can use various symbols to describe the relationships between entities.

21

21

Entity-Relationship Diagrams

IST 603 - Chapter 9

In the first example of cardinality notation, one and only one CUSTOMER can place anywhere from zero to many of the ORDER entity. In the second example, one and only one ORDER can include one ITEM ORDERED or many. In the third example, one and only one EMPLOYEE can have one SPOUSE or none. In the fourth example, one EMPLOYEE, or many employees, or none, can be assigned to one PROJECT, or many projects, or none.

22

22

Entity-Relationship Diagrams

IST 603 - Chapter 9

An ERD for a library system drawn with Visible Analyst. Notice that crow’s foot notation has been used and relationships are described in both directions.

23

23

Relational Database Model

A relational model is based on three well-defined components:

A logical data structure represented by a relation

A set of integrity rules to enforce that data remain consistent over time

A set of operations that define how data are manipulated

The concept of relation is based on predicate logic

End user think of a relation as a table – a two-dimensional structure composed of rows and columns

IST 603 - Chapter 9

24

24

Characteristics of a Relational Table

1 A table is a two-dimensional structure composed of rows and columns
2 Each table row (tuple) represents a single entity occurrence within the entity set
3 Each table column represents an attribute, and each column has a distinct name
4 Each intersection of a row and column represents a single data value
5 Each column has a specific range of values known as the attribute domain
6 All values in a column must conform to the same data format
7 The order of the rows and columns is immaterial to the DBMS
8 Each table must have an attribute or combination of attributes that uniquely identifies each row

IST 603 - Chapter 9

25

25

Not a Relational Table

The table above shows vendor data for a company

This table does not conform to a relational structure

All column and row intersection do not have the same data format

Phone information is a repeated value violating the rule of single data value for a row and column intersection

IST 603 - Chapter 9

26

VEND_CODE VEND_CONTACT VEND_AREACODE VEND_PHONE
230 Shelly K. Smithson 606 5551234
231 James Johnson 615 1234536 619 7612231 1234536
619 7612231
232 Candice Wallace 904 3426567
233 Arthur Jones 615 1233324

26

A Relational Table

This vendor table now conforms to a relational structure

VEND_CODE is an unique identifier and uniquely determines the rest of the record

VEND_CODE is the Primary Key to this relation

VEND_CODE  (VEND_CONTACT, VEND_AREACODE, VEND_PHONE)

VEND_CODE functionally determines the values of other attributes

IST 603 - Chapter 9

27

VEND_CODE VEND_CONTACT VEND_AREACODE VEND_PHONE
230 Shelly K. Smithson 606 5551234
231 James Johnson 615 1234536
232 Candice Wallace 904 3426567
233 Arthur Jones 615 1233324

27

Data Design Terms

Key fields

Primary key: field or fields that uniquely and minimally identifies a member of an entity

Candidate key: field that could serve as a primary key

Foreign key: field in one table that must match a primary key value in another table for a relationship between the two tables to exist

Secondary key: field or combination of fields that can be used to access or retrieve records

Entity Integrity – Primary Key value must be unique and no key attribute in the primary key can contain a null

IST 603 - Chapter 9

28

28

Data Design Terms

Referential integrity

Set of rules that avoids data inconsistency and quality problems

The integrity requirement can be specified when creating the database structure

IST 603 - Chapter 9

Microsoft Access allows a user to specify that referential integrity rules will be enforced in a relational database design.

29

29

Data Normalization

A good relational database software is not enough to avoid data redundancy

Normalization is a process for evaluating and correcting table structure to minimize data redundancy

Normalization stages

Unnormalized design

First normal form

Second normal form

Third normal form

IST 603 - Chapter 9

30

30

Data Normalization

Standard notation format

Used to show a table’s structure, fields, and primary key

The primary key field(s) is underlined

NAME (FIELD 1, FIELD 2, FIELD 3)

Recognition of repeating group fields is important

Repeating group: set of one or more fields that can occur any number of times in a single record

Each occurrence would possess different values

IST 603 - Chapter 9

31

31

Unnormalized Data

IST 603 - Chapter 9

In the ORDER table design, two orders have repeating groups that contain several products. ORDER is the primary key for the ORDER table, and PRODUCT NUMBER serves as a primary key for the repeating group. Because it contains repeating groups, the ORDER table is unnormalized.

32

32

First Normal Form (1NF)

First Normal Form (1NF)

Table format

Does not contain a repeating group

Primary Key (PK) identified

Converting an unnormalized design to 1NF requires expansion of the table’s primary key to include the primary key of the repeating group

IST 603 - Chapter 9

33

33

First Normal Form (1NF)

IST 603 - Chapter 9

The ORDER table as it appears in 1NF. The repeating groups have been eliminated. Notice that the repeating group for order 86223 has become three separate records, and the repeating group for order 86390 has become two separate records. The 1NF primary key is a combination of ORDER and PRODUCT NUMBER, which uniquely identifies each record.

34

34

Second Normal Form (2NF)

A table (relation) is automatically in the Second Normal Form if it has a single key attribute

A table needs to be in 1NF before being considered for 2NF

If a table has a composite key, there may be partial dependency as in the following example

The order table has a composite key: ORDER + PRODUCT NUMBER

Using ORDER, the dependent value of ORDER DATE can be determined.

Using PRODUCT NUMBER, DESCRIPTION can be determined

To convert this table to 2NF, the partial dependency need to be removed as shown in the next slide

IST 603 - Chapter 9

35

35

Data Normalization

IST 603 - Chapter 9

ORDER, PRODUCT, and ORDER LINE tables in 2NF. All fields are functionally dependent on the primary key.

36

36

Third Normal Form (3NF)

A table (relation) is in the Third Normal Form if it has no transitive dependency and satisfies the requirements of 2NF

Suppose the following three tables exist

PROJECT ( PROJ_NUM, PROJ_NAME)

EMPLOYEE( EMP_NUM, EMP_NAME, JOB_CLASS, CHG_HOUR)

ASSIGNMENT( PROJ_NUM, EMP_NUM, ASSIGN_HOURS)

Dependencies

PROJ_NUM  PROJ_NAME

EMP_NUM  EMP_NAME, JOB_CLASS

JOB_CLASS  CHG_HOUR

EMPLOYEE table has transitive dependency, hence not in 3NF

IST 603 - Chapter 9

37

37

Third Normal Form (3NF)

The set of tables shown in the previous slide can be converted to 3NF by removing transitive dependency

PROJECT ( PROJ_NUM, PROJ_NAME)

EMPLOYEE( EMP_NUM, EMP_NAME, JOB_CLASS)

JOB( JOB_CLASS, CHG_HOUR)

ASSIGNMENT( PROJ_NUM, EMP_NUM, ASSIGN_HOURS)

All tables are now in 3NF

IST 603 - Chapter 9

38

38

Data Normalization

IST 603 - Chapter 9

When the PRODUCT table is transformed from 2NF to 3F

The result is two separate tables:

PRODUCT and SUPPLIER.

Note that in 3NF, all fields depend on the key, the whole key, and nothing but the key!

The shown structure works so long as there is exactly one supplier for a product

39

39

Creating Databases and Accessing Data

Databases are created and managed by using languages and commands that have nothing to do with HTML

For example, MS SQL Server comes with a tool called SQL Sever Management Studio (SSMS)

This tool can be used to create databases, tables and retrieve data

CREATE DATABASE company;  Creates a database CREATE TABLE employee (EMPNO int PRIMARY KEY, FIRSTNAME varchar(20), HIREDATE date, GENDER char(1), SALARY decimal);

Two statements shown here are generally called DDL (Data Definition Language) statements.

Retrieval of data can be done using DML (Data Manipulation Language) statements.

SQL (Structured Query Language) – industry standard DDL and DML

SELECT * FROM employee;  Get all records with all fields from the employee table

IST 603 - Chapter 9

40

40

Web-Based Design

Databases can be created and managed through web pages

Web interaction with data is not limited to HTML

Databases can be created using programming languages such as C#, Java

Middleware is used to integrate different applications and allow them to exchange data

Web-based data must be secure, yet easily accessible to authorized users

IST 603 - Chapter 9

41

41

Web-Based Design

IST 603 - Chapter 9

When a client workstation requests a Web page (1), the Web server uses middleware to generate a data query to the database server (2). The database server responds (3), and middleware translates the retrieved data into an HTML page that can be sent by the Web server and displayed by the user’s browser (4).

42

42

Benefits of using a DBMS

Economy of Scale

Better utilization of data

Can use enterprise-wide databases as in Enterprise Systems such as SAP, and Oracle

Can be deployed using powerful servers and fast networks

Enterprise-wide application – The DBA maintains and manages the database for all departments, not one department

Stronger Standards – Standards for accessing, updating, and deleting data can be specified upfront with all users expected to comply

Better Security – Different level of access can and are often defined

Data Independence – Systems that interact with a DBMS are relatively independent of how the physical data is stored

IST 603 - Chapter 9

43

43

Data Storage and Access

Tools and techniques

Data warehouse

An integrated collection of data that can include seemingly unrelated information, no matter where it is stored in the company

Data mart

Designed to serve needs of a specific department

IST 603 - Chapter 9

44

44

Data Storage and Access

IST 603 - Chapter 9

A data warehouse stores data from several systems. By selecting data dimensions, a user can retrieve specific information without having to know how or where the data is stored.

45

45

Data Storage and Access

Data mining

Looks for meaningful data patterns and relationships

Suggested goals for data mining

Increase the number of pages viewed per session and referred customers

Reduce clicks to close

Increase checkouts per visit and average profit per checkout

IST 603 - Chapter 9

46

46

Data Storage and Access

Logical versus physical storage

Logical storage: data that a user can view, understand, and access, regardless of how or where that information actually is organized or stored

Physical storage: strictly hardware related

Involves the process of reading and writing binary data to physical media

IST 603 - Chapter 9

47

47

Data Storage and Access

Data coding

EBCDIC (Extended Binary Coded Decimal Interchange Code)

Mainframe computers and high-capacity servers

ASCII (American Standard Code for Information Interchange)

Used on most personal computers

Binary storage format

Represents numbers as actual binary values

Unicode: uses two bytes per character

IST 603 - Chapter 9

48

48

Data Storage and Access

IST 603 - Chapter 9

Unicode is an international coding format that represents characters as integers, using 16 bits per character. The Unicode Consortium maintains standards and support for Unicode.

49

49

Data Storage and Access

Storing dates

International Organization for Standardization (ISO) requires a format of four digits for the year, two for the month, and two for the day (YYYYMMDD)

Absolute date: total number of days from some specific base date

IST 603 - Chapter 9

50

50

Data Control

Well-designed DBMS

Must provide built-in control and security

Forms of data protection

Limited access to files and databases

User ID and passwords

Permissions and encryption

Backup copies of databases must be retained for a specified period of time

Recovery procedures can be used to restore

IST 603 - Chapter 9

51

51

Summary

A database consists of linked tables that form an overall data structure

DBMS enable users to add, update, manage, access, and analyze data in a database

DBMS designs are more powerful and flexible than traditional file-oriented systems

Components include interfaces for users, database administrators, and related systems

IST 603 - Chapter 9

52

52

Summary

Key fields include primary keys, candidate keys, foreign keys, and secondary keys

An entity-relationship diagram (ERD) is a graphic representation of all system entities and the relationships among them

Normalization is a process for avoiding problems in data design

IST 603 - Chapter 9

53

53

Summary

Logical storage is information seen through a user’s eyes, regardless of how or where that information actually is organized or stored

File and database control measures include limiting access to the data, data encryption, backup/recovery procedures, audit-trail files, and internal audit fields

IST 603 - Chapter 9

54

54

DBMS (Database management system)MetadataCustomersInvoicesProductsDatabase structureEnd-user dataSingleView of DataIntegratedApplication requestDataDataApplication request

Carlos CoronelAuthorSteven MorrisAlan DennisDatabase SystemsSystems Analysis

ORDER ORDER DATE PRODUCT NUMBER DESCRIPTION SUPPLIER NUMBER SUPPLIER NAME