Research and Data Modelling


What is the difference between data and information? Give an example.

What is metadata? Give an example.

What are a database and a DBMS? Are they the same thing? Explain.

List the main functions of a DBMS

Briefly list and describe 4 types of databases – give an example of each.

List the 3 types of data anomalies associated with the file system/poor DB design and give an example of one of these anomalies

List 3 disadvantages of a DBMS

List 3 different skills that would be required in a position of Database Manager/Administrator

Be prepared to share these answers with the rest of the class and to write key points on the board


Welcome Back!

Quick Review: Do You Remember?

If you can’t answer these questions, make sure to go over last week’s slides and read chapter 1

©2017 Cengage Learning®. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website or school-approved learning management system for classroom use.

Play Kahoot week 2 review


F - 20% only : 90% is when we do and in engaging in real experience also in participating


Chapter 2

Data Models


Objectives

In this chapter, you will learn about:

Why data models are important

The basic data-modeling building blocks

How the major data models evolved

Types of data models based on level of abstraction


Introduction

Designers, programmers, and end users see data in different ways

Different views of the same data can lead to designs that do not reflect how the organization actually operates


Introduction

Data modeling reduces complexities of database design

Serves as a communication tool

Data abstraction makes a complicated problem easier to understand


Data Modeling and Data Models

Data models

Relatively simple representations of complex real-world data structures

Often graphical

Model: an abstraction of a real-world object or event

Useful in understanding complexities of the real-world environment


The Importance of Data Models

Facilitate interaction among the designer, the applications programmer, and the end user

End users have different views and needs for data

Data models organize data for various users


The Importance of Data Models

Data models communicate requirements

Data models are abstractions

Cannot draw required data out of the data model

Data modeling is iterative and progressive

Until a blueprint is produced


Importance of Data Models


Are a communication tool

Give an overall view of the database

Organize data for various users

Are an abstraction for the creation of a good database

Data Model Basic Building Blocks

Entity: anything about which data are to be collected and stored.

Attribute: a characteristic of an entity.

Relationship: describes an association among entities

Constraint: a restriction placed on the data
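
The four building blocks can be sketched in a few lines of Python. This is only an illustration, not part of the textbook: the Student and Course classes and their fields are hypothetical names, and the GPA range is borrowed from the constraint example used later in these slides.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Course:                       # entity: something we collect data about
    code: str                       # attribute
    title: str                      # attribute


@dataclass
class Student:                      # entity
    student_id: int                 # attribute
    name: str                       # attribute
    gpa: float                      # attribute that carries a constraint
    # relationship: an association between Student and Course
    courses: List[Course] = field(default_factory=list)

    def __post_init__(self) -> None:
        # constraint: a restriction placed on the data
        if not 0.0 <= self.gpa <= 4.0:
            raise ValueError("GPA must be between 0.00 and 4.00")


# Hypothetical sample data
intro = Course("DB101", "Intro to Databases")
alice = Student(1, "Alice", 3.6, [intro])
```

Creating a Student with a GPA outside the allowed range raises a ValueError, which is the constraint doing its job.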


Data Model Basic Building Blocks

Relationships :

1:1

1:M

M:N

Examples –

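One-to-many (1:M): a customer places many orders, but each order is placed by one customer

Many-to-many (M:N or M:M): a student can take many courses, and each course can have many students

One-to-one (1:1): an employee manages one store, and each store is managed by one employee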

Constraint: a restriction placed on the data

A student's GPA must be between 0.00 and 4.00
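
One possible way to express these relationship types and the GPA constraint in a relational database is sketched below with Python's built-in sqlite3 module. All table and column names are assumptions made for illustration, and the student/course pairing for M:N is an assumed example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE customer (cus_id INTEGER PRIMARY KEY, name TEXT NOT NULL);

-- 1:M: many order rows may point at the same customer
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    cus_id   INTEGER NOT NULL REFERENCES customer(cus_id)
);

CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT);

-- 1:1: UNIQUE stops one employee from managing two stores
CREATE TABLE store (
    store_id   INTEGER PRIMARY KEY,
    manager_id INTEGER UNIQUE REFERENCES employee(emp_id)
);

-- Constraint: GPA restricted to the range 0.00 to 4.00
CREATE TABLE student (
    stu_id INTEGER PRIMARY KEY,
    gpa    REAL CHECK (gpa BETWEEN 0.0 AND 4.0)
);

CREATE TABLE course (course_id INTEGER PRIMARY KEY, title TEXT);

-- M:N: a bridge table holds one row per (student, course) pair
CREATE TABLE enroll (
    stu_id    INTEGER REFERENCES student(stu_id),
    course_id INTEGER REFERENCES course(course_id),
    PRIMARY KEY (stu_id, course_id)
);
""")

# Hypothetical sample data: one customer with two orders
conn.execute("INSERT INTO customer VALUES (1, 'Ana')")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(10, 1), (11, 1)])


def gpa_is_accepted(gpa: float) -> bool:
    """Try to store a student with this GPA; report whether the DBMS allowed it."""
    try:
        conn.execute("INSERT INTO student (gpa) VALUES (?)", (gpa,))
        return True
    except sqlite3.IntegrityError:
        return False
```

The point of the sketch is that the DBMS, not the application program, rejects a GPA outside the allowed range.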


Another 1:M example: an agent can serve many customers, and each customer is served by one agent

The Evolution of Data Models


The Evolution of Data Models: the Hierarchical Model

The hierarchical model was developed in the 1960s to manage large amounts of data for manufacturing projects

Basic logical structure is represented by an upside-down “tree”

Hierarchical structure contains levels or segments

Segment analogous to a record type

Set of one-to-many relationships between segments
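
A rough Python sketch of the upside-down tree follows. It is not how a real hierarchical DBMS is implemented; the Segment class, the find helper, and the car-parts data are all hypothetical. It only shows that each parent has many children and that a child is reached by walking a path down from the root.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Segment:
    """One node in the tree, loosely comparable to a record type."""
    name: str
    children: Dict[str, "Segment"] = field(default_factory=dict)

    def add_child(self, child: "Segment") -> "Segment":
        self.children[child.name] = child   # 1:M: one parent, many children
        return child


def find(root: Segment, path: List[str]) -> Optional[Segment]:
    """Walk from the root along a hierarchical path; return None if it breaks."""
    node: Optional[Segment] = root
    for name in path:
        node = node.children.get(name)
        if node is None:
            return None
    return node


# Hypothetical manufacturing data
car = Segment("Car")
engine = car.add_child(Segment("Engine"))
engine.add_child(Segment("Piston"))
car.add_child(Segment("Body"))
```

Here find(car, ["Engine", "Piston"]) reaches the Piston segment, while find(car, ["Piston"]) finds nothing, because the program has to know the full path through the parent.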


The Network Model

The network model was created to represent complex data relationships more effectively than the hierarchical model

Improves database performance

Imposes a database standard

Resembles hierarchical model

However, record may have more than one parent
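
The "more than one parent" idea can be sketched in Python as follows. This is a simplified illustration, not a real network DBMS: the set names, the connect and owners_of helpers, and the sample records are assumptions. Each set is a 1:M relationship between one owner record and many member records, and a record may be a member of several sets.

```python
from collections import defaultdict

# Each set is a named 1:M relationship: one owner record, many member records.
sets = defaultdict(list)


def connect(set_name: str, owner: str, member: str) -> None:
    """Add a member record to the set owned by the given owner record."""
    sets[(set_name, owner)].append(member)


def owners_of(member: str) -> list:
    """Return every owner (parent) that the member record is connected to."""
    return sorted(owner for (_, owner), members in sets.items() if member in members)


# Hypothetical data: invoice 1001 belongs to two sets, so it has two parents.
connect("SALES", "customer:Ana", "invoice:1001")
connect("COMMISSION", "salesrep:Raj", "invoice:1001")
connect("SALES", "customer:Ana", "invoice:1002")
```

In a strict hierarchy, invoice 1001 could hang under only one of the two owners.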


The Network Model (cont’d.)

Disadvantages of the network model:

Cumbersome

Lack of ad hoc query capability placed burden on programmers to generate code for reports

Structural change in the database could produce havoc in all application programs


Hierarchical Model

Advantages

Promotes data sharing

Parent/child relationship promotes conceptual simplicity and data integrity

Database security is provided and enforced by DBMS

Efficient with 1:M relationships

Disadvantages

Requires knowledge of physical data storage characteristics

Navigational system requires knowledge of hierarchical path

Changes in structure require changes in all application programs

Implementation limitations

No data definition

Lack of standards

Reason? Lacks structural independence

Access to a child segment can only be done through the parent segment

Self-reading


Network Model

Advantages

Conceptual simplicity

Handles more relationship types

Data access is flexible

Data owner/member relationship promotes data integrity

Conformance to standards

Includes data definition language (DDL) and data manipulation language (DML)

Disadvantages

System complexity limits efficiency

Navigational system yields complex implementation, application development, and management

Structural changes require changes in all application programs

For example: records are accessed one at a time, requiring database designers, administrators, and programmers to be familiar with the internal structure of the DB.

Self-reading


The Relational Model

Developed by E.F. Codd (IBM) in 1970

Table (relations)

Matrix consisting of row/column intersections

Each row in a relation is called a tuple

Relational models were considered impractical in 1970

Model was conceptually simple at expense of computer overhead

Computers lacked power

Our FOCUS


The Relational Model (cont’d.)

Relational database management system (RDBMS)

Performs the same basic functions provided by hierarchical and network DBMSs

Hides complexity from the user

Relational diagram

Representation of entities, attributes, and relationships

Relational table stores collection of related entities


The Relational Model (cont’d.)

Another success factor of relational data model is the ability to create queries

SQL-based relational database application involves three parts:

User interface

Allows end user to interact with the data

Set of tables stored in the database

Each table is independent from another

Rows in different tables are related based on common values in common attributes

SQL “engine” executes all queries
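
The sketch below uses Python's built-in sqlite3 module as the SQL engine. The two tables are independent, yet rows are related through a common value in a common attribute. The table names, column names, and sample rows are assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE agent (agent_code INTEGER PRIMARY KEY, agent_name TEXT);
CREATE TABLE customer (
    cus_code   INTEGER PRIMARY KEY,
    cus_name   TEXT,
    agent_code INTEGER      -- common attribute shared with the agent table
);
INSERT INTO agent VALUES (501, 'Alby'), (502, 'Hahn');
INSERT INTO customer VALUES
    (10010, 'Ramas', 502), (10011, 'Dunne', 501), (10012, 'Smith', 502);
""")


def customers_of(agent_name: str) -> list:
    """Ad hoc query: the SQL engine matches rows on the shared agent_code value."""
    rows = conn.execute(
        """SELECT c.cus_name
           FROM customer AS c
           JOIN agent AS a ON a.agent_code = c.agent_code
           WHERE a.agent_name = ?
           ORDER BY c.cus_name""",
        (agent_name,),
    )
    return [name for (name,) in rows]
```

The caller states what data is wanted, and the engine decides how to locate it.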


Relational Model

Advantages

Structural independence is promoted using independent tables

Tabular view improves conceptual simplicity

Ad hoc query capability is based on SQL

Isolates the end user from physical-level details

Improves implementation and management simplicity

Disadvantages

Requires substantial hardware and system software overhead

Conceptual simplicity gives untrained people the tools to use a good system poorly

May promote "islands of information" problems as individuals and departments develop their own applications


The Entity Relationship Model

Widely accepted standard for data modeling

Introduced by Chen in 1976

Graphical representation of entities and their relationships in a database structure


The Entity Relationship Model

Entity relationship diagram (ERD)

Uses graphic representations to model database components

Entity is mapped to a relational table


The Entity Relationship Model

EXAMPLE: Entity relationship diagram (ERD)

How do we store data about students and courses?

Which room is a class scheduled in at a particular time?


The Entity Relationship Model

EXAMPLE: Entity relationship diagram (ERD)

How do we store data about products and vendors?

Minimum details needed: Vendor Name, Vendor contact, Product Description & price

Queries: How do we get quick answers?

Who are the customers who bought products on a certain date and were sent invoices and they paid for their invoice?


Quick answers to ad hoc questions

How do we find out what products are supplied by a vendor?

E.g., John supplies keyboards and wrapping paper…
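
One way the vendor/product question could be answered is sketched below with Python's sqlite3 module. The schema covers the minimum details listed earlier (vendor name, vendor contact, product description, and price). All table and column names, the contact address, and the prices are made-up illustrative values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE vendor (
    vend_id      INTEGER PRIMARY KEY,
    vend_name    TEXT NOT NULL,
    vend_contact TEXT
);
CREATE TABLE product (
    prod_id          INTEGER PRIMARY KEY,
    prod_description TEXT NOT NULL,
    prod_price       REAL,
    vend_id          INTEGER REFERENCES vendor(vend_id)  -- 1:M vendor-to-product
);
-- Illustrative rows only
INSERT INTO vendor VALUES (1, 'John', 'john@example.com');
INSERT INTO product VALUES
    (100, 'Keyboard', 24.99, 1), (101, 'Wrapping paper', 3.50, 1);
""")


def products_supplied_by(vendor_name: str) -> list:
    """Ad hoc question: which products does this vendor supply?"""
    rows = conn.execute(
        """SELECT p.prod_description
           FROM product AS p
           JOIN vendor AS v ON v.vend_id = p.vend_id
           WHERE v.vend_name = ?
           ORDER BY p.prod_description""",
        (vendor_name,),
    )
    return [description for (description,) in rows]
```

Each row in the product table is one entity instance, and the vend_id column is what links it back to its vendor.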



The Entity Relationship Model (cont’d.)

An entity instance (or occurrence) is a row in a table, e.g., John supplies keyboards and wrapping paper…


The Entity Relationship Model (cont’d.)

An entity set is a collection of like entities

ER models use ‘connectivity’ to label the types of relationships

Relationships are expressed using special notations, e.g., Chen, Crow’s Foot, UML

Crow’s Foot notation is used as the design standard in this textbook


Entity Relationship Model

Advantages

Visual modeling yields conceptual simplicity

Visual representation makes it an effective communication tool

Is integrated with the dominant relational model

Disadvantages

Limited constraint representation

Limited relationship representation

No data manipulation language

Loss of information content occurs when attributes are removed from entities to avoid crowded displays


Figure 2.3 - The ER Model Notations


The Object-Oriented (OO) Model

OODM (object-oriented data model) is the basis for OODBMS

Semantic data model

An object:

Contains operations

Is self-contained: a basic building block for autonomous structures

Is an abstraction of a real-world entity


The Object-Oriented (OO) Model (cont’d.)

Attributes describe the properties of an object

Objects that share similar characteristics are grouped in classes

Classes are organized in a class hierarchy

Inheritance: object inherits methods and attributes of parent class

Unified Modeling Language (UML) is based on OO concepts and describes a set of diagrams and symbols

Used to graphically model a system
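
Classes, attributes, methods, and inheritance map directly onto a language such as Python. The sketch below is illustrative only: the Person, Customer, and Employee classes and their fields are hypothetical.

```python
class Person:
    """Parent class: attributes and methods shared by every kind of person."""

    def __init__(self, name: str, date_of_birth: str) -> None:
        self.name = name                      # attribute
        self.date_of_birth = date_of_birth    # attribute

    def describe(self) -> str:                # method (operation)
        return f"{self.name} (born {self.date_of_birth})"


class Customer(Person):
    """Inherits name, date_of_birth, and describe() from Person."""

    def __init__(self, name: str, date_of_birth: str, customer_id: int) -> None:
        super().__init__(name, date_of_birth)
        self.customer_id = customer_id


class Employee(Person):
    """Inherits from Person and extends the inherited method."""

    def __init__(self, name: str, date_of_birth: str, employee_id: int) -> None:
        super().__init__(name, date_of_birth)
        self.employee_id = employee_id

    def describe(self) -> str:
        return f"{super().describe()}, employee #{self.employee_id}"
```

Customer never defines describe(), yet a Customer object can call it, which is inheritance from the parent class in action.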


Object-Oriented Model

Advantages

Semantic content is added

Visual representation includes semantic content

Inheritance promotes data integrity

Disadvantages

Slow development of standards caused vendors to supply their own enhancements, compromising a widely accepted standard

Complex navigational system

Learning curve is steep

High system overhead slows transactions


Figure 2.4 - A Comparison of OO, UML, and ER Models


Big Data

Aims to:

Find new and better ways to manage large amounts of web and sensor-generated data

Provide high performance and scalability at a reasonable cost

Characteristics

Volume

Velocity

Variety

VALUE …


Big Data Challenges


Volume does not allow the usage of conventional structures

Expensive

OLAP tools proved inconsistent dealing with unstructured data

Big Data New Technologies Video on Hadoop: http://youtu.be/4DgTLaFNQq0


Hadoop

Hadoop Distributed File System (HDFS)

MapReduce

NoSQL

NoSQL Databases

Not based on the relational model

Support distributed database architectures

Provide high scalability, high availability, and fault tolerance

Support large amounts of sparse data

Geared toward performance rather than transaction consistency

Store data in key-value stores
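
A key-value store can be pictured as a dictionary of dictionaries. The sketch below is a toy in-memory model, not a real NoSQL product: the store variable, the put and get helpers, and the sample keys are assumptions. It shows why sparse data fits well, since each entity keeps only the keys it actually has.

```python
from collections import defaultdict

# entity ID -> {key: value}; there is no fixed list of columns
store = defaultdict(dict)


def put(entity_id: int, key: str, value: str) -> None:
    """Store one (key, value) pair for an entity."""
    store[entity_id][key] = value


def get(entity_id: int, key: str, default=None):
    """Fetch one value; a missing key simply yields the default."""
    return store.get(entity_id, {}).get(key, default)


# Hypothetical sparse data: the two entities do not share the same keys
put(10010, "Lname", "Ramas")
put(10010, "Fname", "Alfred")
put(10011, "Lname", "Dunne")
put(10011, "Areacode", "713")
```

Entity 10011 has no "Fname" pair at all, so nothing is stored for it, whereas a relational table would hold a null in that column.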


NoSQL

Advantages

High scalability, availability, and fault tolerance are provided

Uses low-cost commodity hardware

Supports Big Data

Key-value model improves storage efficiency

Disadvantages

Complex programming is required

There is no relationship support

There is no transaction integrity support


Figure 2.5 - A Simple Key-value Representation


Example: how would you store images, short videos, …?


Added: Comparison between relational databases and NoSQL databases

Database Systems, 9th Edition


Relational databases typically store values row by row, whereas column-oriented NoSQL stores keep values column by column, so a query can search only the columns it needs

Newer systems use column-oriented storage: the entries in a column are indexed, which lets users fetch only part of a table rather than the whole table.
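
The difference between the two layouts can be sketched with plain Python data structures. This is a conceptual illustration, not how any particular database lays out its files, and the sample movie data is made up.

```python
# Row-oriented layout: each record is kept together
rows = [
    {"id": 1, "title": "Up", "minutes": 96},
    {"id": 2, "title": "Heat", "minutes": 170},
    {"id": 3, "title": "Big", "minutes": 104},
]

# Column-oriented layout: one list per column, aligned by position
columns = {name: [row[name] for row in rows] for name in rows[0]}


def average_minutes_row_store() -> float:
    # Touches every whole row even though only one field is needed
    return sum(row["minutes"] for row in rows) / len(rows)


def average_minutes_column_store() -> float:
    minutes = columns["minutes"]    # fetch just this one column
    return sum(minutes) / len(minutes)
```

Both functions return the same answer; the column layout simply reads less data to get there, which matters once tables become very large.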

https://novom.ru/en/watch/KpP4JtD7LB4

Examples of key-value (wide-column) stores: Cassandra, HBase, Bigtable

Document-oriented DB: MongoDB

Netflix uses Cassandra to keep track of your current position in the video you are watching

Good Info

https://study.com/academy/lesson/nosql-databases-design-types.html


Newer Data Models: Object/Relational and XML

Extended relational data model (ERDM)

Semantic data model developed in response to increasing complexity of applications

Includes many of OO model’s best features

Often described as an object/relational database management system (O/RDBMS)

Primarily geared to business applications


Newer Data Models: Object/Relational and XML (cont’d.)

The Internet revolution created the potential to exchange critical business information

In this environment, Extensible Markup Language (XML) emerged as the de facto standard

Current databases support XML

XML: the standard protocol for data exchange among systems and Internet services


An XML database is used to store large amounts of information in XML format. As the use of XML grows across fields, a secure place to store XML documents is needed. The data stored in the database can be queried using XQuery, serialized, and exported into a desired format.
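
Python's standard library does not include XQuery, so the sketch below uses the limited XPath subset in xml.etree.ElementTree as a stand-in to show the same idea on a small scale: query XML data, then serialize a result for export. The order document and its element names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML document, as might be exchanged between two systems
doc = """<orders>
  <order id="1001" customer="Ana"><total>59.90</total></order>
  <order id="1002" customer="Raj"><total>12.50</total></order>
  <order id="1003" customer="Ana"><total>7.25</total></order>
</orders>"""

root = ET.fromstring(doc)

# Query: select the order elements whose customer attribute is 'Ana'
ana_orders = root.findall("./order[@customer='Ana']")
ana_order_ids = [order.get("id") for order in ana_orders]
ana_total = sum(float(order.findtext("total")) for order in ana_orders)

# Serialize one matching element back to XML text for export
exported = ET.tostring(ana_orders[0], encoding="unicode")
```

A dedicated XML database would run this kind of query with XQuery over many stored documents rather than over a single string in memory.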


The Future of Data Models

Hybrid DBMSs

Retain advantages of relational model

Provide object-oriented view of the underlying data

SQL data services

Store data remotely without incurring expensive hardware, software, and personnel costs

Companies operate on a cloud-based, “pay-as-you-go” system


Review

How have data models evolved?


Figure 2.6 - The Evolution of Data Models


https://youtu.be/wR0jg0eQsZA

Watch this video:  Database Overview

Table 2.3 - Data Model Basic Terminology Comparison – Self Reading


Figure 2.7 - Data Abstraction Levels: ANSI/SPARC Three-Tier Architecture


ANSI/SPARC (American National Standards Institute, Standards Planning and Requirements Committee) architecture is an abstract design standard for a database management system (DBMS), first proposed in 1975.

The three-level ANSI/SPARC database architecture defines three data abstraction levels: external, conceptual, and internal.

The architecture of most commercial DBMSs available today is largely based on the ANSI/SPARC database architecture. The ANSI/SPARC three-tier architecture has three main levels:

Internal Level

Conceptual Level

External Level

These three levels provide data abstraction, which means hiding low-level complexity from end users. A database system should be efficient in performance and convenient to use. With these three levels, it is possible to use complex structures at the internal level for efficient operations and to provide a simpler, more convenient interface at the external level.
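
One rough way to picture the external level is with SQL views, sketched below using Python's sqlite3 module. This is an analogy rather than a full ANSI/SPARC implementation: the student table stands in for the organization-wide definition, and the two views stand in for what different user groups see. All table, view, and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Organization-wide definition of the data
CREATE TABLE student (
    stu_id          INTEGER PRIMARY KEY,
    name            TEXT,
    gpa             REAL,
    tuition_balance REAL
);
INSERT INTO student VALUES (1, 'Ana', 3.6, 1200.0), (2, 'Raj', 2.9, 0.0);

-- External level: each user group sees only what it needs
CREATE VIEW advisor_view AS SELECT stu_id, name, gpa FROM student;
CREATE VIEW bursar_view  AS SELECT stu_id, name, tuition_balance FROM student;
""")


def column_names(relation: str) -> list:
    """List the columns a user of this table or view can see.

    The relation name is interpolated directly, so pass trusted names only.
    """
    cursor = conn.execute(f"SELECT * FROM {relation}")
    return [col[0] for col in cursor.description]
```

An advisor working through advisor_view never sees tuition balances, and neither view exposes how the rows are stored.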

Figure 2.8 - External Models for Tiny College


End users’ view of the data environment

ER diagrams are used to represent the external views

External schema: Specific representation of an external view


The Conceptual Model

Logical design: Task of creating a conceptual data model - arranging data into a series of logical relationships called entities and attributes.

Is software and hardware independent


Represents a global view of the entire database by the entire organization

Conceptual schema: Basis for the identification and high-level description of the main data objects

Has a macro-level view of data environment


The process of logical design involves arranging data into a series of logical relationships called entities and attributes. An entity represents a chunk of information. In relational databases, an entity often maps to a table. An attribute is a component of an entity and helps define the uniqueness of the entity.


The Internal Model

Represents the database as seen by the DBMS; maps the conceptual model to the DBMS

Internal schema: Specific representation of an internal model

Uses the database constructs supported by the chosen database

Is software dependent and hardware independent

Logical independence: Changing internal model without affecting the conceptual model


Figure 2.10 - Internal Model for Tiny College


The Physical Model

Operates at lowest level of abstraction

Describes the way data are saved on storage media such as disks or tapes

Requires the definition of physical storage and data access methods

Relational model aimed at logical level

Does not require physical-level details

Physical independence: Changes in physical model do not affect internal model
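
Physical independence can be illustrated with a small sqlite3 experiment. Adding an index changes how the DBMS locates rows on storage, yet the same query returns the same answer with no change to the table definition or the SQL. The table, index name, and rows below are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (cus_code INTEGER PRIMARY KEY, cus_lname TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [(1, "Ramas"), (2, "Dunne"), (3, "Smith")],
)

QUERY = "SELECT cus_code FROM customer WHERE cus_lname = ?"

# Result before any physical tuning
before = conn.execute(QUERY, ("Dunne",)).fetchall()

# Physical-level change: add an index on the searched column
conn.execute("CREATE INDEX idx_customer_lname ON customer (cus_lname)")

# Same query text, same logical table, same result
after = conn.execute(QUERY, ("Dunne",)).fetchall()
```

Application code that runs QUERY needs no edits after the index is created, which is the practical benefit of physical independence.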


Table 2.4 - Levels of Data Abstraction

Cengage Learning © 2015


Chapter Summary

A data model is an abstraction of a complex real-world data environment

Basic data modeling components:

Entities

Attributes

Relationships

Constraints

Business rules identify and define basic modeling components


Chapter Summary (cont’d.)

Hierarchical model

Set of one-to-many (1:M) relationships between a parent and its children segments

Network data model

Uses sets to represent 1:M relationships between record types

Relational model

Current database implementation standard

ER model is a tool for data modeling

Complements relational model


Chapter Summary (cont’d.)

Object-oriented data model: object is basic modeling structure

Relational model adopted object-oriented extensions: extended relational data model (ERDM)

OO data models depicted using UML

Data-modeling requirements are a function of different data views and abstraction levels

Three abstraction levels: external, conceptual, internal


Tutorial

Start working on tutorial 2

Make sure to ask questions if you do not understand a question or a topic
