Research and Data Modelling
What is the difference between data and information? Give an example.
What is metadata? Give an example.
What is a database and what is a DBMS? Are they the same thing? Explain.
List the main functions of a DBMS.
Briefly list and describe four types of databases; give an example of each.
List the three types of data anomalies associated with file systems or poor database design, and give an example of one of them.
List three disadvantages of a DBMS.
List three different skills that would be required for the position of database manager/administrator.
Be prepared to share these answers with the rest of the class and to write key points on the board.
Welcome Back!
Quick REVIEW: Do You Remember?
If you can’t answer these questions, make sure to go over last week’s slides and read Chapter 1.
©2017 Cengage Learning®. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website or school-approved learning management system for classroom use.
Play Kahoot week 2 review
Socrative.com
Only about 20% is retained passively; about 90% is retained when we do: engaging in real experience and participating.
Chapter 2
Data Models
Objectives
In this chapter, you will learn about:
Why data models are important
The basic data-modeling building blocks
How the major data models evolved
Types of data models based on level of abstraction
Introduction
Designers, programmers, and end users see data in different ways
Different views of the same data lead to poor designs
Introduction
Data modeling reduces the complexity of database design
Serves as a communication tool
Data abstraction makes a complicated problem easier to understand
Data Modeling and Data Models
Data models
Relatively simple representations of complex real-world data structures
Often graphical
Model: an abstraction of a real-world object or event
Useful in understanding complexities of the real-world environment
The Importance of Data Models
Facilitate interaction among the designer, the applications programmer, and the end user
End users have different views and needs for data
Data models organize data for various users
The Importance of Data Models
Data models communicate requirements
Data models are abstractions
Cannot draw required data out of the data model
Data modeling is iterative and progressive
Until a blueprint is produced
Importance of Data Models
Are a communication tool
Give an overall view of the database
Organize data for various users
Are an abstraction for the creation of a good database
Data Model Basic Building Blocks
Entity: anything about which data are to be collected and stored.
Attribute: a characteristic of an entity.
Relationship: describes an association among entities
Constraint: a restriction placed on the data
Data Model Basic Building Blocks
Relationships:
1:1
1:M
M:N
Examples –
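One-to-many (1:M): a customer places many orders, but each order is placed by one customer
Many-to-many (M:N or M:M): a student can take many classes, and a class can have many students
One-to-one (1:1): an employee manages one store, and each store is managed by one employee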
Constraint: a restriction placed on the data
A student’s GPA must be between 0.0 and 4.0
an agent can serve many customers
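To connect the building blocks above to an actual database, here is a minimal sketch using Python's built-in `sqlite3` module. All table and column names (`customer`, `orders`, `student`, `course`, `enroll`) are hypothetical: entities become tables, attributes become columns, the 1:M relationship becomes a foreign key, the M:N relationship becomes a bridge table, and the GPA rule becomes a `CHECK` constraint.

```python
import sqlite3

# Hypothetical schema illustrating entities, attributes, relationships, constraints.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.executescript("""
-- Entities become tables; attributes become columns.
CREATE TABLE customer (
    cus_id   INTEGER PRIMARY KEY,
    cus_name TEXT NOT NULL
);
-- 1:M: each order row points to exactly one customer.
CREATE TABLE orders (
    ord_id INTEGER PRIMARY KEY,
    cus_id INTEGER NOT NULL REFERENCES customer(cus_id)
);
-- Constraint: a student's GPA must be between 0 and 4.
CREATE TABLE student (
    stu_id  INTEGER PRIMARY KEY,
    stu_gpa REAL CHECK (stu_gpa BETWEEN 0 AND 4)
);
-- M:N: students take many courses and courses have many students,
-- implemented with a bridge table whose key combines both sides.
CREATE TABLE course (course_id INTEGER PRIMARY KEY);
CREATE TABLE enroll (
    stu_id    INTEGER REFERENCES student(stu_id),
    course_id INTEGER REFERENCES course(course_id),
    PRIMARY KEY (stu_id, course_id)
);
""")

conn.execute("INSERT INTO customer VALUES (1, 'Alice')")
conn.executemany("INSERT INTO orders VALUES (?, 1)", [(10,), (11,)])

def rejected(sql):
    """Return True if SQLite refuses the statement because of a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

gpa_rejected = rejected("INSERT INTO student VALUES (1, 4.7)")    # CHECK fails
orphan_rejected = rejected("INSERT INTO orders VALUES (12, 99)")  # no customer 99

orders_for_alice = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE cus_id = 1").fetchone()[0]
print(orders_for_alice, gpa_rejected, orphan_rejected)  # 2 True True
```

The DBMS, not the application, rejects the out-of-range GPA and the order for a non-existent customer, which is the practical value of stating constraints and relationships in the model.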
The Evolution of Data Models
The Evolution of Data Models: the Hierarchical Model
The hierarchical model was developed in the 1960s to manage large amounts of data for manufacturing projects
Basic logical structure is represented by an upside-down “tree”
Hierarchical structure contains levels or segments
Segment analogous to a record type
Set of one-to-many relationships between segments
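The upside-down tree can be sketched in a few lines of Python. This is an illustration of the idea only, not how any real hierarchical DBMS is implemented; the segment names are hypothetical. Each segment has one parent, and a child can only be reached by navigating down from the root along its hierarchical path.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Sketch of the hierarchical model: each segment (record type) has one parent,
# and children are reached only by walking down from the root.

@dataclass
class Segment:
    name: str
    children: list[Segment] = field(default_factory=list)

    def add(self, child: Segment) -> Segment:
        self.children.append(child)  # 1:M parent-to-children link
        return child

def find_path(node: Segment, target: str, path=None):
    """Navigate from the root; return the list of segment names leading to target."""
    path = (path or []) + [node.name]
    if node.name == target:
        return path
    for child in node.children:
        found = find_path(child, target, path)
        if found:
            return found
    return None  # target is not in this tree

root = Segment("Final assembly")
comp_a = root.add(Segment("Component A"))
comp_b = root.add(Segment("Component B"))
comp_a.add(Segment("Part A1"))
comp_b.add(Segment("Part B1"))

print(find_path(root, "Part B1"))  # ['Final assembly', 'Component B', 'Part B1']
```

The returned path shows why the programmer must know the hierarchical path: there is no way to ask for "Part B1" without going through its parent segments.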
The Network Model
The network model was created to represent complex data relationships more effectively than the hierarchical model
Improves database performance
Imposes a database standard
Resembles hierarchical model
However, record may have more than one parent
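The key difference from the hierarchical model, that a record may have more than one parent, can be sketched with named owner/member "sets." This is a simplified illustration, assuming hypothetical set names `Customer-Invoice` and `SalesRep-Invoice` and hypothetical record identifiers.

```python
from collections import defaultdict

# Sketch of the network model: named "sets" link an owner record to its member
# records (1:M), and the same member can belong to several sets, i.e., it can
# have more than one parent. All names are hypothetical.
sets = defaultdict(lambda: defaultdict(list))  # set name -> owner -> members

def connect(set_name, owner, member):
    """Insert a member record into an owner's occurrence of a set."""
    sets[set_name][owner].append(member)

# Invoice INV-1 has two parents: a customer and a sales representative.
connect("Customer-Invoice", "Customer 10", "INV-1")
connect("SalesRep-Invoice", "Rep 7", "INV-1")
connect("Customer-Invoice", "Customer 10", "INV-2")

def owners_of(member):
    """Navigate every set to find all parents (owners) of a member record."""
    return sorted(
        (set_name, owner)
        for set_name, occurrences in sets.items()
        for owner, members in occurrences.items()
        if member in members
    )

print(owners_of("INV-1"))
# [('Customer-Invoice', 'Customer 10'), ('SalesRep-Invoice', 'Rep 7')]
```

Access is still navigational: finding an invoice's parents means walking the sets, which is part of why the model proved cumbersome.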
The Network Model (cont’d.)
Disadvantages of the network model:
Cumbersome
Lack of ad hoc query capability placed burden on programmers to generate code for reports
Structural change in the database could produce havoc in all application programs
Hierarchical Model
Advantages:
1. Promotes data sharing
2. Parent/child relationship promotes conceptual simplicity and data integrity
3. Database security is provided and enforced by DBMS
4. Efficient with 1:M relationships
Disadvantages:
1. Requires knowledge of physical data storage characteristics
2. Navigational system requires knowledge of hierarchical path
3. Changes in structure require changes in all application programs
4. Implementation limitations
5. No data definition
6. Lack of standards
Reason? Lacks structural independence
Access to a child segment can only be done through the parent segment
Self-reading
Network Model
Advantages:
1. Conceptual simplicity
2. Handles more relationship types
3. Data access is flexible
4. Data owner/member relationship promotes data integrity
5. Conformance to standards
6. Includes data definition language (DDL) and data manipulation language (DML)
Disadvantages:
1. System complexity limits efficiency
2. Navigational system yields complex implementation, application development, and management
3. Structural changes require changes in all application programs
For example: records are accessed one at a time, requiring database designers, administrators, and programmers to be familiar with the internal structure of the DB.
Self-reading
The Relational Model
Developed by E.F. Codd (IBM) in 1970
Tables (relations)
Matrix consisting of row/column intersections
Each row in a relation is called a tuple
Relational models were considered impractical in 1970
Model was conceptually simple at the expense of computer overhead
Computers lacked power
Our FOCUS
The Relational Model (cont’d.)
Relational database management system (RDBMS)
Performs the same basic functions provided by the hierarchical model
Hides complexity from the user
Relational diagram
Representation of entities, attributes, and relationships
Relational table stores a collection of related entities
The Relational Model (cont’d.)
Another success factor of relational data model is the ability to create queries
SQL-based relational database application involves three parts:
User interface
Allows end user to interact with the data
Set of tables stored in the database
Each table is independent from another
Rows in different tables are related based on common values in common attributes
SQL “engine” executes all queries
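The three parts above can be sketched with Python's built-in `sqlite3` module, which here plays the role of the SQL engine. The `agent` and `customer` tables and their rows are hypothetical sample data. The two tables are stored independently and are related only because they share a common attribute, `agent_code`; an ad hoc SQL query then joins them on that value.

```python
import sqlite3

# Two independent tables related only by the common attribute agent_code.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE agent (agent_code INTEGER PRIMARY KEY, agent_lname TEXT);
CREATE TABLE customer (
    cus_code   INTEGER PRIMARY KEY,
    cus_lname  TEXT,
    agent_code INTEGER          -- common attribute that links the two tables
);
INSERT INTO agent VALUES (501, 'Alby'), (502, 'Hahn');
INSERT INTO customer VALUES
    (10010, 'Ramas', 502), (10011, 'Dunne', 501), (10012, 'Smith', 502);
""")

# Ad hoc query: which customers does agent Hahn serve?
rows = conn.execute("""
    SELECT c.cus_lname
    FROM customer AS c
    JOIN agent AS a ON a.agent_code = c.agent_code
    WHERE a.agent_lname = 'Hahn'
    ORDER BY c.cus_lname
""").fetchall()
names = [name for (name,) in rows]
print(names)  # ['Ramas', 'Smith']
```

The query states only *what* is wanted; the engine decides how to find and match the rows, which is what isolates the end user from physical-level details.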
Relational Model
Advantages:
1. Structural independence is promoted using independent tables
2. Tabular view improves conceptual simplicity
3. Ad hoc query capability is based on SQL
4. Isolates the end user from physical-level details
5. Improves implementation and management simplicity
Disadvantages:
1. Requires substantial hardware and system software overhead
2. Conceptual simplicity gives untrained people the tools to use a good system poorly
3. May promote information problems
The Entity Relationship Model
Widely accepted standard for data modeling
Introduced by Chen in 1976
Graphical representation of entities and their relationships in a database structure
The Entity Relationship Model
Entity relationship diagram (ERD)
Uses graphic representations to model database components
Entity is mapped to a relational table
The Entity Relationship Model
EXAMPLE: Entity relationship diagram (ERD)
How do we store data about students and courses?
In what room is a class scheduled at a particular time?
The Entity Relationship Model
EXAMPLE: Entity relationship diagram (ERD)
How do we store data about products and vendors?
Minimum details needed: Vendor Name, Vendor contact, Product Description & price
Queries: How do we get quick answers?
Which customers bought products on a certain date, were sent invoices, and paid those invoices?
Quick answers to ad hoc questions
How do we find out what products are supplied by a vendor?
E.g., John supplies keyboards and wrapping paper….
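The vendor/product question can be answered with one ad hoc query once the minimum details (vendor name, vendor contact, product description, price) are stored in two related tables. A minimal sketch with `sqlite3`, assuming hypothetical table names, codes, and prices, and assuming each product is supplied by exactly one vendor (a 1:M relationship from vendor to product):

```python
import sqlite3

# Hypothetical VENDOR and PRODUCT tables; each product has one vendor (1:M).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE vendor (
    v_code    INTEGER PRIMARY KEY,
    v_name    TEXT NOT NULL,
    v_contact TEXT
);
CREATE TABLE product (
    p_code     TEXT PRIMARY KEY,
    p_descript TEXT NOT NULL,
    p_price    REAL CHECK (p_price >= 0),
    v_code     INTEGER REFERENCES vendor(v_code)
);
INSERT INTO vendor VALUES (1, 'John', '555-0100'), (2, 'Mary', '555-0199');
INSERT INTO product VALUES
    ('KB-01', 'Keyboard', 24.99, 1),
    ('WP-01', 'Wrapping paper', 3.50, 1),
    ('MS-01', 'Mouse', 12.00, 2);
""")

# Ad hoc question: what products are supplied by a given vendor?
products = conn.execute("""
    SELECT p.p_descript
    FROM product AS p JOIN vendor AS v ON v.v_code = p.v_code
    WHERE v.v_name = ?
    ORDER BY p.p_descript
""", ("John",)).fetchall()
descriptions = [d for (d,) in products]
print(descriptions)  # ['Keyboard', 'Wrapping paper']
```

Each row of `vendor` and `product` is one entity instance; the join answers the question without anyone writing navigation code.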
The Entity Relationship Model (cont’d.)
An entity instance (or occurrence) is a row in a table, e.g., John supplies keyboards and wrapping paper….
The Entity Relationship Model (cont’d.)
An entity set is a collection of like entities
The ER model uses ‘connectivity’ to label types of relationships
Relationships are expressed using special notations, e.g., Chen, Crow’s Foot, UML
Crow’s Foot notation is used as the design standard in this textbook
Entity Relationship Model
Advantages:
1. Visual modeling yields conceptual simplicity
2. Visual representation makes it an effective communication tool
3. Is integrated with the dominant relational model
Disadvantages:
1. Limited constraint representation
2. Limited relationship representation
3. No data manipulation language
4. Loss of information content occurs when attributes are removed from entities to avoid crowded displays
Figure 2.3 - The ER Model Notations
The Object-Oriented (OO) Model
OODM (object-oriented data model) is the basis for OODBMS
Semantic data model
An object:
Contains operations
Is self-contained: a basic building block for autonomous structures
Is an abstraction of a real-world entity
The Object-Oriented (OO) Model (cont’d.)
Attributes describe the properties of an object
Objects that share similar characteristics are grouped in classes
Classes are organized in a class hierarchy
Inheritance: object inherits methods and attributes of parent class
UML (Unified Modeling Language) is based on OO concepts and describes a set of diagrams and symbols
Used to graphically model a system
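The OO terms above map directly onto a language such as Python. In this minimal sketch (the class names `Person` and `Student` are hypothetical), attributes describe an object's properties, a method is an operation the object contains, and the subclass inherits both from its parent class in the class hierarchy.

```python
# Sketch of OO modeling concepts: a class groups objects with shared attributes
# and methods, and a subclass inherits both from its parent class.
class Person:
    def __init__(self, name: str):
        self.name = name  # attribute: a property of the object

    def describe(self) -> str:  # method: an operation the object contains
        return f"{type(self).__name__}: {self.name}"

class Student(Person):  # Student inherits name and describe() from Person
    def __init__(self, name: str, gpa: float):
        super().__init__(name)
        self.gpa = gpa  # attribute added by the subclass

s = Student("Ana", 3.4)
print(s.describe())          # Student: Ana
print(isinstance(s, Person))  # True
```

`Student` never defines `describe()`, yet every student object has it: that reuse is what the slide means by inheritance.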
Object-Oriented Model
Advantages:
1. Semantic content is added
2. Visual representation includes semantic content
3. Inheritance promotes data integrity
Disadvantages:
1. Slow development of standards caused vendors to supply their own enhancements, compromising a widely accepted standard
2. Complex navigational system
3. Learning curve is steep
4. High system overhead slows transactions
Figure 2.4 - A Comparison of OO, UML, and ER Models
Big Data
Aims to:
Find new and better ways to manage large amounts of web and sensor-generated data
Provide high performance and scalability at a reasonable cost
Characteristics
Volume
Velocity
Variety
VALUE …
Big Data Challenges
Data volume does not allow the use of conventional structures
Expensive
OLAP tools proved inconsistent in dealing with unstructured data
Big Data New Technologies
Video on Hadoop: http://youtu.be/4DgTLaFNQq0
Hadoop
Hadoop Distributed File System (HDFS)
MapReduce
NoSQL
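MapReduce is only named above, so here is a minimal single-machine sketch of the idea in plain Python. It is not Hadoop's API: a map step emits (key, value) pairs, a shuffle groups the pairs by key, and a reduce step combines each group. Hadoop distributes the map and reduce work across many nodes, with data stored in HDFS; the word-count task and input strings here are illustrative assumptions.

```python
from collections import defaultdict

# Single-machine sketch of MapReduce (word count); not Hadoop's actual API.

def map_phase(document: str):
    """Map: emit (word, 1) for every word in one input document."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: combine the values of each key into one result."""
    return {key: sum(values) for key, values in groups.items()}

docs = ["big data big value", "data velocity"]
pairs = [pair for doc in docs for pair in map_phase(doc)]
counts = reduce_phase(shuffle_phase(pairs))
print(counts)  # {'big': 2, 'data': 2, 'value': 1, 'velocity': 1}
```

Because each document is mapped independently and each key is reduced independently, both phases can run in parallel on cheap machines, which is how this approach scales to Big Data volumes.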
NoSQL Databases
Not based on the relational model
Support distributed database architectures
Provide high scalability, high availability, and fault tolerance
Support large amounts of sparse data
Geared toward performance rather than transaction consistency
Many store data in key-value stores
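A key-value store can be sketched with an ordinary Python dictionary. This is only an illustration of the model, not the API of any real NoSQL product. The database sees only opaque keys and values; the application decides what a value means. The composite key format `"<record id>:<attribute>"` is a hypothetical convention chosen for this sketch. Sparse data costs nothing, because an attribute that a record does not have is simply not stored.

```python
from __future__ import annotations

# Sketch of a key-value store: opaque keys and values, meaning left to the app.
# Keys use a hypothetical "<record id>:<attribute>" convention.
store: dict[str, str] = {}

def put(key: str, value: str) -> None:
    store[key] = value

def get(key: str, default: str | None = None) -> str | None:
    return store.get(key, default)

put("1234:name", "Mary")
put("1234:email", "mary@example.com")
put("5678:name", "Juan")           # no email stored for 5678 (sparse data)

print(get("1234:email"))            # mary@example.com
print(get("5678:email", "n/a"))     # n/a
```

Notice that nothing in the store relates record 1234 to any other record, and there is no transaction around the two `put` calls; those are exactly the "no relationship support" and "no transaction integrity support" trade-offs listed for NoSQL.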
NoSQL
Advantages:
1. High scalability, availability, and fault tolerance are provided
2. Uses low-cost commodity hardware
3. Supports Big Data
4. Key-value model improves storage efficiency
Disadvantages:
1. Complex programming is required
2. There is no relationship support
3. There is no transaction integrity support
Figure 2.5 - A Simple Key-value Representation
Example: how would you store images, short videos, …?
Added: Comparison between relational databases and NoSQL databases
Database Systems, 9th Edition
Relational DBs typically store values row by row, whereas some NoSQL systems use column-oriented storage, so values can be searched within a column
Newer-generation systems use column-key storage: the entries in a column are indexed, which allows users to fetch only part of a table rather than the whole table.
https://novom.ru/en/watch/KpP4JtD7LB4
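The row-versus-column contrast can be sketched with plain Python data structures holding the same hypothetical table twice. This shows only the layout idea, not how Cassandra or any other product stores data on disk: in the row layout, reading one attribute means visiting every whole record; in the column layout, the values of one column sit together and can be read on their own.

```python
# Sketch: the same hypothetical table in a row layout and a column layout.
rows = [
    {"id": 1, "title": "Movie A", "position_sec": 120},
    {"id": 2, "title": "Movie B", "position_sec": 3540},
    {"id": 3, "title": "Movie C", "position_sec": 15},
]

# Row layout: one complete record after another.
row_store = [(r["id"], r["title"], r["position_sec"]) for r in rows]

# Column layout: one list per column, aligned by position.
column_store = {
    "id": [r["id"] for r in rows],
    "title": [r["title"] for r in rows],
    "position_sec": [r["position_sec"] for r in rows],
}

from_rows = [record[2] for record in row_store]   # visits every whole record
from_columns = column_store["position_sec"]       # reads a single column only

print(from_rows == from_columns, max(from_columns))  # True 3540
```

Both layouts return the same answer; the column layout simply touches less data when a query needs only a few columns of a wide table.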
Examples of wide-column (column-family) stores, a variant of the key-value model: Cassandra, HBase, Bigtable
Document-oriented DB: MongoDB
Netflix uses Cassandra to keep track of your current position in the video you are watching
Good Info
https://study.com/academy/lesson/nosql-databases-design-types.html
Newer Data Models: Object/Relational and XML
Extended relational data model (ERDM)
Semantic data model developed in response to increasing complexity of applications
Includes many of OO model’s best features
Often described as an object/relational database management system (O/RDBMS)
Primarily geared to business applications
Newer Data Models: Object/Relational and XML (cont’d.)
The Internet revolution created the potential to exchange critical business information
In this environment, Extensible Markup Language (XML) emerged as the de facto standard
Current databases support XML
XML: the standard protocol for data exchange among systems and Internet services
An XML database is used to store large amounts of information in XML format. As the use of XML increases in every field, a secure place to store XML documents is required. Data stored in the database can be queried using XQuery, serialized, and exported into a desired format.
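As a small illustration of XML as a data-exchange format, the sketch below uses Python's standard `xml.etree.ElementTree` module. One system builds a record and serializes it to XML text, a receiving system parses that text back, and then it queries the result with ElementTree's limited path syntax. This is not XQuery, which requires an XML database or a dedicated XQuery processor. The element names, codes, and prices are hypothetical.

```python
import xml.etree.ElementTree as ET

# Sender: build a small hypothetical catalog and serialize it to XML text.
catalog = ET.Element("catalog")
for code, descript, price in [("KB-01", "Keyboard", "24.99"),
                              ("WP-01", "Wrapping paper", "3.50")]:
    product = ET.SubElement(catalog, "product", code=code)
    ET.SubElement(product, "descript").text = descript
    ET.SubElement(product, "price").text = price

xml_text = ET.tostring(catalog, encoding="unicode")  # text that would be exchanged

# Receiver: parse the text and query it (ElementTree paths, not XQuery).
received = ET.fromstring(xml_text)
cheap = [p.get("code") for p in received.findall("product")
         if float(p.findtext("price")) < 10]
print(cheap)  # ['WP-01']
```

Because the XML text is self-describing (tags name every value), the receiver needs no knowledge of the sender's database schema to read it.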
The Future of Data Models
Hybrid DBMSs
Retain advantages of relational model
Provide object-oriented view of the underlying data
SQL data services
Store data remotely without incurring expensive hardware, software, and personnel costs
Companies operate on a cloud-based, “pay-as-you-go” system
Review
How have data models evolved?
Figure 2.6 - The Evolution of Data Models
Table 2.3 - Data Model Basic Terminology Comparison – Self Reading
Figure 2.7 - Data Abstraction Levels (ANSI/SPARC Three-Level Architecture)
ANSI/SPARC (American National Standards Institute, Standards Planning and Requirements Committee) architecture: an abstract design standard for a database management system (DBMS), first proposed in 1975.
The three-level ANSI SPARC Database Architecture suggests three data abstraction levels, namely, external, conceptual, and internal levels.
The architecture of most commercial DBMSs available today is based on the ANSI/SPARC database architecture, which has three main levels:
Internal Level
Conceptual Level
External Level
These three levels provide data abstraction, meaning they hide low-level complexities from end users. A database system should be efficient in performance and convenient to use. With these three levels, it is possible to use complex structures at the internal level for efficient operations while providing a simpler, more convenient interface at the external level.
Figure 2.8 - External Models for Tiny College
The External Model
End users’ view of the data environment
ER diagrams are used to represent the external views
External schema: Specific representation of an external view
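In a relational DBMS, SQL views are one common way to implement external schemas: each user group sees only its own slice of the same underlying data. A minimal sketch with `sqlite3`, assuming a hypothetical `student` table and two hypothetical user groups (academic advisers and the accounts office), not the actual Tiny College design:

```python
import sqlite3

# One underlying table, two external views for different user groups.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (
    stu_id          INTEGER PRIMARY KEY,
    stu_name        TEXT,
    stu_gpa         REAL,
    stu_fee_balance REAL
);
INSERT INTO student VALUES (1, 'Ana', 3.4, 0.0), (2, 'Ben', 2.9, 150.0);

-- External view for academic advisers: no financial data.
CREATE VIEW advising_view AS
    SELECT stu_id, stu_name, stu_gpa FROM student;
-- External view for the accounts office: no grades, only students who owe fees.
CREATE VIEW billing_view AS
    SELECT stu_id, stu_name, stu_fee_balance FROM student WHERE stu_fee_balance > 0;
""")

cur = conn.execute("SELECT * FROM advising_view")
advising_cols = [d[0] for d in cur.description]  # column names this view exposes
owing = conn.execute("SELECT stu_name FROM billing_view").fetchall()
print(advising_cols, owing)  # ['stu_id', 'stu_name', 'stu_gpa'] [('Ben',)]
```

Each view is a specific representation of one external view; both are defined over a single shared table, which is where the conceptual model's global view comes in.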
The Conceptual Model
Logical design: Task of creating a conceptual data model - arranging data into a series of logical relationships called entities and attributes.
Is software and hardware independent
Represents a global view of the entire database by the entire organization
Conceptual schema: Basis for the identification and high-level description of the main data objects
Has a macro-level view of data environment
The process of logical design involves arranging data into a series of logical relationships called entities and attributes. An entity represents a chunk of information. In relational databases, an entity often maps to a table. An attribute is a component of an entity and helps define the uniqueness of the entity.
The Internal Model
Represents the database as seen by the DBMS, mapping the conceptual model to the DBMS
Internal schema: Specific representation of an internal model
Uses the database constructs supported by the chosen database
Is software dependent and hardware independent
Logical independence: the ability to change the internal model without affecting the conceptual model
Figure 2.10 - Internal Model for Tiny College
The Physical Model
Operates at lowest level of abstraction
Describes the way data are saved on storage media such as disks or tapes
Requires the definition of physical storage and data access methods
Relational model aimed at logical level
Does not require physical-level details
Physical independence: Changes in physical model do not affect internal model
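Physical independence can be observed directly in SQLite. In this minimal sketch, an index is treated as a physical data access method (an assumption for illustration: textbooks differ on whether indexes belong to the internal or physical level). Adding it changes how the DBMS reaches the data, but the query text and its result stay exactly the same. The `enroll` table and its generated rows are hypothetical.

```python
import sqlite3

# Adding an index changes the access method, not the query or its result.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE enroll (stu_id INTEGER, course_code TEXT)")
conn.executemany(
    "INSERT INTO enroll VALUES (?, ?)",
    [(i, f"CIS-{i % 5}") for i in range(1000)],  # 1,000 hypothetical enrollments
)

query = "SELECT COUNT(*) FROM enroll WHERE course_code = 'CIS-3'"
before = conn.execute(query).fetchone()[0]

# Physical-level change: an index (a data access method) on course_code.
conn.execute("CREATE INDEX idx_enroll_course ON enroll (course_code)")
after = conn.execute(query).fetchone()[0]

print(before, after)  # 200 200
```

No application query had to be rewritten after the storage change, which is the point of separating the physical level from the levels above it.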
Table 2.4 - Levels of Data Abstraction
Cengage Learning © 2015
Chapter Summary
A data model is an abstraction of a complex real-world data environment
Basic data modeling components:
Entities
Attributes
Relationships
Constraints
Business rules identify and define basic modeling components
Chapter Summary (cont’d.)
Hierarchical model
Set of one-to-many (1:M) relationships between a parent and its child segments
Network data model
Uses sets to represent 1:M relationships between record types
Relational model
Current database implementation standard
ER model is a tool for data modeling
Complements relational model
Chapter Summary (cont’d.)
Object-oriented data model: object is basic modeling structure
Relational model adopted object-oriented extensions: extended relational data model (ERDM)
OO data models depicted using UML
Data-modeling requirements are a function of different data views and abstraction levels
Three abstraction levels: external, conceptual, internal
Tutorial
Start working on tutorial 2
Make sure to ask questions if you do not understand a question or a topic