basic questions about database theory..

profileuatoooonec
database.zip

1. Introduction to Databases.pdf

CMSC 508 Database Theory

CMSC 508 Database Theory

Introduction to Databases

Dr. Alberto Cano Assistant Professor Department of Computer Science

Chapter 1 from Database System Concepts, 6th Ed. by Silberschatz, Korth, Sudarshan, 2011 Chapter 1 from Database Management Systems, 3rd Ed. by Ramakrishnan, Gehrke, 2003

Introduction to Databases

1

CMSC 508 Database Theory Introduction to Databases

 Database Management System (DBMS) • Collection of interrelated data • Set of programs to access the data • An environment that is convenient, efficient, and reliable

 Databases model the real world • Entities: abstractions of data • Relationships: semantic information between entities

 Historical perspective • 1960s Network data model • 1960s Hierarchical data model • 1970s Relational data model • 1980s SQL standardization • 1990s Enterprise resource planning (ERP)

Management resource planning (MRP) 2

CMSC 508 Database Theory Introduction to Databases

 Example: University database • Add new students, instructors, and courses • Register students for courses, and generate class rosters • Assign grades to students, compute GPA

3

CMSC 508 Database Theory Introduction to Databases

 Example: University database • Add new students, instructors, and courses • Register students for courses, and generate class rosters • Assign grades to students, compute GPA

 Drawbacks of using file systems to store data • Data redundancy and inconsistency in multiple files and formats • Difficulty in accessing data • Need to write a new program to carry out each new task • Data isolation • Integrity problems and constraints • Atomicity of updates • Concurrent access by multiple users • Security problems

Do not underestimate the power of Excel for a small business!! 4

CMSC 508 Database Theory Introduction to Databases

 Advantages of a DBMS • Data Independence • Efficient Data Access • Data Integrity • Security and authorization • Data Administration • Concurrent Access and Crash Recovery • Reduced Application Development Time

 Schema: logical structure of the database • Conceptual schema: describes the data model • Physical schema: defines how the conceptual schema is actually

stored in data structures on disks

 Instance: actual contents of the database at a given point in time

5

CMSC 508 Database Theory Introduction to Databases

 Levels of Abstraction • Physical level

Describes how data is stored • Logical level

Describes data and relationships • View level

Application interfaces

 Physical Data Independence (physical schema - logical schema) • Applications depend on the views and logical schema interfaces

Physical Schema

Logical Schema

View 1 View 2 View 3

6

CMSC 508 Database Theory Introduction to Databases

 Data Models • A collection of tools for describing

o Data o Data relationships o Data semantics o Data constraints

• Relational model • Entity-Relationship data model • Object-oriented data models • Semistructured data model (XML) • Other older models:

o Network model o Hierarchical model

7

CMSC 508 Database Theory Introduction to Databases

 Relational model • Collection of tables to represent both data and the relationships • Tables have multiple columns with unique names • Tables contain records (tuples) that correspond to the attributes • Tables are called relations

Columns

Rows

8

CMSC 508 Database Theory Introduction to Databases

 Attribute-Relation File Format (ARFF) • Developed by Weka team for Machine Learning and Data Mining • Relational model in a text file that describes a list of instances

sharing a set of attributes

@RELATION iris @ATTRIBUTE sepallength REAL @ATTRIBUTE sepalwidth REAL @ATTRIBUTE petallength REAL @ATTRIBUTE petalwidth REAL @ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica} @DATA 5.1,3.5,1.4,0.2,Iris-setosa 4.9,3.0,1.4,0.2,Iris-setosa 5.5,2.3,4.0,1.3,Iris-versicolor 6.5,2.8,4.6,1.5,Iris-versicolor 5.7,2.8,4.5,1.3,Iris-versicolor 6.5,3.0,5.8,2.2,Iris-virginica 7.6,3.0,6.6,2.1,Iris-virginica 4.9,2.5,4.5,1.7,Iris-virginica 7.3,2.9,6.3,1.8,Iris-virginica

9

CMSC 508 Database Theory Introduction to Databases

 Data Definition Language (DDL) • Specification notation for defining the database schema

Example: create table instructor ( ID char(5), name varchar(20), dept_name varchar(20), salary numeric(8,2))

• DDL compiler generates a set of tables stored in a data dictionary • Data dictionary contains metadata

o Database schema o Integrity constraints

• Primary key • Referential integrity

o Authorization

10

CMSC 508 Database Theory Introduction to Databases

 Data Manipulation Language (DML) • Language for accessing and manipulating the data • Two classes of languages

o Procedural • Specifies what data is required and how to get those data

o Declarative • Specifies what data is required without specifying how

• SQL is the most widely used query language (declarative)

Example: select name from instructor where instructor.ID = ‘22222’

 Application programs generally access databases through: • Language extensions to allow embedded SQL • Application program interface (e.g., ODBC/JDBC) which allow SQL

queries to be sent to a database (drivers) 11

CMSC 508 Database Theory Introduction to Databases

 Database Design • Logical Design

• Deciding the specification on the database schema • Effective and efficient collection of relation schemas • What relations, attributes, and their types should we have

• Physical Design • Deciding on the physical layout of the database

• Software Design perspective 1. Analysis of requirements 2. Functional and nonfunctional analysis 3. Design synthesis

12

CMSC 508 Database Theory Introduction to Databases

 Is there any problem with this design?

13

CMSC 508 Database Theory Introduction to Databases

 Database design major issues • Isolation • Inconsistency

o Different rows should different content for an attribute that should have the same value

• Integrity o A given value is not valid according to the specification

 Normalization theory • Process of organizing the columns and tables of a relational

database to minimize data redundancy • Normalization decomposes a table into smaller tables without

losing information • References are defined using primary keys and foreign keys

14

CMSC 508 Database Theory Introduction to Databases

 The Entity-Relationship model • Data modelling as a collection of entities and relationships • Represented diagrammatically by an entity-relationship diagram • Unified Modeling Language (UML) • Entity sets are represented by a rectangular box • Relationship sets are represented by a diamond connecting

related entity sets

• Entities are translated into tables • How do we translate the relationship?

15

CMSC 508 Database Theory Introduction to Databases

 Object-Relational Data Models • Objects, classes and inheritance are directly supported in

database schemas and in the query language

• Extend the relational data model by including object orientation and constructs to deal with added data types

• Allow attributes of tuples to have complex types, including non- atomic values such as nested relations

• Preserve relational foundations, in particular the declarative access to data, while extending modeling power

• Provide upward compatibility with existing relational languages

16

CMSC 508 Database Theory Introduction to Databases

 Semistructured Data Model: XML - Extensible Markup Language • Originally intended as a document markup language • The ability to specify new tags, and to create nested tag

structures made XML a great way to exchange data • XML has become the basis for new generation data interchange • A wide variety of tools is available for parsing, browsing and

querying XML documents/data

<?xml version="1.0"?> <catalog>

<book id="bk101"> <author>Gambardella, Matthew</author> <title>XML Developer's Guide</title> <genre>Computer</genre> <price>44.95</price> <publish_date>2000-10-01</publish_date> <description>An in-depth look at creating applications with XML.</description>

</book> </catalog>

17

CMSC 508 Database Theory Introduction to Databases

 Storage Management • Storage manager is a program that provides the interface

between the low-level data stored in the database and the application programs and queries submitted to the system.

• The storage manager is responsible to the following tasks:

o Interaction with the file manager

o Efficient storing, retrieving and updating of data

• Issues:

o Storage access

o File organization

o Indexing and hashing

18

CMSC 508 Database Theory Introduction to Databases

 Query processing 1. Parsing and translation 2. Optimization 3. Evaluation

• Cost difference between equivalent queries can be enormous 19

CMSC 508 Database Theory Introduction to Databases

 Transaction management and concurrency • System failures (“If anything can go wrong, it will”)

o DBMS provides fault-tolerant design using transactions

• Concurrency o Interleaving actions of different users leads to inconsistency

20

CMSC 508 Database Theory Introduction to Databases

 Transaction management and concurrency • Transaction: collection of operations that performs a single

logical function (atomic) in a database application

• Transaction-management component ensures that the database remains in a consistent (correct) state despite system failures (e.g., power failures and operating system crashes) and transaction failures

• Concurrency-control manager controls the interaction among the concurrent transactions, to ensure the consistency of the database

21

CMSC 508 Database Theory Introduction to Databases

 ACID properties • Atomicity: Either all operations of the transaction are properly

reflected in the database or none are

• Consistency: Execution of a transaction in isolation preserves the consistency of the database

• Isolation: Although multiple transactions may execute concurrently, each transaction must be unaware of other concurrently executing transactions

For every pair of transactions Ti and Tj, it appears to Ti that either Tj, finished execution before Ti started, or Tj started execution after Ti finished

• Durability: After a transaction completes successfully, the changes it has made to the database persist, even if there are system failures

22

CMSC 508 Database Theory Introduction to Databases

 Database users and administrators • End users use application interfaces (e.g. web user)

• Application programmers write functionality (e.g. webmasters)

• Analysts use query tools (e.g. statistical business analyst)

• Database administrator (DBA)

o Designs logical /physical schemas

o Handles security and authorization

o Data availability, crash recovery

o Database tuning as needs evolve

• Authorization and abstraction level depend on the user type

23

CMSC 508 Database Theory Introduction to Databases

 Database system internals

24

CMSC 508 Database Theory Introduction to Databases

 Database architecture • Centralized: run locally on a single computer • Parallel / distributed • Most databases today use client-server architecture over network

25

CMSC 508 Database Theory Introduction to Databases

 Summary • DBMS consists of a collection of interrelated data and programs

• DBMS provide an environment that is both convenient and efficient to use in retrieving and storing information

• A major purpose of a database system is to provide users with an abstract view of the data

• The relational data model is the most widely deployed model for storing data in databases

• DDL and DML allow users to specify, access and manipulate data

• Transaction management ensures that the database remains in a consistent (correct) state despite system failures

• There are four different types of database-system users, differentiated by the way to interact with the system

26

CMSC 508 Database Theory Introduction to Databases

 Review Terms

Database management system (DBMS) Data inconsistency Consistency constraints Data abstraction Instance Schema Physical schema Logical schema Physical data independence Data models Entity-relationship model Relational data model Object-based data model Semistructured data model Database languages Data-definition language

Data-manipulation language Query language Metadata Application program Normalization Data dictionary Storage manager Query processor Transactions ACID properties Atomicity Failure recovery Concurrency control Two- and three-tier architectures Database administrator (DBA) Tuple

27

CMSC 508 Database Theory

CMSC 508 Database Theory

Introduction to Databases

Dr. Alberto Cano Assistant Professor Department of Computer Science

Chapter 1 from Database System Concepts, 6th Ed. by Silberschatz, Korth, Sudarshan, 2011 Chapter 1 from Database Management Systems, 3rd Ed. by Ramakrishnan, Gehrke, 2003

Introduction to Databases

28

__MACOSX/._1. Introduction to Databases.pdf

2. The relational model and relational algebra(2).pdf

CMSC 508 Database Theory

CMSC 508 Database Theory

The relational model and relational algebra

Dr. Alberto Cano Assistant Professor Department of Computer Science

Chapters 2,6 from Database System Concepts, 6th Ed. by Silberschatz, Korth, Sudarshan, 2011 Chapters 3,4 from Database Management Systems, 3rd Ed. by Ramakrishnan, Gehrke, 2003

Relational model

1

CMSC 508 Database Theory Relational model

 Relational model • Collection of tables to represent both data and the relationships • Tables have multiple columns (attributes) with unique names • Tables contain records (tuples) that correspond to the attributes • Tables are called relations • #Rows = cardinality, #Columns = arity • Order of tuples is irrelevant Columns

Rows

2

CMSC 508 Database Theory Relational model

 Attribute types • Domain: set of allowed values for each attribute

• Attribute values are (normally) required to be atomic (indivisible)

• The special value null is a member of every domain

• The null value causes complications in many operations

• The DDL specifies the attribute domain o Strings (VARCHAR, CHAR) o Numeric (NUMBER, INTEGER) o Date (DATE) o Objects (BFILE)

3

CMSC 508 Database Theory Relational model

 Integrity constraints • Conditions that must be true for any instance

• Specified when schema is defined

• Checked when relations are modified

• Domain integrity:

o Data type o Length or size o Is null value allowed o Is the value unique or not for an attribute

4

CMSC 508 Database Theory Relational model

 Keys • Set of attributes. Let K  R

• K is a superkey of R if values for K are sufficient to identify a unique tuple of each possible relation r(R)

Example: {ID} and {ID,name} are both superkeys of instructor

• Superkey K is a candidate key if K is minimal

Example: {ID} is a candidate key for Instructor

• One of the candidate keys is selected to be the primary key

• Any two individual tuples in the relation are prohibited from

having the same value on the key attributes at the same time

• Foreign key is a column(s) that references a column (most often

the primary key) of another table

• Use of foreign keys ensure referential integrity of the data 5

CMSC 508 Database Theory Relational model

 Schema Diagram for University Database

• Identification of entities

• Identification of relations

• Conceptual schema -> relational schema

Time to think a basic conceptual and relational schema

6

CMSC 508 Database Theory Relational model

 Schema Diagram for University Database

• Identification of entities o Student (name) o Instructor (name, salary) o Department (name, budget) o Course (title, credits)

• Identification of relationships o Students take courses o Instructors teach courses o Instructors are members of departments o Courses are organized by departments

7

CMSC 508 Database Theory Relational model

 Schema Diagram for University Database • Conceptual schema

• Primary keys? • Foreign keys? • Cardinalities? • Grades? • Translation into a relational schema • What if … ? 8

CMSC 508 Database Theory Relational model

 Schema Diagram for University Database

9

CMSC 508 Database Theory Relational model

 Schema Diagram for University Database

classroom(building, room number, capacity)

department(dept name, building, budget)

course(course id, title, dept name, credits)

instructor(ID , name, dept name, salary)

section(courseid, secid, semester, year, building, roomnumber, time slot id)

teaches(ID , course id, sec id, semester, year)

student(ID , name, dept name, tot cred)

takes(ID , course id, sec id, semester, year, grade)

advisor(sID , iID )

time slot(time slot id, day, start time, end time)

prereq(course id, prereq id)

10

CMSC 508 Database Theory Relational algebra

 Relational Query Languages • Procedural vs or declarative • Pure languages:

o Relational algebra o Tuple relational calculus o Domain relational calculus

• Relational operators o Set operators: union, intersection, difference, product o Selection o Projection o Join: natural join, θ-join, equijoin, outer joins, inner join o Division

11

CMSC 508 Database Theory Relational algebra

 Union () • Let r and s be relations on schemas R and S respectively • 𝑅 𝑆 = 𝑥 𝑥 ∈ 𝑅 𝑜𝑟 𝑥 ∈ 𝑆}

• union-compatible: relations must have the same set of attributes

12

r s

CMSC 508 Database Theory Relational algebra

 Intersection () • Let r and s be relations on schemas R and S respectively • 𝑅 𝑆 = 𝑥 𝑥 ∈ 𝑅 𝑎𝑛𝑑 𝑥 ∈ 𝑆}

• union-compatible: relations must have the same set of attributes

13

r  s

CMSC 508 Database Theory Relational algebra

 Difference(−) • Let r and s be relations on schemas R and S respectively • 𝑅 − 𝑆 = { 𝑥 ∈ 𝑅 | 𝑥 ∉ 𝑆}

• union-compatible: relations must have the same set of attributes

14

r −s

CMSC 508 Database Theory Relational model

 Cartesian product(×) • Output all pairs of rows from the two input relations • 𝑅 × 𝑆 = 𝑟, 𝑠 𝑟 ∈ 𝑅 𝑎𝑛𝑑 𝑠 ∈ 𝑆}

• Properties o NOT commutative 𝑅 × 𝑆 ≠ 𝑆 × 𝑅 o NOT associative 𝑅 × 𝑆 × 𝑊 ≠ 𝑅 × (𝑆 × 𝑊) o 𝑅  𝑆 × 𝑊  𝑍 = 𝑅 𝑊 × 𝑆  𝑍 o 𝑅 𝑆 × 𝑊 𝑍 ≠ 𝑅𝑊 × 𝑆 𝑍 o 𝑅 × 𝑆𝑊 = 𝑅 × 𝑆  𝑅 × 𝑆 o 𝑅 × 𝑆𝑊 = 𝑅 × 𝑆  𝑅 × 𝑆 15

r ×s

CMSC 508 Database Theory Relational algebra

 Cartesian product(×)

16

SQL example: select leader_name, title from LEADER, OCCUPATION

Be aware of the size of the product! Attributes are disjoint

CMSC 508 Database Theory Relational algebra

 Selection (σ) • Returns rows of the relation that satisfy the predicate • Let R (A,B,C,D)

 Properties • Idempotent

σA(R ) = σA(σA(R ))

• Commutative

σA(σB(R ) ) = σB(σA(R ))

• Breaking up

σA ∧ B(R) = σA(σB(R)) = σB(σA(R))

σA ∨ B(R) = σA(R)  σB(R)

17

σA=B ∧ D > 5(R )

CMSC 508 Database Theory Relational algebra

 Projection (∏) • Outputs specified attributes from all rows of the input relation,

and removes duplicate tuples from the output • Let R (A,B,C)

 Properties • Idempotent

∏A1,…,AN (∏B1,…,BM

(R )) = ∏A1,…,AN (R )

where {A1,…,AN} ⊆ {B1,…,BM}

• Distributive over set union

∏A1,…,AN (R  P ) = ∏A1,…,AN

(R ) ∏A1,…,AN (P )

Projection does NOT distribute over intersection and set difference 18

∏A,C(R )

CMSC 508 Database Theory Relational algebra

 Natural join (⨝) • Let r and s be relations on schemas R and S respectively • Natural join of relations R and S is a relation on schema R  S as:

o Consider each pair of tuples tr from r and ts from s. o If tr and ts have the same value on each of the attributes

in R  S, add a tuple t to the result, where t has the same value as tr on r t has the same value as ts on s

• Properties: associative and commutative

19r ⨝ s

CMSC 508 Database Theory Relational algebra

 Theta-join (θ) • R ⨝θ S = σθ (R×S) • Selection σ meeting condition θ after cross product

 Equijoin • θ is the equality operator

20

A B

3 2

2 5

C D

2 1

4 3

4 7

A B C D

3 2 4 7

2 5 4 3

2 5 4 7

R ⨝A<D SR S

CMSC 508 Database Theory Relational algebra

 Outer join • An extension of the join operation that avoids loss of information • Computes the join and then adds tuples form one relation that

does not match tuples in the other relation • Uses null values

o null signifies unknown or does not exist o All comparisons involving null are false by definition o The result of any arithmetic expression involving null is null

21

Comp. Sci. Finance Music

ID dept_name

10101 12121 15151

name

Srinivasan Wu Mozart

ID course_id

10101 12121 76766

CS-101 FIN-201 BIO-101

Instructor Teaches

CMSC 508 Database Theory Relational algebra

 Inner join • Instructor ⨝ teaches

 Left outer join • Instructor teaches

22

Comp. Sci. Finance

ID dept_name

10101 12121

name

Srinivasan Wu

course_id

CS-101 FIN-201

Comp. Sci. Finance Music

ID dept_name

10101 12121 15151

name

Srinivasan Wu Mozart

course_id

CS-101 FIN-201 null

CMSC 508 Database Theory Relational algebra

 Right outer join • Instructor teaches

 Full outer join • Instructor teaches

23

Comp. Sci. Finance null

ID dept_name

10101 12121 76766

name

Srinivasan Wu null

course_id

CS-101 FIN-201 BIO-101

Comp. Sci. Finance Music null

ID dept_name

10101 12121 15151 76766

name

Srinivasan Wu Mozart null

course_id

CS-101 FIN-201 null BIO-101

CMSC 508 Database Theory Relational algebra

 Division • Let r and s be relations on schemas R and S respectively and S ⊆ R • R / S is a relation on schema R – S • A tuple t is in R/S if satisfies both:

o t is in ∏R-S(r) o For every tuple ts in s, there is a tuple tr in r satisfying both:

• tr[S] = ts[S] • tr[R-S] = t

24

CMSC 508 Database Theory Relational algebra

 Division

25

A

B1

B2

B3

A/B1 A/B2 A/B3

CMSC 508 Database Theory Relational model

 Summary • The relational data model is based on a collection of tables

• The schema refers to the logical design, instance refers to its contents

• The schema includes attributes, types of the attributes and constraints

• A superkey is a set of attributes guaranteed to identify tuples uniquely

• A candidate key is a minimal superkey

• The primary key is one of the candidate keys

• A foreign key is a set of attributes in a referencing relation

• A schema diagram shows the relations, attributes, and keys

• The relational algebra provides a set of operations between relations

26

CMSC 508 Database Theory Relational model

 Review Terms

Table Relation Tuple Attribute Domain Atomic domain Null value Database schema Database instance Relation schema Relation instance Keys Superkey Candidate key

Primary key Foreign key Referential integrity constraint Schema diagram Query language Procedural language Nonprocedural language Operations on relations Selection Projection Joins Cartesian product Set operations Relational algebra

27

CMSC 508 Database Theory

CMSC 508 Database Theory

The relational model and relational algebra

Dr. Alberto Cano Assistant Professor Department of Computer Science

Chapters 2,6 from Database System Concepts, 6th Ed. by Silberschatz, Korth, Sudarshan, 2011 Chapters 3,4 from Database Management Systems, 3rd Ed. by Ramakrishnan, Gehrke, 2003

Relational model

28

__MACOSX/._2. The relational model and relational algebra(2).pdf

Homework1.pdf

CMSC 508 – Database Theory

Introduction to databases, relational model and relational algebra

1. List four significant differences between a file-processing system and a DBMS. Why would you

choose a database system instead of simply storing data in operating system files? When would

it make sense not to use a database system?

2. Explain the concept of physical data independence, and its importance in database systems.

3. Explain the difference between two-tier and three-tier architectures.

4. What is a transaction? Define ACID.

5. What is a foreign key constraint? Why are such constraints important? What is referential

integrity?

6. A university database contains information about professors (identified by social security

number, or SSN) and courses (identified by courseid). Professors can teach the same course in

several semesters, and only the most recent such offering needs to be recorded. Draw a diagram

that describes it (assuming no further constraints hold).

7. Consider the instance of the Students relation shown below.

1. Give an example of an attribute (or set of attributes) that you can deduce is not a

candidate key, based on this instance being legal.

2. Is there any example of an attribute (or set of attributes) that you can deduce is a

candidate key, based on this instance being legal?

3. Which of the candidate keys are valid for primary keys? Which one would you choose?

8. Given two relations R1 and R2, where R1 contains N1 tuples, R2 contains N2 tuples, and

N2 > N1 > 0, give the minimum and maximum possible sizes (in tuples) for the resulting relation

produced by each of the following relational algebra expressions. In each case, state any

assumptions about the schemas for R1 and R2 needed to make the expression meaningful:

(1) R1 ∪ R2, (2) R1 ∩ R2, (3) R1−R2, (4) R1 × R2, (5) σa=5(R1), (6) πa(R1), and (7) R1/R2

9. Consider the following bank database. What are the appropriate primary and foreign keys?

branch (branch name, branch city, assets)

customer (customer name, customer street, customer city)

loan (loan number, branch name, amount)

borrower (customer name, loan number)

account (account number, branch name, balance)

depositor (customer name, account number)

10. Consider the following relational database.

employee (person name, street, city)

works (person name, company name, salary)

company (company name, city)

manages (person name, manager name)

Give an expression in the relational algebra to express each of the following queries:

a. Find the names of all employees who work for “First Bank Corporation”.

b. Find the names and cities of residence of all employees who work for “First Bank Corporation”.

c. Find the names, street address, and cities of residence of all employees who work for “First

Bank Corporation” and earn more than $10,000.

d. Find the names of all employees in this database who live in the same city as the company for

which they work.

e. Assume the companies may be located in several cities. Find all companies located in every

city in which “Small Bank Corporation” is located.

11. Consider the following relations, calculate the operations:

a. Graduates U Managers

b. Graduates ∩ Managers

c. Graduates – Managers

12. Consider the following relations, calculate the operation:

13. Define and give an example of the division operator using two relations.

__MACOSX/._Homework1.pdf