Discussion Post + Quiz (SQL)
Data Modeling
Man is a knot, a web, a mesh into which
relationships are tied. Only those
relationships matter
Saint-Exupéry
Data modeling
A technique for modeling data
A graphical representation of a database
The goal is to identify the facts to be stored in a database
Data modeling is a partnership between the client and analyst
Modeling
Scope Model Technology
Motivation why Goals Business model canvas Groupware
People who Business units Organization chart Systems interface
Time when Key events PERT chart Scheduling
Data what Key entities Data model Relational database
Network where Locations Logistics network System architecture
Process how Key processes Process model Programming
5W + H model of journalism
The building blocks
Entity
Attribute
Relationship
Identifier
Data model quality
A well-formed data model
A high fidelity image
A well-formed data model
Construction rules obeyed
No ambiguity
All entities, attributes, relationships, and identifiers are defined
Names are meaningful to the client
A high fidelity image
Faithfully describes the world it is supposed to represent
Relationships are of the correct degree
Data model is complete, understandable, and accurate
The data model makes sense to the client
Quality improvement
Is the level of detail correct?
Are all exceptions handled?
Is the model accurate?
Pure geography
Can a nation have more than one capital? Can a city be the capital of more than one state?
Geography revised
Family matters - take 1
Can we generalize?
Family matters - take 2
Family matters - take 3
What about couples who are not officially married but have cohabited
for an extended period?
Family matters - take 4
Family matters - take 5
Bookish matters - take 1
Should copyno be an attribute of book?
Bookish matters - take 2
This model records only the current
borrower of a copy of a book
Identification shift
A UPC or EAN identifies a type of product
220g Vegemite
How would a data model for a retailer change if every product were uniquely identified, such as with a RFID tag?
History - take 1
Can an employee work in multiple
departments?
History - take 2
How do we keep track of an employee’s pay
checks?
History - take 3
How is an instance of PAYSLIPLINE
identified?
History - take 4
A ménage à trois for entities -
take 1
Where do we store information about
the lease?
A ménage à trois for entities -
take 2
Why is start date part of the composite
primary key?
Golf statistics - take 1
Golf statistics – take 2
Planning and doing - take 1
Planning and doing - take 2
Cardinality
Cardinality Modality Meaning
0,1 Optional There can be zero or one instances of
the entity relative to the other entity
0,n There can be zero or many instances
of the entity relative to the other
entity
1,1 Mandatory There is exactly one instance of the
entity relative to the other entity
1,n The entity must have at least one and
can have many instances relative to
the other entity
Minimalist approach
Focus has been on identifying the basic cardinality (1:m or m:m?)
Now add greater precision
There must be 1 instance
Learn the basics and then add more detail
Modality
Also known as optionality
Cardinality indicates the range of instances in a relationship
Modality defines the minimum number of instances
Cardinality and modality are linked
Modality and Cardinality
Optional entity
Cardinality is 0
O
Mandatory entity
Cardinality is 1
|
Cardinality
--------------
Modality
Mandatory
0 or n
of 1:m
Optional
1 of
1:m
Modality
Can a lineitem exist
without a sale?
Can an item exist
without a lineitem?
Modality
Does every department have
a boss?
Is every employee a
department boss?
Modality
Modality
Why is it optional for a monarch to have a
successor?
Modality
• Optional for a product to have components • Optional for a product to be a component • Every assembly must have products
Modality
Adds additional information to a data model
If a relationship is mandatory then add a constraint
Could be
• Referential integrity constraint
• Application logic
Entity types
Independent
Dependent
Associative
Aggregate
Subordinate
Independent
Often a starting point
Prominent in the client's mind
Often related to other independent entities
Dependent
Relies on another entity for its existence and identification
Can become independent if given an arbitrary identifier
Associative
A by-product of an m:m relationship
Typically between independent entities
Can store current or historical data
Can become independent if given an arbitrary identifier
Aggregate
Created from several different entities that have a common prefix or suffix
Commonly used with addresses or names
Subordinate
An entity with data that can vary among instances
Generalization
A relationship between a more general element and a more specific element
Generalization
Map with one table for each entity
For each of the subtype entities the primary key is that of the supertype entity
You must also make this column a foreign key so that a subtype cannot be inserted without the presence of the matching supertype
UML aggregation
Aggregation is a part-whole relationship between two entities
UML shared aggregation
One entity owns another entity, but other entities can own that entity as well
UML composite aggregation
One entity exclusively owns the other entity
Data model contraction
Hints on data modeling
The model will expand and contract
Invent identifiers where necessary
Identifiers should have only one purpose – identification
A data model does not imply ordering
Create an attribute if ordering of instances is required
An attribute’s meaning must be consistent
Names and addresses
The query test
If an attribute has parts, are any of the parts ever likely to appear in a query?
Have an understanding on representing names and addresses in a data model
Post code
A US zip code is CHAR(5) because leading zeroes are displayed
Boston MA 02201
Alternatively, use INTEGER
Format with LPAD(zipcode, 5,'0')
Full US zip is CHAR(10)
30602-6273
VARCHAR(20) probably covers all countries
Hints on data modeling
Single instance entities are OK
Select names carefully
Synonyms—different words have the same meaning
Get clients to settle on a common word or use views
Homonyms—same word has different meanings
Clarify to avoid confusion
Naming associative entities
Concatenate entity names if there is no obvious real world name
Hints on data modeling
Uncover all exceptions
Label relationships to avoid ambiguity
Keep the data model well-formed and accurate
Meaningful identifiers
An identifier is meaningful when some attributes of the entity can be inferred from the identifier’s value
Advantages Disadvantages
Recognizable and rememberable Identifier exhaustion
Administrative simplicity Reality changes
Loss of meaningfulness
Recommendation
Nothing, however, is lost and much is gained by using non-meaningful identifiers
Non-meaningful identifiers serve their sole purpose well
To uniquely identify an entity
Attributes are used to describe the characteristics of the entity
A clear distinction between the role of identifiers and attributes creates fewer data management problems
The seven habits of highly
effective data modelers
Immerse
Challenge
Generalize
Test
Limit
Integrate
Complete
Key points
A high-fidelity data model handles all exceptions
Identifiers need identify only an instance
Data modeling skills take time to develop