Case. Amazon

Andrexa Peluso
caseamazonhttpsyoutu_befcojt2klc9k.zip

RULES OF THUMB.pdf

Relational Database Design: Rules of Thumb

Thomas H. Grayson 23 January 2002

Database Design Rules of Thumb

Keep data items atomic (e.g., first and last names are separate). Concatenating columns together later on-the-fly is generally easy, but separating them is not.

o What is an example of where parsing subfields from a column may go awry? o When might you want to include the combined fields in a column anyway?

Define the primary key first. Use a descriptive name (PARCEL_ID, not ID)

In fact, use descriptive names that give a new user a decent chance of guessing what they mean for all your columns! (E.g., use PARCEL_COUNT rather than PACT)

Use a single column for the primary key whenever possible; multi-column primary keys are appropriate for many-to-many relationships

Use lookup tables rather than storing long values

Use numeric keys whenever possible (What about ZIP codes?)

Avoid intelligent keys (exception: lookup tables)

Avoid using multiple columns to represent a one-to-many relationship (e.g., columns such as CHILD1, CHILD2 in a table called PARENT rather than putting the children in a separate table.

For readability, use the primary key name for foreign keys unless the same foreign key is used multiple times in the same table (e.g., state of work and state of residence for a person might both be foreign keys thatreference a table of states)

Do not include two columns whose values are linked together (e.g., county name and county ID) unless one of the columns is the primary key of the table

Avoid allowing NULL values in columns that have a discrete range of possible values (e.g., integers between 1 and 10, inclusive) (not applicable to DBF files, which do not support NULLs)

Avoid using multiple tables with similar structures that represent minor variants on the same entity (e.g., putting Boston parcels and Cambridge parcels in separate tables). Why is this rule often hard to practice with GIS?

Plan ahead for transferring data to a different database. For example, you may want to move data from Oracle to DBF, or Microsoft Access to Oracle.

o Avoid column names with characters with other than UPPER CASE letters (A-Z), digits (0-9), and the underscore (_). Other characters may not be accepted by a database. Some database systems may be case sensitive with regard to column names, while others are not.

o Keep your column names relatively short. Different databases support different numbers of characters in column names (e.g., 30 for Oracle, 64 for Microsoft Access, 10 for DBF). Try to make column names differ in the first few characters rather than at the end to avoid column name duplication if the names are truncated during the conversion process (e.g., use COL1 and COL2, not LONG_COLUMN_NAME_1 and LONG_COLUMN_NAME_2).

Note that keeping column names short may be at odds with keeping your column names meaningful for neophytes. Be aware that you are making a tradeoff!

Remember that these are rules of thumb, not absolute laws! Bend the rules if you must but have a justification for your decision. The limitations of a GIS software package often provide a good reason.

  • Relational Database Design: Rules of Thumb
    • Database Design Rules of Thumb

SQL.pdf

1

1.204 Lecture 3

SQL: Basics, Joins

SQL • Structured query language (SQL) used for

– Data d th

e b finition (DDL): tables and views (virtual tables). These

are the basi ti t t d t d l tic operations to convert a data model to a database

– Data manipulation (DML): user or program can INSERT, DELETE, UPDATE or retrieve (SELECT) data.

– Data integrity: referential integrity and transactions. Enforces keys.

– Access control: security – Data sharing: by concurrent users

• NotNot a completea complete language like Javalanguage like Java – SQL is sub-language of about 30 statements

• Nonprocedural language – No branching or iteration – Declare the desired result, and SQL provides it

2

SQL SELECT

• SELECT constructed of clauses to get columns and rows from one or more SELECT constructed of clauses to get

tables or views. Clauses must be in order: – SELECT columns/attributes – INTO new table – FROM table or view

WHERE– specific rowsWHERE specific rows oror aa join is createdjoin is created – GROUP BY grouping conditions (columns) – HAVING group-property (specific rows) – ORDER BY ordering criterion ASC | DESC

Orders OrderNbr Cust Prod Qty Amt Disc

1 211 Bulldozer 7 $31,000.00 0.2 2 522 Riveter 2 $4,000.00 0.3 3 522 Crane 1 $500,000.00 0.4

CustNbr Company CustRep CreditLimit 211 Connor Co 89 $50,000.00 522 AmaratungaEnterprise 89 $40,000.00 890 Feni Fabricators 53 $1,000,000.00

RepNbr Name RepOffice Quota Sales 53 Bill Smith 1 $100 000 00 $0 00

Customers

SalesReps 53 Bill Smith 1 $100,000.00 $0.00 89 Jen Jones 2 $50,000.00 $130,000.00

Offices

p

OfficeNbr City State Region Target Sales Phone 1 Denver CO West $3,000,000.00 $130,000.00 970.586.3341 2 New York NY East $200,000.00 $300,000.00 212.942.5574

57 Dallas TX West $0.00 $0.00 214.781.5342

Example tables

3

Example schema

+OrderID

Using SQL Server and Management Studio Express

• Your SQ when y

Q gL Server database engine should start by default our system starts

y

– Ask TA for help if needed at office hours • Start Management Studio Express (MSE) from Start-

>Programs->SQL Server 2008 • Open Lecture3CreateDB.sql with MSE in Windows Explorer

– Download the .sql file from Stellar and double-click on it • Select• Select ‘Execute from’ toolbarExecute from toolbar

– Database MIT1204 will be created and data inserted for examples during this class

• Open Lecture3Examples.sql for all SQL code in this lecture – Experiment with it

4

SQL queries: SELECT

• ClickClick ‘NewNew QueryQuery’ inin MSE;MSE; tytypepe thesethese statements:statements: • List the sales reps

– SELECT Name, Sales, Quota FROM SalesReps • Find the amount each rep is over or under quota

– SELECT Name, Sales, Quota, (Sales-Quota) FROM SalesReps

• Find the slackers – SELECT Name, Sales, Quota, (Sales-Quota) FROM

SalesReps WHERE Sales < Quota

RepNbr Name RepOffice Quota Sales 53 Bill Smith 1 $100,000.00 $0.00 89 Jen Jones 2 $50,000.00 $130,000.00

SQL queries: calculation, insert, delete, update

• Find the average sale – SELECTSELECT AAVVG(G(AAmt)mt) FFROMROM OOrders;rders;

• Find the average sale for a customer – SELECT AVG(Amt) FROM Orders WHERE Cust = 211;

• Add an office – INSERT INTO Offices (OfficeNbr, City, State, Region, Target, Sales,

Phone) VALUES (‘55’, ‘Dallas’,‘TX’,‘West’, 200000, 0, ‘214.333.2222’); • Delete a customer

– DELETE FROM Customers WHERE Company = ‘Connor Co’; – (Syntax is valid but command will fail due to referential integrity)

• Raise a credit limit – UPDATE Customers

SET CreditLimit = 75000 WHERE Company = ‘Amaratunga Enterprises’;

5

SELECT: * and duplicates

• Select all columns (fields)• Select all columns (fields) – SELECT * FROM Offices;

• Duplicate rows: query will get two instances of ‘West’ – SELECT Region FROM Offices;

• Eliminate duplicates:• Eliminate duplicates: – SELECT DISTINCT Region FROM Offices;

NULLs

• NULL values evaluate to NOT TRUE in all cases. – Insert ‘NewRep’ with NULL (blank or empty) Quota

• The following two queries will not give all sales reps: – SELECT Name FROM SalesReps WHERE Sales > Quota; – SELECT Name FROM SalesReps WHERE Sales <= Quota; – A new rep with a NULL quota will not appear in either list

• Check for NULLS byy: – SELECT Name FROM SalesReps WHERE Quota IS NULL;

6

SELECT Operators

• SELECT * FROM <table> – WHERE Disc*Amt > 50000; (Orders) – WHERE Quota BETWEEN 50000 AND 100000; (SalesReps)

• Range is inclusive (>=50000 and <=100000) – WHERE State IN (‘CO’, ‘UT’, ‘TX’); (Offices) – WHERE RepNbr IS NOT NULL; (SalesReps) – WHERE Phone NOT LIKE ‘21%’; (Offices)

• SQL standard only has 2 wildcards • % anyy g string of zero or more characters ((* in Access)) • _ any single character (? in Access)

• Most databases have additional/different wildcards. SQL Server has:

• [list] match any single character in list, e.g., [a-f] • [^list] match any single character not in list, e.g. [^h-m]

SELECT: COUNT, GROUP BY PartID Vendor

123 A 234 A 345 B 362 A

2345 C 3464 A 4533 C

Parts

• Number of parts from vendor A – SELECT COUNT(*) FROM Parts WHERE Vendor = ‘A’; – Result: 4

•• NumberNumber ofof partsparts ffromrom eacheach vendorvendor – SELECT Vendor, COUNT(*) AS PartsCount FROM Parts

GROUP BY Vendor; – Result: Vendor PartsCount

A 4 B 1 C 2

7

Examples

• What is the average credit limit of customers What whose

is credit limit the average

is less than $1,000,000? credit limit of customers

– SELECT AVG(CreditLimit) FROM Customers WHERE CreditLimit < 1000000;

• How many sales offices are in the West region? – SELECT Count(*) FROM Offices WHERE Region= 'West‘;

• Increase the price of bull orders

dozers by 30% in all orders – UPDATE Orders SET Amt= Amt*1.3 WHERE Prod=

'Bulldozer‘; • Delete any sales rep with a NULL quota

– DELETE FROM SalesReps WHERE Quota IS NULL;

Joins

• Relational model permits you to bring data from separate tables into relationships

new and unanticipated relationships.

• Relationships become explicit when data is manipulated: when you query the database, not when you create it. – This is critical; it allows extensibility in databases. – You can join on any columns in tables, as long as data

need types

to match and the operation

be keys though they usually makes sense. They

are don’t

need to be keys, though they usually are. • Good joins

– Join columns must have compatible data types – Join column is usually key column:

• Either primary key or foreign key – Nulls will never join

8

Joins • List all orders, showing order number and amount, and name and credit

limit of customer – Orders has order number and amount, but no customer names or credit

limits – Customers has customer names and credit limit, but no order info

• SELECTSELECT OOrderderNbr,rNbr, AmAmtt,, CompanyCompany,, CreditLimitCreditLimit FFROMROM CCuustomers,stomers, Orders WHERE Cust = CustNbr; (Implicit syntax)

• SELECT OrderNbr, Amt, Company, CreditLimit FROM Customers INNER JOIN Orders ON Customers.CustNbr = Orders.Cust; (SQL-92)

OrderNbr Cust Prod Qty Amt Disc 1 211 Bulldozer 7 $31,000.00 0.2 22 522522 Ri tRiveter 22 $4 000 00$4,000.00 0 30.3 3 522 Crane 1 $500,000.00 0.4

Join CustNbr Company CustRep CreditLimit 211 Connor Co 89 $50,000.00

522 Amaratunga Enterprises 89 $40,000.00

890 Feni Fabricators 53 $1,000,000.00

Join with 3 tables • List orders over $25,000, including the name of the salesperson who

took the order and the name of the customer who placed it. – SELECT OrderNbr, Amt, Company, Name FROM Orders, Customers,

SalesReps WHERE Cust = CustNbr AND CustRep = RepNbr AND Amt >= 25000; (Implicit syntax)

CustNbr Company CustRep CreditLimit 211 Connor Co 89 $50,000.00 522 Amaratunga Enterprises 89 $40,000.00

$

OrderNbr Cust Prod Qty Amt Disc 1 211 Bulldozer 7 $31,000.00 0.2 2 522 Riveter 2 $4,000.00 0.3 3 522 Crane 1 $500,000.00 0.4

Join

RepNbr Name RepOffice Quota Sales 53 Bill Smith 1 $100,000.00 $0.00 89 Jen Jones 2 $50,000.00 $130,000.00

890 Feni Fabricators 53 $1,000,000.00

OrderNbr Amt Company Name 1 $31,000.00 Connor Co Jen Jones 3 $500,000.00 AmaratungaEnterprise Jen Jones

Result:

Join

9

Join notes • SQL-92 syntax for previous example:

– SELECT OrderNbr, Amt, Company, Name FROM SalesReps INNER C t

JOIN Cu C t

sto R

mers ON SalesReps.RepNbr = Customers.CustRep INNER JOIN Orders ON Customers.CustNbr = Orders.Cust WHERE Amt >= 25000;

• Use * carefully in joins – It gives all columns from all tables being joined

• If a field has the same name in the tables being joined, qualify the field name: – Use table1.fieldname,, table2.fieldname – Customers.CustNbr, Orders.Amt, etc.

• You can join a table to itself (self-join). See text.

JOIN types

• INNER jj join: returns just rows with matching keys (join column values)

g y (j

• RIGHT join: returns all rows from right (second) table, whether they match a row in the first table or not

• LEFT join: returns all rows from left (first) table, whether they match a row in the second table or not

• OUTER join: Returns all rows from both tables, whether they match or not

10

Examples

• List customer names whose credit limit is g sales rep’s quota. Also list the credit limit and

greater than their quota.

– SELECT CreditLimit, Quota, Company FROM SalesReps INNER JOIN Customers ON SalesReps.RepNbr = Customers.CustRep WHERE CreditLimit>Quota;

• List each rep’s name and phone number – SELECT Name, Phone FROM Offices INNER JOIN SalesReps

ON Offices.OfficeNbr = SalesReps.RepOffice; • Display all customers with orders or credit limits > $50,000.

– SELECT DISTINCT CustNbr FROM Customers LEFT JOIN Orders ON CustNbr = Cust WHERE (CreditLimit > 50000 OR Amt > 50000)

MIT OpenCourseWare http://ocw.mit.edu

1.204 Computer Algorithms in Systems Engineering Spring 2010

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.