CSCI 6633 — Advanced Databases— 14 Winter HW 4, Due: 03/19/14

• This homework is due 03/19/14 and will not be accepted after 03/26/14

• Quiz 4 (open book, open notes) will be on 03/19/14 and will cover the following topics: Query Processing and Optimization Chapter 19.

• The student presentations will be on 03/12/14 and 03/19/14

• The final exam (open book, open notes) will be on 03/26/14 and will cover all the material covered in week 1 through week 12; material covered only in student presentations will not be on the final.

• The Extra Credit problems are due 03/19/14 and will not be accepted late.

1. (20 points)

Give brief and clear answers to the following:

(a) Briefly explain why the query optimizer works with relational algebra rather than directly with the SQL query.

(b) Briefly explain the difference between pipelining and materialization.

(c) A file of 8000 blocks is to be sorted with an available buffer space of 32 blocks. How many passes will be needed in the merge phase of the external sort-merge algorithm ?

2. (45 points)

(a) Suppose the relation R has an attribute A with the following 17 records:

19, 15, 22, 6, 27, 28, 26, 3, 31, 18, 19, 14, 7, 13, 5, 2, 21

Show how the sort-merge external sorting algorithm will sort these values, by showing

i. what the initial sorted runs will look like.

ii. what the runs will look like after each merge pass

iii. what the final sorted file will look like (split into blocks).

Please note that you do not have to show any further calculations, just show what the file blocks look like for each of the above.

You can assume that each block can store 2 records, and that the number of buffer blocks available is nB = 3

(b) Suppose the operation is EMPLOY EE ./SSN=ESSN WORKSON, and we are using J1, the nested-loop join approach. Further suppose that the buffer size nB = 6, the EMPLOY EE file has 10000 records in 1000 blocks, and the WORKSON file has 3000 records in 500 blocks. Calculate the number of disk accesses if

i. EMPLOY EE is the outer relation.

ii. WORKSON is the outer relation.

(c) There are two relations R(A,B) and S(B,C), and the operation is R ./B=B S, and we are using J2, the single-loop join approach. The R file has 2000 records in 200 blocks. The S file has 300 records in 10 blocks. Assume the relationship between R and S is 1-1 and partial-total i.e. for every R tuple, there is at most one S tuple, and for every S tuple, there is exactly one R tuple. For R, there is a secondary index on the field B. For S, there is a secondary index on the field B. The index levels are χR = 6, χS = 3. Calulate the number of disk accesses if

i. R is the outer relation and we use J2.

ii. S is the outer relation and we use J2.

3. (35 points) For each of these queries based on the Elmasri company database, do the following

i Show the initial query tree which is a direct translation of the given relational algebra ex- pression. Please note that your tree should be similar to the tree in Figure 19.4 (a), and not the canonical tree in Figure 19.4 (b).

ii Show the final, more efficient query tree you will get after you have applied the different transformations. Please note that you just have to show the final tree and not the inter- mediate steps i.e. you only have to show a final tree similar to Elmasri Figure 19.5 (e), and not the intermediate trees like 19.5 (c), (d).

You should try and get the most efficient tree you can.

(a) Get the names of the dependents of employees. whose salary is less than 50,000. The initial relational algebra expression is:

πDependentName(σSalary<50,000(σSSN=ESSN (EMPLOY EE X DEPENDENT))).

(b) Find the last names of employees whose salary is more than 50,000 and who work on a projects controlled by the Accounting department. The initial relational algebra expres- sion is:

πLName(σSalary>50,000(σDname=‘Accounting′ ( (EMPLOY EE ./SSN=ESSN WORKSON) ./Pno=Pnumber (PROJECT ./Dnum=Dnumber DEPARTMENT) ))).

Extra Credit 12: Do some research and find out and explain the difference between how Oracle and SQLServer store files.

Extra Credit 13: Do some research and find out and explain the difference between how Oracle and SQLServer do query optimization.

Extra Credit 14: In the language of your choice write a program to implement the Sort Merge External Sorting algorithm. What you need to turn in is a printout of the source code, and the printout of the sample run, which will use the same values as given in the required problem above. You also need to email the program to the TA.