Intro to Algorithms(skip lists,hashing,graphs) assignment
Cs445 — Homework #3. Due Oct 17 2017 before class
Instructions.
1. Solution may not be submitted by students in pairs.
2. You may submit a pdf of the homework, either printed or hand-written and scanned, as long as it is easily readable.
3. If your solution is illegible not clearly written, it might not be graded.
4. Unless otherwise stated, you should prove the correctness of your answer. A correct answer without justification may be worth less.
5. If you have discussed any problems with other students, mention their names clearly on the homework. These discussions are not forbidden and are actually encouraged. However, you must write your whole solution yourself.
6. Unless otherwise specified, all questions have same weight.
7. You may refer to data structure or their properties studied in class without having to repeat details, and may reference theorems we have studied without proof. If your answer requires only modifications to one of the algorithms, it is enough to mention the required modifications, and the effect (if any) on the running time and on other operations that the algorithm performs.
8. In general, a complete solution should contain the following parts:
(a) A high level description of the data structures (if needed). E.g. We use a binary balanced search tree. Each node contains, a key and pointers to its children. We augment the tree so each node also contains a field...
(b) A clear description of the main ideas of the algorithm. You may include pseu- docode in your solution, but this may not be necessary if your description is clear.
(c) Proof of correctness (e.g. show that your algorithm always terminates with the desired output).
(d) A claim about the running time, and a proof showing this claim.
1
1. Let h be a hash function mapping U to indexes [0 . . .m − 1]. Assume that |U| ≥ m2. Could an adversary could pick a set K of m/2 keys from U which are all mapped into the same cell?
What is the value of ∑
x,y∈K Fh(x,y) in this case ? Fh(x,y) is the function defined in the slides on the section discussing Universal Families of Hash functions.
2. Assume m is a prime around 232. We compute a hash value of a very long key k using the dot product, by splitting k into L ‘characters’, as explained in the slide, and then compute
( L∑ i=1
kiai )
mod m.
However we compute it by first computing the value s = ( ∑L
i=1 kiai) and then compute s mod m.
(a) Assume s is stored as a variable of type long int represented by 32 bits. How large should L be so there is a danger of overflow with s ?
(b) Repeat the question, but now s is stored as a long long int (64 bits).
(c) It is reasonable to store a hash table that has m cells, for the value of m mentioned above, all in the main memory of your cellphone, or would you have to use paging ? Assume each cell in the table contains 2 long long ints, each requires 64 bits.
(d) Panopto recording of a lecture in this course, compressed, used about 4GB. Would computing h(file) on one of this files, is it possible that s would overflow ? If yes, how could you computer h(file) without risking overflow.
3. Let H = {h1 . . .hm} be the family of m hash functions where hi(k) = (k · i) mod m.
(a) Is H universal when U is the set of all integers and m is a prime ?
(b) Is H universal when U is the set of all integers in the range [m... 2m− 1] and m is a prime ?
(c) Is H universal when U is the set of all integers in the range [m... 2m−1] and m = 2` for an integer ` > 0.
(d) Is H universal when U = {0, 1}.
4. The question deals with a collection of lists L = {l1 . . . ln}. Each contains some number of integers integers, each between 1 and 106. Two lists are identical if they contain the same numbers, maybe not in the same order. For example, the lists l1 = {10, 30, 20} and l2 = {20, 10, 30} are identical. Let mi be the length of li (for i = 1 . . .n).
Suggest an algorithm, as efficient as possible, that receives collection of n lists, and de- termines whether the input contains at least two identical lists. Analyze the algorithm’s running time. (as a function of n and of mi). It is quite possible the that sum of number is one list is equal to the sum of keys in another list, despite not being identical lists.
More examples: l3 = {10, 30, 20} is not identical to l4 = {30, 10, 30, 20} l5 = {55, 30, 20, 30, 55} is identical to l5 = {55, 30, 55, 20, 30}
2
5. Design a perfect SkipList L for a set = {x0,x1 . . .xn−1} of n keys, if only 2 levels are allowed. What the the time for a query in the worst case.
Further explanation. Assume after the SkipList is constructed, and adversary will perform find(x), where the adversary picks what is x. You goal is to decide how to build the SL, so the time for this operation is as small as possible.
What would be the structure of the SL, if three levels are allowed.
Hint: Assume that there is some value ∆, such that the keys that participates in level 2 are x0,x∆,x2∆,x3∆ . . . . That is, the gaps between these keys is fixed. How would you pick ∆ to obtain optimal search time?
For example, if ∆ = 10, then the second level contains x0,x10,x20,x30 . . .
6. Let L be a perfect SL containing the n keys {x1 . . .xn} and let S = {y1 . . .yn} be sorted list of m keys. Suggest an algorithm, as efficient as you could for finding the keys of S that also appear in L
Assume for simplicity n = 2`, |S| = 2k (for integers `,k). Show that the running time of your algorithm is O(m log(n/m)).
7. Assume that you are sorting n keys in a hash table. You are using the dot product for computing the hash values, while, taking into account that functions are picked from a universal family of hash functions. However, the size m of the hash table is not a prime.
Will you still be able to retrieve your data ? That is, will the issue be only with efficiency, or could it be that some of operations might fail in any reasonable time period.
Suggest ‘reasonable’ critiria for choosing the coefficients ai in the dot product hash func- tion, so performences are good.
This is a bit an open ended question. There are no keys that always works perfectly. However, you could definitely see ‘asking for troubles’ even before the first data item showed up.
8. Let G(V,E) be an undirected graph. We say that vertices u and v are connected if there is a path connecting them, and all edges of this path are in E in G,
Let F = {C1 . . .Ck} be a collection of subset of vertices of V . We say that F partition V into Connected Components (abbreviated: CC ) iff
(a) No Ci is empty, and
(b) Each vertex v ∈ V appears in exactly one Ci. And (c) If a vertex u is also in Ci, then every vertex connected to u is also in Ci.
For example, if E is a long path, then G has a single connected component. If G(V,E) has no edges, then G has |V | connected components.
(a) Assume G(V,E) is connected, and has no parallel edges. What is the smallest and what is the largest values of m ? Express your answer as a function of n ?
(b) Repeat the previous question, but now assume that the graph has k Connected Com- ponents. So k > 1. What is the smallest and what is the largest values of m ? Express your answer as a function of k and n ?
3
(c) Assume the graph is given as an adjacency list. This in an array of n vertices, where all neighbors of vertex vj are contained in a doubly-connected link list.
Explain how to find, in time O(m + n), how many connected components G has. Explain exactly every operation performed on the data structure.
9. A path v1,v2 . . .vk is a cycle in G(V,E) if for every i, (vi,vi+1) is an edge of E, and v1 = vk. Assume that an undirected graph G(V,E) is connected but cycle free. How many edges (as a function of n) does it have ? Give upper and lower bounds.
Of course n is arbitrary large.
Prove your answer.
10. You are given a directed graph G(V,E) where every vertex vi ∈ V is associated with a weight wi > 0. The length of a path is the sum of weights of all vertices along this path.
Given s,t ∈ V , suggest an O((n + m) log n) time algorithm for finding the shortest path from s to t.
As usual, n = |V | and m = |E|.
11. Let G(V,E) be an undirected graph with positive weights on its edges. Assume that edges are given in an increasing order of weights. So E = {e1 . . .em} where
0 < w(e1) ≤ w(e2) < · · · < w(em).
Given a real value r, let Gr denote the graph obtained from G(V,E) be removing from E every edge whose weight is strictly larger than r. That is, Gr contains only the edges whose weight is r or smaller than r.
For example, Gw(e1)−0.000001 contains no edges, Gw(e5) contains 5 edges, and Gw(em) con- tains all the edges of E.
We say that β is the critical value of G(V,E) if Gβ is connected, by Gβ−0.00001 is not connected.
Suggest an O(m log m) time algorithm for finding β.
4