Paper Writing

profileItachi_uchiha1
FusionTrees_JCSS1993.pdf

JOURNAL OF COMPUTER AND SYSTEM SCIENCES 41, 424436 (1993)

Surpassing the Information Theoretic Bound with Fusion Trees

MICHAEL L. FREDMAN*

Rutgers University, New Brunswick, New Jersey, Bellcore, and University of Califonia, San Diego, California

AND

DAN E. WILLARD?

State University of New York, Albany, New York

Received October 23, 1990; revised September 2, 1991

This paper introduces a new sublogarithmic data structure for searching, the fusion tree. These trees lead to improved worst-case algorithms for sorting and searching, surpassing the limitations of the information theoretic lower bound. G 1993 Academic PKSS, IIIC.

1. INTRODUCTION

This paper introduces a new sublogarithmic data structure for searching, the fusion tree. These trees lead to improved worst-case algorithms for sorting and searching, surpassing the limitations of the information theoretic lower bound. (We define search operations so that an unsuccessful search returns the predecessor of the search key.)

The information theoretic bound asserts that sorting N numbers requires N log N comparisons in the worst case. The degree of universality of this bound has not been fully resolved. Paul and Simon [4] have established that any unit cost random access machine algorithm whose operations include addition, subtraction, multiplication, and comparison with zero (but not division or bit-wise Boolean operations) requires SZ(Nlog N) time to sort N numbers in the worst case. On the other hand, with the inclusion of division and Boolean operations, they obtain a linear time algorithm. This linear time algorithm, however, suffers from an abuse of the unit cost assumption. In particular, the algorithm generates operands during the course of its computation whose individual lengths are N* times the length of the largest of the N numbers to be sorted.

* Research supported partially by NSF Grant CCR-8504245. + Research supported partially by NSF Grant CCR90-60509 and by Bellcore.

424 0022~0000/93 $5.00 Copyright 0 1993 by Academic Press, Inc. All rights of reproduction in any form reserved.

THE INFORMATION THEORETIC BARRIER 425

What seems to be needed is a computational model that avoids the potential abuses of the unit cost random access machine, but allows for unit cost operations among operands of “reasonable” size, i.e., operands of size commensurate with the sizes of the numbers to be sorted. Accordingly, we consider a reformulation of the sorting problem wherein our machine has a b-bit word size, and each of the N input numbers is assumed to be a non-negative integer less than 2’ (and hence each tits into one word). Moreover, it is desirable that a sorting algorithm use only O(N) words of memory.

We assume throughout this paper that the number of data items that are present never exceeds 2’, the universe size. (To cope with a larger number of items, we could proceed by storing bucket pointers in our data structure, with equal items placed in common buckets.)

Our “finite universe” reformulation of the sorting problem suggests the use of finite universe priority queues and similar types of data structures. For example, there is the priority queue of Van Emde Boas et al [5] which accommodates priority queue operations in time log log U per operation, where U = 2’ is the universe size. This structure requires U words of memory, however. An alternative is Willard’s Y-fast tries [6] in conjunction with the dynamic perfect hashing methods of Dietzfelbinger et al. [3], which brings the space down to linear in N, and accommodates priority queue operations in amortized randomized time log log U per operation. Besides the fact that either too much space of randomi- zation is required, these approaches improve upon conventional sorting only for sufticiently large values of N (i.e., when log N% log log U).

Another finite universe data structure for implementing priority queue operations is described in Ajtai, Fredman, and Komlos [ 11. This data structure is defined only within the cell probe model of computation and cannot be implemented with normal machine instructions. However, it is tantalizing in that it accommodates priority queue operations in constant time per operation for N sufficiently small compared with U. Ajtai et al. [l] state as an open problem whether some variant of their data structure can be implemented in a more realistic machine model. This paper contains a partial solution to this problem; one, however, which is good enough to be embedded in a new type of search tree data structure: the “fusion tree.”

Fusion trees accommodate dynamic search operations in amortized time O(log N/log log N) per operation on a random access machine with b-bit word size, whose instruction set includes addition, subtraction, multiplication, comparison, and bit-wise AND operations. With the use of fusion trees, we can construct a linear space O(N log N/log log N) worst case time-sorting algorithm. We emphasize that the implicit constant inside the O-notation is independent of b. Moreover, if we allow randomization and integer division, we show that dynamic searching and sorting can be accomplished in times, respectively, O(m) and O(Nm).

Our algorithms have theoretical interest only; the constant factors involved in the execution times preclude practicality.

426 FREDMAN AND WILLARD

2. NOTATION AND OTHER PRELIMINARIES

We let bin(a,, a2, . ..) denotes the sum 2”’ + 202 + . . . for non-negative ai. To refer to the bits of a binary number, we consecutively number its bit positions with position zero corresponding to its least significant bit. Thus, if a, < a2 < . . ., then the quantity bin(a,, a2, . ..) contains ones precisely in bit positions a,, a*, . . . .

We assume that the multipliction instruction produces a two-word product, one consisting of the most significant b bits, the other containing the least significant b bits. This fact enables us to effect a right shift; upon multiplying x by bin(b - Y), we obtain the result of right shifting x by r bits (in the significant portion of the product). This procedure depends, however, on having available the quantity bin(b - r), either as a program constant or as the result of a previous computation.

We occasionally require operands that slightly exceed the size of a single machine word. In such circumstances, we use double precision methods.

We observe that the bitwise Boolean operations, OR and XOR (exclusive OR), can be implemented on our machine by making use of the constant 2’- 1 = bin(O, 1, . . . . b - l), subtraction (so that we can complement a word), along with the AND operation.

Given a set of numbers S and a number x, we denote by rank,(x) the value I{t I tes, Kx}l.

3. OVERVIEW

The fusion tree can be roughly regarded as an implementation of a B-tree [2] such that (a) the approximate degree B of an internal node is an increasing, unbounded function of N, and (b) when performing a key search, the correct child of each node along the search path can be determined in constant time, independently of B. This suMices to achieve o(log N) search time. It will be the case, however, that updating an internal node (inserting or deleting a child, splitting the node, etc.) will require time B4. To keep the amortized cost of update operations under control, we modify our structure so that it can be viewed as a B-tree whose leaves are identified as the roots of weight balanced binary search trees of size approximately B4. For example, when the size of one of these binary search trees becomes too large, it is split into two trees with its associated root (B-tree leaf) likewise splitting. This reduces the amortized number of updates to the internal portion of the B-tree by a factor of B4, while increasing the search times only by an additive amount of log B. (We require that every key in the set represented by the Fusion tree reside in these binary trees; when a key gets deleted it is unnecessary to remove its presence from the B-tree nodes, where it serves only as a search discriminator.) Since we will be able to maintain log B = @(log log N) using routine methods, we achieve a search tree whose operations can be performed in O(log N/log log N) amortized time.

THE INFORMATION THEORETIC BARRIER 427

Our algorithms require a fixed number of constants whose values depend on the word size b. We assume that these constants are given as part of the algorithms, and we are not charged for their computations, which could depend on the word size, b. This introduces a small amount of non-uniformity (in terms of b) into our algorithms.

The core of our data structure lies in our ability to represent a B-tree node so that (a) we can determine the correct child of a node in constant time when conducting a search, and (b) we can update a node in time B4. This aspect of our data structure is motivated by an extends the work of Ajtai et al. [ 11. We turn our attention to these considerations.

Consider a B-tree node consisting of k keys (B/2 6 k < B) given by S= {U 1, .a*, uk}. In order to determine the correct child of the node when conducting a search for U, it suffices to compute rank,(u). This is to be accomplished in constant time. Central to the representation of our B-tree node is the notion of compressed key representation, which we proceed to describe.

4. COMPRESSED KEY REPRESENTATION

Given a set S= {ul, . . . . uk} consisting of b-bit numbers, we proceed to construct a set B(S) of distinguishing bit positions and a set K(S) = {ti,, . . . . S,} of r-bit numbers, where I = IB(S)j <k - 1.

Our definition of B(S) proceeds recursively. If ISJ = 1, then B(S) is the empty set. Otherwise, let p be the most significant bit position that distinguishes between two of the numbers in S. Let S, = { ol, . . . . u,} consist of those UPS in S for which ui has a zero in bit position p, and let S, = {wi, . . . . wh} consist of those uls for which ui has a one in bit position p. Note that g, h > 1 and g + h = k. Then B(S) = {p} u Wo) u B(S,).

Now define tii to be the result of deleting from ui all bits except for those occupying positions contained in B(S). More formally, let ci < c2 < .. . < c, denote the elements of B(S) in sorted order. Then the bit in position j- 1 of tii is the c,th bit of ui. We refer to iii as the compressed key representative of ui, and we define K(S) = {a,, . ..) a,}. Observe that this compression operation is order preserving among the elements of S.

Given an arbitrary b-bit number u not necessarily in S, we define a(S) likewise to be the result of extracting just those bits of u occupying positions in B(S). The following lemma presents an important property of the compressed key representation.

LEMMA 1. Let cl< . . . < c, be the elements of B(S) in sorted order and defme co= -1, c,+1 = b. Let tii be an arbitrary compressed key in K(S), and suppose that for an arbitrary b-bit number u # ui, the most significant bit position in which u(S) and t’ii differ is position m - 1. (In other words, the full keys u and ui agree in bit positions c, + 1, . . . . c,, but disagree in position c,. If u(S) = li,., then we define m = 0.) Assume that the most signtficant bit position p in which u and ui dzyfer satisfies

428 FREDMAN AND WILLARD

p > c,. Then rank,(u) is uniquely determined by the interval (c,~ , , cj) containing p, together with the relative order between u and ui.

ProoJ First, observe that the assumption p > c,, together with the fact that u and ui agree in bit positions, c, + 1, . . . . c, imply the p cannot be any of the ci)s. We proceed by induction on k = (S(. If k = 1, then the result is trivial. Now suppose p E (c,, c,, 1). By definition of c,, all elements in S agree in the bit positions more significant than position c,, and therefore, rank,(u) = 0 or k, depending on whether u < ui or u > ui. Now suppose that p < c,. Then u and ui agree down to bit position c,. Assume without loss of generality that u and ui have a zero in bit position c,. Then (using the notation preceding the statement of the lemma), U, ES, = fv 1, . . . . vg}, and rank,(u) <g since u < w,, . . . . w,, (as each of the wi)s have a one in bit position c,). It follows that rank,(u)= rank,,,(u), and we proceed (below) to reduce to the case S= So and apply the induction hypothesis. We observe that B(S,) c B(S), and therefore, the sorted sequence c; CC; < .. ., which comprises the elements of B(S,), is a subsequence of c1 < c2 < .. . . The most significant bit position ci in which u and ui differ (from among just the positions c; , c;, . ..) clearly satisfies cb Q c,. Therefore, p > ck. Applying the induction hypothesis for So and observing that the intervals (c,- 1, cj) refine the intervals (cl- 1, c,!), the lemma follows.

The above lemma enables us to reduce the computation of rank,(u) to computing rank,,,(li(S)) as we now sketch. By computing rank,&ti(S)), we determine the predecessor iij and the successor Cj + 1 of ti( S) in K(S) (one of the two quantities ij or fij+ 1 may not exist). Next, we compare u with ui and uj+ 1 ; we are done if uj ,< u Q uj+ 1. Otherwise choose tii (i=j or j-t 1) to be the value among tii and Cj+l having the longest prefix of significant bits in common with d(S). Since u<u, or u>u~+~ (by assumption), it must be the case that (using the notation of Lemma 1) p>cm, and therefore, Lemma 1 can be applied to determine the rank of u once we know the interval (cr, c,-+ 1) containing p.

A convenient alternative description for the set B(S) that will be used below is given as follows. Assume that the ui)s are sorted, so that u1 < u2 < . . . Let di be the most significant bit position in which ui and ui+ 1 differ. Then B(S) = Id,, 4, . ..> 4-J.

A difliculty with using compressed keys as well as with computing the transfor- mation that maps u into z?(S) (in constant time) is that there is no obvious way to relocate the appropriate extracted bits so that they form a consecutive block of bits. The desired properties of the representation, however, are preserved by any similar transformation that relocates the extracted bits into (possibly nonconsecutive) positions without changing their relative order. We assume that the unused positions are tilled with zeros. Because we ultimately need to concatenate the compressed representations of the keys in S into a single b-bit word, it is desirable that the extracted bits be relocated into a relatively narrow, if not consecutive, field of bits. This can be accomplished by an appropriate multiplication along with bitwise AND operations (to clear out the unwanted bits) as we proceed to describe.

THE INFORMATION THEORETIC BARRIER 429

Let c,< ... < c, be the bit positions (elements of B(S)) that need to be relocated, and let u be the quantity from which we wish to extract these bits. Let C= bin(c,, c2, . . . . c,). Assuming that the quantity C is available to us, we compute u = u AND C to zero out the irrelevant portion of u. Next, we multiply v by a suitable quantity M to relocate the appropriate bits of u into a narrow field, and once again use AND to zero out irrelevant bits. Our task is to prove the existence (and ultimately construct within our update algorithm) this quantity M. Writing M= bin(m,, m2, . . . . m,), if we ignore for the moment the interfering effects of cross terms and carries that occur with multiplication, we can regard the effect of multiplying u and M as the relocation of the bits in positions cl, c2, . . . . c, to the respective locations c1 + m,, . . . . c, + m,. Our choosing to ignore the other cross terms will be justified if we can also arrange for the Y* sums, ci + mj, 1~ i, j< r, to be distinct (which, in particular, implies that no carries can take place). Our goal, therefore, is to construct a set of integers, m,, . . . . m, such that (a) m, + cl < m2 + c2 < . . . < m, + c,, (b) the mi + cj are distinct, and (c), the sums in (a) fail into a “small” interval. Lemma 2 (below) provides the desired result. We further require that m, + c1 = b, so that our newly compressed key is right justified in the significant portion of the product. This requirement is easily satisfied by translating the rnls of Lemma 2 by a suitable constant. We redefine the notation l;(S) (and likewise the definition of the compressed keys ti,, . . . . tik), to reflect the computationally accessible variant offered by this approach, noting again that the desirable properties are preserved.

LEMMA 2. Given a sequence of Y integers, cl =C c1 < ... CC,, there exists a sequence of r integers m,, m2, . . . . m, such that

(a) m,+c,<m,+c,< ... <m,+c,

(b) the r2 sums ci + m,i, 1 < i, j 6 r are distinct,

(cl (m,+c,)-(ml+cl)<r4.

Proof: First, we prove that there exist integers m;, . . . . rn: such that rn,! + c, $ rn,! + c,(mod r3) whenever i # j. Assuming for the moment that such integers can be found, it follows that the r2 (non-reduced) sums ml + cj are all distinct, and moreover, by adding suitable multiples of r3 to the rn: to obtain the final values mi, we can further satisfy the requirement mi + ci < mi+ 1 + ci+, < mi + ci + r3, so that (a), (b), and (c) are satisfied.

To prove the existence of ml,, . . . . rn:, suppose that t < r and that we have succeeded in choosing rn; , . . . . rn: satisfying ml + cg $ rn; + c,(mod r3) when i # j. We show how to construct a suitable rn: + 1, Observing that rni + 1 + ci E rn: + cj implies m:+l= rn: + cj - ci, we can choose rni, 1 to be the least residue not represented among the fewer than r3 residues of the form rn; + cj- ci, 1 <s < t, 1~ i, j < r. In this way, we successively compute the quantities m;, . . . . mi.

430 FREDMAN AND WILLARD

5. FUSION

We proceed to describe the representation of a B-tree node. An example is provided at the conclusion of this section. Assume that the keys stored in the node are given by u1 < ... < uk, k < B. Let S= {u,, . . . . u,} and let B(S) = {c,, c2, . . . . c,}. Our B-tree node contains the quantity C= bin(c,, cZ, . ..) as well as the quantity M= bin(m, , mz, . ..) which p rovides the multiplier used in the computation of c(S). The final AND operation performed in the computation of ti(S) requires the quantity, D = bin(c, + m,, c2 + m2, . ..). which is likewise to be made available, Also present is an array containing the ~4;)s in sorted order and an array containing the 0,‘s in sorted order.

We will describe below a constant time method for computing rank,,,,(c(S)) and Rand 1, h m w ere m is an integer in [0, b]. We will also describe a constant time method for computing msb(u, v), which is defined to give the most significant bit position in which the two b-bit quantities u and v differ. For the present, we assume these capabilities.

As outlined in the previous section, after computing j= rank,(,,(li(S)), we compare u with uj and uj+ ,. Assuming that u does not fall in the interval [u,, ui+ r 1, we determine which of tii and ti, + r has the longest prefix of significant bits in common with ti(S) by comparing l;(S) XOR zIij with I;(S) XOR li,+r (the smaller value corresponds to the longer prefix). Assuming that zi,, h = j or j+ 1, has the longer prefix in common with G(S), we compute m = msb(u, uh) and then identify the interval (c,, c, + , ) containing m by computing i = rank,(,,(m). We finally obtain rank,(u) by executing a table look-up indexed on h, i, and whether (a) u < uh or (b) u > Us. The space requirement for our B-tree node is dominated by the size of this table, which is O(B2). Since the number of B-tree nodes is O(N/B4), the total amount of space required to represent the fusion tree is O(N).

The constant time computation of rank,,,,(ti(S)) proceeds as follows. Contained within our B-tree node is a quantity K, into which the compressed keys from K(S) are “fused” together-hence the name of our structure. More specifically, the bits of K, are partitioned into E= /-!I”~] equal size fields. (If b is not divisible by E, then the rightmost ELb/EJ bits are partitioned into equal size fields.) Each compressed key will occupy a separate field. As a consequence, the maximum allowable degree B of a node is constrained to satisfy B< E. Since, as indicated earlier, we can assume that N Q 2’, we have sufficiently many fields to maintain log B= Q(log log N), which achieves the stated speed-up over conventional sorting. Our limitation on the size of E comes from the fact that our compressed keys require up to k4 bits per key. The compressed keys are right justified in their respective fields, and the leading two bits in each such field are zeros. The fields which do not contain compressed keys are filled with all ones except for a leading zero. The structure of K, is shown below:

K,=Ol...l . . . 0 0 . . tik OO...zi-l ... u__

%. Field E Filed k Field k ~ 1 Field 1

THE INFORMATION THEORETIC BARRIER 431

Now by a executing a suitable multiplication and bitwise OR operation involving z?(S), we obtain the number Y shown below (the fields of Y correspond to those of KS):

Field E

..’ ~ lO.~~Oti(S). Field 1

Next, we subtract KS from Y and zero out all but the lead bits of the fields (performing a bitwise AND). The result contains a one only in those positions that are leading bits of fields for which K, contained a z& with ii < G(S). A suitable multiplication will now sum these bits, so that a portion of the resulting product will contain the desired rank, which is then extracted by an AND operation.

We use the same method to compute rank B(S,(m) in constant time; instead of using the quantity K, this computation utilizes the quantity Bs in which the elements of B(S) are fused together.

Our next task is to describe the constant time computation of msb(u, v), the most significant bit position in which the b-bit quantities u and u differ. For the additional purpose of performing update operations on B-tree nodes as described later, we wish to augment this computation so that as a by-product of computing m = msb(u, u), we also obtain the quantities, bin(m) and bin(b -m). Our computa- tion trivially reduces to determining the position of the leftmost one in u XOR u, which we can write as Llg(u XOR V) _I (lg denotes the binary logarithm). We will need the following lemma.

LEMMA 3. We say that a number x is d-sparse provided that the positions of all of its one bits belong to a set of the form, Y = (a + di ) 0 d i < d}, which consists of d consecutive terms of an arithmetic progression with common difference d. (Not all of these positions have to be occupied by ones, however.) If x is d-sparse, then there exist constants y, and y, such that for z = (ylx) AND y,, the ith bit of the significant part of z equals the bit in the position a + di of x, 0 < i < d. In other words, z is the result of “perfectly compressing” x.

Proof Let a,=a+di and let ti=b-a+(l-d)i, Odi<d. Then a,+ti=b+i and all sums ai + tj, 0 < i, j< d are distinct. We choose y, = bin( t 1, t,, . ..). and the lemma follows easily.

The computation for Llg x_l is divided into two phases. We assume that x # 0. Let s = r,/% + 11. Consider a partitioning of the b bits of our word x into right justified contiguous blocks of s bits. (The leftmost block has possibly fewer than s bits.) The number of blocks is given by rb/sl. The first phase of the computation determines the leftmost block in which x contains a one, extracts, and then right justifies this block. The second phase of the computation finds the leftmost one in this extracted block.

Define C, to have ones precisely in the leftmost bit position of each block: i.e., C1 = bin(s - 1, 2s - 1, . ..). Define C, to be the bit-wise complement of C,. We first

432 FREDMAN AND WILLARD

compute the function, lead(x), which enjoys the property that for each of its blocks, the leftmost bit within the block equals one if and only if the corresponding block in x contains a one. (All other bits in lead(x) equal zero.) lead(x) is given by (Ci - [(C, - x AND C,) AND C,]) OR (x AND C,). Because lead(x) is s-sparse, we can apply Lemma 3 to obtain (in constant time) a compression of the leftmost bits from each of the blocks of lead(x). Let compress(x) denote the result of this compression. Now let P = {bin(O), . . . . bin(rb/sl - 1)) denote the first [b/s] non- negative powers of two. We set b, = rank,(compress(x)). This rank computation proceeds in the same manner in which we computed rank,(,,, except that we use [b/s1 blocks each consisting of s bits in which the relevant quantities are right justified. In place of the quantity KS, a constant consisting of the powers of two fused together in right-to-left ascending order is used. (It is also necessary to use double precision arithmetic since the quantities involved slightly fill up more than single machine words.) The quantity b, identifies the block number (counting from the right) of the leftmost block of x containing a one.

In addition to obtaining the block number of the leftmost block containing a one, we can also obtain the quantity bin(r), where r is the position of the leftmost one of lead(x). Reviewing the computation for rank,(compress(x)), after performing the subtraction and zeroing of all but the lead bits from each of the blocks of s bits (the moment just before we multiply), we obtain a quantity which we denote as L. Among the lead bits of the blocks of L, the rightmost bl of them will be ones, and the others will be zeros. Because the ones of L appear periodically with period s, we can extract the leftmost such one (the other ones being replaced by zeros) by subtracting from L the result of right shifting L by s bits. The position of this extracted one, however, exactly coincides with the position of the leftmost one of lead(x) since the blocks of lead(x) and L have a common size of s bits. In a similar manner, we can also obtain bin(b - Y) by redoing our rank computation, but using instead the powers of two fused together in left-to-right ascending order. Now if bl = 1 then we proceed directly to the next phase (only the rightmost block contains a one). Otherwise, we multiply x by bin(b - r + s - 1) = bin(b - r). bin(s- 1) to extract (and right justify in the significant portion of the product) the leftmost block of x containing a one. This completes the description of the first phase of the computation for Llg x _I.

The second phase of the computation computes the position j of the leftmost one in our extracted block of x consisting of s bits. As before, we perform a rank computation of these s bits relative to the set consisting of the first s powers of two. To obtain bin(j), we proceed as in the first phase, and then employ Lemma 3 to perfectly compress the leading bits of the blocks. We obtain bin(s -j) by repeating this computation, but reversing the order in which the powers of two are concatenated. The results, b,, bin(v), bin(b-r), j, bin(j), and bin(s-j), of these two phases can now be combined to yield the result m = Llg x J of the msb(u, u) computation, augmented to include bin(m) and bin(b - m).

To complete our discussion of the fusion tree, we need to indicate how a B-tree node can be built in time B4. The various update operations on B-tree nodes can

THE INFORMATION THEORETIC BARRIER 433

be regarded as special instances of the node building operation. We assume that we are given the set of discriminator keys, S= (ul, u2, . . . . Q}. First, we sort the keys obtaining (say) u1 < . . . < uk. Next, we compute the set B(S) which is given by (msb(u,, ui+ 1) 1 1~ i < k}. The augmented computation of the ci = msb(ui, ui+ ,) also yields the bin(c,) values. These values are sorted to obtain distinct copies and then summed to obtain the constant C = bin(c,, . . . . c,). The proof of Lemma 2 provides an algorithm to compute the m, positions which define the multiplier M. However, we actually need the values bin(m,). Since the rnls of Lemma 2 satisfy b < ci + m,< b + r4, by starting with the bin(b - ci) values (provided by the augmented msb computations), we can obtain each bin(mi) value by performing O(lg Y) multiplications (in effect constructing addition chains). All of the required computations for the construction of a B-tree node, including the construction of the look-up table stored in such a node, involve straightforward algorithms that can be performed in time polynomial in k. (k4 time is easily seen to be sufficient.)

The inclusion of a fixed number of constants in our algorithms (whose values depend only on b) is necessitated by considerations typified by the following. The quantity K, (in which the compressed keys are fused together) contains b’j6 fields, all but the right-most k of which have the form (*) Oil... 1. Let w denote the constant obtained by concatenating together fields having the form (*), and let w’ = bin(f), where f is the common size of these fields. Using w and w’ we can compute K, in O(k) time. (This assumes we have already computed the com- pressed keys which make up K,.) Similarly, given I;(S), the constant time compu- tation of the quantity Y used in the rank,&ti(S)) computation utilizes certain constants.

EXAMPLE. The important aspects of our B-tree node representation are illustrated by the following detailed example. Our node contains the key set S = {x, y, z}. The values of X, y, and z are given below along with the values of the various constructs associated with the node as discussed above. Again, bit positions are numbered from right to left starting with zero:

X = 001001001110 y = 001001010110 z=001100000111.

The distinguishing bit positions are c1 = 4 and c2 = 8, so that

B(S) = (4, S}.

C = 000100010000, and we choose our multiplier to be A4 = 000100100000.

Observe that our multiplier M shifts the distinguishing bits into positions 12 and 13, or equivalently, the two rightmost positions in the significant portion of the product. Thus, to extract these bits, we use the mask

D = 000000000011.

434 FREDMAN AND WILLARD

Accordingly our compressed keys are given by (rightmost bits):

~-00, p=o1, i=lO,

and the concatenation K, of these compressed keys is given by

K,=OOlO 0001 0000. -I_- f .fi *

The constant B,, the concatenation of the distinguishing bit positions, is given by

B, = 001000 000100. -- (‘2 (‘1

The look-up table giving rank,(u) as a function of h, i, and whether (a) u < uh or (b) u > uh (as described in the third paragraph of Section 5) is

u. < llh U>U/I I1 i i

0 1 2 0 1 2

1 0 0 0 1 2 3 2 1 0 0 2 2 3 3 2 2 0 3 3 3

Now suppose we wish to compute rank,(u), where u=000100010111. Utilizing the constants C, M, and D, we obtain ti(S) = 11. Utilizing KS we determine that rank,(,,(ti(S)) = 3. Finding that u <z (z being the third ranking element of S) and seeing that S does not have an element of rank 4, we conclude h = 3. Next, computing m = msb( u, z), we find that m = 9. Computing i = rank,(,,(m), we obtain i = 2. Indexing into our table with the values h = 3, i = 2, and u <z (z = u,), we conclude that rank,(u) = 0.

6. EXTENSIONS

We remark first that by suitably relining our fusion tree structure, we obtain the worst case as opposed to amortized time bounds.

If we are willing to either (a) relax the linear space restriction or (b) preserve the linear space restriction but use randomization and integer division, then we can achieve m amortized time bounds for dynamic search operations. This is accomplished by using the data structure of van Emde Boas et al. [S] in the case of (a) (or, in the case of (b), using dynamic .Y-fast tries [3.6]) when N exceeds

2(lw bh36 (*I

THE INFORMATION THEORETIC BARRIER 435

For such N, these data structures perform search operations in time O(log log U) = O(log b) = O(m). For smaller N, we use the fusion tree, but it is modified to maintain B= 0(2-). With this choice for B, the amortized complexities of our operations are given by O(log B + log N/log B) = O(m). (We also need to check that the constraint B < E is satisfied when N does not exceed (*).) Thus, with the use of randomization and integer division, sorting can be accomplished in time O(N m) and linear space.

7. APPLICATIONS

Sorting and searching frequently occur as algorithmic components. Moreover, the commonly occurring priority queue data structure can be implemented as a special case of searching. The field of computational geometry offers some immediate applications of the fusion tree or simple modifications of the fusion tree. For instance, we have improved algorithms for planar convex hull construction, priority search trees and rectangle intersections, fractional cascading, and certain orthogonal range query problems.

Another application leads to an improved version of Willard’s Q-fast tries [7]. As described above, the Y-fast trie in conjunction with dynamic perfect hashing [3,6] provides a dynamic linear space data structure for searching with randomized amortized O(log log U) time operations. This data structure provides the best known performance for large subset sizes. The Q-fast trie, on the other hand, provides the best known performance among linear space deterministic data structures, again for large subset sizes. Balanced search trees constitute a component part of the Q-fast trie. We can improve the performance of Q-fast tries by using fusion trees to implement these search tree components and by changing the parameters of the trie so that its height is given by Jlog U/log log U, and its node degree is given by 2J’Og ““Og’Og u. The resulting data structure provides Jlog U/log log U operation times, improving upon the original Q-fast trie by a JlogTogU factor.

8. OPEN QUESTIONS

Two immediate open questions are: How fast can we sort? and How fast can we search? Some interesting variations of these questions are as follows. First, we ask if there exist possible exotic machine instructions that can lead to yet faster sorting and searching algorithms. Put another way, Is RISC Risky? We note that the instructions used in our algorithms belong to the class NC’. This can be considered a reasonable restriction to impose on possible exotic instructions, although the question remains interesting even without this restriction. Several people have pointed out to the authors that while comparison is an AC0 operation, our

436 FREDMAN AND WILLARD

algorithms use multiplication, which does not belong to AC’. Thus, it would be interesting to know if there exist fast sorting and searching algorithms that can be implemented with AC0 instructions. Put another way, Can fusion trees be implemented with brisk RISC?

9. CONCLUDING REMARK

The decision tree model of computation enjoys the properties of being reasonably general, tractable, and oftentimes quite challenging. The beguiling successes of this appealing model of computation have led to many claims that various Nlog N algorithms are optimal, conclusions that should not be taken too seriously.

REFERENCES

1. M. AJJTAI, M. FREDMAN, AND J. KOMLOS, Hash functions for priority queues, Inform. and Comput. 63 (1984), 217-225.

2. R. BAYER AND E. MCCREIGHT, Organization and maintenance of large ordered Indices, Acta Znform. 1 (1972), 173-189.

3. M. DIETZFELBINGER, A. KARLIN, K. MEHLHORN, F. MEYER AUF DER HEIDE, H. ROBERT, AND R. E. TARJAN, Dynamic perfect hashing: Upper and lower bounds, in “Proceedings, 29th IEEE Symposium on Foundations of Computer Science, 1988,” pp. 524-533.

4. W. PAUL AND J. SIMON, Decision trees and random access machines, in “Symposium uber Logik und Algolrithmik, Zurich 1980”; K. MEHLHORN, “Sorting and Searching,” pp. 85-97, Springer-Verlag, New York/Berlin, 1984.

5. P. VAN EMDE BOAS, R. KAAS, AND E. ZIJLSTRA, Design and implementation of an efficient priority queue, Math. Systems Theory 10 (1977), 99-127.

6. D. WILLARD, Log-logarithmic worst case range queries are possible in space O(N), Inform. Process. Lsett. (1983), 81-84.

7. D. WILLARD, New trie data structures which support very fast search operations, J. Comput. System Sci. 28 (1984), 379-394.