
Source: Van Emde Boas tree

A van Emde Boas tree (Dutch pronunciation: [vɑn ˈɛmdə ˈboːɑs]), also known as a vEB tree or van Emde Boas priority queue, is a tree data structure which implements an associative array with m-bit integer keys. It was invented by a team led by Dutch computer scientist Peter van Emde Boas in 1975. It performs all operations in O(log m) time (assuming that an m-bit operation can be performed in constant time), or equivalently in O(log log M) time, where M = 2^m is the size of the key universe, so that keys range from 0 to M − 1. The parameter M is not to be confused with the actual number of elements stored in the tree, by which the performance of other tree data structures is often measured.
The standard vEB tree has inadequate space efficiency. For example, for storing 32-bit integers (i.e., when m = 32), it requires M = 2^32 bits of storage. However, similar data structures with equally good time efficiency and with space efficiency of O(n) exist, where n is the number of stored elements, and vEB trees can be modified to require only O(n log M) space.
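
To put the 32-bit figure in more familiar units, the short Python sketch below converts M = 2^m bits into mebibytes. The function name universe_bits_in_mib is made up for this illustration, and the calculation only counts one bit per key in the universe, as the example above does.

```python
def universe_bits_in_mib(m):
    """Storage for one bit per key of an m-bit universe, in MiB (illustrative)."""
    bits = 1 << m             # M = 2^m bits, one per possible key
    bytes_needed = bits // 8  # 8 bits per byte
    return bytes_needed / (1 << 20)


print(universe_bits_in_mib(32))  # 512.0
```

So the 2^32 bits quoted above amount to 512 MiB, regardless of how few keys are actually stored.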

Supported operations

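FindNext

The operation FindNext(T, x) searches for the successor of an element x in a vEB tree T. If x < T.min, the answer is T.min; if x ≥ T.max, no successor exists and the operation returns M. Otherwise, let i = ⌊x/√M⌋. If x mod √M < T.children[i].max, the successor lies in T.children[i] and the search recurses there. Otherwise, the search finds the successor j of i in T.aux and returns the smallest element of T.children[j].
In code:
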
function FindNext(T, x)
    if x < T.min then
        return T.min
    if x ≥ T.max then // no next element
        return M
    i = floor(x / √M)
    lo = x mod √M
    if lo < T.children[i].max then
        return (√M · i) + FindNext(T.children[i], lo)
    j = FindNext(T.aux, i)
    return (√M · j) + T.children[j].min
end
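
The pseudocode above leaves the base case and the layout of T implicit. The following Python sketch is one possible way to make it runnable, not a reference implementation. The class name VEB, the method names, the eager allocation of all children, the base case at a one-bit universe, and the use of bit shifts in place of division by √M are all assumptions of this sketch. An insert method is included only so that a tree can be built for the query.

```python
class VEB:
    """Sketch of a vEB tree over keys 0 .. 2**m - 1 (eagerly allocated)."""

    def __init__(self, m):
        self.m = m
        self.u = 1 << m        # universe size M
        self.min = self.u      # empty tree is encoded as min = M, max = -1
        self.max = -1
        if m > 1:
            self.lo_bits = m // 2
            hi_bits = m - self.lo_bits
            self.children = [VEB(self.lo_bits) for _ in range(1 << hi_bits)]
            self.aux = VEB(hi_bits)   # records which children are non-empty

    def _split(self, x):
        # High part selects the child, low part is the key inside it.
        return x >> self.lo_bits, x & ((1 << self.lo_bits) - 1)

    def insert(self, x):
        if x == self.min or x == self.max:   # already present
            return
        if self.min > self.max:              # empty tree
            self.min = self.max = x
            return
        if x < self.min:
            x, self.min = self.min, x        # old minimum moves into a child
        if x > self.max:
            self.max = x
        if self.m == 1:                      # base case: min and max say it all
            return
        i, lo = self._split(x)
        child = self.children[i]
        child.insert(lo)
        if child.min == child.max:           # child holds exactly one key
            self.aux.insert(i)

    def find_next(self, x):
        """Smallest stored key greater than x, or M (self.u) if there is none."""
        if x < self.min:
            return self.min
        if x >= self.max:
            return self.u
        if self.m == 1:                      # here min <= x < max
            return self.max
        i, lo = self._split(x)
        child = self.children[i]
        if lo < child.max:
            return (i << self.lo_bits) + child.find_next(lo)
        j = self.aux.find_next(i)
        return (j << self.lo_bits) + self.children[j].min


t = VEB(4)                 # keys 0..15
for key in (2, 3, 7, 14):
    t.insert(key)
print(t.find_next(3))      # 7
```

Eager allocation keeps the sketch close to the pseudocode but uses space proportional to M, so it is only suitable for small m.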

Note that, in any case, the algorithm performs O(1) work and then possibly recurses on a subtree over a universe of size M^(1/2) (an m/2-bit universe). This gives a recurrence for the running time of T(m) = T(m/2) + O(1), which resolves to O(log m) = O(log log M).
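
As a rough illustration of this recurrence, the Python sketch below counts how many times the key width can be halved before a one-bit universe is reached. The function name veb_levels and the choice to round up (recursing on the larger half) are assumptions made for this example.

```python
def veb_levels(m):
    """Number of halvings of an m-bit universe until one bit remains."""
    levels = 0
    while m > 1:
        m = (m + 1) // 2   # ceil(m / 2): the larger of the two halves
        levels += 1
    return levels


print(veb_levels(32))  # 5
print(veb_levels(64))  # 6
```

For 32-bit keys the recursion is therefore only about five levels deep, which is the log m = log log M factor in the bound.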

Insert

The call insert(T, x) that inserts a value x into a vEB tree T operates as follows:

If T is empty then we set T.min = T.max = x and we are done.
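Otherwise, if x < T.min then we insert T.min into the subtree i responsible for T.min and then set T.min = x. If T.children[i] was previously empty, then we also insert i into T.aux.
Otherwise, if x > T.max then we insert x into the subtree i responsible for x and then set T.max = x. If T.children[i] was previously empty, then we also insert i into T.aux.
Otherwise, T.min < x < T.max so we insert x into the subtree i responsible for x. If T.children[i] was previously empty, then we also insert i into T.aux.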
In code:

function Insert(T, x)
    if T.min == x || T.max == x then // x is already inserted
        return
    if T.min > T.max then // T is empty
        T.min = T.max = x
        return
    if x < T.min then
        swap(x, T.min)
    if x > T.max then
        T.max = x
    i = floor(x / √M)
    lo = x mod √M
    Insert(T.children[i], lo)
    if T.children[i].min == T.children[i].max then
        Insert(T.aux, i)
end

The key to the efficiency of this procedure is that inserting an element into an empty vEB tree takes O(1) time. So, even though the algorithm sometimes makes two recursive calls, this only occurs when the first recursive call was into an empty subtree. This gives the same running time recurrence of T(m) = T(m/2) + O(1) as before.

Delete

Deletion from vEB trees is the trickiest of the operations. The call Delete(T, x) that deletes a value x from a vEB tree T operates as follows:

If T.min = T.max = x then x is the only element stored in the tree and we set T.min = M and T.max = −1 to indicate that the tree is empty.
Otherwise, if x == T.min then we need to find the second-smallest value y in the vEB tree, delete it from its current location, and set T.min=y. The second-smallest value y is T.children[T.aux.min].min, so it can be found in O(1) time. We delete y from the subtree that contains it.
If x≠T.min and x≠T.max then we delete x from the subtree T.children[i] that contains x.
If x == T.max then we need to find the second-largest value y in the vEB tree and set T.max=y. We start by deleting x as in the previous case. The value y is then either T.min or T.children[T.aux.max].max, so it can be found in O(1) time.
In any of the above cases, if we delete the last element x or y from any subtree T.children[i] then we also delete i from T.aux.
In code:

function Delete(T, x)
    if T.min == T.max == x then
        T.min = M
        T.max = −1
        return
    if x == T.min then
        hi = T.aux.min * √M
        j = T.aux.min
        T.min = x = hi + T.children[j].min
    i = floor(x / √M)
    lo = x mod √M
    Delete(T.children[i], lo)
    if T.children[i] is empty then
        Delete(T.aux, i)
    if x == T.max then
        if T.aux is empty then
            T.max = T.min
        else
            hi = T.aux.max * √M
            j = T.aux.max
            T.max = hi + T.children[j].max
end

Again, the efficiency of this procedure hinges on the fact that deleting from a vEB tree that contains only one element takes only constant time. In particular, the second Delete call only executes if x was the only element in T.children[i] prior to the deletion.
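
The deletion steps can be tried out with the Python sketch below. It is a sketch under stated assumptions rather than a definitive implementation: the class name VEBSet, the membership test via __contains__, the eager allocation, the one-bit base case, and the bit-shift split are choices made for this illustration. As in the pseudocode, delete assumes that x is present in the tree.

```python
class VEBSet:
    """Sketch of a vEB tree over keys 0 .. 2**m - 1 with insert and delete."""

    def __init__(self, m):
        self.m, self.u = m, 1 << m
        self.min, self.max = self.u, -1      # empty: min = M, max = -1
        if m > 1:
            self.lo_bits = m // 2
            hi_bits = m - self.lo_bits
            self.children = [VEBSet(self.lo_bits) for _ in range(1 << hi_bits)]
            self.aux = VEBSet(hi_bits)

    def _split(self, x):
        return x >> self.lo_bits, x & ((1 << self.lo_bits) - 1)

    def __contains__(self, x):
        if x == self.min or x == self.max:
            return True
        if self.m == 1 or self.min > self.max:
            return False
        i, lo = self._split(x)
        return lo in self.children[i]

    def insert(self, x):
        if x == self.min or x == self.max:
            return
        if self.min > self.max:
            self.min = self.max = x
            return
        if x < self.min:
            x, self.min = self.min, x
        if x > self.max:
            self.max = x
        if self.m == 1:
            return
        i, lo = self._split(x)
        self.children[i].insert(lo)
        if self.children[i].min == self.children[i].max:
            self.aux.insert(i)

    def delete(self, x):
        """Remove x; x is assumed to be present."""
        if self.min == self.max == x:        # x was the only element
            self.min, self.max = self.u, -1
            return
        if self.m == 1:                      # tree held {0, 1}: keep the other key
            self.min = self.max = 1 - x
            return
        if x == self.min:                    # promote the second-smallest value
            j = self.aux.min
            x = (j << self.lo_bits) + self.children[j].min
            self.min = x
        i, lo = self._split(x)
        self.children[i].delete(lo)
        if self.children[i].min > self.children[i].max:   # child became empty
            self.aux.delete(i)
        if x == self.max:                    # recompute the maximum
            if self.aux.min > self.aux.max:
                self.max = self.min
            else:
                j = self.aux.max
                self.max = (j << self.lo_bits) + self.children[j].max


s = VEBSet(4)
for key in (2, 3, 7, 14):
    s.insert(key)
s.delete(7)
print(7 in s, s.min, s.max)   # False 2 14
```

Note that the second recursive call, on self.aux, happens only after the child has become empty, which is the constant-time case described above.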

In practice

The assumption that log m is an integer is unnecessary. The operations ⌊x/√M⌋ and x mod √M can be replaced by taking only the higher-order ⌈m/2⌉ and the lower-order ⌊m/2⌋ bits of x, respectively. On any existing machine, this is more efficient than division or remainder computations.
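
A minimal Python sketch of this bit-level split follows. The helper names split and join are hypothetical, and m is passed explicitly as the key width in bits.

```python
def split(x, m):
    """Return (high, low): the top ceil(m/2) and bottom floor(m/2) bits of x."""
    lo_bits = m // 2
    return x >> lo_bits, x & ((1 << lo_bits) - 1)


def join(high, low, m):
    """Inverse of split: rebuild the m-bit key from its two halves."""
    return (high << (m // 2)) | low


print(split(0b10110110, 8))   # (11, 6), i.e. 0b1011 and 0b0110
print(split(0b10111, 5))      # (5, 3): three high bits, two low bits
```

The second call shows an odd key width, where the high part simply receives the extra bit.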
In practical implementations, especially on machines with shift-by-k and find-first-zero instructions, performance can be further improved by switching to a bit array once m reaches the word size (or a small multiple thereof). Since all operations on a single word take constant time, this does not affect the asymptotic performance, but it avoids most of the pointer storage and several pointer dereferences, yielding significant practical savings in time and space.
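
As an illustration of such a word-sized leaf, the Python sketch below answers a successor query on a single bit array using a shift and a lowest-set-bit computation. The function name next_in_word and the default width of 64 are assumptions, and Python integers stand in for machine words here, so this shows the idea rather than the performance.

```python
def next_in_word(word, x, w=64):
    """Smallest set bit position greater than x in word, or w if none."""
    higher = word >> (x + 1)          # drop bits at positions <= x
    if higher == 0:
        return w
    lowest = higher & -higher         # isolate the lowest remaining set bit
    return x + lowest.bit_length()    # bit_length is (index of that bit) + 1


leaf = (1 << 2) | (1 << 3) | (1 << 7) | (1 << 14)   # stores {2, 3, 7, 14}
print(next_in_word(leaf, 3))    # 7
print(next_in_word(leaf, 14))   # 64
```

On real hardware the lowest-set-bit step would typically map to a single count-trailing-zeros style instruction.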
An optimization of vEB trees is to discard empty subtrees. This makes vEB trees quite compact when they contain many elements, because no subtrees are created until something needs to be added to them. Initially, each element added creates about log(m) new trees containing about m/2 pointers altogether. As the tree grows, more and more subtrees are reused, especially the larger ones. In a full tree of M elements, only O(M) space is used. Moreover, unlike a binary search tree, most of this space is being used to store data: even for billions of elements, the pointers in a full vEB tree number in the thousands.
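
One way to see the effect of creating subtrees only on demand is the insert-only Python sketch below. The class name LazyVEB, the dictionary used to hold existing children, and the class-wide counter nodes_created are all assumptions made for this illustration; deletion (and therefore the actual discarding of subtrees that become empty) is left out.

```python
class LazyVEB:
    """Insert-only vEB sketch that allocates subtrees on first use."""

    nodes_created = 0   # class-wide counter, for illustration only

    def __init__(self, m):
        LazyVEB.nodes_created += 1
        self.m = m
        self.min, self.max = 1 << m, -1   # empty: min = M, max = -1
        self.lo_bits = m // 2
        self.children = {}                # only non-empty children are stored
        self.aux = None                   # created with the first child

    def insert(self, x):
        if x == self.min or x == self.max:
            return
        if self.min > self.max:           # empty tree: O(1), no allocation
            self.min = self.max = x
            return
        if x < self.min:
            x, self.min = self.min, x
        if x > self.max:
            self.max = x
        if self.m == 1:
            return
        i, lo = x >> self.lo_bits, x & ((1 << self.lo_bits) - 1)
        child = self.children.get(i)
        if child is None:                 # first key for this child
            child = self.children[i] = LazyVEB(self.lo_bits)
            if self.aux is None:
                self.aux = LazyVEB(self.m - self.lo_bits)
            self.aux.insert(i)
        child.insert(lo)


tree = LazyVEB(32)
for key in (5, 1_000_000, 1_000_001):
    tree.insert(key)
print(LazyVEB.nodes_created)   # 5
```

Three keys in a 32-bit universe need only a handful of nodes here, whereas allocating every subtree up front would take space proportional to M.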
The implementation described above uses pointers and occupies a total space of O(M) = O(2^m), proportional to the size of the key universe. This can be seen as follows. The recurrence is S(M) = O(√M) + (√M + 1) · S(√M).
Resolving that would lead to S(M) ∈ (1 + √M)^(log log M) + log log M · O(√M).
One can, fortunately, also show that S(M) = M−2 by induction.
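
The closed form can be checked numerically with the Python sketch below. It assumes that the O(√M) term is exactly √M and that the base case is S(2) = 0; neither constant is stated in the text, and they are chosen here because they make the recurrence match M − 2. The function name space is hypothetical, and m is restricted to powers of two so that √M is always an integer.

```python
def space(m):
    """S(2**m) for S(M) = sqrt(M) + (sqrt(M) + 1) * S(sqrt(M)), S(2) = 0.

    m must be a power of two; the additive term and the base case are
    assumptions of this sketch.
    """
    if m == 1:
        return 0
    root = 1 << (m // 2)                 # sqrt(M) for M = 2**m
    return root + (root + 1) * space(m // 2)


for m in (1, 2, 4, 8, 16, 32):
    assert space(m) == 2**m - 2
print(space(8))   # 254
```

The inductive step behind this is (√M + 1)(√M − 2) + √M = M − 2.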

Similar structures

The O(M) space usage of vEB trees is an enormous overhead unless a large fraction of the universe of keys is being stored. This is one reason why vEB trees are not popular in practice. This limitation can be addressed by changing the array used to store children to another data structure. One possibility is to use only a fixed number of bits per level, which results in a trie. Alternatively, each array may be replaced by a hash table, reducing the space to O(n log log M) (where n is the number of elements stored in the data structure) at the expense of making the data structure randomized.
x-fast tries and the more complicated y-fast tries have comparable update and query times to vEB trees and use randomized hash tables to reduce the space used. x-fast tries use O(n log M) space while y-fast tries use O(n) space.
Fusion trees are another type of tree data structure that implements an associative array on w-bit integers on a finite universe. They use word-level parallelism and bit manipulation techniques to achieve O(log_w n) time for predecessor/successor queries and updates, where w is the word size. Fusion trees use O(n) space and can be made dynamic with hashing or exponential trees.

Implementations

There is a verified implementation in Isabelle (proof assistant). Both functional correctness and time bounds are proved.
Efficient imperative Standard ML code can be generated.