Initializing an Array in the Context of Studying Data Structures

Question 1

I am reading CLRS's Introduction to Algorithms, and Exercise 11.1-4, in the section Direct-Address Tables, asks:

We wish to implement a dictionary by using direct addressing on a huge array. At
the start, the array entries may contain garbage, and **initializing** the entire array
is impractical because of its size. Describe a scheme for implementing a direct-address
dictionary on a huge array. Each stored object should use O(1) space;
the operations SEARCH, INSERT, and DELETE should take O(1) time each; and
initializing the data structure should take O(1) time. (Hint: Use an additional array,
treated somewhat like a stack whose size is the number of keys actually stored in
the dictionary, to help determine whether a given entry in the huge array is valid or
not.)

I understand the solution is just to create another array, and have it store pointers to this array for elements that exist.

But I'm slightly confused as to the meaning of "initialize" in this context. If the array is not initialized, how can we even access the data (i.e. get the value at the i-th position with A[i])?

I'm also not sure why the question states this memory constraint. Suppose we could initialize the array, how would the answer change?

Answer 1

The problem is that initializing an array of length N -- setting all the elements to a known value like NULL -- takes O(N) time.

If you have an array that is initialized to NULL, then implementing a direct-address table is super easy -- A[i] == NULL means there is no value for i, and if there is a value for i, then it's stored in A[i].
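For comparison, here is a minimal sketch of that easy case in Python. The class name `InitializedTable` and its method names are made up for illustration, and `None` stands in for NULL, so `None` itself cannot be stored as a value.

```python
class InitializedTable:
    """Direct-address table that pays O(size) up front to clear every slot."""

    def __init__(self, size):
        # This line is the O(size) initialization the exercise wants to avoid.
        self.slots = [None] * size

    def insert(self, key, value):
        self.slots[key] = value

    def search(self, key):
        # None plays the role of NULL: no value is stored for this key.
        return self.slots[key]

    def delete(self, key):
        self.slots[key] = None
```

Every operation after construction is O(1); only the constructor is expensive.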

The question is about how to avoid the O(N) initialization cost. If the array is not initialized, then the initial values for all A[i] could be anything at all... so how do you tell if it's a real value or just the initial garbage?

The solution is not just to create another array that stores pointers to the original -- you would have to initialize that other array and then you've wasted O(N) time again.

To avoid that cost altogether, you have to be more clever.

Make 3 arrays A, B, and C, none of them initialized, and keep a count N of the number of values currently in the dictionary. (From here on, N is this count, not the array length used above.)

Then, if the value for i is v:

(1) A[i] = v;
(2) 0 <= B[i] < N; and
(3) C[B[i]] = i
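The last two conditions can be checked in constant time no matter what garbage B and C contain. A minimal sketch in Python, where the function name `is_valid` and the parameter `n` (the count N) are my own choices:

```python
def is_valid(i, B, C, n):
    """Return True if index i was marked valid; B and C may hold garbage."""
    j = B[i]
    # The range check keeps j inside the used prefix C[0..n-1], so C[j] is
    # only read where a real entry was written; the second check confirms
    # that entry j of C points back at i.
    return 0 <= j < n and C[j] == i
```

A garbage B[i] fails either because it is out of range or because the entry of C it lands on belongs to a different index.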

This way, the B and C arrays let you keep track of which indexes in A have been set to a real value, without initializing any of the arrays. When you add a new item, you check conditions (2) and (3) to see if the index is valid, and if it isn't, then you do:

A[i] = NULL
B[i] = N
C[N++] = i

This marks index i as valid, and conditions (2) and (3) will then pass for all future checks.
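Putting the pieces together, here is a rough Python sketch of the whole scheme. Two things are assumptions on my part: Python cannot hand out uninitialized memory, so the constructor fills the arrays with arbitrary integers purely to simulate garbage (that fill is O(size) and would not exist in a real implementation), and the `delete` step is not described in the answer; it uses the usual trick of moving the last entry of C into the freed position. The names `LazyTable`, `_valid`, and `seed` are hypothetical.

```python
import random


class LazyTable:
    """Direct-address dictionary over arrays that are never cleared."""

    def __init__(self, size, seed=0):
        rng = random.Random(seed)
        # Simulation only: arbitrary integers stand in for the garbage that
        # freshly allocated, uninitialized memory would contain.
        self.A = [rng.randrange(-size, 2 * size) for _ in range(size)]
        self.B = [rng.randrange(-size, 2 * size) for _ in range(size)]
        self.C = [rng.randrange(-size, 2 * size) for _ in range(size)]
        self.n = 0  # number of keys currently stored (N in the answer)

    def _valid(self, i):
        # Conditions (2) and (3): B[i] is in range and C points back at i.
        j = self.B[i]
        return 0 <= j < self.n and self.C[j] == i

    def insert(self, i, v):
        if not self._valid(i):
            # Same step as the answer's B[i] = N; C[N++] = i.
            self.B[i] = self.n
            self.C[self.n] = i
            self.n += 1
        self.A[i] = v

    def search(self, i):
        # Garbage in A[i] is never returned because validity is checked first.
        return self.A[i] if self._valid(i) else None

    def delete(self, i):
        # Assumed step, not in the answer: move the last entry of C into the
        # freed position so that C[0..n-1] stays dense.
        if not self._valid(i):
            return
        j = self.B[i]
        last = self.C[self.n - 1]
        self.C[j] = last
        self.B[last] = j
        self.n -= 1
```

After a delete, the removed index fails the check either because its B entry now equals the new count or because C no longer points back at it, so nothing in A needs to be cleared.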

Because of the extra memory it takes (two additional arrays), this technique isn't often used in practice, but it does mean that, in theory, you never have to count the cost of array initialization when analyzing running time.

Comment 1

Thanks for this additional detail. My original post should have said "create another linked list" instead of "array". I should say to initialize an empty doubly linked list B, and that when we insert an item, we just insert at the end of this linked list, which takes O(1) time, and have it point to the correct element in A, and also have A point back to this node in B. Then to search the i-th element in A, we just check if the content in A[i] is a pointer to any element in B.

Comment 2

"check if A[i] is a pointer to any element in B" is not O(1) time

Comment 3

Why not? I do not need to traverse through the entire linked list B, since when I insert in A[k], I store a reference in A[k] to the newly created element in B, say B[j].

Comment 4

How do you know that the value in A[k] is a pointer to an element in B, and not just random garbage?

Comment 5

Any element in A that is not garbage has a pointer to an element in B, which should take me back to the same element in A. Otherwise, it would be garbage. Does this work?

Answer 2

In that context, initializing means setting every value in the array to NULL, 0, or whatever the empty value is for the stored type. When the memory for the array is allocated, its contents are arbitrary, so the array starts out holding garbage values. Initializing replaces that garbage with the "empty" value.

The three-array answer above is the accepted answer, by Matt Timmermans, posted 2020-08-29 16:33:36Z.

