Introduction
Suppose we have a dynamic table \$T\$. Let \$\vert T \vert\$ denote the number of elements currently stored in \$T\,ドル and \$\vert \vert T \vert \vert\$ denote the capacity of \$T\$. Suppose further that \$T\$ comes with two parameters: \$m \in \mathbb{N}\,ドル the initial (minimum) capacity, below which \$T\$ is never contracted, and \$d \in \mathbb{N}\$: whenever \$T\$ becomes full (\$\vert T \vert = \vert \vert T \vert \vert\$), the table is expanded by \$d\$ element slots. That covers expansion; here, however, I want to investigate the contraction that happens whenever popping from the back leaves \$d\$ free element slots at the tail of \$T\$.
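For instance, with \$m = 2\$ and \$d = 3\$ (the parameters used in the demo session below), the capacity grows \2ドル \to 5 \to 8\$ as eight elements are pushed, and shrinks \8ドル \to 5 \to 2\$ as they are popped; each expansion and contraction copies the surviving elements into the new backing array, and that copying is what the work measure below charges for.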
Here, my claim is that the work done for removing all \$n\$ elements is the following number of steps: $$ W(n, m, d) = n + \sum_{i = 0}^{\big\lceil \frac{n - m}{d} \big\rceil - 1} (di + m) = \begin{cases} 0 & \text{if } n = 0, \\ W(n - 1, m, d) + n & \text{if } (n - 1 - m) \bmod d = 0, \\ W(n - 1, m, d) + 1 & \text{otherwise.} \end{cases} $$
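As a quick sanity check, take \$m = 2\,ドル \$d = 3\,ドル \$n = 8\$: the sum has \$\lceil (8 - 2) / 3 \rceil = 2\$ terms, so \$W(8, 2, 3) = 8 + (2 + 5) = 15\,ドル matching the alln output of the demo REPL session below.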
Code
Array.java:
public final class Array {

    private final int m;
    private final int d;
    private int capacity;
    private int size = 0;
    private int expansions = 0;

    public Array(int m, int d) {
        this.m = m;
        this.d = d;
        this.capacity = m;
    }

    public Array(Array array) {
        this.m = array.m;
        this.d = array.d;
        this.capacity = array.capacity;
        this.expansions = array.expansions;
        this.size = array.size;
    }

    public int getM() {
        return m;
    }

    public int getD() {
        return d;
    }

    // Adds one element and returns the work done. An expansion costs
    // the old capacity (elements copied) plus one for the write itself.
    public int add() {
        if (isFull()) {
            capacity += d;
            size++;
            return m + 1 + d * (expansions++);
        }

        size++;
        return 1;
    }

    public int add(int k) {
        int work = 0;

        for (int i = 0; i < k; i++) {
            work += add();
        }

        return work;
    }

    // Removes one element and returns the work done. A contraction costs
    // the new capacity (elements copied) plus one for the removal itself.
    public int remove() {
        if (size > m && shouldContract()) {
            capacity -= d;
            size--;
            return m + 1 + (--expansions) * d;
        } else {
            size--;
            return 1;
        }
    }

    public int remove(int k) {
        int count = 0;

        for (int i = 0, end = Math.min(size, k); i < end; i++) {
            count += remove();
        }

        return count;
    }

    public int size() {
        return size;
    }

    public int getCapacity() {
        return capacity;
    }

    public boolean canContract(int n) {
        return (n - m - 1) % d == 0;
    }

    boolean canContract() {
        return canContract(size);
    }

    public int discharge() {
        return remove(size);
    }

    // Closed-form work for removing all n elements.
    public int getWork(int n) {
        return n + getSum(n);
    }

    // Recursive work for removing all n elements.
    public int getWorkRecursively(int n) {
        if (n == 0) {
            return 0;
        }

        int x = n - 1 - m;
        return (x % d == 0 ? n : 1) + getWorkRecursively(n - 1);
    }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder();

        // Draw the occupied slots, marking block boundaries with '|'.
        for (int i = 0; i < size; i++) {
            if (i > 0 && (i - m) % d == 0) {
                sb.append("|");
            }

            sb.append("X");
        }

        for (int i = size; i < getCurrentCapacity(); i++) {
            sb.append(".");
        }

        return sb.toString();
    }

    private int getSum(int n) {
        int sum = 0;
        int endIndex = (int)(Math.ceil((double)(n - m) / (double) d)) - 1;

        for (int i = 0; i <= endIndex; i++) {
            sum += d * i + m;
        }

        return sum;
    }

    private int getCurrentCapacity() {
        return m + d * expansions;
    }

    // True when removing one more element would leave exactly d free slots.
    private boolean shouldContract() {
        return getCurrentCapacity() - size == d - 1;
    }

    private boolean isFull() {
        return size == capacity;
    }
}
ArrayContraction.java:
import java.util.Scanner;
public class ArrayContraction {

    public static void main(String[] args) {
        Scanner scanner = new Scanner(System.in);
        Array array = new Array(3, 3);

        while (true) {
            try {
                System.out.print(">>> ");
                String line = scanner.nextLine().trim();
                String[] tokens = line.split("\\s+");

                if (tokens[0].equals("new")) {
                    int m = Integer.parseInt(tokens[1]);
                    int d = Integer.parseInt(tokens[2]);
                    array = new Array(m, d);
                    System.out.println(array);
                } else if (tokens[0].equals("print")) {
                    System.out.println(array);
                } else if (tokens[0].equals("add")) {
                    int k = tokens.length > 1 ? Integer.parseInt(tokens[1]) : 1;
                    int work = array.add(k);
                    System.out.printf("Work: %d.\n", work);
                    System.out.println(array);
                } else if (tokens[0].equals("cond")) {
                    int n = tokens.length > 1 ?
                            Integer.parseInt(tokens[1]) :
                            array.size();
                    System.out.println(array.canContract(n));
                } else if (tokens[0].equals("remove")) {
                    int k = tokens.length > 1 ? Integer.parseInt(tokens[1]) : 1;
                    int work = array.remove(k);
                    System.out.printf("Work: %d.\n", work);
                    System.out.println(array);
                } else if (tokens[0].equals("alln")) {
                    int workClosedForm = array.getWork(array.size());
                    int workRecursive = array.getWorkRecursively(array.size());
                    Array arr = new Array(array);
                    int workDischarge = arr.discharge();
                    System.out.printf(
                            "Closed form work: %d, recursive: %d, discharged: %d.\n",
                            workClosedForm,
                            workRecursive,
                            workDischarge);
                } else if (tokens[0].equals("work")) {
                    int n = Integer.parseInt(tokens[1]);
                    System.out.println(array.getWork(n));
                } else if (tokens[0].equals("rwork")) {
                    int n = Integer.parseInt(tokens[1]);
                    System.out.println(array.getWorkRecursively(n));
                } else if (tokens[0].equals("discharge")) {
                    Array other = new Array(array);
                    System.out.println(other.discharge());
                } else if (tokens[0].equals("exit") || tokens[0].equals("quit")) {
                    System.out.println("Bye!");
                    System.exit(0);
                } else if (tokens[0].equals("help")) {
                    System.out.println(
                            """
                            HELP MESSAGE:
                            add       - Add an element.
                            add k     - Add k elements.
                            alln      - Print the total works.
                            cond      - Delegates to 'cond size'.
                            cond n    - Print condition for n.
                            discharge - Discharge the entire array.
                            exit      - Halt this program.
                            new m d   - Create new array.
                            print     - Print the array.
                            quit      - Halt this program.
                            remove    - Remove an element.
                            remove k  - Remove k elements.
                            rwork n   - Get work recursively for n parameter.
                            work n    - Get work for n parameter.
                            help      - Print this help message.
                            """);
                }
            } catch (Exception ex) {
                System.out.printf("Error: %s.\n", ex.getMessage());
            }
        }
    }
}
ArrayTest.java:
import org.junit.Test;
import static org.junit.Assert.*;
public class ArrayTest {

    private Array array;

    @Test
    public void testAdd_0args() {
        array = new Array(2, 3);
        assertEquals(0, array.size());
        assertEquals(2, array.getCapacity());
        assertEquals(1, array.add());
        assertEquals(1, array.size());
        assertEquals(2, array.getCapacity());
        assertEquals(1, array.add());
        assertEquals(2, array.size());
        assertEquals(2, array.getCapacity());
        assertEquals(3, array.add());
        assertEquals(3, array.size());
        assertEquals(5, array.getCapacity());
    }

    @Test
    public void testAdd_int() {
        array = new Array(2, 3);
        assertEquals(2, array.add(2));
        assertEquals(2, array.size());
        assertEquals(4, array.add(2));
        assertEquals(4, array.size());
        assertEquals(5, array.getCapacity());
    }

    @Test
    public void testRemove_0args() {
        array = new Array(2, 3);
        array.add(8);
        assertFalse(array.canContract());
        assertEquals(1, array.remove());
        assertEquals(7, array.size());
        assertFalse(array.canContract());
        assertEquals(1, array.remove());
        assertEquals(6, array.size());
        assertTrue(array.canContract());
        assertEquals(6, array.remove());
        assertEquals(5, array.size());
        assertFalse(array.canContract());
        assertEquals(1, array.remove());
        assertEquals(4, array.size());
        assertFalse(array.canContract());
        assertEquals(1, array.remove());
        assertEquals(3, array.size());
        assertTrue(array.canContract());
        assertEquals(3, array.remove());
        assertEquals(2, array.size());
        assertFalse(array.canContract());
        assertEquals(1, array.remove());
        assertEquals(1, array.size());
        assertFalse(array.canContract());
        assertEquals(1, array.remove());
        assertEquals(0, array.size());
    }

    @Test
    public void testRemove_int() {
        array = new Array(2, 3);
        array.add(8);
        // Here, array = XX|XXX|XXX
        assertEquals(1 + 1 + 6, array.remove(3));
        assertEquals(1 + 1 + 3, array.remove(3));
        assertEquals(1 + 1, array.remove(2));
    }

    @Test
    public void testDischargeGetWork() {
        array = new Array(2, 3);
        array.add(100);

        for (int i = 0; i < array.size(); i++) {
            Array aux = new Array(array.getM(), array.getD());
            aux.add(i);
            assertEquals(array.getWork(i), aux.discharge());
            assertEquals(array.getWork(i), array.getWorkRecursively(i));
        }
    }
}
Demo REPL session
>>> new 2 3
..
>>> add 8
Work: 15.
XX|XXX|XXX
>>> alln
Closed form work: 15, recursive: 15, discharged: 15.
>>> work 7
14
>>> rwork 7
14
>>> remove 4
Work: 9.
XX|XX.
>>> alln
Closed form work: 6, recursive: 6, discharged: 6.
>>>
Critique request
As always, tell me whatever comes to mind.
1 Answer
The JUnit ArrayTest is very nice, thank you for it.
I didn't see much motivation for insisting on positive \$m\$, and would be happy to take it as always being zero. Or, almost the same thing, take it to always equal \$d\$, meaning the initial block is never freed.
problem statement
Before even reading the code, I got hung up on the Problem Statement.
\$T\$ is variously referred to as a "table", a "list", and in the code as class Array, but it seems the envisioned access pattern is that \$T\$ is a "stack" on which we will {push, pop} elements.
I will assume that callers specify an index \$i\$ to read elements from \$T\,ドル that due to good locality of reference most requests specify an index near the end, and that such references shall be fast.
The data structure the OP proposes for addressing these needs is a contiguous native Java array. This is terrific for \$O(1)\$ constant time on each reference, but it suggests we may be doing a lot of memcpy when growing it. If this were a C program (no GC), we'd be interested in how the application interacts with a slab or buddy allocator. In a JVM context, we really want to know about the particular GC scheme we plan to run under. Assume that a memory-hungry web cache thread, running in the background, immediately devours all available memory the moment we shrink \$T\,ドル gobbling it up with a weak ref to some cached content. In other words, no cheating; when we free, the RAM actually is consumed by another thread.
chunks
Under these assumptions, I see little motivation for one giant contiguous region of memory. It seems more appropriate to allocate \$d\$-sized chunks. A reference \$i\$ has to be mapped to a chunk, and then we find the offset within the chunk. Fortunately that first part can usefully be cached, since we assume the caller exhibits good locality of reference, typically making accesses within the last chunk.
One implementation would be to use an OrderedMap to go from "initial index" to the corresponding chunk. Typical cost is \$O(1)\$ assuming cache hit, and \$O(\log \vert T \vert)\$ for a cache miss. This is not unlike hoping for a TLB hit when running on modern CPUs, something that almost always works out to be a good bet.
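A minimal sketch of this chunked layout, using java.util.TreeMap as the ordered map (the class name, the int[] backing, and the one-entry cache are my own assumptions for illustration, not part of the reviewed code):

import java.util.Map;
import java.util.TreeMap;

// Hypothetical chunked stack: storage is a set of d-sized chunks keyed
// by the index of their first element, so growing or shrinking never
// copies existing elements.
final class ChunkedStack {
    private final int d;                             // chunk size
    private final TreeMap<Integer, int[]> chunks = new TreeMap<>();
    private int size = 0;
    private Map.Entry<Integer, int[]> cached = null; // last chunk touched

    ChunkedStack(int d) { this.d = d; }

    // Assumes 0 <= i < size.
    int get(int i) {
        // Fast path: the cached chunk covers i (the common case under
        // good locality of reference); otherwise an O(log |T|) lookup.
        if (cached == null || i < cached.getKey() || i >= cached.getKey() + d) {
            cached = chunks.floorEntry(i);
        }
        return cached.getValue()[i - cached.getKey()];
    }

    void push(int value) {
        if (size % d == 0) {
            chunks.put(size, new int[d]);  // allocate a fresh chunk, no copying
        }
        chunks.lastEntry().getValue()[size % d] = value;
        size++;
    }

    // Assumes size > 0.
    int pop() {
        int value = get(--size);
        if (size % d == 0) {
            chunks.remove(size);           // free the now-empty chunk
            cached = null;
        }
        return value;
    }
}

Expansion and contraction then become \$O(1)\$ chunk allocations and frees with no element copying, at the price of an \$O(\log \vert T \vert)\$ map lookup on a cache miss.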
amortized cost
Let us assume there's a requirement to use just a single giant chunk of memory. Ok, fine. Then surely we wouldn't want a fixed increment \$d\$, right? I mean, we probably want to be able to talk about time complexity of client code that uses \$T\$, with some relatively attractive cost like \$O(\vert T \vert \log \vert T \vert)\$. That is, we'd like to avoid \$O(\vert T \vert^2)\$ quadratic cost.
The typical approach is to use a multiplicative factor. So we might double the allocation upon expansion, or grow it by \30ドル\%\,ドル something like that. We still do painfully large memcpys, but fewer of them, so the amortized cost of allocating each entry is just \$O(\log \vert T \vert)\$.
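To see the difference, here is a small sketch (my own illustration, not part of the reviewed code) that totals the elements copied while pushing \$n\$ items under the two growth policies:

// Compares total copy work while pushing n elements: additive growth
// (capacity += d, as in the OP's Array) versus doubling (capacity *= 2).
public final class GrowthCost {

    static long copiesAdditive(int n, int m, int d) {
        long copies = 0;
        int capacity = m;
        for (int size = 0; size < n; size++) {
            if (size == capacity) {    // full: reallocate before the push
                copies += capacity;    // memcpy of all current elements
                capacity += d;
            }
        }
        return copies;
    }

    static long copiesDoubling(int n, int m) {
        long copies = 0;
        int capacity = m;
        for (int size = 0; size < n; size++) {
            if (size == capacity) {
                copies += capacity;
                capacity *= 2;
            }
        }
        return copies;
    }

    public static void main(String[] args) {
        // Additive growth copies Theta(n^2 / d) elements in total, while
        // doubling copies only Theta(n): roughly 1.7e9 vs. 1.3e5 here.
        System.out.println(copiesAdditive(100_000, 2, 3));
        System.out.println(copiesDoubling(100_000, 2));
    }
}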
- We can do better. Suppose \$\alpha_T = \vert T \vert / \vert \vert T \vert \vert\$ is the load factor of \$T\$. Now, let us choose \0ドル < q_t < q_c = q_t \gamma < 1\$ (with \$\gamma > 1\$). The interpretation is that when \$\alpha_T\$ drops below \$q_t\,ドル we contract \$T\$ to a capacity of \$q_c \vert \vert T \vert \vert\$. I have a proof in my master's thesis (not yet published) that under that arrangement the pop-back operation runs, in fact, in amortized \$\Theta(1)\$. – coderodde, Apr 29, 2024 at 13:40
- For example, in CS we often set \$\gamma = 2\,ドル so that \$q_t = 1/4, q_c = 1/2\$ or \$q_t = 1/3, q_c = 2/3\$. – coderodde, Apr 29, 2024 at 13:44
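A minimal sketch of the hysteresis policy the two comments above describe, with \$\gamma = 2\,ドル \$q_t = 1/4\,ドル \$q_c = 1/2\$ (the class name and int[] backing are my own; the amortized-\$\Theta(1)\$ claim is coderodde's, only paraphrased here):

import java.util.Arrays;

// Hypothetical load-factor-driven stack: contract when the load factor
// alpha = size / capacity drops below q_t = 1/4, down to a capacity at
// which alpha becomes roughly q_c = 1/2.
final class HysteresisStack {
    private int[] data = new int[4];
    private int size = 0;

    void push(int value) {
        if (size == data.length) {
            data = Arrays.copyOf(data, 2 * data.length);  // double when full
        }
        data[size++] = value;
    }

    // Assumes size > 0.
    int pop() {
        int value = data[--size];
        // Less than a quarter full: halve the capacity. The gap between
        // q_t and q_c guarantees Theta(capacity) cheap operations before
        // the next resize in either direction.
        if (data.length > 4 && size < data.length / 4) {
            data = Arrays.copyOf(data, data.length / 2);
        }
        return value;
    }
}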
- Sure, that's cool, I'm with you, \$\alpha_T\$ is a useful measure. My difficulty is that it will asymptotically tend toward unity when we're using fixed \$d\$ over the lifetime of a growing stack, which sounds like more memcpy() work than desired. (It's a sawtooth, the flip side of \$\frac{d}{\vert \vert T \vert \vert}\,ドル and the item of interest is what that ratio looks like immediately after each new allocation of \$d\$ entries.) Shrinking in constant time I'm willing to believe, since no motion is required. It's the cost of realloc()'s as we're growing which worries me. – J_H, Apr 29, 2024 at 16:06