Skip to main content
Code Review

Return to Question

Added the purpose of the code.
Source Link
M. Al Jumaily
  • 123
  • 1
  • 1
  • 5

Edit 1: Some wonder what is the purpose of this. This is not a coding question but an implementation to improve upon an M.Sc. thesis implementation that deals with finding the minimum distance of a Linear Codes , an NP-Hard type of a problem. Essentially, a large number (2^40 or more) of vectors, represented as longs, is required to be stored inside of some data structure to perform operations on. Such problems use hundreds of Gigabytes and are executed on servers. The idea of using compression algorithms and writing to disk are not practical options they consume extra unneeded time (at least in my case). fastutil implemented something like this via BigArrays but the implementation here is tailored to a specific case to maximize the performance and is not based fastutil's implementation.

Edit 1: Some wonder what is the purpose of this. This is not a coding question but an implementation to improve upon an M.Sc. thesis implementation that deals with finding the minimum distance of a Linear Codes , an NP-Hard type of a problem. Essentially, a large number (2^40 or more) of vectors, represented as longs, is required to be stored inside of some data structure to perform operations on. Such problems use hundreds of Gigabytes and are executed on servers. The idea of using compression algorithms and writing to disk are not practical options they consume extra unneeded time (at least in my case). fastutil implemented something like this via BigArrays but the implementation here is tailored to a specific case to maximize the performance and is not based fastutil's implementation.

Source Link
M. Al Jumaily
  • 123
  • 1
  • 1
  • 5

Java - Breaking a large 1-D array into a 2-D array

Java's largest array length is 2^31 – 8 = 2,147,483,639, which is an int consisting of 32-bits. In the case where we want to store more elements, we are better off creating multiple 1-D arrays or use a 2-D array which is an array of arrays.

A program is coded to accept a length (typically larger than 2^31 – 8) to create a 2-D array with the appropriate number of cells. It has the methods get(long index) and set(long index, byte value) to get and set a specific index, respectively. Here, the type long is used which is a 64-bit whole number.

  • Given the number of elements as size, create the correct number of 1-D arrays containing exactly size number of cells in total
  • Given some element at index, find the appropriate 1-D array containing that element as well as find it in that array
  • Since we are creating arrays taking Gigabytes of RAM, the type byte, which is an 8-bit number, is used
  • For testing, the sequence 0, 1, ..., 126, 127, 0, 1, ... is stored in the large array
    • A for loop is then executed to loop through each cell and ensure the sequence follows though
    • The program has been tested for days without encountering any bugs

There are two tests:

  • The first to ensure the lengths of the 1-D arrays are created correctly
    • The data structure used is a 1-D where each cell represent the length of that 1-D array at the ith index
    • This will eliminate the need of Terabytes of RAM
  • The second is a complete implementation where the get and set methods are implemented properly but requires multiple Gigabytes of RAM to run

The request: Are there any bugs that have not been taken care of? Are there any improvements that can be applied?

import java.util.Arrays;
/**
 * Uses a 1-D array where each cell stores the number of elements that needs
 * to be stored there.
 */
public class SimpleTest {
 /**
 * The structure used to store the data.
 */
 private int[] array;
 /**
 * The amount of cells that are excluded from an array because VMs reserve
 * some header words in an array which reduces the <em>true</em> maximum
 * value. It could be at least 8 but 16 is used for extra precaution.
 */
 private static byte offset = 16;
 /**
 * The number of cells in the array.
 */
 private long size;
 /**
 * The number of 1-D arrays required.
 */
 private int segments;
 /**
 * The maximum size of each 1-D array.
 */
 private static int maxSegmentLength = Integer.MAX_VALUE - offset;
 /**
 * The size of last segment which is between {@code 1} and {@code
 * maxSegmentLength} (both inclusive).
 */
 private int ancillarySegmentLength;
 public SimpleTest(long size) {
 if (size <= 0) {
 throw new RuntimeException("Size cannot be less than 1");
 }
 this.size = size;
 //number of 1-D arrays with full capacity
 segments = (int) (size / (maxSegmentLength));//can be zero or larger
 //number of cells in the last 1-D array
 //or there will be only a single 1-D array if size < maxSegmentLength
 if ((int) (size % (maxSegmentLength)) != 0) {
 segments++;
 ancillarySegmentLength = (int) (size % (maxSegmentLength));
 } else {//Last 1-D array is full (all 1-D arrays are perfectly full)
 ancillarySegmentLength = maxSegmentLength;
 }
 array = new int[segments];
 if (segments == 1) {//a single 1-D array required
 array[0] = (int) size;
 } else {//
 for (int i = 0; i < segments - 1; i++) {
 array[i] = maxSegmentLength;
 }
 if (ancillarySegmentLength != 0) {
 array[segments - 1] = ancillarySegmentLength;
 }
 }
 System.out.println("==========");
 System.out.println("Size: " + size);
 System.out.println("Total Arrays needed: " + segments);
 System.out.println("Size of last array: " + ancillarySegmentLength);
 System.out.println("==========");
 }
 public String toString() {
 return Arrays.toString(array);
 }
 public static long power(long x, long n) {
 long result = 1;
 for (byte i = 0; i < n; i++) {
 result = result * x;
 }
 return result;
 }
 public static void main(String[] args) {
 System.out.println("Running CompleteTest...");
 //extra is a value between 0 and maxSegmentLength, including both
 long extra = (long) (Math.random() * (maxSegmentLength + 1));
 long size = maxSegmentLength * 1L + extra;//
 SimpleTest data = new SimpleTest(size);
 System.out.println("Printing the array: ");
 System.out.println(data);
 System.out.println("Run completed.");
 }
}

/**
 * A data structure for storing a large 1-D array into smaller 1-D array
 * stored inside a 2-D array.
 */
public class CompleteTest {
 /**
 * The structure used to store the codewords of the code.
 */
 public byte[][] array;
 /**
 * The amount of cells that are excluded from an array because VMs reserve
 * some header words in an array which reduces the <em>true</em> maximum
 * value. It could be at least 8 but 16 is used for extra precaution.
 */
 private static byte offset = 16;
 /**
 * The number of cells in the array.
 */
 private long size;
 /**
 * The number of 1-D arrays required.
 */
 private int segments;
 /**
 * The maximum size of each 1-D array.
 */
 private static int maxSegmentLength = Integer.MAX_VALUE - offset;
 /**
 * The size of last segment which is between {@code 1} and {@code
 * maxSegmentLength} (both inclusive).
 */
 private int ancillarySegmentLength;
 public CompleteTest(long size) {
 //assume size >= 1
 this.size = size;
 //number of 1-D arrays with full capacity
 segments = (int) (size / (maxSegmentLength));//can be zero or larger
 //number of cells in the last 1-D array
 //or there will be only a single 1-D array if size < maxSegmentLength
 if ((int) (size % (maxSegmentLength)) != 0) {
 segments++;
 ancillarySegmentLength = (int) (size % (maxSegmentLength));
 } else {//Last 1-D array is full (all 1-D arrays are perfectly full)
 ancillarySegmentLength = maxSegmentLength;
 }
 array = new byte[segments][];
 //The last segment: if ancillarySegmentLength is 0, it's the same
 //as the other segments. Otherwise, allocate ancillarySegmentLength
 //cells where ancillarySegmentLength < maxSegmentLength
 if (ancillarySegmentLength == 0) {
 array[segments - 1] = new byte[maxSegmentLength];
 } else {
 array[segments - 1] = new byte[ancillarySegmentLength];
 }
 if (segments == 1) {//a single 1-D array required
 array[0] = new byte[(int) size];
 } else {
 //populate all the segments except the last one
 for (int i = 0; i < segments - 1; i++) {
 array[i] = new byte[maxSegmentLength];
 }
 //populate the last 1-D array
 if (ancillarySegmentLength != 0) {
 array[segments - 1] = new byte[ancillarySegmentLength];
 }
 }
 }
 /**
 * Returns a specific element in the combination array.
 *
 * @param index the index of the element
 * @return the element in the index specified
 */
 public long get(long index) {
 int s = (int) (index / maxSegmentLength);
 int i = (int) (index % maxSegmentLength);
 return array[s][i];
 }
 /**
 * Sets the value specified at the appropriate index.
 *
 * @param index the index of the element to set
 * @param value the value to be assigned to
 */
 public void set(long index, byte value) {
 int s = (int) (index / (long) maxSegmentLength);
 int i = (int) (index % maxSegmentLength);
 array[s][i] = value;
 }
 /**
 * Returns the number of elements.
 *
 * @return the number of elements
 */
 public long length() {
 return size;
 }
 /**
 * Checks if the value specified is exists in the combination array.
 *
 * @param value the value to look for
 * @return {@code true} if the value exists, {@code false} otherwise
 */
 public boolean contains(long value) {
 for (int i = 0; i < array.length; i++) {
 for (int j = 0; j < array[i].length; j++) {
 if (array[i][j] == value) return true;
 }
 }
 return false;
 }
 /**
 * Prints the elements in the combination array on screen. The values
 * will be printed in base 10.
 */
 public void print() {
 long counter = 0;
 for (int i = 0; i < array.length; i++) {
 for (int j = 0; j < array[i].length; j++) {
 System.out.format("(%d, %d) = %d \n", i, j, get(counter));
 counter++;
 }
 System.out.println();
 }
 }
 /**
 * Executes the program.
 *
 * @param args the arguments specified but will be ignored
 */
 public static void main(String[] args) {
 System.out.println("Running CompleteTest...");
 //extra is a value between 0 and maxSegmentLength, including both
 long extra = (long) (Math.random() * (maxSegmentLength + 1));
 long size = maxSegmentLength * 1L + extra;//
 CompleteTest array = new CompleteTest(size);
 System.out.println("Size: " + array.size);
 for (int i = 0; i < array.array.length; i++) {
 System.out.println(array.array[i].length);
 }
 //store the sequence 0, 1, ..., 126, 127, 0, 1, ... in the array
 for (long i = 0; i < size; i++) {
 array.set(i, (byte) (i % 128));
 }
 //check each cell to ensure the above sequence is present
 for (long i = 0; i < size; i++) {
 if (array.get(i) != (byte) (i % 128)) {
 System.out.println(
 "i = " + i + ", " +
 "value stored = " + array.get(i) + ", " +
 "correct value = " + (byte) (i % 128)
 );
 }
 }
 //array.print();
 System.out.println("Run completed.");
 }
}
lang-java

AltStyle によって変換されたページ (->オリジナル) /