
edited May 4, 2018 at 23:31


for (int j = 0; j < width; j++) {
    elements[i].emplace_back(0.0f);

Instead of writing a loop, use the constructor form to create a vector of the desired length; all elements will be initialized to zero automatically.
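As a minimal sketch of the constructor form, assuming the storage is a vector of rows of `float` (as `elements[i]` suggests); the helper name `make_zero_grid` is hypothetical and not from the reviewed code:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: builds a height x width grid of zeros without a loop.
std::vector<std::vector<float>> make_zero_grid(std::size_t height,
                                               std::size_t width) {
    // The inner vector(width) value-initializes its floats to 0.0f;
    // the outer vector(height, row) stores `height` copies of that row.
    return std::vector<std::vector<float>>(height, std::vector<float>(width));
}
```

The same form works in a constructor's member-initializer list, so the nested `emplace_back` loops can probably go away entirely.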

for (int i = 0; i < dest.width; i++)
    for (int j = 0; j < dest.height; j++)
        dest.set_element(i, j, t_elements[i][j]);
}

Rather than calling set_element on every element to copy the results, copy the elements in the order in which they appear in the row, and use an iterator to place them efficiently.
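One way this could look, assuming both source and destination are vectors of rows with the same shape; the alias `Grid` and the function `copy_rows` are hypothetical names used only for illustration:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Grid = std::vector<std::vector<float>>;  // hypothetical alias

// Copies each source row into the matching destination row in storage order.
// Assumes `dest` already has the same shape as `src`.
void copy_rows(const Grid& src, Grid& dest) {
    assert(src.size() == dest.size());
    for (std::size_t r = 0; r < src.size(); ++r) {
        assert(src[r].size() == dest[r].size());
        // std::copy walks both rows with iterators; no per-element index math.
        std::copy(src[r].begin(), src[r].end(), dest[r].begin());
    }
}
```

The row is located once, and the iterators then advance one element at a time, which is the access pattern the paragraph above recommends.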

for (int k = 0; k < left.width; k++)
    elementVal += (left.get_element(k, i) * right.get_element(j, k));

The same goes for the get_element function, which is called repeatedly for every output value. For the inner dimension, just increment an iterator to advance to the next value.

Here is the way we did raster graphics in the old days: Don’t make a vector of vectors. Make a single vector, and an access function that multiplies the row by the row size and adds the column to produce a single index.
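A rough sketch of that layout, assuming row-major order and `float` elements; the class name `FlatMatrix` and its members are made up for illustration and are not the names from the reviewed code:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical class: one contiguous buffer instead of a vector of vectors.
class FlatMatrix {
public:
    FlatMatrix(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(rows * cols) {}  // zero-filled

    // Access function: row times row size, plus column, gives a single index.
    float& at(std::size_t row, std::size_t col) {
        return data_[row * cols_ + col];
    }
    float at(std::size_t row, std::size_t col) const {
        return data_[row * cols_ + col];
    }

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }
    float* data() { return data_.data(); }
    const float* data() const { return data_.data(); }

private:
    std::size_t rows_;
    std::size_t cols_;
    std::vector<float> data_;
};
```

With this layout, `at(1, 2)` on a 2 x 3 matrix refers to offset `1 * 3 + 2 = 5` in the buffer.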

Now, given a pointer to any cell, you can efficiently move to the next in any direction. Going right, increment by one. Going down, increment by the row size. So trace through the source matrices this way, one going right, one going down.
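The traversal described above might be sketched as follows, assuming flat row-major buffers and using offsets in place of raw pointers so that nothing ever steps past the end of a buffer. The function name `multiply_flat` and its parameters `n`, `m`, `p` are assumptions for this example:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical function: multiplies an (n x m) matrix by an (m x p) matrix.
// Inputs and result are flat row-major buffers.
std::vector<float> multiply_flat(const std::vector<float>& left,
                                 const std::vector<float>& right,
                                 std::size_t n, std::size_t m, std::size_t p) {
    assert(left.size() == n * m);
    assert(right.size() == m * p);
    std::vector<float> result(n * p);
    std::size_t out = 0;  // result cells are written in storage order
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < p; ++j) {
            std::size_t l = i * m;  // start of row i in `left`, moves right
            std::size_t r = j;      // top of column j in `right`, moves down
            float sum = 0.0f;
            for (std::size_t k = 0; k < m; ++k) {
                sum += left[l] * right[r];
                l += 1;  // going right: stride of one
                r += p;  // going down: stride of one full row
            }
            result[out++] = sum;
        }
    }
    return result;
}
```

The index multiplication happens once per output cell; inside the inner loop both inputs advance by a constant stride, which is the shape of loop the answer suggests a compiler is most likely to recognize.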

If the compiler grasps this and auto-vectorizes the code, you are golden! So you might find out what coding idioms are understood by the compiler. Having the loop structured to traverse both inputs with constant (though different) strides is probably key, though. Getting the compiler to see you are doing a dot-product (on each row) is the most significant speed-up you can do.

Of course, you can invest in something like the Intel IPP library, or find some free code that uses the AVX2 instructions. A dot-product that takes two inputs and an output, each with its own stride, is a very general function that you can find and reuse.

Oh, if you’re not compiling in 64-bit, do so: you get more vector registers.


edited May 4, 2018 at 23:13


answered May 4, 2018 at 23:07

JDługosz
