\$\begingroup\$

I have the following VHDL function that multiplies a given m×n matrix a by an n×1 vector b:

function matrix_multiply_by_vector(a: integer_matrix; b: integer_vector; m: integer; n: integer)
return integer_vector is
    variable c : integer_vector(m-1 downto 0) := (others => 0);
begin
    for i in 0 to m-1 loop
        for j in 0 to n-1 loop
            c(i) := c(i) + (a(i,j) * b(j));
        end loop;
    end loop;
    return c;
end matrix_multiply_by_vector;

It works well, but what does this actually implement in hardware? Specifically, I want to know whether the synthesis tool is smart enough to realize that it can parallelize the inner for loop, essentially computing a dot product for each row of the matrix. If not, what is the simplest (i.e. nice syntax) way to parallelize matrix-vector multiplication?

asked Jun 1, 2018 at 14:17
\$\endgroup\$
  • \$\begingroup\$ If it wasn't, you would have to have some kind of memory and serially load all of the values and "execute" them pipeline style. \$\endgroup\$ Commented Jun 1, 2018 at 17:41

2 Answers

\$\begingroup\$

In 'hardware' (VHDL or Verilog) all loops are unrolled and executed in parallel.

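Thus not only your inner loop but also your outer loop is unrolled: the function becomes m×n multipliers plus the adders that sum each row, all as combinational logic.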
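To make the unrolling concrete, here is a minimal sketch of what the two nested loops amount to for a hypothetical 2×2 case. The entity name `matvec_2x2` and the flattened scalar ports are assumptions made only for illustration; they are not part of the question's code.

```vhdl
-- Hand-unrolled equivalent of the nested loops for m = 2, n = 2.
-- Names and the scalar port layout are hypothetical.
entity matvec_2x2 is
  port (
    a00, a01, a10, a11 : in  integer;  -- matrix elements a(i,j)
    b0, b1             : in  integer;  -- vector elements b(j)
    c0, c1             : out integer   -- result elements c(i)
  );
end entity;

architecture unrolled of matvec_2x2 is
begin
  -- One multiplier per (i,j) pair and one adder per row,
  -- with no clock and no notion of loop iterations left.
  c0 <= a00 * b0 + a01 * b1;
  c1 <= a10 * b0 + a11 * b1;
end architecture;
```

For general m and n the same pattern gives m×n multipliers, which is why the hardware cost grows quickly with matrix size.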

That is also the reason why the loop size must be known at compile time. When the loop length is unknown, the synthesis tool will complain.


It is a well known trap for beginners coming from a SW language. They try to convert:

int a, b, c;
c = 0;
while (a--)
    c += b;

to VHDL/Verilog hardware. The problem is that it all works fine in simulation, but the synthesis tool needs to generate adders: c = b+b+b+b...b;

For that, the tool needs to know how many adders to make. If a is a constant, fine! (Even if it is 4,000,000: it will run out of gates, but it will try!)

But if a is a variable, the tool is lost.
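One way around a run-time loop count is to do one addition per clock cycle instead of unrolling. The following is a minimal sketch under assumptions of my own: the entity name `repeat_add`, the `start`/`done` handshake and the 0 to 255 operand ranges are all hypothetical and chosen only to keep the example small.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Computes c = b + b + ... + b (a times) with a single adder,
-- taking a clock cycles. All names and ranges are hypothetical.
entity repeat_add is
  port (
    clk   : in  std_logic;
    start : in  std_logic;                   -- high for one clock to load a and b
    a     : in  natural range 0 to 255;      -- run-time repeat count
    b     : in  integer range 0 to 255;
    c     : out integer range 0 to 255*255;
    done  : out std_logic                    -- high when no additions are pending
  );
end entity;

architecture rtl of repeat_add is
  signal count : natural range 0 to 255     := 0;
  signal b_reg : integer range 0 to 255     := 0;
  signal acc   : integer range 0 to 255*255 := 0;
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if start = '1' then
        count <= a;            -- remember how many additions remain
        b_reg <= b;
        acc   <= 0;
      elsif count > 0 then
        acc   <= acc + b_reg;  -- one addition per clock
        count <= count - 1;
      end if;
    end if;
  end process;

  c    <= acc;
  done <= '1' when count = 0 else '0';
end architecture;
```

The hardware is now fixed (one adder, one counter) and only the number of clock cycles depends on a.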

Ale..chenski
answered Jun 1, 2018 at 15:28
\$\endgroup\$
  • \$\begingroup\$ In this case it's just multiplication, so a could just be the multiplicand and therefore be variable... \$\endgroup\$ Commented Jun 1, 2018 at 16:54
\$\begingroup\$

This code will parallelize both loops, since you haven't defined an event to control any subset of the processing. Loops simply generate as much hardware as the function requires; to spread the work over time, you need a PROCESS.

A process has a sensitivity list that tells VHDL (or the synthesizer) that the process is not invoked unless one of the signals in the list changes. This can be used to synthesize latches or registers, and to expand beyond the realm of purely combinatorial implementation.
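As an illustration of using a clocked process to control how much work happens at once, here is a rough sketch that computes all rows in parallel but steps through the columns one per clock. It assumes VHDL-2008 (for the predefined `integer_vector`) and assumes `integer_matrix` is a 2-D array of integer, since the question does not show its declaration. The package name `matvec_pkg`, the entity name `matvec_serial`, the generics and the `start`/`done` handshake are hypothetical, and whether integer array ports synthesize well depends on the tool.

```vhdl
package matvec_pkg is
  -- Assumed definition; the question does not show this type.
  type integer_matrix is array (natural range <>, natural range <>) of integer;
end package;

library ieee;
use ieee.std_logic_1164.all;
use work.matvec_pkg.all;

entity matvec_serial is
  generic (M : positive := 2; N : positive := 2);
  port (
    clk   : in  std_logic;
    start : in  std_logic;                         -- high for one clock to begin
    a     : in  integer_matrix(0 to M-1, 0 to N-1); -- must stay stable until done
    b     : in  integer_vector(0 to N-1);           -- must stay stable until done
    c     : out integer_vector(0 to M-1);
    done  : out std_logic
  );
end entity;

architecture rtl of matvec_serial is
  signal acc : integer_vector(0 to M-1) := (others => 0);
  signal j   : natural range 0 to N     := N;  -- current column; N means idle
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if start = '1' then
        acc <= (others => 0);
        j   <= 0;
      elsif j < N then
        -- This loop is still unrolled: M multiply-accumulate units,
        -- one per row, all working on column j in the same clock.
        for i in 0 to M-1 loop
          acc(i) <= acc(i) + a(i, j) * b(j);
        end loop;
        j <= j + 1;
      end if;
    end if;
  end process;

  c    <= acc;
  done <= '1' when j = N else '0';
end architecture;
```

This trades the m×n multipliers of the fully unrolled function for M multipliers and N clock cycles; moving the row loop under the clock as well would reduce it to a single multiplier.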

answered Jun 1, 2018 at 16:14
\$\endgroup\$
