I would like to have a synthesizable and optimized solution for the modulus operator in Verilog (%) for the nonpower of two integers.
cnt_mod <= cnt % 3;
For power of two integers (n_arg), it could be seen as a left shifting process followed by truncation to n_arg bits.
How it can be implemented for the nonpower of two integers?
Thanks
2 Answers 2
It's actually possible to implement modulus by a constant with only addition and a small look-up table.
Suppose you have an 8 bit binary number. Look at all powers of two representable in 8 bits, which are the following:
00000001 =わ 1
00000010 =わ 2
00000100 =わ 4
00001000 =わ 8
00010000 =わ 16
00100000 =わ 32
01000000 =わ 64
10000000 =わ 128
Now apply the modulus operator (% 3 in our case) to these powers of two:
00000001 -> 1
00000010 -> 2
00000100 -> 1
00001000 -> 2
00010000 -> 1
00100000 -> 2
01000000 -> 1
10000000 -> 2
This gives us the value of each bit in the number mod 3. Using the fact that (a + b)%x = (a%x + b%x)%x, we can derive a smaller number that has the same value mod 3 as the original number:
n2 = cnt[0] + 2*cnt[1] + cnt[2] + 2*cnt[3] + cnt[4] + 2*cnt[5] + cnt[6] + 2*cnt[7];
This expression is maximal when all bits in cnt are set. By simple addition, we get that n2 is at most 12. Therefore n2 is a 4 bit number.
We have now "compressed" our original 8 bit number into a 4 bit number that has the same value mod 3. That is, cnt%3 = n2%3.
To get the actual modulus, we just have to take that small 4-bit number mod 3 now, which the synthesis will map to a look-up table:
always @(*) begin
if (n2 < 3) cnt_mod3 = n2;
else if (n2 < 6) cnt_mod3 = n2 - 3;
else if (n2 < 9) cnt_mod3 = n2 - 6;
else if (n2 < 12) cnt_mod3 = n2 - 9;
else cnt_mod3 = n2 - 12;
end
So, in the end, all we needed to compute the mod3 expression was an adder and a 4-input look-up table.
This method is applicable to arbitrarily wide numbers and arbitrary moduli as long as the modulus is constant. The only thing that will change is the list of constants that each bit has to be multiplied with. If "n2" is still too wide (i.e. 6 bits or more) after applying the first round of bitwise "compression", just apply another round of it until it's small enough.
I suppose cnt is a counter.
If so, and if it starts from 0 (or a fixed number N), then why not to keep a second variable cnt_mod_3, that is incremented the same moment as cnt and reset to 0 when reaching 3.
That way, you don't have any modulo/division computation to do, and you have no additionnal delay.
In case cnt isn't a counter (or is initialized at runtime with numbers for which you don't have the modolus 3), you might look into the following trick : N=0b a(n) a(n-1) ... a(1) a(0). a(0) % 3 = +a(0) % 3 a(1) 0 % 3 = -a(1) % 3 a(2) 0 0 % 3 = +a(2) % 3 ... a(2i) 0...0 = +a(2i) % 3 a(2i+1) 0...0 = -(ai+1) % 3 ...
So N%3 = sum( a(i)*(-1)^i) %3
So you can reduce N into sum( a(i)*(-1)^i) for checking the modolus 3 (which is a far bigger number). Reduce by 1 or several such steps, either until you can garantee that the result is between 0 and 2, or until it is small enough to use a lookup table to finish.
NB : if you had rather used unigned numbers, you can use 2a(2i+1) instead of -a(2*i+1)