Any and all comments are welcome in this review.
##Problem
I've been doing a lot with numerical integration methods recently and have mostly been programming in Python. But, speedups and FPGAs are cool! Thusly, I'm attempting to implement the trapezoidal integration method in Verilog. I have never programmed in Verilog before and know almost nothing about it, hence this code review on a very short program that is probably also very crappy.
The trapezoidal method is at its heart very simple. Take two points on your function and call them y1 and y2. Then define a trapezoid with the two "bases" defined by the height from the x-axis to y1 and y2 and a height of the interval between the two corresponding x coordinates (i.e., x2 - x1; we'll call this value x for simplicity). Then plug this into the formula for the area of a trapezoid and you get $$A = \frac{x(y_1+y_2)}{2}$$ sum this for a bunch of points with a very small value of x and you've got the integral.
The other little quirk in what I'm doing is that I'm doing it cumulatively, i.e., I'm taking in a signal (which I call SIGNAL
in my code) and find A at that point and send it to the output (OUT
). Then on the board I'll wire OUT
to SUM
and effectively add the past OUT
to my new OUT
to get the total area up to that point.
##Code Description
I take in several inputs - a clock signal CLK
, the function-signal SIGNAL
(i.e., what I'm integrating) the distance between clock ticks x
, and the past SUM
(again, what OUT
is mapped to). I have one output, OUT
, which is the solution up to that point.
I begin by defining three 64 bit registers. The first two are for the two values y1 and y2, and the third is for the SUM
(I handle it oddly). Then I start an always
loop - whenever CLK
is high, I set yregtwo
equal to whatever is in yregone
and yregone
equal to the SIGNAL
(effectively shifting the y values) and then check if yregtwo
actually has something in it - i.e., that it's not step one. If this is true, then I perform the actual calculation detailed in the formula I gave above and add SUM
to it (not sum
). Finally, I set sum
equal to that calculation and set OUT
equal to sum
.
##Full Code
module trapverilog(
input CLK,
input signed [7:0] SIGNAL,
input signed [7:0] x,
input signed [7:0] SUM, // OUT pins are mapped to SUM pins on board
output reg OUTP,
output reg OUT1,
output reg OUT2,
output reg OUT3,
output reg OUT4,
output reg OUT5,
output reg OUT6,
output reg OUT7
);
reg[7:0] yregone;
reg[7:0] yregtwo;
reg[7:0] innerSumOutput;
reg[7:0] innerSum;
function [7:0] multiply;
input [7:0] a;
input [7:0] b;
reg [15:0] a1, a2, a3, a4, a5, a6, a7, a8;
begin
a1 = (b[0]==1'b1) ? {8'b00000000, a} : 16'b0000000000000000;
a2 = (b[1]==1'b1) ? {7'b0000000, a, 1'b0} : 16'b0000000000000000;
a3 = (b[2]==1'b1) ? {6'b000000, a, 2'b00} : 16'b0000000000000000;
a4 = (b[3]==1'b1) ? {5'b00000, a, 3'b000} : 16'b0000000000000000;
a5 = (b[4]==1'b1) ? {4'b0000, a, 4'b0000} : 16'b0000000000000000;
a6 = (b[5]==1'b1) ? {3'b000, a, 5'b00000} : 16'b0000000000000000;
a7 = (b[6]==1'b1) ? {2'b00, a, 6'b000000} : 16'b0000000000000000;
a8 = (b[7]==1'b1) ? {1'b0, a, 7'b0000000} : 16'b0000000000000000;
multiply = a1 + a2 + a3 + a4 + a5 + a6 + a7 + a8;
end
endfunction
always @(posedge CLK)
begin
yregtwo <= yregone;
yregone <= SIGNAL;
if (yregone != 0)
begin
innerSum <= multiply((yregone + yregtwo), x); //treats x as plain h, change if treated as h/2 // multiply defined by function shift-adds
innerSumOutput <= (innerSum <<< 1) + SUM; // <<< is signed one bit shift which = /2
if (innerSumOutput[0] == 1)
begin
OUTP <= 1;
end
OUT1 <= innerSumOutput[1];
OUT2 <= innerSumOutput[2];
OUT3 <= innerSumOutput[3];
OUT4 <= innerSumOutput[4];
OUT5 <= innerSumOutput[5];
OUT6 <= innerSumOutput[6];
OUT7 <= innerSumOutput[7];
end
end
endmodule
##User Config File
NET "CLK" LOC = P126;
NET "SIGNAL[0]" LOC = P35 | IOSTANDARD = LVCMOS33 | DRIVE = 8 | SLEW = FAST;
NET "SIGNAL[1]" LOC = P34 | IOSTANDARD = LVCMOS33 | DRIVE = 8 | SLEW = FAST;
NET "SIGNAL[2]" LOC = P33 | IOSTANDARD = LVCMOS33 | DRIVE = 8 | SLEW = FAST;
NET "SIGNAL[3]" LOC = P32 | IOSTANDARD = LVCMOS33 | DRIVE = 8 | SLEW = FAST;
NET "SIGNAL[4]" LOC = P30 | IOSTANDARD = LVCMOS33 | DRIVE = 8 | SLEW = FAST;
NET "SIGNAL[5]" LOC = P29 | IOSTANDARD = LVCMOS33 | DRIVE = 8 | SLEW = FAST;
NET "SIGNAL[6]" LOC = P27 | IOSTANDARD = LVCMOS33 | DRIVE = 8 | SLEW = FAST;
NET "SIGNAL[7]" LOC = P26 | IOSTANDARD = LVCMOS33 | DRIVE = 8 | SLEW = FAST;
NET "x[0]" LOC = P24 | IOSTANDARD = LVCMOS33 | DRIVE = 8 | SLEW = FAST;
NET "x[1]" LOC = P23 | IOSTANDARD = LVCMOS33 | DRIVE = 8 | SLEW = FAST;
NET "x[2]" LOC = P22 | IOSTANDARD = LVCMOS33 | DRIVE = 8 | SLEW = FAST;
NET "x[3]" LOC = P21 | IOSTANDARD = LVCMOS33 | DRIVE = 8 | SLEW = FAST;
NET "x[4]" LOC = P17 | IOSTANDARD = LVCMOS33 | DRIVE = 8 | SLEW = FAST;
NET "x[5]" LOC = P16 | IOSTANDARD = LVCMOS33 | DRIVE = 8 | SLEW = FAST;
NET "x[6]" LOC = P15 | IOSTANDARD = LVCMOS33 | DRIVE = 8 | SLEW = FAST;
NET "x[7]" LOC = P14 | IOSTANDARD = LVCMOS33 | DRIVE = 8 | SLEW = FAST;
NET "SUM[0]" LOC = P12 | IOSTANDARD = LVCMOS33 | DRIVE = 8 | SLEW = FAST;
NET "SUM[1]" LOC = P11 | IOSTANDARD = LVCMOS33 | DRIVE = 8 | SLEW = FAST;
NET "SUM[2]" LOC = P10 | IOSTANDARD = LVCMOS33 | DRIVE = 8 | SLEW = FAST;
NET "SUM[3]" LOC = P9 | IOSTANDARD = LVCMOS33 | DRIVE = 8 | SLEW = FAST;
NET "SUM[4]" LOC = P8 | IOSTANDARD = LVCMOS33 | DRIVE = 8 | SLEW = FAST;
NET "SUM[5]" LOC = P7 | IOSTANDARD = LVCMOS33 | DRIVE = 8 | SLEW = FAST;
NET "SUM[6]" LOC = P6 | IOSTANDARD = LVCMOS33 | DRIVE = 8 | SLEW = FAST;
NET "SUM[7]" LOC = P5 | IOSTANDARD = LVCMOS33 | DRIVE = 8 | SLEW = FAST;
I'm using the Mimas Spartan 6 board.
2 Answers 2
Why are the inputs signed
, are you working with negative dimensions?
Most FPGAs have dedicated logic for multiplication so you usually can simply write x*y
without having any issues. For better or worse, manually writing it out like you did could impact optimization. Be aware your multiply
truncates most of the MSB bits. An 8-bit times 8-bit input would return a 16-bit value.
<<< 1
is shift left and is effectively the same as multiplying by two.
Not sure why each output bit has its own port when you clearly could use a vector like the inputs.
You should add a reset and enable inputs.
Assuming your FPGA can handle \$ \frac{x(y1+y2)}{2} \$ in one clock. You could simply write:
module trapezoidal_integration(
input clk, rst_n, en,
input [7:0] x_in, y1_in, y2_in,
output reg [15:0] area_out,
output reg [15:0] cum_out, // cumulative (increase width ???)
output reg err_overflow);
wire [15:0] area = (x_in * (y1_in+y2_in)) / 4'h2;
always @(posedge clk) begin
if (!rst_n) begin
area_out <= 16'h0000;
cum_out <= 16'h0000;
err_overflow <= 1'b0;
end
else if (en) begin
area_out <= area;
{err_overflow,cum_out} <= cum_out + area;
end
end
endmodule
If you need to limit your pins or need to space out the operations, you could do something like below (untested example):
module trapezoidal_integration(
input clk, rst_n, en,
input [7:0] s_in,
output reg [15:0] cum_out, // cumulative (increase width ???)
output reg err_overflow) );
reg [1:0] state;
reg [16:0] tmp;
always @(posedge clk) begin
if (!rst_n) begin
state <= 3'b001;
tmp <= 17'h0_0000;
cum_out <= 16'h0000;
err_overflow <= 1'b0;
end
else begin
case(state)
2'b00 : begin
tmp[7:0] <= s_in; // y1
if (en) state <= 2'b01;
end
2'b01 : begin
tmp[8:0] <= tmp[7:0] + s_in; // y1+y2
state <= 2'b11; // gray code
end
2'b11 : begin
tmp <= s_in * tmp[8:0]; // x*(y1+y2)
state <= 2'b10; // gray code
end
2'b10 : begin
{err_overflow,cum_out} <= cum_out + tmp[16:1]; // tmp[16:1] === tmp/2
state <= 2'b00;
end
endcase
end
end
endmodule
In addition to the points made in the previous answer, here are some other coding style suggestions.
Documentation
Add a block comment to describe the design:
/*
Trapezoidal integration method
Add more description like you have in the text of the question
*/
module trapverilog (
Naming
trapverilog
is not a meaningful name for a module. We already know
you are using the Verilog language. The previous answer shows
an improved name.
x
is a poor choice of a signal name because it is easily confused
with the 4-state value x
(the unknown value). Perhaps delta_x
would be more meaningful.
Names like yregone
are hard to read and understand. It is customary
to use underscores: y_reg_one
Set once
You only assign a value to OUTP
once, and it is always set to 1.
Once it is set to 1, it can never be set to 0 again. This is a little
unusual.
Layout
Long lines are hard to read and understand:
innerSum <= multiply((yregone + yregtwo), x); //treats x as plain h, change if treated as h/2 // multiply defined by function shift-adds
It should be split into multiple lines with the comments above the code:
// treats x as plain h, change if treated as h/2
// multiply defined by function shift-adds
innerSum <= multiply((yregone + yregtwo), x);
Simpler
Comparisons to 1:
if (b[0]==1'b1)
are often omitted:
if (b[0])
Repeated 0 bit values like 5'b00000
and 16'b0000000000000000
are easier to understand using the replicated concatenation
operator:
{5{1'b0}}
{16{1'b0}}
The begin/end
statements are not needed in the function, and are often omitted.
function [7:0] multiply;
input [7:0] a;
input [7:0] b;
reg [15:0] a1, a2, a3, a4, a5, a6, a7, a8;
a1 = (b[0]) ? { {8{1'b0}}, a } : {16{1'b0}};
a2 = (b[1]) ? { {7{1'b0}}, a, {1{1'b0}} } : {16{1'b0}};
a3 = (b[2]) ? { {6{1'b0}}, a, {2{1'b0}} } : {16{1'b0}};
a4 = (b[3]) ? { {5{1'b0}}, a, {3{1'b0}} } : {16{1'b0}};
a5 = (b[4]) ? { {4{1'b0}}, a, {4{1'b0}} } : {16{1'b0}};
a6 = (b[5]) ? { {3{1'b0}}, a, {5{1'b0}} } : {16{1'b0}};
a7 = (b[6]) ? { {2{1'b0}}, a, {6{1'b0}} } : {16{1'b0}};
a8 = (b[7]) ? { {1{1'b0}}, a, {7{1'b0}} } : {16{1'b0}};
multiply = a1 + a2 + a3 + a4 + a5 + a6 + a7 + a8;
endfunction
To further simplify this code, you could replace the 8 separate 16-bit
signals with an array of 16-bit values and use the array sum
built-in method.
This syntax requires you to enable SystemVerilog features in your simulator,
if they are not already enabled by default: