Verilog implementation of trapezoidal integration method

Question 1

Any and all comments are welcome in this review.

##Problem

I've been doing a lot with numerical integration methods recently and have mostly been programming in Python. But, speedups and FPGAs are cool! Thusly, I'm attempting to implement the trapezoidal integration method in Verilog. I have never programmed in Verilog before and know almost nothing about it, hence this code review on a very short program that is probably also very crappy.

The trapezoidal method is at its heart very simple. Take two points on your function and call them y₁ and y₂. Then define a trapezoid with the two "bases" defined by the height from the x-axis to y₁ and y₂ and a height of the interval between the two corresponding x coordinates (i.e., x₂ - x₁; we'll call this value x for simplicity). Then plug this into the formula for the area of a trapezoid and you get $$A = \frac{x(y_1+y_2)}{2}$$ sum this for a bunch of points with a very small value of x and you've got the integral.

The other little quirk in what I'm doing is that I'm doing it cumulatively, i.e., I'm taking in a signal (which I call SIGNAL in my code) and find A at that point and send it to the output (OUT). Then on the board I'll wire OUT to SUM and effectively add the past OUT to my new OUT to get the total area up to that point.

##Code Description

I take in several inputs - a clock signal CLK, the function-signal SIGNAL (i.e., what I'm integrating) the distance between clock ticks x, and the past SUM (again, what OUT is mapped to). I have one output, OUT, which is the solution up to that point.

I begin by defining three 64 bit registers. The first two are for the two values y₁ and y₂, and the third is for the SUM (I handle it oddly). Then I start an always loop - whenever CLK is high, I set yregtwo equal to whatever is in yregone and yregone equal to the SIGNAL (effectively shifting the y values) and then check if yregtwo actually has something in it - i.e., that it's not step one. If this is true, then I perform the actual calculation detailed in the formula I gave above and add SUM to it (not sum). Finally, I set sum equal to that calculation and set OUT equal to sum.

##Full Code

module trapverilog(
 input CLK,
 input signed [7:0] SIGNAL,
 input signed [7:0] x,
 input signed [7:0] SUM, // OUT pins are mapped to SUM pins on board
 output reg OUTP,
 output reg OUT1,
 output reg OUT2,
 output reg OUT3,
 output reg OUT4,
 output reg OUT5,
 output reg OUT6,
 output reg OUT7
 );
reg[7:0] yregone;
reg[7:0] yregtwo;
reg[7:0] innerSumOutput;
reg[7:0] innerSum;
function [7:0] multiply;
 input [7:0] a;
 input [7:0] b;
 reg [15:0] a1, a2, a3, a4, a5, a6, a7, a8;
 begin
 a1 = (b[0]==1'b1) ? {8'b00000000, a} : 16'b0000000000000000;
 a2 = (b[1]==1'b1) ? {7'b0000000, a, 1'b0} : 16'b0000000000000000;
 a3 = (b[2]==1'b1) ? {6'b000000, a, 2'b00} : 16'b0000000000000000;
 a4 = (b[3]==1'b1) ? {5'b00000, a, 3'b000} : 16'b0000000000000000;
 a5 = (b[4]==1'b1) ? {4'b0000, a, 4'b0000} : 16'b0000000000000000;
 a6 = (b[5]==1'b1) ? {3'b000, a, 5'b00000} : 16'b0000000000000000;
 a7 = (b[6]==1'b1) ? {2'b00, a, 6'b000000} : 16'b0000000000000000;
 a8 = (b[7]==1'b1) ? {1'b0, a, 7'b0000000} : 16'b0000000000000000;
 multiply = a1 + a2 + a3 + a4 + a5 + a6 + a7 + a8;
 end
endfunction
always @(posedge CLK)
begin
 yregtwo <= yregone;
 yregone <= SIGNAL;
 
 if (yregone != 0)
 begin
 innerSum <= multiply((yregone + yregtwo), x); //treats x as plain h, change if treated as h/2 // multiply defined by function shift-adds
 innerSumOutput <= (innerSum <<< 1) + SUM; // <<< is signed one bit shift which = /2
 if (innerSumOutput[0] == 1)
 begin
 OUTP <= 1;
 end
 
 OUT1 <= innerSumOutput[1];
 OUT2 <= innerSumOutput[2];
 OUT3 <= innerSumOutput[3];
 OUT4 <= innerSumOutput[4];
 OUT5 <= innerSumOutput[5];
 OUT6 <= innerSumOutput[6];
 OUT7 <= innerSumOutput[7];
 end
end
endmodule

##User Config File

NET "CLK" LOC = P126;
NET "SIGNAL[0]" LOC = P35 | IOSTANDARD = LVCMOS33 | DRIVE = 8 | SLEW = FAST;
NET "SIGNAL[1]" LOC = P34 | IOSTANDARD = LVCMOS33 | DRIVE = 8 | SLEW = FAST;
NET "SIGNAL[2]" LOC = P33 | IOSTANDARD = LVCMOS33 | DRIVE = 8 | SLEW = FAST;
NET "SIGNAL[3]" LOC = P32 | IOSTANDARD = LVCMOS33 | DRIVE = 8 | SLEW = FAST;
NET "SIGNAL[4]" LOC = P30 | IOSTANDARD = LVCMOS33 | DRIVE = 8 | SLEW = FAST;
NET "SIGNAL[5]" LOC = P29 | IOSTANDARD = LVCMOS33 | DRIVE = 8 | SLEW = FAST;
NET "SIGNAL[6]" LOC = P27 | IOSTANDARD = LVCMOS33 | DRIVE = 8 | SLEW = FAST;
NET "SIGNAL[7]" LOC = P26 | IOSTANDARD = LVCMOS33 | DRIVE = 8 | SLEW = FAST;
NET "x[0]" LOC = P24 | IOSTANDARD = LVCMOS33 | DRIVE = 8 | SLEW = FAST;
NET "x[1]" LOC = P23 | IOSTANDARD = LVCMOS33 | DRIVE = 8 | SLEW = FAST;
NET "x[2]" LOC = P22 | IOSTANDARD = LVCMOS33 | DRIVE = 8 | SLEW = FAST;
NET "x[3]" LOC = P21 | IOSTANDARD = LVCMOS33 | DRIVE = 8 | SLEW = FAST;
NET "x[4]" LOC = P17 | IOSTANDARD = LVCMOS33 | DRIVE = 8 | SLEW = FAST;
NET "x[5]" LOC = P16 | IOSTANDARD = LVCMOS33 | DRIVE = 8 | SLEW = FAST;
NET "x[6]" LOC = P15 | IOSTANDARD = LVCMOS33 | DRIVE = 8 | SLEW = FAST;
NET "x[7]" LOC = P14 | IOSTANDARD = LVCMOS33 | DRIVE = 8 | SLEW = FAST;
NET "SUM[0]" LOC = P12 | IOSTANDARD = LVCMOS33 | DRIVE = 8 | SLEW = FAST;
NET "SUM[1]" LOC = P11 | IOSTANDARD = LVCMOS33 | DRIVE = 8 | SLEW = FAST;
NET "SUM[2]" LOC = P10 | IOSTANDARD = LVCMOS33 | DRIVE = 8 | SLEW = FAST;
NET "SUM[3]" LOC = P9 | IOSTANDARD = LVCMOS33 | DRIVE = 8 | SLEW = FAST;
NET "SUM[4]" LOC = P8 | IOSTANDARD = LVCMOS33 | DRIVE = 8 | SLEW = FAST;
NET "SUM[5]" LOC = P7 | IOSTANDARD = LVCMOS33 | DRIVE = 8 | SLEW = FAST;
NET "SUM[6]" LOC = P6 | IOSTANDARD = LVCMOS33 | DRIVE = 8 | SLEW = FAST;
NET "SUM[7]" LOC = P5 | IOSTANDARD = LVCMOS33 | DRIVE = 8 | SLEW = FAST;

I'm using the Mimas Spartan 6 board.

Question 2

Why are the inputs signed, are you working with negative dimensions?

Most FPGAs have dedicated logic for multiplication so you usually can simply write x*y without having any issues. For better or worse, manually writing it out like you did could impact optimization. Be aware your multiply truncates most of the MSB bits. An 8-bit times 8-bit input would return a 16-bit value.

<<< 1 is shift left and is effectively the same as multiplying by two.

Not sure why each output bit has its own port when you clearly could use a vector like the inputs.

You should add a reset and enable inputs.

Assuming your FPGA can handle \$ \frac{x(y1+y2)}{2} \$ in one clock. You could simply write:

module trapezoidal_integration(
 input clk, rst_n, en,
 input [7:0] x_in, y1_in, y2_in,
 output reg [15:0] area_out,
 output reg [15:0] cum_out, // cumulative (increase width ???)
 output reg err_overflow);
 wire [15:0] area = (x_in * (y1_in+y2_in)) / 4'h2;
 always @(posedge clk) begin
 if (!rst_n) begin
 area_out <= 16'h0000;
 cum_out <= 16'h0000;
 err_overflow <= 1'b0;
 end
 else if (en) begin
 area_out <= area;
 {err_overflow,cum_out} <= cum_out + area;
 end
 end
endmodule

If you need to limit your pins or need to space out the operations, you could do something like below (untested example):

module trapezoidal_integration(
 input clk, rst_n, en,
 input [7:0] s_in,
 output reg [15:0] cum_out, // cumulative (increase width ???)
 output reg err_overflow) );
 reg [1:0] state;
 reg [16:0] tmp;
 always @(posedge clk) begin
 if (!rst_n) begin
 state <= 3'b001;
 tmp <= 17'h0_0000;
 cum_out <= 16'h0000;
 err_overflow <= 1'b0;
 end
 else begin
 case(state)
 2'b00 : begin
 tmp[7:0] <= s_in; // y1
 if (en) state <= 2'b01;
 end
 2'b01 : begin
 tmp[8:0] <= tmp[7:0] + s_in; // y1+y2
 state <= 2'b11; // gray code
 end
 2'b11 : begin
 tmp <= s_in * tmp[8:0]; // x*(y1+y2)
 state <= 2'b10; // gray code
 end
 2'b10 : begin
 {err_overflow,cum_out} <= cum_out + tmp[16:1]; // tmp[16:1] === tmp/2
 state <= 2'b00;
 end
 endcase
 end
 end
endmodule

Question 3

In addition to the points made in the previous answer, here are some other coding style suggestions.

Documentation

Add a block comment to describe the design:

/*
Trapezoidal integration method
Add more description like you have in the text of the question
*/
module trapverilog (

Naming

trapverilog is not a meaningful name for a module. We already know you are using the Verilog language. The previous answer shows an improved name.

x is a poor choice of a signal name because it is easily confused with the 4-state value x (the unknown value). Perhaps delta_x would be more meaningful.

Names like yregone are hard to read and understand. It is customary to use underscores: y_reg_one

Set once

You only assign a value to OUTP once, and it is always set to 1. Once it is set to 1, it can never be set to 0 again. This is a little unusual.

Layout

Long lines are hard to read and understand:

innerSum <= multiply((yregone + yregtwo), x); //treats x as plain h, change if treated as h/2 // multiply defined by function shift-adds

It should be split into multiple lines with the comments above the code:

// treats x as plain h, change if treated as h/2
// multiply defined by function shift-adds
innerSum <= multiply((yregone + yregtwo), x);

Simpler

Comparisons to 1:

if (b[0]==1'b1)

are often omitted:

if (b[0])

Repeated 0 bit values like 5'b00000 and 16'b0000000000000000 are easier to understand using the replicated concatenation operator:

{5{1'b0}}
{16{1'b0}}

The begin/end statements are not needed in the function, and are often omitted.

function [7:0] multiply;
 input [7:0] a;
 input [7:0] b;
 reg [15:0] a1, a2, a3, a4, a5, a6, a7, a8;
 a1 = (b[0]) ? { {8{1'b0}}, a } : {16{1'b0}};
 a2 = (b[1]) ? { {7{1'b0}}, a, {1{1'b0}} } : {16{1'b0}};
 a3 = (b[2]) ? { {6{1'b0}}, a, {2{1'b0}} } : {16{1'b0}};
 a4 = (b[3]) ? { {5{1'b0}}, a, {3{1'b0}} } : {16{1'b0}};
 a5 = (b[4]) ? { {4{1'b0}}, a, {4{1'b0}} } : {16{1'b0}};
 a6 = (b[5]) ? { {3{1'b0}}, a, {5{1'b0}} } : {16{1'b0}};
 a7 = (b[6]) ? { {2{1'b0}}, a, {6{1'b0}} } : {16{1'b0}};
 a8 = (b[7]) ? { {1{1'b0}}, a, {7{1'b0}} } : {16{1'b0}};
 multiply = a1 + a2 + a3 + a4 + a5 + a6 + a7 + a8;
endfunction

To further simplify this code, you could replace the 8 separate 16-bit signals with an array of 16-bit values and use the array sum built-in method. This syntax requires you to enable SystemVerilog features in your simulator, if they are not already enabled by default:

Greg Greg 5231 gold badge10 silver badges16 bronze badges · Accepted Answer · 2019-07-24 20:17:33Z

Why are the inputs signed, are you working with negative dimensions?

Most FPGAs have dedicated logic for multiplication so you usually can simply write x*y without having any issues. For better or worse, manually writing it out like you did could impact optimization. Be aware your multiply truncates most of the MSB bits. An 8-bit times 8-bit input would return a 16-bit value.

<<< 1 is shift left and is effectively the same as multiplying by two.

Not sure why each output bit has its own port when you clearly could use a vector like the inputs.

You should add a reset and enable inputs.

Assuming your FPGA can handle \$ \frac{x(y1+y2)}{2} \$ in one clock. You could simply write:

module trapezoidal_integration(
 input clk, rst_n, en,
 input [7:0] x_in, y1_in, y2_in,
 output reg [15:0] area_out,
 output reg [15:0] cum_out, // cumulative (increase width ???)
 output reg err_overflow);
 wire [15:0] area = (x_in * (y1_in+y2_in)) / 4'h2;
 always @(posedge clk) begin
 if (!rst_n) begin
 area_out <= 16'h0000;
 cum_out <= 16'h0000;
 err_overflow <= 1'b0;
 end
 else if (en) begin
 area_out <= area;
 {err_overflow,cum_out} <= cum_out + area;
 end
 end
endmodule

If you need to limit your pins or need to space out the operations, you could do something like below (untested example):

module trapezoidal_integration(
 input clk, rst_n, en,
 input [7:0] s_in,
 output reg [15:0] cum_out, // cumulative (increase width ???)
 output reg err_overflow) );
 reg [1:0] state;
 reg [16:0] tmp;
 always @(posedge clk) begin
 if (!rst_n) begin
 state <= 3'b001;
 tmp <= 17'h0_0000;
 cum_out <= 16'h0000;
 err_overflow <= 1'b0;
 end
 else begin
 case(state)
 2'b00 : begin
 tmp[7:0] <= s_in; // y1
 if (en) state <= 2'b01;
 end
 2'b01 : begin
 tmp[8:0] <= tmp[7:0] + s_in; // y1+y2
 state <= 2'b11; // gray code
 end
 2'b11 : begin
 tmp <= s_in * tmp[8:0]; // x*(y1+y2)
 state <= 2'b10; // gray code
 end
 2'b10 : begin
 {err_overflow,cum_out} <= cum_out + tmp[16:1]; // tmp[16:1] === tmp/2
 state <= 2'b00;
 end
 endcase
 end
 end
endmodule

Stack Exchange Network

Verilog implementation of trapezoidal integration method

2 Answers 2

Documentation

Naming

Set once

Layout

Simpler

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

Verilog implementation of trapezoidal integration method

2 Answers 2

Documentation

Naming

Set once

Layout

Simpler

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions