0
\$\begingroup\$

I have tried a few ways using for loops to find the least number in array, but having a hard time in updating the pointer whenever a new least value is encountered. I am following the below textbook approach :

  • Start from the beginning of the array and sweep through the entire range
  • If first element is lesser than a number at any index, store this number in a temporary register and log the index
  • The logged index becomes the new index
  • Repeat this until the array is exhausted

I tried to implement this as follows :

module min_in_array (
 input logic clk,
 input logic rstn,
 input logic [3:0][7:0] data_in,
 output logic [7:0] data_min
 );
logic [7:0] data_min_temp;
logic [3:0] new_min_pointer;
integer i,j;
logic [3:0] new_min_pointer_ff;
always_ff @ (posedge clk or negedge rstn) begin
 if (~rstn) begin 
 data_min_temp <= 0; 
 new_min_pointer <= 0;
 end
 else begin
 for (j = 0 ; j < 4 ; j = j+1) begin
 for (i = new_min_pointer_ff; i < 4 ; i = i+1) begin
 if (data_in[i] < data_in[j])
 data_min_temp <= data_in[i] ; 
 new_min_pointer <= i;
 end
 end
 end
end
always_ff @ (posedge clk or negedge rstn) begin
 if(~rstn) begin
 data_min <= 'b0;
 new_min_pointer_ff <= 'b0;
 end
 else begin
 data_min <= data_min_temp;
 new_min_pointer_ff <= new_min_pointer; 
 end
end
endmodule

Where am I going wrong?

The comparison happens in combo logic and pointer update in sequential. Based on the testbench that I have written, this code stops executing after finding the first least element. For example, if array = {17,26,15,14} , the code gives output 15.

asked Jul 4, 2024 at 16:51
\$\endgroup\$
2
  • \$\begingroup\$ Are you trying to make something that gives an output in a single clock cycle? Or something that will take multiple cycles to perform a search? \$\endgroup\$ Commented Jul 4, 2024 at 21:06
  • \$\begingroup\$ I suggest you come up with a circuit/architecture diagram first and then write the Verilog code. To get you started here are some architectures to sort data in hardware: digitalsystemdesign.in/sorting-architectures \$\endgroup\$ Commented Jul 5, 2024 at 4:33

2 Answers 2

1
\$\begingroup\$

For loops in Verilog are unrolled - in other words every iteration of the loop corresponds to some piece of hardware.

As a result you cannot have a synthesisable loop with variable number of iterations. To do so would essentially require hardware to magically appear and disappear - some clock cycles you would have more calculations than others which means the amount of logic changes.

Instead you have to disable the hardware that is unused in some iterations. For example, instead of this (pseudo code):

for (i = someVar; i < 4 ; i = i+1) begin
 someOutput = someValue
end

You would need this:

for (i = 0; i < 4 ; i = i+1) begin
 if (i >= someVar) begin
 someOutput = someValue
 end
end

This approach works as there are is a fixed amount of hardware, but for iterations where the value of i less than someVar, the hardware is disabled - bypassed by a multiplexer controlled by the i >= someVar calculation.

To see this in action, consider this full code

wire [3:0] someVal [3:0];
wire [2:0] startValue;
wire [5:0] result;
always @ (posedge clock) begin
 result = 0; // Initial value (e.g. if startValue > 4)
 for (i = 0; i < 4; i = i + 1) begin
 if (i >= startValue) begin
 result = result + someVal[i];
 end
 end
end

This will infer something like the following (probably optimised):

Multiplexer chain with add operation at each step

Notice how the there is the same hardware for each loop (an adder), but it is enabled or disabled (bypassed) using a conditional multiplexer. Notice also that this unrolling is done within the always block, which means it all happens in the same clock cycle - there is only one register inferred at the end of the calculation.


I would advise for your calculation to first think about what sort of hardware you actually need.

  • Which calculations with which numbers do you want to perform.
  • Can you split it into smaller subsections (partial calculations)
  • How many clock cycles do you want the calculation to take - more cycles = less logic per cycle.

Once you have an idea of the hardware you require, then start thinking about how to implement it. Perhaps start with a fully unrolled design for a specific size of input. Then see what commonalities there are and whether they can be better achieved using loops.

If it were me, I would consider a binary tree structure - pair the inputs off, find the smallest from each pair. Pair the results off and find the smallest. Repeat until you have one number left. You can store the index found at each level to track which index the minimum value was found at. It's very easy to generate a binary tree structure using a for loop.


Remember, Verilog is used to describe hardware (HDL), the process is very much not like writing software.

answered Jul 4, 2024 at 21:28
\$\endgroup\$
0
\$\begingroup\$

In general, when you want to apply an operation across an array of data that is presented all at once (in parallel), you should be thinking in terms of "binary tree" instead of "iterative loop".

Here's some example code. There are two modules. The first recursively constructs enough layers in the tree to handle the specified number of inputs \$N\$. The second module creates the required number of 2-input, 1-output processing blocks for each layer. In my case, I needed to add up a bunch of numbers, but you can substitute any associative binary operation for the addition, including your "minimum" function.

One key advantage to this approach is that it allows you to pipeline the whole operation for faster clock speeds, while keeping the overall latency to \$\log_2(N)\$ stages.

/* adder_tree.sv */
/* Generate a pipelined tree of adders for N input values.
 * Note that this works correctly even if N is 1, generating a simple pipeline
 * register and nothing else.
 */
module adder_tree #(
 parameter N = 32, /* number of inputs */
 parameter DW = 33, /* input data width in bits */
 parameter RW = DW + $clog2(N) /* output data width in bits */
) (
 input clock,
 input clock_en,
 input signed [DW-1:0] data [N-1:0],
 output signed [RW-1:0] result
);
 localparam SW = (N>1) ? DW + 1 : DW; /* sum data width in bits */
 localparam SUMS = (N+1)/2; /* number of sums */
 wire signed [SW-1:0] res [SUMS - 1:0];
 add_pairs #(
 .N (N),
 .DW (DW),
 .RW (SW)
 ) add_pairs_inst (
 .clock (clock),
 .clock_en (clock_en),
 .data (data),
 .result (res)
 );
 if (SUMS > 1) begin
 adder_tree #(
 .N (SUMS),
 .DW (SW)
 ) adder_tree_inst (
 .clock (clock),
 .clock_en (clock_en),
 .data (res),
 .result (result)
 );
 end else begin
 assign result = res;
 end
endmodule :adder_tree
/* Add pairs of input values together.
 * If there's an odd number of inputs (including just 1), pass the last input
 * directly through as the last output.
 */
module add_pairs #(
 parameter N = 32,
 parameter DW = 18,
 parameter RW = DW + 1,
 parameter SUMS = (N+1)/2
) (
 input clock,
 input clock_en,
 input signed [DW-1:0] data [N - 1:0],
 output signed [RW-1:0] result [SUMS - 1:0]
);
 genvar i;
 reg [RW-1:0] res [SUMS - 1:0]
 for (i = 0; i < N/2; i++) begin
 always_ff @(posedge clock) if (clock_en) res[i] <= data[2*i+1] + data[2*i];
 end
 /* Create a register for the odd input
 */
 if (SUMS == N/2 + 1) begin
 always_ff @(posedge clock) if (clock_en) res[SUMS-1] <= data [N-1];
 end
 assign result = res;
endmodule :add_pairs
answered Jul 4, 2024 at 23:03
\$\endgroup\$

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.