I am trying to implement a TPU like SoC and seem to have a bug in one of my modules.
Here is the code for that module:
`timescale 1ns / 1ps
module acc #(
parameter data_size = 8,
parameter array_size = 3
) (
input clk,
input en,
input wire [0:(data_size * array_size)-1] from_array, // all lines of the systolic array
output wire [0:data_size-1] conv // The final result on a discrete convolution
);
// The shift register size is always ((array_size-1)*2 + 1)*d where array_size is the size of the kernel, and d the data size
localparam shift_reg_size = (((array_size - 1) * 2 + 1) * data_size);
// The size of cpt is proportionate to the shift register size
reg [0:$clog2((array_size - 1) * 2 + 1)-1] cpt = 0;
reg [0:shift_reg_size - 1] shift = 0;
// The last cell of the shift register is always the result of the convolution stage.
assign conv = shift[shift_reg_size - data_size : shift_reg_size - 1];
// Generate the shift adder logic
genvar i;
generate
for (i = 1; i < array_size; i = i + 1) begin : shift_gen
always @(negedge clk) begin : shift_gen_level
localparam j = ((i-1)*2+1);
if (en && cpt > j) begin
shift[j*data_size:(j+1)*data_size-1] <= shift[j*data_size:(j+1)*data_size-1] + from_array[i*data_size:(i+1)*data_size-1];
end
end
end
endgenerate
always @(posedge clk) begin
if (en) begin
shift <= {from_array[0:data_size-1],shift[0:shift_reg_size - data_size - 1]};
// Stop incrementing when have reached the top
cpt <= cpt == {($clog2((array_size - 1) * 2 + 1)-1){1'b1}} ? cpt : cpt + 1;
end
end
endmodule
Let me try to explain the general functionality of this module without going into too much detail, as it may help to understand the issue. The shift register will receive the first data_size bits from the from_array input on every posedge. And of course the value stored in the last shift register cell will be overwritten. The counter cpt
will also increment, this counter allows for some additional logic to be performed during the negedge. On the negedge if the cpt is large enough, the value currently stored at certain cells of the shift register will get incremented by incoming values from the from_array.
One last thing is that the last data_size bits of the shift register are directly connected (via the assign), to the conv output.
The multidriven net error occurs on the generated portion of the code, so the adder logic of the design.
To complete the picture, I must add that the simulation works correctly and gives the desired functionality,
One idea I had is that happens because of the fact that I do the assign to the conv as well as distinct operations on the shift register during posedge and negedge. But that does not make much sense as I have used a similar design pattern elsewhere and it worked like a charm!
I have also changed my logic to work on the posedge only. I also added a state machine to alternate between the shift_state and the add_state. However I still seem to get the multi-driven net between my two assignments to shift. I don not understand why, because clearly both of those assignments are differed by the state register. Could someone give me a clue on what am I doing wrong?
`timescale 1ns / 1ps
module acc #(
parameter data_size = 8,
parameter array_size = 3
) (
input clk,
input en,
input wire [0:(data_size * array_size)-1] from_array, // all lines of the systolic array
output wire [0:data_size-1] conv // The final result on a discrete convolution
);
// The shift register size is always ((array_size-1)*2 + 1)*d where array_size is the size of the kernel, and d the data size
localparam shift_reg_size = (((array_size - 1) * 2 + 1) * data_size);
// The size of cpt is proportionate to the shift register size
reg [0:$clog2(((array_size - 1) * 2 + 1)*2)-1] cpt = 0;
reg [0:shift_reg_size - 1] shift = 0;
// Memorise
reg [0:(data_size * array_size)-1] shift_mem = 0;
reg [0:data_size-1] conv_ = 0;
// The last cell of the shift register is always the result of the convolution stage.
assign conv = conv_;
// State machine
localparam add_state = 0;
localparam shift_state = 1;
reg state = shift_state;
// Generate the shift adder logic
genvar i;
generate
for (i = 1; i < array_size; i = i + 1) begin : shift_gen
always @(posedge clk) begin : shift_gen_level
localparam j = ((i-1)*2+1);
if (en && cpt/2 > j && state == add_state && state != shift_state) begin
shift[j*data_size:(j+1)*data_size-1] <= shift[j*data_size:(j+1)*data_size-1] + shift_mem[i*data_size:(i+1)*data_size-1];
end
end
end
endgenerate
always @(posedge clk) begin
if (en) begin
if (state == shift_state && state != add_state) begin
state <= add_state;
end
else if (state == add_state && state != shift_state) begin
state <= shift_state;
end
if (state == shift_state && state != add_state) begin
shift <= {shift_mem[0:data_size-1],shift[0:shift_reg_size - data_size - 1]};
end
// Stop incrementing when have reached the top
cpt <= cpt == {{1'b1},{($clog2(((array_size - 1) * 2 + 1)*2)-1){1'b0}}} ? cpt : cpt + 1;
shift_mem <= from_array;
conv_ <= shift[shift_reg_size - data_size : shift_reg_size - 1];
end
end
endmodule
I would appreciate if someone could tell me where my error comes from.
1 Answer 1
You make assignments to shift
from 2 always
blocks. That is considered a multidriven net by the tool (Vivado/Xilinx). It does not matter to the tool that you are making assignments on the opposite edges of the clock.
1st always
block:
always @(negedge clk) begin : shift_gen_level
...
shift[j*data_size:(j+1)*data_size-1] <= shift[j*data_size:(j+1)*data_size-1] + from_array[i*data_size:(i+1)*data_size-1];
2nd always
block:
always @(posedge clk) begin
...
shift <= {shift_mem[0:data_size-1],shift[0:shift_reg_size - data_size - 1]};
You should only make assignments to a signal from a single always
block. You need a new design approach.