Update: I never got Newton-Raphson to work. If you are facing similar issues, I would suggest trying either binary search or a digit-by-digit approximation instead.
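For anyone landing here, a minimal simulation-only sketch of the digit-by-digit idea follows. It is not the code from my design: the function name cbrt_q16 is hypothetical, and it assumes an unsigned input that is already in 16.16 format (as the testbench values below suggest), with the result also in 16.16. Under that assumption the root r must satisfy r*r*r <= num << 32, so each result bit can be tested from the top down.

```verilog
module cbrt_q16_sketch;
    // Hypothetical helper: floor of the cube root of an unsigned Q16.16
    // value, returned in Q16.16. Written for simulation, not for timing.
    function [31:0] cbrt_q16;
        input [31:0] num;
        reg [71:0] target;   // num scaled so that root^3 can be compared directly
        reg [71:0] trial;    // candidate root with one more bit set
        reg [71:0] cube;     // trial^3, needs at most 66 bits
        reg [31:0] root;
        integer    b;
        begin
            target = {40'd0, num} << 32;
            root   = 32'd0;
            // The largest possible root is below 2^22, so start at bit 21.
            for (b = 21; b >= 0; b = b - 1) begin
                trial = {40'd0, root} | (72'd1 << b);
                cube  = trial * trial * trial;
                if (cube <= target)
                    root = trial[31:0];   // keep the bit, still not too big
            end
            cbrt_q16 = root;
        end
    endfunction

    initial begin
        $display("%h", cbrt_q16(32'h001B0000)); // 27.0   -> 00030000 (3.0)
        $display("%h", cbrt_q16(32'h00400000)); // 64.0   -> 00040000 (4.0)
        $display("%h", cbrt_q16(32'h007D0000)); // 125.0  -> 00050000 (5.0)
        $display("%h", cbrt_q16(32'h03E80000)); // 1000.0 -> 000a0000 (10.0)
        $finish;
    end
endmodule
```

In hardware the loop would more likely become one bit per clock cycle (22 cycles), and signed inputs could be handled by taking the root of the magnitude and restoring the sign afterwards.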
I am trying to implement a cube rooter in Verilog which takes in a 32-bit signed/unsigned integer and outputs the answer in 16.16 format. I have been using the Newton-Raphson algorithm to calculate the values.
The Problem:
Initially all my output bits were stuck on 'X' (the Verilog "unknown" value). After adding an initial block they start at all 0, go to X, come back to 0, then go back to X.
I have also messed around with various LLMs hoping to get a fix or alternate implementations, but they all seem to give either all bits as 0 or X as the output.
Edit:
I have updated the code as per @toolic's advice, adding an intermediate 64-bit register to store the squared values. I have also renamed the reg 'x' to 'val'.
The output values are no longer x or 0, but seem to be progressing in a 167X pattern, regardless of the input integers in the testbench.
Edit- Updated Module Code:
module cube_rooter (
    input  wire               clk,
    input  wire               clk_en,
    input  wire signed [31:0] num_in,
    output reg  signed [31:0] root_out
);
    reg signed [31:0] val;
    reg signed [63:0] val_sq;
    reg signed [31:0] num;
    reg        [2:0]  iter;

    localparam signed [31:0] TWO      = 32'h00020000;
    localparam signed [31:0] THREE    = 32'h00030000;
    localparam        [2:0]  MAX_ITER = 3'd7;

    initial begin
        num      = 0;
        val      = 0;
        val_sq   = 0;
        iter     = 0;
        root_out = 0;
    end

    always @(posedge clk) begin
        if (clk_en) begin
            if (iter == 0) begin
                num  <= num_in;
                val  <= (num_in >>> 1) + (1 << 16);
                iter <= MAX_ITER;
            end
            else begin
                if (val != 0) begin
                    val_sq = val*val;
                    //val_sq <= (val_sq) >>> 16;
                    val <= ((TWO * val) + (num / val_sq)) / THREE;
                end
                iter <= iter - 1;
            end
        end
    end

    always @(posedge clk) begin
        if (clk_en && iter == 1) begin
            root_out <= val;
        end
    end
endmodule
TestBench Code:
module cube_rooter_tb;
    reg clk;
    reg clk_en;
    reg signed [31:0] num_in;
    wire signed [31:0] root_out;

    cube_rooter uut (
        .clk(clk),
        .clk_en(clk_en),
        .num_in(num_in),
        .root_out(root_out)
    );

    // Clock generation
    initial begin
        clk = 0;
        forever #5 clk = ~clk;
    end

    initial begin
        clk_en = 0;
        num_in = 0;
        #10 clk_en = 1; num_in = 32'h001B0000;
        #80 clk_en = 0;
        #20 clk_en = 1; num_in = 32'h00400000;
        #80 clk_en = 0;
        #20 clk_en = 1; num_in = 32'h007D0000;
        #80 clk_en = 0;
        #20 clk_en = 1; num_in = 32'h03E80000;
        #80 clk_en = 0;
        #100 $finish;
    end

    initial begin
        $monitor("Time=%d, num_in=%h, root_out=%h", $time, num_in, root_out);
    end
endmodule
Edit- New Timing Graph
Any help or advice would be greatly appreciated.
1 Answer
The cause of the unknown (X) is the following expression:
(num / ((x * x) >>> 16))
The simulator evaluates (x * x) as 0, resulting in a divide-by-0 situation. From IEEE Std 1800-2023 section 11.4.3 Arithmetic operators:
For the division or modulus operators, if the second operand is a zero, then the entire result value shall be x.
Your waveforms do not show the signal named x, but when I run the simulation, x becomes 32'h0021_0000 when num_in becomes 32'h0040_0000.
num and x are 32-bit signals, which means the exact product x * x can need up to 64 bits. The full-width result is:
32'h0021_0000 * 32'h0021_0000 = 64'h0000_0441_0000_0000
However, the full statement is:
x <= ((TWO * x) + (num / ((x * x) >>> 16))) / THREE;
All terms in the statement are 32-bit. Since the left-hand side of the statement is also 32 bits, the multiplication is context-determined and is carried out at 32 bits. Therefore, the 64-bit product is truncated, leaving the 32 LSBs, which are 32'h0000_0000. Since (x * x) is 0, the divide-by-0 results in the unknown value.
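The width effect can be reproduced in isolation. The following is a small sketch with a hypothetical module name (mul_width_demo) and hypothetical signal names (narrow, wide, quot); it is not part of the original design. It uses the same operand values as above:

```verilog
module mul_width_demo;
    reg signed [31:0] x;
    reg signed [31:0] num;
    reg signed [31:0] narrow;  // product assigned in a 32-bit context
    reg signed [63:0] wide;    // product assigned in a 64-bit context
    reg signed [31:0] quot;    // the division from the original statement

    initial begin
        x   = 32'h0021_0000;
        num = 32'h0040_0000;

        narrow = x * x;                   // only the low 32 bits survive
        wide   = x * x;                   // operands are extended to 64 bits first
        quot   = num / ((x * x) >>> 16);  // divisor is 0 at 32 bits, so result is x

        // Should print: narrow=00000000 wide=0000044100000000 quot=xxxxxxxx
        $display("narrow=%h wide=%h quot=%h", narrow, wide, quot);
        $finish;
    end
endmodule
```

The only difference between narrow and wide is the width of the assignment target, which is what sets the width at which the multiplication is evaluated.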
You need a different way to calculate this.
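As one possible direction, here is a hedged, simulation-only sketch of the same Newton-Raphson update with the fixed-point scaling written out. It is an illustration rather than a verified fix: the function name cbrt_newton and the 32-iteration count are assumptions, and it treats the input as unsigned 16.16. With val in 16.16, val*val is in 32.32, so num has to be shifted left by 32 before the division for the quotient to come out in 16.16. The constants 2 and 3 are then plain integers, not 16.16 values.

```verilog
module cbrt_newton_sketch;
    // Hypothetical helper: Newton-Raphson cube root, unsigned Q16.16 in and out.
    function [31:0] cbrt_newton;
        input [31:0] num;
        reg [63:0] val;     // current estimate, Q16.16
        reg [63:0] val_sq;  // val * val, Q32.32, kept at full 64-bit width
        reg [63:0] term;    // num / val^2, rescaled to Q16.16
        integer    i;
        begin
            val = (num >> 1) + 64'h1_0000;  // same starting guess as the question
            // Iteration count is an assumption; a guess this far from the
            // root shrinks by only about 2/3 per step at first.
            for (i = 0; i < 32; i = i + 1) begin
                val_sq = val * val;
                if (val_sq != 0) begin      // guard against divide-by-0
                    term = ({32'd0, num} << 32) / val_sq;
                    val  = (2 * val + term) / 3;
                end
            end
            cbrt_newton = val[31:0];
        end
    endfunction

    initial begin
        // Results should land near 3.0, 4.0, 5.0 and 10.0 in 16.16 format;
        // the truncating divisions may leave the last bit or so off.
        $display("%h", cbrt_newton(32'h001B0000));
        $display("%h", cbrt_newton(32'h00400000));
        $display("%h", cbrt_newton(32'h007D0000));
        $display("%h", cbrt_newton(32'h03E80000));
        $finish;
    end
endmodule
```

Note that a 64-bit divider is expensive in hardware, so a synthesizable version would probably look quite different.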
An unrelated observation is that x is not a good name for a signal because it is easily confused with the Verilog unknown value x. I suggest choosing a more meaningful signal name; perhaps that will also help others understand your code better, leading to a solution.
A good way to know for sure is to replace that line with something far safer. I'm not a Verilog wizard, but I'm guessing that x <= 32'hDEADBEEF; would fit the bill. If your answers are full of dead beef but no unknowns, then you've either found the problem line, or at least found something that's happening after the problem. – TimWescott, Feb 16, 2025 at 20:47

@toolic I have replaced the variable 'x' with 'val' and added a 64-bit reg which stores the squared value of 'val'. This has fixed the problem of the output bits all being x, and it generates values now, but they seem to be incorrect. – Its_RT, Feb 17, 2025 at 4:12

@TimWescott adding the x <= 32'hDEADBEEF; gave the final root_out output as 32'hDEADBEEF as well, confirming the problem. – Its_RT, Feb 17, 2025 at 4:15

Hey, welcome to debugging. If you beat your head against that brick wall long enough, eventually the wall will give way. And your head will feel so much better when you stop. – TimWescott, Feb 17, 2025 at 5:04