I am trying to start a few threads in SystemVerilog, as shown in the code below:
module tb;
  int done = 0;
  initial
  begin
    for(int i = 0; i < 3; i++)
    begin
      automatic int val = i;
      fork
        begin
          $display("+ThreadA %0d", val);
          done[val] = 1;
        end
      join_none
    end
    for(int i = 0; i < 3; i++)
    begin
      @(done[i] == 1);
    end
    $display("All threads done");
  end
endmodule
The output from that code:
# +ThreadA 2
# +ThreadA 1
# +ThreadA 0
Why is "ThreadA 2" displayed first, even though the loop should have started "ThreadA 0" before it? My expected output was:
# +ThreadA 0
# +ThreadA 1
# +ThreadA 2
join would stop the execution of the parent thread until all the child threads are done, right? Threads are needed in my case because the actual code is more complex; this was just a simpler example to present my question. – AlaBek, Oct 8, 2024 at 10:34
3 Answers
The fork/join_none construct schedules a thread to start in the current event region when the parent thread suspends or terminates. The way your loop executes, three child threads get scheduled before the parent thread suspends. Because of section 4.7 Nondeterminism in the IEEE 1800-2023 SystemVerilog LRM, those threads can start in any order. However, for debuggability, a particular version of a tool will never change the order you see.
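The scheduling behaviour can be observed with a small sketch. The module name tb_sched and the message strings are hypothetical, and wait fork is used here in place of the question's done flags simply to block until every child has finished. The parent's messages print before any child message, because the children do not start until the parent suspends; the relative order of the three child messages is left to the simulator.

```systemverilog
module tb_sched;  // hypothetical module name, for illustration only
  initial begin
    for (int i = 0; i < 3; i++) begin
      automatic int val = i;  // per-iteration copy captured by the child thread
      fork
        $display("child %0d started", val);
      join_none
      // Still running in the parent: no child has started yet.
      $display("parent scheduled child %0d", i);
    end
    $display("parent about to suspend");
    wait fork;  // parent suspends here; the children may now run in any order
    $display("all children finished");
  end
endmodule
```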
But even if you see a particular starting order for each thread, it's technically possible for the $display statements in the threads to execute in one order, and the assignment statements that follow them to execute in a different order. That same section of the LRM says events can be executed in any order between threads. The only guarantee is that the two statements inside the begin/end block execute in order within each thread. This becomes an issue if you try to execute your simulation on a multicore platform.
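Note that you want to use a level-sensitive wait (done[i] == 1); not the edge-sensitive @. The @ only resumes on a change in the value of the expression, so if done[i] is already set when the loop reaches it (for example, because the loop checks the bits in the opposite order to the one in which they get set), your loop hangs.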
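As a minimal sketch, here is the question's testbench with only the event control swapped for a level-sensitive wait. The module name tb_wait is hypothetical; everything else is taken from the posted code.

```systemverilog
module tb_wait;  // hypothetical name; otherwise the testbench from the question
  int done = 0;

  initial begin
    for (int i = 0; i < 3; i++) begin
      automatic int val = i;  // per-iteration copy captured by the child thread
      fork
        begin
          $display("+ThreadA %0d", val);
          done[val] = 1;  // mark this thread as finished
        end
      join_none
    end

    for (int i = 0; i < 3; i++) begin
      // Level-sensitive: falls through immediately if the bit is already set,
      // so the order in which the threads ran does not matter.
      wait (done[i] == 1);
    end
    $display("All threads done");
  end
endmodule
```

With this change "All threads done" is printed once all three bits are set, whatever order the threads ran in.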
Threads are by their essence asynchronous: you cannot predict when an individual thread begins, ends, or executes a particular statement.
In other words, the order of your output lines is nondeterministic.
I am not sure I understand the idea here, but "join_none" is used right after the end of the fork block. If I change the order, it would come after the for loop, which creates a compile error. – AlaBek, Oct 8, 2024 at 10:04
In your code the join_none command is used 3 times, because you placed it inside the for loop. The same goes for your fork command. – MarianD, Oct 8, 2024 at 10:12
Yes, the intention is that each iteration spawns a thread and that the main thread does not wait for it to end right after creating it, hence I used join_none. At the end, done is used to indicate that each thread has completed. – AlaBek, Oct 8, 2024 at 10:14
OK, so I shortened my answer to express only the basics of multitasking / multithreading. – MarianD, Oct 8, 2024 at 10:39
When I try to run the code you posted in your EDA Playground link, I get errors.
But when I copy your code into EDA Playground and run it on a different simulator (Synopsys), I get a different result (the result you expected):
+ThreadA 0
+ThreadA 1
+ThreadA 2
This demonstrates that the result is indeterminate. According to the IEEE Std 1800-2023, a simulator may implement the order in any way it sees fit. This is why we see different results with different simulators. You may even see different results with different versions of the same simulator.
In other words, when you write code like this, you cannot rely on a specific order of execution of the $display statements.
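If the log lines need to appear in index order, one option is to avoid printing from inside the threads at all. The sketch below is an illustration under assumptions, not part of the original code: the module name tb_ordered and the msg array are hypothetical. Each thread writes its message into its own slot, and the parent prints the slots in index order after wait fork returns, so the printed order no longer depends on how the simulator schedules the threads.

```systemverilog
module tb_ordered;  // hypothetical module name, for illustration only
  string msg [3];   // hypothetical buffer: one slot per thread

  initial begin
    for (int i = 0; i < 3; i++) begin
      automatic int val = i;  // per-iteration copy captured by the child thread
      fork
        // Each thread touches only its own slot, so execution order is irrelevant.
        msg[val] = $sformatf("+ThreadA %0d", val);
      join_none
    end

    wait fork;  // block until all three child threads have completed

    // Print from the parent, in index order 0, 1, 2.
    foreach (msg[k]) $display("%s", msg[k]);
  end
endmodule
```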