I am trying to create an output-layer classifier for a neural network implemented on an FPGA (in VHDL). The classifier should simply return the index of the largest std_logic_vector in the output_counters array. Currently the design works as in the code snippet below, but I would like it to be configurable by generics like the rest of the network. Instead of two output classes (0 and 1 in the case below), there could be up to 10 output classes, handled by the synthesis tools without the designer having to modify the RTL code. Ideally there is one flip-flop per output class, and the classifier writes a '1' to the winning class's flip-flop.
--- output layer basic classifier -----------------------------------------------------------------------------
class_0_winner <= '1' when output_counters(0) > output_counters(1) else '0';
class_1_winner <= '1' when output_counters(1) > output_counters(0) else '0';

process (clk_i)
begin
    if rising_edge(clk_i) then
        if (rstn_i = '0' or classifier_regs_rst = '1') then
            class_0 <= '0';
            class_1 <= '0';
        elsif (classifier_regs_en = '1') then
            class_0 <= class_0_winner;
            class_1 <= class_1_winner;
        end if;
    end if;
end process;

class_0_o <= class_0;
class_1_o <= class_1;
---------------------------------------------------------------------------------------------------------------
I created the process shown below to return the array index of the largest std_logic_vector. The array index can then be used to write to the corresponding output class flip-flop. However, if two std_logic_vectors are equal, the classifier reports the first one as the largest, which leads to an incorrect classification.
--- finding largest vector ------------------------------------------------------------------------------------
process (clk_100MHz)
    variable max_slv_temp   : std_logic_vector(3 downto 0);
    variable max_index_temp : std_logic_vector(3 downto 0);
begin
    if rising_edge(clk_100MHz) then
        max_slv_temp := "0000";
        for i in 0 to output_counters'length-1 loop
            if (rstn_i = '0') then
                max_slv_temp   := "0000";
                max_index_temp := "0000";
            elsif (output_counters(i) > max_slv_temp) then
                -- strictly greater: on a tie the earlier index is kept
                max_slv_temp   := output_counters(i);
                max_index_temp := std_logic_vector(to_unsigned(i, max_index_temp'length));
            end if;
        end loop;
        max_index <= max_index_temp;
    end if;
end process;
---------------------------------------------------------------------------------------------------------------
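For illustration, one way to make the tie case detectable is to also count how many counters equal the running maximum and raise a flag when the winner is not unique. This is only a sketch: tie_flag is a new signal I am introducing here, and the comparisons are assumed to behave as unsigned, as in the code above.

--- tie detection sketch (tie_flag is an assumed new signal) ----------------------------------------------------
process (clk_100MHz)
    variable max_slv_temp : std_logic_vector(3 downto 0);
    variable match_count  : natural range 0 to output_counters'length;
begin
    if rising_edge(clk_100MHz) then
        max_slv_temp := "0000";
        match_count  := 0;
        for i in 0 to output_counters'length-1 loop
            if (output_counters(i) > max_slv_temp) then
                -- new maximum found: restart the match count
                max_slv_temp := output_counters(i);
                match_count  := 1;
            elsif (output_counters(i) = max_slv_temp) then
                -- another counter equals the current maximum
                match_count := match_count + 1;
            end if;
        end loop;
        if match_count > 1 then
            tie_flag <= '1';    -- no unique winner this cycle
        else
            tie_flag <= '0';
        end if;
    end if;
end process;
---------------------------------------------------------------------------------------------------------------

The winner flip-flops could then be gated with tie_flag so that no class is asserted when there is no unique maximum.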
I could also implement an FSM and a single comparator that loops through the array, since time is not much of an issue here (100 cycles would be fine, for example) and this would save resources, but I would prefer a solution that does not involve that. How would you folks implement this classifier in a generic way? (By the way, I already have the generics working, and there is one for num_outputs.)
1 Answer
I would implement it as a clocked process. You said you do not want to use an FSM, so my implementation runs continuously. This is my code:
library ieee;
use ieee.std_logic_1164.all;

package find_largest_package is
    -- VHDL-2008: array of unconstrained std_logic_vector,
    -- constrained at instantiation by the generics
    type t_std_logic_vector_array is array (natural range <>) of std_logic_vector;
end package;

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std_unsigned.all;  -- VHDL-2008: unsigned ">" for std_logic_vector

library work;
use work.find_largest_package.all;

entity find_largest is
    generic (
        g_num_outputs : natural := 10;
        g_data_width  : natural := 16
    );
    port (
        clk_i             : in  std_logic;
        output_counters_i : in  t_std_logic_vector_array(g_num_outputs-1 downto 0)(g_data_width-1 downto 0);
        res_i             : in  std_logic;
        class_vector_o    : out std_logic_vector(g_num_outputs-1 downto 0)
    );
end entity find_largest;

architecture struct of find_largest is
    -- constrained ranges keep the counters as narrow as possible
    signal index         : natural range 0 to g_num_outputs;
    signal index_maximum : natural range 0 to g_num_outputs-1;
    signal maximum       : std_logic_vector(g_data_width-1 downto 0);
begin
    process(res_i, clk_i)
    begin
        if res_i = '1' then                      -- asynchronous reset
            class_vector_o <= (others => '0');
            maximum        <= (others => '0');
            index          <= 0;
            index_maximum  <= 0;
        elsif rising_edge(clk_i) then
            if index < g_num_outputs then
                -- compare one counter per clock cycle
                index <= index + 1;
                if output_counters_i(index) > maximum then
                    index_maximum <= index;
                    maximum       <= output_counters_i(index);
                end if;
            else
                -- scan finished: drive a one-hot winner vector, then restart
                class_vector_o                <= (others => '0');
                class_vector_o(index_maximum) <= '1';
                index   <= 0;
                maximum <= (others => '0');
            end if;
        end if;
    end process;
end architecture;
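A hypothetical instantiation for the ten-class, 4-bit case from the question could look like this (the connected signal names are placeholders, not part of the answer):

-- Hypothetical instantiation; the connected signal names are placeholders.
classifier_inst : entity work.find_largest
    generic map (
        g_num_outputs => 10,
        g_data_width  => 4
    )
    port map (
        clk_i             => clk_i,
        output_counters_i => output_counters,
        res_i             => classifier_regs_rst,
        class_vector_o    => class_winner_vector
    );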
- David777 (Jun 25, 2024 at 15:49): Thanks for the answer. Would you implement this with an FSM if you had the choice?
- Matthias Schweikart (Jun 25, 2024 at 16:13): If there is already a signal which can be used to start the FSM, then yes. With an FSM it is easier to verify, because there would be a ready signal that could be used to trigger checking the result. And of course, only running the comparison when it is really needed is better than running it all the time. (A sketch of such a handshake follows these comments.)
- David777 (Jun 25, 2024 at 16:23): Thanks, an FSM can be easily integrated, so I am considering implementing it instead.
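As a possible variation on the answer, here is a minimal sketch of the start/ready handshake described in these comments. The start_i and ready_o ports and the state signal are hypothetical additions, not part of the answer's entity.

-- Sketch only: the answer's scan wrapped in a start/ready handshake.
-- Assumed extra declarations:
--   type t_state is (idle, scanning);
--   signal state : t_state;
--   ports start_i : in std_logic; ready_o : out std_logic;
process(res_i, clk_i)
begin
    if res_i = '1' then
        state          <= idle;
        index          <= 0;
        maximum        <= (others => '0');
        ready_o        <= '0';
        class_vector_o <= (others => '0');
    elsif rising_edge(clk_i) then
        case state is
            when idle =>
                ready_o <= '0';
                if start_i = '1' then            -- begin a scan on request
                    index   <= 0;
                    maximum <= (others => '0');
                    state   <= scanning;
                end if;
            when scanning =>
                if index < g_num_outputs then
                    index <= index + 1;
                    if output_counters_i(index) > maximum then
                        index_maximum <= index;
                        maximum       <= output_counters_i(index);
                    end if;
                else
                    -- scan finished: publish the one-hot result
                    class_vector_o                <= (others => '0');
                    class_vector_o(index_maximum) <= '1';
                    ready_o <= '1';              -- result valid for one cycle
                    state   <= idle;
                end if;
        end case;
    end if;
end process;

A testbench can then wait for ready_o instead of guessing when the scan completes.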