LATER EDIT:
1. I've also investigated visually the Kintex7 device after implementation (i.e. interconnections, etc.) and everything looks OK - no connections that would indicate things would not be right (of course, I didn't go all the way back to the inputs, as I assume that if the 2-FIFO-based loopback works it would be the adder that is broken in some way).
Also, I came across this Xilinx Answer Record. I tried to apply the suggested fix, but didn't seem to work.
2. For the second approach (manual input on Spartan6), I realised that the problem was a faulty switch that would not properly set the input value unless care was taken. I have tested carefully again and I obtained the right value this time. However, I changed that design to instead just take the inputs from ROM/LUT and display a byte at a time, depending on the selection switches given as inputs. No clock is involved anymore. The problem with the initial Vivado-based project persists, which leads me to believe that it might be a synthesis/implementation/faulty hardware problem.
ORIGINAL MESSAGE:
I have a xillybus-based PCIe design that is meant to become more complex, but for now, going incrementally, it only reads pairs of 32-bit values from a 32-bit input FIFO connected to xillybus and adds them together, writing them to a 32-bit output FIFO connected to xillybus.
Now the problem I see is as follows:
NOTE 1: the output waveform of the behavioral simulation (RTL-level, pre-synthesis) returns correct values, it is the actual implementation on the board that seems to have one or more bits flipped (opposite to the expected value, which is the same as the output of the simulation)
NOTE 2: I, for example, use the values 0x3F800000
and 0x3F800000
(yes, twice the same value), which added bit by bit should amount to 0x7F000000
- will mention below what the (wrong) result was in each scenario:
- If I try a loopback approach (by reading from input FIFO and outputting to the output FIFO - so using 2 FIFOs -, NOT by using only one FIFO connected to both the input and output from/to xillybus) the values are returned as expected. I haven't extensively tested this, but it seems to be fine for all the values I've tested.
- If I try to run the design on a NetFPGA-1G-CML board (Kintex-7 XC7K325T-1FFG676 FPGA), I get the result
0x7E000100
(remember, the loopback test works) - If I try to run the equivalent design (adapted as to how the values are inputted - i.e. not through PCIe from the PC, but rather manually using switches, monopulse-generator-filtered push buttons and output on LEDs) on a Atlys board (Spartan-6 XC6SLX45-3CSG324), then the result is
0x6F00000
(closer, as there is only one bit wrong, but still not enough)
Note that I get no complaints as to timing constraint violations - and I've anyways tried many ways to home in on the problem (i.e. filter out some possibilities), but I didn't come to any conclusion. For the record, I even transformed the simple adder into a 2-level pipeline, to "ensure" that there is enough time for computing the result, but the result was the same (wrong).
I apparently can't run post-synthesis or post-implementation functional or timing simulation, as I get an annoying error saying
ERROR: [VRFC 10-716] formal port o of mode out cannot be associated with actual port pcie_perst_b_ls of mode in [...]
Note that I am running Vivado 2014.2 (which is a bit old, but would that really be the problem?)
Below the code for the 2 approaches (xillybus-based PCIe-driven I/O and simple physical I/O). Sorry I couldn't format it better:
Approach for the Kintex-7 PCIe-based I/O:
library IEEE; use IEEE.STD_LOGIC_1164.ALL; use IEEE.NUMERIC_STD.ALL; entity fsm is port ( data_in : in std_logic_vector(31 downto 0); data_out : out std_logic_vector(31 downto 0); rd_en : out std_logic; in_fifo_empty : in std_logic; wr_en : out std_logic; out_fifo_full : in std_logic; clk, rst : in std_logic ); end fsm; architecture Behavioral of fsm is component core port ( operand_A_ieee, operand_B_ieee : in std_logic_vector(31 downto 0); result_ieee : out std_logic_vector(31 downto 0); clk, rst : in std_logic ); end component; -- pipeline_depth and pipeline_wr_status are used (only) for pipelined cores to assert wr_en when needed -- ('1' added to the MSB of pipeline_wr_status when the second 32-bit operand is read and therefore the -- core processing starts with valid data, so that it signals when a valid result reached the end of the core) --constant pipeline_depth : integer := 10; --signal pipeline_wr_status : std_logic_vector(pipeline_depth - 1 downto 0) := (others => '0'); type state_type is ( start, readA, waitB, addAB ); signal state, next_state: state_type; signal operand_A_ieee : std_logic_vector(31 downto 0) := (others => '0'); signal result_ieee : std_logic_vector(31 downto 0); begin core_inst: core port map ( operand_A_ieee => operand_A_ieee, operand_B_ieee => data_in, result_ieee => data_out, clk => clk, rst => rst ); -- The loopback test (remove core_inst above) works as expected - in the out FIFO the value read from the in FIFO is saved --data_out <= data_in; SL: process (clk, rst, state, next_state, data_in)--, pipeline_wr_status) begin if rising_edge(clk) then state <= next_state; if state = readA then operand_A_ieee <= data_in; end if; -- needed if pipelined core --if next_state = addAB then --pipeline_wr_status <= "1" & pipeline_wr_status(pipeline_depth-1 downto 1); --else --pipeline_wr_status <= "0" & pipeline_wr_status(pipeline_depth-1 downto 1); --end if; end if; end process; -- wr_en flag has beem moved out of the case/process below, for simplicity wr_en <= '1' when state = addAB else '0'; --wr_en <= pipeline_wr_status(0); -- TODO: add rst signal as input to the state machine CL: process(rst, state, in_fifo_empty, out_fifo_full) begin case (state) is when start => if rst = '1' then next_state <= start; rd_en <= '0'; else if in_fifo_empty = '1' then next_state <= start; rd_en <= '0'; else next_state <= readA; rd_en <= '1'; end if; end if; when readA => if rst = '1' then next_state <= start; rd_en <= '0'; else if in_fifo_empty = '1' then next_state <= waitB; rd_en <= '0'; else next_state <= addAB; rd_en <= '1'; end if; end if; when waitB => if rst = '1' then next_state <= start; rd_en <= '0'; else if in_fifo_empty = '1' then next_state <= waitB; rd_en <= '0'; else if out_fifo_full = '1' then next_state <= waitB; rd_en <= '0'; else next_state <= addAB; rd_en <= '1'; end if; end if; end if; when addAB => -- aka readB (read of B operator happens here) if rst = '1' then next_state <= start; rd_en <= '0'; else if in_fifo_empty = '1' then next_state <= start; rd_en <= '0'; else next_state <= readA; rd_en <= '1'; end if; end if; when others => next_state <= start; rd_en <= '0'; end case; end process; end Behavioral; library IEEE; use IEEE.STD_LOGIC_1164.ALL; use IEEE.NUMERIC_STD.ALL; entity core is port ( operand_A_ieee, operand_B_ieee : in std_logic_vector(31 downto 0); result_ieee : out std_logic_vector(31 downto 0); clk, rst : in std_logic ); end core; architecture Behavioral of core is component adder port ( A, B: in std_logic_vector(31 downto 0); R: out std_logic_vector(31 downto 0) ); end component; begin addition: adder port map (operand_A_ieee, operand_B_ieee, result_ieee); end Behavioral; library IEEE; use IEEE.STD_LOGIC_1164.ALL; use IEEE.NUMERIC_STD.ALL; entity adder is port ( A, B: in std_logic_vector(31 downto 0); R: out std_logic_vector(31 downto 0) ); end adder; architecture Behavioral of adder is begin R <= std_logic_vector(unsigned(A) + unsigned(B)); end Behavioral;
[SIMPLIFIED COMPARED TO INITIAL, BUT NOT RELEVANT ANYMORE - SEE LATER EDIT ABOVE] Approach for the Spartan-6 physical (push buttons, switches, LEDs) I/O:
library IEEE; use IEEE.STD_LOGIC_1164.ALL; use IEEE.NUMERIC_STD.ALL; entity top is port ( sw : in std_logic_vector(1 downto 0); led : out std_logic_vector(7 downto 0) ); end top; architecture Behavioral of top is signal a : std_logic_vector(31 downto 0) := x"3f800000"; signal b : std_logic_vector(31 downto 0) := x"3f800000"; signal r : std_logic_vector(31 downto 0); begin r <= std_logic_vector(unsigned(a) + unsigned(b)); led <= r(8 * (to_integer(unsigned(sw)) + 1) - 1 downto 8 * to_integer(unsigned(sw))); end Behavioral; # onBoard SWITCHES NET "sw<0>" LOC = "A10"; # Bank = 0, Pin name = IO_L37N_GCLK12, Sch name = SW0 NET "sw<1>" LOC = "D14"; # Bank = 0, Pin name = IO_L65P_SCP3, Sch name = SW1 # onBoard Leds NET "led<0>" LOC = "U18"; # Bank = 1, Pin name = IO_L52N_M1DQ15, Sch name = LD0 NET "led<1>" LOC = "M14"; # Bank = 1, Pin name = IO_L53P, Sch name = LD1 NET "led<2>" LOC = "N14"; # Bank = 1, Pin name = IO_L53N_VREF, Sch name = LD2 NET "led<3>" LOC = "L14"; # Bank = 1, Pin name = IO_L61P, Sch name = LD3 NET "led<4>" LOC = "M13"; # Bank = 1, Pin name = IO_L61N, Sch name = LD4 NET "led<5>" LOC = "D4"; # Bank = 0, Pin name = IO_L1P_HSWAPEN_0, Sch name = HSWAP/LD5 NET "led<6>" LOC = "P16"; # Bank = 1, Pin name = IO_L74N_DOUT_BUSY_1, Sch name = LD6 NET "led<7>" LOC = "N12"; # Bank = 2, Pin name = IO_L13P_M1_2, Sch name = M1/LD7
-
\$\begingroup\$ Please update to the newest Vivado version. Please also provide a proper testbench, so that other user can reproduce your problem. \$\endgroup\$Martin Zabel– Martin Zabel2016年03月19日 09:24:14 +00:00Commented Mar 19, 2016 at 9:24
2 Answers 2
I can't say for sure this will help, but I find the "pipeline" process in the "adder" entity to be unusual. I would use an elsif on the rising_edge of clock and remove A and B from the sensitivity list.
Also, using the rising_edge of step may not be a good idea since it is not a clock signal. An alternative solution would be to do something line this:
process(clk)
begin
if rising_edge(clk) then
step_r <= step;
if step = '1' and step_r='0' then -- Finds rising edge of step
<Logic>
end if;
end if;
end process;
-
\$\begingroup\$ NOTE: I have updated the original message because I realised that for the Spartan6 design the problem was only a faulty input switch that would work only under specific physical circumstances. That part is now solved, the first one - being the important one - still remains. \$\endgroup\$BogdanSorlea– BogdanSorlea2016年03月18日 19:45:20 +00:00Commented Mar 18, 2016 at 19:45
-
\$\begingroup\$ For the record: the pipeline process's purpose was to just break up that simple addition into two stages, just in case there could have been a timing issue (result not propagating in a Carry Propagate Adder / Ripple Carry Adder in one clock cycle - that was just to eliminate this scenario). Now that process is removed and that Spartan6 design simplified and working. \$\endgroup\$BogdanSorlea– BogdanSorlea2016年03月18日 19:49:58 +00:00Commented Mar 18, 2016 at 19:49
-
\$\begingroup\$ While I agree with the elsif on the rising_edge of clock, removing A and B from the sensitivity list would have not made sense, as the value modified in the process depends on the changes to A and B, as based on their values the process-modified value changes. \$\endgroup\$BogdanSorlea– BogdanSorlea2016年03月18日 19:51:12 +00:00Commented Mar 18, 2016 at 19:51
-
\$\begingroup\$ @BogdanSorlea generally speaking if you have a synchronous process the only things that need to be in the sensitivity list are the clock and asynchronous resets. The list is only used during simulation (for most tools, including Vivado) to let the simulator know during what events the process needs to be executed, which allows for potential decreases in simulation time. Leaving A and B in the sensitivity list will not change results but is unnecessary because the only event the simulator cares about is reset and clock edges. \$\endgroup\$ks0ze– ks0ze2016年03月20日 16:04:40 +00:00Commented Mar 20, 2016 at 16:04
-
\$\begingroup\$ Are you using a first word fall through (FWFT) FIFO? If not, operand_A_ieee looks like it will have invalid data during the first addition since the FIFO output would not be valid until one clock after rd_en = 1. Also, I would recommend clocking the addition otherwise it will negatively impact timing. And I would try not using a FWFT FIFO with the "valid" output enabled at least for simulation so you can see when your data_in is valid. \$\endgroup\$ks0ze– ks0ze2016年03月20日 16:18:29 +00:00Commented Mar 20, 2016 at 16:18
I have figured out in the meantime what the problem was. Basically, it has to do with byte-wise endianness - i.e. xillybus will present the bytes in a 32-bit value in the reverse order as they were written to the device file (and flip again in the same manner when writing from the core to the host).
For more details, see: https://forums.xilinx.com/t5/Implementation/Implemented-simple-binary-adder-returns-incorrect-result/m-p/689491/highlight/false#M15227