\$\begingroup\$

I am trying to implement the following functionality in SystemVerilog, but I can't think of an efficient way to do it:

[Image: two 6-bit input packets arriving one after the other]

Above is a picture of two 6-bit input packets that arrive one after the other (each triggered by a clock edge). Out of these I need to deliver three 4-bit output packets.

The output packets need to be sent one after the other. From the first input packet I need to send out one 4-bit output packet, store the last 2 bits of that packet, and concatenate them with the first 2 bits of the next packet. Finally, the last 4 bits are sent as one more 4-bit packet.


N.B.: This is an example. The number of input packets can go up to around 1000, and their width can change, which means the 'residue' (the left-over bits) would change too.

If anyone can guide me on how to approach this problem, I'll be really grateful.

Edit: I think I've confused everyone. The problem is quite complex, so I didn't include everything for the sake of simplicity. Here it goes:

The module I'm trying to make has higher input parallelism than output parallelism, which means that more data goes into the module than comes out of it. Data therefore accumulates inside the module with every transaction.

The input packets arrive continuously, and each input packet lasts one clock cycle. The requirement is to push out output packets (smaller than the input packets) continuously as well, each also lasting one clock cycle. The N.B. above refers to parameterization: I want to be able to choose arbitrary input and output parallelism. The 6-bit input and 4-bit output packets are just an illustration; it could equally be a 16-bit input and a 10-bit output.

This would clearly be done with a state machine, but I don't know how to concatenate the bits left over from previous packets with the incoming packets while maintaining a continuous flow of output packets every clock cycle. Hopefully this clears everything up.

Greg
asked Aug 9, 2019 at 16:17
\$\endgroup\$
  • How fast are the data clocks? What is the relationship between the clock that loads the input packets and the clock that updates the output packets? Are input packets coming in continuously? Does the packet size change during operation or is it fixed at design time? What is receiving the packets that are output by this block? Commented Aug 9, 2019 at 16:22
  • The system is synchronous (same clock for everything). Yes, input packets are coming in continuously. Packet size is fixed during operation. Commented Aug 9, 2019 at 16:27
  • Gonna need some more concrete details. Is this on the bit level? Are the packets an arbitrary number of bits in length? What are the input and output interfaces? If the output is narrower than the input, is the output clock fast enough to transfer one per cycle, or do you need to transition to a faster clock? Commented Aug 9, 2019 at 17:15
  • How can you take two packets in and write three packets out on the same clock? Your description does not make sense. Commented Aug 9, 2019 at 17:33
  • Please elaborate on the "its width can change" section. Apart from that it is a trivial problem. Commented Aug 9, 2019 at 20:11

1 Answer

\$\begingroup\$

What you need is a FIFO with different input and output bit widths. This can be achieved with an array, two index pointers, and a register that keeps track of the number of stored bits. The pointers wrap around. The bit width of the storage needs to be a common multiple of the input and output widths, so that each pointer lands exactly on the wrap-around point and no write or read ever straddles it.

Here is some SystemVerilog code to get you started. Not fully tested or optimized, and I omitted the logic for empty / full / error (overflow) for you to figure out.

always_ff @(posedge clk) begin
    if (!rst_n) begin
        in_idx <= '0; // input pointer
        out_idx <= '0; // output pointer
        bitcnt <= '0; // number of stored bits
        out_vld <= 1'b0;
    end
    else begin
        bitcnt <= next_bitcnt;
        if (in_vld) begin
            store[ in_idx +: 6] <= in;
            in_idx <= (in_idx + 6) % STORE_BITS;
        end
        out_vld <= (bitcnt >= 4 || in_vld);
        if (out_vld) begin
            out_idx <= (out_idx + 4) % STORE_BITS;
        end
    end
end

always_comb begin
    next_bitcnt = bitcnt;
    if (in_vld) begin
        next_bitcnt += 6;
    end
    if (bitcnt >= 4 || in_vld) begin
        next_bitcnt -= 4;
    end
end

assign out = store[ out_idx +: 4 ];
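
Since the question asks about parameterization, the snippet above could be wrapped in a module along the following lines. This is only a sketch and has not been simulated or synthesized: the module name `width_conv_fifo`, the port list, the declarations, and the parameters `IN_W`, `OUT_W` and `STORE_BITS` are assumptions added for illustration. It assumes `OUT_W <= IN_W` and still leaves out the empty/full/overflow logic.

```systemverilog
// Sketch only: a parameterised variant of the snippet above.
// Module name, ports, declarations and parameter names are assumptions
// made for illustration; empty/full/overflow handling is still left out.
module width_conv_fifo #(
    parameter int IN_W       = 6,   // input packet width
    parameter int OUT_W      = 4,   // output packet width, assumed <= IN_W
    parameter int STORE_BITS = 12   // a common multiple of IN_W and OUT_W
) (
    input  logic             clk,
    input  logic             rst_n,   // synchronous, active-low reset
    input  logic             in_vld,
    input  logic [IN_W-1:0]  in,
    output logic             out_vld,
    output logic [OUT_W-1:0] out
);
    logic [STORE_BITS-1:0]           store;
    logic [$clog2(STORE_BITS)-1:0]   in_idx, out_idx;      // wrap-around pointers
    logic [$clog2(STORE_BITS+1)-1:0] bitcnt, next_bitcnt;  // bits not yet scheduled for output
    logic                            pop;

    // Simulation-time sanity check on the parameters.
    initial begin
        assert (STORE_BITS % IN_W == 0 && STORE_BITS % OUT_W == 0)
            else $fatal(1, "STORE_BITS must be a common multiple of IN_W and OUT_W");
    end

    // Schedule an output packet when enough bits are stored or are arriving now.
    assign pop = (bitcnt >= OUT_W) || in_vld;

    always_comb begin
        next_bitcnt = bitcnt;
        if (in_vld) next_bitcnt += IN_W;
        if (pop)    next_bitcnt -= OUT_W;
    end

    always_ff @(posedge clk) begin
        if (!rst_n) begin
            in_idx  <= '0;
            out_idx <= '0;
            bitcnt  <= '0;
            out_vld <= 1'b0;
        end
        else begin
            bitcnt  <= next_bitcnt;
            out_vld <= pop;  // the packet appears on 'out' one cycle after it is scheduled
            if (in_vld) begin
                store[in_idx +: IN_W] <= in;
                in_idx <= (in_idx + IN_W) % STORE_BITS;
            end
            if (out_vld) begin
                out_idx <= (out_idx + OUT_W) % STORE_BITS;
            end
        end
    end

    assign out = store[out_idx +: OUT_W];
endmodule
```

With the defaults, two back-to-back 6-bit inputs A and B should come out LSB-first as `A[3:0]`, then `{B[1:0], A[5:4]}`, then `B[5:2]`. For the 16-bit in / 10-bit out case from the question, `STORE_BITS` could be set to 80, the least common multiple of the two widths. Because more bits enter than leave on every cycle with a valid input, a store of any fixed size will eventually overflow unless the input is stalled from time to time, which is where the omitted full/overflow logic comes in.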
answered Aug 20, 2019 at 20:04
\$\endgroup\$
  • +1. Also, if the buffer is very large, you might get better area/performance with a shift register that shifts by 4 bits each time, which avoids a big combinational mux. The 6-bit input could probably still use bit indexing, though the index increment would be different. Commented Aug 20, 2019 at 20:52
  • As I said, it is not optimized. Since I suspect this is a class assignment, I'm intentionally leaving certain things out. I tried to synthesize my code with Yosys (v0.3.0) on EDAplayground, which couldn't optimize store[ in_idx +: 6] <= in; (it generated logic for in_idx equal to 0,1,2,3,4,... even though only 0,6,12,... are reachable). I had to add a for-loop and an if-condition to get better synthesis results. Commented Aug 21, 2019 at 21:23
