How to design interfaces for memory hungry circuits

Question 1

I'm new to hardware design and one thing I'm struggling with is how to structure the communication between circuits (components). In VHDL you use the port map keyword on entities, which works well for circuits like simple FSMs, adders, small registers, etc. But what if you want to transmit more sophisticated data, like structs or arrays whose bounds are not static? For example, suppose I'm writing a circuit for sorting the integer elements of an array. As there can be millions of integers in the array, I cannot have one wire for every bit of every integer. So I need to create a serial interface for transmitting the array elements into and out of the sort circuit. How do I design that? It has to have some control signals so that it can, for example, tell the VHDL code using the sort circuit when the sort is done.

Or maybe this line of thinking is wrong and the sort circuit should operate on data stored in an external memory circuit instead?

In software this would be simple and you would just pass a pointer to the array and an integer denoting the size to a sort function. But I can't come up with what "the equivalent" VHDL entity definition would be.

Question 2

Not sure what the problem is. You just transmit data in chunks. One clock, one piece of data.

Question 3

For example, how large is the chunk? If it is larger than the size of an integer then you would need a method for packing and unpacking it, wouldn't you?

Question 4

This is a question of trade-offs. The bigger the chunk, the faster the transfer, but the more hardware is required.
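A minimal sketch of what "one clock, one piece of data" could look like as an entity, with the chunk size left as a generic so the trade-off can be tuned. The valid/ready handshake and every name here (`stream_stage`, `DATA_WIDTH`, `in_valid`, `in_ready`, and so on) are assumptions for illustration, not something prescribed in this thread. The architecture is just a single register stage that forwards the stream, to show how the handshake behaves:

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical streaming stage: one word moves per clock whenever
-- valid and ready are both high on the same rising edge.
entity stream_stage is
  generic (
    DATA_WIDTH : positive := 32  -- the "chunk" size, chosen per design
  );
  port (
    clk       : in  std_logic;
    rst       : in  std_logic;   -- synchronous, active high
    -- input side
    in_data   : in  std_logic_vector(DATA_WIDTH - 1 downto 0);
    in_valid  : in  std_logic;   -- producer has a word available
    in_ready  : out std_logic;   -- this stage can accept a word
    -- output side
    out_data  : out std_logic_vector(DATA_WIDTH - 1 downto 0);
    out_valid : out std_logic;   -- this stage holds a word
    out_ready : in  std_logic    -- consumer can accept a word
  );
end entity stream_stage;

architecture rtl of stream_stage is
  signal full    : std_logic := '0';  -- '1' while a word is stored
  signal ready_i : std_logic;
begin
  -- A new word can be accepted if the register is empty, or if the
  -- stored word is being taken by the consumer in this cycle.
  ready_i   <= (not full) or out_ready;
  in_ready  <= ready_i;
  out_valid <= full;

  process (clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then
        full <= '0';
      elsif ready_i = '1' then
        full <= in_valid;          -- occupied only if a word arrived
        if in_valid = '1' then
          out_data <= in_data;     -- capture the incoming word
        end if;
      end if;
      -- otherwise: register is full and the consumer is stalled, so hold
    end if;
  end process;
end architecture rtl;
```

If the chunk is wider than one integer, the packing and unpacking would happen on either side of such a port; an end-of-array marker (a "last" flag next to the data) would be one way to signal where the array stops, again as an assumption rather than a fixed rule.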

Question 5

Asking for a VHDL equivalent of a pointer is like asking for a C equivalent of a PLL. It just doesn't exist and you have to solve the problem a different way.

Question 6

For sorting numbers you will need memory. Memories have well-established interfaces, which you can look at to get ideas.

Question 7

You have an "external" memory interface either way.

The array in VHDL is referenced by an index, and you pass the index you want to read or write through a signal, and pass data back and forth through a signal as well.
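As a rough sketch of that idea: an array signal indexed by an address, with data going in and out through ports. The entity name, generics, and port names are made up for illustration. The read is registered (data appears the cycle after the address is presented), which is the style synthesis tools commonly recognise as a RAM, though whether it actually ends up in block RAM depends on the tool and the target device:

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical single-port memory with a registered (synchronous) read.
entity simple_ram is
  generic (
    ADDR_WIDTH : positive := 10;  -- 2**10 = 1024 words
    DATA_WIDTH : positive := 32
  );
  port (
    clk   : in  std_logic;
    we    : in  std_logic;        -- write enable
    addr  : in  unsigned(ADDR_WIDTH - 1 downto 0);
    wdata : in  std_logic_vector(DATA_WIDTH - 1 downto 0);
    rdata : out std_logic_vector(DATA_WIDTH - 1 downto 0)
  );
end entity simple_ram;

architecture rtl of simple_ram is
  type mem_t is array (0 to 2**ADDR_WIDTH - 1)
    of std_logic_vector(DATA_WIDTH - 1 downto 0);
  signal mem : mem_t;
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if we = '1' then
        mem(to_integer(addr)) <= wdata;  -- the index arrives as a signal
      end if;
      -- Registered read: on a write to the same address this returns
      -- the old contents, because signal updates take effect after
      -- the process has run.
      rdata <= mem(to_integer(addr));
    end if;
  end process;
end architecture rtl;
```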

If the FPGA you are targeting has an embedded RAM component and the access patterns you use can be mapped on that, synthesis will already try to use block RAM, because that takes up significantly fewer resources. From the point of view of the logic element blocks in the fabric, that is an external interface with an address bus and data buses in both directions.

If you make this external memory interface explicit in the code, you can then decide whether you want to connect internal block RAM, or an external RAM -- you'd then pass the address and data lines through a port.
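One possible shape for such an entity is sketched below: the closest thing to "pointer plus size" is a base address and an element count, handed over together with a start pulse, while the array itself stays in whatever memory is wired to the `mem_*` ports. All names and widths are hypothetical, the memory is assumed to be a simple single-port one, and only the interface is shown (the sorting state machine itself is left out):

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Interface-only sketch of a sorter that works on data in a memory
-- reached through its ports. No architecture is given here.
entity sorter is
  generic (
    ADDR_WIDTH : positive := 20;
    DATA_WIDTH : positive := 32
  );
  port (
    clk       : in  std_logic;
    rst       : in  std_logic;
    -- control: roughly the "function call"
    start     : in  std_logic;                          -- pulse to begin
    base_addr : in  unsigned(ADDR_WIDTH - 1 downto 0);  -- the "pointer"
    num_items : in  unsigned(ADDR_WIDTH - 1 downto 0);  -- the "size"
    busy      : out std_logic;                          -- high while sorting
    done      : out std_logic;                          -- pulse when finished
    -- memory port: connect to block RAM or to an external memory
    -- controller, decided where this entity is instantiated
    mem_addr  : out unsigned(ADDR_WIDTH - 1 downto 0);
    mem_we    : out std_logic;
    mem_wdata : out std_logic_vector(DATA_WIDTH - 1 downto 0);
    mem_rdata : in  std_logic_vector(DATA_WIDTH - 1 downto 0)
  );
end entity sorter;
```

A memory with variable latency (DDR behind a controller, for example) would likely need extra request/acknowledge signals on the memory port; that part is left open here.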

For larger applications, you'd use an FPGA with an integrated DDR controller, and your design's memory access port would be connected internally to this controller, which then in turn generates the necessary signals to drive the external RAM chips, which (with mid-range to high-end FPGAs) can be on normal DIMMs.

Question 8

Do you have any examples or links to guidelines so I can read more about this? Suppose I have multiple data processing units (e.g. multiple sort units), wouldn't it increase latency and possibly cause contention if they only have indirect access to the data via a memory interface? I feel like I'm reinventing the wheel here because there must be some "standard patterns" I can use for my circuit designs.

Question 9

@BjörnLindqvist I don't think there is a standard pattern. You do the thing that works for you. You probably care about performance more than you care about reuse because you only have a limited number of gates.

Question 10

Memory access has a latency-vs-speed-vs-size tradeoff. Combinatorial memory gives results in the same cycle, but it consumes logic space (so it's limited to a few hundred bytes) and reduces the maximum frequency. Embedded block RAM gives results in the next cycle, and you have a few kilobytes to a megabyte of it. DDR has megabytes to gigabytes, but a command to the active row has a few cycles of latency, and opening and closing rows adds more. A typical pattern would be external DDR with a cache in block RAM (possibly a per-unit L1 and a shared L2).

Simon Richter · Answer 1 · 2022-06-13 18:41:11Z
