Different Adder Implementations

Question 1

I'm putting together an ALU that I want to synthesize on an FPGA. The carry-look-ahead adder is the one many choose to use as opposed to the ripple-carry adder. However, a thought crossed my mind. The ripple-carry adders I have put together before simply have a series of one-bit full adders connected to each other. My thought is, what if I were to design a 4-bit full adder? I'm not talking about an adder made up of four one-bit full adders. I'm talking about a single component with 9 inputs (x3,x2,x1,x0,y3,y2,y1,y0,cin). I'm aware this would have 512 possible states (2^9, for 9 inputs).

What I'm wondering is:

1. There is obviously going to be a massive number of gates used. Is it worth it?
2. If I were implementing all my components using NAND gates with a certain delay each, how much of a speed improvement would I see in a 32-bit adder using a) 4-bit full adders, b) a CLA adder, c) 1-bit full adders?
3. Is there some other implementation of an adder I'm not aware of?
4. Although an adder is a very menial part of an ALU, what do most digital designers actually go for? Or do they simply use assign Sum = X+Y+cin;
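
To make the 9-input idea concrete, the sketch below enumerates the full truth table of a single-block 4-bit adder. Python is used here purely as a modeling language, not as HDL, and the name `four_bit_adder_table` is made up for illustration. It confirms the 512 rows and counts how many rows set each output bit, which is the number of product terms a canonical (unminimized) two-level sum-of-products form would need.

```python
def four_bit_adder_table():
    """Map each 9-bit input (x, y, cin) to the 5-bit output (cout, s3..s0)."""
    table = {}
    for x in range(16):
        for y in range(16):
            for cin in range(2):
                table[(x, y, cin)] = x + y + cin  # 0..31 fits in 5 bits
    return table

table = four_bit_adder_table()
rows = len(table)

# For each output bit (s0, s1, s2, s3, cout), count the input rows where it is 1.
# This is the minterm count of the canonical sum of products, before minimization.
minterms = [sum((out >> bit) & 1 for out in table.values()) for bit in range(5)]

print(rows)
print(minterms)
```

Each output turns out to be 1 for exactly half of the rows, so before minimization every output would need hundreds of nine-input product terms feeding one very wide OR. Logic minimization reduces that count, but this sketch does not attempt it.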

Question 2

You want to count states as 2^9 not 9^2. That's 512.

Question 3

@DarenW you're right, not sure what I was thinking...

Question 4

Interesting. I would suspect your custom 4-bit slice would end up resembling 4 one-bit full adders with a carry look-ahead circuit, but it might not. Here's the thing: it would certainly not need to be any more complex than that. However, the question of whether you could optimize over 4x full adders + 4-bit CLA is an interesting one.

Question 5

@JustJeff The main driving reason for this implementation is that in a four-bit full adder, the signals would only need to pass through two levels of gates instead of the 8 or so levels necessary for 4 one-bit full adders. It would be a 4x speedup. And I'm sure a 4-bit full adder would outperform a 4-bit CLA adder. Again, it is only two levels.
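
The 4x figure comes from a simple count of gate levels along the carry path. The sketch below does that arithmetic under an idealized unit-delay model: every gate costs one delay regardless of fan-in, each 1-bit full adder adds two levels to the carry path, and a flattened two-level block also adds two. These are assumptions for illustration only. A real 9-input function needs very wide gates, or trees of narrower ones, which this model ignores. The function name is hypothetical.

```python
def carry_path_levels(width, block_bits, levels_per_block=2):
    """Gate levels on the carry chain when blocks of `block_bits` are rippled."""
    blocks = -(-width // block_bits)  # ceiling division
    return blocks * levels_per_block

# A single 4-bit slice: four 1-bit full adders versus one flattened block.
slice_ripple = carry_path_levels(4, 1)
slice_flat = carry_path_levels(4, 4)

# The 32-bit case from the question.
ripple_1bit = carry_path_levels(32, 1)   # 32 one-bit full adders
ripple_4bit = carry_path_levels(32, 4)   # 8 flattened 4-bit blocks
speedup = ripple_1bit / ripple_4bit

print(slice_ripple, slice_flat, ripple_1bit, ripple_4bit, speedup)
```

Under these assumptions the ratio stays at 4 for any width that is a multiple of 4. Whether a real device gets anywhere near that depends on how the wide gates are actually built.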

Question 6

Related: Wide survey of different hardware adders, multipliers, dividers (design)

Question 7

To answer #4, at least in code targeted for synthesis, an adder will usually be coded as assign sum = x + y. This leaves the choice of how to implement the adder up to the synthesis tool. There is a cost/performance tradeoff. Absent tight performance requirements, the tool will implement a ripple carry adder, as that has the lowest cost. If there are more aggressive performance requirements, the tool will implement a more sophisticated structure, at some added cost. Another possibility for FPGA synthesis is that the adder will be mapped to a special-purpose DSP component, if available in the target device.

When maximum performance is desired, the logic will be designed by hand rather than implemented with a synthesis tool. In this case, in addition to a high-level reference model with the form sum = x + y, there would also be a lower-level description describing the individual gates or transistors (this might be done in an HDL, or in a schematic tool). This "maximum performance" scenario would almost certainly be an ASIC implementation rather than an FPGA.

To (not really) answer #3, for more than you ever wanted to know about adder architectures, I found this thesis linked from a thread on edaboard: http://www.iis.ee.ethz.ch/~zimmi/publications/adder_arch.pdf.

To answer #1 and #2, the best way to figure things like this out is to do some experiments; anything else is speculation. What you will get for the "4-bit full adder" design depends on how you code it. If you code it as an adder, the tool will likely do what it would have done anyway, although it may fail to figure out that the 4-bit adders go together to form a larger adder. If you code it as a logic function, you may get something faster than the ripple-carry implementation, but you may not.
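
One cheap experiment before touching a synthesis tool is a functional check: model the 32-bit adder as eight chained 4-bit blocks and compare it against the high-level reference `x + y + cin`. The Python sketch below does this. `add4` and `add32_from_blocks` are hypothetical names, and the model says nothing about timing or area, which only synthesis and timing reports can show.

```python
import random

def add4(x, y, cin):
    """One 4-bit block: returns (4-bit sum, carry out)."""
    total = x + y + cin
    return total & 0xF, total >> 4

def add32_from_blocks(x, y, cin=0):
    """32-bit adder built by rippling the carry through eight 4-bit blocks."""
    result, carry = 0, cin
    for i in range(8):
        s, carry = add4((x >> (4 * i)) & 0xF, (y >> (4 * i)) & 0xF, carry)
        result |= s << (4 * i)
    return result, carry

# Randomized comparison against the reference model sum = x + y + cin.
random.seed(0)
for _ in range(10_000):
    x, y, cin = random.getrandbits(32), random.getrandbits(32), random.getrandbits(1)
    ref = x + y + cin
    assert add32_from_blocks(x, y, cin) == (ref & 0xFFFFFFFF, ref >> 32)
```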

Question 8

Thank you so much. That was an excellent article you pulled. I guess I might as well forget about the 4-bit full adder. I kind of knew it was too far of a stretch. Still would be nice to have a Verilog file for such a thing. Anyway, thanks for the help...

Question 9

I'd suggest replacing the phrase "If there's a minimal performance constraint..." with "Absent tight performance requirements...". Otherwise, it's unclear at first reading whether the constraints are minimal, or whether the constraints are specifying a minimum level of performance.

Question 10

What you don't want to do is implement the adder yourself out of gates. Use the features that VHDL/Verilog give you for adding numbers. Any adder you create is going to be larger and slower than anything the VHDL/Verilog compiler can do.

The reason for this is simple: FPGAs have dedicated logic in them for doing adders with a minimum amount of logic and as fast as possible. This includes special carry chain logic and routing. If you let the compiler utilize these, then you'll benefit from the stuff that's already in the FPGA. In other words, just do Sum=X+Y+cin, where X and Y are multi-bit numbers.
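
As a rough picture of what such a carry chain does, here is a simplified bit-level model in Python. It assumes a common arrangement in which a LUT computes the propagate signal p = x XOR y for each bit, a dedicated multiplexer passes the incoming carry along when p is 1 and otherwise forwards x, and a dedicated XOR forms the sum bit. Exact primitives differ between device families, so treat this as an illustration rather than a description of any particular part. The function name is made up.

```python
def carry_chain_add(x, y, cin, width):
    """Model of a mux-based carry chain; returns (sum, carry out)."""
    carry, total = cin, 0
    for i in range(width):
        xi, yi = (x >> i) & 1, (y >> i) & 1
        p = xi ^ yi                  # propagate, computed in the LUT
        total |= (p ^ carry) << i    # sum bit from the dedicated XOR
        carry = carry if p else xi   # carry mux: pass carry in, else x (equal to y here)
    return total, carry

# Exhaustive check of the model against plain integer addition for 4 bits.
for x in range(16):
    for y in range(16):
        for cin in range(2):
            ref = x + y + cin
            assert carry_chain_add(x, y, cin, 4) == (ref & 0xF, ref >> 4)
```

In this model the only per-bit work on the carry path is the multiplexer, which is consistent with the point above that dedicated carry logic is hard to beat with a design built purely from general-purpose logic.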

Question 11

How do I know this for sure? Ultimately this comes down to the number of LUTs used. Do you know for a fact that the Xilinx or Altera synthesizers do a better job than a true structural description? I'm sure they do an excellent job, but this is more of a curiosity for me.

Question 12

@seljuq70 There are many ways to analyze this, but ultimately it comes down to either "blindly" trusting what the compiler is doing, doing some trial and error yourself, or analyzing the compiler output to see what it's doing (à la Xilinx FPGA Editor). I've done enough trial and error to know that the compiler does properly use the dedicated carry chain logic. Also, it's not just a LUT usage issue but a speed issue as well. Because of the dedicated logic and routing for the carry chain, this solution can be much faster than a LUT-only solution.

Question 13

It's like the situation with software compilers; twenty years ago, optimizing compilers were dubious, but today they are quite good. It stands to reason if there are optimal solutions for logic, the hardware compilers are probably using them already.

Question 14

@JustJeff Exactly! Designing good FPGA logic requires knowing what can safely be left for the compiler to do, and what we need to do manually. Unfortunately that is a moving target and requires experience to figure out.

Question 15

Write code that is easy to read (for others, or for yourself in two weeks' time :)

 a <= b+c;

Trust the synthesiser until it is proven that

it's not doing what you want
and you are not meeting your area, timing or power targets.

To do anything else is premature optimisation.

Then, and only then, mess around trying to improve things. But at least by this point you already have a full-coverage testbench of the "simple" option (you do have that before starting optimising, don't you? :).
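
As a sketch of that workflow, the Python model below treats `b + c` as the simple reference and checks a hand-built alternative (a ripple of gate-level 1-bit full adders) against it exhaustively for 4-bit operands. In a real flow this would be an HDL testbench; the function names here are illustrative only.

```python
def full_adder(a, b, cin):
    """Gate-level 1-bit full adder: returns (sum, carry out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout

def ripple_add(x, y, cin, width=4):
    """Hand-built adder under test: a chain of 1-bit full adders."""
    total, carry = 0, cin
    for i in range(width):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        total |= s << i
    return total | (carry << width)  # carry out becomes the top bit

# Exhaustive testbench: every input combination against the simple reference.
mismatches = [(x, y, c)
              for x in range(16) for y in range(16) for c in range(2)
              if ripple_add(x, y, c) != x + y + c]

print(len(mismatches))
```

With the reference and the exhaustive check in place, any later hand optimisation can be swapped in for `ripple_add` and verified the same way.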

Question 16

I second that your tool will probably implement addition better than you do.

As for various types of adders, check Hennessy and Patterson, IIRC 3rd edition (each edition is a completely different book!).

One way to speed up addition is to use what is basically a ripple adder but NOT add completely in each step: each addition produces a sum and a carry result, and the carry ripples through one stage on each addition. Very useful for implementing multiplication.
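
The technique described here is usually called carry-save addition. A minimal Python sketch follows, assuming non-negative integers and using made-up helper names: each step compresses three operands into a sum word and a carry word using only bitwise operations, so no carry ripples across the word until one final ordinary addition.

```python
def carry_save_step(a, b, c):
    """3:2 compressor applied bitwise: a + b + c == sum_word + carry_word."""
    sum_word = a ^ b ^ c
    carry_word = ((a & b) | (a & c) | (b & c)) << 1
    return sum_word, carry_word

def carry_save_sum(operands):
    """Add many operands, deferring carry propagation to the very end."""
    s, c = 0, 0
    for value in operands:
        s, c = carry_save_step(s, c, value)
    return s + c  # the only carry-propagating addition

# Multiplication as a sum of shifted partial products (13 * 11).
partials = [13 << i for i in range(4) if (11 >> i) & 1]
product = carry_save_sum(partials)
print(product)
```

The multiplication example shows why this helps: the partial products are accumulated with no long carry chain per step, and only the last addition needs a full carry-propagating adder.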

Question 17

Hennessy and Patterson, "Computer Organization and Design: The Hardware/Software Interface"? Or Hennessy and Patterson, "Computer Architecture: A Quantitative Approach"?

Andy · Accepted Answer · 2011-05-09 23:06:31Z

