I'm putting together an ALU that I want to synthesize on an FPGA. The carry-lookahead adder is the one many people choose over the ripple-carry adder. However, a thought crossed my mind. The ripple-carry adders I have put together before simply had a series of one-bit full adders connected to each other. My thought is: what if I were to design a 4-bit full adder? I'm not talking about an adder made up of four one-bit full adders. I'm talking about a single component with 9 inputs (x3, x2, x1, x0, y3, y2, y1, y0, cin). I'm aware this would have 512 possible input states (2^9, one for each combination of the 9 inputs).
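To make the 9-input idea concrete, here is a behavioral sketch in Python (not HDL, and not the asker's design) of the 4-bit block as a single 512-entry lookup table, plus a hypothetical 32-bit adder that chains eight such slices. The index layout {cin, y, x} and the names build_adder4_table, adder4 and add_with_slices are illustrative assumptions. Each of the five output bits is a 9-input Boolean function, which is what a two-level sum-of-products would have to implement. Note that even with the wide slice, the carry still passes from slice to slice: 8 hops for 32 bits instead of 32.

```python
# Behavioral model (not HDL): the 9-input "4-bit full adder" as one lookup table.
# Hypothetical index layout: bit 8 = cin, bits 7..4 = y[3:0], bits 3..0 = x[3:0].

def build_adder4_table():
    """Return a 512-entry table mapping each 9-bit input to (sum4, cout)."""
    table = []
    for index in range(2 ** 9):          # 2^9 = 512 input combinations
        x = index & 0xF                  # bits 3..0
        y = (index >> 4) & 0xF           # bits 7..4
        cin = (index >> 8) & 1           # bit 8
        total = x + y + cin
        table.append((total & 0xF, total >> 4))
    return table

ADDER4 = build_adder4_table()

def adder4(x, y, cin):
    """Look up the 4-bit sum and carry-out for one 4-bit slice."""
    return ADDER4[(cin << 8) | (y << 4) | x]

def add_with_slices(x, y, cin=0, width=32):
    """Chain width/4 table slices; the carry still ripples slice to slice."""
    total = 0
    carry = cin
    for k in range(0, width, 4):
        s, carry = adder4((x >> k) & 0xF, (y >> k) & 0xF, carry)
        total |= s << k
    return total, carry

print(len(ADDER4))        # 512
print(adder4(9, 8, 1))    # 9 + 8 + 1 = 18 -> (2, 1)
```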

What I'm wondering is:

  1. There is obviously going to be a massive number of gates used; is it worth it?
  2. If I implemented all my components using NAND gates, each with a fixed delay, how much of a speed improvement would I see in a 32-bit adder using a.) 4-bit full adders, b.) a CLA adder, c.) 1-bit full adders?
  3. Is there some other implementation of an adder I'm not aware of?
  4. Although an adder is a very menial part of an ALU, what do most digital designers actually go for? Or do they simply use the statement assign Sum = X+Y+cin;?
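For comparison with option c.), a minimal behavioral sketch in Python of a ripple-carry adder built from 1-bit full adders; the function names are hypothetical. It models logic only, not NAND-gate delays, but it shows why a worst case such as 0xFFFFFFFF + 1 forces the carry through every one of the 32 stages.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout

def ripple_carry_add(x, y, cin=0, width=32):
    """Add two width-bit numbers with a chain of 1-bit full adders.

    The carry out of bit i feeds the carry in of bit i+1, so the
    worst-case path passes through all `width` carry stages.
    """
    carry = cin
    total = 0
    for i in range(width):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        total |= s << i
    return total, carry

print(ripple_carry_add(0xFFFFFFFF, 1))   # (0, 1): the carry ripples through all 32 bits
```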
Connor Wolf
asked May 9, 2011 at 21:11
  • You want to count states as 2^9, not 9^2. That's 512. Commented May 9, 2011 at 21:16
  • @DarenW you're right, not sure what I was thinking... Commented May 9, 2011 at 21:20
  • Interesting. I would suspect your custom 4-bit slice would end up resembling four one-bit full adders with a carry-lookahead circuit, but it might not. Here's the thing: it would certainly not need to be any more complex than that. However, the question of whether you could optimize beyond four full adders plus a 4-bit CLA is an interesting one. Commented May 9, 2011 at 21:26
  • @JustJeff The main driving reason for this implementation is that in a four-bit full adder, the signals would only need to pass through two levels of gates instead of the 8 or so gates necessary for four one-bit full adders. It would be a 4x speedup. And I'm sure a 4-bit full adder would outperform a 4-bit CLA adder. Again, it is only two levels. Commented May 9, 2011 at 22:36
  • Related: Wide survey of different hardware adders, multipliers, dividers (design) Commented Jun 18, 2012 at 19:20
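The structure JustJeff's comment describes (bit slices plus lookahead) can be sketched in Python as a 4-bit carry-lookahead block. Each carry is written directly as an AND-OR expression of cin and the per-bit generate g_i = x_i AND y_i and propagate p_i = x_i XOR y_i signals, rather than rippling. This is a behavioral model with a hypothetical function name, not a gate-level netlist; note how the widest term in c4 already needs five inputs, so the fan-in grows with block width.

```python
def cla4(x, y, cin):
    """4-bit carry-lookahead block: each carry is an AND-OR of g, p and cin."""
    g = [((x >> i) & 1) & ((y >> i) & 1) for i in range(4)]   # generate
    p = [((x >> i) & 1) ^ ((y >> i) & 1) for i in range(4)]   # propagate
    c0 = cin
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))
    carries = [c0, c1, c2, c3]          # carry into each bit position
    s = 0
    for i in range(4):
        s |= (p[i] ^ carries[i]) << i   # sum bit = propagate XOR carry-in
    return s, c4

print(cla4(9, 8, 1))   # (2, 1)
```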

4 Answers

To answer #4, at least in code targeted for synthesis, an adder will usually be coded as assign sum = x + y. This leaves the choice of how to implement the adder up to the synthesis tool. There is a cost/performance tradeoff. Absent tight performance requirements, the tool will implement a ripple carry adder, as that has the lowest cost. If there are more aggressive performance requirements, the tool will implement a more sophisticated structure, at some added cost. Another possibility for FPGA synthesis is that the adder will be mapped to a special-purpose DSP component, if available in the target device.

When maximum performance is desired, the logic will be designed by hand rather than implemented with a synthesis tool. In this case, in addition to a high-level reference model with the form sum = x + y, there would also be a lower-level description describing the individual gates or transistors (this might be done in an HDL, or in a schematic tool). This "maximum performance" scenario would almost certainly be an ASIC implementation rather than an FPGA.

To (not really) answer #3, for more than you ever wanted to know about adder architectures, I found this thesis linked from a thread on edaboard: http://www.iis.ee.ethz.ch/~zimmi/publications/adder_arch.pdf.
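As one example of the alternatives in the adder literature, here is a hedged Python sketch of a parallel-prefix adder in the Kogge-Stone style (my choice of example; the function name is hypothetical). The (generate, propagate) pairs are combined over spans that double at each level, so a 32-bit add needs log2(32) = 5 prefix levels instead of a 32-stage carry ripple, at the cost of many more combining cells and wires.

```python
def kogge_stone_add(x, y, cin=0, width=32):
    """Parallel-prefix (Kogge-Stone style) adder model.

    After the prefix stage, G[i]/P[i] are the group generate/propagate
    of bits [0, i]; a width-bit add needs ceil(log2(width)) levels.
    """
    g = [((x >> i) & (y >> i)) & 1 for i in range(width)]   # bit generate
    p = [((x >> i) ^ (y >> i)) & 1 for i in range(width)]   # bit propagate
    G, P = g[:], p[:]
    dist = 1
    while dist < width:                  # one iteration per prefix level
        new_g, new_p = G[:], P[:]
        for i in range(dist, width):
            new_g[i] = G[i] | (P[i] & G[i - dist])
            new_p[i] = P[i] & P[i - dist]
        G, P = new_g, new_p
        dist *= 2
    # carries[i] is the carry into bit i; carries[width] is the carry out.
    carries = [cin] + [G[i] | (P[i] & cin) for i in range(width)]
    s = 0
    for i in range(width):
        s |= (p[i] ^ carries[i]) << i
    return s, carries[width]

print(kogge_stone_add(0xFFFFFFFF, 1))   # (0, 1)
```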

To answer #1 and #2, the best way to figure things like this out is to do some experiments; anything else is speculation. What you will get for the "4-bit full adder" design depends on how you code it. If you code it as an adder, the tool will likely do what it would have done anyway, although it may fail to figure out that the 4-bit adders go together to form a larger adder. If you code it as a logic function, you may get something faster than the ripple-carry implementation, but you may not.

answered May 9, 2011 at 23:06
  • Thank you so much. That was an excellent article you pulled. I guess I might as well forget about the 4-bit full adder. I kind of knew it was too much of a stretch. It would still be nice to have a Verilog file for such a thing. Anyway, thanks for the help... Commented May 10, 2011 at 2:13
  • I'd suggest replacing the phrase "If there's a minimal performance constraint..." with "Absent tight performance requirements...". Otherwise, it's unclear at first reading whether the constraints are minimal, or whether the constraints are specifying a minimum level of performance. Commented Aug 25, 2011 at 14:54

What you don't want to do is implement the adder yourself out of gates. Use the features that VHDL/Verilog give you for adding numbers. Any adder you create is going to be larger and slower than anything the VHDL/Verilog compiler can do.

The reason for this is simple: FPGAs have dedicated logic for implementing adders with a minimum amount of logic and as fast as possible. This includes special carry-chain logic and routing. If you let the compiler use these, you'll benefit from the resources already in the FPGA. In other words, just write Sum=X+Y+cin, where X and Y are multi-bit numbers.
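The dedicated carry logic mentioned above is, on Xilinx-style parts, a per-bit carry multiplexer plus an XOR sitting next to each LUT (the MUXCY/XORCY primitives of older families; treat this as an assumption about the specific family, since other vendors and generations differ). Below is a behavioral Python sketch of that structure with a hypothetical function name. Logically it is still a ripple adder; the speed comes from the hard-wired mux chain and routing, not from a different algorithm.

```python
def carry_chain_add(x, y, cin=0, width=32):
    """Model of a mux-based FPGA carry chain (assumed MUXCY/XORCY style).

    Per bit: the LUT computes propagate p = x_i ^ y_i. The carry mux passes
    the incoming carry when p = 1; otherwise it selects x_i (when p = 0,
    x_i == y_i, so x_i equals the generate signal). The XOR forms the sum.
    """
    carry = cin
    total = 0
    for i in range(width):
        xi, yi = (x >> i) & 1, (y >> i) & 1
        p = xi ^ yi                      # computed in the LUT
        total |= (p ^ carry) << i        # dedicated XOR
        carry = carry if p else xi       # dedicated carry mux
    return total, carry

print(carry_chain_add(0xFFFFFFFF, 1))   # (0, 1)
```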

answered May 9, 2011 at 22:06
  • How do I know this for sure? Ultimately this comes down to the number of LUTs used. Do you know for a fact that the Xilinx or Altera synthesizers do a better job than a true structural description? I'm sure they do an excellent job, but this is more of a curiosity for me. Commented May 9, 2011 at 22:37
  • @seljuq70 There are many ways to analyze this, but ultimately it comes down to either "blindly" trusting what the compiler is doing, doing some trial and error yourself, or analyzing the compiler output to see what it's doing (à la Xilinx FPGA Editor). I've done enough trial and error to know that the compiler does properly use the dedicated carry-chain logic. Also, it's not just a LUT-usage issue but a speed issue as well. Because of the dedicated logic and routing for the carry chain, this solution can be much faster than a LUT-only solution. Commented May 9, 2011 at 22:50
  • It's like the situation with software compilers: twenty years ago, optimizing compilers were dubious, but today they are quite good. It stands to reason that if there are optimal solutions for logic, the hardware compilers are probably using them already. Commented May 10, 2011 at 0:28
  • @JustJeff Exactly! Designing good FPGA logic requires knowing what can safely be left for the compiler to do, and what we need to do manually. Unfortunately that is a moving target and requires experience to figure out. Commented May 10, 2011 at 0:47

Write code that is easy to read (for others, or for yourself in two weeks' time :)

 a <= b+c;

Trust the synthesiser until it is proven that

  • it's not doing what you want
  • and you are not meeting your area, timing or power targets.

To do anything else is premature optimisation.

Then, and only then, mess around trying to improve things. But at least by this point you already have a full-coverage testbench of the "simple" option (you do have that before starting optimising, don't you? :).
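A minimal sketch of that kind of check, written as a Python model rather than an HDL testbench: it exhaustively compares a hypothetical hand-optimised adder against the simple x + y + cin reference. The names ripple_add, buggy_add and check_against_reference are illustrative. Exhaustive checking is only practical at small widths, since there are 2^(2*width+1) cases.

```python
def check_against_reference(candidate, width):
    """Exhaustively compare candidate(x, y, cin, width) with plain x + y + cin.

    Returns the first mismatching (x, y, cin) tuple, or None if all agree.
    """
    mask = (1 << width) - 1
    for x in range(1 << width):
        for y in range(1 << width):
            for cin in (0, 1):
                total = x + y + cin
                expected = (total & mask, total >> width)
                if candidate(x, y, cin, width) != expected:
                    return (x, y, cin)
    return None

def ripple_add(x, y, cin, width):
    """Hypothetical hand-optimised design under test (a plain ripple-carry model)."""
    carry, total = cin, 0
    for i in range(width):
        a, b = (x >> i) & 1, (y >> i) & 1
        total |= (a ^ b ^ carry) << i
        carry = (a & b) | (carry & (a ^ b))
    return total, carry

def buggy_add(x, y, cin, width):
    """Hypothetical broken variant that forgets the carry-in."""
    total = x + y
    return total & ((1 << width) - 1), total >> width

print(check_against_reference(ripple_add, 6))   # None
print(check_against_reference(buggy_add, 6))    # (0, 0, 1)
```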

answered May 10, 2011 at 14:39

I second that: your tool will probably implement addition better than you would.

As for the various types of adders, check Hennessy and Patterson, IIRC the 3rd edition (each edition is a completely different book!).

One way to speed up addition is to use what is basically a ripple adder but NOT add completely in each step: each addition produces a sum and a carry result, and the carry ripples through only one stage on each addition. This is very useful for implementing multiplication.
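The technique described above is commonly called carry-save addition. Below is a hedged Python sketch with hypothetical function names: each 3:2 step is a row of independent full adders (no carry propagation between bits), and only one carry-propagating add is done at the end. It assumes non-negative integers.

```python
def carry_save_add(a, b, c):
    """3:2 carry-save step: reduce three operands to a sum word and a carry word.

    Each bit position is an independent full adder, so the step's delay
    does not depend on the word width.
    """
    partial_sum = a ^ b ^ c
    carries = ((a & b) | (a & c) | (b & c)) << 1
    return partial_sum, carries

def sum_many(operands):
    """Accumulate non-negative ints in carry-save form, resolving once at the end."""
    s, k = 0, 0
    for v in operands:
        s, k = carry_save_add(s, k, v)
    return s + k   # a single carry-propagating add at the very end

print(carry_save_add(5, 6, 7))   # (4, 14), and 4 + 14 == 5 + 6 + 7
```

For multiplication, the operands are the shifted partial products: for 13 * 11, the set bits of 11 select 13, 26 and 104, and sum_many([13 << i for i in range(4) if (11 >> i) & 1]) resolves to 143.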

answered Aug 24, 2011 at 17:46
  • Hennessy and Patterson, "Computer Organization and Design: The Hardware/Software Interface"? Or Hennessy and Patterson, "Computer Architecture: A Quantitative Approach"? Commented Feb 26, 2015 at 5:01
