9
\$\begingroup\$

I have very little experience with C, so I decided to code a simple numerical program.

An Armstrong number with \$N\$ digits is a number that is equal to the sum of its digits to the \$N^{th}\$ power, in pseudo-code:

sum { power(digit, N) for digit in number } == number

My program finds all the Armstrong numbers below the given limit.

#include <stdio.h>
#include <stdbool.h>
#include <string.h>
#include <stdlib.h>
#include <assert.h>
int power(int a, int b) {
 int original_a = a;
 while (b --> 1) { a *= original_a;}
 return a;
}
int digits_len(int n) {
 int length = 0;
 while(n) {
 n /= 10;
 length++;
 }
 return length;
}
int sum_digits_to_the_pow(int n, int exp) {
 int sum = 0;
 while (n) {
 sum += power((n % 10), exp);
 n /= 10;
 }
 return sum;
}
bool is_armstrong(int n) {
 return n == sum_digits_to_the_pow(n, digits_len(n));
}
int main(int argc, char *argv[]) {
 assert(power(4, 3) == 64);
 assert(digits_len(4812) == 4);
 assert(sum_digits_to_the_pow(214,2) == 4+1+16);
 assert(is_armstrong(153) == true);
 for (int i=0; i < atoi(argv[1]); i++) {
 if (is_armstrong(i)) {
 printf("%d ", i);
 }
 }
}

Run my program with (it takes 22 seconds):

gcc -std=c99 -O3 armstrong.c; time ./a.out 100000000
Morwenn
20.2k3 gold badges69 silver badges132 bronze badges
asked Jun 23, 2015 at 10:29
\$\endgroup\$

4 Answers 4

5
\$\begingroup\$

There's a lot of good reviews already. I'll try to add things that haven't been said yet, and point out how you can really speed the program up.

Printout

Your printout is a little strange in that you print all the numbers on one line. That by itself is OK if that's what you prefer, but you should at least print a newline at the very end. For example, in my shell, after I ran your program my shell prompt appeared at the end of your long line of output. Personally, I would prefer one number per line. That way I can more easily compare the results of different runs with varying maximum values.

Faster way of computing digits and length

You duplicate work when you first compute the number of digits and then you extract the digits one by one. In each case you have a division loop. There is a much faster way of doing this work, which is to maintain an array of digits, which represents the current number. Each time you move on to the next number, you can increment the ones digit in this array and handle carries as necessary. By maintaining this array, you no longer have to use division or modulus, which are expensive.

I made a sample program that used this idea and it was 19x faster than the original program.

Incrementally updating the sum

You can do even better than the above by also maintaining the sum of the digits ^ power. Between one number and the next, usually only one digit changes. So you can just update the sum by adding the difference between that digit and the new digit. Suppose a digit changes from x to x+1. To update the sum, you need to do: sum += (x+1)^length - x^length. When the digit carries over from 9 to 0, you do sum -= 9^length. If you have all of these power differences cached in an array, updating the sum will be as simple as sum += diffCache[digit][length].

I wrote a program using this technique, and the result was 66x faster than the original program.

Sample program

Here is my program with the digit array and the incrementally updated sum.

#include <stdio.h>
#include <stdlib.h>
#define MAX_DIGITS 14
void findArmstrong(int maxVal);
int main(int argc, char *argv[])
{
 if (argc < 2) {
 printf("Usage: %s maxval\n", argv[0]);
 return 0;
 }
 int maxVal = atoi(argv[1]);
 if (maxVal < 0)
 return 0;
 findArmstrong(maxVal);
}
/**
 * For this algorithm, an array called digits will contain the current base
 * 10 digits of the number "num". The digits array is in little endian order,
 * which means digits[0] is the ones digit, digits[1] is the tens digit, etc.
 *
 * There is also a diffCache array. This array holds the differences between
 * two consecutive digit ^ power numbers. For example:
 *
 * diffCache[5][8] = 5^8 - 4^8;
 * diffCache[0][8] = 0^8 - 9^8;
 */
void findArmstrong(int maxVal)
{
 int digits[MAX_DIGITS] = {0};
 int length = 1;
 int sum = 0;
 int diffCache[10][MAX_DIGITS];
 // Set up the diffCache array.
 {
 int powerValues[10] = {0,1,2,3,4,5,6,7,8,9};
 for (int j = 1; j < MAX_DIGITS; j++) {
 // powerValues[i] holds the value of: i ^ j
 diffCache[0][j] = -powerValues[9];
 for (int i = 1; i < 10; i++)
 diffCache[i][j] = powerValues[i] - powerValues[i-1];
 // Update powerValues[i] to hold the next higher power
 for (int i = 1; i < 10; i++)
 powerValues[i] *= i;
 }
 }
 // Iterate through every number from 0..maxVal to find Armstrong numbers.
 for (int num = 0; num <= maxVal; num++) {
 // If the number is equal to the sum, this is an Armstrong number.
 if (num == sum)
 printf("%d\n", num);
 // Update digits by adding 1 to the ones digit and carrying to higher
 // digits as necessary. We update the length of the number if we
 // carry to a digit not previously used. The sum is also updated
 // as we update each digit.
 int i = 0;
 while (1) {
 if (++digits[i] == 10) {
 digits[i] = 0;
 sum += diffCache[0][length];
 if (++i == length) {
 // Note that the sum must be 0 at this point, because
 // all the digits are 0 when we carry over to a new length.
 length++;
 }
 continue;
 }
 sum += diffCache[digits[i]][length];
 break;
 }
 }
}

Using a max value of 2000000000, this program ran in 3.3 seconds, compared to the original program which ran in 220 seconds, for a 66x speedup.

answered Jun 24, 2015 at 0:02
\$\endgroup\$
3
  • \$\begingroup\$ We have a winner :) \$\endgroup\$ Commented Jun 24, 2015 at 7:23
  • 1
    \$\begingroup\$ On my machine for that range, this version takes 10.6 seconds and my version takes 55.4 for about a 5x increase in speed, demonstrating once again that it's often better to think carefully about the problem and a fast algorithm than to speed lots of time optimizing a slow algorithm. Nice job! \$\endgroup\$ Commented Jun 24, 2015 at 11:41
  • 1
    \$\begingroup\$ Oops! Wrong compiler flags. When they're both compiled with the same optimizations, this version takes 4.7 seconds for over a 10x increase in speed. \$\endgroup\$ Commented Jun 24, 2015 at 11:55
11
\$\begingroup\$

This is a well-written program. It is clear and easy to follow, which are attributes of good software. With that said, there are still some things that you can do to speed up the performance of the code and there are a few minor comments on formatting. I'm hoping they will help you improve your code.

Avoid peculiar formatting

I have been programming in C for many, many years, but this line threw me for a moment:

while (b --> 1) { a *= original_a;}

It was the --> that was odd, but of course, it is actually equivalent to this:

while (b-- > 1) { 
 a *= original_a;
}

Formatting for clarity, including keeping operators attached to their variables and avoiding one-liners helps others read and understand your code.

Avoid expensive mathematical operations where practical

To speed things along, we can replace this code

int digits_len(int n) {
 int length = 0;
 while(n) {
 n /= 10;
 length++;
 }
 return length;
}

With this:

int digits_len(int n) {
 if (n < 10) return 1;
 else if (n < 100) return 2;
 else if (n < 1000) return 3;
 else if (n < 10000) return 4;
 else if (n < 100000) return 5;
 else if (n < 1000000) return 6;
 else if (n < 10000000) return 7;
 else if (n < 100000000) return 8;
 return 9;
}

It's longer but it avoids the computationally costly division and avoid a loop. Note, however, that it does not produce sensible results for negative numbers. It seemed likely that this was acceptable for the increase in speed gained for this program.

Use a table lookup instead of a loop

The power routine, in context, has the property that neither a nor b are greater than 9. With that observation, it's possible to replace the routine with a simple table lookup. Here's what I used:

const int powtable[10][10] = {
 { 0, 0, 0*0, 0*0*0, 0*0*0*0, 0*0*0*0*0, 0*0*0*0*0*0, 0*0*0*0*0*0*0, 0*0*0*0*0*0*0*0, 0*0*0*0*0*0*0*0*0 },
 { 0, 1, 1*1, 1*1*1, 1*1*1*1, 1*1*1*1*1, 1*1*1*1*1*1, 1*1*1*1*1*1*1, 1*1*1*1*1*1*1*1, 1*1*1*1*1*1*1*1*1 },
 { 0, 2, 2*2, 2*2*2, 2*2*2*2, 2*2*2*2*2, 2*2*2*2*2*2, 2*2*2*2*2*2*2, 2*2*2*2*2*2*2*2, 2*2*2*2*2*2*2*2*2 },
 { 0, 3, 3*3, 3*3*3, 3*3*3*3, 3*3*3*3*3, 3*3*3*3*3*3, 3*3*3*3*3*3*3, 3*3*3*3*3*3*3*3, 3*3*3*3*3*3*3*3*3 },
 { 0, 4, 4*4, 4*4*4, 4*4*4*4, 4*4*4*4*4, 4*4*4*4*4*4, 4*4*4*4*4*4*4, 4*4*4*4*4*4*4*4, 4*4*4*4*4*4*4*4*4 },
 { 0, 5, 5*5, 5*5*5, 5*5*5*5, 5*5*5*5*5, 5*5*5*5*5*5, 5*5*5*5*5*5*5, 5*5*5*5*5*5*5*5, 5*5*5*5*5*5*5*5*5 },
 { 0, 6, 6*6, 6*6*6, 6*6*6*6, 6*6*6*6*6, 6*6*6*6*6*6, 6*6*6*6*6*6*6, 6*6*6*6*6*6*6*6, 6*6*6*6*6*6*6*6*6 },
 { 0, 7, 7*7, 7*7*7, 7*7*7*7, 7*7*7*7*7, 7*7*7*7*7*7, 7*7*7*7*7*7*7, 7*7*7*7*7*7*7*7, 7*7*7*7*7*7*7*7*7 },
 { 0, 8, 8*8, 8*8*8, 8*8*8*8, 8*8*8*8*8, 8*8*8*8*8*8, 8*8*8*8*8*8*8, 8*8*8*8*8*8*8*8, 8*8*8*8*8*8*8*8*8 },
 { 0, 9, 9*9, 9*9*9, 9*9*9*9, 9*9*9*9*9, 9*9*9*9*9*9, 9*9*9*9*9*9*9, 9*9*9*9*9*9*9*9, 9*9*9*9*9*9*9*9*9 }
};
inline int power(int a, int b) {
 return powtable[a][b];
}

The multiplications in the table are all done at compile-time rather than at runtime, so they cost essentially nothing. (I could, of course, have simply placed the appropriate numbers into the table, but I'm very lazy and prefer to make the computer do all the work!)

Avoid unused arguments

The main routine has the usual arguments of argc and argv but argc is never used. If you compile with all warnings turned on (which I would recommend), the compiler will correctly complain about the unused variable. I decided to use it and also to give a hint to the user as to how to use the program:

if (argc < 2) {
 puts("Usage: armstrong maxint\n");
 return 0;
}

Keep using assert

It's refreshing to see code from an avowed beginner at C using assert rationally and correctly. Bonus points for you! Also, although I didn't, you might wish to use asserts to assure that the parameters passed to power are indeed both less than 10.

Results

On my machine, the original code, when compiled with -O2 using gcc version 4.9.2 on a 64-bit Linux box, the code runs in 8.1 seconds. With the changes mentioned above, it runs in 2.4 seconds, or almost 4x faster.

answered Jun 23, 2015 at 12:59
\$\endgroup\$
4
  • 1
    \$\begingroup\$ Well the peculiar formatting was a use of the rundown operator, I could not resist (stackoverflow.com/questions/1642028/…). Also I've been programming for years now, I am new just to C, I use automated tests from experience :) . (Off-topic: you did not write the powtable by hand right? A script to write all kinds of powtables is faster and more fun to write: def write_powtable(a, b): for x in range(0,a+1): print('{',end='') for y in range(0,b+1): print(x**y, end=', ') print('}') write_powtable(10, 10) \$\endgroup\$ Commented Jun 23, 2015 at 13:19
  • 2
    \$\begingroup\$ No, I did not write the table by hand -- I have a smart editor (gvim) and know how to use it effectively. :) \$\endgroup\$ Commented Jun 23, 2015 at 13:26
  • 2
    \$\begingroup\$ vim & similar always looked like wizard tools to me, good job. \$\endgroup\$ Commented Jun 23, 2015 at 13:28
  • 2
    \$\begingroup\$ @Caridorc the "x goes to operator" was probably invented as a joke or on some code obfuscation competition. No self respecting program should use it outside such contexts. ;) \$\endgroup\$ Commented Jun 23, 2015 at 18:50
7
\$\begingroup\$

Power algorithm

You are computing many powers in a tight loop, you probably want your power algorithm to be as fast as possible. Did you try pow from the standard header <math.h>? It operates with floating point numbers, but might still be faster than your algorithm.

Otherwise, you can implement an exponentiation by squaring algorithm to reduce the number of operations you're performing:

int power(int x, int exponent)
{
 // Exponentiation by squaring
 return (exponent == 0) ? 1 :
 (exponent % 2 == 0) ? power(x*x, exponent/2) :
 x * power(x*x, (exponent-1)/2);
}

This version does not handle negative exponents since you don't need them, but it would be trivial to add.

Counting digits

There are other ways to count the decimal digits in a number than the one you use. Once again, I didn't time, but if floating point operations are fast enough, you can simply compute \$\lfloor \log_{10} n \rfloor + 1\$ which in C would be:

floor(log10(n)) + 1

Combined modulo and division

Another speculative improvement (well, I don't have why I need to test at hand, sorry...), but you could theoretically use div from the standard library to compute n % 10 and n / 10 in one operation. The algorithm to compute one also computes the other so computing both at once is free.

div_t res = div(n, 10);
sum += power(res.rem, exp);
n = res.quot;

Now, 10 being a compile-time constant, I wouldn't be surprised if your compiler already optimized things away for you. Note that you don't need to add anything to sum when res.rem is 0 but neither of our power algorithms return immediately when x == 0. Adding a branch to skip the operation might be worth trying even though I suspect that the branch will be too expensive on average:

div_t res = div(n, 10);
if (res.rem) {
 sum += power(res.rem, exp);
}
n = res.quot;

Like the other suggestions in my answer, it should algorithmically be faster, but compilers are smart and reduced algorithms complexity do not always yield faster code in real world.

answered Jun 23, 2015 at 12:33
\$\endgroup\$
7
  • \$\begingroup\$ You just anticipated my answer :) Just for the record, stackoverflow.com/questions/101439/… contains an alternative implementation \$\endgroup\$ Commented Jun 23, 2015 at 12:36
  • \$\begingroup\$ Incredibly enough, your power algoritmh slows the program down very much, by 5 seconds to be exact. This is because the exponent is always small, and the naive method is faster. \$\endgroup\$ Commented Jun 23, 2015 at 12:41
  • \$\begingroup\$ @Caridorc Well, I'm quite surprised that it does so for an input of 100000000. On the other hand, I gave you a recursive implementation which may or may not perform tail recursion. You can try the version in Gentian's link which is iterative. \$\endgroup\$ Commented Jun 23, 2015 at 12:44
  • 1
    \$\begingroup\$ @Caridorc Never higher than 8? Well, then I'm not surprised that it's slower. More efficient algorithms often have a slight higher constant execution time which makes them less suitable for small values :) \$\endgroup\$ Commented Jun 23, 2015 at 12:52
  • 2
    \$\begingroup\$ These are all good things to try, and indeed, I tried them all. However, on my machine, they all slowed things down except for using log10. For that, you can simplify a bit by using ceil rather than floor: return ceil(log10(n)); There's a faster way, though, because of the limited domain of n -- see my answer. \$\endgroup\$ Commented Jun 23, 2015 at 13:05
2
\$\begingroup\$

Benchmark 0: 0m21.372s on my machine with your code as-is, running your command: gcc -std=c99 -O3 armstrong.c; time ./a.out 100000000

Tip #1: Use ++i instead of i++ whenever possible. Read more about this at: https://stackoverflow.com/questions/24853/what-is-the-difference-between-i-and-i

By changing all instances of i++ to ++i, and changing the while loop in your power function (see comments at lines 9 & 10 in full code below), the following benchmark was achieved:

Benchmark 1: 0m19.468s (almost 2s faster than original)

Tip #2: Simplify the conditional expression in for loops as much as possible:

for (int i=0; i<atoi(argv[1]); ++i)

This evaluates atoi(argv[1]) every cycle of the loop, when its value does not change. It's just calculating the same thing over and over. I rewrote your loop like this:

 int iMax = atoi(argv[1]);
 int i=0;
 while (++i < iMax) {
 ...

I saw almost 9 seconds improvement from the original run:

Benchmark 2: 0m13.604s

The lowest time I was able to achieve was 13.059s.

Tip #3: Loop unrolling. Read more about that here: https://en.wikipedia.org/wiki/Loop_unrolling

int i = 0;
while (++i < iMax) {
 if (is_armstrong(i)) {
 printf("%d ", i);
 }
 ... copy and paste previous three lines, however many times pleases you

Benchmark 3: 0m13.059s (loop unrolled 5x) Benchmark 4: 0m12.203s (loop unrolled 10x)

Honesty note: I got anywhere between 12-15 seconds after trying up to 100 unrolls. The effect of loop unrolling is the least substantial change to this code. But it's still a good tip to know about.

Here is your code with my changes:

#include <stdio.h>
#include <stdbool.h>
#include <string.h>
#include <stdlib.h>
#include <assert.h>
int power(int a, int b) {
 int original_a = a;
 // changed (b-- > 1) to (--b)
 // same thing but faster. --b will evaluate to false when b is decremented from 1 to 0
 while (--b) { a *= original_a;}
 return a;
}
int digits_len(int n) {
 int length = 0;
 while(n) {
 n = n/10;
 ++length;
 }
 return length;
}
int sum_digits_to_the_pow(int n, int exp) {
 int sum = 0;
 while (n) {
 sum += power((n % 10), exp);
 n = n/10;
 }
 return sum;
}
bool is_armstrong(int n) {
 return n == sum_digits_to_the_pow(n, digits_len(n));
}
int main(int argc, char *argv[]) {
 assert(power(4, 3) == 64);
 assert(digits_len(4812) == 4);
 assert(sum_digits_to_the_pow(214,2) == 4+1+16);
 assert(is_armstrong(153) == true);
 // took the max value outside of the loop because it was evaluating every loop cycle
 int iMax = atoi(argv[1]);
 int i=0;
 // simplified this
 while (++i < iMax) {
 if (is_armstrong(i)) {
 printf("%d ", i);
 }
 }
}

Theoretical Changes

Multi-Threading: In theory you could reduce the time to nearly 1/n of the original time, if you had an n-core machine and split this process into n threads.

Since the only thing I can really see here that requires a specific order is the output, you could look into POSIX threads and split up your range, evaluating a different subset of the desired range on each core of your machine. Unless your goal is to do this all procedurally on a single thread, you may see the most improvement with this approach.

answered Jun 23, 2015 at 12:13
\$\endgroup\$
4
  • \$\begingroup\$ i++ -> ++i gives only a 0.1 seconds boost on my compiler. b-- -> --b breaks the program (did you even test your changes?) \$\endgroup\$ Commented Jun 23, 2015 at 12:29
  • \$\begingroup\$ Yes, I tested it every step of the way. What do you mean when you say it breaks the program? \$\endgroup\$ Commented Jun 23, 2015 at 12:33
  • \$\begingroup\$ The first line of output is a failing test: a.out: arm.c:36: main: Assertion power(4, 3) == 64 failed. \$\endgroup\$ Commented Jun 23, 2015 at 12:36
  • \$\begingroup\$ That happened to me too when the loop says: while (--b > 1) { ... } I changed it to say: while (--b) { ... } \$\endgroup\$ Commented Jun 23, 2015 at 12:38

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.