Faster map function

Question 1

For my job I was tasked with writing a faster version of Arduino's map. The requirements were only use addition and subtraction, since these are cheap operations on our board.

For reference, here is Arduino's implementation:

long map(long x, long in_min, long in_max, long out_min, long out_max) {
 return (x - in_min) * (out_max - out_min) / (in_max - in_min) + out_min;
}

And here's what I wrote

long fasterMap(long value, long fromLow, long fromHigh, long toLow, long toHigh) {
 long first = value - fromLow;
 long second = toHigh - toLow;
 long combined = 0;
 while (second != 0) { // Avoid usage of * operator
 combined = combined + first;
 second = second - 1;
 }
 long third = fromHigh - fromLow;
 long count = 0;
 while (combined >= third) { // Avoid usage of / operator
 combined = combined - third;
 count = count + 1;
 }
 count = count + toLow;
 return count;
}

Is there any more performance or micro-optimizations I can make to have this function calculate even faster? Thanks!

Note: Because this is running on an Arduino, answers in pure c++ are greatly encouraged.

Question 2

Does your implementation outperform the reference one?

Question 3

Well, I don't speak c++ but checked the provided link to Arduino's map() method and there it states

The function also handles negative numbers well, so that this example

y = map(x, 1, 50, 50, -100);

is also valid and works well.

which your method wouldn't handle well, because with having toHigh = -100 of the fasterMap() method would result in some really long running while loop because second in long second = toHigh - toLow; would result in long second = -100 - 50 and therfor while(second != 0) would never evaluate to true.

Question 4

The division is the real killer since the simple CPU doesn't have a divide instruction. Even in CPUs that do, it is very slow.

If the denominator is known at compile time, the compiler can transform it into a multiplication or otherwise use tricks for that specific value (e.g. powers of two are just a shift).

You want all the in/out parameters to be known at compile time, and only x is the real run-time parameter.

This is done using templates.

template <long in_min, long in_max, long out_min, long out_max>
struct map {
 long operator()(long x) const {
 return (x - in_min) * (out_max - out_min) / (in_max - in_min) + out_min;
 }
}

Now you can use it like this:

map<12,42, 0,100> pbar_mapper;
 ⋮
long output = pbar_mapper(input);

That is, let the compiler figure out the fast trick for multiplying and dividing by these specific constants. With optimizations turned on, it should be at least as fast as any hand-written tricks.

update

Apparently, the compiler for Arduino doesn't replace division with reciprocal multiplication. Only divisions where the denominator is a power of 2 are replaced with simple operations.

The template form does cause the compiler to pre-compute the subexpressions out_max-out_min and in_max-in_min. A big improvement in generated code was seen by using int instead of long. See: https://godbolt.org/z/PPYTWvqff

If int is 16 bits, and you know the values you are interested in will fit in this range, that will be the best benefit. So define the template using int and add a static_assert that (in_max-in_min)*(out_max-out_min) does not exceed the size of an int.

Better yet, detect that case and automatically do the intermediate calculation using long and let the compiler know when it's multiplying two int to get a long and when it's dividing a long by an int, and hopefully it generates simpler code for the multiply and calls a faster form of the divide?

If you can choose the output range to be a power of two, that would eliminate the division.

Hmm and I notice that the "max" values in the original code shown are apparently meant to be exclusive, or the math is a little off and it just seems to work OK if the input value is equal to in_max. Since it's not doing rounding but truncation, a little imprecision might not be noticed.

Anyway, a range of 128 would correspond to parameters of 0,128 or 1,129. If you're mapping to a nice range to pass along to other code, a range that's a power of two would be perfectly natural.

Question 5

Nice trick making use of the compiler optimisation which is replacing the division by a multiplication with the reciprocal. This optimisation is available for most platforms, unfortunately it is not available for Arduino.

Question 6

@stefan Yipes, I wonder why not? Playing with Compiler Explorer, I see it only optimizes/inlines when dividing by powers of two. I see multiplication does have a built-in instruction, and the compiler replaces it for some values and I guess it knows when the built-in is faster (having only single-bit shifts slows it down so x*15 is not replaced). So, I wonder if you could manually generate the equivalent reciprocal multiplication expression? I'm guessing that maybe the double-width arithmetic is slow and that's why it's not helpful.

Question 7

Faster very much depends on the use case. In any case, measure! You may even implement different strategies depending on your parameters.

mapping completely generic over the whole `long` range

do a recursive bisect instead of your plain loop

long bisect_map(long x, long in_min, long in_max, long out_min, long out_max) {
 if (in_min > in_max)
 return bisect_map(x, in_max, in_min, out_max, out_min);
 if (out_min == out_max)
 return out_min;
 const long in_mid = (in_min + in_max) >> 1;
 const long out_mid = (out_min + out_max) >> 1;
 if (in_min == in_mid)
 return out_mid;
 if (x <= in_mid)
 return bisect_map(x, in_min, in_mid, out_min, out_mid);
 else
 return bisect_map(x, in_mid + 1, in_max, out_mid, out_max);
}

On all algorithms - use fixed point arithmetic to avoid precision loss

mapping different values on the same ranges

Precompute the division. Multiplication remains during runtime

class Mapper
{
public:
 Mapper(long in_min, long in_max, long out_min, long out_max, long fp_length = 8)
 : in_min(in_min)
 , out_min(out_min)
 , fp_length(fp_length)
 , factor(((out_max - out_min) << fp_length) / (in_max - in_min)) { }
 long operator()(long x) {
 return ((((x - in_min) * factor) >> fp_length) + out_min);
 }
private:
 long in_min;
 long out_min;
 long fp_length;
 long factor;
};

Question 8

We have no detection of integer overflow (in either the reference implementation or the replacement). Given that we're using a signed type, any overflow would be Undefined Behaviour.

I'm surprised that your implementation turns out faster than the reference. If your compiler can't implement multiplication at least as fast as repeated addition, or division as fast as repeated subtraction, you need a better compiler.

A faster (and less overflow-prone) technique may be to use one of the standard straight-line-drawing algorithms, since we can visualise the rescaling operation as line-graph lookup.

Question 9

On Arduinos with an AVR CPU, it generates a call _mulsi3 for the multiplication and call _divmodsi4 for the division (see godbolt.org/z/ebdPTW). Funnily enough, the compiler sees that the first while-loop in fasterMap() is a multiplication and replaces it by call _mulsi3. It doesn't do so for the division. Possibly because OP's code doesn't handle division by zero, and as @Heslacher mentioned, it doesn't handle negative numbers very well.

Heslacher Heslacher 50.9k5 gold badges83 silver badges177 bronze badges · Accepted Answer · 2021-01-08 04:56:56Z

Well, I don't speak c++ but checked the provided link to Arduino's map() method and there it states

The function also handles negative numbers well, so that this example

y = map(x, 1, 50, 50, -100);

is also valid and works well.

which your method wouldn't handle well, because with having toHigh = -100 of the fasterMap() method would result in some really long running while loop because second in long second = toHigh - toLow; would result in long second = -100 - 50 and therfor while(second != 0) would never evaluate to true.

Stack Exchange Network

Faster map function

4 Answers 4

update

mapping completely generic over the whole `long` range

mapping different values on the same ranges

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Faster map function

4 Answers 4

update

mapping completely generic over the whole long range

mapping different values on the same ranges

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions

mapping completely generic over the whole `long` range