17

I'm preparing a lecture where I will start with a many-branched conditional statement and replace it with a table. For example, I could start with:

function getMonthName(monthNumber) {
  if (monthNumber === 1) return "January";
  else if (monthNumber === 2) return "February";
  else if (monthNumber === 3) return "March";
  ...
}

and replace it with:

function getMonthName(monthNumber) {
  const monthNames = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December"
  ];

  if (monthNumber < 1 || monthNumber > monthNames.length) {
    return "Invalid month number";
  } else {
    return monthNames[monthNumber - 1];
  }
}

I went to look up what this refactoring was called but couldn't find it in the conditionals section of Martin Fowler's Refactoring.com or at refactoring.guru.

Is there a good reason it doesn't appear in the refactoring catalog, or is there somewhere else I should be looking?

Edit: I use Fowler's definition of refactoring:

Refactoring (noun): a change made to the internal structure of software to make it easier to understand and cheaper to modify without changing its observable behavior.

My question is not about which approach is more efficient.

asked Jul 29 at 22:20
15
  • 7
    This isn't something I would consider to have any upsides, compared to the if/else or switch/case statement. It's a popular idiom in certain languages, but it's not a universally recognized pattern across languages. It just saves key strokes in the implementation of a single method. Commented Jul 29 at 23:16
  • 7
    There is a matching Lookup Table design pattern. Commented Jul 30 at 3:44
  • 9
    There is no reason that array should be reconstructed every single time the function is invoked. Make it a constant (and immutable, preferably) defined outside the function. Commented Jul 30 at 7:13
  • 13
    @user229044 Curbing a twelve-fold repetition is most definitely an upside. Commented Jul 30 at 8:01
  • 9
    I would refactor it to switch/match though. And be done with. The details of the second snippet depend too much on the actual language/runtime. It will be sometimes worth it, sometimes the opposite. While switch/match sounds like a sweet spot. Commented Jul 30 at 10:01

7 Answers

24

You are describing the Lookup Table optimization. It is likely missing from refactoring catalogs because, when applied with arrays over dense keysets in performance-oriented languages, it gave an obvious performance boost and there was no reason not to use it (making it an "optimization" rather than a "refactoring").

However, its use with less efficient dictionaries and in higher-level languages makes it a stylistic choice, and therefore a refactoring.

So just call it the Lookup Table refactoring and mention its origin as an optimization technique.
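A minimal before/after sketch of this refactoring (Python used for illustration; the names are mine, not from any catalog):

```python
# Before: a many-branched conditional.
def get_month_name_conditional(month_number):
    if month_number == 1:
        return "January"
    elif month_number == 2:
        return "February"
    # ... and so on for the remaining months ...
    return "Invalid month number"

# After: a lookup table, defined once at module level so it is
# not rebuilt on every call.
MONTH_NAMES = (
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
)

def get_month_name(month_number):
    if 1 <= month_number <= len(MONTH_NAMES):
        return MONTH_NAMES[month_number - 1]
    return "Invalid month number"
```

Keeping the table at module level matters: it is what turns the rewrite into the optimization this answer describes, rather than a per-call allocation.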


answered Jul 30 at 5:13
4
  • 5
    It seems to me that the main motivation for this would be flexibility. You can load the mappings at startup or even modify them dynamically. For example, if you need to map the names of the months in different languages. The performance aspect of this seems like a distraction unless you've analyzed the performance and found the mapping to be a hotspot. Commented Jul 30 at 16:16
  • @JimmyJamessupportsCanada the context is "refactoring"; configuration loading would be a new feature instead. Commented Jul 30 at 16:18
  • 2
    @Basilevs True, to a large degree. But if you had this feature implemented as e.g. a two-layered conditional (language, then month) or using polymorphism, I think this kind of thing is preferable simply because it eliminates a lot of unnecessary code. Commented Jul 30 at 16:23
  • Aha! This is the reason for your excellent question on C++ lookup optimisation on SO ;-) Excellent answer, btw. Commented Jul 31 at 15:44
10

I can't find it in Martin Fowler's Refactoring book either, but this is indeed a real thing. It is usually called "replacing a switch with a map": an object (or dictionary) maps each input value to the corresponding output, sometimes with a default case as well. It is particularly popular in JavaScript, as well as Python. Your variant is slightly different in that you use an array rather than a map, but the two approaches are similar.
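A minimal Python sketch of the map-based variant (identifiers are illustrative):

```python
# The map plays the role of the switch's cases; dict.get plays
# the role of its default branch.
MONTHS = {
    1: "January", 2: "February", 3: "March", 4: "April",
    5: "May", 6: "June", 7: "July", 8: "August",
    9: "September", 10: "October", 11: "November", 12: "December",
}

def get_month_name(month_number):
    return MONTHS.get(month_number, "Invalid month number")
```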

Why is it not listed? The possible reasons I can think of:

  • The authors didn't think of it at the time, or the technique is not applicable in the languages they are most familiar with.
  • In terms of readability, it is not necessarily an improvement. With a switch, it is straightforward to see what is going on. With your array variant, it is less immediately clear, and there is a possibility of an off-by-one error.
  • This is not a basic refactoring, but something that could have a concrete negative impact, especially in terms of memory usage and performance. I wouldn't be surprised to discover that compilers have substantial optimizations for standard switch statements or long chains of conditionals, but cannot do much in the presence of a map.

In all cases, make sure to be very clear with your students about the limitations of this change. While it may sometimes make sense to do it in some languages such as JavaScript, things are quite different in, say, modern C#, where one would rather do the opposite change, moving towards pattern matching.

Notes on performance

TLDR: your refactoring, in the form you suggested, consistently leads to slower code. With additional optimizations it becomes faster than a switch in some languages, but not in others.

Details:

Regarding performance specifically, running some benchmarks may help to get an idea of how a switch performs compared to an array or a map.

C#

Here's the benchmark for C#:

using System;
using System.Collections.ObjectModel;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

BenchmarkRunner.Run<All>();

public class All
{
    [Benchmark] public void PatternMatchingCase() => WithPatternMatching(2);
    [Benchmark] public void CollectionCase() => WithCollection(2);

    public static string WithPatternMatching(int i) => i switch
    {
        1 => "January",
        2 => "February",
        3 => "March",
        4 => "April",
        5 => "May",
        6 => "June",
        7 => "July",
        8 => "August",
        9 => "September",
        10 => "October",
        11 => "November",
        12 => "December",
        _ => throw new ArgumentOutOfRangeException(),
    };

    public static string WithCollection(int i)
    {
        // Note: the collection is rebuilt on every call; this flaw is discussed below.
        var months = new Collection<string> { "January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December" };
        if (i < 1 || i > months.Count)
        {
            throw new ArgumentOutOfRangeException();
        }
        return months[i - 1];
    }
}

Output (lower = faster):

Method Mean Error StdDev
PatternMatching 0.7649 ns 0.0063 ns 0.0053 ns
Collection 147.1400 ns 0.5143 ns 0.4559 ns
Dictionary 208.0772 ns 1.4790 ns 1.3834 ns

Yes, your refactoring results in code that is 190 times slower. Quite a huge hit! Dictionary appears to perform even worse.

The benchmark is currently flawed, as it's not the lookup that takes time, but the initialization of the collection and the dictionary. Note that it is very easy to miss this point during the refactoring—you missed it, and so did I, when writing an earlier variant of my answer. Let's move the initialization outside the code being measured:

// ... code unchanged.
public class All
{
    [Benchmark] ... // Code unchanged.

    public static string WithPatternMatching(int i) => i switch
    {
        1 => ... // Code unchanged.
    };

    private static Collection<string> Months = ["January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December"];

    private static Dictionary<int, string> MonthsDictionary = new()
    {
        [1] = "January",
        [2] = "February",
        [3] = "March",
        [4] = "April",
        [5] = "May",
        [6] = "June",
        [7] = "July",
        [8] = "August",
        [9] = "September",
        [10] = "October",
        [11] = "November",
        [12] = "December",
    };

    public static string WithCollection(int i) => i < 1 || i > Months.Count ? "Invalid month number" : Months[i - 1];
    public static string WithDictionary(int i) => MonthsDictionary.GetValueOrDefault(i, "Invalid month number");
}

Output (lower = faster):

Method          Mean       Error      StdDev
PatternMatching 0.8646 ns  0.0148 ns  0.0138 ns
Collection      2.9836 ns  0.0143 ns  0.0127 ns
Dictionary      4.7731 ns  0.0158 ns  0.0140 ns

Collection is still 3.4 times slower, and dictionary is 5.5 times slower.

What if we test the edge case of a value being outside the range?

Method Mean Error StdDev
PatternMatching 1.156 ns 0.0127 ns 0.0119 ns
Collection 1.752 ns 0.0131 ns 0.0122 ns
Dictionary 3.395 ns 0.0593 ns 0.0495 ns

The switch is now noticeably slower than before (as it needs to traverse all the cases before falling back to the default one), while the other two approaches improved. They still trail, though, at 1.5 and 2.9 times slower respectively.

JavaScript

On the other hand, in JavaScript:

const Benchmark = require('benchmark');

function withSwitch(i) {
  switch (i) {
    case 1: return "January";
    case 2: return "February";
    case 3: return "March";
    case 4: return "April";
    case 5: return "May";
    case 6: return "June";
    case 7: return "July";
    case 8: return "August";
    case 9: return "September";
    case 10: return "October";
    case 11: return "November";
    case 12: return "December";
    default: return "Invalid month number";
  }
}

function withMap(i) {
  const months = {
    1: "January",
    2: "February",
    3: "March",
    4: "April",
    5: "May",
    6: "June",
    7: "July",
    8: "August",
    9: "September",
    10: "October",
    11: "November",
    12: "December",
  };
  return months[i] || "Invalid month number";
}

function withArray(i) {
  const months = ["January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December"];
  const outsideRange = i < 1 || i > months.length;
  return outsideRange ? "Invalid month number" : months[i - 1];
}

new Benchmark.Suite()
  .add('withSwitch', () => withSwitch(2))
  .add('withArray', () => withArray(2))
  .add('withMap', () => withMap(2))
  .on("cycle", event => console.log(event.target.toString()))
  .run({ 'async': false });

gives (higher = faster):

withSwitch x 171,580,930 ops/sec ±4.63% (85 runs sampled)
withArray x 166,302,420 ops/sec ±4.86% (82 runs sampled)
withMap x 163,659,964 ops/sec ±6.33% (82 runs sampled)

The switch appears to be a little bit faster, but only marginally.

The results are quite similar when using the value 13 instead of 2:

withSwitch x 162,796,741 ops/sec ±5.13% (80 runs sampled)
withArray x 144,900,250 ops/sec ±4.26% (82 runs sampled)
withMap x 162,223,673 ops/sec ±4.68% (81 runs sampled)

The switch and the map have the same performance now. The array is lagging behind, but not by a significant margin.

If the initialization of the array and the object is moved outside the functions, then, somehow, the implementations that use them become slower. This is quite unexpected to me!

const monthsMap = {
  1: "January",
  2: "February",
  3: "March",
  4: "April",
  5: "May",
  6: "June",
  7: "July",
  8: "August",
  9: "September",
  10: "October",
  11: "November",
  12: "December",
};

function withMap(i) {
  return monthsMap[i] || "Invalid month number";
}

const monthsArray = ["January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December"];

function withArray(i) {
  const outsideRange = i < 1 || i > monthsArray.length;
  return outsideRange ? "Invalid month number" : monthsArray[i - 1];
}

withSwitch x 174,634,716 ops/sec ±3.98% (84 runs sampled)
withArray x 142,424,333 ops/sec ±3.84% (78 runs sampled)
withMap x 148,519,403 ops/sec ±3.69% (82 runs sampled)

Python

What about Python?

import timeit

def withIfs(i):
    if i == 1: return "January"
    if i == 2: return "February"
    if i == 3: return "March"
    if i == 4: return "April"
    if i == 5: return "May"
    if i == 6: return "June"
    if i == 7: return "July"
    if i == 8: return "August"
    if i == 9: return "September"
    if i == 10: return "October"
    if i == 11: return "November"
    if i == 12: return "December"
    return "Invalid month number"

def withMatch(i):
    match i:
        case 1: return "January"
        case 2: return "February"
        case 3: return "March"
        case 4: return "April"
        case 5: return "May"
        case 6: return "June"
        case 7: return "July"
        case 8: return "August"
        case 9: return "September"
        case 10: return "October"
        case 11: return "November"
        case 12: return "December"
        case _: return "Invalid month number"

def withMap(i):
    return {
        1: "January",
        2: "February",
        3: "March",
        4: "April",
        5: "May",
        6: "June",
        7: "July",
        8: "August",
        9: "September",
        10: "October",
        11: "November",
        12: "December",
    }.get(i, "Invalid month number")

def withList(i):
    months = ["January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December"]
    # Caveat: negative i wraps around (months[-1] is "December"); a faithful
    # port of the original would also check i >= 1.
    try:
        return months[i - 1]
    except IndexError:
        return "Invalid month number"

count = 10_000_000
print(timeit.timeit(lambda: withIfs(2), number=count))
print(timeit.timeit(lambda: withMatch(2), number=count))
print(timeit.timeit(lambda: withMap(2), number=count))
print(timeit.timeit(lambda: withList(2), number=count))

gives (lower = faster):

Function Duration
withIfs(2) 0.5304072569999789
withMatch(2) 0.5419288069999766
withMap(2) 2.8250629760000265
withList(2) 0.7911251800000514

Your solution is therefore slightly slower (by a factor of 1.5); however, it is the map that is 5.3 times slower this time.

When tested with the value 13 instead of 2, the results are a bit different:

Function Duration
withIfs(13) 1.6137685459999602
withMatch(13) 1.6344169129999955
withMap(13) 2.7638338360000034
withList(13) 1.9673762900000042

The conditionals are still faster, but the gap has narrowed: the list is 1.2 times slower, and the map is only 1.7 times slower this time.

If the initialization is externalized, then the list appears to be the fastest, the map is a bit slower, and the conditionals are the slowest:

# Module-level tables (renamed to avoid shadowing the built-ins map and list).
months_map = {
    1: "January",
    2: "February",
    3: "March",
    4: "April",
    5: "May",
    6: "June",
    7: "July",
    8: "August",
    9: "September",
    10: "October",
    11: "November",
    12: "December",
}

def withMap(i):
    return months_map.get(i, "Invalid month number")

months_list = ["January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December"]

def withList(i):
    try:
        return months_list[i - 1]
    except IndexError:
        return "Invalid month number"

gives:

Function Duration
withIfs(2) 0.5488165699999854
withMatch(2) 0.5435621729999411
withMap(2) 0.5075932459999422
withList(2) 0.4404710260000684
answered Jul 29 at 23:14
25
  • 2
    This has an enormous performance gain on large keysets, actually. The conditional statement, subject to (unreliable) branch prediction, is replaced with a math operation (an array is often used instead of a map). Moreover, this is a prominent optimisation in C and C++ compilers for switches and multiple if statements. This is effectively a special case of jump table optimisation. Commented Jul 30 at 3:35
  • 2
    for speed you have to use low level arrays, not collections like lists and maps. Also, you have to avoid creating objects, unless there is a scalar optimization done by JIT. Commented Jul 30 at 9:43
  • 1
    And of course if you cache the dict and list outside of those functions, then they become drastically faster. Always at least as fast as if/match, and often faster (the bigger i is). Commented Jul 30 at 10:13
  • 3
    Putting array/map/dictionary definitions in functions/methods is pretty darned silly; putting the definitions in the object's "global" section (where they're defined once) seems pretty obvious. I'm old, though, and probably accused of premature optimization. Commented Jul 30 at 14:41
  • 5
    @RonJohn: exactly! But it's easy to miss, if you're thinking in terms of pure refactoring—take a method, rewrite it slightly, call it a day. Not surprisingly, the author of the question missed this aspect as well. And so did I when I was originally writing my answer. Commented Jul 30 at 15:28
4

Because if the values are fixed, that is, not expected to change or grow (which would have made polymorphism the better choice), then the better refactoring isn't an array. It's an enum.

By defining every value in the same file, you've given up the ability to add more values without having to edit existing, tested files. See the Open/Closed Principle for why you'd care.

If you're that confident that you're not going to be adding the month of Smarch, then an enum has the bonus of making an invalid value a compile-time error. Much better than a run-time error message.
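A sketch of the enum variant using Python's enum module (in Python the invalid value surfaces as a runtime ValueError rather than a compile-time error; the compile-time benefit applies to statically typed languages):

```python
from enum import Enum

class Month(Enum):
    JANUARY = 1
    FEBRUARY = 2
    MARCH = 3
    APRIL = 4
    MAY = 5
    JUNE = 6
    JULY = 7
    AUGUST = 8
    SEPTEMBER = 9
    OCTOBER = 10
    NOVEMBER = 11
    DECEMBER = 12

def get_month_name(month):
    # The enum member carries its own name; no table or conditional needed.
    return month.name.capitalize()
```

Here Month(13) raises ValueError immediately, instead of silently returning an error string deep inside the lookup.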

Now, why isn't there a "replace conditional with enum" refactoring? Well, because books can only be so big.

answered Jul 30 at 4:48
4
  • 2
    If the aim of the function is to produce a user-facing string representation then, even if you replace the raw integers with an enum, you'll still need some sort of lookup table or switch statement to convert the enums to strings (assuming you're not using a language that provides enum->string conversion automatically). Commented Jul 30 at 12:23
  • @SeanBurton If you have enum values as objects (like e.g. Java), they can have the name as a property. Commented Jul 31 at 23:11
  • Polymorphism would be better when the values are expected to grow/change? Can you elaborate please? On first read, that seems like the absolute wrong tool for the job. Commented Aug 1 at 6:15
  • It is. When the values don’t change. OCP explains this fairly well. Commented Aug 1 at 17:18
1

Because it doesn't work for the general case.

The requirement for this to work is that you have a key/value pair of some kind.

Once you recognise that, your conditional is just the implementation of the lookup function over that list of values. This makes the refactoring question much more complicated, because instead of a generic justification like "moving this into a private method makes the code more readable", you are comparing the time and space complexity of various hash functions for small data sets, and the switch might well be the most optimal.

If you further simplify your code you can get rid of the entire function and just have monthNames[monthNumber] in the calling code.
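As a sketch of that simplification (Python for illustration, with the 1-based adjustment made explicit; month_number here stands in for input the caller has already validated):

```python
MONTH_NAMES = ["January", "February", "March", "April", "May", "June",
               "July", "August", "September", "October", "November", "December"]

month_number = 3  # example input, assumed already validated by the caller

# The call site indexes the table directly instead of calling a function:
month_name = MONTH_NAMES[month_number - 1]
```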

answered Jul 29 at 23:55
3
  • 7
    No refactorings are applicable "in general" Commented Jul 30 at 3:24
  • @Basilevs but the question is not what refactoring is, but why this particular change is not a "standard" refactoring, which this post answers in my view: the case where this refactoring is applicable is far less common than for other refactorings, so it is not included in the "standard" ones... Commented Jul 31 at 0:33
  • @Basilevs sonarqube disagrees with you Commented Jul 31 at 2:12
1

You could first replace a sequence of "if" statements with a switch. This will often give you the fastest and most compact code possible.

Then you can replace the "switch" with an array, which requires more complex code. If the array is small, you might instead go back to a sequence of ifs, or try a chain of ternary expressions. Each of these changes is a refactoring (and the definition of refactoring only says that it replaces code with different but equivalent code, not that it is a good idea).

I’d just present the different ways the action can be written in code, along with their capabilities and pros/cons. For example, the array approach has problems if you want abbreviated names for months, month names in different languages, or specific text for this/next/previous month, although the "switch" doesn’t handle those well either.
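To make the localisation point concrete, here is a hypothetical sketch (truncated tables, invented names) of what the flat array turns into once language enters the picture:

```python
# Month names per language; each row truncated to three entries for brevity.
MONTH_NAMES = {
    "en": ["January", "February", "March"],
    "fr": ["janvier", "février", "mars"],
}

def get_month_name(lang, month_number):
    names = MONTH_NAMES.get(lang)
    if names is None or not (1 <= month_number <= len(names)):
        return "Invalid month"
    return names[month_number - 1]
```

The equivalent switch would need a case per (language, month) pair, which is where the table starts to pay off.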

answered Jul 30 at 9:21
0

It is not a standard refactoring because in many cases it would improve neither the performance nor the readability of the code. It therefore generally has less value, and in a high-functioning team of developers it is a candidate to be refactored back to conditional logic.

Specifically for your example, the classic optimisation is to use an enum, and languages like JavaScript that do not support enums have guidelines and workarounds to implement them using object maps, similar to your example:

const monthNames = Object.freeze({
  January: 1,
  February: 2,
  March: 3,
  April: 4,
  May: 5,
  June: 6,
  July: 7,
  August: 8,
  September: 9,
  October: 10,
  November: 11,
  December: 12,
});

...

function getMonthName(monthNumber) {
  const index = Object.values(monthNames).indexOf(monthNumber);
  if (index < 0) {
    return "Invalid month number";
  } else {
    return Object.keys(monthNames)[index];
  }
}

Enums provide a structure that you can query like a table, but the code can still describe and constrain the data in a meaningful way.

If your conditions are simple static value comparisons, then enums or tables can be made to work, but this is not likely to result in faster execution. For complex conditional logic to be expressed as a table, you would need to build a matrix of all the possible outcomes. In CS we use truth tables for this purpose; ironically, conditional logic is the developer's tool for expressing truth-table logic in an optimal way.

The reverse of your proposal is more often true: it is standard to refactor a lookup of a static table of expressions into pattern matching or other conditional logic.

  • In a general sense then, using a table to replicate conditional logic is an anti-pattern.

From a code-maintenance point of view, the problem with tables or arrays in place of conditional expressions is that they abstract away the intent and generally make the code harder to read, and therefore harder to maintain. In most cases we optimise the code for performance or to reduce the memory footprint, which means the definition and initialization of the table is usually moved to a different part of the program, whether via a memoization pattern, a singleton, a static field, or simply the constructor of the class.

Once the definition of the table has been moved away from the implementation of your logic, it is very easy to imagine other developers (or future you) adding records to that table, with unintended consequences for your code.

There are use cases where you can justify it, but as a general rule it is an anti-pattern. You haven't gained any significant value through this refactoring, but you have introduced greater complexity, especially now that you have mapped a 1-based domain value to a 0-based array index. It is a simple enough mental step, but if the user enters a value of 5 and you need to change the response, you must now reverse-engineer the logic to determine that '5' corresponds to the 5th element, which is counter-intuitive in 0-based programming languages. And if a developer inserted a new element, whether intentionally or by forgetting to remove the element it was meant to replace, that shifts all of the other values, and there are no compile-time checks to detect this.

answered Aug 14 at 0:08
-1

The catalog is not a catalog of all known refactorings. A vast number of refactorings exist, but only a small set of fundamental ones was selected by Fowler for his catalog. (And it is based on a book, so there is a practical limit on how many refactorings can be included.)

This is not a question of whether it is a "good" or "bad" refactoring. A refactoring is neutral in itself and is only good insofar as it brings you closer to a stated goal. If, for example, your goal is to support multiple translations of the month names, the "replace conditional with table" refactoring is good, since it brings you a step closer to that goal. But the refactoring makes the code more complicated (e.g. by introducing the monthNumber - 1 calculation), so you should have a goal which justifies this.

The refactorings in the catalog are all very broadly applicable, but the "replace conditional with table" refactoring applies only when some very specific conditions are fulfilled:

  • each branch in the conditional checks for a single constant value
  • each branch returns a single result and does nothing else
  • the values are integers
  • all values in the given range have a result

It is just not that common to have this particular scenario, compared to the refactorings in the catalog, which are much more general.

Furthermore, the refactoring is not "atomic". It should be split into at least two refactorings:

  • replace conditional with select-case
  • replace switch-case with lookup table

Each of these is simpler and more general than the compound refactoring. A switch-case is an appropriate intermediate step because it is closer to lookup-table semantics.

answered Aug 14 at 6:37
3
  • A branch can check for multiple values - table rows are not necessarily unique. Fix first condition. Commented Aug 14 at 7:30
  • For floating-point keys (or anything else supporting comparison) there is a variant with a sorted tree, which supports ranges instead of single values. Commented Aug 14 at 7:37
  • Result can be a default or replacement value, handled by a dedicated code just like switch does. A sparse range of keys does not preclude optimization, unless it is too wide. I would suggest to replace the condition "all keys should have value" with "key range should be reasonably wide". Commented Aug 14 at 9:49
