I'm preparing a lecture where I will start with a many-branched conditional statement and replace it with a table. For example, I could start with:
function getMonthName(monthNumber) {
  if (monthNumber === 1) return "January";
  else if (monthNumber === 2) return "February";
  else if (monthNumber === 3) return "March";
  ...
}
and replace it with:
function getMonthName(monthNumber) {
  const monthNames = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December"
  ];
  if (monthNumber < 1 || monthNumber > monthNames.length) {
    return "Invalid month number";
  } else {
    return monthNames[monthNumber - 1];
  }
}
I went to look up what this refactoring was called but couldn't find it in the conditionals section of Martin Fowler's Refactoring.com or at refactoring.guru.
Is there a good reason it doesn't appear in the refactoring catalog, or is there somewhere else I should be looking?
Edit: I use Fowler's definition of refactoring:
Refactoring (noun): a change made to the internal structure of software to make it easier to understand and cheaper to modify without changing its observable behavior.
My question is not about which approach is more efficient.
7 Answers
You are describing the Lookup Table optimization. It is likely missing from "refactoring" lists because, when used with arrays over large key sets in performance-oriented languages, it gives such an obvious performance boost that there was no reason not to apply it (making it an "optimization" rather than a "refactoring").
However, using it with less efficient dictionaries and in higher-level languages makes it a stylistic choice, and therefore a refactoring.
So just call it the Lookup Table refactoring and mention its origin as an optimization technique.
Some optimization notes:
- conditionals are more efficient on small key sets
- complex data structures like hashmaps are less performant than arrays on small key sets
- modern CPU pipelines and branch prediction make this optimization necessary (as in, omitting it would be detrimental to performance)
- multiple questions on Stack Overflow cover lookup tables
- modern C++ compilers compile switches to jump tables by default, which are less performant than a direct table lookup (see the dispatch-table sketch below)
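Beyond returning plain values, the same table idea applies when each branch performs an action: the conditional becomes a table of functions, essentially the in-code counterpart of the compiler jump tables mentioned in the last note. A minimal sketch in TypeScript, with made-up command names (start/stop/pause are purely illustrative):
// Hypothetical command names, used only to illustrate a dispatch (jump) table.
const handlers: Record<string, () => string> = {
  start: () => "starting",
  stop: () => "stopping",
  pause: () => "pausing",
};

function dispatch(command: string): string {
  const handler = handlers[command];
  // The missing-key check plays the role of the conditional's final else branch.
  return handler ? handler() : `unknown command: ${command}`;
}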
- It seems to me that the main motivation for this would be flexibility. You can load the mappings at startup or even modify them dynamically, for example if you need to map the names of the months in different languages. The performance aspect seems like a distraction unless you've analyzed the performance and found the mapping to be a hotspot. – JimmyJames, Jul 30 at 16:16
- @JimmyJames the context is "refactoring"; configuration loading would be a new feature instead. – Basilevs, Jul 30 at 16:18
- @Basilevs True, to a large degree. But if you had this feature implemented as, e.g., a two-layered conditional (language, then month) or using polymorphism, I think this kind of thing is preferable simply because it eliminates a lot of unnecessary code. – JimmyJames, Jul 30 at 16:23
- Aha! This is the reason for your excellent question on C++ lookup optimisation on SO ;-) Excellent answer btw. – Christophe, Jul 31 at 15:44
Nor can I find it in Martin Fowler's Refactoring book, but this is indeed a real thing. It is usually called "replacing a switch with a map," and is based on an object (or dictionary) in which each input value is mapped to the corresponding output, sometimes with a default case set as well. It is particularly popular in JavaScript, as well as Python. Your variant is slightly different, as you use an array rather than a map, but both approaches are very similar.
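Since the question only shows the array form, here is a minimal sketch of what the map variant could look like, keeping the question's getMonthName contract (written in TypeScript; drop the type annotations for plain JavaScript):
// A minimal sketch of the "switch/conditional replaced by a map" form described above.
const monthNamesByNumber: Record<number, string> = {
  1: "January", 2: "February", 3: "March", 4: "April",
  5: "May", 6: "June", 7: "July", 8: "August",
  9: "September", 10: "October", 11: "November", 12: "December",
};

function getMonthName(monthNumber: number): string {
  // A missing key falls back to the same error string as in the question.
  return monthNamesByNumber[monthNumber] ?? "Invalid month number";
}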
Why is it not listed? The possible reasons I can think of:
- The authors didn't think about it back then, or the technique is not applicable in the languages they are most familiar with.
- In terms of readability, it is not necessarily an improvement. With a switch, it's straightforward to see what's going on. With your proposal of an array, it is not immediately clear what's going on, and there is a possibility of an off-by-one error.
- This is not a basic refactoring, but something that could have a concrete negative impact, especially in terms of memory usage and performance. I wouldn't be surprised to discover that compilers have substantial optimizations for standard switch statements or long chains of conditionals, but cannot do much in the presence of a map.
In all cases, make sure to be very clear with your students about the limitations of this change. While it may make sense in some languages such as JavaScript, things are quite different in, say, modern C#, where one would rather make the opposite change and move towards pattern matching.
Notes on performance
TLDR: your refactoring, in the form you suggested, consistently leads to slower code. Additional optimizations make it faster, compared to a switch, in some languages, but not in others.
Details:
Regarding performance specifically, running some benchmarks may help to get an idea of how a switch performs compared to an array or a map.
C#
Here's the benchmark for C#:
using System;
using System.Collections.ObjectModel;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

BenchmarkRunner.Run<All>();

public class All
{
    [Benchmark] public void PatternMatchingCase() => WithPatternMatching(2);
    [Benchmark] public void CollectionCase() => WithCollection(2);

    public static string WithPatternMatching(int i) => i switch
    {
        1 => "January",
        2 => "February",
        3 => "March",
        4 => "April",
        5 => "May",
        6 => "June",
        7 => "July",
        8 => "August",
        9 => "September",
        10 => "October",
        11 => "November",
        12 => "December",
        _ => throw new ArgumentOutOfRangeException(),
    };

    public static string WithCollection(int i)
    {
        var months = new Collection<string> { "January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December" };
        if (i < 1 || i > months.Count)
        {
            throw new ArgumentOutOfRangeException();
        }
        return months[i - 1];
    }
}
Output (lower = faster):
Method | Mean | Error | StdDev |
---|---|---|---|
PatternMatching | 0.7649 ns | 0.0063 ns | 0.0053 ns |
Collection | 147.1400 ns | 0.5143 ns | 0.4559 ns |
Dictionary | 208.0772 ns | 1.4790 ns | 1.3834 ns |
Yes, your refactoring results in code that is roughly 190 times slower. Quite a huge hit! A dictionary-based variant (not shown above) appears to perform even worse.
The benchmark is currently flawed, as it's not the lookup that takes time, but the initialization of the collection and the dictionary. Note that it is very easy to miss this point during the refactoring—you missed it, and so did I, when writing an earlier variant of my answer. Let's move the initialization outside the code being measured:
// ... code unchanged.
public class All
{
    [Benchmark] ... // Code unchanged.

    public static string WithPatternMatching(int i) => i switch
    {
        1 => ... // Code unchanged.
    };

    private static Collection<string> Months = ["January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December"];

    private static Dictionary<int, string> MonthsDictionary = new()
    {
        [1] = "January",
        [2] = "February",
        [3] = "March",
        [4] = "April",
        [5] = "May",
        [6] = "June",
        [7] = "July",
        [8] = "August",
        [9] = "September",
        [10] = "October",
        [11] = "November",
        [12] = "December",
    };

    public static string WithCollection(int i) => i < 1 || i > Months.Count ? "Invalid month number" : Months[i - 1];
    public static string WithDictionary(int i) => MonthsDictionary.GetValueOrDefault(i, "Invalid month number");
}
Method | Mean | Error | StdDev |
---|---|---|---|
PatternMatching | 0.8646 ns | 0.0148 ns | 0.0138 ns |
Collection | 2.9836 ns | 0.0143 ns | 0.0127 ns |
Dictionary | 4.7731 ns | 0.0158 ns | 0.0140 ns |
Collection is still 3.4 times slower, and dictionary is 5.5 times slower.
What if we test the edge case of a value being outside the range?
Method | Mean | Error | StdDev |
---|---|---|---|
PatternMatching | 1.156 ns | 0.0127 ns | 0.0119 ns |
Collection | 1.752 ns | 0.0131 ns | 0.0122 ns |
Dictionary | 3.395 ns | 0.0593 ns | 0.0495 ns |
The switch is now much slower than before, as it needs to traverse all the cases before falling back to the default one, while the other two approaches remain in the same ballpark. They are still behind, though, at 1.5 and 2.9 times slower than the switch.
JavaScript
On the other hand, in JavaScript:
const Benchmark = require('benchmark');

function withSwitch(i) {
  switch (i) {
    case 1: return "January";
    case 2: return "February";
    case 3: return "March";
    case 4: return "April";
    case 5: return "May";
    case 6: return "June";
    case 7: return "July";
    case 8: return "August";
    case 9: return "September";
    case 10: return "October";
    case 11: return "November";
    case 12: return "December";
    default: return "Invalid month number";
  }
}

function withMap(i) {
  const months = {
    1: "January",
    2: "February",
    3: "March",
    4: "April",
    5: "May",
    6: "June",
    7: "July",
    8: "August",
    9: "September",
    10: "October",
    11: "November",
    12: "December",
  };
  return months[i] || "Invalid month number";
}

function withArray(i) {
  const months = ["January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December"];
  const outsideRange = i < 1 || i > months.length;
  return outsideRange ? "Invalid month number" : months[i - 1];
}

new Benchmark.Suite()
  .add('withSwitch', () => withSwitch(2))
  .add('withArray', () => withArray(2))
  .add('withMap', () => withMap(2))
  .on("cycle", event => console.log(event.target.toString()))
  .run({ 'async': false });
gives (higher = faster):
withSwitch x 171,580,930 ops/sec ±4.63% (85 runs sampled)
withArray x 166,302,420 ops/sec ±4.86% (82 runs sampled)
withMap x 163,659,964 ops/sec ±6.33% (82 runs sampled)
The switch appears to be a little bit faster, but only marginally.
The results are quite similar when using the value 13 instead of 2:
withSwitch x 162,796,741 ops/sec ±5.13% (80 runs sampled)
withArray x 144,900,250 ops/sec ±4.26% (82 runs sampled)
withMap x 162,223,673 ops/sec ±4.68% (81 runs sampled)
The switch and the map have the same performance now. The array is lagging behind, but not by a significant margin.
If the initialization of the array and the object are moved outside, then, somehow, the implementations that use them become slower. This is quite unexpected for me!
monthsMap = {
  1: "January",
  2: "February",
  3: "March",
  4: "April",
  5: "May",
  6: "June",
  7: "July",
  8: "August",
  9: "September",
  10: "October",
  11: "November",
  12: "December",
};

function withMap(i) {
  return monthsMap[i] || "Invalid month number";
}

monthsArray = ["January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December"];

function withArray(i) {
  const outsideRange = i < 1 || i > monthsArray.length;
  return outsideRange ? "Invalid month number" : monthsArray[i - 1];
}
withSwitch x 174,634,716 ops/sec ±3.98% (84 runs sampled)
withArray x 142,424,333 ops/sec ±3.84% (78 runs sampled)
withMap x 148,519,403 ops/sec ±3.69% (82 runs sampled)
Python
What about Python?
import timeit

def withIfs(i):
    if i == 1: return "January"
    if i == 2: return "February"
    if i == 3: return "March"
    if i == 4: return "April"
    if i == 5: return "May"
    if i == 6: return "June"
    if i == 7: return "July"
    if i == 8: return "August"
    if i == 9: return "September"
    if i == 10: return "October"
    if i == 11: return "November"
    if i == 12: return "December"
    return "Invalid month number"

def withMatch(i):
    match i:
        case 1: return "January"
        case 2: return "February"
        case 3: return "March"
        case 4: return "April"
        case 5: return "May"
        case 6: return "June"
        case 7: return "July"
        case 8: return "August"
        case 9: return "September"
        case 10: return "October"
        case 11: return "November"
        case 12: return "December"
        case _: return "Invalid month number"

def withMap(i):
    return {
        1: "January",
        2: "February",
        3: "March",
        4: "April",
        5: "May",
        6: "June",
        7: "July",
        8: "August",
        9: "September",
        10: "October",
        11: "November",
        12: "December",
    }.get(i, "Invalid month number")

def withList(i):
    months = ["January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December"]
    try:
        return months[i - 1]
    except IndexError:
        return "Invalid month number"

count = 10_000_000
print(timeit.timeit(lambda: withIfs(2), number=count))
print(timeit.timeit(lambda: withMatch(2), number=count))
print(timeit.timeit(lambda: withMap(2), number=count))
print(timeit.timeit(lambda: withList(2), number=count))
gives (lower = faster):
Function | Duration |
---|---|
withIfs(2) | 0.5304072569999789 |
withMatch(2) | 0.5419288069999766 |
withMap(2) | 2.8250629760000265 |
withList(2) | 0.7911251800000514 |
Your solution is therefore slightly slower (by a factor of 1.5); however, it's the map that appears to be much slower this time, by a factor of 5.3.
When tested with the value 13 instead of 2, the results are a bit different:
Function | Duration |
---|---|
withIfs(13) | 1.6137685459999602 |
withMatch(13) | 1.6344169129999955 |
withMap(13) | 2.7638338360000034 |
withList(13) | 1.9673762900000042 |
The conditionals are still faster, but only marginally: list is 1.2 times slower, and map is only 1.7 times slower this time.
If the initialization is externalized, then the list appears to be the fastest, the map is a bit slower, and the conditionals are the slowest:
map = {
    1: "January",
    2: "February",
    3: "March",
    4: "April",
    5: "May",
    6: "June",
    7: "July",
    8: "August",
    9: "September",
    10: "October",
    11: "November",
    12: "December",
}

def withMap(i):
    return map.get(i, "Invalid month number")

list = ["January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December"]

def withList(i):
    try:
        return list[i - 1]
    except IndexError:
        return "Invalid month number"
gives:
Function | Duration |
---|---|
withIfs(2) | 0.5488165699999854 |
withMatch(2) | 0.5435621729999411 |
withMap(2) | 0.5075932459999422 |
withList(2) | 0.4404710260000684 |
- This has an enormous performance gain on large key sets, actually. The conditional statement, subject to (unreliable) branch prediction, is replaced with a math operation (instead of a map, an array is often used). Moreover, this is a prominent optimisation in C and C++ compilers for switches and multiple if statements. This is effectively a special case of the jump table optimisation. – Basilevs, Jul 30 at 3:35
- For speed you have to use low-level arrays, not collections like lists and maps. Also, you have to avoid creating objects, unless there is a scalar optimization done by the JIT. – Basilevs, Jul 30 at 9:43
- And of course, if you cache the dict and list outside of those functions, then they become drastically faster. Always at least as fast as if/match, and often faster (the bigger i is). – freakish, Jul 30 at 10:13
- Putting array/map/dictionary definitions in functions/methods is pretty darned silly; putting the definitions in the object's "global" section (where they're defined once) seems pretty obvious. I'm old, though, and probably accused of premature optimization. – RonJohn, Jul 30 at 14:41
- @RonJohn: exactly! But it's easy to miss if you're thinking in terms of pure refactoring: take a method, rewrite it slightly, call it a day. Not surprisingly, the author of the question missed this aspect as well, and so did I when I was originally writing my answer. – Arseni Mourzenko, Jul 30 at 15:28
Because if the values are fixed, that is, not expected to change and grow (which would have made polymorphism the better choice), then the better refactoring isn't an array. It's an Enum.
By defining every value in the same file, you've given up the ability to add more values without having to edit existing, tested files. See the Open Closed Principle to understand why you'd care.
If you're that confident that you're not going to be adding the month of Smarch, then an Enum has the bonus of making an invalid value a compile-time error. Much better than run-time error messages.
Now, why isn't there a "replace conditional with Enum" refactoring? Well, because books can only be so big.
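To make the suggestion concrete for the lecture: the question's JavaScript has no native enums, so here is a minimal sketch in TypeScript, where the enum type itself rules out invalid month numbers at compile time (names and usage are illustrative only):
// With a numeric enum, the set of valid values is closed and checked at compile time.
enum Month {
  January = 1, February, March, April, May, June,
  July, August, September, October, November, December,
}

function getMonthName(month: Month): string {
  // Numeric enums in TypeScript provide a reverse mapping from value to name.
  return Month[month];
}

getMonthName(Month.February); // "February"
// getMonthName(13);          // rejected by the compiler (in recent TypeScript versions),
//                            // instead of producing "Invalid month number" at run time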
- If the aim of the function is to produce a user-facing string representation then, even if you replace the raw integers with an enum, you'll still need some sort of lookup table or switch statement to convert the enums to strings (assuming you're not using a language that provides enum-to-string conversion automatically). – Sean Burton, Jul 30 at 12:23
- @SeanBurton If you have enum values as objects (as in, e.g., Java), they can have the name as a property. – Paŭlo Ebermann, Jul 31 at 23:11
- Polymorphism would be better when the values are expected to grow/change? Can you elaborate, please? On first read, that seems like the absolute wrong tool for the job. – infinitezero, Aug 1 at 6:15
- It is. When the values don't change. OCP explains this fairly well. – candied_orange, Aug 1 at 17:18
Because it doesn't work for the general case.
The requirement for this to work is that you have a key/value pair of some kind.
Once you recognise that, your conditional is just the implementation of the lookup function on that list of values. This makes the refactoring question much more complicated, because instead of a generic justification like "moving this into a private method makes the code more readable", you are comparing the time and space complexity of various hash functions for small data sets, and the switch might well be the optimal choice.
If you simplify your code further, you can get rid of the entire function and just have monthNames[monthNumber] in the calling code.
- No refactorings are applicable "in general". – Basilevs, Jul 30 at 3:24
- @Basilevs But the question is not what refactoring is; it is why this particular change is not a "standard" refactoring, which this post answers in my view: the cases where this refactoring is applicable are far less common than for other refactorings, thus it is not included in the "standard" ones... – Alexei Levenkov, Jul 31 at 0:33
You could first replace a sequence of "if" statements with a switch. This will often give you the fastest and most compact code possible.
Then you can replace a "switch" with an array, which uses more complex code. And if the array is small, you might replace the array with a sequence of if’s. And in addition, you might want to look at a sequence of ternary expressions. Each of those changes is a refactoring (and the definition of refactoring only tells you that it replaces code with different but equivalent code, not that it is a good idea).
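For completeness, the ternary-expression variant mentioned above could look roughly like this (a sketch in TypeScript, with most of the months elided for brevity):
// A chain of ternary expressions; equivalent in behaviour to the if/else chain.
const getMonthName = (n: number): string =>
  n === 1 ? "January"
  : n === 2 ? "February"
  : n === 3 ? "March"
  // ... cases 4 through 11 elided ...
  : n === 12 ? "December"
  : "Invalid month number";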
I'd just present the different ways the action can be written in code, along with their capabilities and pros/cons. For example, in this case the array approach has problems if you want abbreviated month names, month names in different languages, or specific text for this/next/previous month, cases where the "switch" doesn't work well either.
It is not a standard refactoring because in many cases it would not improve the performance or readability of the code, so it generally has less value, and in a high-functioning team of developers it is a candidate to be refactored back to conditional logic.
Specifically with your example, the CS optimisation is to use an Enum, and languages like JavaScript that do not support enums have guidelines or workarounds to implement them using object maps similar to your example:
const monthNames = Object.freeze({
  January: 1,
  February: 2,
  March: 3,
  April: 4,
  May: 5,
  June: 6,
  July: 7,
  August: 8,
  September: 9,
  October: 10,
  November: 11,
  December: 12,
});

...

function getMonthName(monthNumber) {
  var index = Object.values(monthNames).indexOf(monthNumber);
  if (index < 0) {
    return "Invalid month number";
  } else {
    return Object.keys(monthNames)[index];
  }
}
Enums provide a structure that you can query like a table, but the code can still describe and constrain the data in a meaningful way.
If your conditions are simple static value comparisons, then enums or tables could be made to work, but this is not likely to result in faster execution. For complex conditional logic to be expressed as a table, you would need to build a matrix of all the possible outcomes. In CS we use truth tables for this purpose; ironically, conditional logic is the developer's tool for expressing truth-table logic in an optimal way.
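To illustrate the point, here is a hypothetical sketch (TypeScript, with made-up shipping rules): the table form has to enumerate every combination of inputs, while the conditional form can group cases and state the rule directly.
// Hypothetical example: two inputs expressed as a full "truth table" of outcomes.
const shippingCost: Record<string, number> = {
  "express,international": 50,
  "express,domestic": 20,
  "standard,international": 25,
  "standard,domestic": 5,
};

function costAsTable(speed: "express" | "standard", dest: "international" | "domestic"): number {
  return shippingCost[`${speed},${dest}`];
}

// The equivalent conditional can collapse cases and express the pricing rule directly.
function costAsConditional(speed: "express" | "standard", dest: "international" | "domestic"): number {
  if (speed === "express") return dest === "international" ? 50 : 20;
  return dest === "international" ? 25 : 5;
}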
The reverse of your proposal is more often true: it is standard to refactor a lookup of a static table of expressions into pattern matching or other conditional logic. In a general sense, then, using a table to replicate conditional logic is an anti-pattern.
From a code maintenance point of view, the problem with tables or arrays in place of conditional expressions is that they abstract away the intent and generally make the code harder to read, and therefore harder to maintain. In most cases we optimise the code for performance or to reduce the memory footprint, which means the definition and initialization of the table is usually moved to a different area of the process, whether that be a Memoize pattern, a Singleton, a static, or just the constructor of the class...
Once the definition of your table has been moved away from the implementation of your logic, it is very easy to imagine other developers (or future you) adding records to that table, and that may have unintended consequences for your code.
There are use cases where you can justify it, but as a general rule it is an anti-pattern. You haven't gained any significant value through this refactoring, but you have introduced greater complexity, especially now that a 1-based domain value is mapped to a 0-based array index. It is a simple enough brain cycle to compute, but if the user enters a value of 5 and you need to change the response, you must now reverse-engineer the logic to determine that 5 corresponds to the 5th element, which is counter-intuitive in 0-based programming languages... And if a developer inserts a new element, whether intentionally or by accidentally forgetting to remove the element that should have been replaced, that shifts all of the other values, and there are no compile-time checks to detect it.
The catalog is not a catalog of all known refactorings. A vast number of refactorings exist, but only a small set of fundamental refactorings was selected by Fowler for his catalog. (And it is based on a book, so there is a practical limit on how many refactorings can be included.)
This is not a question of whether it is a "good" or "bad" refactoring. A refactoring is neutral in itself and only good insofar as it brings you closer to a stated goal. If, for example, your goal is to support multiple translations of the month names, the "replace conditional with table" refactoring is good, since it brings you a step closer to that goal. But the refactoring makes the code more complicated (e.g. by introducing the monthNumber - 1 calculation), so you should have a goal which justifies this.
The refactorings in the catalog are all very broadly applicable, but the "replace conditional with table" refactoring can only apply when some very specific conditions are fulfilled:
- each branch in the conditional checks for a single constant value
- each branch returns a single result and does nothing else
- the values are integers
- all values in the given range have a result
It is just not that often you have this particular scenario, compared to the refactorings in the catalog which are much more general.
Furthermore, the refactoring is not "atomic". It should be split into at least two refactorings:
- replace conditional with switch-case
- replace switch-case with lookup table
Each of these is simpler and more general than the compound refactoring. Switch-case is an appropriate intermediate step because it is closer to the lookup-table semantics.
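A rough sketch of the two steps applied to the question's example (TypeScript); the intermediate switch version exists only as a stepping stone:
// Step 1: replace the if/else chain with a switch-case (same shape, explicit cases).
function getMonthNameSwitch(n: number): string {
  switch (n) {
    case 1: return "January";
    case 2: return "February";
    // ... cases 3 through 11 elided ...
    case 12: return "December";
    default: return "Invalid month number";
  }
}

// Step 2: replace the switch-case with a lookup table.
const MONTH_NAMES = [
  "January", "February", "March", "April", "May", "June",
  "July", "August", "September", "October", "November", "December",
];

function getMonthNameTable(n: number): string {
  return n >= 1 && n <= MONTH_NAMES.length ? MONTH_NAMES[n - 1] : "Invalid month number";
}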
- A branch can check for multiple values; table rows are not necessarily unique. Fix the first condition. – Basilevs, Aug 14 at 7:30
- For floating-point keys (or anything else supporting comparison) there is a variant with a sorted tree that supports ranges instead of single values. – Basilevs, Aug 14 at 7:37
- The result can be a default or replacement value, handled by dedicated code just like a switch does. A sparse range of keys does not preclude the optimization, unless it is too wide. I would suggest replacing the condition "all keys should have a value" with "the key range should be reasonably wide". – Basilevs, Aug 14 at 9:49