I'm looking for ways to improve the performance of this code. Using numba
helps performance, but I'm not sure how to further improve it. Any (relatively) simple algorithmic improvements (or just any optimizations) I can make would be appreciated.
The typical range for end values will be between 1k-10k, start values are generally between 1-100.
'''
This program calculates the optimal number of upgrades to purchase
given a certain number of points to spend on upgrades
Upgrades cost:
- pennies: 1
- dimes: 3
- dollars: 6
- gold: 15
The upgrade bonus can be calculated as:
pennies + pennies * dimes + pennies * dimes * dollars
+ pennies * dimes * dollars * gold
or alternatively:
pennies * (1 + dimes * (1 + dollars * (1 + gold)))
'''
from numba import jit
import time
'''
Input: x is the number of points to spend
Output: returns a tuple containing
(pennies, dimes, dollars, gold, total bonus)
'''
@jit(nopython=True)
def get_max_score(x):
last_max_score = -1
last_max_combo = (0, 0, 0, 0, 0)
for pennies in range(x + 1):
points_left = x - pennies
max_dimes = points_left // 3
for dimes in range(max_dimes + 1):
points_left = x - pennies - dimes * 3
max_dollars = points_left // 6
for dollars in range(max_dollars + 1):
points_left = (x - pennies - dimes * 3 - dollars * 6)
gold = points_left // 15
if gold >= 0:
score = pennies * (1 + dimes * (1 + dollars * (1 + gold)))
if score >= last_max_score:
last_max_combo = (pennies, dimes, dollars, gold, score)
last_max_score = score
return last_max_combo
start_points = int(input("Start points: "))
end_points = int(input("End points: "))
time_start = time.perf_counter()
for y in range(start_points, end_points + 1):
print(f"start score: {y}, {get_max_score(y)}")
time_end = time.perf_counter()
print(f"Computation took {time_end - time_start}s")
-
\$\begingroup\$ It is a little difficult to understand what this code does. Can you edit your post to include some example function calls and the results you get? It kind of feels like the same problem as calculating change and choosing the fewest number of coins based on their values. But I can't be sure unless I see some sample input and output. \$\endgroup\$Greg Burghardt– Greg Burghardt2024年10月02日 23:07:15 +00:00Commented Oct 2, 2024 at 23:07
-
3\$\begingroup\$ Can you explain a little more about why the upgrade bonus is the way it is? Multiplying the number of dollars, dimes and pennies is very strange. \$\endgroup\$Reinderien– Reinderien2024年10月02日 23:21:51 +00:00Commented Oct 2, 2024 at 23:21
-
\$\begingroup\$ Could you please explain more clearly what your code does? \$\endgroup\$Luke L– Luke L2024年10月03日 00:37:03 +00:00Commented Oct 3, 2024 at 0:37
-
2\$\begingroup\$ @Reinderien (Took me far too long to install some graphing software... but matplotlib just seems to be the best anyway...) I've looked at the graph of x * (1 + y) where x + 3*y < 1000 the graph is a lot less linear than I had thought and LP is probably the worst option to use here. As such I have deleted the comments created by my terrible suggestion. \$\endgroup\$Peilonrayz– Peilonrayz ♦2024年10月03日 02:07:46 +00:00Commented Oct 3, 2024 at 2:07
-
\$\begingroup\$ @Peilonrayz Take it easy on yourself! It's a natural first step to assess linearity, and indeed if this were a linear problem there would be much more efficient solutions. \$\endgroup\$Reinderien– Reinderien2024年10月03日 11:56:47 +00:00Commented Oct 3, 2024 at 11:56
4 Answers 4
(削除) In coin systems where each denomination is at least twice as big as the previous denomination, the optimal algorithm for making change with the least number of coins is greedy (e.g. to make change with quarters, dimes, nickels, and pennies, start by using as many quarters as you can). Your coin system possesses this property (15 >= 2 * 6, 6 >= 2 * 3, 3 >= 2 * 1
). (削除ここまで)
The optimal solution will spend all provided points. (If you ever have leftover points, just dump them all into pennies.) Therefore, you should start with the largest denominations to avoid situations where you can't fit them at the end (e.g. if you have 14 points left after placing pennies / dimes / dollars, it's a wasted effort). Doing so also lets you compute the score incrementally:
def get_max_score_fast(points):
last_max_score = 0
last_max_combo = (0, 0, 0, 0)
max_gold = points // 15
for gold in range(max_gold + 1):
points_past_gold = points - 15 * gold
max_dollars = points_past_gold // 6
score_gold = 1 + gold
for dollars in range(max_dollars + 1):
points_past_dollars = points_past_gold - 6 * dollars
score_dollars = 1 + dollars * score_gold
max_dimes = points_past_dollars // 3
for dimes in range(max_dimes + 1):
pennies = points_past_dollars - 3 * dimes
score = pennies * (1 + dimes * score_dollars)
if score > last_max_score:
last_max_combo = (pennies, dimes, dollars, gold)
last_max_score = score
return *last_max_combo, last_max_score
Let's run both implementations and compare their outputs:
n_tests = 2000
values1 = [None] * n_tests
values2 = [None] * n_tests
start1 = time.perf_counter()
for i in range(n_tests):
values1[i] = get_max_score(i)
diff1 = time.perf_counter() - start1
print(f"Computation took {diff1}s")
start2 = time.perf_counter()
for i in range(n_tests):
values2[i] = get_max_score_fast(i)
diff2 = time.perf_counter() - start2
print(f"Computation took {diff2}s")
for i in range(n_tests):
for j in range(5):
if values1[i][j] != values2[i][j]:
raise ValueError("Unequal!")
Here are the runtimes vs the original implementation:
Computation took 63.30500140000004s
Computation took 1.2820670000000973s
This yields a speedup of over 40x. Hope this helps!
Update: A Dynamic Programming Solution From Rainer P.
The code below runs in O(n^2)
time complexity. It is a modification of the answer written by Rainer P. in order to generate the appropriate output for testing.
def dp_advance(x, coin_value, prev_score):
# opt_score[i] is the optimal 1 + coins * prev_score with i points
opt_score = [0] * (x + 1)
# opt_coins[i] is the coins corresponding to opt_score[i]
opt_coins = [0] * (x + 1)
for coins in range(x // coin_value + 1):
points_spent_here = coins * coin_value
for points_spent_before in range(x + 1 - points_spent_here):
points_spent_total = points_spent_here + points_spent_before
new_score = 1 + coins * prev_score[points_spent_before]
if new_score > opt_score[points_spent_total]:
opt_score[points_spent_total] = new_score
opt_coins[points_spent_total] = coins
return opt_score, opt_coins
def get_max_score_fast(x):
score_base = [1] * (x + 1)
points = x
score_gold, coins_gold = dp_advance(x, 15, score_base)
score_dollars, coins_dollars = dp_advance(x, 6, score_gold)
score_dimes, coins_dimes = dp_advance(x, 3, score_dollars)
score_pennies, coins_pennies = dp_advance(x, 1, score_dimes)
pennies = coins_pennies[x]
x -= pennies * 1
dimes = coins_dimes[x]
x -= dimes * 3
dollars = coins_dollars[x]
x -= dollars * 6
gold = coins_gold[x]
return pennies, dimes, dollars, gold, score_pennies[points] - 1
Here are the runtimes:
Computation took 64.49576829999933s
Computation took 3.1725221000015154s
Although it is slower for small point budgets, it scales better - see the comparison below for n = 10000
.
Computation took 613.0429686999996s # my solution
Computation took 249.5443798999986s # dynamic programming
Note that the DP approach sometimes finds different combinations than the original code, such as 6 pennies vs 3 pennies and 1 dime.
Caching with DP?
If you are allowed to use global variables, the DP solution can be cached. After get_max_score_fast(b)
, you can later get get_max_score_fast(a)
in O(1)
time if a < b
.
def get_cached_dp(x):
score_base = [1] * (x + 1)
score_gold, coins_gold = dp_advance(x, 15, score_base)
score_dollars, coins_dollars = dp_advance(x, 6, score_gold)
score_dimes, coins_dimes = dp_advance(x, 3, score_dollars)
score_pennies, coins_pennies = dp_advance(x, 1, score_dimes)
return score_pennies, coins_pennies, coins_dimes, coins_dollars, coins_gold
cached_x = -1
cached_dp = None
def get_max_score_fast(x):
global cached_x
global cached_dp
if x > cached_x:
cached_x = x
cached_dp = get_cached_dp(x)
score_pennies, coins_pennies, coins_dimes, coins_dollars, coins_gold = cached_dp
score = score_pennies[x] - 1
pennies = coins_pennies[x]
x -= pennies * 1
dimes = coins_dimes[x]
x -= dimes * 3
dollars = coins_dollars[x]
x -= dollars * 6
gold = coins_gold[x]
return pennies, dimes, dollars, gold, score
Computing the largest budget first maximizes the benefit. For n = 10000
:
Computation took 13.706997700001011s
That's a >10x speedup compared to no caching for DP.
-
4\$\begingroup\$ The goal is maximum score, not least number of coins. Are you sure that a greedy algorithm is optimal? \$\endgroup\$Rainer P.– Rainer P.2024年10月02日 23:46:47 +00:00Commented Oct 2, 2024 at 23:46
-
\$\begingroup\$ I don't know if a greedy algorithm is optimal for this problem. However, the known greedy algorithms for change-making inspired me to approach the problem from the largest denominations, which turned out well. \$\endgroup\$Justin Chang– Justin Chang2024年10月02日 23:59:56 +00:00Commented Oct 2, 2024 at 23:59
-
4\$\begingroup\$ Greedy change-making is definitely not optimal for this problem. For example, if you've got 3 points, a greedy change-making algorithm would say to buy 1 dime. That would give a score of 0, worse than buying 3 pennies. However, the code here is not a greedy algorithm. The first algorithm is actually a brute-force search. The properties of greedy algorithms aren't really relevant. \$\endgroup\$user2357112– user23571122024年10月03日 07:33:12 +00:00Commented Oct 3, 2024 at 7:33
-
\$\begingroup\$ The dynamic programming algorithm is really fast, however I was trying to think of some way to speed it up more and I noticed one thing: since the program generally calls
get_max_score_fast
multiple times, is there any work that can be reused between consecutiveget_max_score_fast
calls? \$\endgroup\$ayaan098– ayaan0982024年10月03日 15:45:37 +00:00Commented Oct 3, 2024 at 15:45 -
1\$\begingroup\$ I wonder if it might make sense to use a
@functools.cache
decorator on a recursive function part of the dynamic programming implementation, in order to have that handle the caching automatically instead of rolling your own. \$\endgroup\$Daniel Schepler– Daniel Schepler2024年10月04日 17:52:35 +00:00Commented Oct 4, 2024 at 17:52
This exercise smells an awful lot like dynamic programming. Let's start with your formula:
pennies * (1 + dimes * (1 + dollars * (1 + gold)))
When you optimize the number of dimes, what factors do you need to consider? Answer:
- How many dimes you want to buy.
- How many points you spent on gold and dollars combined.
- How many points you have left for buying pennies.
Note the emphasized word. You don't have to consider dollars and gold separately, only how many points you spent on them in total. For every amount of points, there is an optimal solution that maximizes (1+dollars*(1+gold)). We can store this optimal score in a lookup table, indexed by the total amount of points spent. We don't need to know if it was 12 dollars and 5 gold or 10 dollars and 6 gold, only that we spent 150 points on (dollars+gold) and the best value of (1+dollars*(1+gold)) was 73.
When we move to optimizing pennies, we consider the value of (1+dimes*(1+dollars*(1+gold))) and how many points we spent on (dimes+dollars+gold) combined. This results in quadratic complexity, regardless of how many types of coins we have. Here we had four types and your naive algorithm had cubic complexity. With five or six types of coins, the naive algorithm becomes unusable. Dynamic programming remains quadratic.
The code could look roughly as shown below (I didn't test it, it may have bugs):
score_gold = [0] * (x + 1)
for gold in range(x // 15 + 1):
points_spent_here = gold * 15
score_gold[points_spent_here] = 1 + gold
score_dollars = [0] * (x + 1)
for dollars in range(x // 6 + 1):
points_spent_here = dollars * 6
for points_spent_before in range(x + 1 - points_spent_here):
points_spent_total = points_spent_here + points_spent_before
score_dollars[points_spent_total] = max(score_dollars[points_spent_total], 1 + dollars * score_gold[points_spent_before])
score_dimes = [0] * (x + 1)
for dimes in range(x // 3 + 1):
points_spent_here = dimes * 3
for points_spent_before in range(x + 1 - points_spent_here):
points_spent_total = points_spent_here + points_spent_before
score_dimes[points_spent_total] = max(score_dimes[points_spent_total], 1 + dimes * score_dollars[points_spent_before])
score_pennies = [0] * (x + 1)
for pennies in range(x + 1):
points_spent_here = pennies * 1
for points_spent_before in range(x + 1 - points_spent_here):
points_spent_total = points_spent_here + points_spent_before
score_pennies[points_spent_total] = max(score_pennies[points_spent_total], pennies * score_dimes[points_spent_before])
-
\$\begingroup\$ Please DO NOT post any code that may have errors. I haven't tested it thoroughly yet, but I do suspect there may be some bugs. Please always test your code before posting. \$\endgroup\$Luke L– Luke L2024年10月03日 00:44:42 +00:00Commented Oct 3, 2024 at 0:44
-
3\$\begingroup\$ @LukeL I've posted code with errors before as I don't always run the code in my answers. If someone notices an error the edit is typically minor, so a comment + edit and we're both happy after 5 mins. For some users, having a bug has helped the asker understand the code by truly thinking about the code in my answer. We have no requirement for code in answers to be runnable without issue. \$\endgroup\$2024年10月03日 01:02:31 +00:00Commented Oct 3, 2024 at 1:02
-
1\$\begingroup\$ +1 because the approach above has identical output when modified to return the allocation of coins as well. I have inserted a version in my answer. \$\endgroup\$Justin Chang– Justin Chang2024年10月03日 01:30:52 +00:00Commented Oct 3, 2024 at 1:30
I can't think of any optimizations, but I'm sure someone else will. In the meantime, here are some minor style considerations.
Documentation
It is great that you have a docstring at the top of your code to summarize its purpose. I suggest adding a little more information to describe what an "upgrade" is.
Naming
You did a good job of naming the function and variables.
The function docstring is helpful, but I suggest giving the function
input variable a more meaningful name than x
, perhaps points_to_spend
.
Constants
You repeatedly use numeric values like 3 and 6 for the cost. Consider assigning them names instead:
PENNY = 1
DIME = 3
DOLLAR = 6
GOLD = 15
Then change:
max_dimes = points_left // 3
to:
max_dimes = points_left // DIME
Etc.
Comment
It might be worth adding a comment explaining why your range is
(x + 1)
instead of (x)
in the following line:
for pennies in range(x + 1):
The same for:
for dimes in range(max_dimes + 1):
for dollars in range(max_dollars + 1):
Output
When I run the code, I get an output like:
Computation took 1.3014662526547909s
If you don't need all those digits of precision, it would be easier to read
by formatting the output with something like .4f
. Change:
print(f"Computation took {time_end - time_start}s")
to:
print(f"Computation took {time_end - time_start:.4f}s")
New output:
Computation took 1.3015s
If you need to improve speed more than the Dynamic Programming approach you could try a greedy algorithm but there isn't a guarantee the solution will be optimal. The time complexity is O(n) so it will be much faster for large number of points.
def add_coins_brute(start_pennies, start_dimes, start_dollars, start_gold, points):
last_max_score = 0
last_max_combo = (0, 0, 0, 0)
initial_points = start_pennies + 3*start_dimes + 6*start_dollars + 15*start_gold
max_gold = points // 15
for gold in range(start_gold, start_gold + max_gold + 1):
points_past_gold = points - 15 * (gold - start_gold)
max_dollars = points_past_gold // 6
score_gold = 1 + gold
for dollars in range(start_dollars, start_dollars + max_dollars + 1):
points_past_dollars = points_past_gold - 6 * (dollars - start_dollars)
score_dollars = 1 + dollars * score_gold
max_dimes = points_past_dollars // 3
for dimes in range(start_dimes, start_dimes + max_dimes + 1):
pennies = start_pennies + points - (
3 * (dimes - start_dimes)
+ 6 * (dollars - start_dollars)
+ 15 * (gold - start_gold)
)
score = pennies * (1 + dimes * score_dollars)
if score > last_max_score:
last_max_combo = (pennies, dimes, dollars, gold)
last_max_score = score
return *last_max_combo, last_max_score
def get_max_score(x):
pennies = 0
dimes = 0
dollars = 0
gold = 0
score = 0
greedy_step = 21 # number of points to add at a time
# take smaller step first
if x % greedy_step != 0:
step = x % greedy_step
pennies, dimes, dollars, gold, score = add_coins_brute(
pennies, dimes, dollars, gold, step)
x -= step
while x > 0:
# add greedy_step number of points
pennies, dimes, dollars, gold, score = add_coins_brute(
pennies, dimes, dollars, gold, greedy_step)
x -= greedy_step
return pennies, dimes, dollars, gold, score
The idea is to add x points at a time with a brute for approach (could use DP instead for the step) then add the next x points such that you are maximizing the score each time and repeat until you've spent all the points. The first number of steps is the remainder of the total number of points to add and your step size so that you end up spending all the points. The add_coins_brute function is based off of Justin Chang's brute force method, modified to allow some initial starting coins.
I tested up to 2000 points and got the same output as the brute force method from Justin Chang.
n_tests = 2000
values1 = [None] * n_tests
values2 = [None] * n_tests
start1 = time.perf_counter()
for i in range(n_tests):
values1[i] = get_max_score_brute(i)
diff1 = time.perf_counter() - start1
print(f"Computation took {diff1}s")
start2 = time.perf_counter()
for i in range(n_tests):
values2[i] = get_max_score(i)
diff2 = time.perf_counter() - start2
print(f"Computation took {diff2}s")
for i in range(n_tests):
if values1[i][-1] != values2[i][-1]:
print(f"brute force: {values1[i]} != O(n) method: {values2[i]}\n")
Output:
Computation took 577.1469325999788s
Computation took 0.6601930999895558s
The speed improves from 151.60913479997544s with brute force to 0.004353099997388199s with greedy method for me when evaluating with points=10000 and provides the same output. I would suggest testing this for the rest of your expected inputs before using it, or just precompute everything and use a lookup table.
EDIT
The value of greedy step is chosen somewhat arbitrarily. It has to be at least 15 to be able to add gold coins. I used greedy_step=15 initially and it failed to find the optimal solution for various numbers within your expected range (points=85 for example). It fails at that step because in the second to last step (70 points have been added) the optimal combination is:
(25, 9, 3, 0, 925)
and the optimal step with 85 points is:
(22, 8, 4, 1, 1606)
(no idea if either of these are unique optimal values, just what I get with the full brute force method). In general the greedy method will fail to find the optimal value if it ever adds more of a coin than the optimal solution has, in this case too many pennies in the second to last step. This should happen less when taking larger steps (increasing greedy_step) because more coins are added in the last step, although it costs performance because the algorithm scales as O(greedy_step^3) with brute force steps or O(greedy_step^2) with DP steps. I increased to 21 because it was enough to add a gold coin and a dollar (or any other equivalent combination) at the last step and I'm hoping that is enough to give optimal solutions in the range the function will be evaluated at. It was for the cases I tested but like I said there is no guarantee so if you need optimal solution you should test it and I would just use a lookup table if it doesn't work (O(1))
Better answer
As Rainer suggested in the comment, evenly dividing the points between each time of coin gives approximately the correct answer. Because you have to have an integer number of coins you can't always evenly divide thing, not sure how he was thinking of dividing them, but just searching the local grid around this point should always be optimal. Code below, it's very fast. Didn't think about this initially but it is much better. My answer is long now, I don't know if I should just delete the previous stuff.
def get_grid_center(points):
"""
Return the # of each coin type if dividing points evenly.
Parameters
----------
points : int
The total number of points spent on coins.
Returns
-------
center : tuple
Number of each type of coin if points are distributed evenly.
Will be a float if needed.
"""
center = (points/4, points/4/3, points/4/6, points/4/15)
return center
def get_max_fast(points):
"""
Performs brute force search for the combination of coins that
maximizes the score. Searches a grid centered around dividing
the coins evenly.
Parameters
----------
points : int
The total number of points spent on coins.
Returns
-------
tuple
A tuple with integer values for
(pennies, dimes, dollars, gold, score)
"""
last_max_score = 0
last_max_combo = (0, 0, 0, 0)
center = get_grid_center(points)
max_gold = math.ceil(center[-1])+1
min_gold = max([0,math.floor(center[-1])-1])
for gold in range(min_gold, max_gold):
max_dollars = math.ceil(center[2])+3
min_dollars = max([0,math.floor(center[2])-3])
for dollars in range(min_dollars, max_dollars):
max_dimes = math.ceil(center[1])+5
min_dimes = max([0,math.floor(center[-1])-5])
for dimes in range(min_dimes, max_dimes):
pennies = points - (3 * dimes + 6 * dollars + 15 * gold)
if pennies < 0:
# we have over spent points, should not use this score
continue
score = pennies * (1 + dimes * (1 + dollars * (1 + gold)))
if score > last_max_score:
last_max_combo = (pennies, dimes, dollars, gold)
last_max_score = score
return *last_max_combo, last_max_score
-
\$\begingroup\$ If you're happy with a solution that is not guaranteed to be optimal, you can solve this in O(1) by spending an equal amount of points on each type of coin. The optimal solution for 3000 points is (756, 251, 126, 49) and if you blindly spend a quarter of the points on each coin type, you get (750, 250, 125, 50). This streategy is unbeatably fast and asymptotic correctness is easy to prove. \$\endgroup\$Rainer P.– Rainer P.2024年10月04日 15:23:45 +00:00Commented Oct 4, 2024 at 15:23
-
\$\begingroup\$ @RainerP. what exactly do you mean by asymptotic correctness? A Google search didn't turn up much. \$\endgroup\$ayaan098– ayaan0982024年10月04日 15:49:39 +00:00Commented Oct 4, 2024 at 15:49
-
\$\begingroup\$ Where does the number 21 in
greedy_step
come from? \$\endgroup\$ayaan098– ayaan0982024年10月04日 15:58:22 +00:00Commented Oct 4, 2024 at 15:58 -
\$\begingroup\$ @RainerP might be a good idea to use a brute force or greedy method for small n and switch to your suggestion for large n (depending on whether or not they need an exact solution). \$\endgroup\$user286929– user2869292024年10月04日 16:10:10 +00:00Commented Oct 4, 2024 at 16:10
-
1\$\begingroup\$ @ayann098 I think with asymptotic correctness they mean that it approaches the optimal solution as the number of points increases. I'll add an edit for the value of greedy_step. \$\endgroup\$user286929– user2869292024年10月04日 16:13:10 +00:00Commented Oct 4, 2024 at 16:13