Memory management for python scripts

Question 1

So I'm trying to solve some problems from the Euler project in python. I'm currently working on Problem 92, square digit chains. Basically the idea is that if you take any integer, and square its component digits recursively (e.g. 42 = 4² + 2² = 20, then 2² + 0² = 4, etc.), you always end up either at 1 or 89.

I am trying to write a program that can compute how many numbers, in a range 1 to 10^K, will end up in 89 and how many will end up in 1. I am not trying to store which integers end up where, only how many. The goal is to be able to do that for the largest K possible. (This is a challenge from Hackerrank for those curious).

In other to do for large number within my lifetime, I need to use caching. But then that's a balancing act between caching (which eventually takes up lots of RAM) and computing time.

My problem is that I eventually run out of memory. So I have tried to cap the length of the cache that I am using. However, I still run out of memory. I cannot seem to be able to find what is causing me to run out of memory.

I am running it on pycharm on ubuntu 14.04 LTS.

My question:

Is there a way to check what is taking up my RAM? Is there some tool (or script) that can allow me to basically monitor memory use by variables within my program? Or an wrong in assuming that if I run out of RAM, it is necessarily because some variable in my program is too large? I have to admit I am not all that clear on the fine details of memory use within a program....

EDIT: I run out of mem when K = 8, so for integers up to 10⁸, which is not so large. Also, I did testing before 10⁸ (so 10⁷, which terminates but takes some time and uses more memory than smaller computation). And it doesn't seem that capping my cache size variables makes a differences.....

Question 2

Does this help? pypi.python.org/pypi/memory_profiler

Question 3

Maybe - but the problem is that this is line by line, so for a recursive algorithm I'm unsure how it will fare? My ideal solution would monitor on a variable by variable basis. Unless there is something fundamental I am missing here regard mem alloc....

Question 4

Actually thinking about it might be better than I first thought - I could monitor it function by function which is already an improvement. Worth a try

Question 5

Why are you using recursion?

Question 6

It seemed to be a good approach as you basically want to compute at thing over and over until you arrive at 1 or 89. There might be better approach... However the quesiton is more about monitoring mem alloc and less about algorithm...

Question 7

I would suggest testing various cache sizes to see if it is actually beneficial to have as large a cache as possible.

If you take any 10-digit number and compute the sum of squares of its digits, the sum will be at most 10*9*9 = 810. Thus, if you cache the result for numbers 1 to 810, then you should be able to process all numbers with between 4 and 10 digits without recursion.

In this way, I have processed the first 10^8 numbers in around 6 minutes with memory usage staying constant at roughly 10 MB.

Question 8

No this is more complicated than this. For example the suit starting with number 6: 6, 36, 45, 41, 17, 50, 25, 89. For some numbers you may have to iterate quite a few times before figuring out where you actually end up.... As for caching, with larger numbers I divide runtime by 2 for a factor 10 of greater caching... So it is significant.

Question 9

@Francky_V You figure this out once and then cache 89 with key 6. You only have to figure things out once in the range 1-810

Question 10

Oh crap now I see what you mean. Ok from an algorithmic perspective this is a better approach. Not the original question but thanks for help!

Question 11

This is a variation of Mathias Rav's excellent idea but keeps your idea of using a recursive function with memozation. The idea is to use a helper function to do the heavy lifting and have the main function just do the first step of the iteration. The very first step reduces the problem size to one for which caching is useful. The cache remains small. I was able to do all numbers up to 10**8 in about 10 minutes (the overhead due to the recursion makes this solution less efficient than Mathias' solution):

cache = {}
def helper(n):
 if n == 1 or n == 89:
 return n
 elif n in cache:
 return cache[n]
 else:
 ss = sum(int(d)**2 for d in str(n))
 v = helper(ss)
 cache[n] = v
 return v
def f(n):
 ss = sum(int(d)**2 for d in str(n))
 return helper(ss)
def freq89(n):
 total = 0
 for i in range(1,n+1):
 if f(i) == 89: total += 1
 return total/n

Question 12

The big hog in your version isn't the recursion. It's sum(int(d)**2 for d in str(n)). I've combined your answer and that of @MathiasRav into something that is fast. I'll make that a community answer so as not to steal points from either one of you.

Question 13

@DavidHammen If you have a good answer, no need to make it a community answer. I'd be happy to upvote a solution which is quicker than mine. I look forward to seeing it.

Question 14

This is an extended comment on the answers by Mathias Rav and John Coleman. I was going to make this a community wiki answer. John Coleman said not to do so, so I'm not.

I'll start with John Coleman's answer.

cache = {}
def helper(n):
 if n == 1 or n == 89:
 return n
 elif n in cache:
 return cache[n]
 else:
 ss = sum(int(d)**2 for d in str(n))
 v = helper(ss)
 cache[n] = v
 return v
def f(n):
 ss = sum(int(d)**2 for d in str(n))
 return helper(ss)

A small thing that will speed things up a bit is to avoid that first if in helper(n) by initializing cache to {1:some_value, 89:some_other_value}. The obvious initialization is {1:1, 89:89}. A less obvious, but ultimately faster initialization is {1:False, 89:True}. This enables changing if f(i) == 89: total += 1 to if f(i): total += 1.

Another small thing that might help is to get rid of the recursion. That's not the case here. To get rid of the recursion, we'd have to do something along the lines of

def helper(n):
 l = []
 while n not in cache :
 l.append(n)
 n = sum(int(d)**2 for d in str(n))
 v = cache[n]
 for k in l : 
 cache[k] = v
 return v

The problem is that almost all of the numbers encountered by f(n) will already be in the cache thanks to how helper is called from f(n). Getting rid of the recursion needlessly creates an empty list that needs to be garbage collected.

The big issue with John Coleman's answer is the calculation of the sum of the square of the digits via sum(int(d)**2 for d in str(n)). While very pythonic, this is extremely expensive. I'll start by changing the variable ss in helper and in f into a function:

def ss(n):
 return sum(int(d)**2 for d in str(n))

This alone does nothing for performance. In fact, it hurts performance. Function calls are expensive in python. By making this a function, we can do some non-pythonic things by replacing the string operations with integer arithmetic:

def ss(n):
 s = 0
 while n != 0:
 d = n % 10
 n = n // 10
 s += d**2
 return s

The speedup here is quite significant; I get a 30% reduction in computation time. That's not great. There's another problem, the use of the exponentiation operator. In almost any language but Fortran and Matlab, using d*d is much faster than is d**2. That's certainly the case in python. That simple change almost halves the execution time from that already significant 30% reduction.

Putting this all together yields

cache = {1:False, 89:True}
def ss (n):
 s = 0
 while n != 0:
 d = n % 10
 n = n // 10
 s += d*d
 return s
def helper(n):
 if n in cache:
 return cache[n]
 else:
 v = helper(ss(n))
 cache[n] = v
 return v
def f(n):
 return helper(ss(n))
def freq89(n):
 total = 0
 for i in range(1,n+1):
 if f(i): total += 1
 return total/n
print (freq89(int(1e7)))

I have yet to take advantage of Mathias Rav's answer. In this case, it will make sense to get rid of the recursion. It will also help to embed the loop over the initial range inside of the function that initializes the cache (function calls are expensive in python).

N = int(1e7)
cache = {1:False, 89:True}
def ss(n):
 s = 0
 while n != 0:
 d = n % 10
 n //= 10
 s += d*d
 return s
def initialize_cache(maxsum):
 for n in range(1,maxsum+1):
 l = []
 while n not in cache:
 l.append(n)
 n = ss(n)
 v = cache[n]
 for k in l:
 cache[k] = v
def freq89(n):
 total = 0
 for i in range(1,n):
 if cache[ss(i)]:
 total += 1
 return total/n
maxsum = 81*len(str(N-1))
initialize_cache(maxsum)
print (freq89(N))

The above takes about 16.5 seconds (on my computer) to calculate the ratio for numbers between 1 (inclusive) and 10000000 (exclusive) on my computer. This is almost three times faster than the initial version (44.7 seconds). It takes a bit over three minutes for the above to calculate calculate the ratio for numbers between 1 (inclusive) and 1e8 (exclusive).

It turns out I'm not done. There's no need to calculate the sum of the squares of the digits of (for example) 12345679 digit by digit when the program just did that for 12345678. A shortcut that reduces the calculation time for nine out of ten use cases pays off. The function ss(n) becomes a bit more complex:

prevn = 0 
prevd = 0 
prevs = 0 
def ss(n):
 global prevn, prevd, prevs
 d = n % 10
 if (n == prevn+1) and (d == prevd+1):
 s = prevs + 2*prevd + 1 
 prevs = s 
 prevn = n 
 prevd = d 
 return s
 s = 0 
 prevn = n 
 prevd = d 
 while n != 0:
 d = n % 10
 n //= 10
 s += d*d 
 prevs = s 
 return s

With this, calculating the ratio for numbers up to (but not including) 1e7 takes 6.6 seconds, 68 seconds for numbers up to but not including 1e8.

Question 15

Nice. Python sometimes temps you into writing elegant but slow code. I have trouble saying "no" to a slick comprehension.

Question 16

@JohnColeman - I have the same problem. That said, those slick comprehensions are typically faster than is the same code written as a loop. In most cases, a short and sweet list comprehension beats out verbose code in both speed and clarity. BTW, I increased the speed even more by taking advantage of the fact that (d+1)**2 is d**2+2d+1. A reduction in CPU time from 44.7 seconds to 6.3 seconds is more than enough of a reduction to justify not using a slick list comprehension in this case.

Question 17

@JohnColeman - There is a way to avoid using int(d)**2 in your original list comprehension without going to a separate function: Use sum(d*d for d in [int(c) for c in str(n)]). It takes less time to build that intermediate list than it does to invoke the exponentiation operator. This coupled with my first change (cache={1:False, 89:True}) reduces the CPU time of your answer by a nice amount, from 44.7 seconds to 34.5 seconds.

Mathias Rav 2,97317 silver badges25 bronze badges · Accepted Answer · 2015-10-18 14:17:10Z

3

I would suggest testing various cache sizes to see if it is actually beneficial to have as large a cache as possible.

If you take any 10-digit number and compute the sum of squares of its digits, the sum will be at most 10*9*9 = 810. Thus, if you cache the result for numbers 1 to 810, then you should be able to process all numbers with between 4 and 10 digits without recursion.

In this way, I have processed the first 10^8 numbers in around 6 minutes with memory usage staying constant at roughly 10 MB.

Share

Improve this answer

edited Oct 18, 2015 at 14:50

answered Oct 18, 2015 at 14:17

Mathias Rav's user avatar

Mathias Rav

2,97317 silver badges25 bronze badges

Sign up to request clarification or add additional context in comments.

3 Comments

logicOnAbstractions

logicOnAbstractions Over a year ago

No this is more complicated than this. For example the suit starting with number 6: 6, 36, 45, 41, 17, 50, 25, 89. For some numbers you may have to iterate quite a few times before figuring out where you actually end up.... As for caching, with larger numbers I divide runtime by 2 for a factor 10 of greater caching... So it is significant.

2015年10月18日T14:39:56.43Z+00:00

John Coleman

John Coleman Over a year ago

@Francky_V You figure this out once and then cache 89 with key 6. You only have to figure things out once in the range 1-810

2015年10月18日T14:46:28.277Z+00:00

logicOnAbstractions

logicOnAbstractions Over a year ago

Oh crap now I see what you mean. Ok from an algorithmic perspective this is a better approach. Not the original question but thanks for help!

2015年10月18日T14:51:38.693Z+00:00

CollectivesTM on Stack Overflow

Memory management for python scripts

3 Answers 3

3 Comments

2 Comments

3 Comments

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

CollectivesTM on Stack Overflow

3 Answers 3

3 Comments

2 Comments

3 Comments

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related