
I wrote the following code:

from hurry.filesize import size
from pysize import get_size
import os
import psutil

def load_objects():
    process = psutil.Process(os.getpid())
    print "start method"
    print "process consumes " + size(process.memory_info().rss)
    objects = make_a_call()  # make_a_call() is defined elsewhere; it loads the objects being measured
    print "total size of objects is " + size(get_size(objects))
    print "process consumes " + size(process.memory_info().rss)
    print "exit method"

def main():
    process = psutil.Process(os.getpid())
    print "process consumes " + size(process.memory_info().rss)
    load_objects()
    print "process consumes " + size(process.memory_info().rss)

get_size() (from pysize) returns the memory consumption of the objects; its source is shown in the first answer below.
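
For reference (assuming the standard hurry.filesize package), size() just renders a raw byte count in human-readable form, which is where the M suffixes in the output below come from:

from hurry.filesize import size

print size(1024)        # -> 1K
print size(29 * 2**20)  # -> 29M, the same units as the output below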

I get the following prints:

process consumes 21M
start method
process consumes 21M
total size of objects is 20M
process consumes 29M
exit method
process consumes 29M
  1. How come the objects consumed 20M if the process consumed only 8M more?
  2. If I exit the method, shouldn't the memory have decreased back to 21M, since the garbage collector would clear the consumed memory?
asked Dec 20, 2017 at 14:31
  • For the second question: the GC won't clear memory immediately, because garbage collection itself has a cost. Commented Dec 20, 2017 at 14:43
  • I read that it would run ONLY once it reaches a threshold - is that right? Commented Dec 20, 2017 at 15:04
  • No, that's not quite true. There is a more complicated policy for garbage collection (see the sketch after these comments). ref: quora.com/… Commented Dec 20, 2017 at 15:20
  • BTW, to use Python pythonically you have to trust that the underlying mechanics work well enough; fighting them will only frustrate you. For example, even an explicit gc.collect() won't reclaim everything every time. Commented Dec 20, 2017 at 15:26
  • For question 1: what is your issue - that 8M of extra VM overhead seems too small? For question 2: the GC does work, but it won't necessarily release memory back to the OS. Commented Dec 25, 2017 at 11:58
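
To make the threshold discussion in these comments concrete, here is a minimal sketch (using only documented gc calls) of how to inspect and trigger CPython's generational collector:

import gc

# CPython's cyclic collector is generational: generation 0 is collected
# once (allocations - deallocations) exceeds the first threshold, and
# older generations are collected progressively less often.
print gc.get_threshold()   # typically (700, 10, 10)

# A collection can be forced, but it only frees objects Python can
# reclaim - it does not force the allocator to hand memory back to the OS.
print gc.collect()         # number of unreachable objects found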

2 Answers

  1. Most likely this is because of an inaccuracy in your measurement code.

Here's a fully working Python 2.7 example that exhibits the same problem (I've slightly updated the original code for simplicity's sake):

from hurry.filesize import size
from pysize import get_size
import os
import psutil

def make_a_call():
    return range(1000000)

def load_objects():
    process = psutil.Process(os.getpid())
    print "start method"
    print "process consumes", size(process.memory_info().rss)
    objects = make_a_call()
    # Object sizes are measured *before* the final process measurement
    print "total size of objects is", size(get_size(objects))
    print "process consumes", size(process.memory_info().rss)
    print "exit method"

def main():
    process = psutil.Process(os.getpid())
    print "process consumes " + size(process.memory_info().rss)
    load_objects()
    print "process consumes " + size(process.memory_info().rss)

main()

Here's the output:

process consumes 7M
start method
process consumes 7M
total size of objects is 30M
process consumes 124M
exit method
process consumes 124M

The difference is ~100MB.

And here's the fixed version of the code:

from hurry.filesize import size
from pysize import get_size
import os
import psutil

def make_a_call():
    return range(1000000)

def load_objects():
    process = psutil.Process(os.getpid())
    print "start method"
    print "process consumes", size(process.memory_info().rss)
    objects = make_a_call()
    # Process size is now measured *before* get_size() walks the objects
    print "process consumes", size(process.memory_info().rss)
    print "total size of objects is", size(get_size(objects))
    print "exit method"

def main():
    process = psutil.Process(os.getpid())
    print "process consumes " + size(process.memory_info().rss)
    load_objects()
    print "process consumes " + size(process.memory_info().rss)

main()

And here is the updated output:

process consumes 7M
start method
process consumes 7M
process consumes 38M
total size of objects is 30M
exit method
process consumes 124M

Did you spot the difference? You're calculating object sizes before measuring the final process size, and that measurement itself causes additional memory consumption. Let's check why that might be happening - here's the source of https://github.com/bosswissam/pysize/blob/master/pysize.py:

import sys
import inspect

def get_size(obj, seen=None):
    """Recursively finds size of objects in bytes"""
    size = sys.getsizeof(obj)
    if seen is None:
        seen = set()
    obj_id = id(obj)
    if obj_id in seen:
        return 0
    # Important: mark as seen *before* entering recursion to gracefully handle
    # self-referential objects
    seen.add(obj_id)
    if hasattr(obj, '__dict__'):
        for cls in obj.__class__.__mro__:
            if '__dict__' in cls.__dict__:
                d = cls.__dict__['__dict__']
                if inspect.isgetsetdescriptor(d) or inspect.ismemberdescriptor(d):
                    size += get_size(obj.__dict__, seen)
                break
    if isinstance(obj, dict):
        size += sum((get_size(v, seen) for v in obj.values()))
        size += sum((get_size(k, seen) for k in obj.keys()))
    elif hasattr(obj, '__iter__') and not isinstance(obj, (str, bytes, bytearray)):
        size += sum((get_size(i, seen) for i in obj))
    return size

Lots of things are happening here! The most notable one is that it holds the id of every object it has ever seen in a set, in order to resolve circular references. If you remove that bookkeeping, it won't eat that much memory in either case.
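
As a rough illustration (a sketch, not the answerer's benchmark) of how much that bookkeeping alone can cost, building an id set over a million objects already allocates tens of MB:

import sys

objects = range(1000000)             # a million distinct ints (Python 2)
seen = set(id(o) for o in objects)   # the kind of set get_size() builds
print sys.getsizeof(seen)            # tens of MB just for the set itself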

  2. As for your second question: this behavior heavily depends on whether you use CPython or something else. In CPython, this can happen because it's not always possible to give memory back to the OS immediately.

Here's a good article on the subject, quoting:

If you create a large object and delete it again, Python has probably released the memory, but the memory allocators involved don’t necessarily return the memory to the operating system, so it may look as if the Python process uses a lot more virtual memory than it actually uses.
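
A rough way to observe this yourself (a sketch for CPython; exact numbers vary by platform and allocator):

import gc
import os
import psutil

process = psutil.Process(os.getpid())
print "baseline:", process.memory_info().rss

data = [str(i) for i in range(10**6)]
print "allocated:", process.memory_info().rss  # much higher

del data
gc.collect()
# Often stays well above the baseline: the freed blocks are kept by
# Python's allocator for reuse rather than returned to the OS.
print "after del + collect:", process.memory_info().rss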

answered Dec 30, 2017 at 21:01
  1. Why would the process need to consume more than 8M of extra memory? There is no reason to expect the process's RSS growth to match the summed size of the objects exactly.
  2. Garbage collection does not necessarily happen immediately; see the documentation quoted below, and the sketch that follows it:

Objects are never explicitly destroyed; however, when they become unreachable they may be garbage-collected. An implementation is allowed to postpone garbage collection or omit it altogether — it is a matter of implementation quality how garbage collection is implemented, as long as no objects are collected that are still reachable.

CPython implementation detail: CPython currently uses a reference-counting scheme with (optional) delayed detection of cyclically linked garbage, which collects most objects as soon as they become unreachable, but is not guaranteed to collect garbage containing circular references. See the documentation of the gc module for information on controlling the collection of cyclic garbage. Other implementations act differently and CPython may change. Do not depend on immediate finalization of objects when they become unreachable (so you should always close files explicitly).
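
Since finalization timing is not guaranteed, the practical takeaway is to release resources deterministically instead of relying on the collector (a minimal sketch; 'data.txt' is a hypothetical file name):

# The file is closed when the with-block exits, regardless of when
# (or whether) the file object itself is garbage-collected.
with open('data.txt') as f:
    contents = f.read()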

answered Dec 26, 2017 at 21:30
