Why does python memory allocation behave like this?

Question 1

Consider the following code which is used for finding all valid parentheses placements with n parentheses.

def paren(n):
 ans = []
 def helper(string, left, right,n,foo):
 if len(string)==2*n:
 ans.append(string)
 if left > 0:
 helper(string+'(', left-1,right,n)
 if right > left:
 helper(string+')', left,right-1,n)
 helper('',n,n,n)

If we add a list (that has no practical use in the function) we get

def paren(n):
 ans = []
 def helper(string, left, right,n,foo):
 print(hex(id(foo)), foo)
 if len(string)==2*n:
 ans.append(string)
 if left > 0:
 helper(string+'(', left-1,right,n,[])
 if right > left:
 helper(string+')', left,right-1,n,[])
 helper('',n,n,n,[])
 
paren(2)
OUTPUT:
0x2e5e2446288 []
0x2e5e28e3508 []
0x2e5e28e3688 []
0x2e5e26036c8 []
0x2e5e27bafc8 []
0x2e5e28e3688 []
0x2e5e26036c8 []
0x2e5e27bafc8 []

Whereas if we explicitly pass foo each time then we get

def paren(n):
 ans = []
 def helper(string, left, right,n,foo):
 print(hex(id(foo)), foo)
 if len(string)==2*n:
 ans.append(string)
 if left > 0:
 helper(string+'(', left-1,right,n,foo)
 if right > left:
 helper(string+')', left,right-1,n,foo)
 helper('',n,n,n,[])
 
paren(2)
OUTPUT:
0x1c2cfec6288 []
0x1c2cfec6288 []
0x1c2cfec6288 []
0x1c2cfec6288 []
0x1c2cfec6288 []
0x1c2cfec6288 []
0x1c2cfec6288 []
0x1c2cfec6288 []

In the first case we get a different object in memory, why is this compared to the second case when we don't I think it is to do with with the fact that we create a new list rather than passing the function argument?

However when we add something to foo we get the same behaviour as with the first case:

def paren(n):
 ans = []
 def helper(string, left, right,n,foo):
 print(hex(id(foo)), foo)
 if len(string)==2*n:
 ans.append(string)
 if left > 0:
 helper(string+'(', left-1,right,n,foo+['bar'])
 if right > left:
 helper(string+')', left,right-1,n,foo+['bar'])
 helper('',n,n,n,[])
 
paren(2)
OUTPUT:
0x269572e6288 []
0x26959283548 ['bar']
0x26957363688 ['bar', 'bar']
0x2695925ae88 ['bar', 'bar', 'bar']
0x26957363408 ['bar', 'bar', 'bar', 'bar']
0x2695925ae88 ['bar', 'bar']
0x26957363408 ['bar', 'bar', 'bar']
0x269592833c8 ['bar', 'bar', 'bar', 'bar']

But strangely if we pass some int for foo, I will take 5 for demonstration we get:

def paren(n):
 ans = []
 def helper(string, left, right,n,foo):
 print(hex(id(foo)), foo)
 if len(string)==2*n:
 ans.append(string)
 if left > 0:
 helper(string+'(', left-1,right,n,5)
 if right > left:
 helper(string+')', left,right-1,n,5)
 helper('',n,n,n,5)
 
paren(2)
OUTPUT:
0x7ffef47293c0 5
0x7ffef47293c0 5
0x7ffef47293c0 5
0x7ffef47293c0 5
0x7ffef47293c0 5
0x7ffef47293c0 5
0x7ffef47293c0 5
0x7ffef47293c0 5

i.e. the same point in memory.

However if I replace the 5 in the above code with a larger int for instance 2550 I get the following:

0x2519f6d4790 2550
0x2519f9ec6f0 2550
0x2519f9ec6f0 2550
0x2519f9ec6f0 2550
0x2519f9ec6f0 2550
0x2519f9ec6f0 2550
0x2519f9ec6f0 2550
0x2519f9ec6f0 2550

So initially it is stored at a different memory address but each subsequent call is at the same address. Why is this changing from the case foo=5 what is going on here?

Also in examples where the memory address is changing between calls I do see the same memory addresses being used on more than one occasion for example:

...
0x2695925ae88 ['bar', 'bar', 'bar']
...
0x2695925ae88 ['bar', 'bar']
...

Why is this the case? Is python using previously used memory addresses to store new variables once the old ones are no longer on the recursion call stack?

My mind is really fuzzy on these behaviours so if anyone could help me out that would be great!

I have heard of things like pass by reference and pass by value, but I am not too sure what they mean and if it relates to this python example.

Thank you.

Question 2

First code: the call stack keeps a reference to each foo, so you have many lists in memory at once, each with a unique ID.

Second code: you pass the same list (that was initially empty) to each recursive call.

Third code: Cpython, as an implementation-specific optimization, caches small int constants for reuse

Fourth code Cpython does not cache large (i.e., greater than 256), so each occurrence of 2550 creates a new int object.

Question 3

Thanks for the good answer. One last point to ask though, you say that large values are not cached but if I declare two variables x = 1, y = 1000 and print the memory addresses they have the same address all the time even though y > 256 so it should have a different location right? Also if we take an address say 0x2519f6d4790 for example how many bits of information does this address hold?

Question 4

Also for the last example with 2550 you say this is not cached but then why do each recursive call store 2550 at the same location (with the exception of the initial call)? Surely the value 2550 as it is not cached would be stored in a separate location for each instance on the call stack?

Question 5

I believe that's a separate implementation-specific optimization. When the code for helper is generated, the literal 2550 is allocated once as a "local" constant, and every call to helper uses the same object.

Question 6

In general, it is not worth worrying about whether a literal produces a new object or reuses an existing object. Your code should not be written to depend on one scenario or the other. Object identity should only be used to test if two names refer to the same object or not, due to explicit assignment (and such use cases are relatively rare).

Question 7

I will try to explain this behavior to you with a generic explanation.

Python differentiates between reference data types and primitive data types. Reference data types save references in the memory to a stored value and not the value them self. On the other hand primitive (or value) data types save the value itself.

If you pass an array to a method it will create a new instance of this array every time. So doing some_function([]) generates a new array and passes it to the function. This means calling some_function([]) and then some_function([]) again will create one array on each call. However, if you create an array like foo = [] then foo will reference to the array and calling some_function(foo) and then some_function(foo) will execute some_function on this array.

This is not happening for values like float, int, boolean, but for values like objects and arrays.

Follow up research can be done by following keywords: reference data type, value data type, built in types.

I hope my answer was helpful in understanding the stated problem.

chepner 539k77 gold badges596 silver badges748 bronze badges · Accepted Answer · 2020-07-06 18:10:55Z

1

First code: the call stack keeps a reference to each foo, so you have many lists in memory at once, each with a unique ID.

Second code: you pass the same list (that was initially empty) to each recursive call.

Third code: Cpython, as an implementation-specific optimization, caches small int constants for reuse

Fourth code Cpython does not cache large (i.e., greater than 256), so each occurrence of 2550 creates a new int object.

Share

Improve this answer

answered Jul 6, 2020 at 18:10

chepner's user avatar

chepner

539k77 gold badges596 silver badges748 bronze badges

Sign up to request clarification or add additional context in comments.

4 Comments

zepsaces

zepsaces Over a year ago

Thanks for the good answer. One last point to ask though, you say that large values are not cached but if I declare two variables x = 1, y = 1000 and print the memory addresses they have the same address all the time even though y > 256 so it should have a different location right? Also if we take an address say 0x2519f6d4790 for example how many bits of information does this address hold?

2020年07月08日T14:09:24.3Z+00:00

zepsaces

zepsaces Over a year ago

Also for the last example with 2550 you say this is not cached but then why do each recursive call store 2550 at the same location (with the exception of the initial call)? Surely the value 2550 as it is not cached would be stored in a separate location for each instance on the call stack?

2020年07月08日T14:13:52.597Z+00:00

chepner

chepner Over a year ago

I believe that's a separate implementation-specific optimization. When the code for helper is generated, the literal 2550 is allocated once as a "local" constant, and every call to helper uses the same object.

2020年07月08日T14:19:49.503Z+00:00

chepner

chepner Over a year ago

In general, it is not worth worrying about whether a literal produces a new object or reuses an existing object. Your code should not be written to depend on one scenario or the other. Object identity should only be used to test if two names refer to the same object or not, due to explicit assignment (and such use cases are relatively rare).

2020年07月08日T14:21:30.94Z+00:00

CollectivesTM on Stack Overflow

Why does python memory allocation behave like this?

2 Answers 2

4 Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

CollectivesTM on Stack Overflow

2 Answers 2

4 Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related