I am trying to get my head around mutable and immutable objects. I have read that strings are immutable and that a separate object, with a different object ID, is created for each string. I am trying to verify this using the simple code below; however, I see the same object ID for multiple strings that are not the same. Can someone please clarify this? Thanks in advance.
mystring = ""
mylist = ["This ", "That ", "This ", "That ", "This ", "That ", "This ", "That "]
for item in mylist:
    mystring = mystring + item
    print("mystring: ", mystring, "ID of mystring: ", id(mystring))
which results in the output below:
mystring: This ID of mystring: 6407264
mystring: This That ID of mystring: 42523448
mystring: This That This ID of mystring: 42523448
mystring: This That This That ID of mystring: 6417200
mystring: This That This That This ID of mystring: 42785608
mystring: This That This That This That ID of mystring: 42785608
mystring: This That This That This That This ID of mystring: 42837536
mystring: This That This That This That This That ID of mystring: 42775856
2 Answers
Python is allowed to reuse object IDs for objects with non-overlapping lifetimes, but you're seeing ID reuse in cases where there should be a lifetime overlap. Specifically, during execution of this statement:
mystring = mystring + item
between the evaluation of mystring + item and the assignment to mystring, there should be a lifetime overlap between any two successive values of mystring. You're seeing ID reuse for successive values of mystring, which shouldn't happen.
The effect you're seeing happens because of an optimization in the CPython bytecode evaluation loop, where statements of the form
string1 = string1 + string2
or
string1 += string2
are detected, and if the interpreter can confirm that string1 has no other references, it attempts to perform the concatenation by mutating string1 in-place. You can see the code in Python/ceval.c under unicode_concatenate. This optimization is mostly invisible, due to the refcount check, but the effect on id values is one way it's visible.
Strings are immutable. There is no str method that allows you to mutate them.
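As a quick check of that claim, the sketch below tries to modify a string directly and then calls a method that looks like a modification. The variable names are arbitrary.

```python
s = "This "

try:
    s[0] = "t"  # str does not support item assignment
except TypeError as exc:
    print("TypeError:", exc)

# Methods that look like modifications return a new string instead.
lowered = s.lower()
print(repr(s))        # the original is unchanged
print(repr(lowered))  # the lowercase copy
```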
That being said, the reason you see the same id multiple times is that when an object is no longer in use, Python can reuse its position in memory. In CPython, id provides an identifier that is unique among live objects by returning the position of the object in memory.
One way to convince yourself that this is indeed the reason for your observation is to always keep a reference to each of the strings you create by adding them to a list.
Code
mystring = ""
mylist = ["This ", "That ", "This ", "That ", "This ", "That ", "This ", "That "]
# A list to keep a reference to each string
created_strings = []
for item in mylist:
    mystring = mystring + item
    # Prevent mystring from being garbage collected by adding it to the list
    created_strings.append(mystring)
    print("mystring: ", mystring, "ID of mystring: ", id(mystring))
Output
mystring: This ID of mystring: 2522900655888
mystring: This That ID of mystring: 2522903930416
mystring: This That This ID of mystring: 2522903930544
mystring: This That This That ID of mystring: 2522902118880
mystring: This That This That This ID of mystring: 2522900546624
mystring: This That This That This That ID of mystring: 2522900546864
mystring: This That This That This That This ID of mystring: 2522902428376
mystring: This That This That This That This That ID of mystring: 2522900907952
Notice that now that memory is not reclaimed, each object has a different id.
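The same point can be sketched without strings at all, using plain object() instances. The second comparison relies on CPython's memory reuse, which is an implementation detail, so its result is not guaranteed.

```python
# Objects that are alive at the same time always have distinct ids.
live = [object() for _ in range(5)]
print(len({id(obj) for obj in live}) == len(live))

# Each temporary object() here dies as soon as id() returns, so the
# next one may land at the same address. In CPython this often prints
# True, but that is an implementation detail, not a guarantee.
print(id(object()) == id(object()))
```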
ids are reclaimed when not used, so it's not surprising that you are seeing the same id, because you are discarding the old strings.
... id return values and string immutability.
... mystring + item and the assignment to mystring, the lifetimes of successive mystring values should overlap. Lifetime overlap isn't transitive, but that doesn't matter, because we're seeing ID reuse for successive mystring values. If it weren't for the in-place optimization of mystring = mystring + item, this ID reuse wouldn't happen.
... mystring value would come into existence before the name binding operation, and then the name binding would end the lifetime of the old mystring value. There would be a lifetime overlap between the + and the =.