I am new to Python and trying to understand the difference between mutable and immutable objects. One of the mutable types in Python is list. Say L = [1, 2, 3]; then L has an id that identifies the object [1, 2, 3]. If the content of [1, 2, 3] is modified, L still retains the same id. In other words, L is still associated with the same object even though the size and content of the object have been altered.
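
A small sketch of the list behaviour described here (the names id_before and id_after are only illustrative):

```python
# Mutating a list in place changes its contents but not its identity.
L = [1, 2, 3]
id_before = id(L)

L.append(4)   # grows the same list object
L[0] = 100    # replaces an element in place

id_after = id(L)
print(L)                      # [100, 2, 3, 4]
print(id_before == id_after)  # True: L still names the same object
```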

With immutable objects, my understanding is that modification of the object isn't allowed. Therefore, if a variable is reassigned a new value, that variable is bound to a new object with a different id. I expected strings to behave in a similar manner. Yet when I tried to modify a string, its id stayed the same across most iterations:

string = "blue"
for i in range(10):
    string = string + str(i)
    print("string id after {}th iteration: {}".format(i, id(string)))

string id after 0th iteration: 46958272
string id after 1th iteration: 46958272
string id after 2th iteration: 46958272
string id after 3th iteration: 47077400
string id after 4th iteration: 47077400
string id after 5th iteration: 47077400
string id after 6th iteration: 47077400
string id after 7th iteration: 47077400
string id after 8th iteration: 47077400
string id after 9th iteration: 47077400
DYZ
asked May 9, 2019 at 5:44
  • Possible duplicate of Mutable vs immutable objects Commented May 9, 2019 at 5:55
  • The duplicate in no way answers the question about the ids of the strings... Commented May 9, 2019 at 5:57
  • If you start with an empty string, the id remains the same all the way up to the 6th iteration... Commented May 9, 2019 at 6:00
  • @hiroprotagonist But then it changes and keeps changing.
    DYZ
    Commented May 9, 2019 at 6:03
  • Aside: string is a poor variable name since it clashes with the string module name.
    jpmc26
    Commented May 9, 2019 at 6:08

1 Answer

You really shouldn't see the same ID twice in a row, but CPython has an optimization for string concatenation with + that doesn't quite obey all the rules it's supposed to.

When CPython sees an operation of the form x = x + something or x += something, if x refers to a string and x holds the only reference to that string, then CPython will grow the string with realloc instead of creating a new string object. Depending on details of available memory, realloc may resize the allocated memory in place, or it may allocate new memory. If it resizes the allocation, the object's id remains the same. You can see the implementation in unicode_concatenate in Python/ceval.c.
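
One way to see the reference-count condition at work is to keep a second reference to the string before concatenating. This is a minimal sketch, not taken from the CPython source, and the names s and alias are only illustrative. Because the old object stays alive through alias, the result has to be a different object, so the two ids must differ:

```python
s = "blue"
alias = s      # second reference: the old string must stay alive

s = s + "0"    # a new string object has to be created here

print(alias)               # blue
print(s)                   # blue0
print(id(s) != id(alias))  # True: both objects are alive at the same time
```

Dropping the alias line restores the situation from the question, where the id may or may not change depending on what realloc does.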

This optimization is mostly fine, because the refcount check ensures the result behaves almost exactly as if strings were truly immutable and a new string had been created. However, in x = x + stuff, the old string and the new string should have briefly overlapping lifetimes, because the new string should come into existence before the assignment ends the old string's lifetime, so it should be impossible for the ID values to be equal.
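
The overlapping-lifetime argument can be checked indirectly. In this sketch (the names kept and ids are only illustrative), every intermediate string is stored in a list, so no object is freed or resized in place and all ten ids come out distinct:

```python
s = "blue"
kept = []   # holds every intermediate string so none is freed
ids = []

for i in range(10):
    s = s + str(i)
    kept.append(s)
    ids.append(id(s))

# All ten strings are alive at once, so their ids cannot collide.
print(len(set(ids)) == len(ids))  # True
```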

id is one of the few ways in which the optimization's behavior is observably different from what would happen if no string mutation occurred. The language developers seem to have decided they're okay with that.

answered May 9, 2019 at 6:28
  • Please provide a reference that the language guarantees that the "lifetime" of an object should last until the assignment is complete, rather than something else such as being defined at the statement level so that it may be discarded any time during the statement.
    jpmc26
    Commented May 9, 2019 at 9:16
  • @jpmc26: The Python documentation does not define the term "lifetime", though it does use the term. There's no sensible way to define it statement-level, though, and the closest thing to a sensible statement-level definition would still prohibit equal ID values (because the old and new strings are alive during the same statement). Commented May 9, 2019 at 9:33
  • While I would like an actual definition of the term (particularly for multi-threaded programs, without assuming a GIL to linearize things), it's hard to argue for any interpretation where two objects that exist at the same time are considered to have non-overlapping lifetimes, and without the optimization, the old and new strings would definitely exist simultaneously. The RHS of the assignment must be fully evaluated before name (re)binding occurs. Commented May 9, 2019 at 9:34
  • If your concern is whether the language guarantees that the LHS of an assignment is not discarded early, see the assignment statement documentation, which says that name (re)binding only occurs once the new object is available. The string concatenation optimization cheats by unbinding the name up front. Commented May 9, 2019 at 9:41
