53

Do you know of a Python library which provides mutable strings? Google returned surprisingly few results. The only usable library I found is http://code.google.com/p/gapbuffer/ which is in C but I would prefer it to be written in pure Python.

Edit: Thanks for the responses but I'm after an efficient library. That is, ''.join(list) might work but I was hoping for something more optimized. Also, it has to support the usual stuff regular strings do, like regex and unicode.

codeforester
asked May 13, 2012 at 14:45
7
  • 9
    Lists work pretty well for this purpose. Commented May 13, 2012 at 14:49
  • A couple of links: LINK1, LINK2 Commented May 13, 2012 at 15:00
  • 5
    Can you please explain, why do you need mutable strings? What is the use case? Commented May 13, 2012 at 15:53
  • 2
    @BasicWolf Maybe for memory-efficient replacement of characters inside the string? That avoids creating a copy of the string. Commented Oct 29, 2013 at 13:59
  • 5
    @chuwy Well, there is bytearray for that purpose. A string in Python is by design not a "memory-efficient" sequence but a concurrency-efficient one. Consider this: you can always be sure that no string operation, whatever it does, modifies the original string. So there are no problems with concurrency, thread safety, etc. Commented Oct 29, 2013 at 14:12

8 Answers

30

In Python, the built-in mutable sequence type for this is bytearray; see this link.
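
As a rough illustration (a minimal sketch, not taken from the linked documentation): a bytearray can be edited in place, resized through slice assignment, and searched with the re module using bytes patterns. The variable names and sample text are made up for the example, and it assumes Python 3 with ASCII-only data, so that one byte corresponds to one character.

```python
import re

# Build a mutable buffer from text (assumption: ASCII-only content).
buf = bytearray("hello world", "utf-8")

buf[0:5] = b"HELLO"   # in-place replacement of a slice
buf[6:] = b"there"    # slice assignment may also change the contents after index 6
buf.extend(b"!")      # appending grows the buffer

# The re module accepts bytes-like objects when the pattern is bytes.
m = re.search(rb"th\w+", buf)

# Convert back to a regular str only when one is needed.
text = buf.decode("utf-8")
print(text, m.span())
```

For text outside ASCII, indices refer to bytes rather than characters, so slices can cut a multi-byte character in half.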

quamrana
answered May 13, 2012 at 15:01
7
  • I am not sure what @Marcin is referring to, because bytearray allows you to assign a new value to a slice of the bytearray. Commented Mar 5, 2014 at 23:57
  • @jonathanrocher Check edit history. Marcin pointed out an error, and it was corrected. Commented Jul 2, 2014 at 19:43
  • 1
    This should be the 'correct' answer. Too much messing about involved in the current top-voted. Commented Nov 17, 2014 at 9:41
  • 30
    bytearray as the name obviously suggests is an array of bytes. Strings are not sequences of bytes but rather sequences of groups of bytes. I.e. this is only true for ASCII strings, not true for unicode in general. -1. Commented Dec 16, 2014 at 12:34
  • 1
    Beware of multi-byte characters. Example: bytearray('aé'.encode('utf8')) gives bytearray(b'a\xc3\xa9'). Commented Mar 8, 2021 at 1:30
24

This lets you efficiently change characters in a string in place, although you can't change the string's length (Python 2 syntax):

>>> import ctypes
>>> a = 'abcdefghijklmn'
>>> mutable = ctypes.create_string_buffer(a)
>>> mutable[5:10] = ''.join( reversed(list(mutable[5:10].upper())) )
>>> a = mutable.value
>>> print `a, type(a)`
('abcdeJIHGFklmn', <type 'str'>)
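
For Python 3, the same idea might look like the sketch below. It assumes ctypes.create_unicode_buffer() is the right counterpart for str input (create_string_buffer() expects bytes there); the variable names are illustrative only.

```python
import ctypes

text = "abcdefghijklmn"

# A fixed-size, mutable buffer of wide characters initialised from a str.
buf = ctypes.create_unicode_buffer(text)

# Slices of the buffer are plain str objects; the assigned value must have
# exactly the same length as the slice it replaces.
buf[5:10] = "".join(reversed(buf[5:10].upper()))

result = buf.value      # str up to (not including) the terminating NUL
print(result, type(result), len(buf))
```

Note that len(buf) counts the terminating NUL, so it is one larger than len(result).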
answered Oct 3, 2014 at 2:14
2
  • 7
    BE WARNED that the buffer includes the terminator into its reported len(). This will break slices with negative indices unless you add an extra -1 to each negative index. (For unicode buffers, it's -1, too, because len and slice indices for these types are in characters.) Commented Jan 18, 2018 at 16:32
  • 1
    Note: in Python 3, ctypes.create_string_buffer() takes a bytes argument, and ctypes.create_unicode_buffer() takes a str argument. Commented Jul 14, 2021 at 11:53
15
class MutableString(object):
    def __init__(self, data):
        self.data = list(data)
    def __repr__(self):
        return "".join(self.data)
    def __setitem__(self, index, value):
        self.data[index] = value
    def __getitem__(self, index):
        if isinstance(index, slice):
            return "".join(self.data[index])
        return self.data[index]
    def __delitem__(self, index):
        del self.data[index]
    def __add__(self, other):
        self.data.extend(list(other))
        # return self so that s + x (and s += x) yields the updated object
        return self
    def __len__(self):
        return len(self.data)

... and so on, and so forth.

You could also subclass StringIO, buffer, or bytearray.

scottmrogowski
answered May 13, 2012 at 15:06
5
  • To be able to use regex and string methods like find you need to subclass from str instead of object. Commented Aug 20, 2014 at 17:28
  • 1
    Correction: regex and find only work on the original string. Modifications made through __setitem__ are disregarded. Is there a way to use regex on MutableStrings? Commented Aug 20, 2014 at 17:35
  • You can do re.match(expression, repr(mutable_string)) Commented Aug 20, 2014 at 17:46
  • 7
    But then you could as well use a normal string. I want/need to take advantage of the mutability. Commented Aug 20, 2014 at 18:02
  • Too many functions to override. And you would have to check if there are any differences between the str API of the various Python versions. Commented Jan 14, 2023 at 5:52
5

How about simply sub-classing list (the prime example for mutability in Python)?

class CharList(list):
    def __init__(self, s):
        list.__init__(self, s)
    @property
    def list(self):
        return list(self)
    @property
    def string(self):
        return "".join(self)
    def __setitem__(self, key, value):
        if isinstance(key, int) and len(value) != 1:
            cls = type(self).__name__
            raise ValueError("attempt to assign sequence of size {} to {} item of size 1".format(len(value), cls))
        super(CharList, self).__setitem__(key, value)
    def __str__(self):
        return self.string
    def __repr__(self):
        cls = type(self).__name__
        return "{}(\'{}\')".format(cls, self.string)

This only joins the list back to a string if you want to print it or actively ask for the string representation. Mutating and extending are trivial, and the user knows how to do it already since it's just a list.

Example usage:

s = "te_st"
c = CharList(s)
c[1:3] = "oa"
c += "er"
print c # prints "toaster"
print c.list # prints ['t', 'o', 'a', 's', 't', 'e', 'r']

The following is fixed, see update below.

There's one (solvable) caveat: There's no check (yet) that each element is indeed a character. It will at least fail printing for everything but strings. However, those can be joined and may cause weird situations like this: [see code example below]

With the custom __setitem__, assigning a string of length != 1 to a CharList item raises a ValueError. Everything else can still be freely assigned but raises "TypeError: sequence item n: expected string, X found" when printing, because of the string.join() operation. If that's not good enough, further checks can easily be added (potentially also to __setslice__), or the base class can be switched to collections.Sequence (performance might differ), cf. here.

s = "test"
c = CharList(s)
c[1] = "oa"
# with custom __setitem__ a ValueError is raised here!
# without custom __setitem__, we could go on:
c += "er"
print c # prints "toaster"
# this looks right until here, but:
print c.list # prints ['t', 'oa', 's', 't', 'e', 'r']
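
One possible way to add those further checks in Python 3 is sketched below. It is an assumption-laden variant, not the code from this answer: the class name CheckedCharList and its _check helper are hypothetical, and it builds on collections.abc.MutableSequence (the Python 3 home of the abstract sequence classes), which supplies append, extend, += and similar methods once the five core methods are defined.

```python
from collections.abc import MutableSequence


class CheckedCharList(MutableSequence):
    """Hypothetical list-of-characters that validates every element."""

    def __init__(self, s=""):
        self._chars = []
        self.extend(s)  # extend() funnels each item through insert()

    @staticmethod
    def _check(ch):
        # Accept only str objects of length 1.
        if not isinstance(ch, str) or len(ch) != 1:
            raise ValueError("expected a single character, got {!r}".format(ch))
        return ch

    def __getitem__(self, index):
        if isinstance(index, slice):
            return "".join(self._chars[index])
        return self._chars[index]

    def __setitem__(self, index, value):
        if isinstance(index, slice):
            # Iterating a str yields single characters, so "oa" is fine here.
            self._chars[index] = [self._check(ch) for ch in value]
        else:
            self._chars[index] = self._check(value)

    def __delitem__(self, index):
        del self._chars[index]

    def __len__(self):
        return len(self._chars)

    def insert(self, index, value):
        self._chars.insert(index, self._check(value))

    def __str__(self):
        return "".join(self._chars)


c = CheckedCharList("te_st")
c[1:3] = "oa"
c += "er"
print(str(c))
```

Going through the abstract base class routes every mutation through Python-level methods, so it is likely slower than subclassing list directly; whether that matters depends on the workload.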
answered Aug 27, 2017 at 0:25
3

The FIFOStr package in pypi supports pattern matching and mutable strings. This may or may not be exactly what is wanted but was created as part of a pattern parser for a serial port (the chars are added one char at a time from left or right - see docs). It is derived from deque.

from fifostr import FIFOStr
myString = FIFOStr("this is a test")
myString.head(4) == "this" #true
myString[2] = 'u'
myString.head(4) == "thus" #true

(Full disclosure: I'm the author of FIFOStr.)

answered Aug 10, 2021 at 0:11
2

Efficient mutable strings in Python are arrays. A Python 3 example for a Unicode string, using array.array from the standard library:

>>> import array, re
>>> ua = array.array('u', 'teststring12')
>>> ua[-2:] = array.array('u', '345')
>>> ua
array('u', 'teststring345')
>>> re.search('string.*', ua.tounicode()).group()
'string345'

bytearray is predefined for bytes and is more automatic regarding conversion and compatibility.

You can also consider memoryview / buffer, numpy arrays, mmap and multiprocessing.shared_memory for certain cases.
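
As a small sketch of the memoryview option (illustrative names, Python 3, ASCII data assumed): a memoryview over a bytearray allows zero-copy, same-length slice replacement, but the underlying bytearray cannot be resized while the view is still exported.

```python
data = bytearray(b"teststring12")

# A memoryview exposes the same memory without copying it.
view = memoryview(data)
view[-2:] = b"34"      # replacement must have the same length as the slice

# Release the view before resizing; resizing with a live view raises BufferError.
view.release()
data.extend(b"5")

text = data.decode("ascii")
print(text)
```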

answered May 21, 2020 at 10:55
2
  • array is "deprecated since 3.3 and will be removed in 4.0". I highly suspect that, having a fixed item size, it doesn't handle surrogate pairs correctly. Commented Sep 15, 2023 at 15:34
  • 1
    @ivan_pozdeev The deprecation notice is under Note (1) and refers only to the 'u' type code. The array module itself is not deprecated: docs.python.org/3/library/array.html. Commented Apr 24, 2024 at 0:07
0

You could use a getter:

original_string = "hey all"
def get_string():
 return original_string

So when you need it you just call it like this:

get_string().split()
answered Jun 18, 2023 at 23:59
-3

Just do this
string = "big"
string = list(string)
string[0] = string[0].upper()
string = "".join(string)
print(string)

'''OUTPUT'''
> Big

answered Oct 20, 2020 at 7:27
1
  • 1
    OP points out in the question description that he's looking for something more efficient than ''.join(list). He's also asking specifically for a library which provides mutable strings in Python (or some other approach to attain it). – Please revise your answer and explain why doing it in this way is still worthwhile from your perspective. Also giving an explanation rather than just do this is much more helpful for future readers. See also the contribution guide for reference. Commented Oct 20, 2020 at 11:12
