Do you know of a Python library which provides mutable strings? Google returned surprisingly few results. The only usable library I found is http://code.google.com/p/gapbuffer/ which is in C but I would prefer it to be written in pure Python.
Edit: Thanks for the responses but I'm after an efficient library. That is, ''.join(list) might work but I was hoping for something more optimized. Also, it has to support the usual stuff regular strings do, like regex and unicode.
-
Lists work pretty well for this purpose. – Aaron Yodaiken, May 13, 2012 at 14:49
-
A couple of links: LINK1, LINK2 – digEmAll, May 13, 2012 at 15:00
-
Can you please explain why you need mutable strings? What is the use case? – Zaur Nasibov, May 13, 2012 at 15:53
-
@BasicWolf maybe for memory-efficient replacement of chars inside the string? That avoids creating a copy of the string. – chuwy, Oct 29, 2013 at 13:59
-
@chuwy Well, there is bytearray for those purposes. A string in Python is a priori not a "memory-efficient" sequence, but rather a concurrency-efficient one. Consider this: you can always be sure that, no matter what, a string modification operation does not affect the original string. So, no problems with concurrency, thread safety etc. – Zaur Nasibov, Oct 29, 2013 at 14:12
8 Answers
In Python, the built-in mutable sequence type for byte data is bytearray; see this link.
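A minimal Python 3 sketch of what this can look like, assuming the text can be kept as encoded bytes (the variable names are illustrative and not from the answer). Slice assignment edits the buffer in place and may change its length, and the re module accepts a bytearray directly when the pattern is a bytes pattern. Indices count bytes rather than characters, so non-ASCII text needs care.

```python
import re

# Assumption: the text is held as UTF-8 encoded bytes.
buf = bytearray("teststring12", "utf-8")

# Replace the last two bytes with three; the buffer grows in place.
buf[-2:] = b"345"

# Single items are integers (byte values), not one-character strings.
buf[0] = ord("T")

# A bytes pattern can search the bytearray without converting it first.
match = re.search(rb"string\d+", buf)

text = buf.decode("utf-8")
print(text)          # Teststring345
print(match.span())  # (4, 13)
```

Decoding back to str is only needed at the point where real text is required.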
-
I am not sure what @Marcin is referring to, because bytearrays allow you to assign a new value to a slice of the bytearray. – jonathanrocher, Mar 5, 2014 at 23:57
-
@jonathanrocher Check edit history. Marcin pointed out an error, and it was corrected. – leewz, Jul 2, 2014 at 19:43
-
This should be the 'correct' answer. Too much messing about involved in the current top-voted. – robert, Nov 17, 2014 at 9:41
-
bytearray, as the name obviously suggests, is an array of bytes. Strings are not sequences of bytes but rather sequences of groups of bytes. I.e. this is only true for ASCII strings, not true for unicode in general. -1. – freakish, Dec 16, 2014 at 12:34
-
Beware of multi-byte characters. Example: bytearray('aé'.encode('utf8')) gives bytearray(b'a\xc3\xa9'). – Michael Grazebrook, Mar 8, 2021 at 1:30
This will allow you to efficiently change characters in a string, although you can't change the string's length.
>>> import ctypes
>>> a = 'abcdefghijklmn'
>>> mutable = ctypes.create_string_buffer(a)
>>> mutable[5:10] = ''.join( reversed(list(mutable[5:10].upper())) )
>>> a = mutable.value
>>> print `a, type(a)`
('abcdeJIHGFklmn', <type 'str'>)
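The session above is Python 2. A rough Python 3 equivalent might look like the following sketch, which assumes ctypes.create_unicode_buffer is acceptable for the use case (it takes a str, whereas create_string_buffer needs bytes in Python 3). The name result is only illustrative.

```python
import ctypes

a = "abcdefghijklmn"
# create_unicode_buffer takes a str; create_string_buffer would need bytes.
mutable = ctypes.create_unicode_buffer(a)

# Slice assignment must keep the length unchanged.
mutable[5:10] = "".join(reversed(mutable[5:10].upper()))

result = mutable.value  # the text up to the NUL terminator
print(result, type(result))  # abcdeJIHGFklmn <class 'str'>
```

Note that len(mutable) is one larger than len(a) because the buffer includes the terminator.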
-
BE WARNED that the buffer includes the terminator in its reported len(). This will break slices with negative indices unless you add an extra -1 to each negative index. (For unicode buffers, it's -1, too, because len and slice indices for these types are in characters.) – ivan_pozdeev, Jan 18, 2018 at 16:32
-
Note: in Python 3, ctypes.create_string_buffer() takes a bytes argument, and ctypes.create_unicode_buffer() takes a str argument. – Rustam A., Jul 14, 2021 at 11:53
class MutableString(object):
    def __init__(self, data):
        self.data = list(data)

    def __repr__(self):
        return "".join(self.data)

    def __setitem__(self, index, value):
        self.data[index] = value

    def __getitem__(self, index):
        if isinstance(index, slice):
            return "".join(self.data[index])
        return self.data[index]

    def __delitem__(self, index):
        del self.data[index]

    def __add__(self, other):
        # Return a new object instead of mutating self and returning None.
        return MutableString(self.data + list(other))

    def __iadd__(self, other):
        # In-place extension for "s += other".
        self.data.extend(other)
        return self

    def __len__(self):
        return len(self.data)
... and so on, and so forth.
You could also subclass StringIO, buffer, or bytearray.
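As a rough illustration of the StringIO idea, here is a Python 3 sketch that uses io.StringIO directly rather than subclassing it (that simplification is an assumption, not part of the answer). Seeking and writing overwrites characters in place; it cannot insert or delete in the middle.

```python
import io

buf = io.StringIO("this is a test")

# Overwrite four characters starting at offset 10.
buf.seek(10)
buf.write("TEST")

# Writing at the end extends the buffer.
buf.seek(0, io.SEEK_END)
buf.write("!")

result = buf.getvalue()
print(result)  # this is a TEST!
```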
-
To be able to use regex and string methods like find you need to subclass from str instead of object. – chtenb, Aug 20, 2014 at 17:28
-
Correction: regex and find only work on the original string. Modifications made through __setitem__ are disregarded. Is there a way to use regex on MutableStrings? – chtenb, Aug 20, 2014 at 17:35
-
You can do re.match(expression, repr(mutable_string)) – Joel Cornett, Aug 20, 2014 at 17:46
-
But then you could as well use a normal string. I want/need to take advantage of the mutability. – chtenb, Aug 20, 2014 at 18:02
-
Too many functions to override. And you would have to check if there are any differences between the str API of the various Python versions. – toolforger, Jan 14, 2023 at 5:52
How about simply sub-classing list (the prime example for mutability in Python)?
class CharList(list):

    def __init__(self, s):
        list.__init__(self, s)

    @property
    def list(self):
        return list(self)

    @property
    def string(self):
        return "".join(self)

    def __setitem__(self, key, value):
        if isinstance(key, int) and len(value) != 1:
            cls = type(self).__name__
            raise ValueError("attempt to assign sequence of size {} to {} item of size 1".format(len(value), cls))
        super(CharList, self).__setitem__(key, value)

    def __str__(self):
        return self.string

    def __repr__(self):
        cls = type(self).__name__
        return "{}(\'{}\')".format(cls, self.string)
This only joins the list back to a string if you want to print it or actively ask for the string representation. Mutating and extending are trivial, and the user knows how to do it already since it's just a list.
Example usage:
s = "te_st"
c = CharList(s)
c[1:3] = "oa"
c += "er"
print(c)  # prints "toaster"
print(c.list)  # prints ['t', 'o', 'a', 's', 't', 'e', 'r']
The following is fixed, see update below.
There's one (solvable) caveat: There's no check (yet) that each element is indeed a character. It will at least fail printing for everything but strings. However, those can be joined and may cause weird situations like this: [see code example below]
With the custom __setitem__, assigning a string of length != 1 to a CharList item will raise a ValueError. Everything else can still be freely assigned but will raise a TypeError: sequence item n: expected string, X found when printing, due to the string.join() operation. If that's not good enough, further checks can be added easily (potentially also to __setslice__ or by switching the base class to collections.Sequence (collections.abc.Sequence in Python 3; performance might be different?!), cf. here)
s = "test"
c = CharList(s)
c[1] = "oa"
# with custom __setitem__ a ValueError is raised here!
# without custom __setitem__, we could go on:
c += "er"
print(c)  # prints "toaster"
# this looks right until here, but:
print(c.list)  # prints ['t', 'oa', 's', 't', 'e', 'r']
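The "further checks" idea could be sketched roughly as follows for Python 3. This is a hypothetical variant, not the answer's code: the class name CheckedCharList and the helper _check are made up for illustration, and it uses collections.abc.MutableSequence as the base so that append, extend and += all funnel through insert and get validated.

```python
from collections.abc import MutableSequence


class CheckedCharList(MutableSequence):
    """Hypothetical CharList variant that validates every element."""

    def __init__(self, s=""):
        self._chars = []
        self.extend(s)  # extend() goes through insert(), so items are checked

    @staticmethod
    def _check(ch):
        if not (isinstance(ch, str) and len(ch) == 1):
            raise ValueError("expected a single character, got {!r}".format(ch))
        return ch

    def __getitem__(self, index):
        if isinstance(index, slice):
            return "".join(self._chars[index])
        return self._chars[index]

    def __setitem__(self, index, value):
        if isinstance(index, slice):
            self._chars[index] = [self._check(ch) for ch in value]
        else:
            self._chars[index] = self._check(value)

    def __delitem__(self, index):
        del self._chars[index]

    def __len__(self):
        return len(self._chars)

    def insert(self, index, value):
        self._chars.insert(index, self._check(value))

    def __str__(self):
        return "".join(self._chars)


c = CheckedCharList("te_st")
c[1:3] = "oa"
c += "er"
print(c)  # toaster
```

With this variant, c[1] = "oa" and c.append(5) both raise ValueError at assignment time instead of failing later when the string is joined. The extra Python-level calls will likely make it slower than the plain list subclass.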
The FIFOStr package on PyPI supports pattern matching and mutable strings. This may or may not be exactly what is wanted, but it was created as part of a pattern parser for a serial port (the chars are added one at a time from the left or right; see the docs). It is derived from deque.
from fifostr import FIFOStr
myString = FIFOStr("this is a test")
myString.head(4) == "this" #true
myString[2] = 'u'
myString.head(4) == "thus" #true
(Full disclosure: I'm the author of FIFOstr.)
Efficient mutable strings in Python are arrays.
PY3 example for a unicode string using array.array from the standard library:
>>> import array
>>> import re
>>> ua = array.array('u', 'teststring12')
>>> ua[-2:] = array.array('u', '345')
>>> ua
array('u', 'teststring345')
>>> re.search('string.*', ua.tounicode()).group()
'string345'
bytearray is predefined for bytes and is more automatic regarding conversion and compatibility.
You can also consider memoryview / buffer, numpy arrays, mmap and multiprocessing.shared_memory for certain cases.
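As one illustration of the memoryview option, here is a small Python 3 sketch over a bytearray (the variable names are illustrative). Writes through the view land in the underlying buffer without copying, but slice assignment must keep the length unchanged.

```python
data = bytearray(b"hello world")

with memoryview(data) as view:
    # Same-length slice assignment writes straight into data.
    view[6:11] = b"WORLD"
    # Slicing a memoryview yields another view, not a copy.
    with view[:5] as head:
        head[0] = ord("H")

print(data.decode("ascii"))  # Hello WORLD
```

The views are released with the context managers here because a bytearray cannot be resized while a memoryview of it is still exported.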
-
array is "deprecated since 3.3 and will be removed in 4.0". I highly suspect that, having a fixed item size, it doesn't handle surrogate pairs correctly. – ivan_pozdeev, Sep 15, 2023 at 15:34
-
@ivan_pozdeev The deprecation notice is under Note (1) and refers only to the 'u' type code. The array module itself is not deprecated: docs.python.org/3/library/array.html. – Jason Johnston, Apr 24, 2024 at 0:07
You could use a getter:
original_string = "hey all"

def get_string():
    return original_string
So when you need it you just call it like this:
get_string().split()
Just do this:
string = "big"
string = list(string)
string[0] = string[0].upper()
string = "".join(string)
print(string)  # output: Big
-
OP points out in the question description that he's looking for something more efficient than ''.join(list). He's also asking specifically for a library which provides mutable strings in Python (or some other approach to attain it). – Please revise your answer and explain why doing it in this way is still worthwhile from your perspective. Also, giving an explanation rather than "just do this" is much more helpful for future readers. See also the contribution guide for reference. – Ivo Mori, Oct 20, 2020 at 11:12