How to make multiple modifications to a string in Python, given that strings are immutable?

Question 1

I'm new to Python, so this may be something very easy, but I can't work out how to approach the problem in a Pythonic way.

I have a compressed string. The idea is, if a character gets repeated 4-15 times, I make this change:

'0000' ---> '0|4'

If it is repeated more than 15 times, I use a slash and two hexadecimal digits to represent the count:

'00...(16 times)..0' ---> '0/10'

So, accustomed to other languages, my approach is the following:

def uncompress(line):
    verticalBarIndex = line.index('|')
    while verticalBarIndex!=-1:
        repeatedChar = line[verticalBarIndex-1:verticalBarIndex]
        timesRepeated = int(line[verticalBarIndex+1:verticalBarIndex+2], 16)
        uncompressedChars = [repeatedChar]
        for i in range(timesRepeated):
            uncompressedChars.append(repeatedChar)
        uncompressedString = uncompressedChars.join()
        line = line[:verticalBarIndex-1] + uncompressedString + line[verticalBarIndex+2:]
        verticalBarIndex = line.index('|') #next one
    slashIndex = line.index('/')
    while slashIndex!=-1:
        repeatedChar = line[slashIndex-1:slashIndex]
        timesRepeated = int(line[slashIndex+1:verticalBarIndex+3], 16)
        uncompressedChars = [repeatedChar]
        for i in range(timesRepeated):
            uncompressedChars.append(repeatedChar)
        uncompressedString = uncompressedChars.join()
        line = line[:slashIndex-1] + uncompressedString + line[slashIndex+3:]
        slashIndex = line.index('/') #next one
    return line

I know this is wrong, since strings are immutable in Python and I am changing the contents of line over and over until no '|' or '/' remains.

I know UserString exists, but I guess there is an easier and more Pythonic way of doing it, which would be great to learn.

Any help?

Question 2

You're just creating a new object and binding the name line to it (the same name the old object had). There's nothing wrong with that. All str objects remain immutable.
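
A quick way to see this, as a minimal sketch (the sample string and the name original are only illustrative): slicing and concatenation build a new str object, and the assignment simply rebinds the name.

```python
line = 'a0|4b'
original = line  # a second name for the same str object

# Slicing and concatenation build a brand-new string; the assignment
# rebinds the name `line` to it and leaves the old object untouched.
line = line[:1] + '0000' + line[4:]

print(line)              # a0000b
print(original)          # a0|4b
print(line is original)  # False

# Changing a string in place is what actually fails.
try:
    original[0] = 'x'
    in_place_failed = False
except TypeError:
    in_place_failed = True
print(in_place_failed)   # True
```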

Question 3

Oh, I guess you are right and my question is stupid then! Sorry!

Question 4

Could you use compress = lambda s: s.encode('zip') and uncompress = lambda z: z.decode('zip')? It is inefficient to compress/uncompress byte by byte in pure Python.
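
Note that str.encode('zip') only works on Python 2. A hedged sketch of the same idea using the zlib module directly, which also works on Python 3, might look like this. It assumes the data is plain ASCII text, and it produces zlib's own binary format, not the '|' and '/' notation from the question.

```python
import zlib


def compress(text):
    # zlib operates on bytes, so encode the text first (ASCII is an assumption).
    return zlib.compress(text.encode('ascii'))


def uncompress(blob):
    # Reverse the two steps: decompress to bytes, then decode back to text.
    return zlib.decompress(blob).decode('ascii')


sample = '0' * 16 + 'abc'
roundtrip = uncompress(compress(sample))
print(roundtrip == sample)  # True
```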

Question 5

The changes necessary to get your code running with the sample strings:

Change .index() to .find(). .index() raises an exception if the substring isn't found, .find() returns -1.

Change uncompressedChars.join() to ''.join(uncompressedChars).

Change timesRepeated = int(line[slashIndex+1:verticalBarIndex+3], 16) to timesRepeated = int(line[slashIndex+1:slashIndex+3], 16)

Set uncompressedChars = [] to start with, instead of uncompressedChars = [repeatedChar].

This should get it working properly. There are a lot of places where the code can be tidied and optimised, but this works.
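
Applying the four changes listed above to the code from the question gives roughly the following. This is a sketch of one reading of those instructions, keeping the original variable names; the strings in the usage lines are made up for illustration.

```python
def uncompress(line):
    verticalBarIndex = line.find('|')  # change 1: find() returns -1 when absent
    while verticalBarIndex != -1:
        repeatedChar = line[verticalBarIndex-1:verticalBarIndex]
        timesRepeated = int(line[verticalBarIndex+1:verticalBarIndex+2], 16)
        uncompressedChars = []  # change 4: start with an empty list
        for i in range(timesRepeated):
            uncompressedChars.append(repeatedChar)
        uncompressedString = ''.join(uncompressedChars)  # change 2
        line = line[:verticalBarIndex-1] + uncompressedString + line[verticalBarIndex+2:]
        verticalBarIndex = line.find('|')  # next one
    slashIndex = line.find('/')
    while slashIndex != -1:
        repeatedChar = line[slashIndex-1:slashIndex]
        timesRepeated = int(line[slashIndex+1:slashIndex+3], 16)  # change 3
        uncompressedChars = []
        for i in range(timesRepeated):
            uncompressedChars.append(repeatedChar)
        uncompressedString = ''.join(uncompressedChars)
        line = line[:slashIndex-1] + uncompressedString + line[slashIndex+3:]
        slashIndex = line.find('/')  # next one
    return line


print(uncompress('0|4'))         # 0000
print(len(uncompress('0/10')))   # 16
```

Each pass through a loop builds a new string and rebinds line to it, which is fine for short inputs but copies the whole string once per marker.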

Question 6

The most common pattern I have seen is to use a list of characters. Lists are mutable and work as you describe above.

To create a list from a string:

mystring = 'Hello'
mylist = list(mystring)

To create a string from a list:

mystring = ''.join(mylist)
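
As a small illustration of why this pattern helps (the sample string is made up, and the slice positions are hard-coded for it), a list of characters can be edited in place, including replacing a compressed marker with its expansion, before joining back into a string:

```python
chars = list('a0|4b')    # ['a', '0', '|', '4', 'b']

# Slice assignment may change the length of the list: the three items
# '0', '|', '4' are replaced by four '0' characters.
chars[1:4] = ['0'] * 4

chars[0] = 'A'           # item assignment works too, unlike on a str

result = ''.join(chars)
print(result)            # A0000b
```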

Question 7

You should build a list of substrings as you go and join them at the end:

def uncompress(line):
    # No error checking, sorry. A truncated count raises StopIteration.
    # Uses next(chars), which works on Python 2.6+ and Python 3.
    result = []
    chars = iter(line)
    prevchar = next(chars, None)  # this is the previous (pending) character
    for curchar in chars:  # and this is the current character
        if curchar == '|':
            # Current character is a pipe.
            # Previous character is the character to repeat.
            # The next character is the number of repeats (one hex digit).
            result.append(prevchar * int(next(chars), 16))
            prevchar = next(chars, None)  # start again after the count
        elif curchar == '/':
            # Current character is a slash.
            # Previous character is the character to repeat.
            # The next two characters are the number of repeats (two hex digits).
            count = next(chars) + next(chars)
            result.append(prevchar * int(count, 16))
            prevchar = next(chars, None)  # start again after the count
        else:
            # No need to repeat the previous character, append it to result.
            result.append(prevchar)
            prevchar = curchar
    if prevchar is not None:
        # No more characters. Append the pending one to result.
        result.append(prevchar)
    return ''.join(result)
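
For comparison, the same single-pass expansion can be sketched with re.sub. This is a different technique from the iterator loop above, not part of the original answer. The function name uncompress_re and the pattern are illustrative, and the sketch assumes every '|' is followed by one hex digit and every '/' by two.

```python
import re

# One character, followed by either '|' and one hex digit
# or '/' and two hex digits.
_RUN = re.compile(r'(.)(?:\|([0-9A-Fa-f])|/([0-9A-Fa-f]{2}))')


def uncompress_re(line):
    def expand(match):
        char, short_count, long_count = match.groups()
        # Exactly one of the two count groups matched; the other is None.
        return char * int(short_count or long_count, 16)

    # re.sub builds the result in one pass, calling expand() per marker.
    return _RUN.sub(expand, line)


print(uncompress_re('a0|4b'))  # a0000b
```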

Accepted answer: the list of four changes above (starting with changing .index() to .find()). Answered by Tim, 2012-10-31 12:16:38Z.
