I'm new to Python, so maybe I'm asking for something very easy but I can't think of the problem in a Python way.
I have a compressed string. The idea is, if a character gets repeated 4-15 times, I make this change:
'0000' ---> '0|4'
If more than 15 times, I use a slash and two digits to represent the amount (working with hexadecimal values):
'00...(16 times)..0' ---> '0/10'
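To make the format concrete, the encoding side could be sketched roughly as below. This is only an illustration: the name `compress`, the use of uppercase hex digits, and the splitting of runs longer than 255 into several chunks are all assumptions, and it assumes the data never contains '|' or '/' itself.

```python
from itertools import groupby

def compress(text):
    """Hypothetical encoder for the scheme described above (a sketch)."""
    out = []
    for char, group in groupby(text):
        run = len(list(group))
        while run > 0:
            # Assumption: two hex digits cap a single run at 255 (0xFF).
            chunk = min(run, 0xFF)
            if chunk >= 16:
                out.append('%s/%02X' % (char, chunk))  # slash + two hex digits
            elif chunk >= 4:
                out.append('%s|%X' % (char, chunk))    # bar + one hex digit
            else:
                out.append(char * chunk)               # short runs stay literal
            run -= chunk
    return ''.join(out)
```

With this sketch, `compress('0000')` gives `'0|4'` and `compress('0' * 16)` gives `'0/10'`, matching the two examples above.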
So, accustomed to other languages, my approach is the following:
def uncompress(line):
    verticalBarIndex = line.index('|')
    while verticalBarIndex!=-1:
        repeatedChar = line[verticalBarIndex-1:verticalBarIndex]
        timesRepeated = int(line[verticalBarIndex+1:verticalBarIndex+2], 16)
        uncompressedChars = [repeatedChar]
        for i in range(timesRepeated):
            uncompressedChars.append(repeatedChar)
        uncompressedString = uncompressedChars.join()
        line = line[:verticalBarIndex-1] + uncompressedString + line[verticalBarIndex+2:]
        verticalBarIndex = line.index('|') #next one
    slashIndex = line.index('/')
    while slashIndex!=-1:
        repeatedChar = line[slashIndex-1:slashIndex]
        timesRepeated = int(line[slashIndex+1:verticalBarIndex+3], 16)
        uncompressedChars = [repeatedChar]
        for i in range(timesRepeated):
            uncompressedChars.append(repeatedChar)
        uncompressedString = uncompressedChars.join()
        line = line[:slashIndex-1] + uncompressedString + line[slashIndex+3:]
        slashIndex = line.index('/') #next one
    return line
I know this is wrong, since strings are immutable in Python and I keep changing the contents of line until no '|' or '/' is left.
I know UserString exists, but I guess there is an easier and more Pythonic way of doing it, which would be great to learn.
Any help?
3 Answers
The changes necessary to get your code running with the sample strings:
Change .index() to .find(). .index() raises an exception if the substring isn't found; .find() returns -1.
Change uncompressedChars.join() to ''.join(uncompressedChars).
Change timesRepeated = int(line[slashIndex+1:verticalBarIndex+3], 16) to timesRepeated = int(line[slashIndex+1:slashIndex+3], 16).
Set uncompressedChars = [] to start with, instead of uncompressedChars = [repeatedChar].
This should get it working properly. There are a lot of places where the code can be tidied and optimised, but this works.
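For reference, here is a sketch of the function with only those four changes applied and everything else left as in the question. It still assumes the repeated character is never '|' or '/' itself.

```python
def uncompress(line):
    verticalBarIndex = line.find('|')  # find() returns -1 instead of raising
    while verticalBarIndex != -1:
        repeatedChar = line[verticalBarIndex-1:verticalBarIndex]
        timesRepeated = int(line[verticalBarIndex+1:verticalBarIndex+2], 16)
        uncompressedChars = []  # start empty so the count is exact
        for i in range(timesRepeated):
            uncompressedChars.append(repeatedChar)
        uncompressedString = ''.join(uncompressedChars)
        line = line[:verticalBarIndex-1] + uncompressedString + line[verticalBarIndex+2:]
        verticalBarIndex = line.find('|')  # next one
    slashIndex = line.find('/')
    while slashIndex != -1:
        repeatedChar = line[slashIndex-1:slashIndex]
        # Two hex digits follow the slash, so slice relative to slashIndex.
        timesRepeated = int(line[slashIndex+1:slashIndex+3], 16)
        uncompressedChars = []
        for i in range(timesRepeated):
            uncompressedChars.append(repeatedChar)
        uncompressedString = ''.join(uncompressedChars)
        line = line[:slashIndex-1] + uncompressedString + line[slashIndex+3:]
        slashIndex = line.find('/')  # next one
    return line
```

Each pass builds a new string and rebinds the name line to it, so no string object is ever modified in place.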
The most common pattern I have seen is to use a list of characters. Lists are mutable and work as you describe above.
To create a list from a string
mystring = 'Hello'
mylist = list(mystring)
To create a string from a list
mystring = ''.join(mylist)
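As a small sketch of how that applies here (the sample string and variable names are only illustrative), a list of characters can be edited in place with slice assignment and turned back into a string at the end:

```python
chars = list('ab0|4c')   # ['a', 'b', '0', '|', '4', 'c']
i = chars.index('|')     # position of the bar; raises ValueError if absent

# Replace the three items "0", "|", "4" with the expanded run.
# Assigning a string to a list slice inserts its characters one by one.
chars[i-1:i+2] = chars[i-1] * int(chars[i+1], 16)

result = ''.join(chars)
```

After this runs, result is 'ab0000c'. The list grows in place, whereas doing the same with a str always creates a new object.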
You should build a list of substrings as you go and join them at the end:
def uncompress(line):
    # No error checking, sorry. Will crash with empty strings.
    result = []
    chars = iter(line)
    prevchar = next(chars)  # this is the previous character
    while True:
        try:
            curchar = next(chars)  # and this is the current character
            if curchar == '|':
                # Current character is a pipe.
                # Previous character is the character to repeat.
                # Get the next character, the number of repeats.
                count = next(chars)
                result.append(prevchar * int(count, 16))
                prevchar = ''  # already emitted, nothing pending
            elif curchar == '/':
                # Current character is a slash.
                # Previous character is the character to repeat.
                # Get the next two characters, the number of repeats.
                count = next(chars) + next(chars)
                result.append(prevchar * int(count, 16))
                prevchar = ''  # already emitted, nothing pending
            else:
                # No need to repeat the previous character, append it to result.
                result.append(prevchar)
                prevchar = curchar
        except StopIteration:
            # No more characters. Append the pending one (if any) to result.
            result.append(prevchar)
            break
    return ''.join(result)
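A more compact alternative, swapping the manual loop for a regular expression, might look like the sketch below. The name uncompress_re and the pattern are my own assumptions, and the same caveat applies: it assumes '|' and '/' never occur as ordinary data.

```python
import re

# One character followed by either "|" + one hex digit
# or "/" + two hex digits.
_RUN = re.compile(r'(.)(?:\|([0-9A-Fa-f])|/([0-9A-Fa-f]{2}))', re.DOTALL)

def uncompress_re(line):
    def expand(match):
        char, short, long_ = match.groups()
        # Exactly one of short/long_ is set, depending on which form matched.
        return char * int(short or long_, 16)
    # re.sub calls expand() for every match and builds a new string once.
    return _RUN.sub(expand, line)
```

Like the list-and-join version, this makes a single pass over the input rather than rebuilding the whole string for every run.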
line (the same as was the name of the old object). There's nothing wrong with that. All str remain immutable.
compress = lambda s: s.encode('zip') and uncompress = lambda z: z.decode('zip')? It is inefficient to compress/uncompress byte-by-byte in pure Python.