I'm new to Python, so maybe I'm asking for something very easy but I can't think of the problem in a Python way.

I have a compressed string. The idea is, if a character gets repeated 4-15 times, I make this change:

'0000' ---> '0|4'

If more than 15 times, I use a slash and two digits to represent the amount (working with hexadecimal values):

'00...(16 times)..0' ---> '0/10'
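
For reference, the encoding described above could be produced by something like the sketch below. It is not part of the original code: the name compress, the lowercase hex digits, and the splitting of runs longer than 255 (the most that two hex digits can express) are assumptions made for illustration. It also assumes the input never contains '|' or '/' itself.

```python
from itertools import groupby

def compress(text):
    """Hypothetical encoder for the run-length scheme described above."""
    out = []
    for char, group in groupby(text):
        n = len(list(group))          # length of this run of identical chars
        while n > 0:
            chunk = min(n, 255)       # assumption: split runs longer than 0xff
            if chunk >= 16:
                out.append('%s/%02x' % (char, chunk))   # slash + two hex digits
            elif chunk >= 4:
                out.append('%s|%x' % (char, chunk))     # bar + one hex digit
            else:
                out.append(char * chunk)                # short runs stay literal
            n -= chunk
    return ''.join(out)
```

With this sketch, compress('0000') gives '0|4' and a run of sixteen zeros gives '0/10', matching the two examples in the question.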

So, accustomed to other languages, my approach is the following:

def uncompress(line):
    verticalBarIndex = line.index('|')
    while verticalBarIndex!=-1:
        repeatedChar = line[verticalBarIndex-1:verticalBarIndex]
        timesRepeated = int(line[verticalBarIndex+1:verticalBarIndex+2], 16)
        uncompressedChars = [repeatedChar]
        for i in range(timesRepeated):
            uncompressedChars.append(repeatedChar)
        uncompressedString = uncompressedChars.join()
        line = line[:verticalBarIndex-1] + uncompressedString + line[verticalBarIndex+2:]
        verticalBarIndex = line.index('|') #next one
    slashIndex = line.index('/')
    while slashIndex!=-1:
        repeatedChar = line[slashIndex-1:slashIndex]
        timesRepeated = int(line[slashIndex+1:verticalBarIndex+3], 16)
        uncompressedChars = [repeatedChar]
        for i in range(timesRepeated):
            uncompressedChars.append(repeatedChar)
        uncompressedString = uncompressedChars.join()
        line = line[:slashIndex-1] + uncompressedString + line[slashIndex+3:]
        slashIndex = line.index('/') #next one
    return line

I know this is wrong, since strings are immutable in Python and I keep changing the contents of line until no '|' or '/' is left.

I know UserString exists, but I guess there is an easier and more Pythonic way of doing it, which would be great to learn.

Any help?

SilentGhost
asked Oct 31, 2012 at 12:11
  • You're just creating a new object and binding it to the name line (the same name the old object had). There's nothing wrong with that. All str objects remain immutable. Commented Oct 31, 2012 at 12:15
  • Oh, I guess you are right and my question is stupid then! Sorry! Commented Oct 31, 2012 at 12:17
  • could you use compress = lambda s: s.encode('zip') and uncompress = lambda z: z.decode('zip')? It is inefficient to compress/uncompress byte-by-byte in pure Python Commented Oct 31, 2012 at 12:31
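
The point made in the first comment can be checked directly. The snippet below is only an illustration (the variable names are made up): assigning to line binds the name to a new string object and leaves the old object unchanged.

```python
line = 'a|2b'
original = line                 # second name for the same string object

# "Changing" line really builds a new string and rebinds the name.
line = line[:0] + 'aa' + line[3:]

unchanged = (original == 'a|2b')    # the old object was never modified
same_object = (line is original)    # the name now refers to another object
```

Here line ends up as 'aab', unchanged is True and same_object is False.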

3 Answers


The changes necessary to get your code running with the sample strings:

Change .index() to .find(). .index() raises an exception if the substring isn't found, .find() returns -1.

Change uncompressedChars.join() to ''.join(uncompressedChars).

Change timesRepeated = int(line[slashIndex+1:verticalBarIndex+3], 16) to timesRepeated = int(line[slashIndex+1:slashIndex+3], 16)

Set uncompressedChars = [] to start with, instead of uncompressedChars = [repeatedChar].

This should get it working properly. There are a lot of places where the code can be tidied and optimised, but this works.
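
Put together, the four changes listed above might look like the sketch below. It keeps the structure and names of the code in the question and changes only the lines mentioned; it still assumes that every '|' and '/' in the input is a repeat marker preceded by the character to repeat.

```python
def uncompress(line):
    verticalBarIndex = line.find('|')            # find() returns -1 when absent
    while verticalBarIndex != -1:
        repeatedChar = line[verticalBarIndex-1:verticalBarIndex]
        timesRepeated = int(line[verticalBarIndex+1:verticalBarIndex+2], 16)
        uncompressedChars = []                   # start empty, not [repeatedChar]
        for i in range(timesRepeated):
            uncompressedChars.append(repeatedChar)
        uncompressedString = ''.join(uncompressedChars)
        line = line[:verticalBarIndex-1] + uncompressedString + line[verticalBarIndex+2:]
        verticalBarIndex = line.find('|')        # next one
    slashIndex = line.find('/')
    while slashIndex != -1:
        repeatedChar = line[slashIndex-1:slashIndex]
        timesRepeated = int(line[slashIndex+1:slashIndex+3], 16)  # two hex digits
        uncompressedChars = []
        for i in range(timesRepeated):
            uncompressedChars.append(repeatedChar)
        uncompressedString = ''.join(uncompressedChars)
        line = line[:slashIndex-1] + uncompressedString + line[slashIndex+3:]
        slashIndex = line.find('/')              # next one
    return line
```

For example, uncompress('0|4') returns '0000' and uncompress('0/10') returns a string of sixteen zeros.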

answered Oct 31, 2012 at 12:16

The most common pattern I have seen is to use a list of characters. Lists are mutable and work as you describe above.

To create a list from a string

mystring = 'Hello'
mylist = list(mystring)

To create a string from a list

mystring = ''.join(mylist)
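
As a small illustration of the round trip (the values are made up), the list can be changed in place between the two conversions:

```python
mystring = 'Hello'
mylist = list(mystring)      # ['H', 'e', 'l', 'l', 'o']

mylist[0] = 'J'              # item assignment works on a list
mylist.append('!')           # and so does growing it

result = ''.join(mylist)     # back to an immutable str
```

After this runs, result is 'Jello!' while mystring is still 'Hello'.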
answered Oct 31, 2012 at 12:12

You should build a list of substrings as you go and join them at the end:

def uncompress(line):
    # No error checking, sorry. Will crash with empty strings.
    result = []
    chars = iter(line)
    prevchar = next(chars)  # pending character, not yet added to result
    while True:
        try:
            curchar = next(chars)  # and this is the current character
            if curchar == '|':
                # Current character is a pipe.
                # The pending character is the character to repeat.
                # The next character is the number of repeats (one hex digit).
                result.append(prevchar * int(next(chars), 16))
                prevchar = next(chars, '')  # start again after the run
            elif curchar == '/':
                # Current character is a slash.
                # The pending character is the character to repeat.
                # The next two characters are the number of repeats (hex).
                result.append(prevchar * int(next(chars) + next(chars), 16))
                prevchar = next(chars, '')  # start again after the run
            else:
                # No repeat marker follows the pending character,
                # so append it to result unchanged.
                result.append(prevchar)
                prevchar = curchar
        except StopIteration:
            # No more characters. Append the pending one (may be empty).
            result.append(prevchar)
            break
    return ''.join(result)
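
The same single-pass idea can also be written with a regular expression. This is a swapped-in technique, not what the code above does, and the name uncompress_re and the pattern are assumptions made for this sketch. It treats any character followed by '|' and one hex digit, or by '/' and two hex digits, as a run to expand.

```python
import re

# (.)          the character to repeat
# \|([0-9a-fA-F])       a bar and one hex digit, or
# /([0-9a-fA-F]{2})     a slash and two hex digits
_RUN = re.compile(r'(.)(?:\|([0-9a-fA-F])|/([0-9a-fA-F]{2}))', re.DOTALL)

def uncompress_re(line):
    def expand(match):
        # Exactly one of groups 2 and 3 is set, depending on the marker.
        count = int(match.group(2) or match.group(3), 16)
        return match.group(1) * count
    return _RUN.sub(expand, line)
```

For example, uncompress_re('ab|4c/10d') returns 'abbbb', then sixteen 'c' characters, then 'd'. Text without markers is returned unchanged, and an empty string gives an empty string.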
answered Oct 31, 2012 at 12:19
