I would like to build an encoder and decoder for a simple text coding scheme (run-length encoding).
Given the string "AAABBBBCDDDDDDDDDDEEDDDD" as input, the encoder should return the string "A3B4C1D10E2D4", where each letter is followed by the length of its run of consecutive repeats (so the two separate runs of D become D10 and D4). The decoder reverses the process.
I would like help getting started in Python.
- Have you tried anything, even a single line of code? (namit, Jan 26, 2013 at 17:09)
- What have you tried? (freakish, Jan 26, 2013 at 17:09)
- So take a stab at it, maybe with a for loop. You're much more likely to get useful answers that way. (Kyle Maxwell, Jan 26, 2013 at 17:19)
- @JohnWard What do you mean by that? Fire up Notepad or some other editor; that's a good start. We won't (or at least shouldn't) give you solutions. Try something, then come back to us with the code you wrote, and we will analyze it and help you (or not). Don't be lazy. You might also realize that you don't even need help. (freakish, Jan 26, 2013 at 17:32)
4 Answers
Check this question; it's not exactly what you want, but it can help you get started.
The problem can be approached in different ways. A loop-based solution is pretty easy and is left as an exercise for you.
To give you a taste of the power of Python's batteries, here is a solution using itertools.groupby:
```python
>>> from itertools import groupby
>>> ''.join("{}{}".format(k, sum(1 for e in v))
...         for k, v in groupby("AAABBBBCDDDDDDDDDDEEDDDD"))
'A3B4C1D10E2D4'
```
Salient features of this solution:
- itertools.groupby groups consecutive equal elements, yielding (key, group) pairs where the key is the repeated element and the group is an iterator over that run.
- Because each group is an iterator rather than a sequence, len() does not work on it; sum(1 for e in v) is a general way to count the items of any iterable.
- str.join concatenates an iterable of strings, using the string it is called on as the separator; here that is the empty string.
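The same module can also run the process in reverse. The sketch below is not part of the original answer: it assumes every symbol is a single non-digit character followed by a decimal count, and the names rle_encode and rle_decode are hypothetical. Grouping with key=str.isdigit splits the encoded text into alternating symbol runs and digit runs.

```python
from itertools import groupby

def rle_encode(s):
    """Hypothetical helper: the one-liner above wrapped in a function."""
    return ''.join('{}{}'.format(k, sum(1 for _ in g)) for k, g in groupby(s))

def rle_decode(encoded):
    """Hypothetical helper: expand 'A3B4...' back into 'AAABBBB...'.

    Assumes each symbol is one non-digit character followed by its count.
    """
    out = []
    symbol = None
    # key=str.isdigit splits 'D10E2' into alternating runs: 'D', '10', 'E', '2'
    for is_digit, run in groupby(encoded, key=str.isdigit):
        text = ''.join(run)
        if is_digit:
            out.append(symbol * int(text))  # repeat the pending symbol
        else:
            symbol = text                   # remember the symbol for the next count
    return ''.join(out)

s = "AAABBBBCDDDDDDDDDDEEDDDD"
print(rle_decode(rle_encode(s)) == s)  # True
```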
len(list(v)) might be slightly faster in some cases, though sum avoids building an intermediate list, which helps if v is very long.
One possible solution for the encoder is to simply iterate over the string and count consecutive occurrences of each character: not very fancy, but O(n).
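```python
def encode(s):
    """Yield 'symbol+count' chunks, one per run of identical characters."""
    if not s:  # guard: s[0] below would raise IndexError on an empty string
        return
    last = s[0]
    count = 0
    for c in s:
        if last != c:
            # the run of `last` just ended: emit it and start a new run
            yield '%s%i' % (last, count)
            last = c
            count = 0
        count += 1
    # emit the final run, which the loop never closes
    yield '%s%i' % (last, count)
```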
For the decoder, you could use a regular expression that splits the string up nicely for you, so there is no need to write your own parser.
```python
import re

def decode(s):
    # each match is a (symbol, count) pair such as ('D', '10')
    for c, n in re.findall(r'(\w)(\d+)', s):
        yield c * int(n)
```
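To see what the regular expression hands to the decoder, it can help to print the re.findall result directly. This is only an illustration; the input 'A2 3B1' is a made-up example containing a space symbol, and the alternative pattern (\D)(\d+) is a suggested variant, not part of the original answer.

```python
import re

encoded = 'A3B4C1D10E2D4'

# With two capture groups, re.findall returns a list of (symbol, digits)
# tuples; the counts are still strings, which is why decode() calls int(n).
print(re.findall(r'(\w)(\d+)', encoded))
# [('A', '3'), ('B', '4'), ('C', '1'), ('D', '10'), ('E', '2'), ('D', '4')]

# \w only matches letters, digits and underscore, so a run of spaces
# encoded as ' 3' is silently skipped; \D accepts any non-digit symbol.
print(re.findall(r'(\w)(\d+)', 'A2 3B1'))  # [('A', '2'), ('B', '1')]
print(re.findall(r'(\D)(\d+)', 'A2 3B1'))  # [('A', '2'), (' ', '3'), ('B', '1')]
```

Since the greedy \d+ consumes every digit after a symbol, multi-digit counts such as D10 come back as a single '10'.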
Given your test input:

```python
s = 'AAABBBBCDDDDDDDDDDEEDDDD'
encoded = ''.join(encode(s))
print(encoded)
decoded = ''.join(decode(encoded))
print(decoded)
```

results in:

```
A3B4C1D10E2D4
AAABBBBCDDDDDDDDDDEEDDDD
```
One more note: there's no real reason to use yield here. You could just as well build the strings inside the encode/decode functions and return them.
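A minimal sketch of that non-generator variant, keeping the same loop and the same regular expression as the answer's encode() and decode(); the names encode_str and decode_str are hypothetical, chosen only to avoid clashing with the generator versions.

```python
import re

def encode_str(s):
    """Hypothetical helper: return the run-length encoding of s as one string."""
    parts = []
    if s:
        last, count = s[0], 0
        for c in s:
            if c != last:
                # close the finished run and start counting the new character
                parts.append('%s%i' % (last, count))
                last, count = c, 0
            count += 1
        parts.append('%s%i' % (last, count))  # the final run
    return ''.join(parts)

def decode_str(s):
    """Hypothetical helper: expand 'A3B4...' back into the original string."""
    # same pattern as decode(): one (symbol, count) pair per match
    return ''.join(c * int(n) for c, n in re.findall(r'(\w)(\d+)', s))

print(encode_str('AAABBBBCDDDDDDDDDDEEDDDD'))  # A3B4C1D10E2D4
print(decode_str('A3B4C1D10E2D4'))             # AAABBBBCDDDDDDDDDDEEDDDD
```

The generator versions let a caller stream chunks without building the whole result; returning strings is simpler when the input is small.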
groupby solution (two lines going the other way, although I guess I could pack it into one if I had to).
I would start by looking at the Python string documentation, specifically find or count, and work from there. Though I'm not sure you could really decode anything you encode that way if the actual content of the string matters.
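A quick illustration of that caveat, assuming "count" and "find" mean the built-in str.count and str.find: both look at the whole string, so they report total frequencies and first positions rather than run lengths.

```python
s = "AAABBBBCDDDDDDDDDDEEDDDD"

# str.count reports total occurrences across the whole string, so the
# two separate runs of D (10 and 4) are merged into a single number.
print(s.count("D"))  # 14

# str.find gives the index of the first occurrence, which locates where
# a run starts but not how long it is.
print(s.find("E"))   # 18

# One "letter + total count" per distinct letter loses the order of runs,
# so this result cannot be decoded back into s.
freq = ''.join('%s%i' % (c, s.count(c)) for c in sorted(set(s)))
print(freq)          # A3B4C1D14E2
```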