python encoder and decoder

Question 1

I would like to build an an encoder and decoder using text coding.

A string "AAABBBBCDDDDDDDDDDEEDDDD" as input, returning a string "A3B4C1D10E2D4", where each alphabet symbol is followed by its frequency in the string. The decoder reverses the process.

Would like help getting started in python.

Question 2

have you tried anything..any single line of code??

Question 3

What have you tried?

Question 4

So take a stab at it, maybe with a for loop. You're much more likely to get useful answers that way.

Question 5

@JohnWard What do you mean by that? Fire a notepad or some other IDE - that's a good start. We won't ( or at least shouldn't ) give you solutions. Try something and then come back to us with that piece of code you'll have. Then we will analyze it and help you ( or not ). Don't be lazy. You might also realize that you don't even need help.

Question 6

Check this questions not exactly what you want but it can help you try to do that

Determining Letter Frequency Of Cipher Text

Question 7

The solution can be approached in different ways, and its pretty easy as a loop based solution, and is left as an exercise for you

As to give you a taste of the power of Python's batteries, I am proposing a solution using groupby

>>> ''.join("{}{}".format(k, sum(1 for e in v))
 for k,v in groupby("AAABBBBCDDDDDDDDDDEEDDDD"))
'A3B4C1D10E2D4'

Salient features of this solution

itertools.groupby groups similar consecutive data as a key, valued pair where the key is the duplicate element and the value is the group of repetition
As the group is a generator, len may not work here but a possible way of calculating length of any non sequence iterable is to use sum
str.join joins an iterable to generate a string with any supplied separator, in this case its an empty string

Question 8

len(list(v)) might be slightly faster in some cases though sum is suitable if v might be infinite.

Question 9

One possible solution for the cnoder would be to simply iterate over the string and count the character occurences, not very fancy but O(n).

def encode(s):
 last = s[0]
 count = 0
 for c in s:
 if last != c:
 yield '%s%i' % (last, count)
 last = c
 count = 0
 count += 1
 yield '%s%i' % (last, count)

For the decoder you could use a regular expression which splits the string up nicely for you, no need to write your own parser.

import re
def decode(s):
 for c, n in re.findall(r'(\w)(\d+)', s):
 yield c * int(n)

given your test input

s = 'AAABBBBCDDDDDDDDDDEEDDDD'
encoded = ''.join(encode(s))
print encoded
decoded = ''.join(decode(encoded))
print decoded

results in

A3B4C1D10E2D4
AAABBBBCDDDDDDDDDDEEDDDD

One more note, there's no real reason to use yield here, you could of course also build the strings in the en-/decode functions first, then return.

Question 10

-1: First of all it assumes that input does not contain digits. Secondly: regular expressions? Seriously? And finally: giving solutions to such questions is an antistackoverflow behaviour.

Question 11

First of all the input could hardly contain digits to get that kind of output, otherwise you would need some sort of separator between the letter and the letter count. Secondly, yes, regular expressions, they get the job done – use your tools. And finally: thanks for clearing that up, I'll try to do better in the future.

Question 12

Yeah, I guess you're right about digits. As for regular expressions: I just find it a bit overkill in this scenario, especially if we exclude digits in input.

Question 13

What's the alternative? Going through the string character by character, essentially building your own little parser? I would agree with you if the count could only be single digit, but clearly it can be of arbitrary length so you would have to parse it somehow.

Question 14

@pyrrrat: there's a one-line groupby solution (two lines going the other way, although I guess I could pack it into one if I had to).

Question 15

I would start by looking at the python string documentation, specifically find or count and work from there. Though I'm not sure you could really decode anything that you encode if the actual content inside the string matters in that manner.

seleucia 1,0664 gold badges17 silver badges30 bronze badges · Answer 1 · 2013-01-26 17:18:55Z

1

Check this questions not exactly what you want but it can help you try to do that

Determining Letter Frequency Of Cipher Text

Share

Improve this answer

edited May 23, 2017 at 11:50

Community's user avatar

Community Bot

11 silver badge

answered Jan 26, 2013 at 17:18

seleucia's user avatar

seleucia

1,0664 gold badges17 silver badges30 bronze badges

Sign up to request clarification or add additional context in comments.

Comments

Abhijit 64k20 gold badges143 silver badges209 bronze badges · Answer 2 · 2013-01-26 17:46:49Z

The solution can be approached in different ways, and its pretty easy as a loop based solution, and is left as an exercise for you

As to give you a taste of the power of Python's batteries, I am proposing a solution using groupby

>>> ''.join("{}{}".format(k, sum(1 for e in v))
 for k,v in groupby("AAABBBBCDDDDDDDDDDEEDDDD"))
'A3B4C1D10E2D4'

Salient features of this solution

itertools.groupby groups similar consecutive data as a key, valued pair where the key is the duplicate element and the value is the group of repetition
As the group is a generator, len may not work here but a possible way of calculating length of any non sequence iterable is to use sum
str.join joins an iterable to generate a string with any supplied separator, in this case its an empty string

len(list(v)) might be slightly faster in some cases though sum is suitable if v might be infinite.

pyrrrat 3591 silver badge4 bronze badges · Answer 3 · 2013-01-26 17:22:02Z

One possible solution for the cnoder would be to simply iterate over the string and count the character occurences, not very fancy but O(n).

def encode(s):
 last = s[0]
 count = 0
 for c in s:
 if last != c:
 yield '%s%i' % (last, count)
 last = c
 count = 0
 count += 1
 yield '%s%i' % (last, count)

For the decoder you could use a regular expression which splits the string up nicely for you, no need to write your own parser.

import re
def decode(s):
 for c, n in re.findall(r'(\w)(\d+)', s):
 yield c * int(n)

given your test input

s = 'AAABBBBCDDDDDDDDDDEEDDDD'
encoded = ''.join(encode(s))
print encoded
decoded = ''.join(decode(encoded))
print decoded

results in

A3B4C1D10E2D4
AAABBBBCDDDDDDDDDDEEDDDD

One more note, there's no real reason to use yield here, you could of course also build the strings in the en-/decode functions first, then return.

-1: First of all it assumes that input does not contain digits. Secondly: regular expressions? Seriously? And finally: giving solutions to such questions is an antistackoverflow behaviour.
First of all the input could hardly contain digits to get that kind of output, otherwise you would need some sort of separator between the letter and the letter count. Secondly, yes, regular expressions, they get the job done – use your tools. And finally: thanks for clearing that up, I'll try to do better in the future.
Yeah, I guess you're right about digits. As for regular expressions: I just find it a bit overkill in this scenario, especially if we exclude digits in input.
What's the alternative? Going through the string character by character, essentially building your own little parser? I would agree with you if the count could only be single digit, but clearly it can be of arbitrary length so you would have to parse it somehow.
@pyrrrat: there's a one-line groupby solution (two lines going the other way, although I guess I could pack it into one if I had to).

DanielCardin 5652 gold badges8 silver badges17 bronze badges · Answer 4 · 2013-01-26 17:14:04Z

I would start by looking at the python string documentation, specifically find or count and work from there. Though I'm not sure you could really decode anything that you encode if the actual content inside the string matters in that manner.

CollectivesTM on Stack Overflow

python encoder and decoder

4 Answers 4

Comments

1 Comment

7 Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

CollectivesTM on Stack Overflow

4 Answers 4

Comments

1 Comment

7 Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related