Unix-to-Unix Encoding (Uuencoding)

Question 1

Uuencoding was historically used to encode binary data for transmission by email. The steps for creating a uuencoder are:

1. Start with 3 bytes from the source, 24 bits in total.
2. Split them into four 6-bit groups, each representing a value in the range 0 to 63: bits (00-05), (06-11), (12-17) and (18-23).
3. Add 32 to each of the values. With the addition of 32, the possible results range from 32 (" " space) to 95 ("_" underscore). 96 ("`" grave accent), used as the "special character", is a logical extension of this range.
4. Output the ASCII equivalent of these numbers.
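The four steps can be sketched in Python using integer bit shifts instead of binary strings. This is an illustrative sketch, not the question's code. The name encode_triplet is hypothetical, and the function handles exactly one 3-byte group, with no padding and no line framing.

```python
def encode_triplet(triplet: bytes) -> str:
    """Encode exactly 3 bytes as 4 printable characters (steps 1 to 4)."""
    if len(triplet) != 3:
        raise ValueError("expected exactly 3 bytes")
    # Step 1: pack the 3 bytes into a single 24-bit integer.
    n = (triplet[0] << 16) | (triplet[1] << 8) | triplet[2]
    # Step 2: extract four 6-bit groups, starting with bits 00-05
    # (the most significant) and ending with bits 18-23.
    groups = [(n >> shift) & 0x3F for shift in (18, 12, 6, 0)]
    # Steps 3 and 4: add 32 to each value and output its ASCII character.
    return ''.join(chr(value + 32) for value in groups)


print(encode_triplet(b"Cat"))  # 0V%T
```

All-zero input maps to four spaces and all-one bits map to four underscores, which are the two ends of the 32 to 95 range described above.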

import platform

def dec_to_bin(int):
    return bin(int).replace('0b', '')

def uuencode(string, filename):
    if platform.system() == 'Windows':
        mode = '644'
    elif platform.system() == 'Linux':
        mode = '744'
    trail = 'begin ' + mode + ' ' + filename + '\n'
    string_values = []
    char_list = list(string)
    bytes = ''
    for char in char_list:
        string_values.append(ord(char))
    for ascii in string_values:
        bytes += dec_to_bin(ascii)
    three_byte = [bytes[p:p+24] for p in range(0, len(bytes), 24)]
    if len(three_byte[-1]) < 24:
        three_byte[-1] += (24 - len(three_byte[-1])) * '0'
    four_six_bits = [three_byte[n][m:m+6] for n in range(len(three_byte)) for m in range(0, 24, 6)]
    four_six_bits = [four_six_bits[k:k+4] for k in range(0, len(four_six_bits), 4)]
    decimal_list = []
    for x in range(len(four_six_bits)):
        for z in range(4):
            decimal_list.append(int(four_six_bits[x][z], 2))
    for index, decimal in enumerate(decimal_list):
        decimal_list[index] += 32
    encoded_chars = [chr(decimal_list[o]) for o in range(len(decimal_list))]
    length_char = chr(len(encoded_chars) + 32)
    trail += length_char
    for newchar in encoded_chars:
        trail += newchar
    trail += '\n' + '`' + '\n' + 'end' + '\n'
    return trail

Answer 1 (BenC)

This is somewhat unstructured, but here are a few recommendations:

Be careful when shadowing built-ins

In dec_to_bin(), the int parameter shadows Python's built-in int type. This works fine, but it's usually clearer to avoid redefining built-ins when you can. Maybe call it num.

bytes is also a built-in. It's OK to redefine built-ins; just be aware that you are doing it.
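Here is a minimal sketch of where shadowing actually bites. Both functions are hypothetical variations on dec_to_bin that convert the bit string back to an integer with int(). Only the version with the renamed parameter can still reach the built-in.

```python
def round_trip_shadowed(int):
    # The parameter name `int` hides the built-in int type in this body.
    bits = bin(int).replace('0b', '')
    return int(bits, 2)  # TypeError: the argument is a number, not the type


def round_trip(num):
    # With a neutral parameter name, the built-in int() is still available.
    bits = bin(num).replace('0b', '')
    return int(bits, 2)


try:
    round_trip_shadowed(5)
except TypeError as err:
    print("shadowed:", err)  # shadowed: 'int' object is not callable

print(round_trip(5))  # 5
```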

Variables not always defined

if platform.system() == 'Windows':
    mode = '644'
elif platform.system() == 'Linux':
    mode = '744'
trail = 'begin ' + mode + ' ' + filename + '\n'

What happens if platform.system() is neither Windows nor Linux? (For example, on a Mac it returns 'Darwin'.) mode is never assigned, so the next line raises UnboundLocalError.
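One way to guarantee that mode is always bound is a lookup table with a default. The names MODES, header_mode_buggy and header_mode are hypothetical. The '644' fallback is an assumption, so pick whatever default suits your use case. The buggy function mirrors the original branches so that the failure is easy to reproduce.

```python
import platform

# Hypothetical lookup table; the values are copied from the question's branches.
MODES = {'Windows': '644', 'Linux': '744'}


def header_mode_buggy(system):
    # Same shape as the original: `mode` is only bound on two paths.
    if system == 'Windows':
        mode = '644'
    elif system == 'Linux':
        mode = '744'
    return mode  # UnboundLocalError when neither branch ran


def header_mode(system, default='644'):
    # dict.get returns `default` for any system not in the table.
    # The '644' fallback is an assumption, not part of the question.
    return MODES.get(system, default)


try:
    header_mode_buggy('Darwin')
except UnboundLocalError as err:
    print("buggy:", type(err).__name__)  # buggy: UnboundLocalError

print(header_mode('Darwin'))           # 644
print(header_mode(platform.system()))  # depends on the current OS
```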

Unnecessary conversion to list

char_list = list(string)
...
for char in char_list: ...

Strings are iterable just like lists, so for char in string: works fine.

In fact, consider removing string_values and char_list entirely. Together with the loops that fill them and the concatenation into bytes, they condense into:

bytes = ''.join(dec_to_bin(ord(char)) for char in string)
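The following is a quick check that the condensed line builds the same bit string as the original two-list version. The result is stored in bits rather than bytes, which also avoids shadowing the built-in. dec_to_bin uses the num parameter name suggested above.

```python
def dec_to_bin(num):
    return bin(num).replace('0b', '')


text = "Cat"

# Original style: two intermediate lists, then a concatenation loop.
char_list = list(text)
string_values = []
for char in char_list:
    string_values.append(ord(char))
bits_loop = ''
for value in string_values:
    bits_loop += dec_to_bin(value)

# Condensed style: iterate over the string directly and join once.
bits = ''.join(dec_to_bin(ord(char)) for char in text)

print(bits == bits_loop)  # True
```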

Unused variable

A common convention is to use _ to represent unused variables:

for index, _ in enumerate(decimal_list):
    decimal_list[index] += 32

This shows that you are deliberately ignoring the second value from enumerate.
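This short sketch uses sample values to show that the _ loop behaves like the original. The sample values are the four 6-bit groups of "Cat". The list comprehension at the end is an alternative that the answer does not mention; it avoids both the unused variable and the in-place mutation.

```python
original = [16, 54, 5, 52]  # the four 6-bit groups of "Cat"

# Loop from the answer: `_` marks the element value as deliberately unused.
shifted = list(original)
for index, _ in enumerate(shifted):
    shifted[index] += 32

# Alternative (not from the answer): build a new list instead of mutating.
shifted_alt = [value + 32 for value in original]

print(shifted == shifted_alt)            # True
print(''.join(chr(v) for v in shifted))  # 0V%T
```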

Answer 2 (Caridorc)

@BenC humbly mentioned in passing that his answer is "somewhat unstructured". His answer is fine, but I feel that this concern about lack of structure applies to your code.

Your code puts all the logic in one function (apart from a tiny helper), which makes it hard to test, hard to write and hard to read.

This algorithm lends itself well to modularization: the same transformation is applied identically to each 3-byte chunk of the original message.

Each function can then be given a docstring describing its purpose, a link to further documentation and some examples. The examples improve readability and, as doctests, double as tests.

My version, uuencode_chunk, also uses the chunks function (taken from Stack Overflow). This divides the complexity further, and the chunking logic is needed in two places:

import doctest

def padded_decimal_to_bin(i):
    return (bin(i).replace('0b', '')).rjust(8, "0")

def chunks(l, n):
    """
    Yield successive n-sized chunks from l.
    Credit to: http://stackoverflow.com/questions/312443/how-do-you-split-a-list-into-evenly-sized-chunks-in-python
    >>> list(chunks([1, 2, 3, 4, 5, 6], 2))
    [[1, 2], [3, 4], [5, 6]]
    """
    for i in range(0, len(l), n):
        yield l[i:i + n]

def uuencode_chunk(chunk):
    """
    Given a chunk of 3 bytes (24 bits),
    splits it in 4 equal groups of 6, adds 32 to each
    and represents them as ASCII characters.
    See this section for details:
    https://en.wikipedia.org/wiki/Uuencoding#Formatting_mechanism
    >>> uuencode_chunk("Cat")
    '0V%T'
    """
    bits = [padded_decimal_to_bin(ord(char)) for char in chunk]
    return ''.join(chr(int(b, base=2) + 32) for b in chunks(''.join(bits), 6))

doctest.testmod()
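The rjust(8, "0") in padded_decimal_to_bin matters because bin() drops leading zeros. Every ASCII character (code point below 128) therefore comes out with at most 7 bits, and three characters no longer add up to the 24 bits that the 6-bit split expects. Below is a comparison for the characters of "Cat". unpadded_to_bin is a hypothetical name for a function that behaves like dec_to_bin from the question.

```python
def unpadded_to_bin(i):
    # Behaves like dec_to_bin in the question: bin() drops leading zeros.
    return bin(i).replace('0b', '')


def padded_decimal_to_bin(i):
    # From the answer: left-pad with zeros to a full 8-bit byte.
    return (bin(i).replace('0b', '')).rjust(8, "0")


for char in "Cat":
    print(char, unpadded_to_bin(ord(char)), padded_decimal_to_bin(ord(char)))
# C 1000011 01000011
# a 1100001 01100001
# t 1110100 01110100
```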

Making use of uuencode_chunk and chunks, the main function should become much shorter and simpler.
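Here is a sketch of what such a main function might look like. It reuses padded_decimal_to_bin, chunks and uuencode_chunk, copied here so the block runs on its own. The uuencode signature and the '644' default mode are assumptions. The sketch also assumes every character has a code point below 256. It follows the conventional layout of at most 45 input bytes per line, where the length character encodes the number of input bytes on that line. The question's code instead derives the length from the number of encoded characters.

```python
def padded_decimal_to_bin(i):
    return (bin(i).replace('0b', '')).rjust(8, "0")


def chunks(l, n):
    """Yield successive n-sized chunks from l."""
    for i in range(0, len(l), n):
        yield l[i:i + n]


def uuencode_chunk(chunk):
    """Encode 3 characters (24 bits) as 4 printable characters."""
    bits = [padded_decimal_to_bin(ord(char)) for char in chunk]
    return ''.join(chr(int(b, base=2) + 32) for b in chunks(''.join(bits), 6))


def uuencode(text, filename, mode='644'):
    """Hypothetical main function built on uuencode_chunk and chunks.

    Assumes every character of `text` has a code point below 256 and
    uses lines of at most 45 input bytes, each prefixed by a length
    character, followed by a '`' line and 'end'.
    """
    lines = ['begin {} {}'.format(mode, filename)]
    for line in chunks(text, 45):
        # Pad the final group with NUL characters so it is a full 3 bytes.
        body = ''.join(uuencode_chunk(group.ljust(3, '\0'))
                       for group in chunks(line, 3))
        # The length character counts input bytes, not output characters.
        lines.append(chr(len(line) + 32) + body)
    lines += ['`', 'end']
    return '\n'.join(lines) + '\n'


print(uuencode("Cat", "cat.txt"), end='')
# begin 644 cat.txt
# #0V%T
# `
# end
```

For a single line, Python's binascii.b2a_uu should produce the same length character and body, which makes it a convenient cross-check.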

Comment (BenC)

I actually meant that my answer would be somewhat unstructured. But this is a fantastic simplification of the code, nice.

Comment (Caridorc)

@BenC I already had the idea that the code could be more organized than it was, so when I read your answer I took it as confirmation of that feeling. A fun coincidence; I am adapting the first paragraph of my answer.

BenC · Answer 1 · 2016-08-24 06:18:06Z


Caridorc · Answer 2 · 2016-08-24 12:08:02Z


