Uuencoding was historically used to encode binary data for email. The steps for building a uuencoder are:
- Start with 3 bytes from the source, 24 bits in total.
- Split into 4 6-bit groupings, each representing a value in the range 0 to 63: bits (00-05), (06-11), (12-17) and (18-23).
- Add 32 to each of the values. With the addition of 32 this means that the possible results can be between 32 (" " space) and 95 ("_" underline). 96 ("`" grave accent) as the "special character" is a logical extension of this range.
- Output the ASCII equivalent of these numbers.
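As a quick sanity check of the steps above, here is a minimal sketch that encodes a single 3-byte group using integer shifts. The name encode_group is made up for illustration, and the sketch deliberately leaves out the grave-accent special case and the per-line length character.

```python
def encode_group(group):
    """Encode exactly 3 bytes into 4 printable characters (hypothetical helper)."""
    if len(group) != 3:
        raise ValueError("expected exactly 3 bytes")
    # Pack the 3 bytes into one 24-bit integer.
    value = (group[0] << 16) | (group[1] << 8) | group[2]
    # Take the four 6-bit groups, most significant first, and add 32 to each.
    return ''.join(chr(((value >> shift) & 0x3F) + 32) for shift in (18, 12, 6, 0))


print(encode_group(b"Cat"))  # 0V%T
```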
import platform

def dec_to_bin(int):
    return bin(int).replace('0b', '')

def uuencode(string, filename):
    if platform.system() == 'Windows':
        mode = '644'
    elif platform.system() == 'Linux':
        mode = '744'
    trail = 'begin ' + mode + ' ' + filename + '\n'
    string_values = []
    char_list = list(string)
    bytes = ''
    for char in char_list:
        string_values.append(ord(char))
    for ascii in string_values:
        bytes += dec_to_bin(ascii)
    three_byte = [bytes[p:p+24] for p in range(0, len(bytes), 24)]
    if len(three_byte[-1]) < 24:
        three_byte[-1] += (24 - len(three_byte[-1])) * '0'
    four_six_bits = [three_byte[n][m:m+6] for n in range(len(three_byte)) for m in range(0, 24, 6)]
    four_six_bits = [four_six_bits[k:k+4] for k in range(0, len(four_six_bits), 4)]
    decimal_list = []
    for x in range(len(four_six_bits)):
        for z in range(4):
            decimal_list.append(int(four_six_bits[x][z], 2))
    for index, decimal in enumerate(decimal_list):
        decimal_list[index] += 32
    encoded_chars = [chr(decimal_list[o]) for o in range(len(decimal_list))]
    length_char = chr(len(encoded_chars) + 32)
    trail += length_char
    for newchar in encoded_chars:
        trail += newchar
    trail += '\n' + '`' + '\n' + 'end' + '\n'
    return trail
2 Answers
This is somewhat unstructured, but here are a few recommendations:
Be careful when shadowing built-ins
In dec_to_bin(), the int parameter shadows Python's built-in int type. This works fine, but it's usually clearer to avoid redefining built-ins when you can. Maybe call it num.
bytes is also a built-in. It's OK to redefine them, just be aware.
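To illustrate why the shadowing can bite, here is a small sketch. The renamed parameter follows the num suggestion above; broken_parse is a hypothetical function that exists only to show the failure.

```python
def dec_to_bin(num):
    """Return the binary digits of num without the '0b' prefix."""
    return bin(num).replace('0b', '')


def broken_parse(int):
    # Inside this function the name int refers to the argument,
    # not the built-in type, so using it as a converter fails.
    return int('101', 2)


print(dec_to_bin(5))  # 101

try:
    broken_parse(7)
except TypeError as error:
    print(error)  # 'int' object is not callable
```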
Variables not always defined
if platform.system() == 'Windows':
    mode = '644'
elif platform.system() == 'Linux':
    mode = '744'
trail = 'begin ' + mode + ' ' + filename + '\n'
What happens if platform.system() is neither Windows nor Linux? (For example, on Mac it returns 'Darwin'.) mode is never assigned, so the line that builds trail raises UnboundLocalError.
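One way to avoid the unbound variable is to make sure every branch produces a value. This is only a sketch: pick_mode is a hypothetical helper, and using '644' as the fallback for every platform other than Linux is an assumption, not something the original code specifies.

```python
import platform


def pick_mode(system=None):
    """Return a permission string for the 'begin' header (hypothetical helper)."""
    if system is None:
        system = platform.system()
    # Assumed fallback: any platform other than Linux gets '644'.
    return '744' if system == 'Linux' else '644'


print(pick_mode('Darwin'))  # 644
```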
Unnecessary conversion to list
char_list = list(string)
...
for char in char_list: ...
Strings are iterable just like lists, so for char in string: works fine.
In fact, consider removing string_values and char_list entirely. You can condense them into:
bytes = ''.join(dec_to_bin(ord(char)) for char in string)
Unused variable
A common convention is to use _ to represent unused vars:
for index, _ in enumerate(decimal_list):
    decimal_list[index] += 32
This shows that you are deliberately ignoring the second value from enumerate.
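If the index is only needed to write back into the list, another option is to build a new list and skip enumerate altogether. A small sketch, with sample values made up for illustration:

```python
decimal_list = [16, 54, 5, 52]  # sample 6-bit values, chosen for illustration

# Build the shifted list in one pass; no index or throwaway variable is needed.
decimal_list = [decimal + 32 for decimal in decimal_list]

print(decimal_list)  # [48, 86, 37, 84]
```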
@BenC humbly mentioned in passing that his answer is "somewhat unstructured", but while his answer is fine, I feel that this concern about lack of structure applies to your code.
Your code encapsulates all the logic in one function (apart from a tiny helper), which makes it hard to test, hard to write and hard to read.
This algorithm lends itself pretty well to modularization: the same transformation is applied identically to each chunk of 3 bytes of the original message.
Each function can be equipped with a docstring describing its purpose, a link to further documentation and some examples improving readability.
My version of this, uuencode_chunk, also makes use of the chunks function (taken from Stack Overflow), both to divide the complexity further and because this chunking logic is needed in two places:
import doctest

def padded_decimal_to_bin(i):
    return (bin(i).replace('0b', '')).rjust(8, "0")

def chunks(l, n):
    """
    Yield successive n-sized chunks from l.
    Credit to: http://stackoverflow.com/questions/312443/how-do-you-split-a-list-into-evenly-sized-chunks-in-python
    >>> list(chunks([1, 2, 3, 4, 5, 6], 2))
    [[1, 2], [3, 4], [5, 6]]
    """
    for i in range(0, len(l), n):
        yield l[i:i + n]

def uuencode_chunk(chunk):
    """
    Given a chunk of 3 bytes (24 bits),
    splits it in 4 equal groups of 6, adds 32 to each
    and represents them as ASCII characters.
    See this section for details:
    https://en.wikipedia.org/wiki/Uuencoding#Formatting_mechanism
    >>> uuencode_chunk("Cat")
    '0V%T'
    """
    bits = [padded_decimal_to_bin(ord(char)) for char in chunk]
    return ''.join(chr(int(b, base=2) + 32) for b in chunks(''.join(bits), 6))

doctest.testmod()
Making use of uuencode_chunk and chunks, the main function should end up much shorter and simpler.
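For what it's worth, here is one possible shape for that main function. It is a sketch under several assumptions of mine rather than a finished implementation: the input only contains characters with code points below 256, each output line covers at most 45 source characters, the last group is padded with NUL characters, and the mode defaults to '644'. The helpers are repeated without docstrings so the block runs on its own.

```python
def padded_decimal_to_bin(i):
    return bin(i).replace('0b', '').rjust(8, '0')


def chunks(l, n):
    for i in range(0, len(l), n):
        yield l[i:i + n]


def uuencode_chunk(chunk):
    bits = [padded_decimal_to_bin(ord(char)) for char in chunk]
    return ''.join(chr(int(b, base=2) + 32) for b in chunks(''.join(bits), 6))


def uuencode(string, filename, mode='644'):
    """Hypothetical main function built from the helpers above."""
    lines = ['begin ' + mode + ' ' + filename]
    # Assumption: at most 45 source characters per output line.
    for line in chunks(string, 45):
        # Pad the last group with NUL characters so every chunk has 3 characters.
        padded = line + '\0' * (-len(line) % 3)
        body = ''.join(uuencode_chunk(chunk) for chunk in chunks(padded, 3))
        # The leading character encodes the number of source characters on the line.
        lines.append(chr(len(line) + 32) + body)
    lines += ['`', 'end']
    return '\n'.join(lines) + '\n'


print(uuencode('Cat', 'cat.txt'))
```

For the input 'Cat' this prints the begin line, then #0V%T, then the grave accent line and end. Passing the mode as a parameter also sidesteps the platform check in the original.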
- I actually meant that my answer would be somewhat unstructured. But this is a fantastic simplification of the code, nice. – BenC, Aug 24, 2016 at 17:34
- @BenC I already had my idea that the code could be more organized than it was, so when reading your answer I just took it to confirm my feeling. A fun coincidence; I am adapting the first paragraph of my answer. – Caridorc, Aug 24, 2016 at 19:02