This code takes a user input number which is a positive integer, and outputs a string representation of the number in binary. For example, input 2
outputs '10'
and input 25
outputs '11001'
. As far as I can tell it works correctly.
Please review it for good style and organization, and I'm especially looking for a critique of the algorithm and level of documentation. My goals are correctness and that it is obvious what the code is doing. My experience level is I've been using Python for about 7 years, hobbyist, haven't collaborated with other programmers ever, and I'm now in a Comp Sci degree program.
I chose the log2 function and the for loop to calculate how long to run, rather than trying to guess if the calc is over on each pass of the loop, but I assume there is a cleaner/simpler way to do that.
from math import log2, floor
def integer_to_binary_string(int_value):
'''Converts integer to a string of base-2 bits,
example: 20 becomes '10100'
'''
rev_str = integer_to_reverse_binary(int_value)
forward_str = reverse_string(rev_str)
return forward_str
def integer_to_reverse_binary(int_value):
'''returns a string, not a numerical'''
out_s = ''
if int_value == 0:
#to avoid ValueError on the log2 call
return '0'
#find length of bin string
bin_digits = floor(log2(int_value)) + 1
for i in range(bin_digits):
out_s = out_s + str(int_value // 2**i % 2)
return out_s
def reverse_string(s):
return s[::-1] #start:stop:step slice notation
if __name__ == '__main__':
user_input = int(input())
print(integer_to_binary_string(user_input))
4 Answers 4
Your implementation seems fine at a glance, but Python actually provides two really easy ways to do this:
Your function would then become:
def integer_to_binary_string(int_value):
'''Converts integer to a string of base-2 bits,
example: 20 becomes '10100'
'''
return f"{int_value:b}"
# or
# return bin(int_value)[2:]
The advantages of this are numerous:
- Obvious what is happening & idiomatic Python
- Doesn't require someone to understand the math behind it
- Way faster (
timeit
with 1,000,000 iterations clocks at 0.23s for my method, 1.82s for yours)
Now to actually review your code (the reinventing-the-wheel tag was added after I started writing this):
- Generally, I prefer list comprehensions over an explicit loop; they're often faster
- Creating a function called
reverse_string
is unnecessary - the idiomatic/Pythonic way to do this is using the slice notation you mentioned - hiding it behind a method is just going to confuse experienced Python users. - Repeatedly appending to strings/lists/things is generally not the best from a performance perspective (this has to do with how memory is allocated). Instead, use
str.join
, as it will do better from a memory allocation perspective
What I would end up with looks like this:
def integer_to_binary_string2(int_value):
"""Converts integer to a string of base-2 bits,
example: 20 becomes '10100'
"""
if int_value == 0:
return "0"
bin_digits = floor(log2(int_value)) + 1
return "".join([
str(int_value // 2 ** i % 2)
for i in range(bin_digits)
])[::-1]
You can use a generator comprehension instead of a list comprehension, but as pointed out by @Graipher in the comments, in CPython str.join
turns arbitrary iterables into lists as the very first step, negating any potential benefits. Generally speaking I prefer generator comprehensions to list comprehensions because they can be more memory efficient & are lazily evaluated, but they add no benefit here.
-
\$\begingroup\$ Thank you to all who posted answers, all very helpful (I didn't know about type hinting in Python, especially). I chose this one because it especially focused on the algorithm and demo'd the
timeit
info and revised non-trivial code, and I think pointing out my need to study list/generator comprehensions is very helpful. \$\endgroup\$nexus_2006– nexus_20062020年10月19日 20:19:13 +00:00Commented Oct 19, 2020 at 20:19
Several general hints:
- Formatting: Use an auto-formatter. I recommend black. It's wide-spread, maintained by the Python Software Foundation, allows little configuration so the style is consistent among projects (no configuration is good in this case)
- Linting: Use linters. I recommend flake8 with
flake8-bugbear
,flake8-comprehensions
,flake8_implicit_str_concat
,flake8-simplify
as plugins. See also: Static code analysis for Python - Type Annotations: They are awesome. Use them. See also: Type annotations in Python 3.6+
- Docstring style: There are two which are good (Numpy doc and Google Style) and another one which is wide-spread (Sphinx Style). I like to use something very similar to numpy doc as it is easy to read.
- Doctests:
doctest
makes your documentation double as a unit test. So your documentation is actually tested! - (you are aware of this as there is the reinvent-the-wheel, but this is super important in general: Use built-ins: Use built format or bin as shown here. Code you don't write is code you don't need to test / maintain.)
Applied to this case, it gives:
def integer_to_binary_string(number: int) -> str:
"""
Convert an integer to a string of base-2 bits.
Examples
--------
>>> integer_to_binary_string(20)
'10100'
>>> integer_to_binary_string(-20)
'-10100'
Examples
--------
"""
return format(number, "b")
if __name__ == "__main__":
import doctest
doctest.testmod()
-
2\$\begingroup\$ Every point here is valid, except for the last one. The question is tagged
reinventing-the-wheel
, and the OP probably already knows about the builtins \$\endgroup\$Sriv– Sriv2020年10月17日 13:39:30 +00:00Commented Oct 17, 2020 at 13:39 -
2\$\begingroup\$ Good point! I havent seem that. I've adjusted the answer :-) \$\endgroup\$Martin Thoma– Martin Thoma2020年10月17日 15:04:37 +00:00Commented Oct 17, 2020 at 15:04
According to PEP8, docstrings should use
"""
instead of'''
Both
integer_to_reverse_binary
andreverse_string
are only used once. I'd just replace the usage with the body of the functions.Instead of
integer_to_binary_string
, I would prefer a shorter name such asint_to_bin
.log2
is inaccurate for large values and should be avoided if possible.int
has an in-built function calledbit_length
which, as you could probably guess, returns the bit length of an integer.This function can replace
floor(log2(int_value)) + 1
.rev_str += some_value
is the proper (and marginally faster) way for inplace addition.rev_str += str(int_value // 2 ** i % 2)
can be replaced torev_str += str((int_value >> i) & 1)
by using bitwise operators to be made faster.
Now, the function looks something like this:
def int_to_bin(int_value):
"""
Converts integer to a string of base-2 bits
Example: 20 becomes '10100'
"""
if int_value == 0:
return '0'
rev_str = ''
bin_digits = int_value.bit_length()
for i in range(bin_digits):
rev_str += str((int_value >> i) & 1)
forward_str = rev_str[::-1]
return forward_str
Now, the above is good enough, but it can be made shorter and faster, though it's probably not recommended.
bin_digits
andforward_str
can be turned into inline statements, thus removing the variables.-
rev_str = '' for i in range(int_value.bit_length()): rev_str += str((int_value >> i) & 1)
If you use
''.join()
, this can be changed to a one-liner as follows.
rev_str = ''.join(str((int_value >> i) & 1) for i in range(int_value.bit_length()))
Now, the code has been reduced to:
def int_to_bin(int_value):
"""
Converts integer to a string of base-2 bits
Example: 20 becomes '10100'
"""
if int_value == 0:
return '0'
return ''.join(str((int_value >> i) & 1) for i in range(int_value.bit_length()))[::-1]
I hope this helped!
I only have a few points to add to Dannnno's, Sriv's, and Martin Thoma's excellent answers.
Consistency
Sometimes, you abbreviate string as "str", sometimes as "s", sometimes not at all. Sometimes, you put the type in the identifier, sometimes not. When you put the type in the identifier, you sometimes put it as a prefix, sometimes as a suffix.
You should choose one style and stick with it. If you are editing some existing code, you should adapt your style to be the same as the existing code. If you are part of a team, you should adapt your style to match the rest of the team.
In general, if you use two different ways to write the exact same thing, the reader will think that you want to convey a message with that. So, you should only use two different ways of writing the same thing IFF you actually want to convey some extra information.
Vertical whitespace
Your code is all bunched up together. Some empty lines would allow the code room to breathe, for example in the integer_to_reverse_binary
function. The function is clearly separated into a series of steps: setup, compute, return. Some whitespace would help draw attention to those steps:
def integer_to_reverse_binary(int_value):
'''returns a string, not a numerical'''
out_s = ''
if int_value == 0:
#to avoid ValueError on the log2 call
return '0'
#find length of bin string
bin_digits = floor(log2(int_value)) + 1
for i in range(bin_digits):
out_s = out_s + str(int_value // 2**i % 2)
return out_s
Naming
Let's look at these names:
int_value
rev_str
forward_str
out_s
s
rev_str
, forward_str
, s
, and out_s
all contain strings. And yet, two of them are named str
and two of them are named s
. forward_str
and rev_str
are right next to each other in the code, they conceptually do the same thing (naming a temporary string), yet the word forward
is spelled out, the word rev
erse is abbreviated.
Some of the names are very cryptic (e.g. s
) and some very detailed (e.g. integer_to_reverse_binary
).
Names are important. Names are also hard.
A good name should be intention-revealing. They should convey meaning.
int_value
, for example, does not convey much meaning. You can only pass values as arguments, so every argument is always a value. Thus, the word "value" doesn't tell me anything I don't already know. And int
only tells me what it is, but not what it does and what it's for, how it is used, why it is there.
Compared to int_value
, even just n
would be a better name, since it does actually tell the reader something: "this is just a number, and there's nothing special about it, so it doesn't deserve a name".
It is also not quite correct. In the question, you write [bold emphasis mine]:
This code takes a user input number which is a positive integer
So, int
is actually not a restrictive enough description. In another sense, however, it is too restrictive, because your code will not just work with positive int
s but with any positive whole number type that implements __floordiv__
, __rpow__
, and __rdivmod__
.
The name bin_digits
is actively misleading: it does not contain the binary digits. It contains the number of binary digits.
Oh, and also, the term "binary digit" is usually shortened to "bit".
So, it should be called bit_length
or number_of_bits
.
Not naming
As I wrote above, naming is hard, and names are important. However, there is a flip side to that: because naming is hard and names are important, that means when you give something a name, that thing is important.
If I see something with a name, I think "naming is hard work, so if the author put in the hard work to come up with a name for this thing, then this must be an important thing".
I think at least some of the slightly awkward names in your code are caused by trying to come up with names for things that shouldn't have names or maybe shouldn't even exist. For example, I think that out_s
shouldn't be necessary.
Comments
Generally speaking, comments are a code smell. Mostly, comments should not exist:
- If your code is so complex that you need to explain it in a comment, you should rather try to refactor your code to be less complex.
- If you have a comment that explains what something is, you can rather give it a name.
- If you have a comment that explains how the code does something, just delete it: the code explains how the code does something.
The only acceptable thing for a comment is to explain why the code does something in a specific non-obvious way. So, for example, there is an obvious way that looks like it should work, but you tried it and it didn't work for a non-obvious reason. Then you can add a comment explaining that you are specifically using solution B even though it looks like much simpler solution A should also work, but it actually doesn't work because of issue X. Ideally, you would add a link to the pull request / code review / bug ticket where this issue is discussed in greater detail and maybe a link to a wiki page with a detailed explanation.
I will give a couple of examples:
#find length of bin string
bin_digits = floor(log2(int_value)) + 1
I talked before about how I think the name bin_digits
is not a very good name. And I think the comment reinforces that. Instead of using the comment to clarify the unclear name of the variable, we can rename the variable to just say that in the first place:
bit_length = floor(log2(int_value)) + 1
Here is another example:
return s[::-1] #start:stop:step slice notation
This comment is basically useless.
If I know Python, I know slice notation, so this comment tells me nothing. If I don't know Python, I have no idea what "slice notation" means, so this comment tells me nothing.
[::-1]
is the standard Python idiom for reversing a sequence. This idiom is so well-known that even I know it, even though I am not actually a Python programmer. It really needs no explanation.
For a Python programmer, the "letters" [::-1]
are actually no different from the letters reverse
.
Type annotations
There are several places in your code, where you put types either in comments or in names. If you think the types are important enough to write down at all, don't you think they shouldn't be hidden away in comments or in names? Why not write the type as a ... type?
For example, this
def integer_to_binary_string(int_value):
could become this:
def integer_to_binary(n: int) -> str:
Or here, where for some strange reason you put the type not in the name but in the documentation, even though the situation is exactly the same as above:
def integer_to_reverse_binary(int_value):
'''returns a string, not a numerical'''
becomes
def integer_to_reverse_binary(n: int) -> str:
Iteration Patterns
A lot of stuff we do in programming is iteration. We are iterating over data structures, over user inputs, you can think of a reactive program as iterating over a stream of events, etc. It is no wonder that the collections framework is the centerpiece of any language's core libraries.
There are a couple of fundamental iteration patterns that occur over and over again. Some of the more well-known are fold ("folding" a collection into a single value using a binary operation) and map (transforming each individual element of the collection). Then there is scan aka prefix-sum, filter, and so on.
Of these, fold has an interesting property: it is universal and expressive, meaning that all the other ones I listed and many more (in fact, everything that can be expressed by iterating over a collection, i.e. by using Python's for
/ in
) can be expressed as a fold.
What does this have to do with this problem? Well, in turns out, fold has a dual, which is typically called unfold. Where fold "folds" a collection into a single value using a function that takes two arguments and returns one value, unfold does the exact dual of that: it "unfolds" a single value into a collection using a function that takes one argument and returns two values.
Doesn't that sound exactly what we are trying to do? Unfold a single number into a collection of ones and zeroes?
Python implements many recursion patterns in its library and even in the language. For example, functors.reduce
is fold, itertools.accumulate
is scan comprehensions are a powerful combination of map and filter.
Unfold, however, is unfortunately missing. But it is easy to write (or, as I did, google):
from typing import Callable, Iterable, Optional, Tuple, TypeVar
A = TypeVar("A")
B = TypeVar("B")
def unfold(f: Callable[[B], Optional[Tuple[A, B]]], initial: B) -> Iterable[A]:
"""
Unfold a seed into an iterable.
Unfold an initial seed value into a potentially infinite iterable by
repeatedly applying a function for generating the next element.
Implementation inspired by:
https://reddit.com/r/programming/comments/65bpf/unfold_in_python/c02vvrz/
:param f: a callable that takes the current seed as argument and
returns a tuple of the next value and the next seed (or None to
stop the generation).
:param initial: the initial seed value
:yields: a potentially infinite interable of values generated from the seed
by repeatedly applying f.
"""
intermediate = f(initial)
while intermediate:
element, initial = intermediate
yield element
intermediate = f(initial)
With this definition in scope, we can express the operation very succinctly, in my opinion:
def integer_to_binary(n: int) -> str:
"""
Convert a non-negative integer into binary representation.
:param n: a non-negative integer.
:returns: the binary representation of n.
"""
if n < 0:
raise ValueError(n)
return "".join(unfold(lambda n: (str(n % 2), n // 2) if n else None, n))[::-1]
There seem to be differing opinions in the Python community about whether or not functional programming and recursion patterns are idiomatic or not. For someone who is used to working with recursion patterns, the code here is understandable and succinct (but not dense).
Others may disagree. One way to get around this would be to possibly turn the anonymous lambda into a named nested function.
Explore related questions
See similar questions with these tags.
bin(...)
function, so I've added that tag. Please correct me if I'm wrong... \$\endgroup\$'0b'
from the output ofbin(i)
to satisfy the automated test, but that's trivial) \$\endgroup\$