Convert Integer to Binary String

Question 1

This code takes a user input number which is a positive integer, and outputs a string representation of the number in binary. For example, input 2 outputs '10' and input 25 outputs '11001'. As far as I can tell it works correctly.

Please review it for good style and organization, and I'm especially looking for a critique of the algorithm and level of documentation. My goals are correctness and that it is obvious what the code is doing. My experience level is I've been using Python for about 7 years, hobbyist, haven't collaborated with other programmers ever, and I'm now in a Comp Sci degree program.

I chose the log2 function and the for loop to calculate how long to run, rather than trying to guess if the calc is over on each pass of the loop, but I assume there is a cleaner/simpler way to do that.

from math import log2, floor
def integer_to_binary_string(int_value):
 '''Converts integer to a string of base-2 bits,
 
 example: 20 becomes '10100'
 
 '''
 rev_str = integer_to_reverse_binary(int_value)
 forward_str = reverse_string(rev_str)
 return forward_str
def integer_to_reverse_binary(int_value):
 '''returns a string, not a numerical'''
 out_s = ''
 if int_value == 0:
 #to avoid ValueError on the log2 call
 return '0'
 #find length of bin string
 bin_digits = floor(log2(int_value)) + 1
 for i in range(bin_digits):
 out_s = out_s + str(int_value // 2**i % 2)
 return out_s
def reverse_string(s):
 return s[::-1] #start:stop:step slice notation
if __name__ == '__main__':
 user_input = int(input())
 print(integer_to_binary_string(user_input))

Question 2

I assume you're [reinventing-the-wheel] intentionally, and don't want to just use the bin(...) function, so I've added that tag. Please correct me if I'm wrong...

Question 3

@AJNeufeld thanks for adding that, you are correct. I didn't know about that tag. (and anyhow, I'd have to strip the leading '0b' from the output of bin(i) to satisfy the automated test, but that's trivial)

Question 4

Your implementation seems fine at a glance, but Python actually provides two really easy ways to do this:

Your function would then become:

def integer_to_binary_string(int_value):
 '''Converts integer to a string of base-2 bits,
 
 example: 20 becomes '10100'
 
 '''
 return f"{int_value:b}"
 # or 
 # return bin(int_value)[2:]

The advantages of this are numerous:

Obvious what is happening & idiomatic Python
Doesn't require someone to understand the math behind it
Way faster (timeit with 1,000,000 iterations clocks at 0.23s for my method, 1.82s for yours)

Now to actually review your code (the reinventing-the-wheel tag was added after I started writing this):

Generally, I prefer list comprehensions over an explicit loop; they're often faster
Creating a function called reverse_string is unnecessary - the idiomatic/Pythonic way to do this is using the slice notation you mentioned - hiding it behind a method is just going to confuse experienced Python users.
Repeatedly appending to strings/lists/things is generally not the best from a performance perspective (this has to do with how memory is allocated). Instead, use str.join, as it will do better from a memory allocation perspective

What I would end up with looks like this:

def integer_to_binary_string2(int_value):
 """Converts integer to a string of base-2 bits,
 example: 20 becomes '10100'
 """
 if int_value == 0:
 return "0"
 bin_digits = floor(log2(int_value)) + 1
 return "".join([
 str(int_value // 2 ** i % 2) 
 for i in range(bin_digits)
 ])[::-1]

You can use a generator comprehension instead of a list comprehension, but as pointed out by @Graipher in the comments, in CPython str.join turns arbitrary iterables into lists as the very first step, negating any potential benefits. Generally speaking I prefer generator comprehensions to list comprehensions because they can be more memory efficient & are lazily evaluated, but they add no benefit here.

Question 5

Thank you to all who posted answers, all very helpful (I didn't know about type hinting in Python, especially). I chose this one because it especially focused on the algorithm and demo'd the timeit info and revised non-trivial code, and I think pointing out my need to study list/generator comprehensions is very helpful.

Question 6

Several general hints:

Formatting: Use an auto-formatter. I recommend black. It's wide-spread, maintained by the Python Software Foundation, allows little configuration so the style is consistent among projects (no configuration is good in this case)
Linting: Use linters. I recommend flake8 with flake8-bugbear, flake8-comprehensions, flake8_implicit_str_concat, flake8-simplify as plugins. See also: Static code analysis for Python
Type Annotations: They are awesome. Use them. See also: Type annotations in Python 3.6+
Docstring style: There are two which are good (Numpy doc and Google Style) and another one which is wide-spread (Sphinx Style). I like to use something very similar to numpy doc as it is easy to read.
Doctests: doctest makes your documentation double as a unit test. So your documentation is actually tested!
(you are aware of this as there is the reinvent-the-wheel, but this is super important in general: Use built-ins: Use built format or bin as shown here. Code you don't write is code you don't need to test / maintain.)

Applied to this case, it gives:

def integer_to_binary_string(number: int) -> str:
 """
 Convert an integer to a string of base-2 bits.
 Examples
 --------
 >>> integer_to_binary_string(20)
 '10100'
 >>> integer_to_binary_string(-20)
 '-10100'
 Examples
 --------
 """
 return format(number, "b")
if __name__ == "__main__":
 import doctest
 doctest.testmod()

Question 7

Every point here is valid, except for the last one. The question is tagged reinventing-the-wheel, and the OP probably already knows about the builtins

Question 8

Good point! I havent seem that. I've adjusted the answer :-)

Question 9

According to PEP8, docstrings should use """ instead of '''
Both integer_to_reverse_binary and reverse_string are only used once. I'd just replace the usage with the body of the functions.
Instead of integer_to_binary_string, I would prefer a shorter name such as int_to_bin.
log2 is inaccurate for large values and should be avoided if possible. int has an in-built function called bit_length which, as you could probably guess, returns the bit length of an integer.

This function can replace floor(log2(int_value)) + 1.
rev_str += some_value is the proper (and marginally faster) way for inplace addition.
rev_str += str(int_value // 2 ** i % 2) can be replaced to rev_str += str((int_value >> i) & 1) by using bitwise operators to be made faster.

Now, the function looks something like this:

def int_to_bin(int_value):
 """
 Converts integer to a string of base-2 bits
 Example: 20 becomes '10100'
 """
 if int_value == 0:
 return '0'
 rev_str = ''
 bin_digits = int_value.bit_length()
 for i in range(bin_digits):
 rev_str += str((int_value >> i) & 1)
 forward_str = rev_str[::-1]
 return forward_str

Now, the above is good enough, but it can be made shorter and faster, though it's probably not recommended.

bin_digits and forward_str can be turned into inline statements, thus removing the variables.
```
rev_str = ''
for i in range(int_value.bit_length()):
 rev_str += str((int_value >> i) & 1)
```
If you use ''.join(), this can be changed to a one-liner as follows.
rev_str = ''.join(str((int_value >> i) & 1) for i in range(int_value.bit_length()))

Now, the code has been reduced to:

def int_to_bin(int_value):
 """
 Converts integer to a string of base-2 bits
 Example: 20 becomes '10100'
 """
 if int_value == 0:
 return '0'
 return ''.join(str((int_value >> i) & 1) for i in range(int_value.bit_length()))[::-1]

I hope this helped!

Question 10

I only have a few points to add to Dannnno's, Sriv's, and Martin Thoma's excellent answers.

Consistency

Sometimes, you abbreviate string as "str", sometimes as "s", sometimes not at all. Sometimes, you put the type in the identifier, sometimes not. When you put the type in the identifier, you sometimes put it as a prefix, sometimes as a suffix.

You should choose one style and stick with it. If you are editing some existing code, you should adapt your style to be the same as the existing code. If you are part of a team, you should adapt your style to match the rest of the team.

In general, if you use two different ways to write the exact same thing, the reader will think that you want to convey a message with that. So, you should only use two different ways of writing the same thing IFF you actually want to convey some extra information.

Vertical whitespace

Your code is all bunched up together. Some empty lines would allow the code room to breathe, for example in the integer_to_reverse_binary function. The function is clearly separated into a series of steps: setup, compute, return. Some whitespace would help draw attention to those steps:

def integer_to_reverse_binary(int_value):
 '''returns a string, not a numerical'''
 out_s = ''
 if int_value == 0:
 #to avoid ValueError on the log2 call
 return '0'
 #find length of bin string
 bin_digits = floor(log2(int_value)) + 1
 for i in range(bin_digits):
 out_s = out_s + str(int_value // 2**i % 2)
 return out_s

Naming

Let's look at these names:

int_value
rev_str
forward_str
out_s
s

rev_str, forward_str, s, and out_s all contain strings. And yet, two of them are named str and two of them are named s. forward_str and rev_str are right next to each other in the code, they conceptually do the same thing (naming a temporary string), yet the word forward is spelled out, the word reverse is abbreviated.

Some of the names are very cryptic (e.g. s) and some very detailed (e.g. integer_to_reverse_binary).

Names are important. Names are also hard.

A good name should be intention-revealing. They should convey meaning.

int_value, for example, does not convey much meaning. You can only pass values as arguments, so every argument is always a value. Thus, the word "value" doesn't tell me anything I don't already know. And int only tells me what it is, but not what it does and what it's for, how it is used, why it is there.

Compared to int_value, even just n would be a better name, since it does actually tell the reader something: "this is just a number, and there's nothing special about it, so it doesn't deserve a name".

It is also not quite correct. In the question, you write [bold emphasis mine]:

This code takes a user input number which is a positive integer

So, int is actually not a restrictive enough description. In another sense, however, it is too restrictive, because your code will not just work with positive ints but with any positive whole number type that implements __floordiv__, __rpow__, and __rdivmod__.

The name bin_digits is actively misleading: it does not contain the binary digits. It contains the number of binary digits.

Oh, and also, the term "binary digit" is usually shortened to "bit".

So, it should be called bit_length or number_of_bits.

Not naming

As I wrote above, naming is hard, and names are important. However, there is a flip side to that: because naming is hard and names are important, that means when you give something a name, that thing is important.

If I see something with a name, I think "naming is hard work, so if the author put in the hard work to come up with a name for this thing, then this must be an important thing".

I think at least some of the slightly awkward names in your code are caused by trying to come up with names for things that shouldn't have names or maybe shouldn't even exist. For example, I think that out_s shouldn't be necessary.

Comments

Generally speaking, comments are a code smell. Mostly, comments should not exist:

If your code is so complex that you need to explain it in a comment, you should rather try to refactor your code to be less complex.
If you have a comment that explains what something is, you can rather give it a name.
If you have a comment that explains how the code does something, just delete it: the code explains how the code does something.

The only acceptable thing for a comment is to explain why the code does something in a specific non-obvious way. So, for example, there is an obvious way that looks like it should work, but you tried it and it didn't work for a non-obvious reason. Then you can add a comment explaining that you are specifically using solution B even though it looks like much simpler solution A should also work, but it actually doesn't work because of issue X. Ideally, you would add a link to the pull request / code review / bug ticket where this issue is discussed in greater detail and maybe a link to a wiki page with a detailed explanation.

I will give a couple of examples:

#find length of bin string
bin_digits = floor(log2(int_value)) + 1

I talked before about how I think the name bin_digits is not a very good name. And I think the comment reinforces that. Instead of using the comment to clarify the unclear name of the variable, we can rename the variable to just say that in the first place:

bit_length = floor(log2(int_value)) + 1

Here is another example:

return s[::-1] #start:stop:step slice notation

This comment is basically useless.

If I know Python, I know slice notation, so this comment tells me nothing. If I don't know Python, I have no idea what "slice notation" means, so this comment tells me nothing.

[::-1] is the standard Python idiom for reversing a sequence. This idiom is so well-known that even I know it, even though I am not actually a Python programmer. It really needs no explanation.

For a Python programmer, the "letters" [::-1] are actually no different from the letters reverse.

Type annotations

There are several places in your code, where you put types either in comments or in names. If you think the types are important enough to write down at all, don't you think they shouldn't be hidden away in comments or in names? Why not write the type as a ... type?

For example, this

def integer_to_binary_string(int_value):

could become this:

def integer_to_binary(n: int) -> str:

Or here, where for some strange reason you put the type not in the name but in the documentation, even though the situation is exactly the same as above:

def integer_to_reverse_binary(int_value):
 '''returns a string, not a numerical'''

becomes

def integer_to_reverse_binary(n: int) -> str:

Iteration Patterns

A lot of stuff we do in programming is iteration. We are iterating over data structures, over user inputs, you can think of a reactive program as iterating over a stream of events, etc. It is no wonder that the collections framework is the centerpiece of any language's core libraries.

There are a couple of fundamental iteration patterns that occur over and over again. Some of the more well-known are fold ("folding" a collection into a single value using a binary operation) and map (transforming each individual element of the collection). Then there is scan aka prefix-sum, filter, and so on.

Of these, fold has an interesting property: it is universal and expressive, meaning that all the other ones I listed and many more (in fact, everything that can be expressed by iterating over a collection, i.e. by using Python's for / in) can be expressed as a fold.

What does this have to do with this problem? Well, in turns out, fold has a dual, which is typically called unfold. Where fold "folds" a collection into a single value using a function that takes two arguments and returns one value, unfold does the exact dual of that: it "unfolds" a single value into a collection using a function that takes one argument and returns two values.

Doesn't that sound exactly what we are trying to do? Unfold a single number into a collection of ones and zeroes?

Python implements many recursion patterns in its library and even in the language. For example, functors.reduce is fold, itertools.accumulate is scan comprehensions are a powerful combination of map and filter.

Unfold, however, is unfortunately missing. But it is easy to write (or, as I did, google):

from typing import Callable, Iterable, Optional, Tuple, TypeVar
A = TypeVar("A")
B = TypeVar("B")
def unfold(f: Callable[[B], Optional[Tuple[A, B]]], initial: B) -> Iterable[A]:
 """
 Unfold a seed into an iterable.
 Unfold an initial seed value into a potentially infinite iterable by
 repeatedly applying a function for generating the next element.
 Implementation inspired by:
 https://reddit.com/r/programming/comments/65bpf/unfold_in_python/c02vvrz/
 :param f: a callable that takes the current seed as argument and
 returns a tuple of the next value and the next seed (or None to
 stop the generation).
 :param initial: the initial seed value
 :yields: a potentially infinite interable of values generated from the seed
 by repeatedly applying f.
 """
 intermediate = f(initial)
 while intermediate:
 element, initial = intermediate
 yield element
 intermediate = f(initial)

With this definition in scope, we can express the operation very succinctly, in my opinion:

def integer_to_binary(n: int) -> str:
 """
 Convert a non-negative integer into binary representation.
 :param n: a non-negative integer.
 :returns: the binary representation of n.
 """
 if n < 0:
 raise ValueError(n)
 return "".join(unfold(lambda n: (str(n % 2), n // 2) if n else None, n))[::-1]

There seem to be differing opinions in the Python community about whether or not functional programming and recursion patterns are idiomatic or not. For someone who is used to working with recursion patterns, the code here is understandable and succinct (but not dense).

Others may disagree. One way to get around this would be to possibly turn the anonymous lambda into a named nested function.

Dan Oberlam Dan Oberlam 8,0492 gold badges33 silver badges74 bronze badges · Accepted Answer · 2020-10-16 22:26:27Z

Your implementation seems fine at a glance, but Python actually provides two really easy ways to do this:

Your function would then become:

def integer_to_binary_string(int_value):
 '''Converts integer to a string of base-2 bits,
 
 example: 20 becomes '10100'
 
 '''
 return f"{int_value:b}"
 # or 
 # return bin(int_value)[2:]

The advantages of this are numerous:

Obvious what is happening & idiomatic Python
Doesn't require someone to understand the math behind it
Way faster (timeit with 1,000,000 iterations clocks at 0.23s for my method, 1.82s for yours)

Now to actually review your code (the reinventing-the-wheel tag was added after I started writing this):

Generally, I prefer list comprehensions over an explicit loop; they're often faster
Creating a function called reverse_string is unnecessary - the idiomatic/Pythonic way to do this is using the slice notation you mentioned - hiding it behind a method is just going to confuse experienced Python users.
Repeatedly appending to strings/lists/things is generally not the best from a performance perspective (this has to do with how memory is allocated). Instead, use str.join, as it will do better from a memory allocation perspective

What I would end up with looks like this:

def integer_to_binary_string2(int_value):
 """Converts integer to a string of base-2 bits,
 example: 20 becomes '10100'
 """
 if int_value == 0:
 return "0"
 bin_digits = floor(log2(int_value)) + 1
 return "".join([
 str(int_value // 2 ** i % 2) 
 for i in range(bin_digits)
 ])[::-1]

You can use a generator comprehension instead of a list comprehension, but as pointed out by @Graipher in the comments, in CPython str.join turns arbitrary iterables into lists as the very first step, negating any potential benefits. Generally speaking I prefer generator comprehensions to list comprehensions because they can be more memory efficient & are lazily evaluated, but they add no benefit here.

Thank you to all who posted answers, all very helpful (I didn't know about type hinting in Python, especially). I chose this one because it especially focused on the algorithm and demo'd the timeit info and revised non-trivial code, and I think pointing out my need to study list/generator comprehensions is very helpful.

Stack Exchange Network

Convert Integer to Binary String

4 Answers 4

Consistency

Vertical whitespace

Naming

Not naming

Comments

Type annotations

Iteration Patterns

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Convert Integer to Binary String

4 Answers 4

Consistency

Vertical whitespace

Naming

Not naming

Comments

Type annotations

Iteration Patterns

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions