I had an intuition that the maximal trajectory lengths for collatz conjecture numbers would be produced by numbers of the form (2**n)-1
so I wrote a program to test it (I was wrong). I thought I would put it up for review to catch any coding sins I committed, and see how others would improve my code.
Orginal Code
'''
* I had a conjecture that maximal trajectory lengths for the collatz conjecture were yielded by numbers of the form (2**n)-1. This program attempts to test it.
* "conjecture()" finds the number with the highest trajectory lengths in a given range. The ranges are of the form [2**n, 2**(n+1)).
'''
import math, pprint
f = open("CollatzConjecture.txt")
text = f.read()
def key(value, dct):
for key in dct.keys():
if dct[key] == value:
return key
return None
def conjecture(text):
traj = {int(j.split(" ")[0]): int(j.split(" ")[1]) for j in text.split("\n")} #Processes the text file into a dict mapping numbers to their trajectory length.
result = {}
bound = math.floor(math.log(len(traj),2))
traj2 = {num: traj[num] for num in list(traj.keys())[:2**bound+1]} #Slices the dictionary to receive the section of interest.
for st in (2**x for x in range(1, bound+1)): #Generator for powers of 2.
slce = list(traj2.items())[int((math.log(st,2)-1)**2):st] #Slices "traj2" into powers of 2.
result[st] = key(max([n[1] for n in slce]), traj2)
return pprint.pformat(result)
print(conjecture(text))
"CollatzConjecture.txt" is a text file containing number trajectoryLength
on separate lines. I wanted to find the number corresponding to the maximum trajectory length between successive powers of 2 in order to test my hypothesis.
4 Answers 4
def key(value, dct): for key in dct.keys(): if dct[key] == value: return key return None
This is really inefficient and not how dictionaries should be used.
- The point of a
dict
is O(1) lookup, this is essentially O(n) and it doesn't make sense - You should remove your blockcomments, and create a nice docstring. That will make the purpose of the code clearer
- You don't close the file, a nice addition is using the
with
statement to automatically open/close files PEP343
-
\$\begingroup\$
key(value, dct)
gets the key corresponding to a value.dict.get()
gets the value corresponding to a key. I'll edit my code to make it clearer. \$\endgroup\$Tobi Alafin– Tobi Alafin2019年01月28日 13:49:22 +00:00Commented Jan 28, 2019 at 13:49 -
4\$\begingroup\$ @TobiAlafin: If you need to map both ways (A -> B and B -> A), use two dictionaries to get O(1) lookups in each direction. A linear search O(n) in one direction vs. an O(1) in the other, doesn't make sense, unless that direction is almost never used, and you can't afford to double the storage cost by keeping a reverse dictionary. Or there are probably custom data structures that can index on both sets of data without duplicating the storage... like what a relational database would use for a table with 2 columns, and an index on each column. For short keys two dicts is totally fine. \$\endgroup\$Peter Cordes– Peter Cordes2019年01月28日 20:18:50 +00:00Commented Jan 28, 2019 at 20:18
-
\$\begingroup\$ Peter Cordes thank you. I had not thought of that, I'll apply that in the future. \$\endgroup\$Tobi Alafin– Tobi Alafin2019年01月29日 01:40:00 +00:00Commented Jan 29, 2019 at 1:40
First of all, thumbs up on including a documentation block at the beginning.
When trying to run the segment, I get
Traceback (most recent call last):
File "conjecture.py", line 28, in <module>
print(conjecture(text))
File "conjecture.py", line 19, in conjecture
traj = {int(j.split(" ")[0]): int(j.split(" ")[1]) for j in text.split("\n")} #Processes the text file into a dict mapping numbers to their trajectory length.
File "conjecture.py", line 19, in <dictcomp>
traj = {int(j.split(" ")[0]): int(j.split(" ")[1]) for j in text.split("\n")} #Processes the text file into a dict mapping numbers to their trajectory length.
IndexError: list index out of range
Turns out that when I saved the output of the link, there were empty trailing lines. After fixing that, I could get it to run.
Furthermore, I see several issues with this implementation:
- The
conjecture
function is doing a lot of the work, with several lines that are quite filled with operations, meaning it's hard to find out what it's doing. - A dictionary can contain a value multiple times, and in fact does. For instance, both 5 and 32 have a length of 5 (5 -> 16 -> ...) and (32 -> 16 -> ...). For longer lengths, there will be more ways in which it can be obtained.
- Floating point arithmetic in a fully integral problem seems like an issue.
- The file is opened, and then not immediately closed. For this it's not really an issue, but in general one should use a contextmanager to clean up used resources as quickly as possible.
- The semantics of the output isn't really clear to me, but I'll probably figure out a bit during picking it apart.
- It doesn't use a
__main__
block, which would be beneficial for re-usable code. - The
conjecture
function returns a string, instead of a dictionary.
So let's start with your implementation:
'''
* I had a conjecture that maximal trajectory lengths for the collatz conjecture were yielded by numbers of the form (2**n)-1. This program attempts to test it.
* "conjecture()" finds the number with the highest trajectory lengths in a given range. The ranges are of the form [2**n, 2**(n+1)).
'''
import math, pprint
f = open("CollatzConjecture.txt")
text = f.read()
def key(value, dct):
for key in dct.keys():
if dct[key] == value:
return key
return None
def conjecture(text):
traj = {int(j.split(" ")[0]): int(j.split(" ")[1]) for j in text.split("\n")} #Processes the text file into a dict mapping numbers to their trajectory length.
result = {}
bound = math.floor(math.log(len(traj),2))
traj2 = {num: traj[num] for num in list(traj.keys())[:2**bound+1]} #Slices the dictionary to receive the section of interest.
for st in (2**x for x in range(1, bound+1)): #Generator for powers of 2.
slce = list(traj2.items())[int((math.log(st,2)-1)**2):st] #Slices "traj2" into powers of 2.
result[st] = key(max([n[1] for n in slce]), traj2)
return pprint.pformat(result)
print(conjecture(text))
Use a if __name__ == '__main__'
segment.
One of the best places to start with such a script (especially if you want to make it reusable!) is to place the "main" code (file-mangling) inside a main
block:
... function definitions ...
if __name__ == '__main__':
f = open("CollatzConjecture.txt")
text = f.read()
print(conjecture(text))
Move parsing the trajectories out of conjecture
.
Your conjecture has nothing to do with the piece of text, but about the length of the trajectories. We shouldn't care how these lengths are generated, calculated or parsed inside the conjecture
function. But the current implementation does.
By moving out the text
to traj
transformation, we make conjecture
more general.
...
def conjecture(traj):
result = {}
bound = math.floor(math.log(len(traj),2))
...
if __name__ == '__main__':
f = open("CollatzConjecture.txt")
text = f.read()
traj = {int(j.split(" ")[0]): int(j.split(" ")[1]) for j in text.split("\n")} #Processes the text file into a dict mapping numbers to their trajectory length.
print(conjecture(traj))
And we can then split the parsing into a separate function, where we use a context manager.
...
def parse_lengths(filename):
with open(filename) as fp:
#Processes the text file into a dict mapping numbers to their trajectory length.
return {int(j.split(" ")[0]): int(j.split(" ")[1]) for j in fp}
...
if __name__ == '__main__':
traj = parse_lengths("CollatzConjecture.txt")
print(conjecture(traj))
And similarly, move the pformat out of conjecture:
def conjecture(text):
result = {}
...
return result
if __name__ == '__main__':
traj = parse_lengths("CollatzConjecture.txt")
pprint.pprint(conjecture(traj))
Simplify conjecture
.
Now that we're left with a conjecture
method that does only things related to the conjecture, we can look at it a bit more:
def conjecture(traj):
result = {}
bound = math.floor(math.log(len(traj),2))
traj2 = {num: traj[num] for num in list(traj.keys())[:2**bound+1]} #Slices the dictionary to receive the section of interest.
for st in (2**x for x in range(1, bound+1)): #Generator for powers of 2.
slce = list(traj2.items())[int((math.log(st,2)-1)**2):st] #Slices "traj2" into powers of 2.
result[st] = key(max([n[1] for n in slce]), traj2)
return result
Starting with this expression: int((math.log(st,2)-1)**2)
, what does it even do?! First it takes the 2-logarithm, substracts 1, and then squares it. At first I suspected that you wanted the previous power of two (and tested it). But, what actually happens is that 64 (26) becomes 25 (52), and also 2048 (211) becomes 100 (102). My basic guess is that this is incorrect, and you mean 64 => 32, 128 => 64, and so forth. Assuming that's what you want, you can just divide by two.
def conjecture(traj):
result = {}
bound = math.floor(math.log(len(traj),2))
traj2 = {num: traj[num] for num in list(traj.keys())[:2**bound+1]} #Slices the dictionary to receive the section of interest.
for st in (2**x for x in range(1, bound+1)): #Generator for powers of 2.
slce = list(traj2.items())[st // 2:st] #Slices "traj2" into powers of 2.
result[st] = key(max([n[1] for n in slce]), traj2)
return result
Surprisingly, though, this does not alter the output.
Next up is the fact that I see slices being taken of dictionaries (after casting to a list). In older Python versions (before 3.6) that might cause surprising results, as dictionaries are not necessarily ordered by value, though in this case it does turn out to be that way it seems. It's better to explicitly filter out the values that you want.
Final:
In the end, I found myself settling on this.
'''
* I had a conjecture that maximal trajectory lengths for the collatz conjecture were yielded by numbers of the form (2**n)-1. This program attempts to test it.
* "conjecture()" finds the number with the highest trajectory lengths in a given range. The ranges are of the form [2**n, 2**(n+1)).
'''
import pprint
def key(value, dct):
for key in dct.keys():
if dct[key] == value:
return key
return None
def parse_lengths(filename):
with open(filename) as fp:
#Processes the text file into a dict mapping numbers to their trajectory length.
return {int(j.split(" ")[0]): int(j.split(" ")[1]) for j in fp}
def conjecture(traj):
result = {}
traj_size = len(traj)
lower, higher = 1, 2
while True:
if higher > traj_size:
# We don't have all the data for this range available.
break
max_in_slice = max(length for num, length in traj.items() if lower <= num <= higher)
result[higher] = key(max_in_slice, traj)
# Prepare data for next iteration.
lower, higher = lower * 2, higher * 2
return result
if __name__ == '__main__':
traj = parse_lengths('CollatzConjecture.txt')
pprint.pprint(conjecture(traj))
I'm also not quite happy with this yet, as the "key" function is still bothering me as well (we could iterate over .items()
instead of .keys()
to simplify).
I hope this at least helps a bit.
It's desirable to try to break one's code into small, more reusable segments. For example, it would be preferable (in my opinion) to create a small function to open the text file, extract its contents, and return the dict
of trajectories you desire:
def load_trajectories(file_name):
traj = {}
with open(file_name, 'r') as f:
for line in f.readlines():
S = line.split(' ')
num, traj_len = int(S[0]), int(S[1])
traj[num] = traj_len
return traj
Note here that I used the context manager with open(file_name, 'r') as f:
, since in Python we need to first open a file, do things with it, and then close it. The context manager handles the opening and closing of the file for us, and within it we can access the file.
From my searching there doesn't seem to be a universally accepted way to find the "inverse" of a value in a dictionary, but I like yours for the most part. As one change, I would iterate over the key:value
pairs to simplify things a bit:
def find_key(target_value, dct):
for key, val in dct.items():
if val == target_value:
return key
raise Exception('Value not found!')
With this, it really only remains to clean up the conjecture
function. Here, we can pass the traj
dictionary as an argument, as well as the (un-exponentiated) bound.
In the parlance of your code, note that st
is always of the form 2**x
, so when we compute math.log(st, 2)
, it always evaluates to x
, so one line of your code reads (equivalently)
slce = list(traj2.items())[(x-1)**2:st]
which isn't the 'slicing into power of 2' that you desire. Moreover, it's not (necessarily) guaranteed that traj2.items()
will be turned into a list in the way you desire, so it'll be better to be more explicit:
slce = [key, val for key, val in traj.items() if key in range(2**(x-1), 2**x)]
Together, along with splitting our imports onto different lines, adding a if __name__ == '__main__':
guard, and some other minor reorganization yields the following code:
import math
import pprint
def find_key(target_value, dct):
for key, val in dct.items():
if val == target_value:
return key
raise Exception('Value not found!')
def load_trajectories(file_name):
traj = {}
with open(file_name, 'r') as f:
for line in f.readlines():
S = line.split(' ')
num, traj_len = int(S[0]), int(S[1])
traj[num] = traj_len
return traj
def conjecture(traj):
"""Checks the conjecture that the maximum 'Collatz length' among the integers in [2**n, 2**(n+1)) is 2**n - 1.
The conjecture is false.
"""
bound = math.floor(math.log(len(traj),2))
exp_bound = 2**bound
traj = {key:val for key, val in traj.items() if key < exp_bound} # Only include numbers of interest.
result = {}
for n in range(1,bound+1):
exp_n = 2**n
slce = [key, val for key, val in traj.items() if key in range(2**(n-1), exp_n)]
result[exp_n] = find_key(max([val for key, val in slce]), traj)
return pprint(pformat(result))
if __name__ == "__main__":
file_name = "CollatzConjecture.txt"
print(conjecture(file_name))
-
\$\begingroup\$ "I would iterate over the
key:value
pairs to simplify things a bit": it's not just a question of simplifying. It's also more efficient, because advancing the iterator doesn't involve any search so is as cheap as possible, whereas the lookup potentially has to resolve hash collisions. \$\endgroup\$Peter Taylor– Peter Taylor2019年01月29日 08:56:59 +00:00Commented Jan 29, 2019 at 8:56
The existing answers cover most of the issues that I spotted. There's just one thing that I think it's useful to add:
for st in (2**x for x in range(1, bound+1)): #Generator for powers of 2. slce = list(traj2.items())[int((math.log(st,2)-1)**2):st] #Slices "traj2" into powers of 2. result[st] = key(max([n[1] for n in slce]), traj2)
This converts the dict
into a list every time through the loop. The suggestions, which don't rely on undocumented behaviour, instead filter the dict
each time through the loop. But you can instead invert the loop:
best = [0] * (bound + 1)
for key, value in traj2.items():
x = int(math.log(key, 2)) + 1
if value > best[x] or (value == best[x] and key < result[2**x]):
best[x] = value
result[2**x] = key
This gives straightforward linear running time and allows you explicit control over tie-breaking. (I'm not sure whether I've got the tie-breaking correct for your conjecture, but you can easily fix that).
key
function actually used? \$\endgroup\$