I am writing a program to display the behavior or certain seeds when you apply the Collatz conjecture algorithm to them. I would like to know what I should change and also if there is a way to see which seed produced a certain plot. I am also having some problems with the color.
import matplotlib.pyplot as plt
import numpy as np
def collatz(seed):
"""Apply Collatz conjecture algorithm and stores all the values of the seed after every iteration to display them later"""
y = [seed]
while seed != 1:
if seed % 2 == 0:
seed = seed/2
else:
seed = (3*seed) + 1
y.append(seed)
return y
colors = plt.cm.bwr(np.linspace(0, 1, 1500))
for i in range(1, 1500): #collecting data by applying the algorithm to a certain set of seeds
Y = collatz(i)
plt.plot(Y, color = colors[i])
plt.title("Collatz conjecture")
plt.xlabel("Number of iterations until stuck")
plt.ylabel("Values reached")
plt.show()
-
\$\begingroup\$ Could you please show the resulting plot? \$\endgroup\$no comment– no comment2024年05月01日 20:42:32 +00:00Commented May 1, 2024 at 20:42
-
2\$\begingroup\$ I have rolled back Rev 4 → 3. Please see What should I do when someone answers my question?. \$\endgroup\$Sᴀᴍ Onᴇᴌᴀ– Sᴀᴍ Onᴇᴌᴀ ♦2024年05月02日 22:08:35 +00:00Commented May 2, 2024 at 22:08
3 Answers 3
Note how you're manually repeating the number of seeds (1500):
colors = plt.cm.bwr(np.linspace(0, 1, 1500))
for i in range(1, 1500):
...
That's always a good sign it should be a variable.
One option is to put the plotting code into a function parametrized by n_seeds
(and then add 1 to make the naming accurate):
def plot_collatz(n_seeds=1500):
# ------------
"""Plot Collatz sequences for a given number of seeds"""
colors = plt.cm.bwr(np.linspace(0, 1, n_seeds+1))
# ---------
for seed in range(1, n_seeds+1):
# ---------
plt.plot(collatz(seed), color=colors[seed])
plt.title("Collatz conjecture")
plt.xlabel("Number of iterations until stuck")
plt.ylabel("Values reached")
plt.show()
plot_collatz(10)
-
\$\begingroup\$ Didn't you fall into your own trap by having
n_seeds+1
twice? Why not make this another variable? :) \$\endgroup\$infinitezero– infinitezero2024年05月01日 06:53:47 +00:00Commented May 1, 2024 at 6:53 -
\$\begingroup\$ Well that depends what the "trap" is. For a beginner review, I think the main takeaway can just be that if we want to change the number of seeds, now we only have to do it once. That takeaway message isn't affected by having multiple
n_seeds+1
(and if we were to address the latter, I wouldn't recommend making a throwaway variable forn_seeds+1
). \$\endgroup\$tdy– tdy2024年05月01日 11:14:27 +00:00Commented May 1, 2024 at 11:14 -
\$\begingroup\$ @nocomment This figure is the output of
plot_collatz(10)
(preceding code block), so it doesn't contain seed 1161. If you're curious about the higher seeds, the figure gets quite busy, but here's the version with 1500 sequences. \$\endgroup\$tdy– tdy2024年05月02日 01:52:21 +00:00Commented May 2, 2024 at 1:52 -
\$\begingroup\$ @nocomment as the answer above says, the main idea behind this plot is to show how certain seeds behave if you apply the algorithm. Yeah, it isn't that great to see which seed produced a certain plot but I think it makes a good job at displaying how chaotic the sequence is. Maybe paired with a scatter plot it would make more sense \$\endgroup\$Lorenzo– Lorenzo2024年05月02日 04:54:17 +00:00Commented May 2, 2024 at 4:54
-
\$\begingroup\$ @tdy Ah yes, I missed that you changed that. Why did you? \$\endgroup\$no comment– no comment2024年05月02日 07:04:23 +00:00Commented May 2, 2024 at 7:04
Integer arithmetic
seed = seed/2
takes the value of seed
(initially an integer) and divides it by 2, and stores the result as a float
back in seed
.
>>> 10 / 2
5.0
If the collatz sequence value ever exceeds \2ドル^{53}\$, seed % 2 == 0
will always be true, due to the limited float
precision.
>>> 2 ** 53 + 1.0 # You'd expect this to be odd, but it's not!
9007199254740992.0
Instead, you should be using integer division: seed = seed // 2
. Python's integers have variable length representations, so can easily hold exceedingly large integer values precisely.
You could also use an Augmented Assignment operator here (seed //= 2
). It saves typing a variable name an additional time, and -- when the left-hand-side is more complex than a simple variable, such as self.seed
or a[i]
-- avoids evaluating the term a second time, which can be a speed improvement.
-
\$\begingroup\$ Thanks for the suggestion. So I should use this "contracted form" just for complex cases, right? Also, I don't understand how is it possible that 2^53 causes error: there is only one float digit (0) \$\endgroup\$Lorenzo– Lorenzo2024年05月01日 20:00:09 +00:00Commented May 1, 2024 at 20:00
-
\$\begingroup\$ @Lorenzo I'd probably use the
//=
form here as well. Less to type/maintain and it's how people usually do it. I've avoided it with ints only when hardcore micro-optimizing. \$\endgroup\$no comment– no comment2024年05月01日 20:04:42 +00:00Commented May 1, 2024 at 20:04 -
\$\begingroup\$ Okay so it makes the code more readable and the double // makes so that i am not storing useless float digits. Does this make the program faster? \$\endgroup\$Lorenzo– Lorenzo2024年05月01日 20:08:15 +00:00Commented May 1, 2024 at 20:08
-
\$\begingroup\$ @Lorenzo Actually the
//
instead of/
might make it slower. Because ints tend to be slower than floats, due to their arbitrary size that needs to be handled. You could measure to find out. But... you really should use ints. Otherwise you risk getting wrong results! \$\endgroup\$no comment– no comment2024年05月01日 20:14:41 +00:00Commented May 1, 2024 at 20:14 -
1\$\begingroup\$ @AJNeufeld Btw here's my benchmark for the 27 vs 29 ns that I mentioned. \$\endgroup\$no comment– no comment2024年05月01日 22:03:42 +00:00Commented May 1, 2024 at 22:03
identifiers
The literature tends to speak of "values" generated
by the \3ドルn+1\$ process.
A "seed" is used to start the process,
and we see successive values grow from there.
It is perfectly sensible to accept a seed
parameter.
But during the growth loop, please use another name, perhaps n
.
annotation
It wouldn't hurt to point out we accept seed: int
,
since a variant rational processes might perform
the hailstone operation on a seed: Fraction
.
(Though I confess the lack of from fractions import Fraction
is kind of a tip-off, here.)
It would also set you up for a numba JIT decorator.
single responsibility
Thank you for this beautiful docstring.
def collatz(seed):
"""Apply Collatz conjecture algorithm and stores all the values of the seed after every iteration to display them later"""
It is concise and accurate. It also helps us to see we are doing
- compute and
- store
Consider focusing on compute, so storage is someone else's problem.
from typing import Generator
def collatz(seed: int) -> Generator[int, None, None]:
"""Generates hailstone values."""
n = seed
while n > 1:
if n % 2:
n = 3 * n + 1
else:
n //= 2
yield n
def main(num_seeds: int = 1500) -> None:
...
for seed in range(1, num_seeds + 1):
y = list(collatz(seed))
...
color maps
The cm.bwr
colormap is very nice.
You might consider playing with
cyclic
colormaps, to highlight the chaotic non-linear
nature of the data you're plotting.
Or cycle through a categorical map.
You may find that plotting points rather than lines, and using a partly transparent beta, allows you to achieve higher visual data density.
caching
If you summarize a given seed as producing
an (iterations, max_value)
tuple,
you might find the @lru_cache or
@cache
decorators to be helpful;
or introduce your own caching array.
The literature
sometimes uses delay
to describe number of iterations
for a given seed.
Leavens and Vermeulen
did some interesting work in this space in 1992.
Of course, the calculations we're doing are so lightweight that time spent on cache checks threatens to swamp the time spent calculating. So we'd want to unroll a bit, perhaps by doing half a dozen "ordinary" function calls followed by one where we check the cache. Or equivalently, to handle every seventh seed, do a modulo congruent to zero test. Or perhaps use some bigger prime.
-
\$\begingroup\$ Thanks for the exhaustive answer. Wouldn't a cyclic colormap "hide" the chaos of the algorithm? I think a non cyclic colormap makes it easier to estimate the seed: small seeds = bluish color, big seeds = reddish color (obviously in respect to the chosen interval). Btw this is my first matplotlib program and >I am still learning the basics of python so I don't really know how to check cache. \$\endgroup\$Lorenzo– Lorenzo2024年05月01日 20:16:14 +00:00Commented May 1, 2024 at 20:16
-
\$\begingroup\$ Well, Tufte tells us that there's more than one perceptual channel to access the visual cortex, and more than one thing that that cortex is good at. It all depends on what aspect of the underlying data you wish to highlight. \$\endgroup\$J_H– J_H2024年05月01日 20:30:57 +00:00Commented May 1, 2024 at 20:30
-
\$\begingroup\$ About caching: looking up ints in a dict is also pretty fast, especially when they're a dense as they might be here. Btw if caching actually doesn't help, then another idea: They're already using NumPy. We could run all seeds in parallel. Put them all in an array, then do a step for all of them, then filter out the 1s. Until none are left. That could be competitive/interesting. \$\endgroup\$no comment– no comment2024年05月01日 20:40:47 +00:00Commented May 1, 2024 at 20:40
-
\$\begingroup\$ The cPython
dict
implementation is impressive, but for the current "dense" situation it can't possibly be competitive with array access, as it necessarily has to do more stuff. // Running all seeds in parallel certainly does hold some charm. Go for it! Create an implementation, post an Answer; I will upvote it. \$\endgroup\$J_H– J_H2024年05月01日 21:34:28 +00:00Commented May 1, 2024 at 21:34 -
\$\begingroup\$ Oh I wasn't comparing it with array (or list?) access. I was talking about your last paragraph, which sounds like caching with a dict might make their code slower. Although with an array/list cache, you might need to know the top peak value in advance or always check whether to expand it, which could be more costly than the dict access. I've already done the NumPy version now and it's indeed faster, but I don't think it fits as an answer here. I'll put a link in the next comment. \$\endgroup\$no comment– no comment2024年05月01日 21:47:30 +00:00Commented May 1, 2024 at 21:47
Explore related questions
See similar questions with these tags.