Plotting Collatz conjecture values - Python

Question 1

I am writing a program to display the behavior or certain seeds when you apply the Collatz conjecture algorithm to them. I would like to know what I should change and also if there is a way to see which seed produced a certain plot. I am also having some problems with the color.

import matplotlib.pyplot as plt
import numpy as np
def collatz(seed):
 """Apply Collatz conjecture algorithm and stores all the values of the seed after every iteration to display them later"""
 y = [seed]
 while seed != 1:
 if seed % 2 == 0:
 seed = seed/2
 else:
 seed = (3*seed) + 1
 y.append(seed)
 return y
colors = plt.cm.bwr(np.linspace(0, 1, 1500))
for i in range(1, 1500): #collecting data by applying the algorithm to a certain set of seeds
 Y = collatz(i)
 plt.plot(Y, color = colors[i])
plt.title("Collatz conjecture")
plt.xlabel("Number of iterations until stuck")
plt.ylabel("Values reached")
plt.show()

Question 2

Could you please show the resulting plot?

Question 3

I have rolled back Rev 4 → 3. Please see What should I do when someone answers my question?.

Question 4

Note how you're manually repeating the number of seeds (1500):

colors = plt.cm.bwr(np.linspace(0, 1, 1500))
for i in range(1, 1500):
 ...

That's always a good sign it should be a variable.

One option is to put the plotting code into a function parametrized by n_seeds (and then add 1 to make the naming accurate):

def plot_collatz(n_seeds=1500):
# ------------
 """Plot Collatz sequences for a given number of seeds"""
 colors = plt.cm.bwr(np.linspace(0, 1, n_seeds+1))
 # ---------
 for seed in range(1, n_seeds+1):
 # ---------
 plt.plot(collatz(seed), color=colors[seed])
 
 plt.title("Collatz conjecture")
 plt.xlabel("Number of iterations until stuck")
 plt.ylabel("Values reached")
 plt.show()
plot_collatz(10)

plot_collatz(10) output

Question 5

Didn't you fall into your own trap by having n_seeds+1 twice? Why not make this another variable? :)

Question 6

Well that depends what the "trap" is. For a beginner review, I think the main takeaway can just be that if we want to change the number of seeds, now we only have to do it once. That takeaway message isn't affected by having multiple n_seeds+1 (and if we were to address the latter, I wouldn't recommend making a throwaway variable for n_seeds+1).

Question 7

@nocomment This figure is the output of plot_collatz(10) (preceding code block), so it doesn't contain seed 1161. If you're curious about the higher seeds, the figure gets quite busy, but here's the version with 1500 sequences.

Question 8

@nocomment as the answer above says, the main idea behind this plot is to show how certain seeds behave if you apply the algorithm. Yeah, it isn't that great to see which seed produced a certain plot but I think it makes a good job at displaying how chaotic the sequence is. Maybe paired with a scatter plot it would make more sense

Question 9

@tdy Ah yes, I missed that you changed that. Why did you?

Question 10

Integer arithmetic

seed = seed/2 takes the value of seed (initially an integer) and divides it by 2, and stores the result as a float back in seed.

>>> 10 / 2
5.0

If the collatz sequence value ever exceeds \2ドル^{53}\$, seed % 2 == 0 will always be true, due to the limited float precision.

>>> 2 ** 53 + 1.0 # You'd expect this to be odd, but it's not!
9007199254740992.0

Instead, you should be using integer division: seed = seed // 2. Python's integers have variable length representations, so can easily hold exceedingly large integer values precisely.

You could also use an Augmented Assignment operator here (seed //= 2). It saves typing a variable name an additional time, and -- when the left-hand-side is more complex than a simple variable, such as self.seed or a[i] -- avoids evaluating the term a second time, which can be a speed improvement.

Question 11

Thanks for the suggestion. So I should use this "contracted form" just for complex cases, right? Also, I don't understand how is it possible that 2^53 causes error: there is only one float digit (0)

Question 12

@Lorenzo I'd probably use the //= form here as well. Less to type/maintain and it's how people usually do it. I've avoided it with ints only when hardcore micro-optimizing.

Question 13

Okay so it makes the code more readable and the double // makes so that i am not storing useless float digits. Does this make the program faster?

Question 14

@Lorenzo Actually the // instead of / might make it slower. Because ints tend to be slower than floats, due to their arbitrary size that needs to be handled. You could measure to find out. But... you really should use ints. Otherwise you risk getting wrong results!

Question 15

@AJNeufeld Btw here's my benchmark for the 27 vs 29 ns that I mentioned.

Question 16

identifiers

The literature tends to speak of "values" generated by the \3ドルn+1\$ process. A "seed" is used to start the process, and we see successive values grow from there. It is perfectly sensible to accept a seed parameter. But during the growth loop, please use another name, perhaps n.

annotation

It wouldn't hurt to point out we accept seed: int, since a variant rational processes might perform the hailstone operation on a seed: Fraction. (Though I confess the lack of from fractions import Fraction is kind of a tip-off, here.)

It would also set you up for a numba JIT decorator.

single responsibility

Thank you for this beautiful docstring.

def collatz(seed):
 """Apply Collatz conjecture algorithm and stores all the values of the seed after every iteration to display them later"""

It is concise and accurate. It also helps us to see we are doing

compute and
store

Consider focusing on compute, so storage is someone else's problem.

from typing import Generator
def collatz(seed: int) -> Generator[int, None, None]:
 """Generates hailstone values."""
 n = seed
 while n > 1:
 if n % 2:
 n = 3 * n + 1
 else:
 n //= 2
 yield n
def main(num_seeds: int = 1500) -> None:
 ...
 for seed in range(1, num_seeds + 1):
 y = list(collatz(seed))
 ...

color maps

The cm.bwr colormap is very nice. You might consider playing with cyclic colormaps, to highlight the chaotic non-linear nature of the data you're plotting. Or cycle through a categorical map.

You may find that plotting points rather than lines, and using a partly transparent beta, allows you to achieve higher visual data density.

caching

If you summarize a given seed as producing an (iterations, max_value) tuple, you might find the @lru_cache or @cache decorators to be helpful; or introduce your own caching array. The literature sometimes uses delay to describe number of iterations for a given seed. Leavens and Vermeulen did some interesting work in this space in 1992.

Of course, the calculations we're doing are so lightweight that time spent on cache checks threatens to swamp the time spent calculating. So we'd want to unroll a bit, perhaps by doing half a dozen "ordinary" function calls followed by one where we check the cache. Or equivalently, to handle every seventh seed, do a modulo congruent to zero test. Or perhaps use some bigger prime.

Question 17

Thanks for the exhaustive answer. Wouldn't a cyclic colormap "hide" the chaos of the algorithm? I think a non cyclic colormap makes it easier to estimate the seed: small seeds = bluish color, big seeds = reddish color (obviously in respect to the chosen interval). Btw this is my first matplotlib program and >I am still learning the basics of python so I don't really know how to check cache.

Question 18

Well, Tufte tells us that there's more than one perceptual channel to access the visual cortex, and more than one thing that that cortex is good at. It all depends on what aspect of the underlying data you wish to highlight.

Question 19

About caching: looking up ints in a dict is also pretty fast, especially when they're a dense as they might be here. Btw if caching actually doesn't help, then another idea: They're already using NumPy. We could run all seeds in parallel. Put them all in an array, then do a step for all of them, then filter out the 1s. Until none are left. That could be competitive/interesting.

Question 20

The cPython dict implementation is impressive, but for the current "dense" situation it can't possibly be competitive with array access, as it necessarily has to do more stuff. // Running all seeds in parallel certainly does hold some charm. Go for it! Create an implementation, post an Answer; I will upvote it.

Question 21

Oh I wasn't comparing it with array (or list?) access. I was talking about your last paragraph, which sounds like caching with a dict might make their code slower. Although with an array/list cache, you might need to know the top peak value in advance or always check whether to expand it, which could be more costly than the dict access. I've already done the NumPy version now and it's indeed faster, but I don't think it fits as an answer here. I'll put a link in the next comment.

tdy tdy 2,2661 gold badge10 silver badges21 bronze badges · Accepted Answer · 2024-04-30 14:50:53Z

6

\$\begingroup\$

Note how you're manually repeating the number of seeds (1500):

colors = plt.cm.bwr(np.linspace(0, 1, 1500))
for i in range(1, 1500):
 ...

That's always a good sign it should be a variable.

One option is to put the plotting code into a function parametrized by n_seeds (and then add 1 to make the naming accurate):

def plot_collatz(n_seeds=1500):
# ------------
 """Plot Collatz sequences for a given number of seeds"""
 colors = plt.cm.bwr(np.linspace(0, 1, n_seeds+1))
 # ---------
 for seed in range(1, n_seeds+1):
 # ---------
 plt.plot(collatz(seed), color=colors[seed])
 
 plt.title("Collatz conjecture")
 plt.xlabel("Number of iterations until stuck")
 plt.ylabel("Values reached")
 plt.show()
plot_collatz(10)

plot_collatz(10) output

Share

answered Apr 30, 2024 at 14:50

tdy's user avatar

tdy tdy

2,2661 gold badge10 silver badges21 bronze badges

\$\endgroup\$

6

\$\begingroup\$ Didn't you fall into your own trap by having n_seeds+1 twice? Why not make this another variable? :) \$\endgroup\$

infinitezero
– infinitezero

2024年05月01日 06:53:47 +00:00
Commented May 1, 2024 at 6:53
\$\begingroup\$ Well that depends what the "trap" is. For a beginner review, I think the main takeaway can just be that if we want to change the number of seeds, now we only have to do it once. That takeaway message isn't affected by having multiple n_seeds+1 (and if we were to address the latter, I wouldn't recommend making a throwaway variable for n_seeds+1). \$\endgroup\$

tdy
– tdy

2024年05月01日 11:14:27 +00:00
Commented May 1, 2024 at 11:14
\$\begingroup\$ @nocomment This figure is the output of plot_collatz(10) (preceding code block), so it doesn't contain seed 1161. If you're curious about the higher seeds, the figure gets quite busy, but here's the version with 1500 sequences. \$\endgroup\$

tdy
– tdy

2024年05月02日 01:52:21 +00:00
Commented May 2, 2024 at 1:52
\$\begingroup\$ @nocomment as the answer above says, the main idea behind this plot is to show how certain seeds behave if you apply the algorithm. Yeah, it isn't that great to see which seed produced a certain plot but I think it makes a good job at displaying how chaotic the sequence is. Maybe paired with a scatter plot it would make more sense \$\endgroup\$

Lorenzo
– Lorenzo

2024年05月02日 04:54:17 +00:00
Commented May 2, 2024 at 4:54
\$\begingroup\$ @tdy Ah yes, I missed that you changed that. Why did you? \$\endgroup\$

no comment
– no comment

2024年05月02日 07:04:23 +00:00
Commented May 2, 2024 at 7:04

| Show 1 more comment

Stack Exchange Network

Plotting Collatz conjecture values - Python

3 Answers 3

Integer arithmetic

identifiers

annotation

single responsibility

color maps

caching

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

Plotting Collatz conjecture values - Python

3 Answers 3

Integer arithmetic

identifiers

annotation

single responsibility

color maps

caching

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions