Extending die roll simulations for complex data science tasks

Question 1

I've developed a Python script that simulates die rolls and analyses the results. I'm now looking to extend and modify this code for more complex data science tasks and simulations.

Is this code simple and readable?
Are there more insightful statistics or visualisations that can be generated from the die roll data?
How could this code be extended or modified for more complex data science tasks or simulations?

import unittest
from random import randint

import matplotlib.pyplot as plt
import pandas as pd


def roll_die() -> int:
    """Simulate rolling a fair six-sided die and return the result"""
    return randint(1, 6)


num_rolls = 1000
die_rolls = [roll_die() for _ in range(num_rolls)]
df = pd.DataFrame({"Rolls": die_rolls})
roll_counts = df["Rolls"].value_counts().sort_index()
print(df)
print(roll_counts)

plt.bar(roll_counts.index, roll_counts.values)
plt.xlabel("Die Face")
plt.ylabel("Frequency")
plt.title("Die Roll Distribution")
plt.show()


class TestRollDie(unittest.TestCase):
    def test_roll_die(self):
        result = roll_die()
        self.assertTrue(1 <= result <= 6)


if __name__ == "__main__":
    unittest.main()

Answer

Pandas should not be used for this purpose. Just use bare Numpy.

Do not generate random numbers one at a time. Use the vectorised Numpy random number generator.

Rather than value_counts on a dataframe, use bincount on an ndarray.

Prefer the Matplotlib object-oriented interface (explicit Figure and Axes objects) over the non-reentrant pyplot interface, which relies on implicit global state.

It's good that you've started writing unit tests, but the test you've written isn't very useful: it only tests randint itself, which should already be working and is covered in this case by the Python test suite.

For reproducible results, set a seed before calling random routines within a simulation.
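One way to act on the last two points, offered as a minimal sketch rather than a definitive design: wrap the simulation in a function that accepts a seed, then test properties of your own code instead of the library's generator. The names `simulate_rolls` and `TestSimulateRolls` are hypothetical and do not appear in the original post.

```python
import unittest
from typing import Optional

import numpy as np


def simulate_rolls(n_rolls: int, n_sides: int = 6, seed: Optional[int] = None) -> np.ndarray:
    """Return per-face counts for n_rolls rolls of a fair n_sides-sided die."""
    rand = np.random.default_rng(seed=seed)
    die_rolls = rand.integers(low=1, high=1 + n_sides, size=n_rolls)
    # minlength guarantees one slot per face even if a face never comes up
    return np.bincount(die_rolls, minlength=1 + n_sides)[1:]


class TestSimulateRolls(unittest.TestCase):
    def test_counts_shape_and_total(self):
        # One count per face, and every roll is accounted for
        counts = simulate_rolls(n_rolls=500, n_sides=6, seed=0)
        self.assertEqual(counts.shape, (6,))
        self.assertEqual(counts.sum(), 500)

    def test_same_seed_reproduces_counts(self):
        # The same seed must give identical results on every run
        first = simulate_rolls(n_rolls=500, seed=42)
        second = simulate_rolls(n_rolls=500, seed=42)
        np.testing.assert_array_equal(first, second)
```

Saved to a file, this can be run with `python -m unittest`. These tests check what the simulation promises (shape, totals, reproducibility), not whether the random number generator stays within its bounds.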

Suggested

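import matplotlib.pyplot as plt
import numpy as np

n_rolls = 10_000
n_sides = 6

# Seeded generator, so the simulation is reproducible
rand = np.random.default_rng(seed=0)
# high is exclusive, so this draws integers in 1..n_sides
die_rolls = rand.integers(low=1, high=1 + n_sides, size=n_rolls)
# Slot 0 counts zeros, which never occur; minlength keeps one slot per face
roll_counts = np.bincount(die_rolls, minlength=1 + n_sides)[1:]

fig, ax = plt.subplots()
ax.bar(np.arange(1, 1 + n_sides), roll_counts)
ax.set_xlabel('Die Face')
ax.set_ylabel('Frequency')
ax.set_title('Die Roll Distribution')
plt.show()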

[Figure: bar chart of the resulting die roll distribution]
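The `[1:]` slice in the suggested code is there because `np.bincount` counts every integer from 0 upward, and a die never shows 0. A small sketch with hand-picked rolls (made-up data, not from the post) shows this, along with the `minlength` argument, which keeps one slot per face even when the highest face never appears:

```python
import numpy as np

n_sides = 6
# Hypothetical hand-picked rolls; face 6 deliberately never appears
die_rolls = np.array([1, 3, 3, 2, 5, 1, 3])

raw = np.bincount(die_rolls)
# raw has slots for the values 0..5 only, since 5 is the largest roll seen
print(raw)  # [0 2 1 3 0 1]

padded = np.bincount(die_rolls, minlength=1 + n_sides)
# padded has slots for 0..6; dropping slot 0 leaves one count per face
roll_counts = padded[1:]
print(roll_counts)  # [2 1 3 0 1 0]
```

With thousands of rolls a missing face is very unlikely, but the padded form keeps the counts aligned with the bar positions regardless of sample size.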

Questions

Is this code simple and readable?

Basically, yes

Are there more insightful statistics or visualisations that can be generated from the die roll data?

That's impossible to answer, because it's a trivially simple simulation and (as the kids say) "there isn't a lot of there there". You could plot a kernel density estimate or an empirical cumulative distribution function, but neither would be interesting since the distribution is so simple.
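For what it is worth, the empirical cumulative distribution mentioned above takes only a couple of lines to compute from the face counts. This is a sketch that reuses the seed and variable names from the suggested code; `faces`, `ecdf` and `expected` are names introduced here for illustration.

```python
import numpy as np

n_rolls = 10_000
n_sides = 6
rand = np.random.default_rng(seed=0)
die_rolls = rand.integers(low=1, high=1 + n_sides, size=n_rolls)
roll_counts = np.bincount(die_rolls, minlength=1 + n_sides)[1:]

faces = np.arange(1, 1 + n_sides)
# Fraction of rolls less than or equal to each face
ecdf = np.cumsum(roll_counts) / n_rolls
# Cumulative distribution of an ideal fair die, for comparison
expected = faces / n_sides

for face, observed, ideal in zip(faces, ecdf, expected):
    print(f"P(roll <= {face}): observed {observed:.3f}, fair die {ideal:.3f}")
```

To draw it, `ax.step(faces, ecdf, where='post')` on an Axes from `plt.subplots()` is one option. For a fair die the observed values should sit close to 1/6, 2/6, and so on up to 1, which is why the plot adds little.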

How could this code be extended or modified for more complex data science tasks or simulations?

Impossible to answer. Different things are different.

Reinderien · Accepted Answer · 2023-11-05 18:17:14Z

