I've developed a Python script that simulates die rolls and analyses the results. I'm now looking to extend and modify this code for more complex data science tasks and simulations.
- Is this code simple and readable?
- Are there more insightful statistics or visualisations that can be generated from the die roll data?
- How could this code be extended or modified for more complex data science tasks or simulations?
import unittest
from random import randint

import matplotlib.pyplot as plt
import pandas as pd


def roll_die() -> int:
    """Simulate rolling a fair six-sided die and return the result"""
    return randint(1, 6)


num_rolls = 1000
die_rolls = [roll_die() for _ in range(num_rolls)]
df = pd.DataFrame({"Rolls": die_rolls})
roll_counts = df["Rolls"].value_counts().sort_index()
print(df)
print(roll_counts)

plt.bar(roll_counts.index, roll_counts.values)
plt.xlabel("Die Face")
plt.ylabel("Frequency")
plt.title("Die Roll Distribution")
plt.show()


class TestRollDie(unittest.TestCase):
    def test_roll_die(self):
        result = roll_die()
        self.assertTrue(1 <= result <= 6)


if __name__ == "__main__":
    unittest.main()
1 Answer
Pandas should not be used for this purpose. Just use bare Numpy.
Do not generate random numbers one at a time. Use the vectorised Numpy random number generator.
Rather than value_counts on a dataframe, use bincount on an ndarray.
Prefer the Matplotlib OOP interface over the non-reentrant interface.
It's good that you've started writing unit tests, but the unit test you've written isn't very useful - it's just testing randint itself (something that should already be working, and is tested in this case by the Python test suite).
For reproducible results, set a seed before calling random routines within a simulation.
Suggested
import matplotlib.pyplot as plt
import numpy as np
n_rolls = 10_000
n_sides = 6
rand = np.random.default_rng(seed=0)
die_rolls = rand.integers(low=1, high=1 + n_sides, size=n_rolls)
# minlength keeps one bin per face even if the highest face never comes up
roll_counts = np.bincount(die_rolls, minlength=1 + n_sides)[1:]
fig, ax = plt.subplots()
ax.bar(np.arange(1, 1 + n_sides), roll_counts)
ax.set_xlabel('Die Face')
ax.set_ylabel('Frequency')
ax.set_title('Die Roll Distribution')
plt.show()
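Picking up the point above about the unit test: a more useful test would exercise behaviour that belongs to your own code rather than to the random number generator. Here is a minimal sketch, assuming the simulation is wrapped in a function. The helper name roll_dice, its parameters, and the 5% tolerance are all assumptions made for illustration, not part of the original code.

```python
import unittest
from typing import Optional

import numpy as np


def roll_dice(n_rolls: int, n_sides: int = 6, seed: Optional[int] = None) -> np.ndarray:
    """Hypothetical helper: roll a fair n_sides-sided die n_rolls times."""
    rng = np.random.default_rng(seed)
    return rng.integers(low=1, high=1 + n_sides, size=n_rolls)


class TestRollDice(unittest.TestCase):
    def test_shape_and_range(self):
        rolls = roll_dice(n_rolls=1_000, n_sides=6, seed=0)
        self.assertEqual(rolls.shape, (1_000,))
        self.assertTrue(((rolls >= 1) & (rolls <= 6)).all())

    def test_same_seed_reproduces_rolls(self):
        # Two generators built from the same seed must give identical rolls
        np.testing.assert_array_equal(
            roll_dice(100, seed=42), roll_dice(100, seed=42)
        )

    def test_counts_roughly_uniform(self):
        n_rolls, n_sides = 60_000, 6
        rolls = roll_dice(n_rolls, n_sides, seed=0)
        counts = np.bincount(rolls, minlength=1 + n_sides)[1:]
        expected = n_rolls / n_sides
        # Loose tolerance (an assumption): every face within 5% of its
        # expected count; the seed is fixed, so the test is deterministic
        self.assertTrue((np.abs(counts - expected) < 0.05 * expected).all())
```

Saved in a module, these tests can be run with python -m unittest. The fixed seed keeps the uniformity check deterministic, so it won't fail intermittently.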
Questions
Is this code simple and readable?
Basically, yes.
Are there more insightful statistics or visualisations that can be generated from the die roll data?
That's impossible to answer, because it's a trivially simple simulation and (as the kids say) "there isn't a lot of there there". You could plot a kernel density estimate, or an empirical cumulative distribution function (ECDF); but these would not be interesting since the distribution is so simple.
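For what it's worth, the ECDF mentioned above takes only a couple of lines with Numpy. The sketch below also computes a Pearson chi-squared statistic against the fair-die expectation; that statistic is an addition of mine for illustration, not something the original code needs. As expected, the ECDF just tracks k/6, which is why there isn't much insight to be had here.

```python
import numpy as np

n_rolls = 10_000
n_sides = 6

rand = np.random.default_rng(seed=0)
die_rolls = rand.integers(low=1, high=1 + n_sides, size=n_rolls)
roll_counts = np.bincount(die_rolls, minlength=1 + n_sides)[1:]

# Empirical CDF: fraction of rolls less than or equal to each face
ecdf = np.cumsum(roll_counts) / n_rolls
# Theoretical CDF of a fair die: k / n_sides for face k
fair_cdf = np.arange(1, 1 + n_sides) / n_sides

# Pearson chi-squared statistic against equal expected counts
expected = n_rolls / n_sides
chi2 = ((roll_counts - expected) ** 2 / expected).sum()

print("ECDF:    ", np.round(ecdf, 3))
print("Fair CDF:", np.round(fair_cdf, 3))
print("Largest ECDF deviation:", np.abs(ecdf - fair_cdf).max())
print("Chi-squared statistic (5 degrees of freedom):", chi2)
```

For reference, the 5% critical value of the chi-squared distribution with 5 degrees of freedom is about 11.07; a statistic well above that would hint at a biased die.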
How could this code be extended or modified for more complex data science tasks or simulations?
Impossible to answer. Different things are different.