\$\begingroup\$

I've developed a Python script that simulates die rolls and analyses the results. I'm now looking to extend and modify this code for more complex data science tasks and simulations.

  • Is this code simple and readable?
  • Are there more insightful statistics or visualisations that can be generated from the die roll data?
  • How could this code be extended or modified for more complex data science tasks or simulations?

import unittest
from random import randint

import matplotlib.pyplot as plt
import pandas as pd


def roll_die() -> int:
    """Simulate rolling a fair six-sided die and return the result"""
    return randint(1, 6)


num_rolls = 1000
die_rolls = [roll_die() for _ in range(num_rolls)]
df = pd.DataFrame({"Rolls": die_rolls})
roll_counts = df["Rolls"].value_counts().sort_index()
print(df)
print(roll_counts)

plt.bar(roll_counts.index, roll_counts.values)
plt.xlabel("Die Face")
plt.ylabel("Frequency")
plt.title("Die Roll Distribution")
plt.show()


class TestRollDie(unittest.TestCase):
    def test_roll_die(self):
        result = roll_die()
        self.assertTrue(1 <= result <= 6)


if __name__ == "__main__":
    unittest.main()

Reinderien
asked Nov 5, 2023 at 15:19
\$\endgroup\$

1 Answer

\$\begingroup\$

Pandas should not be used for this purpose. Just use bare Numpy.

Do not generate random numbers one at a time. Use the vectorised Numpy random number generator.

Rather than value_counts on a dataframe, use bincount on an ndarray.
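
As a rough illustration of the bincount suggestion, here is a minimal sketch on a small, made-up sample of rolls (the sample values are purely hypothetical). It assumes a six-sided die and uses the `minlength` argument so that faces which never appear still receive a zero count.

```python
import numpy as np

# Hypothetical sample of rolls from a six-sided die; faces 4 and 6 never appear.
rolls = np.array([1, 2, 2, 5, 3, 2])
n_sides = 6

# bincount counts occurrences of each integer from 0 up to the largest value seen.
# minlength pads the result so that unrolled faces still get an explicit zero.
counts = np.bincount(rolls, minlength=n_sides + 1)[1:]  # drop the unused 0 bin

print(counts)  # one count per face, in order from face 1 to face 6
```

Slicing off index 0 is needed because die faces start at 1, while bincount always starts counting at 0.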

Prefer the Matplotlib OOP interface over the non-reentrant interface.

It's good that you've started writing unit tests, but the test you've written isn't very useful: it only tests randint itself, which should already be working and is covered in this case by the Python test suite.

For reproducible results, set a seed before calling random routines within a simulation.
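
To make the points about testing and seeding concrete, here is one possible sketch rather than a definitive test suite. The helper `roll_dice` and its parameters are hypothetical names introduced for illustration; the tests check properties of the simulation itself (output shape, range, reproducibility under a fixed seed, and approximate uniformity) instead of re-testing the random number generator one roll at a time.

```python
import unittest
from typing import Optional

import numpy as np


def roll_dice(n_rolls: int, n_sides: int = 6, seed: Optional[int] = None) -> np.ndarray:
    """Hypothetical helper: roll a fair die n_rolls times in one vectorised call."""
    rng = np.random.default_rng(seed)
    # `high` is exclusive by default, hence the + 1
    return rng.integers(low=1, high=n_sides + 1, size=n_rolls)


class TestRollDice(unittest.TestCase):
    def test_shape_and_range(self):
        rolls = roll_dice(n_rolls=1_000, seed=0)
        self.assertEqual(rolls.shape, (1_000,))
        self.assertTrue(((rolls >= 1) & (rolls <= 6)).all())

    def test_same_seed_reproduces_rolls(self):
        # Two generators built from the same seed yield the same sequence
        first = roll_dice(n_rolls=100, seed=42)
        second = roll_dice(n_rolls=100, seed=42)
        np.testing.assert_array_equal(first, second)

    def test_counts_roughly_uniform(self):
        rolls = roll_dice(n_rolls=60_000, seed=0)
        counts = np.bincount(rolls, minlength=7)[1:]
        # Expected 10_000 per face with a standard deviation of roughly 91,
        # so a tolerance of 500 is more than five standard deviations wide.
        self.assertTrue(np.all(np.abs(counts - 10_000) < 500))
```

Saved to a file, these tests can be run with `python -m unittest`. Because the seed is fixed, the uniformity check gives the same result on every run with the same NumPy version instead of failing intermittently.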

Suggested

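import matplotlib.pyplot as plt
import numpy as np

n_rolls = 10_000
n_sides = 6

# Seeded generator so that the simulation is reproducible
rand = np.random.default_rng(seed=0)
# One vectorised call for all rolls; `high` is exclusive
die_rolls = rand.integers(low=1, high=1 + n_sides, size=n_rolls)
# minlength keeps a zero for any face never rolled; [1:] drops the unused 0 bin
roll_counts = np.bincount(die_rolls, minlength=1 + n_sides)[1:]

fig, ax = plt.subplots()
ax.bar(np.arange(1, 1 + n_sides), roll_counts)
ax.set_xlabel('Die Face')
ax.set_ylabel('Frequency')
ax.set_title('Die Roll Distribution')
plt.show()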

[Figure: bar chart of the die roll distribution]

Questions

Is this code simple and readable?

Basically, yes.

Are there more insightful statistics or visualisations that can be generated from the die roll data?

That's impossible to answer, because it's a trivially simple simulation and (as the kids say) "there isn't a lot of there there". You could plot a kernel density estimate or an empirical cumulative distribution function (ECDF), but these would not be interesting since the distribution is so simple.
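
For completeness, here is a minimal sketch of what two such statistics could look like: the empirical cumulative distribution and a Pearson chi-squared statistic against the fair-die expectation. The chi-squared check is an addition for illustration, not something proposed above, and the variable names simply follow the suggested code.

```python
import numpy as np

n_rolls = 10_000
n_sides = 6
rng = np.random.default_rng(seed=0)
die_rolls = rng.integers(low=1, high=1 + n_sides, size=n_rolls)

faces = np.arange(1, 1 + n_sides)
roll_counts = np.bincount(die_rolls, minlength=1 + n_sides)[1:]

# Empirical CDF: fraction of rolls less than or equal to each face
ecdf = np.cumsum(roll_counts) / n_rolls

# Pearson chi-squared statistic against the fair-die expectation
expected = n_rolls / n_sides
chi2 = ((roll_counts - expected) ** 2 / expected).sum()

for face, p in zip(faces, ecdf):
    print(f"P(roll <= {face}) ~ {p:.3f}")
print(f"chi-squared statistic ({n_sides - 1} degrees of freedom): {chi2:.2f}")
```

For a fair die the ECDF should climb in steps of about 1/6, and a chi-squared statistic below about 11.07 (the 5% critical value for five degrees of freedom) gives no reason to doubt fairness. `scipy.stats.chisquare` computes the same statistic together with a p-value if SciPy is available.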

How could this code be extended or modified for more complex data science tasks or simulations?

Impossible to answer. Different things are different.

answered Nov 5, 2023 at 18:17
\$\endgroup\$
