I apologize beforehand if the question has been repeated so many times. This is a practice task from Automate the Boring Stuff with Python. In brief, the task entails writing a code that carries out an experiment of checking if there is a streak of 6 'heads' or 'tails' in 100 coin tosses, then replicates it 10,000 times and gives a percentage of the success rate.
When I wrote the code, I tried to be different by making the code applicable to any streaks in a number of predetermined experiments (in my case, the sample was 1 million coin tosses). I also tried to find the longest streak possible in that said experiment.
I also want to apologize beforehand that the comments were awfully verbose.
import random, copy, time
def torai(seq,pop): # seq is for #=streak, pop is for total sample/population/experiment
# Creating a random chance of heads and tails
tosses = []
for i in range(pop):
tosses.append(random.randint(1,2)) # 1 and 2 for head and tail, and vice versa
# Defining initial values for the main loop
streak = 0 # Iterated streak
curlongstr = 0 # Current longest streak
longeststr = 0 # Longest streak evaluated
peak = [] # Record local streaks from 'tosses' list
# The main loop
for i in range(len(tosses)): # Looping based on list indexes
if i == 0: # Conditional for preventing tosses[0] == tosses[-1]
continue
elif tosses[i] == tosses[i-1]: # Conditional for checking if an i element has the same value as the previous element value, i-1
streak += 1 # Adding tally mark if the line above is fulfilled
if i == len(tosses)-1: # A nested conditional for adding the last tally mark from 'tosses' into the overall list of steaks 'peak', see lines 27-33
peak.append(streak)
elif tosses[i] != tosses[i-1]: # Conditional for checking if an i element value is different than the previous element value, i-1
curlongstr = copy.copy(streak) # Creating a variable by returning a copy of streak before it resets to 0, see line 31
if curlongstr > longeststr: # A nested conditional for comparing the current longest streak and the longest streak that has happened when looping the 'tosses' list
longeststr = curlongstr
streak = 0 # This is where streaks ended and then resets to 0, so before that, the value of the streak is copied first, see line 28
if curlongstr > streak: # After streak is reset to 0, the value of current long streak is compared to 0, so that we create a list of streaks from 'tosses' list
peak.append(curlongstr)
truepeak = []
for i in peak: # Example: a 2-streak is equal to either [1,1,1] or [2,2,2], a 4-streak is either [1,1,1,1,1] or [2,2,2,2,2]
truepeak.append(i+1)
apr = []
# Loop for finding how many #-streaks happened
for i in truepeak:
if i == seq:
apr.append(i)
print('%s-streak count: ' %seq, len(apr)) # Total of #-streaks happened in 'tosses' list
print('%s-streak prob (percent): ' %seq, (len(apr)/pop)*100) # Calculating probability if how many #-streak happened in given n times tosses
print('longest streak: ',longeststr + 1) # Similar reason as line 36
print('process time: ',time.process_time(), 'second\n')
return (len(apr)/pop)*100
x = torai(2,1000000)
y = torai(6,1000000)
z = torai(10,1000000)
print(x, y, z)
I tried to increase the sample to 10 million coin tosses. However, the program will run 9-10 slower each time the function was called.
My request are can anyone check whether if the result (probability of n-streak) is right or not and are there any ways to make the code and process time shorter?
-
\$\begingroup\$ What on earth is "torai" supposed to mean? \$\endgroup\$superb rain– superb rain2020年11月18日 11:58:10 +00:00Commented Nov 18, 2020 at 11:58
-
\$\begingroup\$ It's just an arbitrary name :), from トライ (try). I'll change the function name to be more understandable and edit other things based on Aryan's input. \$\endgroup\$yfr– yfr2020年11月18日 12:17:09 +00:00Commented Nov 18, 2020 at 12:17
-
1\$\begingroup\$ Please do not update the code in your question to incorporate feedback from answers, doing so goes against the Question + Answer style of Code Review. This is not a forum where you should keep the most updated version in your question. Please see what you may and may not do after receiving answers . \$\endgroup\$Mast– Mast ♦2020年11月19日 09:04:43 +00:00Commented Nov 19, 2020 at 9:04
1 Answer 1
Bugs
torai(1, 10000)
This should print something around 50 %
, since it's the individual count. But instead, it prints
1-streak count: 0
1-streak prob (percent): 0.0
longest streak: 19
process time: 0.046875 second
Avoid too many comments
There are too many comments in your code, which makes the code look unnecessarily convoluted. What I recommend is the usage of docstrings. IMO It isn't very important here, but its better than a million comments
def torai(seq,pop):
tosses = []
for i in range(pop):
tosses.append(random.randint(1,2))
streak = 0
curlongstr = 0
longeststr = 0
peak = []
for i in range(len(tosses)):
if i == 0:
continue
elif tosses[i] == tosses[i-1]:
streak += 1
if i == len(tosses)-1:
peak.append(streak)
elif tosses[i] != tosses[i-1]:
curlongstr = copy.copy(streak)
if curlongstr > longeststr:
longeststr = curlongstr
streak = 0
if curlongstr > streak:
peak.append(curlongstr)
truepeak = []
for i in peak:
truepeak.append(i+1)
apr = []
for i in truepeak:
if i == seq:
apr.append(i)
print('%s-streak count: ' %seq, len(apr))
print('%s-streak prob (percent): ' %seq, (len(apr)/pop)*100)
print('longest streak: ',longeststr + 1)
print('process time: ',time.process_time(), 'second\n')
return (len(apr)/pop)*100
Simplify #1
for i in range(len(tosses)):
if i == 0:
continue
It's clear to me that you want to skip the first element. In that case, you can specify the starting point for range()
for i in range(1, len(tosses)):
Simplify #2
for i in range(pop):
tosses.append(random.randint(1,2))
Since this is going to be an immutable sequence, use a tuple, with a generator
tosses = tuple(random.randint(1, 2) for _ in range(pop)
Simplify #3
if curlongstr > longeststr:
longeststr = curlongstr
Your condition is simple. The new value is always the larger of the two
Just use the max()
function
longeststr = max(longeststr, curlongstr)
Simplify #4
truepeak = []
for i in peak:
truepeak.append(i+1)
You're creating an entirely new list, and fill it up with the exact same elements as peak
except with a constant 1
added to them. Very inefficient. Either add the values with the +1
from the beginning or use the +1
where necessary.
for i in peak:
if i + 1 == seq:
apr.append(i + 1)
But again, all you do with apr
is get its length, so there's absolutely no point in maintaining so many lists when all you have to do is keep a counter. That also removes the need to maintain peak
Calculate tosses as you go
After removing all of the previous loops, there will still be 2 left. One for calculating the tosses and the other goes through them to calculate them. What I propose is, go through it only once, and keep track of two things. The current flip and the previous flip
def torai(seq, iterations ):
total_streaks = 0
previous_flip = random.randint(1, 2)
for _ in range(1, iterations):
current_flip = random.randint(1, 2)
if current_flip == previous_flip:
total_streaks += 1
# other calculations
current_flip = previous_flip
print(f"Total streaks: {total_streaks}")
-
1\$\begingroup\$ I disagree with all of your removal of comments—instead, I would take the comment and make that the variable name. For example,
curlongstr = 0 # Current longest streak
could be replaced withcurrent_longest_streak
. \$\endgroup\$lights0123– lights01232020年11月18日 17:24:33 +00:00Commented Nov 18, 2020 at 17:24 -
1\$\begingroup\$ @lights0123 A few comments here and there are fine and sometimes needed but in this case, it serves very less purpose, more in making the code less readable. \$\endgroup\$user228914– user2289142020年11月18日 17:47:35 +00:00Commented Nov 18, 2020 at 17:47