Coin Flip Streak from Automate the Boring Stuff with Python

Question 1

I apologize beforehand if the question has been repeated so many times. This is a practice task from Automate the Boring Stuff with Python. In brief, the task entails writing a code that carries out an experiment of checking if there is a streak of 6 'heads' or 'tails' in 100 coin tosses, then replicates it 10,000 times and gives a percentage of the success rate.

When I wrote the code, I tried to be different by making the code applicable to any streaks in a number of predetermined experiments (in my case, the sample was 1 million coin tosses). I also tried to find the longest streak possible in that said experiment.

I also want to apologize beforehand that the comments were awfully verbose.

import random, copy, time
def torai(seq,pop): # seq is for #=streak, pop is for total sample/population/experiment
 # Creating a random chance of heads and tails
 tosses = []
 for i in range(pop):
 tosses.append(random.randint(1,2)) # 1 and 2 for head and tail, and vice versa
 # Defining initial values for the main loop
 streak = 0 # Iterated streak
 curlongstr = 0 # Current longest streak
 longeststr = 0 # Longest streak evaluated
 peak = [] # Record local streaks from 'tosses' list
 # The main loop
 for i in range(len(tosses)): # Looping based on list indexes
 if i == 0: # Conditional for preventing tosses[0] == tosses[-1]
 continue
 elif tosses[i] == tosses[i-1]: # Conditional for checking if an i element has the same value as the previous element value, i-1
 streak += 1 # Adding tally mark if the line above is fulfilled
 if i == len(tosses)-1: # A nested conditional for adding the last tally mark from 'tosses' into the overall list of steaks 'peak', see lines 27-33
 peak.append(streak)
 elif tosses[i] != tosses[i-1]: # Conditional for checking if an i element value is different than the previous element value, i-1
 curlongstr = copy.copy(streak) # Creating a variable by returning a copy of streak before it resets to 0, see line 31
 if curlongstr > longeststr: # A nested conditional for comparing the current longest streak and the longest streak that has happened when looping the 'tosses' list
 longeststr = curlongstr
 streak = 0 # This is where streaks ended and then resets to 0, so before that, the value of the streak is copied first, see line 28
 if curlongstr > streak: # After streak is reset to 0, the value of current long streak is compared to 0, so that we create a list of streaks from 'tosses' list
 peak.append(curlongstr)
 truepeak = []
 for i in peak: # Example: a 2-streak is equal to either [1,1,1] or [2,2,2], a 4-streak is either [1,1,1,1,1] or [2,2,2,2,2]
 truepeak.append(i+1)
 apr = []
 # Loop for finding how many #-streaks happened
 for i in truepeak:
 if i == seq:
 apr.append(i)
 print('%s-streak count: ' %seq, len(apr)) # Total of #-streaks happened in 'tosses' list
 print('%s-streak prob (percent): ' %seq, (len(apr)/pop)*100) # Calculating probability if how many #-streak happened in given n times tosses
 print('longest streak: ',longeststr + 1) # Similar reason as line 36
 print('process time: ',time.process_time(), 'second\n')
 return (len(apr)/pop)*100
x = torai(2,1000000)
y = torai(6,1000000)
z = torai(10,1000000)
print(x, y, z)

I tried to increase the sample to 10 million coin tosses. However, the program will run 9-10 slower each time the function was called.

My request are can anyone check whether if the result (probability of n-streak) is right or not and are there any ways to make the code and process time shorter?

Question 2

What on earth is "torai" supposed to mean?

Question 3

It's just an arbitrary name :), from トライ (try). I'll change the function name to be more understandable and edit other things based on Aryan's input.

Question 4

Please do not update the code in your question to incorporate feedback from answers, doing so goes against the Question + Answer style of Code Review. This is not a forum where you should keep the most updated version in your question. Please see what you may and may not do after receiving answers .

Question 5

Bugs

torai(1, 10000)

This should print something around 50 %, since it's the individual count. But instead, it prints

1-streak count: 0
1-streak prob (percent): 0.0
longest streak: 19
process time: 0.046875 second

Avoid too many comments

There are too many comments in your code, which makes the code look unnecessarily convoluted. What I recommend is the usage of docstrings. IMO It isn't very important here, but its better than a million comments

def torai(seq,pop): 
 tosses = []
 for i in range(pop):
 tosses.append(random.randint(1,2))
 streak = 0
 curlongstr = 0
 longeststr = 0
 peak = []
 for i in range(len(tosses)): 
 if i == 0: 
 continue
 elif tosses[i] == tosses[i-1]: 
 streak += 1 
 if i == len(tosses)-1: 
 peak.append(streak)
 elif tosses[i] != tosses[i-1]: 
 curlongstr = copy.copy(streak) 
 if curlongstr > longeststr: 
 longeststr = curlongstr
 streak = 0 
 if curlongstr > streak: 
 peak.append(curlongstr)
 truepeak = []
 for i in peak: 
 truepeak.append(i+1)
 apr = []
 
 for i in truepeak:
 if i == seq:
 apr.append(i)
 print('%s-streak count: ' %seq, len(apr)) 
 print('%s-streak prob (percent): ' %seq, (len(apr)/pop)*100) 
 print('longest streak: ',longeststr + 1) 
 print('process time: ',time.process_time(), 'second\n')
 return (len(apr)/pop)*100

Simplify #1

 for i in range(len(tosses)): 
 if i == 0: 
 continue

It's clear to me that you want to skip the first element. In that case, you can specify the starting point for range()

 for i in range(1, len(tosses)):

Simplify #2

 for i in range(pop):
 tosses.append(random.randint(1,2))

Since this is going to be an immutable sequence, use a tuple, with a generator

tosses = tuple(random.randint(1, 2) for _ in range(pop)

Simplify #3

 if curlongstr > longeststr:
 longeststr = curlongstr

Your condition is simple. The new value is always the larger of the two
Just use the max() function

 longeststr = max(longeststr, curlongstr)

Simplify #4

truepeak = []
 for i in peak:
 truepeak.append(i+1)

You're creating an entirely new list, and fill it up with the exact same elements as peak except with a constant 1 added to them. Very inefficient. Either add the values with the +1 from the beginning or use the +1 where necessary.

 for i in peak:
 if i + 1 == seq:
 apr.append(i + 1)

But again, all you do with apr is get its length, so there's absolutely no point in maintaining so many lists when all you have to do is keep a counter. That also removes the need to maintain peak

Calculate tosses as you go

After removing all of the previous loops, there will still be 2 left. One for calculating the tosses and the other goes through them to calculate them. What I propose is, go through it only once, and keep track of two things. The current flip and the previous flip

def torai(seq, iterations ):
 total_streaks = 0
 previous_flip = random.randint(1, 2)
 for _ in range(1, iterations):
 current_flip = random.randint(1, 2)
 if current_flip == previous_flip:
 total_streaks += 1
 # other calculations
 current_flip = previous_flip
 print(f"Total streaks: {total_streaks}")

Question 6

I disagree with all of your removal of comments—instead, I would take the comment and make that the variable name. For example, curlongstr = 0 # Current longest streak could be replaced with current_longest_streak.

Question 7

@lights0123 A few comments here and there are fine and sometimes needed but in this case, it serves very less purpose, more in making the code less readable.

user228914user228914 · Accepted Answer · 2020-11-18 10:04:40Z

Bugs

torai(1, 10000)

This should print something around 50 %, since it's the individual count. But instead, it prints

1-streak count: 0
1-streak prob (percent): 0.0
longest streak: 19
process time: 0.046875 second

Avoid too many comments

There are too many comments in your code, which makes the code look unnecessarily convoluted. What I recommend is the usage of docstrings. IMO It isn't very important here, but its better than a million comments

def torai(seq,pop): 
 tosses = []
 for i in range(pop):
 tosses.append(random.randint(1,2))
 streak = 0
 curlongstr = 0
 longeststr = 0
 peak = []
 for i in range(len(tosses)): 
 if i == 0: 
 continue
 elif tosses[i] == tosses[i-1]: 
 streak += 1 
 if i == len(tosses)-1: 
 peak.append(streak)
 elif tosses[i] != tosses[i-1]: 
 curlongstr = copy.copy(streak) 
 if curlongstr > longeststr: 
 longeststr = curlongstr
 streak = 0 
 if curlongstr > streak: 
 peak.append(curlongstr)
 truepeak = []
 for i in peak: 
 truepeak.append(i+1)
 apr = []
 
 for i in truepeak:
 if i == seq:
 apr.append(i)
 print('%s-streak count: ' %seq, len(apr)) 
 print('%s-streak prob (percent): ' %seq, (len(apr)/pop)*100) 
 print('longest streak: ',longeststr + 1) 
 print('process time: ',time.process_time(), 'second\n')
 return (len(apr)/pop)*100

Simplify #1

 for i in range(len(tosses)): 
 if i == 0: 
 continue

It's clear to me that you want to skip the first element. In that case, you can specify the starting point for range()

 for i in range(1, len(tosses)):

Simplify #2

 for i in range(pop):
 tosses.append(random.randint(1,2))

Since this is going to be an immutable sequence, use a tuple, with a generator

tosses = tuple(random.randint(1, 2) for _ in range(pop)

Simplify #3

 if curlongstr > longeststr:
 longeststr = curlongstr

Your condition is simple. The new value is always the larger of the two
Just use the max() function

 longeststr = max(longeststr, curlongstr)

Simplify #4

truepeak = []
 for i in peak:
 truepeak.append(i+1)

You're creating an entirely new list, and fill it up with the exact same elements as peak except with a constant 1 added to them. Very inefficient. Either add the values with the +1 from the beginning or use the +1 where necessary.

 for i in peak:
 if i + 1 == seq:
 apr.append(i + 1)

But again, all you do with apr is get its length, so there's absolutely no point in maintaining so many lists when all you have to do is keep a counter. That also removes the need to maintain peak

Calculate tosses as you go

After removing all of the previous loops, there will still be 2 left. One for calculating the tosses and the other goes through them to calculate them. What I propose is, go through it only once, and keep track of two things. The current flip and the previous flip

def torai(seq, iterations ):
 total_streaks = 0
 previous_flip = random.randint(1, 2)
 for _ in range(1, iterations):
 current_flip = random.randint(1, 2)
 if current_flip == previous_flip:
 total_streaks += 1
 # other calculations
 current_flip = previous_flip
 print(f"Total streaks: {total_streaks}")

I disagree with all of your removal of comments—instead, I would take the comment and make that the variable name. For example, curlongstr = 0 # Current longest streak could be replaced with current_longest_streak.
@lights0123 A few comments here and there are fine and sometimes needed but in this case, it serves very less purpose, more in making the code less readable.

Stack Exchange Network

Coin Flip Streak from Automate the Boring Stuff with Python

1 Answer 1

Bugs

Avoid too many comments

Simplify #1

Simplify #2

Simplify #3

Simplify #4

Calculate tosses as you go

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

Coin Flip Streak from Automate the Boring Stuff with Python

1 Answer 1

Bugs

Avoid too many comments

Simplify #1

Simplify #2

Simplify #3

Simplify #4

Calculate tosses as you go

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions