Probability of event using Central Limit Theorem + plotting results

Question 1

I've been doing a Udemy course called: "Statistics for Data Science" and I decided to solve one of the homework with Python to kill two birds with one rocket #elon.

The task was:

The team of traders under your supervision earns profits which can be approximated with Laplace distribution. Profits (of any trade) have a mean of 95ドル.70 and a std. dev. of 1,247ドル. Your team makes about 100 trades every week.

Questions:

A. What is the probability of my team making a loss in any given week? B. What is the probability of my team making over 20,000ドル in any given week?

As I just started to learn Python I would be happy for some hints and opinions.

# set up
import math
import scipy.stats as st
import matplotlib.pyplot as plt
import numpy as np
# Data
mu = 95.7 # mean
sigma = 1247 # standard deviation
n = 100 # sampling size (trades here)
xcritical1 = 0 # making a loss
xcritical2 = 20000 / n # Earning 20ドルk a weak by 100 trades
mu_1 = mu # Based on Central Limit Theorem
sigma_1 = sigma / (math.sqrt(n)) # Based on CLT
# Calc
def Z(xcritical, mu, sigma):
 return (xcritical - mu) / sigma # Standard Score (z-value)
Z1 = Z(xcritical1, mu_1, sigma_1)
Z2 = Z(xcritical2, mu_1, sigma_1)
P1 = st.norm.cdf(Z1) # Cumulative Distribution Function for ND
P2 = 1 - st.norm.cdf(Z2)
print('A. Probability of making loss in any given week is', '{0:.4g}'.format(P1*100) + '%')
print('B. Probability of making over 20ドルk in any given week is', '{0:.4g}'.format(P2*100) + '%')
# Plots
def draw_z_score(x, cond, mu, sigma, title):
 y = st.norm.pdf(x, mu, sigma) # Probability Density function for ND
 z = x[cond]
 plt.plot(x, y)
 plt.fill_between(z, 0, st.norm.pdf(z, mu, sigma))
 plt.title(title)
 plt.text(-300, 0.0020, r'$\mu=' + str(mu_1) + ',\ \sigma=' + str(sigma_1) + '$')
 plt.show()
x = np.arange(-400, 500, 1) # Fixed interval by experimenting
title1 = 'Probability of making loss: ' + '{0:.4g}'.format(P1*100) + '%'
title2 = 'Probability of earning more than 20ドルk: ' + '{0:.4g}'.format(P2*100) + '%'
draw_z_score(x, x < xcritical1, mu_1, sigma_1, title1)
draw_z_score(x, x > xcritical2, mu_1, sigma_1, title2)

Question 2

Why are you using a normal distribution, instead of a Laplace distribution as stated in the task description?

Question 3

Because any distribution can be approximated by Normal Distribution by Central Limit Theorem and this is what they expected me in this task.

Question 4

Fair enough (if you mean "the sum of random variables from any random distribution", instead of "any random distribution"). And in this case the approximation is actually already good enough with N=100: repl.it/@graipher/Sumlaplace-vs-Gauss

Question 5

You seem to know about str.format (since you use it), yet you are still doing string addition:

title1 = 'Probability of making loss: ' + '{0:.4g}'.format(P1*100) + '%'
title2 = 'Probability of earning more than 20ドルk: ' + '{0:.4g}'.format(P2*100) + '%'

Just make it a single string:

title1 = 'Probability of making loss: {0:.4g}%'.format(P1*100)
title2 = 'Probability of earning more than 20ドルk: {0:.4g}%'.format(P2*100)

You could also use the % formatting option for floats (described in the fifth table here), which automatically multiplies with 100 and adds the % sign. The number of digits after the decimal point work differently, though, compared to the g format:

"{:.4%}".format(0.5)
# '50.0000%'
"{:.4g}%".format(0.5*100.)
# '50%'
"{:.4g}%".format(0.5123123*100.)
# '51.23%'
"{:.4%}".format(0.5123123)
# '51.2312%'

You should also re-organize your code according to the following scheme:

import something
GLOBAL_CONSTANT = "value"
class Definitions
def functions()
if __name__ == '__main__':
 main()
 # or some small code using the stuff defined above

That last part is there to protect your code from being executed if you want to import some part of this script from another script. At the moment if you want to do from laplace import Z, your whole code would run. With the if __name__ == '__main__' guard, it will only run when you execute this script.

You should add docstrings to your functions, describing what they do, what arguments they take and what they return. Have a look at PEP257 for the guidelines regarding docstrings.

And, in order to answer my own question from the comments:

Yes, it is justified to use the normal distribution here, since the Central Limit Theorem guarantees that the sum of random variables tends towards a normal distribution. The approximation is quite good for \$N = 100\,ドル as can be seen with the following small script:

import numpy as np
import matplotlib.pyplot as plt
mu = 95.7
sigma = 1247.
n = 100 # how many random variables to sum for each value
N = 1000 # how many values to generate
x_l = np.random.laplace(mu, sigma/np.sqrt(2), (n, N)).sum(axis=0)
x_g = np.random.normal(n*mu, np.sqrt(n)*sigma, N)
plt.hist(x_g, label="gaus")
plt.hist(x_l, label="sum(laplace)", histtype='step')
plt.legend()
plt.show()

enter image description here

Graipher Graipher 41.6k7 gold badges70 silver badges134 bronze badges · Accepted Answer · 2018-02-12 15:40:17Z

You seem to know about str.format (since you use it), yet you are still doing string addition:

title1 = 'Probability of making loss: ' + '{0:.4g}'.format(P1*100) + '%'
title2 = 'Probability of earning more than 20ドルk: ' + '{0:.4g}'.format(P2*100) + '%'

Just make it a single string:

title1 = 'Probability of making loss: {0:.4g}%'.format(P1*100)
title2 = 'Probability of earning more than 20ドルk: {0:.4g}%'.format(P2*100)

You could also use the % formatting option for floats (described in the fifth table here), which automatically multiplies with 100 and adds the % sign. The number of digits after the decimal point work differently, though, compared to the g format:

"{:.4%}".format(0.5)
# '50.0000%'
"{:.4g}%".format(0.5*100.)
# '50%'
"{:.4g}%".format(0.5123123*100.)
# '51.23%'
"{:.4%}".format(0.5123123)
# '51.2312%'

You should also re-organize your code according to the following scheme:

import something
GLOBAL_CONSTANT = "value"
class Definitions
def functions()
if __name__ == '__main__':
 main()
 # or some small code using the stuff defined above

That last part is there to protect your code from being executed if you want to import some part of this script from another script. At the moment if you want to do from laplace import Z, your whole code would run. With the if __name__ == '__main__' guard, it will only run when you execute this script.

You should add docstrings to your functions, describing what they do, what arguments they take and what they return. Have a look at PEP257 for the guidelines regarding docstrings.

And, in order to answer my own question from the comments:

Yes, it is justified to use the normal distribution here, since the Central Limit Theorem guarantees that the sum of random variables tends towards a normal distribution. The approximation is quite good for \$N = 100\,ドル as can be seen with the following small script:

import numpy as np
import matplotlib.pyplot as plt
mu = 95.7
sigma = 1247.
n = 100 # how many random variables to sum for each value
N = 1000 # how many values to generate
x_l = np.random.laplace(mu, sigma/np.sqrt(2), (n, N)).sum(axis=0)
x_g = np.random.normal(n*mu, np.sqrt(n)*sigma, N)
plt.hist(x_g, label="gaus")
plt.hist(x_l, label="sum(laplace)", histtype='step')
plt.legend()
plt.show()

enter image description here

Stack Exchange Network

Probability of event using Central Limit Theorem + plotting results

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Probability of event using Central Limit Theorem + plotting results

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions