6
\$\begingroup\$

I've been doing a Udemy course called: "Statistics for Data Science" and I decided to solve one of the homework with Python to kill two birds with one rocket #elon.

The task was:

The team of traders under your supervision earns profits which can be approximated with Laplace distribution. Profits (of any trade) have a mean of 95ドル.70 and a std. dev. of 1,247ドル. Your team makes about 100 trades every week.

Questions:

A. What is the probability of my team making a loss in any given week? B. What is the probability of my team making over 20,000ドル in any given week?

As I just started to learn Python I would be happy for some hints and opinions.

# set up
import math
import scipy.stats as st
import matplotlib.pyplot as plt
import numpy as np
# Data
mu = 95.7 # mean
sigma = 1247 # standard deviation
n = 100 # sampling size (trades here)
xcritical1 = 0 # making a loss
xcritical2 = 20000 / n # Earning 20ドルk a weak by 100 trades
mu_1 = mu # Based on Central Limit Theorem
sigma_1 = sigma / (math.sqrt(n)) # Based on CLT
# Calc
def Z(xcritical, mu, sigma):
 return (xcritical - mu) / sigma # Standard Score (z-value)
Z1 = Z(xcritical1, mu_1, sigma_1)
Z2 = Z(xcritical2, mu_1, sigma_1)
P1 = st.norm.cdf(Z1) # Cumulative Distribution Function for ND
P2 = 1 - st.norm.cdf(Z2)
print('A. Probability of making loss in any given week is', '{0:.4g}'.format(P1*100) + '%')
print('B. Probability of making over 20ドルk in any given week is', '{0:.4g}'.format(P2*100) + '%')
# Plots
def draw_z_score(x, cond, mu, sigma, title):
 y = st.norm.pdf(x, mu, sigma) # Probability Density function for ND
 z = x[cond]
 plt.plot(x, y)
 plt.fill_between(z, 0, st.norm.pdf(z, mu, sigma))
 plt.title(title)
 plt.text(-300, 0.0020, r'$\mu=' + str(mu_1) + ',\ \sigma=' + str(sigma_1) + '$')
 plt.show()
x = np.arange(-400, 500, 1) # Fixed interval by experimenting
title1 = 'Probability of making loss: ' + '{0:.4g}'.format(P1*100) + '%'
title2 = 'Probability of earning more than 20ドルk: ' + '{0:.4g}'.format(P2*100) + '%'
draw_z_score(x, x < xcritical1, mu_1, sigma_1, title1)
draw_z_score(x, x > xcritical2, mu_1, sigma_1, title2) 
Graipher
41.6k7 gold badges70 silver badges134 bronze badges
asked Feb 8, 2018 at 23:21
\$\endgroup\$
3
  • \$\begingroup\$ Why are you using a normal distribution, instead of a Laplace distribution as stated in the task description? \$\endgroup\$ Commented Feb 12, 2018 at 6:13
  • \$\begingroup\$ Because any distribution can be approximated by Normal Distribution by Central Limit Theorem and this is what they expected me in this task. \$\endgroup\$ Commented Feb 12, 2018 at 14:48
  • \$\begingroup\$ Fair enough (if you mean "the sum of random variables from any random distribution", instead of "any random distribution"). And in this case the approximation is actually already good enough with N=100: repl.it/@graipher/Sumlaplace-vs-Gauss \$\endgroup\$ Commented Feb 12, 2018 at 15:30

1 Answer 1

5
\$\begingroup\$

You seem to know about str.format (since you use it), yet you are still doing string addition:

title1 = 'Probability of making loss: ' + '{0:.4g}'.format(P1*100) + '%'
title2 = 'Probability of earning more than 20ドルk: ' + '{0:.4g}'.format(P2*100) + '%'

Just make it a single string:

title1 = 'Probability of making loss: {0:.4g}%'.format(P1*100)
title2 = 'Probability of earning more than 20ドルk: {0:.4g}%'.format(P2*100)

You could also use the % formatting option for floats (described in the fifth table here), which automatically multiplies with 100 and adds the % sign. The number of digits after the decimal point work differently, though, compared to the g format:

"{:.4%}".format(0.5)
# '50.0000%'
"{:.4g}%".format(0.5*100.)
# '50%'
"{:.4g}%".format(0.5123123*100.)
# '51.23%'
"{:.4%}".format(0.5123123)
# '51.2312%'

You should also re-organize your code according to the following scheme:

import something
GLOBAL_CONSTANT = "value"
class Definitions
def functions()
if __name__ == '__main__':
 main()
 # or some small code using the stuff defined above

That last part is there to protect your code from being executed if you want to import some part of this script from another script. At the moment if you want to do from laplace import Z, your whole code would run. With the if __name__ == '__main__' guard, it will only run when you execute this script.


You should add docstrings to your functions, describing what they do, what arguments they take and what they return. Have a look at PEP257 for the guidelines regarding docstrings.


And, in order to answer my own question from the comments:

Yes, it is justified to use the normal distribution here, since the Central Limit Theorem guarantees that the sum of random variables tends towards a normal distribution. The approximation is quite good for \$N = 100\,ドル as can be seen with the following small script:

import numpy as np
import matplotlib.pyplot as plt
mu = 95.7
sigma = 1247.
n = 100 # how many random variables to sum for each value
N = 1000 # how many values to generate
x_l = np.random.laplace(mu, sigma/np.sqrt(2), (n, N)).sum(axis=0)
x_g = np.random.normal(n*mu, np.sqrt(n)*sigma, N)
plt.hist(x_g, label="gaus")
plt.hist(x_l, label="sum(laplace)", histtype='step')
plt.legend()
plt.show()

enter image description here

answered Feb 12, 2018 at 15:40
\$\endgroup\$

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.