Normal Distribution
Many processes or occurrences in nature and real life roughly follows
a type of distribution called 'Normal Distribution' (albeit closely
and not exactly).
Normal Distribution is a hypothetical curve and is a probability distribution
for continuous random variables. It is specified completely by two variables,
mean, μ and standard deviation, σ. The tails of the curve never touches
the axis as it is asymptotic to the x-axis, indicating that a variable can take on
any theoretically possible value, although the probability of it is very less.
Each side of the curve has 1⁄2 of the total area. Some of the other very important
properties of normal curve are:
68.26% of the area (or total number of observations) fall between +/-1 σ.
95.44% of the area (or total number of observations) fall between +/-2 σ.
99.74% of the area (or total number of observations) fall between +/-3 σ.
z-score
If we can standardize a Normal Curve such that it's mean is zero and it's standard deviation is one then we can use
it to easily standardize any normally distributed variable. Also it will provide us with a reference distribution
that we can use to compare two distributions with different mean and standard deviation. For this purpose, we will
use something called 'z-score'. z-score helps us transform the measured data value into the standardize normal curve
(or theoretical probability distribution). This conversion of data value to z-score does not change its underlying
value. The formula for this conversion is:
Example: 1. Calculate whose income variation is more on an annual basis. (Use the data provided in following table).
Example 2. In 2011 Joe is expecting to earn USD 80000 and John is expecting to earn GBP 57000. Who has more chances of achieving their goal in 2011 - base your decision on z-score!?
| Year | 2006 | 2007 | 2008 | 2009 | 2010 |
|---|---|---|---|---|---|
| Joe-USD | 68000 | 72000 | 78000 | 65000 | 70000 |
| John-GBP | 45000 | 49000 | 41000 | 50000 | 51000 |
Answer: Since you have read so far, you cranked your calculator and found out that the mean income and standard
deviation for Joe's income over the past 5 years is USD 70,600 and USD 4363.4 respectively. You also found that
μ and σ for John's over the past 5 years is GBP 47,200 and GBP 3709.4 respectively. Also, to get
answer to first part, you calculated the coefficient of variation (CV) for Joe and John.
The CV for Joe is (4363.4*100/70,600)=6.18 and CV for John is (3709.4*100/47,200)=7.86. Therefore, John's income
varies more. (Note that if we made our decision solely based on comparison of σ, we would be wrong!!)
For part 2, you cannot compare the above figures and decide whose income expectation is more because the units (currency)
for Joe and John's income are different. Well, here comes the z-score to the rescue!
z-score for Joe's income: (80,000 - 70,600)/4363.4 = 2.15.
z-score for John's income: (57,000 - 47,200)/3709.4 = 2.64.
(Note that the z-score is dimensionless and therefore it provides us a reference to compare). As we can see that
z-score for John is higher than Joe, therefore probability of achieving his goal is less compared to Joe. Higher
the z-score, less is the probability of that data occurring !!.