The probability distribution of a discrete random variable is a list of probabilities associated with each of its possible values. It is also sometimes called the probability function or the probability mass function.
(Definitions taken from Valerie J. Easton and John H. McColl's Statistics Glossary v1.1)
Suppose a random variable X may take k different values, with the probability that X = xi defined to be P(X = xi) = pi. The probabilities pi must satisfy the following:
1: 0 <= pi <= 1 for each i
2: p1 + p2 + ... + pk = 1.
Example: Suppose a variable X can take the values 1, 2, 3, or 4, with the probability of each outcome given in the following table:

Outcome       1     2     3     4
Probability   0.1   0.3   0.4   0.2

The probability that X is equal to 2 or 3 is the sum of the two probabilities: P(X = 2 or X = 3) = P(X = 2) + P(X = 3) = 0.3 + 0.4 = 0.7. Similarly, the probability that X is greater than 1 is equal to 1 - P(X = 1) = 1 - 0.1 = 0.9, by the complement rule.
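The arithmetic in this example can be checked with a short Python sketch. The names pmf and prob are illustrative choices, not part of the original text, and the validity check assumes the usual requirements that each probability lies between 0 and 1 and that all of them sum to 1.

```python
# Probability table from the example: outcome -> probability.
pmf = {1: 0.1, 2: 0.3, 3: 0.4, 4: 0.2}

def prob(event, pmf):
    """Sum the probabilities of the outcomes contained in `event` (a set)."""
    return sum(p for x, p in pmf.items() if x in event)

# Each probability is in [0, 1] and the total is 1 (up to rounding error).
is_valid = (all(0 <= p <= 1 for p in pmf.values())
            and abs(sum(pmf.values()) - 1) < 1e-9)

p_2_or_3 = prob({2, 3}, pmf)        # P(X = 2 or X = 3)
p_greater_1 = 1 - prob({1}, pmf)    # complement rule: 1 - P(X = 1)

print(is_valid, round(p_2_or_3, 2), round(p_greater_1, 2))  # True 0.7 0.9
```

Storing the table as a dictionary keeps each outcome next to its probability, so any event can be expressed as a set of outcomes.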
This distribution may also be described by the probability
histogram shown to the right:
(Definition taken from Valerie J. Easton and John H. McColl's Statistics Glossary v1.1)
The probability histogram for the cumulative distribution of this
random variable is shown to the right:
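The heights of the bars in such a cumulative histogram are running sums of the probabilities in the table above. A minimal Python sketch, assuming the same four outcomes and probabilities (the variable names are illustrative):

```python
from itertools import accumulate

outcomes = [1, 2, 3, 4]
probabilities = [0.1, 0.3, 0.4, 0.2]

# Running sums give P(X <= x) for each outcome x.
cumulative = list(accumulate(probabilities))

for x, c in zip(outcomes, cumulative):
    print(f"P(X <= {x}) = {c:.1f}")
```

The last cumulative value is 1, since X is certain to be less than or equal to its largest possible value.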
(Definition taken from Valerie J. Easton and John H. McColl's Statistics Glossary v1.1)
A continuous random variable is not defined at specific values. Instead, it is defined over an interval of values, and is represented by the area under a curve (in advanced mathematics, this is known as an integral). The probability of observing any single value is equal to 0, since the number of values which may be assumed by the random variable is infinite.
Suppose a random variable X may take all values over an interval of real numbers. Then the probability that X is in the set of outcomes A, P(A), is defined to be the area above A and under a curve. The curve, which represents a function p(x), must satisfy the following:
1: The curve has no negative values (p(x) >= 0 for all x)
2: The total area under the curve is equal to 1.
A curve meeting these requirements is known as a density curve.
The following graphs plot the density curves for random number generators
over the intervals (4,5) (top left), (2,6) (top right), (5,5.5) (lower left),
and (3,5) (lower right). The distributions corresponding to these curves are
known as uniform distributions.
Consider the uniform random variable X defined on the interval (2,6). Since the interval
has width = 4, the curve has height = 0.25 over the interval and 0 elsewhere. The probability
that X is less than or equal to 5 is the area between 2 and 5, or (5-2)*0.25 = 0.75.
The probability that X is greater than 3 but less than 4 is the area between 3 and 4,
or (4-3)*0.25 = 0.25. To find the probability that X is less than 3 or greater than
5, add the two probabilities:
P(X < 3 or X > 5) = P(X < 3) + P(X > 5) =
(3-2)*0.25 +
(6-5)*0.25 = 0.25 + 0.25 = 0.5.
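These area calculations can be reproduced with a small Python sketch. The helper uniform_prob is a hypothetical name introduced here for illustration; it multiplies the width of the overlap between the requested range and (2,6) by the curve height.

```python
def uniform_prob(a, b, low=2.0, high=6.0):
    """P(a < X < b) for X uniform on (low, high): overlap width times height."""
    height = 1.0 / (high - low)              # 0.25 on the interval (2, 6)
    left, right = max(a, low), min(b, high)  # clip the range to the interval
    return max(right - left, 0.0) * height

p_le_5 = uniform_prob(float("-inf"), 5)      # area between 2 and 5
p_3_to_4 = uniform_prob(3, 4)                # area between 3 and 4
p_tails = (uniform_prob(float("-inf"), 3)    # P(X < 3)
           + uniform_prob(5, float("inf")))  # plus P(X > 5)

print(p_le_5, p_3_to_4, p_tails)  # 0.75 0.25 0.5
```

Because a single point has probability 0 for a continuous variable, the same function covers both strict and non-strict inequalities.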
The uniform distribution is often used to simulate data. Suppose you would like to simulate data for 10 rolls of a regular 6-sided die. Using the MINITAB "RAND" command with the "UNIF" subcommand generates 10 numbers in the interval (0,6):
MTB> RAND 10 c2;
SUBC> unif 0 6.

Assign the discrete random variable X to the values 1, 2, 3, 4, 5, or 6 as follows:

Uniform Data   X Value
4.53786        5
5.77474        6
3.69518        4
1.03929        2
4.23835        5
0.37096        1
0.75272        1
5.56563        6
0.89045        1
3.18086        4
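The same simulation can be sketched in Python as an alternative to MINITAB. The listed values are consistent with rounding each uniform number up to the next integer; that rule is inferred from the data rather than stated in the text, and the function name uniform_to_die and the seed are illustrative assumptions.

```python
import math
import random

def uniform_to_die(u):
    """Map a value u in (0, 6] to a die face 1..6 by rounding up (assumed rule)."""
    return math.ceil(u)

# The ten uniform values listed above, used to check the mapping.
sample = [4.53786, 5.77474, 3.69518, 1.03929, 4.23835,
          0.37096, 0.75272, 5.56563, 0.89045, 3.18086]
faces = [uniform_to_die(u) for u in sample]
print(faces)  # [5, 6, 4, 2, 5, 1, 1, 6, 1, 4]

# A fresh simulation of 10 rolls. rng.random() lies in [0, 1), so
# 6 * (1 - rng.random()) lies in (0, 6] and never rounds up to 0.
rng = random.Random(0)  # fixed seed so the run is repeatable
rolls = [uniform_to_die(6 * (1 - rng.random())) for _ in range(10)]
print(rolls)
```

Each face corresponds to an interval of width 1 within (0,6), so every face has probability 1/6, as for a fair die.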