In many real-world situations, outcomes are not limited to a fixed set of values. For instance, the exact time it takes for a chemical reaction to occur, the precise weight of apples picked from a tree, or the temperature measured at a weather station can take on any value within a range. Unlike discrete random variables that can only assume specific, countable values, continuous random variables can take on any value within an interval. Understanding continuous random variables allows us to model and analyze phenomena where measurements are infinitely precise, at least in theory. This chapter explores the fundamental properties, probability distributions, and key measures associated with continuous random variables.
A continuous random variable is a variable that can assume any value within a given interval or collection of intervals. The set of possible values is uncountable, meaning there are infinitely many possible outcomes between any two points.
Think of filling a cup with water. The exact amount of water, measured to infinite precision, could be 250.1 milliliters, 250.15 milliliters, 250.153982 milliliters, and so on. The possibilities are endless within the range.
Common examples of continuous random variables include:
- the time until a chemical reaction completes,
- the weight of a randomly selected piece of fruit,
- the temperature recorded at a weather station,
- the height of a randomly chosen person.
Because a continuous random variable can take on infinitely many values, the probability that it equals any single exact value is zero. Instead, we focus on the probability that the variable falls within a particular interval.
For continuous random variables, probabilities are described using a probability density function (PDF), often denoted \( f(x) \). The PDF is a function that satisfies two important properties:
1. \( f(x) \geq 0 \) for all \( x \) (the density is never negative), and
2. \( \int_{-\infty}^{\infty} f(x) \, dx = 1 \) (the total area under the curve equals 1).
Here, \( \int \) represents the integral (a concept from calculus that measures the area under a curve), \( f(x) \) is the probability density function, and \( dx \) indicates we are summing infinitesimally small pieces along the \( x \)-axis.
The probability that a continuous random variable \( X \) falls between two values \( a \) and \( b \) is given by the area under the PDF curve between those two points:
\[ P(a \leq X \leq b) = \int_{a}^{b} f(x) \, dx \]
This integral represents the shaded area under the curve from \( a \) to \( b \).
Example: Suppose a continuous random variable \( X \) has a probability density function defined by \( f(x) = 2x \) for \( 0 \leq x \leq 1 \) and \( f(x) = 0 \) otherwise.
Find the probability that \( X \) falls between 0.2 and 0.5.
Solution:
We need to calculate \( P(0.2 \leq X \leq 0.5) \).
This is given by the integral:
\[ P(0.2 \leq X \leq 0.5) = \int_{0.2}^{0.5} 2x \, dx \]
We evaluate the integral by finding the antiderivative of \( 2x \), which is \( x^2 \):
\[ = \left[ x^2 \right]_{0.2}^{0.5} = (0.5)^2 - (0.2)^2 \]
Calculate each term:
\( (0.5)^2 = 0.25 \)
\( (0.2)^2 = 0.04 \)
Subtract: \( 0.25 - 0.04 = 0.21 \)
The probability that \( X \) is between 0.2 and 0.5 is 0.21 or 21%.
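This hand computation can be checked numerically. The sketch below, which assumes plain Python with only the standard library, uses a midpoint Riemann sum as a stand-in for the exact integral of \( f(x) = 2x \) from 0.2 to 0.5:

```python
# Numerical check of P(0.2 <= X <= 0.5) for the PDF f(x) = 2x on [0, 1].

def pdf(x):
    """Density f(x) = 2x on [0, 1], zero elsewhere."""
    return 2 * x if 0 <= x <= 1 else 0.0

def integrate(f, a, b, n=100_000):
    """Midpoint Riemann sum: approximates the integral of f from a to b."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

prob = integrate(pdf, 0.2, 0.5)
print(round(prob, 6))  # 0.21, matching the hand calculation
```

Because the density is linear here, the midpoint rule is exact up to floating-point rounding; for general densities it is an approximation that improves as `n` grows.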
Several key properties help us work with PDFs effectively:
- The probability of any single exact value is zero: \( P(X = c) = 0 \) for every constant \( c \).
- Because single points carry no probability, including or excluding endpoints does not matter: \( P(a \leq X \leq b) = P(a < X < b) \).
- The density itself is not a probability, so \( f(x) \) may exceed 1, as long as the total area under the curve is 1.
The cumulative distribution function (CDF), denoted \( F(x) \), gives the probability that a continuous random variable \( X \) is less than or equal to a particular value \( x \):
\[ F(x) = P(X \leq x) = \int_{-\infty}^{x} f(t) \, dt \]
Here, \( t \) is a dummy variable of integration. The CDF accumulates probability from the left tail up to the point \( x \).
Key properties of the CDF:
- \( F(x) \) is non-decreasing: moving \( x \) to the right can only accumulate more probability.
- \( F(x) \to 0 \) as \( x \to -\infty \) and \( F(x) \to 1 \) as \( x \to \infty \).
- \( P(a \leq X \leq b) = F(b) - F(a) \).
- Wherever \( F \) is differentiable, \( F'(x) = f(x) \): the PDF is the derivative of the CDF.
Example: Consider the same probability density function from the previous example: \( f(x) = 2x \) for \( 0 \leq x \leq 1 \).
Find the cumulative distribution function \( F(x) \).
Solution:
For \( x < 0 \), the variable has not yet entered its valid range, so \( F(x) = 0 \).
For \( 0 \leq x \leq 1 \), we integrate the PDF from 0 to \( x \):
\[ F(x) = \int_{0}^{x} 2t \, dt = \left[ t^2 \right]_{0}^{x} = x^2 - 0 = x^2 \]
For \( x > 1 \), all probability has been accumulated, so \( F(x) = 1 \).
The cumulative distribution function is:
\[ F(x) = \begin{cases} 0 & \text{if } x < 0 \\ x^2 & \text{if } 0 \leq x \leq 1 \\ 1 & \text{if } x > 1 \end{cases} \]
The CDF is \( F(x) = x^2 \) for \( 0 \leq x \leq 1 \), with appropriate boundary values outside this range.
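The piecewise CDF translates directly into code. A minimal Python sketch (the function name `cdf` is our choice) that also recovers the earlier interval probability as \( F(0.5) - F(0.2) \):

```python
def cdf(x):
    """CDF of the random variable with density f(x) = 2x on [0, 1]."""
    if x < 0:
        return 0.0
    if x <= 1:
        return x * x   # F(x) = x^2 on [0, 1]
    return 1.0

# P(0.2 <= X <= 0.5) = F(0.5) - F(0.2)
print(round(cdf(0.5) - cdf(0.2), 6))  # 0.21
```

Note that no integration is needed once the CDF is known: interval probabilities are just differences of CDF values.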
Just as with discrete random variables, continuous random variables have measures of center and spread.
The expected value or mean of a continuous random variable \( X \), denoted \( \mu \) or \( E(X) \), is a weighted average of all possible values, where the weights are given by the probability density function:
\[ E(X) = \mu = \int_{-\infty}^{\infty} x \cdot f(x) \, dx \]
This represents the "balance point" or center of the distribution.
Example: Using the PDF \( f(x) = 2x \) for \( 0 \leq x \leq 1 \).
Find the expected value \( E(X) \).
Solution:
We compute:
\[ E(X) = \int_{0}^{1} x \cdot 2x \, dx = \int_{0}^{1} 2x^2 \, dx \]
Find the antiderivative of \( 2x^2 \), which is \( \frac{2x^3}{3} \):
\[ = \left[ \frac{2x^3}{3} \right]_{0}^{1} = \frac{2(1)^3}{3} - \frac{2(0)^3}{3} = \frac{2}{3} - 0 = \frac{2}{3} \]
The expected value is \( \frac{2}{3} \) or approximately 0.667.
The variance of a continuous random variable \( X \), denoted \( \sigma^2 \) or \( Var(X) \), measures how spread out the values are around the mean:
\[ Var(X) = \sigma^2 = \int_{-\infty}^{\infty} (x - \mu)^2 \cdot f(x) \, dx \]
Alternatively, variance can be computed using the formula:
\[ Var(X) = E(X^2) - [E(X)]^2 \]
where
\[ E(X^2) = \int_{-\infty}^{\infty} x^2 \cdot f(x) \, dx \]
The standard deviation \( \sigma \) is the square root of the variance:
\[ \sigma = \sqrt{Var(X)} \]
Example: Using the PDF \( f(x) = 2x \) for \( 0 \leq x \leq 1 \) and \( E(X) = \frac{2}{3} \).
Find the variance and standard deviation of \( X \).
Solution:
First, we find \( E(X^2) \):
\[ E(X^2) = \int_{0}^{1} x^2 \cdot 2x \, dx = \int_{0}^{1} 2x^3 \, dx \]
The antiderivative of \( 2x^3 \) is \( \frac{2x^4}{4} = \frac{x^4}{2} \):
\[ = \left[ \frac{x^4}{2} \right]_{0}^{1} = \frac{(1)^4}{2} - \frac{(0)^4}{2} = \frac{1}{2} - 0 = \frac{1}{2} \]
Now compute the variance:
\[ Var(X) = E(X^2) - [E(X)]^2 = \frac{1}{2} - \left( \frac{2}{3} \right)^2 = \frac{1}{2} - \frac{4}{9} \]
Convert to a common denominator (18):
\[ = \frac{9}{18} - \frac{8}{18} = \frac{1}{18} \]
The standard deviation is:
\[ \sigma = \sqrt{\frac{1}{18}} = \frac{1}{\sqrt{18}} = \frac{1}{3\sqrt{2}} \approx 0.236 \]
The variance is \( \frac{1}{18} \) and the standard deviation is approximately 0.236.
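The mean, \( E(X^2) \), variance, and standard deviation can all be verified with a midpoint-sum integrator (a standard-library Python sketch; the integrator and step count are illustrative choices, not the only way to do this):

```python
def pdf(x):
    return 2 * x  # density on [0, 1]

def integrate(f, a, b, n=200_000):
    """Midpoint Riemann sum approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

mean = integrate(lambda x: x * pdf(x), 0, 1)       # E(X)   = 2/3
ex2  = integrate(lambda x: x * x * pdf(x), 0, 1)   # E(X^2) = 1/2
var  = ex2 - mean ** 2                             # 1/18
sd   = var ** 0.5                                  # about 0.236
print(round(mean, 4), round(var, 4), round(sd, 4))  # 0.6667 0.0556 0.2357
```

The shortcut formula \( Var(X) = E(X^2) - [E(X)]^2 \) appears on the `var` line: only two integrals are needed, rather than integrating \( (x - \mu)^2 f(x) \) directly.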
Several important continuous distributions appear frequently in statistics and real-world applications.
The uniform distribution describes a scenario where all values in an interval \([a, b]\) are equally likely. The PDF is constant across the interval:
\[ f(x) = \begin{cases} \frac{1}{b - a} & \text{if } a \leq x \leq b \\ 0 & \text{otherwise} \end{cases} \]
The expected value and variance of a uniform distribution are:
\[ E(X) = \frac{a + b}{2} \qquad Var(X) = \frac{(b - a)^2}{12} \]
Imagine a spinner that can land anywhere on a number line from 0 to 10 with equal likelihood. This is a uniform distribution on [0, 10].
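The spinner can be simulated to check these formulas empirically. A minimal Monte Carlo sketch in standard-library Python (the sample size and seed are arbitrary choices for reproducibility):

```python
import random

a, b = 0.0, 10.0
mean_theory = (a + b) / 2        # 5.0
var_theory = (b - a) ** 2 / 12   # about 8.333

random.seed(0)  # fixed seed so the run is reproducible
spins = [random.uniform(a, b) for _ in range(100_000)]
mean_est = sum(spins) / len(spins)
var_est = sum((x - mean_est) ** 2 for x in spins) / len(spins)
# mean_est and var_est land close to 5.0 and 8.333
```

With 100,000 spins the sample mean and variance agree with \( \frac{a+b}{2} \) and \( \frac{(b-a)^2}{12} \) to within simulation noise.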
The exponential distribution models the time between events in a process where events occur continuously and independently at a constant average rate. It is often used to model waiting times. The PDF is:
\[ f(x) = \lambda e^{-\lambda x} \quad \text{for } x \geq 0 \]
where \( \lambda > 0 \) is the rate parameter, and \( e \approx 2.71828 \) is the base of natural logarithms.
The expected value and variance are:
\[ E(X) = \frac{1}{\lambda} \qquad Var(X) = \frac{1}{\lambda^2} \]
For example, the time until the next customer arrives at a store, or the lifespan of a light bulb, can often be modeled with an exponential distribution.
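A quick simulation illustrates the waiting-time interpretation. The sketch below (standard-library Python; the rate \( \lambda = 0.5 \) and the seed are assumptions for illustration) draws exponential waiting times and compares the sample mean with \( 1/\lambda \):

```python
import random

lam = 0.5  # rate parameter: on average one arrival per 1/lam = 2 time units

random.seed(1)  # fixed seed for reproducibility
waits = [random.expovariate(lam) for _ in range(100_000)]
mean_est = sum(waits) / len(waits)
# theory: E(X) = 1/lam = 2.0 and Var(X) = 1/lam^2 = 4.0;
# mean_est should land close to 2.0
```

The standard library's `random.expovariate(lambd)` samples exactly this distribution, so no inverse-CDF machinery needs to be written by hand.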
The normal distribution (also called the Gaussian distribution) is the most important continuous distribution. It has a symmetric, bell-shaped curve and is characterized by two parameters: the mean \( \mu \) and the standard deviation \( \sigma \). The PDF is:
\[ f(x) = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{(x - \mu)^2}{2\sigma^2}} \]
where \( \pi \approx 3.14159 \).
The normal distribution is central to statistics because of the Central Limit Theorem, which states that the sum (or average) of many independent random variables tends to follow a normal distribution, regardless of the original distribution of the variables.
Key properties:
- The curve is symmetric about the mean \( \mu \); the mean, median, and mode all coincide.
- About 68% of the probability lies within one standard deviation of the mean, about 95% within two, and about 99.7% within three (the empirical rule).
- Changing \( \mu \) shifts the curve left or right; changing \( \sigma \) stretches or compresses it.
The standard normal distribution is a special case with \( \mu = 0 \) and \( \sigma = 1 \). Any normal distribution can be converted to the standard normal distribution using the transformation:
\[ Z = \frac{X - \mu}{\sigma} \]
where \( Z \) follows a standard normal distribution.
Because the integral of the normal PDF cannot be expressed in terms of elementary functions, probabilities are typically found using statistical tables or technology.
Example: The heights of adult women in a population are normally distributed with a mean of 65 inches and a standard deviation of 3 inches.
Let \( X \) represent the height of a randomly selected woman. What is the probability that a randomly selected woman is between 62 and 68 inches tall?
Solution:
First, standardize the values using the \( Z \)-score formula:
For \( X = 62 \):
\[ Z = \frac{62 - 65}{3} = \frac{-3}{3} = -1 \]
For \( X = 68 \):
\[ Z = \frac{68 - 65}{3} = \frac{3}{3} = 1 \]
We need \( P(-1 \leq Z \leq 1) \).
From the standard normal table or calculator, \( P(Z \leq 1) \approx 0.8413 \) and \( P(Z \leq -1) \approx 0.1587 \).
So \( P(-1 \leq Z \leq 1) = 0.8413 - 0.1587 = 0.6826 \).
The probability that a randomly selected woman is between 62 and 68 inches tall is approximately 0.683 or 68.3%.
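Since the normal CDF has no elementary antiderivative, it is usually evaluated with tables or software. Python's standard library exposes the error function, so the table lookup above can be reproduced directly (a sketch; the helper name `phi` is our choice, using the standard identity relating the normal CDF to \( \mathrm{erf} \)):

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF, written in terms of the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

mu, sigma = 65, 3
z_lo = (62 - mu) / sigma   # -1
z_hi = (68 - mu) / sigma   #  1
prob = phi(z_hi) - phi(z_lo)
print(round(prob, 4))  # 0.6827, matching the table-based 0.6826 up to rounding
```

The small discrepancy with the worked answer comes from the four-decimal rounding of the table values \( 0.8413 \) and \( 0.1587 \).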
Continuous random variables are essential in many fields:
- Engineering: modeling component lifetimes, measurement noise, and manufacturing tolerances.
- Business and operations: modeling customer waiting times and arrival patterns.
- Natural sciences: modeling measurement error and physical quantities such as temperature and mass.
- Medicine and public health: modeling heights, weights, and survival times in populations.
Understanding continuous random variables enables us to quantify uncertainty, make informed predictions, and draw conclusions from data in situations where outcomes are measured rather than counted. Mastery of probability density functions, cumulative distribution functions, and key distributions like the uniform, exponential, and normal distributions forms the foundation for advanced statistical analysis and data science.