
Chapter Notes: Nonlinear Regression

Not all relationships between variables follow a straight line. Sometimes data increases rapidly at first and then levels off, or it curves upward at an accelerating rate, or it shows a periodic pattern. Nonlinear regression is a statistical technique used to model relationships between variables when a linear model does not fit well. Instead of fitting a straight line to data, nonlinear regression fits a curve (such as an exponential curve, a logarithmic curve, or a power curve) that better captures the pattern in the data. Understanding when and how to use nonlinear regression helps us make better predictions and understand complex real-world phenomena.

Understanding Nonlinear Relationships

A linear relationship between two variables means that as one variable changes by a constant amount, the other variable also changes by a constant amount. When you graph a linear relationship, you get a straight line. However, many real-world relationships are nonlinear: the rate of change itself changes. For example, population growth often accelerates over time, the brightness of a light decreases rapidly as you move away from it, and the cooling of a hot object slows down as it approaches room temperature.

In a nonlinear relationship, the graph of the data forms a curve rather than a straight line. The goal of nonlinear regression is to find the equation of the curve that best fits the data points, just as linear regression finds the best-fitting line.

Recognizing Nonlinear Patterns

Before performing nonlinear regression, it is essential to recognize that the relationship is nonlinear. This is typically done by creating a scatterplot of the data. Look for these patterns:

  • Exponential pattern: The data grows at an accelerating rate (the curve rises more and more steeply), or it decays quickly at first and then flattens out toward zero.
  • Logarithmic pattern: The data increases rapidly at first, then more and more slowly. The curve shows diminishing returns.
  • Power pattern: The data follows a curved path that may increase or decrease, often seen in physical laws like area and volume relationships.
  • Quadratic pattern: The data forms a parabola, either opening upward or downward.

If you fit a linear model to clearly nonlinear data, the residuals (the differences between observed and predicted values) will show a clear pattern rather than being randomly scattered. This is a key indicator that a nonlinear model is needed.

Common Nonlinear Models

Several types of functions are commonly used in nonlinear regression. Each type of function has characteristics that make it suitable for modeling particular kinds of data.

Exponential Model

An exponential model has the form:

\[ y = a \cdot b^x \]

where \( y \) is the dependent variable, \( x \) is the independent variable, \( a \) is the initial value (the value of \( y \) when \( x = 0 \)), and \( b \) is the growth factor (if \( b > 1 \)) or decay factor (if \( 0 < b < 1 \)).

Exponential models are appropriate when:

  • The data shows constant percentage growth or decay
  • The rate of change is proportional to the current value
  • Examples include population growth, radioactive decay, compound interest, and the spread of diseases

Example:  A biologist measures the population of bacteria in a culture at different times.
The data collected are: (0 hours, 50 bacteria), (1 hour, 100 bacteria), (2 hours, 200 bacteria), (3 hours, 400 bacteria).

Does this data fit an exponential model, and if so, what is the equation?

Solution:

Notice that the population doubles each hour. This constant percentage growth suggests an exponential model.

Using the form \( y = a \cdot b^x \), we identify \( a = 50 \) (the initial population when \( x = 0 \)).

Since the population doubles each hour, \( b = 2 \).

The exponential model is \( y = 50 \cdot 2^x \), where \( x \) is time in hours and \( y \) is the number of bacteria.

The equation that models this bacterial growth is \( y = 50 \cdot 2^x \).
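The "constant ratio" reasoning in this solution can be checked with a short Python sketch. It divides each count by the previous one and then compares the model's predictions with the data; the variable names are chosen here for illustration.

```python
# Data from the example: hours elapsed and bacteria counted.
hours = [0, 1, 2, 3]
bacteria = [50, 100, 200, 400]

# Ratio of each count to the previous one.
# Equal ratios over equal time steps suggest a model of the form y = a * b**x.
ratios = [bacteria[i + 1] / bacteria[i] for i in range(len(bacteria) - 1)]
print("ratios:", ratios)

a = bacteria[0]  # initial value, the count at x = 0
b = ratios[0]    # common ratio, used as the growth factor

# Compare the model's predictions with the observed counts.
predicted = [a * b**t for t in hours]
print("predicted:", predicted)
```

Every ratio equals 2 and the predictions reproduce the four observed counts, which supports the model \( y = 50 \cdot 2^x \). Real measurements would rarely give exactly equal ratios, so in practice you would look for ratios that are roughly constant.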

Logarithmic Model

A logarithmic model has the form:

\[ y = a + b \ln(x) \]

where \( y \) is the dependent variable, \( x \) is the independent variable, \( a \) is the value of \( y \) when \( x = 1 \) (because \( \ln(1) = 0 \); the curve has no y-intercept, since \( \ln(x) \) is undefined at \( x = 0 \)), \( b \) is the scaling factor, and \( \ln(x) \) represents the natural logarithm of \( x \).

Logarithmic models are appropriate when:

  • The data increases or decreases rapidly at first, then slows down and levels off
  • There is a diminishing rate of return
  • Examples include the response to stimulus in psychology, sound intensity (decibels), and some learning curves

Example:  An educator tracks how many new vocabulary words students remember as they spend more time studying.
The data shows: (1 hour, 10 words), (2 hours, 17 words), (3 hours, 21 words), (4 hours, 24 words).

Does this pattern suggest a logarithmic relationship?

Solution:

Calculate the increases between consecutive data points: from 1 to 2 hours, students learn 7 new words; from 2 to 3 hours, they learn 4 new words; from 3 to 4 hours, they learn 3 new words.

The rate of learning decreases as study time increases, which is characteristic of a logarithmic relationship.

A logarithmic model of the form \( y = a + b \ln(x) \) would be appropriate to fit this data.

This data pattern suggests a logarithmic relationship because the learning rate diminishes over time.

Power Model

A power model has the form:

\[ y = a \cdot x^b \]

where \( y \) is the dependent variable, \( x \) is the independent variable, \( a \) is a constant coefficient, and \( b \) is the power or exponent.

Power models are appropriate when:

  • The relationship involves geometric scaling
  • Physical laws relate the variables (such as area, volume, or gravitational force)
  • Examples include the relationship between the radius and area of a circle, the relationship between mass and metabolic rate in animals, and gravitational force

Example:  A physicist measures the gravitational force between two objects as the distance between them changes.
The data collected are: (1 meter, 400 Newtons), (2 meters, 100 Newtons), (4 meters, 25 Newtons).

What type of model fits this data?

Solution:

Examine the pattern: when distance doubles from 1 to 2 meters, force becomes \( \frac{1}{4} \) of the original (400 to 100). When distance doubles again from 2 to 4 meters, force again becomes \( \frac{1}{4} \) of the previous value (100 to 25).

This suggests an inverse square relationship, which is a power model with \( b = -2 \).

The model is \( y = 400 \cdot x^{-2} \) or equivalently \( y = \frac{400}{x^2} \).

This data fits a power model with exponent -2, which is consistent with the inverse square law of gravity.

Quadratic Model

A quadratic model has the form:

\[ y = ax^2 + bx + c \]

where \( y \) is the dependent variable, \( x \) is the independent variable, and \( a \), \( b \), and \( c \) are constants. The coefficient \( a \) determines whether the parabola opens upward (\( a > 0 \)) or downward (\( a < 0 \)).

Quadratic models are appropriate when:

  • The data shows a parabolic pattern with a single maximum or minimum
  • The relationship involves projectile motion, optimization problems, or profit/loss scenarios
  • Examples include the height of a thrown ball over time, profit as a function of price, and area optimization problems

Fitting Nonlinear Models to Data

Once you identify the appropriate type of nonlinear model for your data, the next step is to determine the specific values of the parameters (the constants in the equation). There are several approaches to fitting nonlinear models.

Linearization Through Transformation

Some nonlinear models can be transformed into linear models by applying a mathematical transformation to one or both variables. This allows you to use linear regression techniques on the transformed data, then convert back to the original form.

Exponential to Linear: If your model is \( y = a \cdot b^x \), taking the natural logarithm of both sides gives:

\[ \ln(y) = \ln(a) + x \ln(b) \]

This is linear in form if you let \( Y = \ln(y) \), \( A = \ln(a) \), and \( B = \ln(b) \). Then \( Y = A + Bx \), which is a linear equation. You can perform linear regression on \( x \) and \( \ln(y) \) to find the intercept \( A \) and slope \( B \), then transform the results back using \( a = e^A \) and \( b = e^B \).

Power to Linear: If your model is \( y = a \cdot x^b \), taking the natural logarithm of both sides gives:

\[ \ln(y) = \ln(a) + b \ln(x) \]

This is linear if you let \( Y = \ln(y) \) and \( X = \ln(x) \). Then \( Y = \ln(a) + bX \), which is linear. Perform linear regression on \( \ln(x) \) and \( \ln(y) \): the slope is \( b \) itself, and the intercept is \( \ln(a) \), so \( a = e^{\text{intercept}} \).

Example:  Given data points for an exponential relationship: (1, 6), (2, 18), (3, 54), (4, 162).
Use logarithmic transformation to find the exponential model.

What is the exponential equation?

Solution:

Take the natural logarithm of each y-value: \( \ln(6) \approx 1.79 \), \( \ln(18) \approx 2.89 \), \( \ln(54) \approx 3.99 \), \( \ln(162) \approx 5.09 \).

Notice that the differences between consecutive \( \ln(y) \) values are approximately 1.10, suggesting a linear relationship between \( x \) and \( \ln(y) \).

The linear form is \( \ln(y) = \ln(a) + x \ln(b) \). From the pattern, \( \ln(b) \approx 1.10 \), so \( b = e^{1.10} \approx 3 \).

When \( x = 1 \), \( y = 6 \), so \( 6 = a \cdot 3^1 \), giving \( a = 2 \).

The exponential model is \( y = 2 \cdot 3^x \).

Using Technology

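Modern graphing calculators and statistical software can fit nonlinear models directly, so you do not have to transform the data by hand. These tools find the parameter values that minimize the sum of squared residuals. Some of them, such as the exponential and power regression functions on TI graphing calculators, do this by applying a logarithmic transformation internally, while others use iterative numerical methods on the original data, so different tools can report slightly different parameters for the same data set. Common technology tools include: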

  • Graphing calculators (TI-83, TI-84, etc.) with built-in regression functions
  • Spreadsheet software like Microsoft Excel or Google Sheets
  • Statistical software such as R, Python (with libraries like scipy and statsmodels), or specialized packages

When using technology, you typically:

  1. Enter your data into the calculator or software
  2. Select the type of regression model (exponential, logarithmic, power, quadratic, etc.)
  3. The software calculates the best-fit parameters
  4. The software provides the equation and often goodness-of-fit statistics
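The four steps above might look like this in Python, using `curve_fit` from `scipy.optimize`. This is a minimal sketch rather than a recommended workflow: it assumes NumPy and SciPy are installed, reuses the bacteria data from the exponential example, and the starting guess `p0` is an arbitrary choice made for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit

# Step 1: enter the data (hours, bacteria count).
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([50.0, 100.0, 200.0, 400.0])

# Step 2: choose the model type, here exponential: y = a * b**x.
def exponential(x, a, b):
    return a * b**x

# Step 3: let the software search for the best-fit parameters.
# p0 is a rough starting guess; iterative methods need one.
params, covariance = curve_fit(exponential, x, y, p0=[40.0, 1.5])
a, b = params

# Step 4: report the fitted equation.
print(f"y = {a:.2f} * {b:.2f}^x")
```

Here the search converges to \( a \approx 50 \) and \( b \approx 2 \). With a poor starting guess, an iterative fit can fail to converge, which is one reason it helps to sketch the data first.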

Evaluating Model Fit

After fitting a nonlinear model to data, it is important to evaluate how well the model fits. Several measures help assess model quality.

Residual Plots

A residual is the difference between an observed value and the value predicted by the model. For each data point:

\[ \text{Residual} = \text{Observed value} - \text{Predicted value} \]

A residual plot graphs the residuals on the vertical axis against the independent variable (or predicted values) on the horizontal axis. For a good-fitting model:

  • Residuals should be randomly scattered around zero with no clear pattern
  • The spread of residuals should be roughly constant across all values of x
  • If residuals show a curved pattern, a different model type may be needed
  • If residuals fan out or narrow, the variability of the data is not constant, so the model's predictions will be less reliable in some ranges of x than in others

Coefficient of Determination (R²)

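The coefficient of determination, denoted \( R^2 \), measures the proportion of variance in the dependent variable that is explained by the model. It usually lies between 0 and 1 (a negative value is possible for a nonlinear model that fits worse than a horizontal line through the mean of the data):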

  • \( R^2 = 1 \) means the model perfectly fits the data
  • \( R^2 = 0 \) means the model does not explain any variability in the data
  • Higher \( R^2 \) values indicate better fit, but context matters

For nonlinear regression, \( R^2 \) is calculated as:

\[ R^2 = 1 - \frac{SS_{\text{res}}}{SS_{\text{tot}}} \]

where \( SS_{\text{res}} \) is the sum of squared residuals and \( SS_{\text{tot}} \) is the total sum of squares (the sum of squared differences between observed values and the mean of observed values).

Think of \( R^2 \) as a percentage: an \( R^2 \) of 0.85 means that 85% of the variation in the data is explained by the model, while 15% remains unexplained.

Comparing Models

Sometimes more than one type of nonlinear model appears to fit the data reasonably well. To choose the best model:

  • Compare \( R^2 \) values: higher is generally better
  • Examine residual plots: look for the most random scatter
  • Consider the context: does the model make sense theoretically for your situation?
  • Use parsimony: simpler models are often preferable if fit quality is similar
  • Check predictions: which model gives more reasonable predictions outside the data range?

Interpolation and Extrapolation

Interpolation means using the model to predict values within the range of the observed data. Extrapolation means predicting values outside the range of the observed data. Both are common uses of regression models, but they carry different levels of risk.

Interpolation is generally reliable if the model fits well. Since you are predicting within the observed range, you can be more confident that the pattern continues.

Extrapolation is riskier because you are assuming the pattern continues beyond where you have data. Nonlinear models can behave very differently outside the observed range. For example, exponential models grow without bound, which may not be realistic for many real-world situations (populations eventually face limits, for instance).

Example:  A company models its revenue using a quadratic function \( R(x) = -2x^2 + 40x + 100 \), where \( x \) is the price in dollars and \( R(x) \) is revenue in thousands of dollars.
The data used to create this model came from prices between $5 and $15.

Is it appropriate to use this model to predict revenue at a price of $30?

Solution:

Calculate \( R(30) = -2(30)^2 + 40(30) + 100 = -1800 + 1200 + 100 = -500 \).

The model predicts a revenue of -$500,000, which is impossible because revenue cannot be negative.

This illustrates the danger of extrapolation: the quadratic model may fit well within the observed range ($5 to $15), but it does not make sense outside that range.

It is not appropriate to use this model for a price of $30 because that involves extrapolation far beyond the data range, leading to unrealistic predictions.
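To see the contrast between interpolation and extrapolation, the model can be evaluated at a few prices inside the observed range and at the $30 price outside it. In this sketch the function name `revenue` and the constants `LOW` and `HIGH` are introduced only for illustration.

```python
def revenue(price):
    """Revenue in thousands of dollars from the example's quadratic model."""
    return -2 * price**2 + 40 * price + 100

# Price range covered by the data used to build the model.
LOW, HIGH = 5, 15

for price in (5, 10, 15, 30):
    kind = "interpolation" if LOW <= price <= HIGH else "extrapolation"
    print(f"price ${price}: R = {revenue(price)} thousand ({kind})")
```

Inside the range the predictions are plausible (250, 300, and 250 thousand dollars), while the extrapolated value at $30 is -500 thousand dollars.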

Applications of Nonlinear Regression

Nonlinear regression is used across many fields to model complex relationships:

  • Biology and Medicine: Modeling population growth, the spread of diseases, dose-response curves for medications, and enzyme kinetics
  • Economics and Business: Modeling revenue and profit functions, demand curves, depreciation of assets, and market saturation
  • Environmental Science: Modeling carbon dioxide levels over time, the decay of pollutants, and species diversity
  • Physics and Engineering: Modeling radioactive decay, cooling rates, electrical circuits, and planetary motion
  • Psychology and Education: Modeling learning curves, memory retention, and reaction times

Limitations and Considerations

While nonlinear regression is a powerful tool, it has important limitations:

  • Model selection requires judgment: Choosing the right type of model requires understanding both the data and the context. There is no one "correct" model in many situations.
  • Overfitting: Using overly complex models with many parameters can fit the sample data very well but perform poorly on new data. Simpler models often generalize better.
  • Computational complexity: Fitting nonlinear models often requires iterative numerical methods, which can be sensitive to starting values and may not always converge to a solution.
  • Extrapolation risks: Predictions outside the observed data range can be unreliable, especially for models like exponentials that grow or decay without bound.
  • Outliers: A few unusual data points can have large effects on the fitted model, especially in small datasets.
  • Causation vs. correlation: Just like with linear regression, a good-fitting nonlinear model does not prove that changes in \( x \) cause changes in \( y \). Other variables may be involved.

Always interpret regression results in context, consider alternative models, and be cautious when making predictions, especially when extrapolating beyond the range of your data. Nonlinear regression is a valuable tool for understanding and predicting complex relationships, but it must be used thoughtfully and with awareness of its assumptions and limitations.

The document Chapter Notes: Nonlinear Regression is a part of the Grade 9 Course Statistics & Probability.