
Multiple Regression

Multiple Regression extends simple linear regression to model a dependent variable using two or more independent variables simultaneously. This powerful tool is fundamental in finance for asset pricing, risk analysis, and forecasting. Understanding the model assumptions, diagnostic tests, and practical applications is critical for portfolio management, equity valuation, and empirical finance research.

1. The Multiple Regression Model

Multiple regression models the relationship between one dependent variable and multiple independent variables. The model estimates how each independent variable affects the dependent variable while holding other variables constant.

1.1 General Form of the Model

Population Regression Equation:

Yi = β0 + β1X1i + β2X2i + ... + βkXki + εi

  • Yi: Dependent variable (what we want to explain or predict)
  • β0: Intercept term (value of Y when all X variables equal zero)
  • βj: Slope coefficient for Xj (marginal effect of Xj on Y, holding other variables constant)
  • Xji: Independent variable j for observation i (explanatory variables)
  • εi: Error term (captures all factors affecting Y not included in the model)
  • k: Number of independent variables

Sample Regression Equation (Estimated):

Ŷi = b0 + b1X1i + b2X2i + ... + bkXki

  • Ŷi: Predicted or fitted value of Y
  • bj: Estimated coefficient (sample estimate of βj)
  • Residual: ei = Yi - Ŷi (difference between actual and predicted values)

1.2 Interpretation of Regression Coefficients

  • Slope Coefficient (bj): Measures the change in the dependent variable for a one-unit change in Xj, holding all other independent variables constant (ceteris paribus)
  • Example: In a regression of stock returns on market return and company size, b1 = 1.2 for market return means a 1% increase in market return increases stock return by 1.2%, holding size constant
  • Intercept (b0): Expected value of Y when all independent variables equal zero. Often lacks economic interpretation in finance applications

1.3 Ordinary Least Squares (OLS) Estimation

OLS minimizes the sum of squared residuals to find the best-fitting regression line.

Objective Function: Minimize Σei² = Σ(Yi - Ŷi)²

  • Best Linear Unbiased Estimators (BLUE): OLS estimators are BLUE if regression assumptions are satisfied (Gauss-Markov Theorem)
  • Properties: Sum of residuals equals zero; regression line passes through the mean of Y and X variables
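
The OLS mechanics above can be sketched in a few lines of pure Python. The data below are invented so that Y = 1 + 2X1 + 3X2 holds exactly; solving the normal equations X'X b = X'y should then recover those coefficients with zero residuals.

```python
# Minimal OLS sketch (pure Python, made-up data): solve the normal
# equations X'X b = X'y by Gaussian elimination.

def solve(A, y):
    """Solve the linear system A b = y by Gaussian elimination with pivoting."""
    n = len(A)
    M = [row[:] + [y[i]] for i, row in enumerate(A)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    b = [0.0] * n
    for r in range(n - 1, -1, -1):
        b[r] = (M[r][n] - sum(M[r][c] * b[c] for c in range(r + 1, n))) / M[r][r]
    return b

def ols(X, Y):
    """OLS coefficients; each row of X starts with 1 for the intercept."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    XtY = [sum(row[i] * y for row, y in zip(X, Y)) for i in range(k)]
    return solve(XtX, XtY)

X = [[1, x1, x2] for x1, x2 in [(1, 2), (2, 1), (3, 5), (4, 3), (5, 8), (6, 4)]]
Y = [1 + 2*x1 + 3*x2 for _, x1, x2 in X]   # exact relationship by construction
b0, b1, b2 = ols(X, Y)
print(round(b0, 6), round(b1, 6), round(b2, 6))  # → 1.0 2.0 3.0
```

In practice one would use a statistical package rather than hand-rolled elimination, but the sketch makes the "minimize squared residuals via the normal equations" idea concrete.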

2. Assumptions of the Multiple Regression Model

Six critical assumptions must hold for OLS estimates to be valid and reliable. Violation of these assumptions leads to biased, inconsistent, or inefficient estimates.

2.1 Assumption 1: Linearity

  • Statement: The relationship between dependent and independent variables is linear in parameters
  • Implication: The model correctly specifies the functional form
  • Violation Impact: Model misspecification; biased and inconsistent coefficient estimates
  • Detection: Residual plots showing patterns (non-random scatter)

2.2 Assumption 2: Independent Variables are Not Random

  • Statement: Independent variables are not random; they are fixed in repeated samples
  • Implication: Covariance between independent variables and error term is zero
  • Practical Note: This assumption is often relaxed in practice; variables can be random but must be uncorrelated with error term

2.3 Assumption 3: Expected Value of Error Term is Zero

  • Statement: E(εi) = 0 for all observations
  • Implication: Error term has mean zero; no systematic bias in predictions
  • Ensured by: Including an intercept term in the regression

2.4 Assumption 4: Homoskedasticity (Constant Error Variance)

  • Statement: Var(εi) = σ² for all i (variance of error term is constant across all observations)
  • Violation: Heteroskedasticity occurs when error variance changes with independent variables
  • Impact of Violation: Standard errors are incorrect (usually underestimated); t-statistics and F-statistics are unreliable; hypothesis tests invalid
  • Common in Finance: Cross-sectional data (larger firms have more volatile returns)

2.5 Assumption 5: No Serial Correlation (No Autocorrelation)

  • Statement: Cov(εi, εj) = 0 for all i ≠ j (error terms are uncorrelated across observations)
  • Violation: Serial correlation/autocorrelation when errors are correlated over time
  • Impact of Violation: Standard errors biased; hypothesis tests unreliable; coefficient estimates still unbiased but inefficient
  • Common in Finance: Time-series data (today's error affects tomorrow's error)

2.6 Assumption 6: No Perfect Multicollinearity

  • Statement: No exact linear relationship exists among independent variables
  • Perfect Multicollinearity: One independent variable is an exact linear combination of others
  • Impact: Regression coefficients cannot be estimated; computer software will fail or drop variables
  • High (but not perfect) Multicollinearity: Coefficients can be estimated but have high standard errors and are unstable

3. Goodness of Fit

Goodness of fit measures assess how well the regression model explains variation in the dependent variable.

3.1 Decomposition of Total Variation

Total Sum of Squares (SST): SST = Σ(Yi - Ȳ)²

  • Measures total variation in the dependent variable around its mean

Sum of Squared Errors/Residuals (SSE or RSS): SSE = Σ(Yi - Ŷi)² = Σei²

  • Measures unexplained variation (variation not captured by the model)

Regression Sum of Squares (SSR or ESS): SSR = Σ(Ŷi - Ȳ)²

  • Measures explained variation (variation captured by the regression model)

Fundamental Relationship: SST = SSR + SSE

3.2 Coefficient of Determination (R²)

Formula: R² = SSR/SST = 1 - (SSE/SST)

  • Interpretation: Proportion of total variation in Y explained by the independent variables
  • Range: 0 ≤ R² ≤ 1 (expressed as percentage: 0% to 100%)
  • Example: R² = 0.75 means 75% of variation in Y is explained by the model; 25% remains unexplained
  • Limitation: R² always increases when adding more variables, even if they are irrelevant (never decreases)

Trap Alert: High R² does not necessarily mean a good model. A high R² can occur with biased coefficients, violated assumptions, or spurious relationships. Low R² does not mean the model is useless if coefficients are significant and economically meaningful.

3.3 Adjusted R² (R̄²)

Formula: R̄² = 1 - [(n - 1)/(n - k - 1)] × (1 - R²)

  • n: Number of observations
  • k: Number of independent variables
  • Purpose: Penalizes the addition of independent variables that do not improve explanatory power
  • Advantage: Can decrease when irrelevant variables are added, unlike R²
  • Use: Preferred over R² for comparing models with different numbers of independent variables

Relationship: R̄² ≤ R² (adjusted R² is always less than or equal to R²)

3.4 Standard Error of Estimate (SEE)

Formula: SEE = √[SSE/(n - k - 1)] = √[Σei²/(n - k - 1)]

  • Interpretation: Standard deviation of the regression residuals; measures average distance of observations from the regression line
  • Units: Same units as the dependent variable
  • Use: Lower SEE indicates better fit; used in prediction intervals
  • Denominator (n - k - 1): Degrees of freedom (observations minus parameters estimated)
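
The three fit measures above reduce to simple arithmetic. The figures below (n = 30, k = 3, SST = 100, SSE = 25) are invented for illustration only.

```python
import math

# Goodness-of-fit sketch with invented numbers.
n, k = 30, 3
SST, SSE = 100.0, 25.0
SSR = SST - SSE                                  # SST = SSR + SSE

r2 = SSR / SST                                   # R² = 1 - SSE/SST
adj_r2 = 1 - (n - 1) / (n - k - 1) * (1 - r2)    # penalizes extra regressors
see = math.sqrt(SSE / (n - k - 1))               # standard error of estimate

print(round(r2, 4), round(adj_r2, 4), round(see, 4))  # → 0.75 0.7212 0.9806
```

Note that adjusted R² (0.7212) is below R² (0.75), consistent with the relationship R̄² ≤ R².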

4. Hypothesis Testing in Multiple Regression

Hypothesis tests evaluate whether independent variables have statistically significant relationships with the dependent variable and whether the overall model is significant.

4.1 t-Test for Individual Coefficients

Tests whether an individual independent variable is statistically significant.

Null Hypothesis: H0: βj = 0 (the variable has no effect on Y)

Alternative Hypothesis: Ha: βj ≠ 0 (two-tailed test)

Test Statistic: t = (bj - βj)/sbj

  • bj: Estimated coefficient
  • βj: Hypothesized value (usually zero)
  • sbj: Standard error of coefficient bj

Degrees of Freedom: df = n - k - 1

Decision Rule: Reject H0 if |t-statistic| > t-critical value at chosen significance level (typically 5%)

p-value Approach: Reject H0 if p-value < significance level

  • p-value: Smallest significance level at which H0 can be rejected
  • Interpretation: If p-value = 0.03, the coefficient is significant at 5% level but not at 1% level

4.2 Confidence Intervals for Coefficients

Formula: bj ± (tcritical × sbj)

  • 95% Confidence Interval: Use t-critical value at 5% significance level with (n - k - 1) degrees of freedom
  • Interpretation: We are 95% confident the true population parameter βj lies within this interval
  • Significance Test: If the confidence interval does not include zero, the coefficient is statistically significant

4.3 F-Test for Overall Significance

Tests whether at least one independent variable is statistically significant (whether the overall regression model has explanatory power).

Null Hypothesis: H0: β1 = β2 = ... = βk = 0 (all slope coefficients are zero)

Alternative Hypothesis: Ha: At least one βj ≠ 0

Test Statistic: F = [SSR/k] / [SSE/(n - k - 1)] = [MSR/MSE]

  • MSR (Mean Square Regression): SSR/k (average explained variation per independent variable)
  • MSE (Mean Square Error): SSE/(n - k - 1) (average unexplained variation)

Alternative Formula using R²: F = [R²/k] / [(1 - R²)/(n - k - 1)]

Degrees of Freedom: Numerator df = k; Denominator df = n - k - 1

Decision Rule: Reject H0 if F-statistic > F-critical value at chosen significance level

  • One-tailed test: F-test is always one-tailed (right tail only) because F-statistic is always positive
  • Significance: Rejecting H0 means the regression model as a whole is statistically significant

Trap Alert: A significant F-test does not mean all coefficients are significant, only that at least one is significant. Conversely, individual t-tests can be significant even with an insignificant F-test in rare cases.
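
The t-test and F-test computations above can be sketched with invented numbers (b1 = 1.2 with standard error 0.4, R² = 0.75, n = 30, k = 3):

```python
# Hypothesis-test sketch; all figures are invented for illustration.
n, k = 30, 3
b1, s_b1 = 1.2, 0.4        # estimated coefficient and its standard error
r2 = 0.75

t_stat = (b1 - 0) / s_b1                        # H0: beta1 = 0
f_stat = (r2 / k) / ((1 - r2) / (n - k - 1))    # H0: all slopes = 0

print(round(t_stat, 2), round(f_stat, 2))  # → 3.0 26.0
```

With df = 26, a t-statistic of 3.0 exceeds the 5% two-tailed critical value (about 2.06), so the coefficient would be judged significant; the F-statistic of 26 likewise far exceeds typical F-critical values.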

4.4 Testing Multiple Coefficients Jointly

Tests whether a subset of coefficients are jointly equal to zero using a restricted F-test.

Formula: F = [(SSErestricted - SSEunrestricted)/q] / [SSEunrestricted/(n - k - 1)]

  • q: Number of restrictions (number of coefficients being tested)
  • SSErestricted: Sum of squared errors from restricted model (with restrictions imposed)
  • SSEunrestricted: Sum of squared errors from full model
  • Degrees of Freedom: Numerator df = q; Denominator df = n - k - 1
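
The restricted F-test is a one-line computation once both models have been estimated. The numbers below (q = 2 restrictions, k = 4 regressors, n = 30, SSE of 50 restricted vs. 40 unrestricted) are invented:

```python
# Joint (restricted) F-test sketch with invented sums of squares.
n, k, q = 30, 4, 2
sse_restricted, sse_unrestricted = 50.0, 40.0

f = ((sse_restricted - sse_unrestricted) / q) / (sse_unrestricted / (n - k - 1))
print(round(f, 3))  # → 3.125
```

Compare 3.125 to the F-critical value with (2, 25) degrees of freedom to decide whether the two restricted coefficients are jointly significant.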

5. Model Misspecification

Model misspecification occurs when the regression model is incorrectly formulated. This violates regression assumptions and produces unreliable results.

5.1 Types of Misspecification

  • Omitted Variable Bias: Important independent variable left out of the model
  • Inappropriate Form of Variables: Wrong functional form (e.g., linear model when relationship is nonlinear)
  • Inappropriate Scaling: Variables not properly transformed (e.g., should use logarithms)
  • Inappropriate Data Pooling: Combining data from different populations or time periods incorrectly

5.2 Consequences of Omitted Variables

Omitting a relevant variable that is correlated with included variables causes omitted variable bias.

  • Biased Coefficients: Estimated coefficients of included variables are biased and inconsistent
  • Direction of Bias: Depends on correlation between omitted variable and included variables, and the omitted variable's effect on Y
  • Example: Regressing stock returns only on market returns (omitting company size). If size affects returns and correlates with market returns, the market beta is biased

Condition for Bias: Omitted variable must be correlated with both the dependent variable AND at least one included independent variable.

5.3 Consequences of Including Irrelevant Variables

  • Unbiased but Inefficient: Coefficient estimates remain unbiased but have higher standard errors (less precise)
  • Reduced t-statistics: Lower power to detect significant relationships
  • Lower Adjusted R²: Model fit deteriorates when irrelevant variables are added
  • Preferable to Omission: Including an irrelevant variable is less serious than omitting a relevant one

6. Multicollinearity

Multicollinearity exists when two or more independent variables are highly correlated with each other. Perfect multicollinearity (exact linear relationship) violates regression assumptions.

6.1 Detecting Multicollinearity

  • High Pairwise Correlations: Correlation coefficient between two independent variables exceeds 0.7 or 0.8 (rule of thumb)
  • High R² but Insignificant t-statistics: Overall model is significant (F-test) but individual coefficients are not significant
  • Signs Opposite to Expected: Coefficient signs contradict economic theory or prior expectations
  • Variance Inflation Factor (VIF): VIF > 10 indicates problematic multicollinearity for that variable

VIF Formula: VIFj = 1/(1 - R²j)

  • R²j: R² from regressing Xj on all other independent variables
  • Interpretation: VIF = 5 means variance of coefficient is inflated by factor of 5 due to multicollinearity
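
For a model with only two regressors, R²j is simply the squared pairwise correlation, so the VIF can be computed directly. The data below are made up to be strongly (but not perfectly) collinear:

```python
# VIF sketch for a two-regressor model (invented data). With two regressors,
# R²j equals the squared pairwise correlation, so VIF = 1 / (1 - r²).
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 3, 5, 6, 8, 9]   # nearly a linear function of x1

n = len(x1)
m1, m2 = sum(x1) / n, sum(x2) / n
cov = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
var1 = sum((a - m1) ** 2 for a in x1)
var2 = sum((b - m2) ** 2 for b in x2)
r = cov / (var1 * var2) ** 0.5

vif = 1 / (1 - r ** 2)
print(round(r, 3), round(vif, 1))
```

Here r is above 0.99, so the VIF comes out far above the rule-of-thumb threshold of 10, flagging problematic multicollinearity.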

6.2 Consequences of Multicollinearity

  • High Standard Errors: Coefficient estimates have large standard errors (imprecise estimates)
  • Low t-statistics: Difficult to reject null hypothesis even when variables are actually significant
  • Unstable Coefficients: Small changes in data cause large changes in coefficient estimates
  • Wide Confidence Intervals: Confidence intervals for coefficients are very wide
  • Still Unbiased: Coefficient estimates remain unbiased and consistent, but inefficient

Trap Alert: Multicollinearity does NOT bias coefficient estimates, but makes them unreliable and difficult to interpret individually. The overall model predictions (Ŷ) may still be accurate.

6.3 Remedies for Multicollinearity

  • Drop One Variable: Remove one of the highly correlated variables (loses information)
  • Increase Sample Size: More data can reduce standard errors
  • Use Different Specification: Transform variables or use different functional form
  • Do Nothing: If prediction is the goal (not interpretation), multicollinearity may not matter

7. Heteroskedasticity

Heteroskedasticity occurs when the variance of the regression residuals is not constant across observations. This violates the homoskedasticity assumption.

7.1 Detecting Heteroskedasticity

Graphical Analysis: Plot residuals against predicted values or independent variables.

  • Pattern Indicates Heteroskedasticity: Residuals form a cone or funnel shape (increasing or decreasing variance)
  • No Pattern Indicates Homoskedasticity: Residuals randomly scattered around zero with constant spread

Breusch-Pagan (BP) Test: Formal statistical test for heteroskedasticity.

  • Null Hypothesis: H0: No heteroskedasticity (constant error variance)
  • Alternative Hypothesis: Ha: Heteroskedasticity present
  • Procedure: Regress squared residuals (e²) on independent variables; test for overall significance with chi-square test
  • Test Statistic: BP = n × R² (from auxiliary regression) ~ χ²k
  • Decision: Reject H0 if BP statistic > chi-square critical value at chosen significance level
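
The Breusch-Pagan procedure above can be sketched with a single regressor and an invented residual series whose spread grows with x (a classic heteroskedastic pattern):

```python
# Breusch-Pagan sketch (invented residuals): regress squared residuals on x,
# take the R² of that auxiliary regression, and form BP = n * R².
x = [1, 2, 3, 4, 5, 6, 7, 8]
e = [0.5, -0.6, 0.9, -1.1, 1.4, -1.6, 1.9, -2.1]   # spread grows with x
e2 = [v ** 2 for v in e]

n = len(x)
mx, me2 = sum(x) / n, sum(e2) / n
beta = (sum((a - mx) * (b - me2) for a, b in zip(x, e2))
        / sum((a - mx) ** 2 for a in x))
alpha = me2 - beta * mx
fitted = [alpha + beta * a for a in x]

ssr = sum((f - me2) ** 2 for f in fitted)     # explained variation in e²
sst = sum((b - me2) ** 2 for b in e2)         # total variation in e²
bp = n * (ssr / sst)                          # BP = n × R² of auxiliary regression
print(round(bp, 2))  # compare to chi-square critical value, df = 1
```

Here BP exceeds the 5% chi-square critical value with 1 degree of freedom (3.84), so the null of homoskedasticity would be rejected.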

White Test: More general test that detects heteroskedasticity without assuming a specific form.

7.2 Consequences of Heteroskedasticity

  • Unbiased Coefficients: OLS coefficient estimates remain unbiased and consistent
  • Biased Standard Errors: Standard errors of coefficients are incorrect (usually underestimated)
  • Unreliable Hypothesis Tests: t-statistics and F-statistics are unreliable; Type I error rates increase
  • Inefficient Estimates: OLS is no longer the Best Linear Unbiased Estimator (BLUE)

Common in Finance: Cross-sectional data often exhibits heteroskedasticity. Larger firms tend to have more volatile returns than smaller firms.

7.3 Correcting for Heteroskedasticity

  • Robust Standard Errors (White-corrected): Compute heteroskedasticity-consistent standard errors; no change to coefficients, only standard errors adjusted
  • Generalized Least Squares (GLS): Use weighted least squares if form of heteroskedasticity is known
  • Transform Variables: Use logarithmic transformation to stabilize variance

Preferred Method: Use robust standard errors (White-corrected standard errors). This is the most common correction in practice.

8. Serial Correlation (Autocorrelation)

Serial correlation exists when regression residuals are correlated across time periods. This violates the assumption of independent error terms and commonly occurs in time-series data.

8.1 Detecting Serial Correlation

Graphical Analysis: Plot residuals over time.

  • Positive Serial Correlation: Residuals cluster together; positive residuals followed by positive residuals, negative by negative
  • Negative Serial Correlation: Residuals alternate signs (rare in finance)

Durbin-Watson (DW) Test: Most common test for first-order serial correlation (AR(1)).

Formula: DW = Σ(et - et-1)² / Σet²

  • Range: 0 ≤ DW ≤ 4
  • DW ≈ 2: No serial correlation
  • DW < 2: Positive serial correlation (DW approaching 0 indicates strong positive correlation)
  • DW > 2: Negative serial correlation (DW approaching 4 indicates strong negative correlation)

Approximate Relationship: DW ≈ 2(1 - r), where r is the sample correlation between consecutive residuals.

Critical Values: Compare DW to lower (dL) and upper (dU) critical values from DW tables.

  • DW < dL: Reject H0 (positive serial correlation present)
  • DW > dU: Do not reject H0 (no positive serial correlation)
  • dL < DW < dU: Inconclusive region

Limitations of DW Test:

  • Only detects first-order autocorrelation AR(1)
  • Not valid if lagged dependent variable is an independent variable
  • Inconclusive region makes interpretation difficult
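
The DW statistic itself is a one-line computation. The residual series below is invented to drift slowly (positive serial correlation), so DW should come out well below 2:

```python
# Durbin-Watson sketch on an invented, positively autocorrelated residual
# series: DW = sum of squared first differences over sum of squared residuals.
e = [1.0, 0.8, 0.9, 0.6, 0.4, 0.5, 0.1, -0.2, -0.4, -0.3]

dw = (sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
      / sum(v ** 2 for v in e))
print(round(dw, 3))  # → 0.139
```

A value this close to 0 signals strong positive serial correlation; the decision against dL and dU would still come from DW tables for the given n and k.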

8.2 Consequences of Serial Correlation

  • Unbiased Coefficients: OLS estimates remain unbiased and consistent
  • Biased Standard Errors: Standard errors are incorrect (typically underestimated with positive autocorrelation)
  • Unreliable Hypothesis Tests: t-statistics and F-statistics are not valid; confidence intervals are too narrow
  • Inefficient Estimates: OLS is not BLUE; coefficient estimates have larger variance than necessary

Common Cause: Omitted variables that are serially correlated, or misspecified functional form in time-series models.

8.3 Correcting for Serial Correlation

  • Hansen Method (HAC Standard Errors): Use heteroskedasticity and autocorrelation consistent standard errors
  • Add Lagged Variables: Include lagged dependent or independent variables to capture dynamic relationships
  • Improve Model Specification: Add omitted variables or use better functional form
  • Generalized Least Squares: If structure of autocorrelation is known, use GLS estimation

9. Model Selection & Evaluation

Model selection involves choosing the best regression specification from competing models. Multiple criteria should be evaluated together.

9.1 Criteria for Model Selection

  • Economic Theory: Model should make economic sense; coefficients should have expected signs and magnitudes
  • Adjusted R²: Higher adjusted R² indicates better fit while penalizing excessive variables
  • Standard Error of Estimate (SEE): Lower SEE indicates better predictive accuracy
  • Significance of Coefficients: More significant coefficients (low p-values) preferred
  • Parsimony: Simpler models preferred over complex models with similar explanatory power (Occam's Razor)

9.2 Information Criteria

Akaike Information Criterion (AIC): AIC = n × ln(SSE/n) + 2(k + 1)

Schwarz Bayesian Criterion (SBC or BIC): SBC = n × ln(SSE/n) + ln(n) × (k + 1)

  • Selection Rule: Choose model with lowest AIC or SBC
  • Penalty: Both criteria penalize model complexity; SBC penalizes more heavily than AIC
  • Use: Compare non-nested models (models with different independent variables)
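
Both criteria are easy to compute and compare. The two candidate models below are invented: model B adds two regressors but barely reduces SSE, so both criteria should prefer model A.

```python
import math

# AIC/SBC sketch comparing two invented candidate models, n = 30 observations.
n = 30

def aic(sse, k):
    return n * math.log(sse / n) + 2 * (k + 1)

def sbc(sse, k):
    return n * math.log(sse / n) + math.log(n) * (k + 1)

aic_a, sbc_a = aic(25.0, 3), sbc(25.0, 3)   # model A: k = 3, SSE = 25
aic_b, sbc_b = aic(24.9, 5), sbc(24.9, 5)   # model B: k = 5, SSE = 24.9

print(aic_a < aic_b, sbc_a < sbc_b)  # → True True
```

The heavier ln(n) penalty in SBC means it punishes model B's extra regressors even more than AIC does.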

9.3 Out-of-Sample Validation

  • Split Sample: Estimate model on one portion of data (training set); test predictive accuracy on another portion (test set)
  • Cross-Validation: Systematically test model on different data subsets
  • Rolling Windows: In time series, estimate model on historical data and test on subsequent periods

9.4 Specification Tests

  • RESET Test (Ramsey): Tests for functional form misspecification by adding powers of fitted values
  • Likelihood Ratio Test: Compares nested models (one model is subset of another)
  • Nested Models: Use F-test or likelihood ratio test; Non-nested Models: Use AIC or SBC

10. Indicator (Dummy) Variables

Dummy variables (indicator variables) represent qualitative or categorical information in regression models. They take values of 0 or 1.

10.1 Binary Dummy Variables

Structure: Variable takes value 1 if condition is true, 0 otherwise.

Example: Gender dummy: Male = 1, Female = 0

Model with One Dummy: Y = b0 + b1X1 + b2D + ε

  • D: Dummy variable (0 or 1)
  • Interpretation of b2: Difference in intercept when D = 1 versus D = 0, holding other variables constant
  • Two Groups: Intercept for D = 0 is b0; Intercept for D = 1 is b0 + b2

Example Application: Company performance dummy (Profit = 1, Loss = 0) in stock return regression. Coefficient shows average return difference between profitable and unprofitable firms.

10.2 Multiple Categories

For a categorical variable with n categories, include (n - 1) dummy variables to avoid the dummy variable trap.

Dummy Variable Trap: Perfect multicollinearity occurs if n dummies are included for n categories (one dummy is redundant).

Example: Season variable (Spring, Summer, Fall, Winter) requires 3 dummies:

  • D1 = 1 if Summer, 0 otherwise
  • D2 = 1 if Fall, 0 otherwise
  • D3 = 1 if Winter, 0 otherwise
  • Spring is the reference category (all dummies = 0)

Interpretation: Each dummy coefficient represents the difference in the dependent variable for that category relative to the reference category.

10.3 Interaction Terms with Dummies

Model: Y = b0 + b1X + b2D + b3(D × X) + ε

  • D × X: Interaction term (product of dummy and continuous variable)
  • Purpose: Allows the slope coefficient on X to differ between the two groups (D = 0 vs. D = 1)
  • Slope when D = 0: b1
  • Slope when D = 1: b1 + b3
  • Interpretation of b3: Difference in slope between the two groups

Example: Testing if relationship between company size and returns differs between developed and emerging markets.
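
The slope-shift logic of the interaction model can be verified numerically. The coefficients below are invented; the point is that the fitted slope on X is b1 in the D = 0 group and b1 + b3 in the D = 1 group:

```python
# Dummy/interaction sketch with invented coefficients:
# Y = b0 + b1*X + b2*D + b3*(D*X)
b0, b1, b2, b3 = 2.0, 0.5, 1.0, 0.3

def predict(x, d):
    return b0 + b1 * x + b2 * d + b3 * d * x

# Slope = change in prediction per one-unit change in X, within each group.
slope_d0 = predict(11, 0) - predict(10, 0)   # b1
slope_d1 = predict(11, 1) - predict(10, 1)   # b1 + b3
print(round(slope_d0, 6), round(slope_d1, 6))  # → 0.5 0.8
```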

11. Logarithmic & Semi-Log Models

Logarithmic transformations address nonlinearity, reduce heteroskedasticity, and provide economically meaningful interpretations. Common in finance for returns, growth rates, and elasticities.

11.1 Linear-Log Model

Model: Y = b0 + b1ln(X) + ε

  • Dependent Variable: Level form (not logged)
  • Independent Variable: Natural logarithm
  • Interpretation of b1: A 1% increase in X leads to a change of (b1/100) units in Y
  • Use: Diminishing marginal effects (effect of X on Y decreases as X increases)

11.2 Log-Log Model (Double-Log)

Model: ln(Y) = b0 + b1ln(X) + ε

  • Both Variables: Natural logarithm form
  • Interpretation of b1: Elasticity - A 1% increase in X leads to a b1% change in Y
  • Constant Elasticity: Elasticity does not change with the level of X or Y
  • Example: ln(Sales) = 2.5 + 0.8 ln(Advertising). A 1% increase in advertising increases sales by 0.8%

Finance Applications: Estimating price elasticity of demand, relationship between market cap and trading volume.

11.3 Log-Linear Model (Exponential Model)

Model: ln(Y) = b0 + b1X + ε

  • Dependent Variable: Natural logarithm
  • Independent Variable: Level form (not logged)
  • Interpretation of b1: A one-unit increase in X leads to approximately a (100 × b1)% change in Y
  • Exact Formula: Percentage change in Y = 100 × (e^b1 − 1)
  • Common in Finance: Continuous compounding, growth models

Example - Time Trend Model: ln(Stock Price) = b0 + b1(Time). Coefficient b1 represents continuous growth rate.
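
The gap between the approximate and exact percentage-change interpretations is easy to see numerically. With an invented b1 = 0.05:

```python
import math

# Log-linear interpretation sketch: approximate effect is 100*b1 percent,
# exact effect is 100*(e^b1 - 1) percent (b1 = 0.05 is invented).
b1 = 0.05
approx_pct = 100 * b1
exact_pct = 100 * (math.exp(b1) - 1)
print(round(approx_pct, 3), round(exact_pct, 3))  # → 5.0 5.127
```

The approximation is close for small coefficients but diverges as b1 grows, which is why the exact formula matters for large effects.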

11.4 Advantages of Logarithmic Models

  • Reduces Heteroskedasticity: Compresses scale, stabilizes variance
  • Handles Nonlinearity: Captures diminishing returns and exponential relationships
  • Percentage Interpretation: Coefficients interpreted as percentage changes or elasticities
  • Normality: Log-transformed variables often more normally distributed

Limitation: Cannot use logarithms of zero or negative values. Must add a small constant if data contains zeros.

12. Regression in Finance Applications

Multiple regression is extensively used in asset pricing, risk modeling, performance evaluation, and empirical finance research.

12.1 Capital Asset Pricing Model (CAPM) Extension

Market Model (Single-Factor): Ri = αi + βiRm + εi

  • Ri: Return on asset i
  • Rm: Market return
  • βi: Systematic risk (sensitivity to market movements)
  • αi: Jensen's alpha (excess return not explained by market; should be zero if CAPM holds)

Multifactor Models (APT/Fama-French): Add factors beyond market return.

Fama-French Three-Factor Model: Ri - Rf = αi + β1(Rm - Rf) + β2SMB + β3HML + εi

  • Ri - Rf: Excess return on asset i
  • Rm - Rf: Market risk premium
  • SMB (Small Minus Big): Size factor (return difference between small and large cap stocks)
  • HML (High Minus Low): Value factor (return difference between high and low book-to-market stocks)

12.2 Estimating Cost of Equity

Regression estimates beta, which is used in CAPM to calculate required return:

CAPM Formula: E(Ri) = Rf + βi[E(Rm) - Rf]

  • β from Regression: Slope coefficient from regressing stock returns on market returns
  • Use: Discount rate for valuation, hurdle rate for capital budgeting
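
Plugging a regression-estimated beta into CAPM is a single line. The inputs below (risk-free rate 3%, beta 1.2, expected market return 8%) are invented for illustration:

```python
# CAPM required-return sketch with invented inputs.
rf = 0.03              # risk-free rate
beta = 1.2             # slope from regressing stock returns on market returns
market_return = 0.08   # expected market return

required = rf + beta * (market_return - rf)   # E(Ri) = Rf + beta*(E(Rm) - Rf)
print(round(required, 4))  # → 0.09
```

A 9% required return would then serve as the discount rate in valuation or the hurdle rate in capital budgeting.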

12.3 Performance Attribution Analysis

Returns-Based Style Analysis: Regress fund returns on index returns to identify investment style and sources of performance.

Model: Rfund = b0 + b1Rlarge cap + b2Rsmall cap + b3Rgrowth + b4Rvalue + ε

  • Coefficients (b1, b2, etc.): Portfolio weights/exposures to different styles
  • Intercept (α): Manager's skill (return above what style exposures would predict)

12.4 Event Studies

Measure impact of corporate events (mergers, earnings announcements, regulatory changes) on stock returns.

Procedure:

  1. Estimate normal return model using pre-event data (regression of stock return on market return)
  2. Calculate predicted return during event period
  3. Compute abnormal return: AR = Actual Return - Predicted Return
  4. Test if abnormal return is statistically significant
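
The four steps above can be sketched end to end. All numbers below are invented: six pre-event observations fit the market model, and the event-day stock return far exceeds its prediction, giving a positive abnormal return.

```python
# Event-study sketch (invented returns): fit the market model on pre-event
# data by simple OLS, then compute the event-day abnormal return.
rm_pre = [0.01, -0.02, 0.015, 0.005, -0.01, 0.02]        # market returns
ri_pre = [0.012, -0.022, 0.018, 0.007, -0.011, 0.024]    # stock returns

n = len(rm_pre)
mx, my = sum(rm_pre) / n, sum(ri_pre) / n
beta = (sum((x - mx) * (y - my) for x, y in zip(rm_pre, ri_pre))
        / sum((x - mx) ** 2 for x in rm_pre))
alpha = my - beta * mx                       # step 1: normal return model

rm_event, ri_event = 0.01, 0.05              # event-day market and stock returns
predicted = alpha + beta * rm_event          # step 2: predicted return
abnormal = ri_event - predicted              # step 3: AR = actual - predicted
print(round(beta, 3), round(abnormal, 4))
```

Step 4 (significance testing) would compare the abnormal return to its standard error over the estimation window, which this sketch omits.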

12.5 Credit Risk Modeling

Default Prediction Models: Regress probability of default on financial ratios and macroeconomic variables.

Model: P(Default) = f(Leverage, Profitability, Liquidity, GDP Growth, Interest Rates)

  • Technique: Logistic regression (not OLS) for binary dependent variable (default/no default)
  • Output: Probability of default used in credit scoring and pricing

12.6 Macroeconomic Forecasting

Multiple regression models forecast economic variables (GDP, inflation, unemployment) using leading indicators.

Example: GDP Growth = b0 + b1(Leading Index) + b2(Money Supply Growth) + b3(Interest Rate Spread) + ε

Application: Asset allocation decisions based on economic cycle predictions.

12.7 Trading Strategy Development

Factor-Based Strategies: Identify factors that predict future returns.

Momentum Strategy Model: Future Return = b0 + b1(Past 6-Month Return) + b2(Volume) + ε

  • Significant b1: Evidence of momentum effect (past winners continue to outperform)
  • Backtesting: Test strategy out-of-sample to validate predictive power

Trap Alert - Overfitting: Adding too many variables can make model fit historical data perfectly but perform poorly out-of-sample. Always validate on holdout data. Adjusted R² and cross-validation help prevent overfitting.

13. Common Exam Pitfalls & Key Reminders

  • Interpretation Error: In multiple regression, each coefficient measures the effect holding other variables constant (ceteris paribus). Do not interpret coefficients in isolation.
  • R² vs. Adjusted R²: Always use adjusted R² when comparing models with different numbers of independent variables. R² can never decrease when adding variables.
  • F-test vs. t-test: F-test evaluates overall model significance; t-test evaluates individual coefficients. A significant F-test does not guarantee all coefficients are significant.
  • Heteroskedasticity Impact: Coefficients remain unbiased, but standard errors are wrong. Solution: Use robust (White-corrected) standard errors.
  • Serial Correlation Impact: Coefficients unbiased, but standard errors biased (typically underestimated). Most common in time-series data.
  • Multicollinearity Impact: Coefficients unbiased but have high standard errors. Does not affect overall model predictions.
  • Dummy Variable Trap: For n categories, use (n-1) dummies. Including all n creates perfect multicollinearity.
  • Log Interpretation: ln(Y) on ln(X) = elasticity; ln(Y) on X = growth rate; Y on ln(X) = level change per % change in X
  • Omitted Variable Bias: More serious than including irrelevant variables. Causes biased and inconsistent estimates if omitted variable correlates with included variables.
  • Degrees of Freedom: For hypothesis tests: df = n - k - 1 (observations minus independent variables minus intercept)

Mastering multiple regression requires understanding model assumptions, diagnostic tests, and appropriate corrections for violations. Practice interpreting regression output, identifying assumption violations through statistical tests and residual plots, and applying models to finance contexts. Focus on the economic interpretation of coefficients and the practical implications of statistical findings for investment decisions. Regular practice with past exam questions reinforces pattern recognition for typical exam scenarios involving model specification, hypothesis testing, and correcting for violations.

The document Multiple Regression is a part of the CFA Level 2 Course Quantitative Methods.