
Multiple Regression

Multiple Regression extends simple linear regression to model a dependent variable using two or more independent variables simultaneously. This powerful tool is fundamental in finance for asset pricing, risk analysis, and forecasting. Understanding the model assumptions, diagnostic tests, and practical applications is critical for portfolio management, equity valuation, and empirical finance research.

1. The Multiple Regression Model

Multiple regression models the relationship between one dependent variable and multiple independent variables. The model estimates how each independent variable affects the dependent variable while holding other variables constant.

1.1 General Form of the Model

Population Regression Equation:

Yi = β0 + β1X1i + β2X2i + ... + βkXki + εi

  • Yi: Dependent variable (what we want to explain or predict)
  • β0: Intercept term (value of Y when all X variables equal zero)
  • βj: Slope coefficient for Xj (marginal effect of Xj on Y, holding other variables constant)
  • Xji: Independent variable j for observation i (explanatory variables)
  • εi: Error term (captures all factors affecting Y not included in the model)
  • k: Number of independent variables

Sample Regression Equation (Estimated):

Ŷi = b0 + b1X1i + b2X2i + ... + bkXki

  • Ŷi: Predicted or fitted value of Y
  • bj: Estimated coefficient (sample estimate of βj)
  • Residual: ei = Yi - Ŷi (difference between actual and predicted values)

1.2 Interpretation of Regression Coefficients

  • Slope Coefficient (bj): Measures the change in the dependent variable for a one-unit change in Xj, holding all other independent variables constant (ceteris paribus)
  • Example: In a regression of stock returns on market return and company size, b1 = 1.2 for market return means a 1% increase in market return increases stock return by 1.2%, holding size constant
  • Intercept (b0): Expected value of Y when all independent variables equal zero. Often lacks economic interpretation in finance applications

1.3 Ordinary Least Squares (OLS) Estimation

OLS minimizes the sum of squared residuals to find the best-fitting regression line.

Objective Function: Minimize Σei² = Σ(Yi - Ŷi)²

  • Best Linear Unbiased Estimators (BLUE): OLS estimators are BLUE if regression assumptions are satisfied (Gauss-Markov Theorem)
  • Properties: Sum of residuals equals zero; regression line passes through the mean of Y and X variables
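
The OLS mechanics above can be sketched in a few lines of pure Python. The data below are invented so that Y = 1 + 2X1 + 3X2 holds exactly; solving the normal equations X'X b = X'y should then recover those coefficients with zero residuals.

```python
# Minimal OLS sketch (pure Python, made-up data): solve the normal
# equations X'X b = X'y by Gaussian elimination.

def solve(A, y):
    """Solve the linear system A b = y by Gaussian elimination with pivoting."""
    n = len(A)
    M = [row[:] + [y[i]] for i, row in enumerate(A)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    b = [0.0] * n
    for r in range(n - 1, -1, -1):
        b[r] = (M[r][n] - sum(M[r][c] * b[c] for c in range(r + 1, n))) / M[r][r]
    return b

def ols(X, Y):
    """OLS coefficients; each row of X starts with 1 for the intercept."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    XtY = [sum(row[i] * y for row, y in zip(X, Y)) for i in range(k)]
    return solve(XtX, XtY)

X = [[1, x1, x2] for x1, x2 in [(1, 2), (2, 1), (3, 5), (4, 3), (5, 8), (6, 4)]]
Y = [1 + 2*x1 + 3*x2 for _, x1, x2 in X]   # exact relationship by construction
b0, b1, b2 = ols(X, Y)
print(round(b0, 6), round(b1, 6), round(b2, 6))  # → 1.0 2.0 3.0
```

In practice one would use a statistical package rather than hand-rolled elimination, but the sketch makes the "minimize squared residuals via the normal equations" idea concrete.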

2. Assumptions of the Multiple Regression Model

Six critical assumptions must hold for OLS estimates to be valid and reliable. Violation of these assumptions leads to biased, inconsistent, or inefficient estimates.

2.1 Assumption 1: Linearity

  • Statement: The relationship between dependent and independent variables is linear in parameters
  • Implication: The model correctly specifies the functional form
  • Violation Impact: Model misspecification; biased and inconsistent coefficient estimates
  • Detection: Residual plots showing patterns (non-random scatter)

2.2 Assumption 2: Independent Variables are Not Random

  • Statement: Independent variables are not random; they are fixed in repeated samples
  • Implication: Covariance between independent variables and error term is zero
  • Practical Note: This assumption is often relaxed in practice; variables can be random but must be uncorrelated with error term

2.3 Assumption 3: Expected Value of Error Term is Zero

  • Statement: E(εi) = 0 for all observations
  • Implication: Error term has mean zero; no systematic bias in predictions
  • Ensured by: Including an intercept term in the regression

2.4 Assumption 4: Homoskedasticity (Constant Error Variance)

  • Statement: Var(εi) = σ² for all i (variance of error term is constant across all observations)
  • Violation: Heteroskedasticity occurs when error variance changes with independent variables
  • Impact of Violation: Standard errors are incorrect (usually underestimated); t-statistics and F-statistics are unreliable; hypothesis tests invalid
  • Common in Finance: Cross-sectional data (larger firms have more volatile returns)

2.5 Assumption 5: No Serial Correlation (No Autocorrelation)

  • Statement: Cov(εi, εj) = 0 for all i ≠ j (error terms are uncorrelated across observations)
  • Violation: Serial correlation/autocorrelation when errors are correlated over time
  • Impact of Violation: Standard errors biased; hypothesis tests unreliable; coefficient estimates still unbiased but inefficient
  • Common in Finance: Time-series data (today's error affects tomorrow's error)

2.6 Assumption 6: No Perfect Multicollinearity

  • Statement: No exact linear relationship exists among independent variables
  • Perfect Multicollinearity: One independent variable is an exact linear combination of others
  • Impact: Regression coefficients cannot be estimated; computer software will fail or drop variables
  • High (but not perfect) Multicollinearity: Coefficients can be estimated but have high standard errors and are unstable

3. Goodness of Fit

Goodness of fit measures assess how well the regression model explains variation in the dependent variable.

3.1 Decomposition of Total Variation

Total Sum of Squares (SST): SST = Σ(Yi - Ȳ)²

  • Measures total variation in the dependent variable around its mean

Sum of Squared Errors/Residuals (SSE or RSS): SSE = Σ(Yi - Ŷi)² = Σei²

  • Measures unexplained variation (variation not captured by the model)

Regression Sum of Squares (SSR or ESS): SSR = Σ(Ŷi - Ȳ)²

  • Measures explained variation (variation captured by the regression model)

Fundamental Relationship: SST = SSR + SSE

3.2 Coefficient of Determination (R²)

Formula: R² = SSR/SST = 1 - (SSE/SST)

  • Interpretation: Proportion of total variation in Y explained by the independent variables
  • Range: 0 ≤ R² ≤ 1 (expressed as percentage: 0% to 100%)
  • Example: R² = 0.75 means 75% of variation in Y is explained by the model; 25% remains unexplained
  • Limitation: R² always increases when adding more variables, even if they are irrelevant (never decreases)

Trap Alert: High R² does not necessarily mean a good model. A high R² can occur with biased coefficients, violated assumptions, or spurious relationships. Low R² does not mean the model is useless if coefficients are significant and economically meaningful.

3.3 Adjusted R² (R̄²)

Formula: R̄² = 1 - [(n - 1)/(n - k - 1)] × (1 - R²)

  • n: Number of observations
  • k: Number of independent variables
  • Purpose: Penalizes the addition of independent variables that do not improve explanatory power
  • Advantage: Can decrease when irrelevant variables are added, unlike R²
  • Use: Preferred over R² for comparing models with different numbers of independent variables

Relationship: R̄² ≤ R² (adjusted R² is always less than or equal to R²)

3.4 Standard Error of Estimate (SEE)

Formula: SEE = √[SSE/(n - k - 1)] = √[Σei²/(n - k - 1)]

  • Interpretation: Standard deviation of the regression residuals; measures average distance of observations from the regression line
  • Units: Same units as the dependent variable
  • Use: Lower SEE indicates better fit; used in prediction intervals
  • Denominator (n - k - 1): Degrees of freedom (observations minus parameters estimated)
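
The three fit measures above reduce to simple arithmetic. The figures below (n = 30, k = 3, SST = 100, SSE = 25) are invented for illustration only.

```python
import math

# Goodness-of-fit sketch with invented numbers.
n, k = 30, 3
SST, SSE = 100.0, 25.0
SSR = SST - SSE                                  # SST = SSR + SSE

r2 = SSR / SST                                   # R² = 1 - SSE/SST
adj_r2 = 1 - (n - 1) / (n - k - 1) * (1 - r2)    # penalizes extra regressors
see = math.sqrt(SSE / (n - k - 1))               # standard error of estimate

print(round(r2, 4), round(adj_r2, 4), round(see, 4))  # → 0.75 0.7212 0.9806
```

Note that adjusted R² (0.7212) is below R² (0.75), consistent with the relationship R̄² ≤ R².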

4. Hypothesis Testing in Multiple Regression

Hypothesis tests evaluate whether independent variables have statistically significant relationships with the dependent variable and whether the overall model is significant.

4.1 t-Test for Individual Coefficients

Tests whether an individual independent variable is statistically significant.

Null Hypothesis: H0: βj = 0 (the variable has no effect on Y)

Alternative Hypothesis: Ha: βj ≠ 0 (two-tailed test)

Test Statistic: t = (bj - βj)/sbj

  • bj: Estimated coefficient
  • βj: Hypothesized value (usually zero)
  • sbj: Standard error of coefficient bj

Degrees of Freedom: df = n - k - 1

Decision Rule: Reject H0 if |t-statistic| > t-critical value at chosen significance level (typically 5%)

p-value Approach: Reject H0 if p-value < significance level

  • p-value: Smallest significance level at which H0 can be rejected
  • Interpretation: If p-value = 0.03, the coefficient is significant at 5% level but not at 1% level

4.2 Confidence Intervals for Coefficients

Formula: bj ± (tcritical × sbj)

  • 95% Confidence Interval: Use t-critical value at 5% significance level with (n - k - 1) degrees of freedom
  • Interpretation: We are 95% confident the true population parameter βj lies within this interval
  • Significance Test: If the confidence interval does not include zero, the coefficient is statistically significant

4.3 F-Test for Overall Significance

Tests whether at least one independent variable is statistically significant (whether the overall regression model has explanatory power).

Null Hypothesis: H0: β1 = β2 = ... = βk = 0 (all slope coefficients are zero)

Alternative Hypothesis: Ha: At least one βj ≠ 0

Test Statistic: F = [SSR/k] / [SSE/(n - k - 1)] = [MSR/MSE]

  • MSR (Mean Square Regression): SSR/k (average explained variation per independent variable)
  • MSE (Mean Square Error): SSE/(n - k - 1) (average unexplained variation)

Alternative Formula using R²: F = [R²/k] / [(1 - R²)/(n - k - 1)]

Degrees of Freedom: Numerator df = k; Denominator df = n - k - 1

Decision Rule: Reject H0 if F-statistic > F-critical value at chosen significance level

  • One-tailed test: F-test is always one-tailed (right tail only) because F-statistic is always positive
  • Significance: Rejecting H0 means the regression model as a whole is statistically significant

Trap Alert: A significant F-test does not mean all coefficients are significant, only that at least one is significant. Conversely, individual t-tests can be significant even with an insignificant F-test in rare cases.
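
The t-test and F-test computations above can be sketched with invented numbers (b1 = 1.2 with standard error 0.4, R² = 0.75, n = 30, k = 3):

```python
# Hypothesis-test sketch; all figures are invented for illustration.
n, k = 30, 3
b1, s_b1 = 1.2, 0.4        # estimated coefficient and its standard error
r2 = 0.75

t_stat = (b1 - 0) / s_b1                        # H0: beta1 = 0
f_stat = (r2 / k) / ((1 - r2) / (n - k - 1))    # H0: all slopes = 0

print(round(t_stat, 2), round(f_stat, 2))  # → 3.0 26.0
```

With df = 26, a t-statistic of 3.0 exceeds the 5% two-tailed critical value (about 2.06), so the coefficient would be judged significant; the F-statistic of 26 likewise far exceeds typical F-critical values.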

4.4 Testing Multiple Coefficients Jointly

Tests whether a subset of coefficients are jointly equal to zero using a restricted F-test.

Formula: F = [(SSErestricted - SSEunrestricted)/q] / [SSEunrestricted/(n - k - 1)]

  • q: Number of restrictions (number of coefficients being tested)
  • SSErestricted: Sum of squared errors from restricted model (with restrictions imposed)
  • SSEunrestricted: Sum of squared errors from full model
  • Degrees of Freedom: Numerator df = q; Denominator df = n - k - 1
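
The restricted F-test is a one-line computation once both models have been estimated. The numbers below (q = 2 restrictions, k = 4 regressors, n = 30, SSE of 50 restricted vs. 40 unrestricted) are invented:

```python
# Joint (restricted) F-test sketch with invented sums of squares.
n, k, q = 30, 4, 2
sse_restricted, sse_unrestricted = 50.0, 40.0

f = ((sse_restricted - sse_unrestricted) / q) / (sse_unrestricted / (n - k - 1))
print(round(f, 3))  # → 3.125
```

Compare 3.125 to the F-critical value with (2, 25) degrees of freedom to decide whether the two restricted coefficients are jointly significant.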

5. Model Misspecification

Model misspecification occurs when the regression model is incorrectly formulated. This violates regression assumptions and produces unreliable results.

5.1 Types of Misspecification

  • Omitted Variable Bias: Important independent variable left out of the model
  • Inappropriate Form of Variables: Wrong functional form (e.g., linear model when relationship is nonlinear)
  • Inappropriate Scaling: Variables not properly transformed (e.g., should use logarithms)
  • Inappropriate Data Pooling: Combining data from different populations or time periods incorrectly

5.2 Consequences of Omitted Variables

Omitting a relevant variable that is correlated with included variables causes omitted variable bias.

  • Biased Coefficients: Estimated coefficients of included variables are biased and inconsistent
  • Direction of Bias: Depends on correlation between omitted variable and included variables, and the omitted variable's effect on Y
  • Example: Regressing stock returns only on market returns (omitting company size). If size affects returns and correlates with market returns, the market beta is biased

Condition for Bias: Omitted variable must be correlated with both the dependent variable AND at least one included independent variable.

5.3 Consequences of Including Irrelevant Variables

  • Unbiased but Inefficient: Coefficient estimates remain unbiased but have higher standard errors (less precise)
  • Reduced t-statistics: Lower power to detect significant relationships
  • Lower Adjusted R²: Model fit deteriorates when irrelevant variables are added
  • Preferable to Omission: Including an irrelevant variable is less serious than omitting a relevant one

6. Multicollinearity

Multicollinearity exists when two or more independent variables are highly correlated with each other. Perfect multicollinearity (exact linear relationship) violates regression assumptions.

6.1 Detecting Multicollinearity

  • High Pairwise Correlations: Correlation coefficient between two independent variables exceeds 0.7 or 0.8 (rule of thumb)
  • High R² but Insignificant t-statistics: Overall model is significant (F-test) but individual coefficients are not significant
  • Signs Opposite to Expected: Coefficient signs contradict economic theory or prior expectations
  • Variance Inflation Factor (VIF): VIF > 10 indicates problematic multicollinearity for that variable

VIF Formula: VIFj = 1/(1 - R²j)

  • R²j: R² from regressing Xj on all other independent variables
  • Interpretation: VIF = 5 means variance of coefficient is inflated by factor of 5 due to multicollinearity
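
For a model with only two regressors, R²j is simply the squared pairwise correlation, so the VIF can be computed directly. The data below are made up to be strongly (but not perfectly) collinear:

```python
# VIF sketch for a two-regressor model (invented data). With two regressors,
# R²j equals the squared pairwise correlation, so VIF = 1 / (1 - r²).
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 3, 5, 6, 8, 9]   # nearly a linear function of x1

n = len(x1)
m1, m2 = sum(x1) / n, sum(x2) / n
cov = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
var1 = sum((a - m1) ** 2 for a in x1)
var2 = sum((b - m2) ** 2 for b in x2)
r = cov / (var1 * var2) ** 0.5

vif = 1 / (1 - r ** 2)
print(round(r, 3), round(vif, 1))
```

Here r is above 0.99, so the VIF comes out far above the rule-of-thumb threshold of 10, flagging problematic multicollinearity.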

6.2 Consequences of Multicollinearity

  • High Standard Errors: Coefficient estimates have large standard errors (imprecise estimates)
  • Low t-statistics: Difficult to reject null hypothesis even when variables are actually significant
  • Unstable Coefficients: Small changes in data cause large changes in coefficient estimates
  • Wide Confidence Intervals: Confidence intervals for coefficients are very wide
  • Still Unbiased: Coefficient estimates remain unbiased and consistent, but inefficient

Trap Alert: Multicollinearity does NOT bias coefficient estimates, but makes them unreliable and difficult to interpret individually. The overall model predictions (Ŷ) may still be accurate.

6.3 Remedies for Multicollinearity

  • Drop One Variable: Remove one of the highly correlated variables (loses information)
  • Increase Sample Size: More data can reduce standard errors
  • Use Different Specification: Transform variables or use different functional form
  • Do Nothing: If prediction is the goal (not interpretation), multicollinearity may not matter

7. Heteroskedasticity

Heteroskedasticity occurs when the variance of the regression residuals is not constant across observations. This violates the homoskedasticity assumption.

7.1 Detecting Heteroskedasticity

Graphical Analysis: Plot residuals against predicted values or independent variables.

  • Pattern Indicates Heteroskedasticity: Residuals form a cone or funnel shape (increasing or decreasing variance)
  • No Pattern Indicates Homoskedasticity: Residuals randomly scattered around zero with constant spread

Breusch-Pagan (BP) Test: Formal statistical test for heteroskedasticity.

  • Null Hypothesis: H0: No heteroskedasticity (constant error variance)
  • Alternative Hypothesis: Ha: Heteroskedasticity present
  • Procedure: Regress squared residuals (e²) on independent variables; test for overall significance with chi-square test
  • Test Statistic: BP = n × R² (from auxiliary regression) ~ χ²k
  • Decision: Reject H0 if BP statistic > chi-square critical value at chosen significance level
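
The Breusch-Pagan procedure above can be sketched with a single regressor and an invented residual series whose spread grows with x (a classic heteroskedastic pattern):

```python
# Breusch-Pagan sketch (invented residuals): regress squared residuals on x,
# take the R² of that auxiliary regression, and form BP = n * R².
x = [1, 2, 3, 4, 5, 6, 7, 8]
e = [0.5, -0.6, 0.9, -1.1, 1.4, -1.6, 1.9, -2.1]   # spread grows with x
e2 = [v ** 2 for v in e]

n = len(x)
mx, me2 = sum(x) / n, sum(e2) / n
beta = (sum((a - mx) * (b - me2) for a, b in zip(x, e2))
        / sum((a - mx) ** 2 for a in x))
alpha = me2 - beta * mx
fitted = [alpha + beta * a for a in x]

ssr = sum((f - me2) ** 2 for f in fitted)     # explained variation in e²
sst = sum((b - me2) ** 2 for b in e2)         # total variation in e²
bp = n * (ssr / sst)                          # BP = n × R² of auxiliary regression
print(round(bp, 2))  # compare to chi-square critical value, df = 1
```

Here BP exceeds the 5% chi-square critical value with 1 degree of freedom (3.84), so the null of homoskedasticity would be rejected.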

White Test: More general test that detects heteroskedasticity without assuming a specific form.

7.2 Consequences of Heteroskedasticity

  • Unbiased Coefficients: OLS coefficient estimates remain unbiased and consistent
  • Biased Standard Errors: Standard errors of coefficients are incorrect (usually underestimated)
  • Unreliable Hypothesis Tests: t-statistics and F-statistics are unreliable; Type I error rates increase
  • Inefficient Estimates: OLS is no longer the Best Linear Unbiased Estimator (BLUE)

Common in Finance: Cross-sectional data often exhibits heteroskedasticity. Larger firms tend to have more volatile returns than smaller firms.

7.3 Correcting for Heteroskedasticity

  • Robust Standard Errors (White-corrected): Compute heteroskedasticity-consistent standard errors; no change to coefficients, only standard errors adjusted
  • Generalized Least Squares (GLS): Use weighted least squares if form of heteroskedasticity is known
  • Transform Variables: Use logarithmic transformation to stabilize variance

Preferred Method: Use robust standard errors (White-corrected standard errors). This is the most common correction in practice.

8. Serial Correlation (Autocorrelation)

Serial correlation exists when regression residuals are correlated across time periods. This violates the assumption of independent error terms and commonly occurs in time-series data.

8.1 Detecting Serial Correlation

Graphical Analysis: Plot residuals over time.

  • Positive Serial Correlation: Residuals cluster together; positive residuals followed by positive residuals, negative by negative
  • Negative Serial Correlation: Residuals alternate signs (rare in finance)

Durbin-Watson (DW) Test: Most common test for first-order serial correlation (AR(1)).

Formula: DW = Σ(et - et-1)² / Σet²

  • Range: 0 ≤ DW ≤ 4
  • DW ≈ 2: No serial correlation
  • DW < 2: Positive serial correlation (DW approaching 0 indicates strong positive correlation)
  • DW > 2: Negative serial correlation (DW approaching 4 indicates strong negative correlation)

Approximate Relationship: DW ≈ 2(1 - r), where r is the sample correlation between consecutive residuals.

Critical Values: Compare DW to lower (dL) and upper (dU) critical values from DW tables.

  • DW < dL: Reject H0 (positive serial correlation present)
  • DW > dU: Do not reject H0 (no positive serial correlation)
  • dL < DW < dU: Inconclusive region

Limitations of DW Test:

  • Only detects first-order autocorrelation AR(1)
  • Not valid if lagged dependent variable is an independent variable
  • Inconclusive region makes interpretation difficult
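
The DW statistic itself is a one-line computation. The residual series below is invented to drift slowly (positive serial correlation), so DW should come out well below 2:

```python
# Durbin-Watson sketch on an invented, positively autocorrelated residual
# series: DW = sum of squared first differences over sum of squared residuals.
e = [1.0, 0.8, 0.9, 0.6, 0.4, 0.5, 0.1, -0.2, -0.4, -0.3]

dw = (sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
      / sum(v ** 2 for v in e))
print(round(dw, 3))  # → 0.139
```

A value this close to 0 signals strong positive serial correlation; the decision against dL and dU would still come from DW tables for the given n and k.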

8.2 Consequences of Serial Correlation

  • Unbiased Coefficients: OLS estimates remain unbiased and consistent
  • Biased Standard Errors: Standard errors are incorrect (typically underestimated with positive autocorrelation)
  • Unreliable Hypothesis Tests: t-statistics and F-statistics are not valid; confidence intervals are too narrow
  • Inefficient Estimates: OLS is not BLUE; coefficient estimates have larger variance than necessary

Common Cause: Omitted variables that are serially correlated, or misspecified functional form in time-series models.

8.3 Correcting for Serial Correlation

  • Hansen Method (HAC Standard Errors): Use heteroskedasticity and autocorrelation consistent standard errors
  • Add Lagged Variables: Include lagged dependent or independent variables to capture dynamic relationships
  • Improve Model Specification: Add omitted variables or use better functional form
  • Generalized Least Squares: If structure of autocorrelation is known, use GLS estimation

9. Model Selection & Evaluation

Model selection involves choosing the best regression specification from competing models. Multiple criteria should be evaluated together.

9.1 Criteria for Model Selection

  • Economic Theory: Model should make economic sense; coefficients should have expected signs and magnitudes
  • Adjusted R²: Higher adjusted R² indicates better fit while penalizing excessive variables
  • Standard Error of Estimate (SEE): Lower SEE indicates better predictive accuracy
  • Significance of Coefficients: More significant coefficients (low p-values) preferred
  • Parsimony: Simpler models preferred over complex models with similar explanatory power (Occam's Razor)

9.2 Information Criteria

Akaike Information Criterion (AIC): AIC = n × ln(SSE/n) + 2(k + 1)

Schwarz Bayesian Criterion (SBC or BIC): SBC = n × ln(SSE/n) + ln(n) × (k + 1)

  • Selection Rule: Choose model with lowest AIC or SBC
  • Penalty: Both criteria penalize model complexity; SBC penalizes more heavily than AIC
  • Use: Compare non-nested models (models with different independent variables)
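
Both criteria are easy to compute and compare. The two candidate models below are invented: model B adds two regressors but barely reduces SSE, so both criteria should prefer model A.

```python
import math

# AIC/SBC sketch comparing two invented candidate models, n = 30 observations.
n = 30

def aic(sse, k):
    return n * math.log(sse / n) + 2 * (k + 1)

def sbc(sse, k):
    return n * math.log(sse / n) + math.log(n) * (k + 1)

aic_a, sbc_a = aic(25.0, 3), sbc(25.0, 3)   # model A: k = 3, SSE = 25
aic_b, sbc_b = aic(24.9, 5), sbc(24.9, 5)   # model B: k = 5, SSE = 24.9

print(aic_a < aic_b, sbc_a < sbc_b)  # → True True
```

The heavier ln(n) penalty in SBC means it punishes model B's extra regressors even more than AIC does.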

9.3 Out-of-Sample Validation

  • Split Sample: Estimate model on one portion of data (training set); test predictive accuracy on another portion (test set)
  • Cross-Validation: Systematically test model on different data subsets
  • Rolling Windows: In time series, estimate model on historical data and test on subsequent periods

9.4 Specification Tests

  • RESET Test (Ramsey): Tests for functional form misspecification by adding powers of fitted values
  • Likelihood Ratio Test: Compares nested models (one model is subset of another)
  • Nested Models: Use F-test or likelihood ratio test; Non-nested Models: Use AIC or SBC

10. Indicator (Dummy) Variables

Dummy variables (indicator variables) represent qualitative or categorical information in regression models. They take values of 0 or 1.

10.1 Binary Dummy Variables

Structure: Variable takes value 1 if condition is true, 0 otherwise.

Example: Gender dummy: Male = 1, Female = 0

Model with One Dummy: Y = b0 + b1X1 + b2D + ε

  • D: Dummy variable (0 or 1)
  • Interpretation of b2: Difference in intercept when D = 1 versus D = 0, holding other variables constant
  • Two Groups: Intercept for D = 0 is b0; Intercept for D = 1 is b0 + b2

Example Application: Company performance dummy (Profit = 1, Loss = 0) in stock return regression. Coefficient shows average return difference between profitable and unprofitable firms.

10.2 Multiple Categories

For a categorical variable with n categories, include (n - 1) dummy variables to avoid the dummy variable trap.

Dummy Variable Trap: Perfect multicollinearity occurs if n dummies are included for n categories (one dummy is redundant).

Example: Season variable (Spring, Summer, Fall, Winter) requires 3 dummies:

  • D1 = 1 if Summer, 0 otherwise
  • D2 = 1 if Fall, 0 otherwise
  • D3 = 1 if Winter, 0 otherwise
  • Spring is the reference category (all dummies = 0)

Interpretation: Each dummy coefficient represents the difference in the dependent variable for that category relative to the reference category.

10.3 Interaction Terms with Dummies

Model: Y = b0 + b1X + b2D + b3(D × X) + ε

  • D × X: Interaction term (product of dummy and continuous variable)
  • Purpose: Allows the slope coefficient on X to differ between the two groups (D = 0 vs. D = 1)
  • Slope when D = 0: b1
  • Slope when D = 1: b1 + b3
  • Interpretation of b3: Difference in slope between the two groups

Example: Testing if relationship between company size and returns differs between developed and emerging markets.
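
The slope-shift logic of the interaction model can be verified numerically. The coefficients below are invented; the point is that the fitted slope on X is b1 in the D = 0 group and b1 + b3 in the D = 1 group:

```python
# Dummy/interaction sketch with invented coefficients:
# Y = b0 + b1*X + b2*D + b3*(D*X)
b0, b1, b2, b3 = 2.0, 0.5, 1.0, 0.3

def predict(x, d):
    return b0 + b1 * x + b2 * d + b3 * d * x

# Slope = change in prediction per one-unit change in X, within each group.
slope_d0 = predict(11, 0) - predict(10, 0)   # b1
slope_d1 = predict(11, 1) - predict(10, 1)   # b1 + b3
print(round(slope_d0, 6), round(slope_d1, 6))  # → 0.5 0.8
```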

11. Logarithmic & Semi-Log Models

Logarithmic transformations address nonlinearity, reduce heteroskedasticity, and provide economically meaningful interpretations. Common in finance for returns, growth rates, and elasticities.

11.1 Linear-Log Model

Model: Y = b0 + b1ln(X) + ε

  • Dependent Variable: Level form (not logged)
  • Independent Variable: Natural logarithm
  • Interpretation of b1: A 1% increase in X leads to a change of (b1/100) units in Y
  • Use: Diminishing marginal effects (effect of X on Y decreases as X increases)

11.2 Log-Log Model (Double-Log)

Model: ln(Y) = b0 + b1ln(X) + ε

  • Both Variables: Natural logarithm form
  • Interpretation of b1: Elasticity - A 1% increase in X leads to a b1% change in Y
  • Constant Elasticity: Elasticity does not change with the level of X or Y
  • Example: ln(Sales) = 2.5 + 0.8 ln(Advertising). A 1% increase in advertising increases sales by 0.8%

Finance Applications: Estimating price elasticity of demand, relationship between market cap and trading volume.

11.3 Log-Linear Model (Exponential Model)

Model: ln(Y) = b0 + b1X + ε

  • Dependent Variable: Natural logarithm
  • Independent Variable: Level form (not logged)
  • Interpretation of b1: A one-unit increase in X leads to approximately a (100 × b1)% change in Y
  • Exact Formula: Percentage change in Y = 100 × (e^b1 − 1)
  • Common in Finance: Continuous compounding, growth models

Example - Time Trend Model: ln(Stock Price) = b0 + b1(Time). Coefficient b1 represents continuous growth rate.
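
The gap between the approximate and exact percentage-change interpretations is easy to see numerically. With an invented b1 = 0.05:

```python
import math

# Log-linear interpretation sketch: approximate effect is 100*b1 percent,
# exact effect is 100*(e^b1 - 1) percent (b1 = 0.05 is invented).
b1 = 0.05
approx_pct = 100 * b1
exact_pct = 100 * (math.exp(b1) - 1)
print(round(approx_pct, 3), round(exact_pct, 3))  # → 5.0 5.127
```

The approximation is close for small coefficients but diverges as b1 grows, which is why the exact formula matters for large effects.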

11.4 Advantages of Logarithmic Models

  • Reduces Heteroskedasticity: Compresses scale, stabilizes variance
  • Handles Nonlinearity: Captures diminishing returns and exponential relationships
  • Percentage Interpretation: Coefficients interpreted as percentage changes or elasticities
  • Normality: Log-transformed variables often more normally distributed

Limitation: Cannot use logarithms of zero or negative values. Must add a small constant if data contains zeros.

12. Regression in Finance Applications

Multiple regression is extensively used in asset pricing, risk modeling, performance evaluation, and empirical finance research.

12.1 Capital Asset Pricing Model (CAPM) Extension

Market Model (Single-Factor): Ri = αi + βiRm + εi

  • Ri: Return on asset i
  • Rm: Market return
  • βi: Systematic risk (sensitivity to market movements)
  • αi: Jensen's alpha (excess return not explained by market; should be zero if CAPM holds)

Multifactor Models (APT/Fama-French): Add factors beyond market return.

Fama-French Three-Factor Model: Ri - Rf = αi + β1(Rm - Rf) + β2SMB + β3HML + εi

  • Ri - Rf: Excess return on asset i
  • Rm - Rf: Market risk premium
  • SMB (Small Minus Big): Size factor (return difference between small and large cap stocks)
  • HML (High Minus Low): Value factor (return difference between high and low book-to-market stocks)

12.2 Estimating Cost of Equity

Regression estimates beta, which is used in CAPM to calculate required return:

CAPM Formula: E(Ri) = Rf + βi[E(Rm) - Rf]

  • β from Regression: Slope coefficient from regressing stock returns on market returns
  • Use: Discount rate for valuation, hurdle rate for capital budgeting
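
Plugging a regression-estimated beta into CAPM is a single line. The inputs below (risk-free rate 3%, beta 1.2, expected market return 8%) are invented for illustration:

```python
# CAPM required-return sketch with invented inputs.
rf = 0.03              # risk-free rate
beta = 1.2             # slope from regressing stock returns on market returns
market_return = 0.08   # expected market return

required = rf + beta * (market_return - rf)   # E(Ri) = Rf + beta*(E(Rm) - Rf)
print(round(required, 4))  # → 0.09
```

A 9% required return would then serve as the discount rate in valuation or the hurdle rate in capital budgeting.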

12.3 Performance Attribution Analysis

Returns-Based Style Analysis: Regress fund returns on index returns to identify investment style and sources of performance.

Model: Rfund = b0 + b1Rlarge cap + b2Rsmall cap + b3Rgrowth + b4Rvalue + ε

  • Coefficients (b1, b2, etc.): Portfolio weights/exposures to different styles
  • Intercept (α): Manager's skill (return above what style exposures would predict)

12.4 Event Studies

Measure impact of corporate events (mergers, earnings announcements, regulatory changes) on stock returns.

Procedure:

  1. Estimate normal return model using pre-event data (regression of stock return on market return)
  2. Calculate predicted return during event period
  3. Compute abnormal return: AR = Actual Return - Predicted Return
  4. Test if abnormal return is statistically significant
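
The four steps above can be sketched end to end. All numbers below are invented: six pre-event observations fit the market model, and the event-day stock return far exceeds its prediction, giving a positive abnormal return.

```python
# Event-study sketch (invented returns): fit the market model on pre-event
# data by simple OLS, then compute the event-day abnormal return.
rm_pre = [0.01, -0.02, 0.015, 0.005, -0.01, 0.02]        # market returns
ri_pre = [0.012, -0.022, 0.018, 0.007, -0.011, 0.024]    # stock returns

n = len(rm_pre)
mx, my = sum(rm_pre) / n, sum(ri_pre) / n
beta = (sum((x - mx) * (y - my) for x, y in zip(rm_pre, ri_pre))
        / sum((x - mx) ** 2 for x in rm_pre))
alpha = my - beta * mx                       # step 1: normal return model

rm_event, ri_event = 0.01, 0.05              # event-day market and stock returns
predicted = alpha + beta * rm_event          # step 2: predicted return
abnormal = ri_event - predicted              # step 3: AR = actual - predicted
print(round(beta, 3), round(abnormal, 4))
```

Step 4 (significance testing) would compare the abnormal return to its standard error over the estimation window, which this sketch omits.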

12.5 Credit Risk Modeling

Default Prediction Models: Regress probability of default on financial ratios and macroeconomic variables.

Model: P(Default) = f(Leverage, Profitability, Liquidity, GDP Growth, Interest Rates)

  • Technique: Logistic regression (not OLS) for binary dependent variable (default/no default)
  • Output: Probability of default used in credit scoring and pricing

12.6 Macroeconomic Forecasting

Multiple regression models forecast economic variables (GDP, inflation, unemployment) using leading indicators.

Example: GDP Growth = b0 + b1(Leading Index) + b2(Money Supply Growth) + b3(Interest Rate Spread) + ε

Application: Asset allocation decisions based on economic cycle predictions.

12.7 Trading Strategy Development

Factor-Based Strategies: Identify factors that predict future returns.

Momentum Strategy Model: Future Return = b0 + b1(Past 6-Month Return) + b2(Volume) + ε

  • Significant b1: Evidence of momentum effect (past winners continue to outperform)
  • Backtesting: Test strategy out-of-sample to validate predictive power

Trap Alert - Overfitting: Adding too many variables can make model fit historical data perfectly but perform poorly out-of-sample. Always validate on holdout data. Adjusted R² and cross-validation help prevent overfitting.

13. Common Exam Pitfalls & Key Reminders

  • Interpretation Error: In multiple regression, each coefficient measures the effect holding other variables constant (ceteris paribus). Do not interpret coefficients in isolation.
  • R² vs. Adjusted R²: Always use adjusted R² when comparing models with different numbers of independent variables. R² can never decrease when adding variables.
  • F-test vs. t-test: F-test evaluates overall model significance; t-test evaluates individual coefficients. A significant F-test does not guarantee all coefficients are significant.
  • Heteroskedasticity Impact: Coefficients remain unbiased, but standard errors are wrong. Solution: Use robust (White-corrected) standard errors.
  • Serial Correlation Impact: Coefficients unbiased, but standard errors biased (typically underestimated). Most common in time-series data.
  • Multicollinearity Impact: Coefficients unbiased but have high standard errors. Does not affect overall model predictions.
  • Dummy Variable Trap: For n categories, use (n-1) dummies. Including all n creates perfect multicollinearity.
  • Log Interpretation: ln(Y) on ln(X) = elasticity; ln(Y) on X = growth rate; Y on ln(X) = level change per % change in X
  • Omitted Variable Bias: More serious than including irrelevant variables. Causes biased and inconsistent estimates if omitted variable correlates with included variables.
  • Degrees of Freedom: For hypothesis tests: df = n - k - 1 (observations minus independent variables minus intercept)

Mastering multiple regression requires understanding model assumptions, diagnostic tests, and appropriate corrections for violations. Practice interpreting regression output, identifying assumption violations through statistical tests and residual plots, and applying models to finance contexts. Focus on the economic interpretation of coefficients and the practical implications of statistical findings for investment decisions. Regular practice with past exam questions reinforces pattern recognition for typical exam scenarios involving model specification, hypothesis testing, and correcting for violations.

The document Multiple Regression is a part of the CFA Level 2 Course Quantitative Methods.