
Canonical correlation - 2, CSIR-NET Mathematical Sciences | Mathematics for IIT JAM, GATE, CSIR NET, UGC NET

Stage 5: Interpreting the Canonical Variate

If the canonical relationship is statistically significant and the magnitudes of the canonical root and the redundancy index are acceptable, the researcher still needs to make substantive interpretations of the results. Making these interpretations involves examining the canonical functions to determine the relative importance of each of the original variables in the canonical relationships. Three methods have been proposed: (1) canonical weights (standardized coefficients), (2) canonical loadings (structure correlations), and (3) canonical cross-loadings. 

 

Canonical Weights 

The traditional approach to interpreting canonical functions involves examining the sign and the magnitude of the canonical weight assigned to each variable in its canonical variate. Variables with relatively larger weights contribute more to the variates, and vice versa. Similarly, variables whose weights have opposite signs exhibit an inverse relationship with each other, and variables with weights of the same sign exhibit a direct relationship. However, interpreting the relative importance or contribution of a variable by its canonical weight is subject to the same criticisms associated with the interpretation of beta weights in regression techniques. For example, a small weight may mean either that its corresponding variable is irrelevant in determining a relationship or that it has been partialed out of the relationship because of a high degree of multicollinearity. Another problem with the use of canonical weights is that these weights are subject to considerable instability (variability) from one sample to another. This instability occurs because the computational procedure for canonical analysis yields weights that maximize the canonical correlations for a particular sample of observed dependent and independent variable sets [7]. These problems suggest considerable caution in using canonical weights to interpret the results of a canonical analysis. 
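As a rough illustration of how standardized canonical weights arise, the sketch below computes them with a QR decomposition followed by an SVD. The data are synthetic (an assumption, not the HATCO data), and the helper names `zscore` and `cca` are hypothetical. The overall sign of each pair of variates is arbitrary, so only the relative signs of the weights carry meaning.

```python
import numpy as np

def zscore(M):
    """Standardize columns to mean 0 and unit sample variance."""
    return (M - M.mean(axis=0)) / M.std(axis=0, ddof=1)

def cca(X, Y):
    """Canonical correlations and standardized weights via QR + SVD."""
    Zx, Zy = zscore(X), zscore(Y)
    n = Zx.shape[0]
    Qx, Rx = np.linalg.qr(Zx)
    Qy, Ry = np.linalg.qr(Zy)
    U, s, Vt = np.linalg.svd(Qx.T @ Qy, full_matrices=False)
    # Scale so that every canonical variate has unit sample variance
    A = np.linalg.solve(Rx, U) * np.sqrt(n - 1)     # weights, independent set
    B = np.linalg.solve(Ry, Vt.T) * np.sqrt(n - 1)  # weights, dependent set
    return s, A, B, Zx, Zy

# Synthetic data (assumption): 3 independent and 2 dependent variables
rng = np.random.default_rng(0)
n = 200
latent = rng.normal(size=n)
X = np.outer(latent, [0.8, 0.6, 0.3]) + rng.normal(size=(n, 3))
Y = np.outer(latent, [0.9, 0.7]) + 0.5 * rng.normal(size=(n, 2))

Rc, A, B, Zx, Zy = cca(X, Y)
U1 = Zx @ A[:, 0]   # first independent canonical variate
V1 = Zy @ B[:, 0]   # first dependent canonical variate

print("canonical correlations:", np.round(Rc, 3))
print("weights, independent set, function 1:", np.round(A[:, 0], 3))
print("weights, dependent set, function 1:", np.round(B[:, 0], 3))
print("corr(U1, V1):", round(float(np.corrcoef(U1, V1)[0, 1]), 3))
```

The correlation between the two variates built from the weights reproduces the first canonical correlation, which is the quantity the weights are chosen to maximize.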


Canonical Loadings

Canonical loadings have been increasingly used as a basis for interpretation because of the deficiencies inherent in canonical weights. Canonical loadings, also called canonical structure correlations, measure the simple linear correlation between an original observed variable in the dependent or independent set and the set’s canonical variate. The canonical loading reflects the variance that the observed variable shares with the canonical variate and can be interpreted like a factor loading in assessing the relative contribution of each variable to each canonical function. The methodology considers each independent canonical function separately and computes the within-set variable-to-variate correlation. The larger the coefficient, the more important it is in deriving the canonical variate. Also, the criteria for determining the significance of canonical structure correlations are the same as with factor loadings in factor analysis.

Canonical loadings, like weights, may be subject to considerable variability from one sample to another. This variability suggests that loadings, and hence the relationships ascribed to them, may be sample-specific, resulting from chance or extraneous factors [7]. Although canonical loadings are considered relatively more valid than weights as a means of interpreting the nature of canonical relationships, the researcher still must be cautious when using loadings for interpreting canonical relationships, particularly with regard to the external validity of the findings.
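A minimal sketch of the loading computation follows, assuming synthetic data and the hypothetical helpers `zscore`, `cca`, and `loadings`. Each loading is the plain correlation between a variable and its own set's canonical variate; the mean of the squared loadings is printed as the shared variance of the set with that variate.

```python
import numpy as np

def zscore(M):
    """Standardize columns to mean 0 and unit sample variance."""
    return (M - M.mean(axis=0)) / M.std(axis=0, ddof=1)

def cca(X, Y):
    """Canonical correlations and standardized weights via QR + SVD."""
    Zx, Zy = zscore(X), zscore(Y)
    n = Zx.shape[0]
    Qx, Rx = np.linalg.qr(Zx)
    Qy, Ry = np.linalg.qr(Zy)
    U, s, Vt = np.linalg.svd(Qx.T @ Qy, full_matrices=False)
    A = np.linalg.solve(Rx, U) * np.sqrt(n - 1)
    B = np.linalg.solve(Ry, Vt.T) * np.sqrt(n - 1)
    return s, A, B, Zx, Zy

def loadings(Z, variate):
    """Correlation of each column of Z with a canonical variate."""
    return np.array([np.corrcoef(Z[:, j], variate)[0, 1]
                     for j in range(Z.shape[1])])

# Synthetic data (assumption): 3 independent and 2 dependent variables
rng = np.random.default_rng(0)
n = 200
latent = rng.normal(size=n)
X = np.outer(latent, [0.8, 0.6, 0.3]) + rng.normal(size=(n, 3))
Y = np.outer(latent, [0.9, 0.7]) + 0.5 * rng.normal(size=(n, 2))

Rc, A, B, Zx, Zy = cca(X, Y)
U1 = Zx @ A[:, 0]
V1 = Zy @ B[:, 0]

load_x = loadings(Zx, U1)   # independent variables with their own variate
load_y = loadings(Zy, V1)   # dependent variables with their own variate
shared_x = float(np.mean(load_x ** 2))
shared_y = float(np.mean(load_y ** 2))

print("loadings, independent set:", np.round(load_x, 3))
print("loadings, dependent set:", np.round(load_y, 3))
print("shared variance (independent, dependent):",
      round(shared_x, 3), round(shared_y, 3))
```

For standardized variables and a unit-variance variate, the loadings equal the within-set correlation matrix multiplied by the weight vector, which is why a variable can have a small weight but a large loading when it is highly correlated with other variables in its set.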


Canonical Cross-Loadings

The computation of canonical cross-loadings has been suggested as an alternative to canonical loadings [4]. This procedure involves correlating each of the original observed dependent variables directly with the independent canonical variate, and vice versa. Recall that conventional loadings correlate the original observed variables with their respective variates after the two canonical variates (dependent and independent) are maximally correlated with each other. This may also seem similar to multiple regression, but it differs in that each independent variable, for example, is correlated with the dependent variate instead of a single dependent variable. Thus cross-loadings provide a more direct measure of the dependent–independent variable relationships by eliminating an intermediate step involved in conventional loadings. Some canonical analyses do not compute correlations between the variables and the variates. In such cases the canonical weights are considered comparable but not equivalent for purposes of our discussion. 
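The sketch below shows one way cross-loadings could be computed, again on synthetic data with hypothetical helper names (`zscore`, `cca`, `loadings`). Each variable is correlated with the canonical variate of the opposite set. Within a single sample, a cross-loading works out to the ordinary loading multiplied by the canonical correlation of that function, which the final lines check numerically.

```python
import numpy as np

def zscore(M):
    """Standardize columns to mean 0 and unit sample variance."""
    return (M - M.mean(axis=0)) / M.std(axis=0, ddof=1)

def cca(X, Y):
    """Canonical correlations and standardized weights via QR + SVD."""
    Zx, Zy = zscore(X), zscore(Y)
    n = Zx.shape[0]
    Qx, Rx = np.linalg.qr(Zx)
    Qy, Ry = np.linalg.qr(Zy)
    U, s, Vt = np.linalg.svd(Qx.T @ Qy, full_matrices=False)
    A = np.linalg.solve(Rx, U) * np.sqrt(n - 1)
    B = np.linalg.solve(Ry, Vt.T) * np.sqrt(n - 1)
    return s, A, B, Zx, Zy

def loadings(Z, variate):
    """Correlation of each column of Z with a canonical variate."""
    return np.array([np.corrcoef(Z[:, j], variate)[0, 1]
                     for j in range(Z.shape[1])])

# Synthetic data (assumption): 3 independent and 2 dependent variables
rng = np.random.default_rng(0)
n = 200
latent = rng.normal(size=n)
X = np.outer(latent, [0.8, 0.6, 0.3]) + rng.normal(size=(n, 3))
Y = np.outer(latent, [0.9, 0.7]) + 0.5 * rng.normal(size=(n, 2))

Rc, A, B, Zx, Zy = cca(X, Y)
U1 = Zx @ A[:, 0]   # independent variate, function 1
V1 = Zy @ B[:, 0]   # dependent variate, function 1

load_x = loadings(Zx, U1)    # conventional loadings
cross_x = loadings(Zx, V1)   # independent variables vs. dependent variate
cross_y = loadings(Zy, U1)   # dependent variables vs. independent variate

print("cross-loadings, independent set:", np.round(cross_x, 3))
print("cross-loadings, dependent set:", np.round(cross_y, 3))
print("loading * Rc:", np.round(load_x * Rc[0], 3))
```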

 

Which Interpretation Approach to Use 

Several different methods for interpreting the nature of canonical relationships have been discussed. The question remains, however: Which method should the researcher use? Because most canonical problems require a computer, the researcher frequently must use whichever method is available in the standard statistical packages. The cross-loadings approach is preferred, and it is provided by many computer programs, but if the cross-loadings are not available, the researcher is forced either to compute the cross-loadings by hand or to rely on the other methods of interpretation. The canonical loadings approach is somewhat more representative than the use of weights, just as was seen with factor analysis and discriminant analysis. Therefore, whenever possible the loadings approach is recommended as the best alternative to the canonical cross-loadings method.
 

Stage 6: Validation and Diagnosis

As with any other multivariate technique, canonical correlation analysis should be subjected to validation methods to ensure that the results are not specific only to the sample data and can be generalized to the population. The most direct procedure is to create two subsamples of the data (if sample size allows) and perform the analysis on each subsample separately. Then the results can be compared for similarity of canonical functions, variate loadings, and the like. If marked differences are found, the researcher should consider additional investigation to ensure that the final results are representative of the population values, not solely those of a single sample.
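The split-sample procedure could be sketched as follows. The data are synthetic and the helpers (`zscore`, `cca`, `loadings`) are hypothetical; the random half split and the sign alignment of the loadings are illustrative choices, not a prescribed method.

```python
import numpy as np

def zscore(M):
    """Standardize columns to mean 0 and unit sample variance."""
    return (M - M.mean(axis=0)) / M.std(axis=0, ddof=1)

def cca(X, Y):
    """Canonical correlations and standardized weights via QR + SVD."""
    Zx, Zy = zscore(X), zscore(Y)
    n = Zx.shape[0]
    Qx, Rx = np.linalg.qr(Zx)
    Qy, Ry = np.linalg.qr(Zy)
    U, s, Vt = np.linalg.svd(Qx.T @ Qy, full_matrices=False)
    A = np.linalg.solve(Rx, U) * np.sqrt(n - 1)
    B = np.linalg.solve(Ry, Vt.T) * np.sqrt(n - 1)
    return s, A, B, Zx, Zy

def loadings(Z, variate):
    """Correlation of each column of Z with a canonical variate."""
    return np.array([np.corrcoef(Z[:, j], variate)[0, 1]
                     for j in range(Z.shape[1])])

# Synthetic data (assumption): 600 cases sharing one latent dimension
rng = np.random.default_rng(1)
n = 600
latent = rng.normal(size=n)
X = np.outer(latent, [0.8, 0.6, 0.3]) + 0.5 * rng.normal(size=(n, 3))
Y = np.outer(latent, [0.9, 0.7]) + 0.5 * rng.normal(size=(n, 2))

# Random split into two subsamples of equal size
idx = rng.permutation(n)
half_a, half_b = idx[: n // 2], idx[n // 2:]

Rc_a, A_a, B_a, Zx_a, Zy_a = cca(X[half_a], Y[half_a])
Rc_b, A_b, B_b, Zx_b, Zy_b = cca(X[half_b], Y[half_b])
load_a = loadings(Zx_a, Zx_a @ A_a[:, 0])
load_b = loadings(Zx_b, Zx_b @ A_b[:, 0])

# The sign of a canonical variate is arbitrary, so align before comparing
if np.dot(load_a, load_b) < 0:
    load_b = -load_b

print("first canonical correlation:", round(float(Rc_a[0]), 3),
      round(float(Rc_b[0]), 3))
print("loadings, subsample A:", np.round(load_a, 3))
print("loadings, subsample B:", np.round(load_b, 3))
```

Marked disagreement between the two columns of output would be the signal for further investigation described above.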

Another approach is to assess the sensitivity of the results to the removal of a dependent and/or independent variable. Because the canonical correlation procedure maximizes the correlation and does not optimize the interpretability, the canonical weights and loadings may vary substantially if one variable is removed from either variate. To ensure the stability of the canonical weights and loadings, the researcher should estimate multiple canonical correlations, each time removing a different independent or dependent variable.
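A leave-one-variable-out sensitivity check might look like the sketch below. It uses synthetic data, and the helpers `zscore`, `cca`, `loadings`, and `first_function` are hypothetical. Only independent variables are dropped here; the same loop could be applied to the dependent set.

```python
import numpy as np

def zscore(M):
    """Standardize columns to mean 0 and unit sample variance."""
    return (M - M.mean(axis=0)) / M.std(axis=0, ddof=1)

def cca(X, Y):
    """Canonical correlations and standardized weights via QR + SVD."""
    Zx, Zy = zscore(X), zscore(Y)
    n = Zx.shape[0]
    Qx, Rx = np.linalg.qr(Zx)
    Qy, Ry = np.linalg.qr(Zy)
    U, s, Vt = np.linalg.svd(Qx.T @ Qy, full_matrices=False)
    A = np.linalg.solve(Rx, U) * np.sqrt(n - 1)
    B = np.linalg.solve(Ry, Vt.T) * np.sqrt(n - 1)
    return s, A, B, Zx, Zy

def loadings(Z, variate):
    """Correlation of each column of Z with a canonical variate."""
    return np.array([np.corrcoef(Z[:, j], variate)[0, 1]
                     for j in range(Z.shape[1])])

def first_function(X, Y):
    """First canonical correlation and independent-set loadings."""
    Rc, A, B, Zx, Zy = cca(X, Y)
    U1, V1 = Zx @ A[:, 0], Zy @ B[:, 0]
    # Sign convention (assumption): dependent variate correlates
    # positively with the first dependent variable
    sign = np.sign(np.corrcoef(Zy[:, 0], V1)[0, 1])
    return float(Rc[0]), sign * loadings(Zx, U1)

# Synthetic data (assumption): 4 independent and 2 dependent variables
rng = np.random.default_rng(2)
n = 300
latent = rng.normal(size=n)
X = np.outer(latent, [0.8, 0.6, 0.4, 0.2]) + rng.normal(size=(n, 4))
Y = np.outer(latent, [0.9, 0.7]) + 0.5 * rng.normal(size=(n, 2))

rc_full, load_full = first_function(X, Y)
print("all variables: Rc =", round(rc_full, 3), np.round(load_full, 3))

results = []
for j in range(X.shape[1]):
    rc_j, load_j = first_function(np.delete(X, j, axis=1), Y)
    results.append((j, rc_j, load_j))
    print(f"drop column {j}: Rc =", round(rc_j, 3), np.round(load_j, 3))
```

Within a given sample, dropping a variable cannot raise the first canonical correlation, so the interest lies in how much it falls and in whether the loadings of the remaining variables keep the same pattern.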

Although there are few diagnostic procedures developed specifically for canonical correlation analysis, the researcher should view the results within the limitations of the technique. Among the limitations that can have the greatest impact on the results and their interpretation are the following:

1. The canonical correlation reflects the variance shared by the linear composites of the sets of variables, not the variance extracted from the variables.

2. Canonical weights derived in computing canonical functions are subject to a great deal of instability.

3. Canonical weights are derived to maximize the correlation between linear composites, not the variance extracted.

4. The interpretation of the canonical variates may be difficult because they are calculated to maximize the relationship, and there are no aids for interpretation, such as rotation of variates, as seen in factor analysis.

5. It is difficult to identify meaningful relationships between the subsets of independent and dependent variables because precise statistics have not yet been developed to interpret canonical analysis, and we must rely on inadequate measures such as loadings or cross-loadings [7].

These limitations are not meant to discourage the use of canonical correlation. Rather, they are pointed out to enhance the effectiveness of canonical correlation as a research tool.
 

An Illustrative Example

To illustrate the application of canonical correlation, we use variables drawn from a short survey of customers of the firm known as HATCO. The data consist of a series of measures obtained on a sample of 100 HATCO customers. The variables include ratings of HATCO on seven attributes (X1 to X7) and two measures reflecting the effects of HATCO’s efforts (X9, usage of HATCO products, and X10, customer satisfaction with HATCO). A complete description of the HATCO survey is provided in Exhibit 1.

The discussion of this application of canonical correlation analysis follows the six-stage process discussed earlier in the chapter. At each stage the results illustrating the decisions in that stage are examined.

 

Stage 1: Objectives of Canonical Correlation Analysis

In demonstrating the application of canonical correlation, we use nine variables as input data. The HATCO ratings (X1 through X7) are designated as the set of independent variables. The measures of usage level and satisfaction level (variables X9 and X10) are specified as the set of dependent variables. The statistical problem involves identifying any latent relationships (relationships between composites of variables rather than the individual variables themselves) between a customer’s perceptions about HATCO and the customer’s level of usage and satisfaction.


Stages 2 and 3: Designing a Canonical Correlation Analysis and Testing the Assumptions

The designation of the variables includes two metric dependent and seven metric independent variables. The conceptual basis of both sets is well established, so there is no need for alternative model formulations testing different sets of variables. The seven variables resulted in a 13-to-1 ratio of observations to variables, exceeding the guideline of 10 observations per variable. The sample size of 100 is not expected to affect the estimates of sampling error markedly and thus should have no impact on the statistical significance of the results. Finally, for purposes of this example, assume that both dependent and independent variables were assessed for meeting the basic distributional assumptions underlying multivariate analyses and passed all statistical tests.

 

Stage 4: Deriving the Canonical Functions and Assessing Overall Fit 

The canonical correlation analysis was restricted to deriving two canonical functions because the dependent variable set contained only two variables. To determine the number of canonical functions to include in the interpretation stage, the analysis focused on the level of statistical significance, the practical significance of the canonical correlation, and the redundancy indices for each variate.
 

Statistical and Practical Significance 

The first statistical significance test is for the canonical correlations of each of the two canonical functions. In this example, both canonical correlations are statistically significant (see Table 8.2). In addition to tests of each canonical function separately, multivariate tests of both functions simultaneously are also performed. The test statistics employed are Wilks’ lambda, Pillai’s criterion, Hotelling’s trace, and Roy’s gcr. Table 8.2 also details the multivariate test statistics, which all indicate that the canonical functions, taken collectively, are statistically significant at the .01 level.
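The four multivariate statistics named above are all simple functions of the squared canonical correlations, as the sketch below shows. The input values are assumptions rather than Table 8.2 entries: the second squared correlation (.260) is quoted in the redundancy discussion, while the first is backed out from the redundancy index and shared variance of the dependent variate (.751 / .855). Roy's gcr is taken here as the largest squared canonical correlation; some packages report the largest root in a different form. The F approximations needed for significance levels are not included.

```python
import numpy as np

def multivariate_tests(canonical_r2):
    """Multivariate test statistics from squared canonical correlations."""
    r2 = np.asarray(canonical_r2, dtype=float)
    return {
        "wilks_lambda": float(np.prod(1.0 - r2)),        # product of (1 - r^2)
        "pillai_trace": float(np.sum(r2)),               # sum of r^2
        "hotelling_trace": float(np.sum(r2 / (1.0 - r2))),
        "roy_gcr": float(np.max(r2)),                    # largest r^2
    }

# Assumed inputs: first value implied by .751 / .855, second quoted as .260
canonical_r2 = [0.751 / 0.855, 0.260]
stats = multivariate_tests(canonical_r2)

for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

A Wilks' lambda close to zero and a large Hotelling's trace both point to a strong overall relationship between the two sets, which is consistent with the collective significance reported above.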

In addition to statistical significance, the canonical correlations were both of sufficient size to be deemed practically significant. The final step was to perform redundancy analyses on both canonical functions.
 

Redundancy Analysis 

A redundancy index is calculated for the independent and dependent variates of the first function in Table 8.3. As can be seen, the redundancy index for the dependent variate is substantial (.751). The independent variate, however, has a markedly lower redundancy index (.242), although in this case, because there is a clear delineation between dependent and independent variables, this lower value is not unexpected or problematic. The low redundancy of the independent variate results from the relatively low shared variance in the independent variate (.276), not the canonical R². From the redundancy analysis and the statistical significance tests, the first function should be accepted.

The redundancy analysis for the second function produces quite different results (see Table 8.4). First, the canonical R² is substantially lower (.260). Moreover, both variable sets have low shared variance in the second function (.145 for the dependent variate and .082 for the independent variate). Their combination with the canonical root in the redundancy index produces values of .038 for the dependent variate and .021 for the independent variate. Thus, although the second function is statistically significant, it has little practical significance. With such a small percentage, one must question the value of the function. This is an excellent example of a statistically significant canonical function that does not have practical significance because it does not explain a large proportion of the dependent variables’ variance.
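The redundancy figures quoted for both functions can be reproduced with a few lines of arithmetic, since the redundancy index is the shared variance of a variate multiplied by the canonical R² of the function. In the sketch below the function name `redundancy_index` is hypothetical, and the canonical R² of the first function is an assumption backed out from the dependent variate's redundancy index and shared variance (.751 / .855), because it is not quoted directly in the text.

```python
def redundancy_index(shared_variance, canonical_r2):
    """Redundancy = shared variance of the variate * canonical R^2."""
    return shared_variance * canonical_r2

# Function 1: canonical R^2 implied by the dependent variate (assumption)
r2_f1 = 0.751 / 0.855
red_indep_f1 = redundancy_index(0.276, r2_f1)

# Function 2: canonical R^2 and shared variances as quoted in the text
r2_f2 = 0.260
red_dep_f2 = redundancy_index(0.145, r2_f2)
red_indep_f2 = redundancy_index(0.082, r2_f2)

print(f"function 1, implied canonical R^2: {r2_f1:.3f}")
print(f"function 1, independent variate:   {red_indep_f1:.3f}")
print(f"function 2, dependent variate:     {red_dep_f2:.3f}")
print(f"function 2, independent variate:   {red_indep_f2:.3f}")
```

The results agree with the reported values (.242, .038, and .021) to three decimals, and they make the point of the passage concrete: the second function's low redundancy stems from both a low canonical R² and low shared variance.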

The interested researcher should also consider factor analysis with attention to the discussion of scale development. Canonical correlation is in some ways a form of scale development, as the dependent and independent variates represent dimensions of the variable sets similar to the scales developed with factor analysis. The primary difference is that these dimensions are developed to maximize the relationship between them, whereas factor analysis maximizes the explanation (shared variance) of the variable set. 

 

Stage 5: Interpreting the Canonical Variates

With the canonical relationship deemed statistically significant and the magnitude of the canonical root and the redundancy index acceptable, the researcher proceeds to making substantive interpretations of the results. Although the second function could be considered practically nonsignificant, owing to the low redundancy value, it is included in the interpretation phase for illustrative reasons. These interpretations involve examining the canonical functions to determine the relative importance of each of the original variables in deriving the canonical relationships. The three methods for interpretation are (1) canonical weights (standardized coefficients), (2) canonical loadings (structure correlations), and (3) canonical cross-loadings.
 

Canonical Weights 

Table 8.5 contains the standardized canonical weights for each canonical variate for both dependent and independent variables. As discussed earlier, the magnitude of the weights represents their relative contribution to the variate. Based on the size of the weights, the order of contribution of independent variables to the first variate is X3, X5, X4, X1, X2, X6, and X7, and the dependent variable order on the first variate is X10, then X9. Similar rankings can be found for the variates of the second canonical function. Because canonical weights are typically unstable, particularly in instances of multicollinearity, owing to their calculation solely to optimize the canonical correlation, the canonical loadings and cross-loadings are considered more appropriate.
 

Canonical Loadings

Table 8.6 contains the canonical loadings for the dependent and independent variates for both canonical functions. The objective of maximizing the variates for the correlation between them results in variates “optimized” not for interpretation, but instead for prediction. This makes identification of relationships more difficult. In the first dependent variate, both variables have loadings exceeding .90, resulting in the high shared variance (.855). This indicates a high degree of intercorrelation between the two variables and suggests that both measures, or either one, are representative of the effects of HATCO’s efforts.

The first independent variate has a quite different pattern, with loadings ranging from .061 to .765, with one independent variable (X7) even having a negative loading, although it is rather small and not of substantive interest. The three variables with the highest loadings on the independent variate are X5 (overall service), X1 (delivery speed), and X3 (price flexibility). This variate does not correspond to the dimensions that would be extracted in factor analysis, but it would not be expected to because the variates in canonical correlation are extracted only to maximize predictive objectives. As such, it should correspond more to the results from other dependence techniques. There is a close correspondence to multiple regression results with X9 as the dependent variable. Two of these variables (X3 and X5) were included in the stepwise regression analysis in which X9 (one of the two variables in the dependent variate) was the dependent variable. Thus, the first canonical function closely corresponds to the multiple regression results, with the independent variate representing the set of variables best predicting the two dependent measures. The researcher should also perform a sensitivity analysis of the independent variate in this case to see whether the loadings change when an independent variable is deleted (see stage 6).

The second function’s poor redundancy values are exhibited in the substantially lower loadings for both variates on the second function. Thus, the poorer interpretability as reflected in the lower loadings, coupled with the low redundancy values, reinforces the low practical significance of the second function.
 

Canonical Cross-Loadings

Table 8.6 also includes the cross-loadings for the two canonical functions. In studying the first canonical function, we see that both dependent variables (X9 and X10) exhibit high correlations with the independent canonical variate (function 1): .855 and .877, respectively. This reflects the high shared variance between these two variables. By squaring these terms, we find the percentage of the variance for each of the variables explained by function 1. The results show that 73 percent of the variance in X9 and 77 percent of the variance in X10 are explained by function 1. Looking at the independent variables’ cross-loadings, we see that variables X1 and X5 both have high correlations of roughly .72 with the dependent canonical variate. From this information, approximately 52 percent of the variance in each of these two variables is explained by the dependent variate (the 52 percent is obtained by squaring the correlation coefficient, .72). The correlation of X3 (.584) may appear high, but after squaring this correlation, only 34 percent of the variation is included in the canonical variate.
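The percentages in this paragraph follow directly from squaring the quoted cross-loadings, as the short check below shows. The values for X1 and X5 are entered as .72 because the text gives them only as "roughly .72"; the dictionary name `cross_loadings` is illustrative.

```python
# Cross-loadings on function 1 as quoted in the text (X1, X5 approximate)
cross_loadings = {"X9": 0.855, "X10": 0.877, "X1": 0.72, "X5": 0.72, "X3": 0.584}

# Squared cross-loading = share of a variable's variance explained
# by the canonical variate of the opposite set
explained = {name: r ** 2 for name, r in cross_loadings.items()}
percent = {name: round(100 * v) for name, v in explained.items()}

# Average squared cross-loading of the two dependent variables
mean_dependent = (explained["X9"] + explained["X10"]) / 2

for name in cross_loadings:
    print(f"{name}: r = {cross_loadings[name]:.3f}, explained = {percent[name]}%")
print(f"mean squared cross-loading, dependent set: {mean_dependent:.3f}")
```

The average of the two dependent squared cross-loadings comes out near .750, in line with the redundancy index reported for the dependent variate on the first function, since that index is the mean variance in the dependent variables explained by the independent variate.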

The final issue of interpretation is examining the signs of the cross-loadings. All independent variables except X7 (product quality) have a positive, direct relationship. For the second function, two independent variables (X4 and X6), plus a dependent variable (X10), are negative. The three highest cross-loadings of the first independent variate correspond to the variables with the highest canonical loadings as well. Thus all the relationships are direct except for one inverse relationship in the first function. 

 

Stage 6: Validation and Diagnosis 

The last stage should involve a validation of the canonical correlation analyses through one of several procedures. Among the available approaches would be (1) splitting the sample into estimation and validation samples, or (2) sensitivity analysis of the independent variable set. Table 8.7 contains the result of such a sensitivity analysis in which the canonical loadings are examined for stability when individual independent variables are deleted from the analysis. As seen, the canonical loadings in our example are remarkably stable and consistent in each of the three cases where an independent variable (X1, X2, or X7) is deleted. The overall canonical correlations also remain stable. But the researcher examining the canonical weights (not presented in the table) would find widely varying results, depending on which variable was deleted. This reinforces the procedure of using the canonical loadings and cross-loadings for interpretation purposes.
 

The canonical correlation analysis addresses two primary objectives: (1) the identification of dimensions among the dependent and independent variables that (2) maximize the relationship between the dimensions. From a managerial perspective, this provides the researcher with some insight into the structure of the different variable sets as they relate to a dependence relationship. First, the results indicate only a single relationship exists, supported by the low practical significance of the second canonical function. In examining this relationship, we first see that the two dependent variables are quite closely related and create a well-defined dimension for representing the outcomes of HATCO’s efforts. Second, this outcome dimension is fairly well predicted by the set of independent variables when acting as a set. The redundancy value of .750 would be a quite acceptable R² for a comparable multiple regression. When interpreting the independent variate, we see that three variables, X5 (overall service), X1 (delivery speed), and X3 (price flexibility), provide the substantive contributions and thus are the key predictors of the outcome dimension. These should be the focal points in the development of any strategy directed toward impacting the outcomes of HATCO.

Canonical correlation analysis is a useful and powerful technique for exploring the relationships among multiple dependent and independent variables. The technique is primarily descriptive, although it may be used for predictive purposes. Results obtained from a canonical analysis should suggest answers to questions concerning the number of ways in which the two sets of multiple variables are related, the strengths of the relationships, and the nature of the relationships defined. 

Canonical analysis enables the researcher to combine into a composite measure what otherwise might be an unmanageably large number of bivariate correlations between sets of variables. It is useful for identifying overall relationships between multiple independent and dependent variables, particularly when the researcher has little a priori knowledge about relationships among the sets of variables. Essentially, the researcher can apply canonical correlation analysis to a set of variables, select those variables (both independent and dependent) that appear to be significantly related, and run subsequent canonical correlations with the more significant variables remaining, or perform individual regressions with these variables.
