Multivariate non-graphical EDA - Statistics, CSIR-NET Mathematical Sciences | Mathematics for IIT JAM, GATE, CSIR NET, UGC NET

 Multivariate non-graphical EDA

Multivariate non-graphical EDA techniques generally show the relationship between two or more variables in the form of either cross-tabulation or statistics.


4.4.1 Cross-tabulation

For categorical data (and quantitative data with only a few different values) an extension of tabulation called cross-tabulation is very useful. For two variables, cross-tabulation is performed by making a two-way table with column headings that match the levels of one variable and row headings that match the levels of the other variable, then filling in the counts of all subjects that share a pair of levels. The two variables might be both explanatory, both outcome, or one of each. Depending on the goals, row percentages (which add to 100% for each row), column percentages (which add to 100% for each column) and/or cell percentages (which add to 100% over all cells) are also useful. 

Here is an example of a cross-tabulation. Consider the data in table 4.1. For each subject we observe sex and age as categorical variables. 

Table 4.2 shows the cross-tabulation.

We can easily see that the total number of young females is 2, and we can calculate, e.g., the corresponding cell percentage is 2/11×100 = 18.2%, the row percentage is 2/5×100 = 40.0%, and the column percentage is 2/7×100 = 28.6%. 
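The three kinds of percentages can be reproduced with a short Python sketch. The records below are hypothetical: Table 4.1 is not reproduced here, so they were chosen only to match the totals quoted in the text (11 subjects, 5 young, 7 female, 2 young females). The names `subjects` and `percentages` are illustrative, not from the source.

```python
from collections import Counter

# Hypothetical (age group, sex) records, chosen only so that the
# totals agree with the figures quoted in the text.
subjects = [
    ("young", "F"), ("young", "F"), ("young", "M"), ("young", "M"), ("young", "M"),
    ("middle", "F"), ("middle", "F"), ("middle", "M"),
    ("old", "F"), ("old", "F"), ("old", "F"),
]

counts = Counter(subjects)                           # cell counts
row_totals = Counter(age for age, _ in subjects)     # totals per age group (rows)
col_totals = Counter(sex for _, sex in subjects)     # totals per sex (columns)
n = len(subjects)                                    # grand total


def percentages(age, sex):
    """Return count, cell %, row % and column % for one cell of the table."""
    cell = counts[(age, sex)]
    return {
        "count": cell,
        "cell_pct": 100 * cell / n,
        "row_pct": 100 * cell / row_totals[age],
        "col_pct": 100 * cell / col_totals[sex],
    }


young_female = percentages("young", "F")
print({k: round(v, 1) for k, v in young_female.items()})
```

Rows are age groups and columns are sexes here, so the row percentage divides by the number of young subjects and the column percentage divides by the number of females.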

Cross-tabulation can be extended to three (and sometimes more) variables by making separate two-way tables for two variables at each level of a third variable. For example, we could make separate age by gender tables for each education level.

Table 4.1: Sample Data for Cross-tabulation

Table 4.2: Cross-tabulation of Sample Data

Cross-tabulation is the basic bivariate non-graphical EDA technique.

 

4.4.2 Correlation for categorical data

Another statistic that can be calculated for two categorical variables is their correlation. But there are many forms of correlation for categorical variables, and that material is currently beyond the scope of this book.

 

4.4.3 Univariate statistics by category

For one categorical variable (usually explanatory) and one quantitative variable (usually outcome), it is common to produce some of the standard univariate non-graphical statistics for the quantitative variable separately for each level of the categorical variable, and then compare the statistics across levels of the categorical variable. Comparing the means is an informal version of one-way ANOVA. Comparing medians is a robust informal version of one-way ANOVA. Comparing measures of spread is a good informal check of the assumption of equal variances needed for valid analysis of variance.

Especially for a categorical explanatory variable and a quantitative outcome variable, it is useful to produce a variety of univariate statistics for the quantitative variable at each level of the categorical variable.
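As a minimal sketch of this idea, the following Python snippet groups a quantitative variable by the levels of a categorical variable and computes a few univariate statistics per level. The observations and the level names `A` and `B` are hypothetical and are not taken from the text.

```python
from collections import defaultdict
from statistics import mean, median, stdev

# Hypothetical (level, value) observations for illustration only.
observations = [
    ("A", 1), ("A", 2), ("A", 3), ("A", 4), ("A", 10),
    ("B", 2), ("B", 4), ("B", 6),
]

# Collect the quantitative values separately for each level.
by_level = defaultdict(list)
for level, value in observations:
    by_level[level].append(value)

# Univariate statistics for each level of the categorical variable.
summary = {
    level: {
        "n": len(values),
        "mean": mean(values),
        "median": median(values),
        "sd": stdev(values),  # sample standard deviation (n-1 divisor)
    }
    for level, values in by_level.items()
}

for level, stats in sorted(summary.items()):
    print(level, stats)
```

In this made-up data the two levels share the same mean but differ in median and spread, which is the kind of contrast that comparing several statistics side by side is meant to reveal.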


4.4.4 Correlation and covariance

For two quantitative variables, the basic statistics of interest are the sample covariance and/or sample correlation, which correspond to and are estimates of the corresponding population parameters from section 3.5. The sample covariance is a measure of how much two variables “co-vary”, i.e., how much (and in what direction) we should expect one variable to change when the other changes.

Sample covariance is calculated by computing (signed) deviations of each measurement from the average of all measurements for that variable. Then the deviations for the two measurements are multiplied together separately for each subject. Finally these values are averaged (actually summed and divided by n-1, to keep the statistic unbiased). Note that the units on sample covariance are the products of the units of the two variables.

Positive covariance values suggest that when one measurement is above the mean the other will probably also be above the mean, and vice versa. Negative covariances suggest that when one variable is above its mean, the other is below its mean. And covariances near zero suggest that the two variables vary independently of each other.
 

Technically, independence implies zero correlation, but the reverse is not necessarily true.
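A small Python sketch can illustrate why zero covariance does not imply independence. The data are hypothetical: `ys` is completely determined by `xs` (it is the square of each value), yet the sample covariance is zero because the relationship is not linear. The helper name `sample_covariance` is illustrative.

```python
def sample_covariance(xs, ys):
    """Sample covariance with the n-1 divisor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


# Hypothetical data: y depends on x exactly, but not linearly.
xs = [-2, -1, 0, 1, 2]
ys = [x * x for x in xs]

cov_xy = sample_covariance(xs, ys)
print(cov_xy)
```

The positive and negative products of deviations cancel exactly, so the covariance gives no hint of the strong (quadratic) dependence.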

Covariances tend to be hard to interpret, so we often use correlation instead. The correlation has the nice property that it is always between -1 and +1, with -1 being a “perfect” negative linear correlation, +1 being a perfect positive linear correlation and 0 indicating that X and Y are uncorrelated. The symbol $r$ or $r_{x,y}$ is often used for sample correlations.

The general formula for sample covariance is

$$\mathrm{Cov}(X,Y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n-1}$$

where $\bar{x}$ and $\bar{y}$ are the sample means of X and Y.

It is worth noting that Cov(X,X) = Var(X).
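The calculation described above (deviations, products of deviations, then the sum divided by n-1) can be sketched in Python as follows. The data are hypothetical, and the function name `sample_covariance` is illustrative. The last lines check the identity Cov(X,X) = Var(X) against the sample variance from the standard library.

```python
from statistics import variance


def sample_covariance(xs, ys):
    """Sum of products of deviations from the means, divided by n-1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dev_x = [x - mean_x for x in xs]             # signed deviations of X
    dev_y = [y - mean_y for y in ys]             # signed deviations of Y
    products = [dx * dy for dx, dy in zip(dev_x, dev_y)]
    return sum(products) / (n - 1)


# Hypothetical measurements for five subjects.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]

cov_xy = sample_covariance(xs, ys)
cov_xx = sample_covariance(xs, xs)

print(cov_xy)                 # covariance of X and Y
print(cov_xx, variance(xs))   # Cov(X,X) and Var(X) agree
```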

If you want to see a “manual example” of the calculation of sample covariance and correlation, consider the data in table 4.3. For each subject we observe age and a strength measure.

Table 4.4 shows the calculation of covariance. The mean age is 50 and the mean strength is 19, so we calculate the deviation for age as age-50 and the deviation for strength as strength-19. Then we find the product of the deviations for each subject and add them up. This total is -1106, and since n=11, the covariance of x and y is -1106/10=-110.6. The fact that the covariance is negative indicates that as age goes up strength tends to go down (and vice versa).

The formula for the sample correlation is

$$\mathrm{Cor}(X,Y) = \frac{\mathrm{Cov}(X,Y)}{s_x s_y}$$

where $s_x$ is the standard deviation of X and $s_y$ is the standard deviation of Y.

In this example, dividing the covariance of -110.6 by the product of the two standard deviations gives a strong negative correlation.
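A minimal Python sketch of the correlation formula follows. The data are hypothetical (they are not the age and strength values of table 4.3), and the function name `sample_correlation` is illustrative. The second call rescales Y to show that, unlike covariance, correlation does not depend on the units of measurement.

```python
from math import sqrt


def sample_correlation(xs, ys):
    """Sample covariance divided by the product of the sample standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    s_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return cov / (s_x * s_y)


# Hypothetical data in which y falls by exactly 2 for each unit increase in x.
xs = [1, 2, 3, 4, 5]
ys = [10, 8, 6, 4, 2]

r = sample_correlation(xs, ys)

# Changing the units of y (here multiplying by 100) leaves r unchanged.
r_rescaled = sample_correlation(xs, [100 * y for y in ys])

print(round(r, 6), round(r_rescaled, 6))
```

Because the hypothetical points lie exactly on a downward-sloping line, the result is a perfect negative linear correlation.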

Table 4.3: Covariance Sample Data
 

4.4.5 Covariance and correlation matrices

When we have many quantitative variables the most common non-graphical EDA technique is to calculate all of the pairwise covariances and/or correlations and assemble them into a matrix. Note that the covariance of X with X is the variance of X and the correlation of X with X is 1.0. For example the covariance matrix of table 4.5 tells us that the variances of X, Y, and Z are 5, 7, and 4 respectively, the covariance of X and Y is 1.77, the covariance of X and Z is -2.24, and the covariance of Y and Z is 3.17.

Similarly the correlation matrix in table 4.6 tells us that the correlation of X and Y is 0.3, the correlation of X and Z is -0.5, and the correlation of Y and Z is 0.6.
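The two matrices are linked: each correlation is the corresponding covariance divided by the product of the two standard deviations. The Python sketch below applies this to the covariance values quoted in the text and recovers the quoted correlations after rounding. The variable names are illustrative, and the matrix layout (rows and columns ordered X, Y, Z) is an assumption.

```python
from math import sqrt

# Covariance matrix using the values quoted in the text,
# with rows and columns assumed to be ordered X, Y, Z.
labels = ["X", "Y", "Z"]
cov = [
    [5.00, 1.77, -2.24],
    [1.77, 7.00, 3.17],
    [-2.24, 3.17, 4.00],
]

# corr[i][j] = cov[i][j] / sqrt(var_i * var_j), where var_i = cov[i][i].
k = len(cov)
corr = [
    [cov[i][j] / sqrt(cov[i][i] * cov[j][j]) for j in range(k)]
    for i in range(k)
]

for label, row in zip(labels, corr):
    print(label, [round(value, 1) for value in row])
```

The diagonal entries come out as 1.0, matching the note that the correlation of a variable with itself is 1.0.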

Table 4.4: Covariance Calculation

Table 4.5: A Covariance Matrix
 

The correlation between two random variables is a number that runs from -1 through 0 to +1, where values near -1 indicate a strong inverse linear relationship, values near 0 indicate no linear relationship, and values near +1 indicate a strong direct linear relationship.

FAQs on Multivariate non-graphical EDA - Statistics, CSIR-NET Mathematical Sciences - Mathematics for IIT JAM, GATE, CSIR NET, UGC NET

1. What is multivariate non-graphical EDA?
Ans. Multivariate non-graphical EDA refers to the process of analyzing and understanding the relationships and patterns between multiple variables or factors without using graphical representations. It involves using statistical techniques and methods to explore and summarize the data.
2. What are the advantages of multivariate non-graphical EDA?
Ans. Multivariate non-graphical EDA offers several advantages, such as:
- It allows for the analysis of complex datasets with multiple variables, providing a comprehensive understanding of the relationships between them.
- It helps in identifying patterns, trends, and correlations in the data, leading to valuable insights and discoveries.
- It provides a quantitative approach to analyzing data, allowing for statistical testing and hypothesis generation.
- It is useful for identifying outliers, influential observations, and unusual patterns in multivariate datasets.
- It enables the exploration of interactions and dependencies between variables, aiding in decision-making and problem-solving.
3. What statistical techniques are commonly used in multivariate non-graphical EDA?
Ans. Several statistical techniques are commonly used in multivariate non-graphical EDA, including:
- Descriptive statistics: These methods summarize and describe the main characteristics of the variables, such as mean, median, variance, and correlation coefficients.
- Inferential statistics: These techniques are used to make inferences and draw conclusions about the population based on sample data, using methods like hypothesis testing and confidence intervals.
- Multivariate analysis: This involves analyzing multiple variables simultaneously to understand their relationships and dependencies. Techniques like principal component analysis, factor analysis, and cluster analysis are commonly used.
- Regression analysis: This technique helps in modeling the relationship between a dependent variable and one or more independent variables, allowing for prediction and interpretation.
- ANOVA (Analysis of Variance): ANOVA is used to compare means between two or more groups, helping to determine if there are significant differences between them.
4. How does multivariate non-graphical EDA contribute to CSIR-NET Mathematical Sciences Mathematics?
Ans. Multivariate non-graphical EDA is an essential tool in CSIR-NET Mathematical Sciences Mathematics as it allows researchers and mathematicians to analyze complex datasets with multiple variables. By understanding the relationships and patterns between these variables, it helps in developing mathematical models, making predictions, and solving real-world problems. It provides a quantitative approach to data analysis, allowing for statistical testing and hypothesis generation, which are fundamental aspects of mathematical research.
5. How can multivariate non-graphical EDA be applied in practical situations?
Ans. Multivariate non-graphical EDA finds applications in various practical situations, including:
- Market research: Analyzing customer preferences, buying patterns, and demographic data to identify market segments and develop targeted marketing strategies.
- Healthcare: Investigating the relationships between medical variables to understand disease progression, identify risk factors, and develop treatment protocols.
- Finance: Analyzing financial data to identify correlations between variables, assess investment risks, and develop predictive models for stock market trends.
- Environmental research: Exploring the relationships between environmental factors, such as temperature, rainfall, and pollution levels, to understand their impact on ecosystems and climate change.
- Social sciences: Studying the relationships between socioeconomic factors, demographic variables, and social trends to gain insights into human behavior, population dynamics, and policy planning.