Multivariate graphical EDA - Statistics, CSIR-NET Mathematical Sciences | Mathematics for IIT JAM, GATE, CSIR NET, UGC NET PDF Download

4.5 Multivariate graphical EDA

There are few useful techniques for graphical EDA of two categorical random variables. The only one used commonly is a grouped barplot with each group representing one level of one of the variables and each bar within a group representing the levels of the other variable.
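As a rough illustration, the counts behind a grouped barplot can be tabulated before any plotting is done. The sketch below uses made-up (gender, party) records and a hypothetical helper `grouped_counts`; each outer key is one group and each inner key is one bar within that group. The text-mode bars only stand in for a real plot.

```python
from collections import Counter

# Hypothetical records, one (gender, party) pair per subject.
records = [
    ("female", "democrat"), ("female", "republican"), ("female", "democrat"),
    ("male", "republican"), ("male", "democrat"), ("male", "republican"),
    ("female", "independent"), ("male", "republican"),
]

def grouped_counts(pairs):
    """Return {group_level: {bar_level: count}}, the heights of a grouped barplot."""
    counts = Counter(pairs)
    groups = sorted({g for g, _ in pairs})
    bars = sorted({b for _, b in pairs})
    # Every group gets a bar for every level, even when the count is zero.
    return {g: {b: counts[(g, b)] for b in bars} for g in groups}

table = grouped_counts(records)
for group, bars in table.items():
    print(group)
    for level, n in bars.items():
        print(f"  {level:<12}{'#' * n} ({n})")
```

Which variable defines the groups and which defines the bars is a choice; swapping the two elements of each pair gives the other arrangement.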

Table 4.6: A Correlation Matrix
 

4.5.1 Univariate graphs by category

When we have one categorical (usually explanatory) and one quantitative (usually outcome) variable, graphical EDA usually takes the form of “conditioning” on the categorical random variable. This simply indicates that we focus on all of the subjects with a particular level of the categorical random variable, then make plots of the quantitative variable for those subjects. We repeat this for each level of the categorical variable, then compare the plots. The most commonly used of these are side-by-side boxplots, as in figure 4.15. Here we see the data from EDA3.dat, which consists of strength data for each of three age groups. You can see the downward trend in the median as the ages increase. The spreads (IQRs) are similar for the three groups. And all three groups are roughly symmetrical with one high strength outlier in the youngest age group.

Side-by-side boxplots are the best graphical EDA technique for examining the relationship between a categorical variable and a quantitative variable, as well as the distribution of the quantitative variable at each level of the categorical variable.
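The quantities that side-by-side boxplots display can also be computed directly, which may help when reading such a plot. The following is a minimal sketch using invented strength values for three age groups (these are not the EDA3.dat values) and a hypothetical helper `box_summary`. It assumes the common 1.5 × IQR rule for flagging outliers and the "inclusive" quartile method of Python's `statistics.quantiles`; other software may use slightly different quartile definitions.

```python
from statistics import quantiles

# Hypothetical strength measurements by age group (illustrative only).
strength = {
    "young": [28, 30, 31, 33, 35, 36, 38, 52],
    "middle": [22, 25, 27, 28, 30, 31, 33],
    "old": [15, 18, 20, 21, 23, 25, 27],
}

def box_summary(values):
    """Median, quartiles, IQR and 1.5*IQR outliers for one level of the category."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return {"median": q2, "q1": q1, "q3": q3, "iqr": iqr, "outliers": outliers}

# "Conditioning" on the categorical variable: one summary per level.
summaries = {group: box_summary(vals) for group, vals in strength.items()}
for group, s in summaries.items():
    print(group, s)
```

Comparing the medians across groups shows the trend, comparing the IQRs shows whether the spreads are similar, and the outlier lists correspond to the individually plotted points.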


4.5.2 Scatterplots

For two quantitative variables, the basic graphical EDA technique is the scatterplot, which has one variable on the x-axis, one on the y-axis, and a point for each case in your dataset. If one variable is explanatory and the other is outcome, it is a very strong convention to put the outcome on the y (vertical) axis.

One or two additional categorical variables can be accommodated on the scatterplot by encoding the additional information in the symbol type and/or color.


An example is shown in figure 4.16. Age vs. strength is shown, and different colors and symbols are used to code political party and gender.

In a nutshell: You should always perform appropriate EDA before further analysis of your data. Perform whatever steps are necessary to become more familiar with your data, check for obvious mistakes, learn about variable distributions, and learn about relationships between variables. EDA is not an exact science – it is a very important art!



4.6 A note on degrees of freedom

Degrees of freedom are numbers that characterize specific distributions in a family of distributions. Often we find that a certain family of distributions is needed in some general situation, and then we need to calculate the degrees of freedom to know which specific distribution within the family is appropriate.

The most common situation is when we have a particular statistic and want to know its sampling distribution. If the sampling distribution falls in the “t” family as when performing a t-test, or in the “F” family when performing an ANOVA, or in several other families, we need to find the number of degrees of freedom to figure out which particular member of the family actually represents the desired sampling distribution. One way to think about degrees of freedom for a statistic is that they represent the number of independent pieces of information that go into the calculation of the statistic.

Consider 5 numbers with a mean of 10. To calculate the variance of these numbers we need to sum the squared deviations (from the mean). It really doesn’t matter whether the mean is 10 or any other number: as long as all five deviations are the same, the variance will be the same. This makes sense because variance is a pure measure of spread, not affected by central tendency. But by mathematically rearranging the definition of the mean, it is not too hard to show that the sum of the deviations (not squared) is always zero. Therefore, the first four deviations can (freely) be any numbers, but then the last one is forced to be the number that makes the deviations add to zero, and we are not free to choose it. It is in this sense that five numbers used for calculating a variance or standard deviation have only four degrees of freedom (or independent useful pieces of information). In general, a variance or standard deviation calculated from n data values and one mean has n−1 df.
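The zero-sum property follows in one step: the sum of the deviations equals the sum of the values minus n times the mean, and n times the mean is by definition the sum of the values. A small numerical check, using five invented numbers whose mean is 10, might look like this:

```python
from statistics import mean, variance

data = [7, 9, 10, 11, 13]          # five hypothetical numbers with mean 10
m = mean(data)
deviations = [x - m for x in data]

# The (unsquared) deviations always add to zero.
total = sum(deviations)

# Given the first four deviations, the fifth is forced.
forced_last = -sum(deviations[:4])

# Sample variance: sum of squared deviations divided by n - 1 df.
ss = sum(d * d for d in deviations)
df = len(data) - 1
s2 = ss / df

# Shifting every value changes the mean but not the deviations or the variance.
shifted = [x + 100 for x in data]

print(total, forced_last == deviations[4], s2, variance(shifted))
```

Here `statistics.variance` is the standard-library sample variance, which also divides by n − 1, so it should agree with the hand computation.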

Another example is the “pooled” variance from k independent groups. If the sizes of the groups are $n_1$ through $n_k$, then each of the k individual variance estimates is based on deviations from a different mean, and each has one less degree of freedom than its sample size, e.g., $n_i - 1$ for group i. We also say that each numerator of a variance estimate, e.g., $SS_i$, has $df_i = n_i - 1$ df. The pooled estimate of variance is

$$s^2_{\text{pooled}} = \frac{SS_1 + \cdots + SS_k}{df_1 + \cdots + df_k}$$

and we say that both the numerator SS and the entire pooled variance have $df_1 + \cdots + df_k$ degrees of freedom, which suggests how many independent pieces of information are available for the calculation.
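A short sketch of the pooled calculation follows, assuming the pooled estimate is the total of the group sums of squares divided by the total of the group degrees of freedom. The three groups and the helper names `sum_of_squares` and `pooled_variance` are invented for illustration.

```python
from statistics import mean

# Three hypothetical independent groups of different sizes.
groups = [[4, 6, 8], [10, 12, 14, 16], [1, 2, 3, 4, 5]]

def sum_of_squares(values):
    """Sum of squared deviations from the group's own mean (SS_i)."""
    m = mean(values)
    return sum((x - m) ** 2 for x in values)

def pooled_variance(groups):
    """Return (pooled variance, total df) across independent groups."""
    ss_total = sum(sum_of_squares(g) for g in groups)
    df_total = sum(len(g) - 1 for g in groups)   # each group loses one df to its mean
    return ss_total / df_total, df_total

s2_pooled, df_pooled = pooled_variance(groups)
print(s2_pooled, df_pooled)
```

Each group contributes its own sum of squares and loses one degree of freedom for its own mean, so the total df is the total sample size minus the number of groups.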


FAQs on Multivariate graphical EDA - Statistics, CSIR-NET Mathematical Sciences - Mathematics for IIT JAM, GATE, CSIR NET, UGC NET

1. What is multivariate graphical EDA?
Multivariate graphical EDA (Exploratory Data Analysis) refers to the analysis of multiple variables or features simultaneously through the use of visual representations. It helps in understanding the relationships, patterns, and distributions among different variables in a dataset.
2. How is multivariate graphical EDA useful in statistics?
Multivariate graphical EDA is useful in statistics as it allows us to visually explore the relationships between multiple variables. It helps in identifying correlations, trends, outliers, and clusters in the data, which can provide valuable insights for further analysis and modeling.
3. What are some commonly used techniques for multivariate graphical EDA?
Some commonly used techniques for multivariate graphical EDA include scatter plots, heatmaps, parallel coordinate plots, box plots, and histograms. These visualizations help in understanding the distribution, dispersion, and relationships between multiple variables in a dataset.
4. How does multivariate graphical EDA assist in CSIR-NET Mathematical Sciences exam?
In the CSIR-NET Mathematical Sciences exam, multivariate graphical EDA can assist in analyzing and interpreting complex datasets involving multiple variables. It helps in visualizing the data, identifying patterns, and making informed decisions based on the relationships observed among the variables.
5. What are the benefits of using multivariate graphical EDA in mathematics research?
Using multivariate graphical EDA in mathematics research provides several benefits. It allows researchers to visually explore the relationships between multiple mathematical variables, which can aid in hypothesis generation, model selection, and validation. It also helps in communicating research findings effectively through visual representations.