Measures of Central Tendency
Mean (Arithmetic Average)
Sample Mean:
\[
\bar{x} = \frac{1}{n}\sum_{i=1}^{n}x_i = \frac{x_1 + x_2 + \cdots + x_n}{n}
\]
- \(\bar{x}\) = sample mean
- \(x_i\) = individual data values
- \(n\) = number of observations in the sample
Population Mean:
\[
\mu = \frac{1}{N}\sum_{i=1}^{N}x_i
\]
- \(\mu\) = population mean
- \(N\) = total number of observations in the population
Weighted Mean:
\[
\bar{x}_w = \frac{\sum_{i=1}^{n}w_i x_i}{\sum_{i=1}^{n}w_i}
\]
- \(\bar{x}_w\) = weighted mean
- \(w_i\) = weight assigned to observation \(x_i\)
- \(x_i\) = individual data values
- \(n\) = number of observations
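The sample mean and weighted mean formulas above translate directly into code. This is a minimal sketch; the data values and weights are made up for illustration.

```python
def sample_mean(xs):
    """x_bar = (1/n) * sum(x_i)"""
    return sum(xs) / len(xs)

def weighted_mean(xs, ws):
    """x_bar_w = sum(w_i * x_i) / sum(w_i)"""
    return sum(w * x for w, x in zip(ws, xs)) / sum(ws)

data = [2.0, 4.0, 6.0, 8.0]
weights = [1, 1, 1, 3]          # last observation weighted 3x

print(sample_mean(data))             # 5.0
print(weighted_mean(data, weights))  # (2 + 4 + 6 + 24) / 6 = 6.0
```

Note how the extra weight on the largest value pulls the weighted mean above the ordinary mean.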
Median
Definition: The middle value when the data are arranged in ascending or descending order.
For odd number of observations:
\[
\text{Median} = x_{\frac{n+1}{2}}
\]
For even number of observations:
\[
\text{Median} = \frac{x_{\frac{n}{2}} + x_{\frac{n}{2}+1}}{2}
\]
- \(n\) = number of observations
- \(x_i\) = data values arranged in order
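The two median cases (odd vs. even \(n\)) can be sketched as a single function, assuming the data is sorted first; the example values are illustrative.

```python
def median(xs):
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:                    # odd n: the single middle value x_{(n+1)/2}
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2  # even n: average of the two middle values

print(median([7, 1, 3]))      # 3
print(median([7, 1, 3, 9]))   # (3 + 7) / 2 = 5.0
```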
Mode
Definition: The value that occurs most frequently in a dataset.
- A dataset can have no mode, one mode (unimodal), two modes (bimodal), or multiple modes (multimodal)
- Mode is the only measure of central tendency applicable to nominal data
Measures of Dispersion (Variability)
Range
\[
\text{Range} = x_{max} - x_{min}
\]
- \(x_{max}\) = maximum value in dataset
- \(x_{min}\) = minimum value in dataset
Variance
Sample Variance:
\[
s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2
\]
- \(s^2\) = sample variance
- \(x_i\) = individual data values
- \(\bar{x}\) = sample mean
- \(n\) = number of observations in the sample
- Note: Division by \(n-1\) provides an unbiased estimate (Bessel's correction)
Population Variance:
\[
\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2
\]
- \(\sigma^2\) = population variance
- \(\mu\) = population mean
- \(N\) = total number of observations in the population
Computational Formula for Sample Variance:
\[
s^2 = \frac{\sum_{i=1}^{n}x_i^2 - \frac{(\sum_{i=1}^{n}x_i)^2}{n}}{n-1}
\]
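The definitional and computational formulas for the sample variance are algebraically identical; a short sketch (with illustrative data) confirms they agree numerically.

```python
def var_definitional(xs):
    """s^2 = sum((x_i - x_bar)^2) / (n - 1)"""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

def var_computational(xs):
    """s^2 = (sum(x_i^2) - (sum(x_i))^2 / n) / (n - 1)"""
    n = len(xs)
    sx, sxx = sum(xs), sum(x * x for x in xs)
    return (sxx - sx * sx / n) / (n - 1)

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(var_definitional(data))    # 32 / 7 ≈ 4.5714
print(var_computational(data))   # same value
```

The computational form avoids a second pass over the data, though the definitional form is numerically safer when values are large relative to their spread.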
Standard Deviation
Sample Standard Deviation:
\[
s = \sqrt{s^2} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}
\]
- \(s\) = sample standard deviation
- Units are the same as the original data
Population Standard Deviation:
\[
\sigma = \sqrt{\sigma^2} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2}
\]
- \(\sigma\) = population standard deviation
Coefficient of Variation
\[
CV = \frac{s}{\bar{x}} \times 100\%
\]
or for population:
\[
CV = \frac{\sigma}{\mu} \times 100\%
\]
- CV = coefficient of variation (expressed as percentage)
- \(s\) = sample standard deviation
- \(\bar{x}\) = sample mean
- Note: Dimensionless measure of relative variability; useful for comparing variability between datasets with different units or means
Interquartile Range (IQR)
\[
IQR = Q_3 - Q_1
\]
- IQR = interquartile range
- \(Q_3\) = third quartile (75th percentile)
- \(Q_1\) = first quartile (25th percentile)
- Measures the spread of the middle 50% of the data
- Resistant to outliers
Measures of Position
Percentiles
Position of kth Percentile:
\[
L_k = \frac{k}{100}(n+1)
\]
- \(L_k\) = position of the kth percentile
- \(k\) = desired percentile (0 to 100)
- \(n\) = number of observations
- Note: If \(L_k\) is not an integer, interpolate between the two nearest data values
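The position rule and the interpolation note above can be sketched as follows; the data values are illustrative, and this is one of several common percentile conventions.

```python
def percentile(sorted_xs, k):
    """kth percentile via L_k = (k/100)(n+1) with linear interpolation."""
    n = len(sorted_xs)
    pos = k / 100 * (n + 1)   # 1-based position in the ordered data
    lo = int(pos)             # integer part
    frac = pos - lo           # fractional part drives the interpolation
    if lo < 1:
        return sorted_xs[0]
    if lo >= n:
        return sorted_xs[-1]
    return sorted_xs[lo - 1] + frac * (sorted_xs[lo] - sorted_xs[lo - 1])

data = [10, 20, 30, 40, 50]
print(percentile(data, 25))   # L = 1.5 -> halfway between 10 and 20 = 15.0
print(percentile(data, 50))   # L = 3.0 -> 30.0 (the median)
```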
Quartiles
- \(Q_1\) = first quartile = 25th percentile
- \(Q_2\) = second quartile = 50th percentile = median
- \(Q_3\) = third quartile = 75th percentile
Standard Score (Z-Score)
Sample Z-Score:
\[
z = \frac{x - \bar{x}}{s}
\]
Population Z-Score:
\[
z = \frac{x - \mu}{\sigma}
\]
- \(z\) = standardized score
- \(x\) = data value
- \(\bar{x}\) or \(\mu\) = mean
- \(s\) or \(\sigma\) = standard deviation
- Z-score indicates how many standard deviations a value is from the mean
- Positive z-score: value is above the mean
- Negative z-score: value is below the mean
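A minimal z-score sketch using the sample formula \(z = (x - \bar{x})/s\); the exam-score numbers are made up for illustration.

```python
def z_score(x, mean, sd):
    """How many standard deviations x lies from the mean."""
    return (x - mean) / sd

# e.g. a score of 85 when the class mean is 70 with s = 10
print(z_score(85, 70, 10))   # 1.5 -> above the mean
print(z_score(55, 70, 10))   # -1.5 -> below the mean
```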
Measures of Shape
Skewness
Sample Skewness (adjusted Fisher-Pearson moment coefficient):
\[
g_1 = \frac{n}{(n-1)(n-2)}\sum_{i=1}^{n}\left(\frac{x_i - \bar{x}}{s}\right)^3
\]
Approximate Skewness:
\[
\text{Skewness} \approx \frac{3(\bar{x} - \text{Median})}{s}
\]
- \(g_1\) = sample skewness coefficient
- \(\bar{x}\) = sample mean
- \(s\) = sample standard deviation
- Interpretation:
- \(g_1 = 0\): symmetric distribution
- \(g_1 > 0\): positively skewed (right-skewed, tail extends to the right)
- \(g_1 < 0\): negatively skewed (left-skewed, tail extends to the left)
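Both skewness formulas above can be sketched in a few lines, assuming \(n \geq 3\); the right-skewed data values are made up for illustration.

```python
def sample_skewness(xs):
    """Moment coefficient g_1 with the n/((n-1)(n-2)) adjustment factor."""
    n = len(xs)
    xbar = sum(xs) / n
    s = (sum((x - xbar) ** 2 for x in xs) / (n - 1)) ** 0.5
    return n / ((n - 1) * (n - 2)) * sum(((x - xbar) / s) ** 3 for x in xs)

def median_skew_approx(xs):
    """Approximate skewness 3*(mean - median)/s (odd n keeps the median simple)."""
    n = len(xs)
    xbar = sum(xs) / n
    s = (sum((x - xbar) ** 2 for x in xs) / (n - 1)) ** 0.5
    return 3 * (xbar - sorted(xs)[n // 2]) / s

data = [1, 2, 2, 3, 10]            # long right tail
print(sample_skewness(data))       # positive -> right-skewed
print(median_skew_approx(data))    # also positive, but a different magnitude
```

The two measures agree in sign but not in value; the median-based form is only a rough approximation.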
Kurtosis
Sample Kurtosis (excess kurtosis):
\[
g_2 = \frac{n(n+1)}{(n-1)(n-2)(n-3)}\sum_{i=1}^{n}\left(\frac{x_i - \bar{x}}{s}\right)^4 - \frac{3(n-1)^2}{(n-2)(n-3)}
\]
- \(g_2\) = excess kurtosis coefficient
- Interpretation:
- \(g_2 = 0\): mesokurtic (normal distribution)
- \(g_2 > 0\): leptokurtic (heavier tails, more peaked)
- \(g_2 < 0\): platykurtic (lighter tails, less peaked)
Correlation and Covariance
Covariance
Sample Covariance:
\[
s_{xy} = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})
\]
Population Covariance:
\[
\sigma_{xy} = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu_x)(y_i - \mu_y)
\]
- \(s_{xy}\) = sample covariance between variables x and y
- \(\bar{x}\), \(\bar{y}\) = sample means of x and y
- \(n\) = number of paired observations
- Positive covariance indicates variables tend to move together
- Negative covariance indicates variables tend to move in opposite directions
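The sample covariance formula above is a direct sum over paired deviations; the paired data here is illustrative.

```python
def sample_cov(xs, ys):
    """s_xy = sum((x_i - x_bar)(y_i - y_bar)) / (n - 1)"""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    return sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / (n - 1)

xs = [1, 2, 3, 4]
ys = [2, 4, 6, 8]              # y increases with x
print(sample_cov(xs, ys))      # positive: the variables move together
```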
Correlation Coefficient
Pearson Correlation Coefficient (r):
\[
r = \frac{s_{xy}}{s_x s_y} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}}
\]
Alternative Computational Formula:
\[
r = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{\sqrt{n\sum x_i^2 - (\sum x_i)^2}\sqrt{n\sum y_i^2 - (\sum y_i)^2}}
\]
- \(r\) = Pearson correlation coefficient
- \(s_x\), \(s_y\) = sample standard deviations of x and y
- Range: \(-1 \leq r \leq +1\)
- Interpretation:
- \(r = +1\): perfect positive linear relationship
- \(r = -1\): perfect negative linear relationship
- \(r = 0\): no linear relationship
- \(|r| > 0.7\): strong correlation
- \(0.3 \leq |r| \leq 0.7\): moderate correlation
- \(|r| < 0.3\): weak correlation
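The computational formula for \(r\) needs only running sums, so it can be sketched without computing the means first; the paired data is illustrative.

```python
def pearson_r(xs, ys):
    """r = (n*Sxy - Sx*Sy) / (sqrt(n*Sxx - Sx^2) * sqrt(n*Syy - Sy^2))"""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    num = n * sxy - sx * sy
    den = ((n * sxx - sx * sx) ** 0.5) * ((n * syy - sy * sy) ** 0.5)
    return num / den

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]          # points lie exactly on y = 2x,
print(pearson_r(xs, ys))       # so r is 1 (up to floating-point rounding)
```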
Coefficient of Determination
\[
r^2 = \text{(Pearson correlation coefficient)}^2
\]
- \(r^2\) = coefficient of determination
- Range: \(0 \leq r^2 \leq 1\)
- Represents the proportion of variance in one variable that is predictable from the other variable
- Expressed as a percentage when multiplied by 100
Linear Regression
Simple Linear Regression Model
\[
y = a + bx
\]
or
\[
\hat{y} = a + bx
\]
- \(\hat{y}\) = predicted value of dependent variable
- \(x\) = independent variable
- \(a\) = y-intercept
- \(b\) = slope of the regression line
Slope of Regression Line
\[
b = \frac{s_{xy}}{s_x^2} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}
\]
Alternative Computational Formula:
\[
b = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - (\sum x_i)^2}
\]
- \(b\) = slope (regression coefficient)
- \(s_{xy}\) = covariance of x and y
- \(s_x^2\) = variance of x
Y-Intercept of Regression Line
\[
a = \bar{y} - b\bar{x}
\]
- \(a\) = y-intercept
- \(\bar{x}\), \(\bar{y}\) = means of x and y
- \(b\) = slope
- The regression line always passes through the point \((\bar{x}, \bar{y})\)
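The slope and intercept formulas above fit a least-squares line in two passes; the points below are made up to lie exactly on \(y = 1 + 2x\).

```python
def fit_line(xs, ys):
    """Least squares: b = s_xy / s_x^2, a = y_bar - b * x_bar."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    b = sxy / sxx
    a = ybar - b * xbar        # forces the line through (x_bar, y_bar)
    return a, b

a, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)                    # 1.0 2.0, recovering y = 1 + 2x
```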
Residual
\[
e_i = y_i - \hat{y}_i
\]
- \(e_i\) = residual for observation i
- \(y_i\) = actual observed value
- \(\hat{y}_i\) = predicted value from regression equation
Sum of Squared Errors (SSE)
\[
SSE = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2 = \sum_{i=1}^{n}e_i^2
\]
- SSE = sum of squared errors (residuals)
- Measure of variation not explained by the regression model
Standard Error of Estimate
\[
s_e = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{n-2}}
\]
- \(s_e\) = standard error of the estimate
- \(n\) = number of observations
- Measures the typical deviation of observed values from the regression line
- Smaller values indicate better fit
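Residuals, SSE, and \(s_e\) chain together directly; in this sketch the fitted line \(\hat{y} = 1 + 2x\) and the slightly noisy data points are both illustrative.

```python
def std_error_of_estimate(xs, ys, a, b):
    """s_e = sqrt(SSE / (n - 2)) for a given fitted line y_hat = a + b*x."""
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]  # e_i = y_i - y_hat_i
    sse = sum(e * e for e in residuals)
    return (sse / (len(xs) - 2)) ** 0.5

xs = [1, 2, 3, 4]
ys = [3.1, 4.9, 7.2, 8.8]      # near the line y = 1 + 2x
print(std_error_of_estimate(xs, ys, 1.0, 2.0))   # ≈ 0.2236
```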
Frequency Distributions
Class Width
\[
\text{Class Width} = \frac{\text{Range}}{\text{Number of Classes}} = \frac{x_{max} - x_{min}}{k}
\]
- \(k\) = number of classes
- Typically round up to a convenient number
Sturges' Rule for Number of Classes
\[
k = 1 + 3.322 \log_{10}(n)
\]
or
\[
k = 1 + \frac{\log(n)}{\log(2)}
\]
- \(k\) = suggested number of classes
- \(n\) = number of observations
- Provides a starting point; final number may be adjusted
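Both forms of Sturges' rule give the same value, since \(3.322 \approx 1/\log_{10} 2\); a quick sketch, rounding up as the text suggests:

```python
import math

def sturges(n):
    """k = 1 + log2(n), equivalent to 1 + 3.322*log10(n)."""
    return 1 + math.log2(n)

for n in (50, 100, 1000):
    print(n, math.ceil(sturges(n)))   # 50 -> 7, 100 -> 8, 1000 -> 11 classes
```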
Class Midpoint
\[
\text{Midpoint} = \frac{\text{Lower Class Limit} + \text{Upper Class Limit}}{2}
\]
Relative Frequency
\[
\text{Relative Frequency} = \frac{\text{Class Frequency}}{n}
\]
- \(n\) = total number of observations
- Sum of all relative frequencies equals 1.0
Cumulative Frequency
- Sum of frequencies up to and including the current class
- The cumulative frequency of the last class equals \(n\)
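Relative and cumulative frequencies tie together as sketched below for an illustrative table of four class frequencies.

```python
freqs = [4, 7, 6, 3]             # class frequencies (made up)
n = sum(freqs)                   # 20 observations in total

rel = [f / n for f in freqs]     # relative frequencies sum to 1.0
cum = []
running = 0
for f in freqs:                  # running total up to and including each class
    running += f
    cum.append(running)

print(rel)   # [0.2, 0.35, 0.3, 0.15]
print(cum)   # [4, 11, 17, 20] -> last entry equals n
```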
Outlier Detection
Interquartile Range (IQR) Method
Lower Fence:
\[
\text{Lower Fence} = Q_1 - 1.5 \times IQR
\]
Upper Fence:
\[
\text{Upper Fence} = Q_3 + 1.5 \times IQR
\]
- Data points below the lower fence or above the upper fence are considered outliers
- \(Q_1\) = first quartile
- \(Q_3\) = third quartile
- \(IQR\) = interquartile range = \(Q_3 - Q_1\)
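The fence rule can be sketched as below, assuming the quartiles are already known (here \(Q_1 = 10\), \(Q_3 = 20\) for illustration).

```python
def iqr_fences(q1, q3):
    """Lower and upper fences at 1.5 * IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

lower, upper = iqr_fences(10, 20)
print(lower, upper)              # -5.0 35.0

data = [-9, 2, 11, 14, 19, 40]
outliers = [x for x in data if x < lower or x > upper]
print(outliers)                  # [-9, 40]
```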
Z-Score Method
- A data point is typically considered an outlier if:
- \(|z| > 2\) (for small samples)
- \(|z| > 3\) (for large samples or more conservative approach)
- Where \(z\) is the z-score of the data point
Special Statistical Properties
Properties of Mean
- Sum of deviations from the mean equals zero: \(\sum_{i=1}^{n}(x_i - \bar{x}) = 0\)
- Mean is sensitive to outliers and extreme values
- Mean of a constant times a variable: \(\overline{kx} = k\bar{x}\)
- Mean of sum/difference: \(\overline{x \pm y} = \bar{x} \pm \bar{y}\)
Properties of Variance
- Variance of a constant is zero: \(\text{Var}(k) = 0\)
- Variance of a constant times a variable: \(\text{Var}(kx) = k^2 \text{Var}(x)\)
- For independent variables: \(\text{Var}(x \pm y) = \text{Var}(x) + \text{Var}(y)\)
Properties of Standard Deviation
- Standard deviation of a constant is zero: \(\text{SD}(k) = 0\)
- Standard deviation of a constant times a variable: \(\text{SD}(kx) = |k| \times \text{SD}(x)\)
Empirical Rule (68-95-99.7 Rule)
For normal (bell-shaped) distributions:
- Approximately 68% of data falls within \(\mu \pm \sigma\) (one standard deviation from mean)
- Approximately 95% of data falls within \(\mu \pm 2\sigma\) (two standard deviations from mean)
- Approximately 99.7% of data falls within \(\mu \pm 3\sigma\) (three standard deviations from mean)
Chebyshev's Theorem
For any distribution (regardless of shape):
\[
\text{Minimum proportion} = 1 - \frac{1}{k^2}
\]
- \(k\) = number of standard deviations from the mean (\(k > 1\))
- At least \(1 - \frac{1}{k^2}\) of the data falls within \(k\) standard deviations of the mean
- Examples:
- \(k = 2\): at least 75% of data within \(\mu \pm 2\sigma\)
- \(k = 3\): at least 89% of data within \(\mu \pm 3\sigma\)
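The two worked examples follow directly from the bound; a quick numeric check:

```python
def chebyshev_bound(k):
    """Minimum proportion of data within k standard deviations (k > 1)."""
    return 1 - 1 / k ** 2

print(chebyshev_bound(2))   # 0.75 -> at least 75% within 2 standard deviations
print(chebyshev_bound(3))   # 8/9 ≈ 0.889 -> at least ~89% within 3
```

Unlike the Empirical Rule, this bound holds for any distribution, which is why its guarantees (75%, 89%) are weaker than the normal-distribution figures (95%, 99.7%).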