
Cheatsheet: Descriptive Statistics

1. Measures of Central Tendency

1.1 Mean

Type | Formula
Arithmetic Mean (Population) | μ = (Σxi) / N, where N is population size
Arithmetic Mean (Sample) | x̄ = (Σxi) / n, where n is sample size
Weighted Mean | x̄w = (Σwixi) / (Σwi), where wi are weights
Geometric Mean | GM = (x1 × x2 × ... × xn)^(1/n) = ⁿ√(Πxi)
Harmonic Mean | HM = n / (Σ(1/xi))
  • Mean is sensitive to outliers and extreme values
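  • Geometric mean is used to average growth rates and other multiplicative factors; requires positive values
  • Harmonic mean is used to average rates taken over equal amounts of the numerator quantity (e.g., average speed over equal distances)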
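
As a rough illustration of the four means above, the sketch below uses Python's standard statistics module. The data values and the helper weighted_mean are made up for this example and are not part of the original notes.

```python
from statistics import fmean, geometric_mean, harmonic_mean


def weighted_mean(values, weights):
    """Weighted mean: sum(w_i * x_i) / sum(w_i). Hypothetical helper."""
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)


# Illustrative data only (assumed, not from the notes).
speeds_kmh = [40.0, 60.0]        # two legs of equal distance
growth_factors = [1.25, 1.80]    # +25 % in period 1, +80 % in period 2
scores = [80.0, 90.0]
credits = [1.0, 3.0]             # weights for the scores

arithmetic = fmean(speeds_kmh)              # (40 + 60) / 2
harmonic = harmonic_mean(speeds_kmh)        # 2 / (1/40 + 1/60)
geometric = geometric_mean(growth_factors)  # (1.25 * 1.80) ** (1/2)
weighted = weighted_mean(scores, credits)   # (80*1 + 90*3) / (1 + 3)

print(arithmetic, harmonic, geometric, weighted)
```

For the two equal-distance legs, the harmonic mean (48 km/h) rather than the arithmetic mean (50 km/h) is the true average speed.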

1.2 Median

  • Middle value when data is ordered from smallest to largest
  • For odd n: median is the ((n+1)/2)th value
  • For even n: median is the average of (n/2)th and ((n/2)+1)th values
  • Largely unaffected by outliers; robust measure of central tendency
  • Preferred for skewed distributions
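
The odd and even cases above can be sketched as follows. The function name median and the sample lists are illustrative assumptions; the index arithmetic converts the 1-based positions in the bullets to Python's 0-based indexing.

```python
def median(values):
    """Median via the odd/even position rules (hypothetical helper)."""
    if not values:
        raise ValueError("median of empty data")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # 1-based position (n + 1) / 2 is 0-based index n // 2
        return ordered[mid]
    # 1-based positions n/2 and n/2 + 1 are 0-based indices mid - 1 and mid
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_case = median([7, 1, 5])        # ordered: 1, 5, 7
even_case = median([7, 1, 5, 3])    # ordered: 1, 3, 5, 7
print(odd_case, even_case)
```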

1.3 Mode

  • Most frequently occurring value in the dataset
  • Can have no mode, one mode (unimodal), two modes (bimodal), or multiple modes (multimodal)
  • Only measure of central tendency applicable to nominal (unordered categorical) data
  • Not affected by extreme values
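
A minimal sketch of finding the mode(s) is shown below. The helper modes is hypothetical, and it adopts one common convention as an assumption: when every distinct value occurs equally often, the dataset is reported as having no mode.

```python
from collections import Counter


def modes(values):
    """Return the list of most frequent values (empty list = no mode)."""
    counts = Counter(values)
    if not counts:
        return []
    # Assumed convention: all distinct values tied means "no mode".
    if len(counts) > 1 and len(set(counts.values())) == 1:
        return []
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]


unimodal = modes(["red", "blue", "red"])   # categorical data works too
bimodal = modes([1, 2, 2, 3, 3])
no_mode = modes([1, 2, 3])
print(unimodal, bimodal, no_mode)
```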

1.4 Relationships Between Mean, Median, and Mode

Distribution Type | Relationship
Symmetric Distribution | Mean = Median = Mode
Right-Skewed (Positive Skew) | Mode < Median < Mean
Left-Skewed (Negative Skew) | Mean < Median < Mode

2. Measures of Dispersion

2.1 Range

  • Range = Maximum value - Minimum value
  • Simplest measure of spread; highly sensitive to outliers
  • Interquartile Range (IQR) = Q3 - Q1 (more robust)

2.2 Variance

Type | Formula
Population Variance | σ² = Σ(xi - μ)² / N
Sample Variance | s² = Σ(xi - x̄)² / (n - 1)
Computational Formula | s² = [Σxi² - (Σxi)²/n] / (n - 1)
  • Sample variance uses (n-1) denominator (Bessel's correction) for unbiased estimator
  • Units are squared units of original data
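
The sketch below checks numerically that the definitional and computational formulas for the sample variance agree, and cross-checks both against Python's statistics module. The data list is an assumed example.

```python
from statistics import pvariance, variance

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]   # illustrative values
n = len(data)
mean = sum(data) / n

# Sum of squared deviations from the mean.
ss = sum((x - mean) ** 2 for x in data)

pop_var = ss / n                # population variance, divide by N
sample_var = ss / (n - 1)       # sample variance, Bessel's correction

# Computational formula: [sum(x^2) - (sum(x))^2 / n] / (n - 1)
shortcut_var = (sum(x * x for x in data) - sum(data) ** 2 / n) / (n - 1)

# Library cross-check.
lib_sample_var = variance(data)
lib_pop_var = pvariance(data)

print(pop_var, sample_var, shortcut_var)
```

The computational formula avoids a second pass over the data but can lose precision when the mean is large relative to the spread.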

2.3 Standard Deviation

Type | Formula
Population Standard Deviation | σ = √[Σ(xi - μ)² / N]
Sample Standard Deviation | s = √[Σ(xi - x̄)² / (n - 1)]
  • Most common measure of dispersion; same units as original data
  • Approximately 68% of data within ±1σ, 95% within ±2σ, 99.7% within ±3σ (for normal distribution)

2.4 Coefficient of Variation

  • CV = (s / x̄) × 100% for sample data
  • CV = (σ / μ) × 100% for population data
  • Dimensionless measure; useful for comparing variability between datasets with different units or means
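
As an illustration of comparing variability across different units, the sketch below computes the sample CV for two made-up datasets. The helper name and the data are assumptions for this example.

```python
from statistics import fmean, stdev


def coefficient_of_variation(sample):
    """Sample CV in percent: (s / x_bar) * 100. Hypothetical helper."""
    mean = fmean(sample)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return stdev(sample) / mean * 100.0


lengths_mm = [98.0, 100.0, 102.0]   # mean 100 mm, s = 2 mm
masses_kg = [4.0, 5.0, 6.0]         # mean 5 kg,  s = 1 kg

cv_lengths = coefficient_of_variation(lengths_mm)
cv_masses = coefficient_of_variation(masses_kg)
print(cv_lengths, cv_masses)
```

Although the lengths have the larger standard deviation in absolute terms, their CV (2%) is far smaller than that of the masses (20%).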

2.5 Mean Absolute Deviation

  • MAD = Σ|xi - x̄| / n
  • Average absolute distance from the mean
  • Less sensitive to outliers than standard deviation
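
A minimal sketch of the MAD formula above, using an assumed dataset and a hypothetical helper name:

```python
from statistics import fmean


def mean_absolute_deviation(values):
    """MAD = sum(|x_i - x_bar|) / n."""
    if not values:
        raise ValueError("MAD of empty data")
    mean = fmean(values)
    return sum(abs(x - mean) for x in values) / len(values)


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]   # illustrative values, mean 5
mad = mean_absolute_deviation(data)                # absolute deviations sum to 12
print(mad)
```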

3. Measures of Position

3.1 Percentiles

  • Pk is the value below which k% of the data falls
  • Position: Lp = (p/100) × (n + 1), where p is the percentile
  • If Lp is not an integer, interpolate between adjacent values
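
The position rule and interpolation step above might be coded as follows. This is a sketch of the (n + 1) convention only; other textbooks and software use different percentile conventions. The function name, the clamping at the ends, and the data are assumptions.

```python
def percentile(values, p):
    """Value at the p-th percentile using L_p = (p / 100) * (n + 1)."""
    if not values:
        raise ValueError("percentile of empty data")
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    ordered = sorted(values)
    n = len(ordered)
    position = (p / 100) * (n + 1)       # 1-based position L_p
    # Assumption: positions outside 1..n are clamped to the extremes.
    if position <= 1:
        return float(ordered[0])
    if position >= n:
        return float(ordered[-1])
    lower = int(position)                # whole part of L_p (1-based)
    fraction = position - lower          # fractional part used to interpolate
    below = ordered[lower - 1]
    above = ordered[lower]
    return below + fraction * (above - below)


data = [10, 20, 30, 40, 50, 60, 70, 80]   # illustrative values, n = 8
p25 = percentile(data, 25)   # L_p = 2.25, between 20 and 30
p50 = percentile(data, 50)   # L_p = 4.5,  between 40 and 50
p75 = percentile(data, 75)   # L_p = 6.75, between 60 and 70
print(p25, p50, p75)
```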

3.2 Quartiles

Quartile | Definition
Q1 (First Quartile) | 25th percentile; lower quartile
Q2 (Second Quartile) | 50th percentile; median
Q3 (Third Quartile) | 75th percentile; upper quartile
  • IQR = Q3 - Q1
  • Outlier detection: values below Q1 - 1.5×IQR or above Q3 + 1.5×IQR
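
The 1.5×IQR outlier rule can be sketched with statistics.quantiles from the Python standard library. Its default "exclusive" method is one of several quartile conventions, so the fences may differ slightly from hand calculations that use another rule. The helper name and data are assumptions.

```python
from statistics import quantiles


def iqr_outliers(values):
    """Return (outliers, (lower_fence, upper_fence)) using the 1.5 x IQR rule."""
    q1, _q2, q3 = quantiles(values, n=4)   # default method="exclusive"
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    flagged = [x for x in values if x < lower_fence or x > upper_fence]
    return flagged, (lower_fence, upper_fence)


data = [10, 20, 30, 40, 50, 60, 70, 200]   # illustrative values
outliers, fences = iqr_outliers(data)
print(outliers, fences)
```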

3.3 Z-Score (Standard Score)

  • z = (x - μ) / σ for population
  • z = (x - x̄) / s for sample
  • Indicates how many standard deviations a value is from the mean
  • Positive z-score: value above mean; negative z-score: value below mean
  • |z| > 3 often considered an outlier
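
A short sketch of the z-score and the |z| > 3 screening rule follows. The mean and standard deviation used here are assumed values for illustration, and z_score is a hypothetical helper.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd


# Assumed process with mean 100 and standard deviation 10.
z_high = z_score(135.0, 100.0, 10.0)   # above the mean
z_low = z_score(85.0, 100.0, 10.0)     # below the mean

high_is_outlier = abs(z_high) > 3
low_is_outlier = abs(z_low) > 3
print(z_high, z_low, high_is_outlier, low_is_outlier)
```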

4. Measures of Shape

4.1 Skewness

Formula | Interpretation
Skewness = [n / ((n-1)(n-2))] × Σ[(xi - x̄)/s]³ | Measures asymmetry of distribution
  • Skewness = 0: symmetric distribution
  • Skewness > 0: right-skewed (positive skew); tail extends to the right
  • Skewness < 0: left-skewed (negative skew); tail extends to the left
  • |Skewness| < 0.5: approximately symmetric
  • 0.5 ≤ |Skewness| ≤ 1: moderately skewed
  • |Skewness| > 1: highly skewed
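
The sample skewness formula and the rule-of-thumb thresholds above could be sketched like this. The function names and datasets are assumptions made for illustration.

```python
from statistics import fmean, stdev


def sample_skewness(sample):
    """Adjusted sample skewness: [n / ((n-1)(n-2))] * sum(((x - x_bar) / s)^3)."""
    n = len(sample)
    if n < 3:
        raise ValueError("need at least 3 observations")
    mean = fmean(sample)
    s = stdev(sample)                     # sample standard deviation (n - 1)
    if s == 0:
        raise ValueError("all values are identical")
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in sample)


def describe_skew(skewness):
    """Apply the |skewness| thresholds 0.5 and 1 from the list above."""
    size = abs(skewness)
    if size < 0.5:
        return "approximately symmetric"
    if size <= 1:
        return "moderately skewed"
    return "highly skewed"


symmetric = sample_skewness([1, 2, 3, 4, 5])
right_skewed = sample_skewness([1, 1, 1, 2, 10])   # long tail to the right
print(symmetric, describe_skew(symmetric))
print(right_skewed, describe_skew(right_skewed))
```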

4.2 Kurtosis

Formula | Interpretation
Excess kurtosis = [n(n+1) / ((n-1)(n-2)(n-3))] × Σ[(xi - x̄)/s]⁴ - [3(n-1)² / ((n-2)(n-3))] | Measures tailedness of distribution relative to the normal distribution
  • Excess kurtosis = 0: mesokurtic (normal distribution)
  • Excess kurtosis > 0: leptokurtic (heavy tails, peaked)
  • Excess kurtosis < 0: platykurtic (light tails, flat)
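
A minimal sketch of the kurtosis formula above is given below. Because of the subtracted term, the result is an excess kurtosis (zero for a normal distribution). The helper name and datasets are assumptions.

```python
from statistics import fmean, stdev


def sample_excess_kurtosis(sample):
    """Sample excess kurtosis with the small-sample correction terms."""
    n = len(sample)
    if n < 4:
        raise ValueError("need at least 4 observations")
    mean = fmean(sample)
    s = stdev(sample)                     # sample standard deviation (n - 1)
    if s == 0:
        raise ValueError("all values are identical")
    fourth = sum(((x - mean) / s) ** 4 for x in sample)
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * fourth - correction


flat = sample_excess_kurtosis([1, 2, 3, 4, 5])                          # evenly spread
heavy_tail = sample_excess_kurtosis([0, 0, 0, 0, 0, 0, 0, 0, 0, 10])   # one extreme value
print(flat, heavy_tail)
```

The evenly spread data give a negative value (platykurtic), while the dataset with a single extreme value gives a large positive value (leptokurtic).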

5. Grouped Data

5.1 Frequency Distributions

Term | Definition
Class Width | Upper limit - Lower limit of a class interval
Class Midpoint | xm = (Lower limit + Upper limit) / 2
Relative Frequency | fi / n, where fi is class frequency
Cumulative Frequency | Sum of frequencies up to and including current class

5.2 Mean for Grouped Data

  • x̄ = Σ(fi × xm,i) / n, where xm,i is class midpoint and fi is frequency

5.3 Variance for Grouped Data

  • s² = [Σ(fi × (xm,i - x̄)²)] / (n - 1)
  • Computational formula: s² = [Σ(fi × xm,i²) - (Σ(fi × xm,i))²/n] / (n - 1)

5.4 Modal Class

  • Class interval with the highest frequency
  • Mode approximated by midpoint of modal class
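
The grouped-data formulas in sections 5.2 to 5.4 can be sketched on a small, made-up frequency table. The class limits and frequencies below are assumptions for illustration only.

```python
# Each entry: ((lower limit, upper limit), frequency). Illustrative values.
classes = [((0, 10), 2), ((10, 20), 5), ((20, 30), 3)]

midpoints = [(low + high) / 2 for (low, high), _ in classes]
freqs = [f for _, f in classes]
n = sum(freqs)                                   # total number of observations

# Grouped mean: sum(f_i * x_m,i) / n
mean = sum(f * m for f, m in zip(freqs, midpoints)) / n

# Grouped sample variance, definitional form.
var_def = sum(f * (m - mean) ** 2 for f, m in zip(freqs, midpoints)) / (n - 1)

# Grouped sample variance, computational form.
sum_fx = sum(f * m for f, m in zip(freqs, midpoints))
sum_fx2 = sum(f * m * m for f, m in zip(freqs, midpoints))
var_short = (sum_fx2 - sum_fx ** 2 / n) / (n - 1)

# Modal class: highest frequency; mode approximated by its midpoint.
modal_class, _ = max(classes, key=lambda item: item[1])
mode_estimate = sum(modal_class) / 2

print(mean, var_def, var_short, modal_class, mode_estimate)
```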

6. Data Visualization

6.1 Graphical Methods

Graph Type | Use
Histogram | Shows frequency distribution of continuous data; bars touch
Bar Chart | Shows frequencies of categorical data; bars separated
Box Plot (Box-and-Whisker) | Displays five-number summary: minimum, Q1, median, Q3, maximum
Stem-and-Leaf Plot | Preserves original data while showing distribution shape
Scatter Plot | Shows relationship between two quantitative variables
Pie Chart | Shows proportions of categorical data; parts of a whole

6.2 Five-Number Summary

  • Consists of: Minimum, Q1, Median (Q2), Q3, Maximum
  • Used to construct box plots
  • Provides robust description of data distribution

7. Correlation and Covariance

7.1 Covariance

Type | Formula
Sample Covariance | sxy = Σ(xi - x̄)(yi - ȳ) / (n - 1)
Population Covariance | σxy = Σ(xi - μx)(yi - μy) / N
  • Measures direction of linear relationship between two variables
  • Positive covariance: variables increase together
  • Negative covariance: one variable increases as other decreases
  • Units depend on units of x and y; difficult to interpret magnitude
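
A minimal sketch of the two covariance formulas, assuming paired data of equal length. The helper covariance and the x and y values are made up for this example.

```python
def covariance(x, y, sample=True):
    """Covariance of paired data; divides by n - 1 (sample) or N (population)."""
    n = len(x)
    if n != len(y):
        raise ValueError("x and y must have the same length")
    if n < 2:
        raise ValueError("need at least 2 paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return cross / (n - 1 if sample else n)


x = [1.0, 2.0, 3.0, 4.0, 5.0]    # illustrative values
y = [2.0, 4.0, 5.0, 4.0, 5.0]

s_xy = covariance(x, y)                       # sample covariance
sigma_xy = covariance(x, y, sample=False)     # population covariance
print(s_xy, sigma_xy)
```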

7.2 Correlation Coefficient (Pearson's r)

  • r = sxy / (sx × sy) = Σ(xi - x̄)(yi - ȳ) / √[Σ(xi - x̄)² × Σ(yi - ȳ)²]
  • Dimensionless; range: -1 ≤ r ≤ 1
  • r = 1: perfect positive linear relationship
  • r = -1: perfect negative linear relationship
  • r = 0: no linear relationship
  • |r| > 0.8: strong correlation; 0.5 < |r| < 0.8: moderate; |r| < 0.5: weak
  • r² (coefficient of determination): proportion of variance in y explained by x
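
Pearson's r and the rule-of-thumb strength labels above might be sketched as follows. The function names and data are assumptions, and the handling of values exactly at 0.5 and 0.8 is a choice made here because the thresholds above leave those boundaries open.

```python
import math


def pearson_r(x, y):
    """r = sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("r is undefined when a variable is constant")
    return s_xy / math.sqrt(s_xx * s_yy)


def strength(r):
    """Label |r| with the 0.5 / 0.8 thresholds (boundary handling assumed)."""
    size = abs(r)
    if size > 0.8:
        return "strong"
    if size >= 0.5:
        return "moderate"
    return "weak"


x = [1.0, 2.0, 3.0, 4.0, 5.0]    # illustrative values
y = [2.0, 4.0, 5.0, 4.0, 5.0]

r = pearson_r(x, y)
r_squared = r ** 2               # coefficient of determination
print(r, r_squared, strength(r))
```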

7.3 Spearman's Rank Correlation

  • rs = 1 - [6Σdi²] / [n(n² - 1)], where di is the difference in ranks for observation i (exact when there are no tied ranks)
  • Used for ordinal data or when relationship is non-linear but monotonic
  • Less sensitive to outliers than Pearson's r
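
The rank-difference formula can be sketched as below. This version assumes there are no tied values in either variable and rejects data with ties; the helper names and data are illustrative.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n. Assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rs(x, y):
    """r_s = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)), valid without ties."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    if len(set(x)) != n or len(set(y)) != n:
        raise ValueError("this sketch does not handle tied values")
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


x = [10, 20, 30, 40, 50]                 # illustrative values
monotonic = spearman_rs(x, [1, 4, 9, 16, 100])    # non-linear but increasing
reversed_order = spearman_rs(x, [5, 4, 3, 2, 1])
mixed = spearman_rs(x, [2, 1, 4, 3, 5])
print(monotonic, reversed_order, mixed)
```

The first pair is perfectly monotonic but not linear, so rs equals 1 even though Pearson's r would be below 1.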

8. Empirical Rule and Chebyshev's Theorem

8.1 Empirical Rule (68-95-99.7 Rule)

  • Applies to bell-shaped (normal) distributions
  • Approximately 68% of data within μ ± σ
  • Approximately 95% of data within μ ± 2σ
  • Approximately 99.7% of data within μ ± 3σ

8.2 Chebyshev's Theorem

  • Applies to any distribution shape
  • At least [1 - (1/k²)] of data within μ ± kσ, where k > 1
  • k = 2: at least 75% within μ ± 2σ
  • k = 3: at least 89% within μ ± 3σ
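
A short sketch of the Chebyshev bound follows; the function name is an assumption. It reproduces the k = 2 and k = 3 values listed above.

```python
def chebyshev_lower_bound(k):
    """Minimum fraction of data within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k ** 2


bound_k2 = chebyshev_lower_bound(2)      # 1 - 1/4
bound_k3 = chebyshev_lower_bound(3)      # 1 - 1/9
bound_k15 = chebyshev_lower_bound(1.5)   # 1 - 1/2.25
print(bound_k2, bound_k3, bound_k15)
```

These guaranteed minimums (75% and about 89%) are much lower than the empirical-rule figures (about 95% and 99.7%) because Chebyshev's theorem makes no assumption about the shape of the distribution.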

9. Common Data Transformations

9.1 Linear Transformation

  • y = a + bx
  • Mean: ȳ = a + bx̄
  • Variance: sy² = b²sx²
  • Standard deviation: sy = |b|sx
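
The effect of a linear transformation on the mean, variance, and standard deviation can be checked numerically. The sketch below uses the Celsius-to-Fahrenheit conversion (a = 32, b = 1.8) with made-up temperatures.

```python
from statistics import fmean, stdev, variance

a, b = 32.0, 1.8                       # y = a + b*x (Celsius to Fahrenheit)
temps_c = [10.0, 20.0, 30.0]           # illustrative values
temps_f = [a + b * t for t in temps_c]

mean_c, var_c, sd_c = fmean(temps_c), variance(temps_c), stdev(temps_c)
mean_f, var_f, sd_f = fmean(temps_f), variance(temps_f), stdev(temps_f)

# Values predicted by the transformation rules.
predicted_mean = a + b * mean_c        # shift and scale
predicted_var = b ** 2 * var_c         # the shift a has no effect
predicted_sd = abs(b) * sd_c

print(mean_f, predicted_mean)
print(var_f, predicted_var)
print(sd_f, predicted_sd)
```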

9.2 Standardization

  • z = (x - x̄) / s
  • Standardized data has mean = 0 and standard deviation = 1
  • Allows comparison of variables with different units or scales
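
As a quick check of the claim that standardized data has mean 0 and standard deviation 1, the sketch below standardizes an assumed sample using the sample mean and sample standard deviation. The helper name is hypothetical.

```python
from statistics import fmean, stdev


def standardize(sample):
    """Return z = (x - x_bar) / s for every value in the sample."""
    mean = fmean(sample)
    s = stdev(sample)
    if s == 0:
        raise ValueError("cannot standardize constant data")
    return [(x - mean) / s for x in sample]


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]   # illustrative values
z_values = standardize(data)

z_mean = fmean(z_values)
z_sd = stdev(z_values)
print(z_mean, z_sd)
```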

9.3 Logarithmic Transformation

  • y = log(x) or y = ln(x)
  • Reduces right skewness
  • Stabilizes variance for data with increasing spread
  • Useful for data spanning several orders of magnitude
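
To illustrate how a log transformation reduces right skew, the sketch below compares the mean and median before and after taking base-10 logarithms. The data, spanning four orders of magnitude, are made up for this example.

```python
import math
from statistics import fmean, median

values = [1.0, 10.0, 100.0, 1000.0, 10000.0]    # illustrative, strongly right-skewed
logs = [math.log10(v) for v in values]          # requires positive values

raw_mean, raw_median = fmean(values), median(values)
log_mean, log_median = fmean(logs), median(logs)

# Before: the mean is pulled far above the median by the large values.
print(raw_mean, raw_median)
# After: mean and median coincide, as in a symmetric distribution.
print(log_mean, log_median)
```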

10. Sampling and Data Collection

10.1 Types of Data

Data Type | Description
Categorical (Qualitative) | Non-numeric; represents categories or groups
Nominal | Categories with no inherent order (e.g., color, gender)
Ordinal | Categories with meaningful order (e.g., rankings, grades)
Numerical (Quantitative) | Numeric values with mathematical meaning
Discrete | Countable values (e.g., number of defects)
Continuous | Measurable values on a continuum (e.g., temperature, length)

10.2 Levels of Measurement

Level | Properties
Nominal | Classification only; no ordering or meaningful zero
Ordinal | Classification and ordering; no meaningful intervals
Interval | Classification, ordering, and equal intervals; no true zero
Ratio | All properties including true zero; ratios are meaningful

10.3 Sampling Methods

Method | Description
Simple Random Sample | Each member has equal probability of selection
Stratified Sample | Population divided into strata; random sample from each stratum
Systematic Sample | Select every kth member after random start
Cluster Sample | Population divided into clusters; randomly select entire clusters
Convenience Sample | Sample based on ease of access (non-random)
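
The four random sampling methods in the table can be sketched with Python's random module. The population of unit IDs, the strata and cluster definitions, the sample sizes, and the seed are all assumptions chosen for illustration.

```python
import random

rng = random.Random(42)                 # fixed seed so the sketch is repeatable
population = list(range(1, 101))        # hypothetical unit IDs 1..100

# Simple random sample: every unit is equally likely to be chosen.
simple = rng.sample(population, 5)

# Systematic sample: random start, then every k-th unit.
k = 10
start = rng.randrange(k)
systematic = population[start::k]

# Stratified sample: a random sample drawn from each stratum.
strata = {"line_A": population[:60], "line_B": population[60:]}
stratified = {name: rng.sample(units, 3) for name, units in strata.items()}

# Cluster sample: randomly choose whole clusters and keep every unit in them.
clusters = [population[i:i + 20] for i in range(0, 100, 20)]
chosen_clusters = rng.sample(clusters, 2)
cluster_sample = [unit for cluster in chosen_clusters for unit in cluster]

print(simple, systematic, stratified, len(cluster_sample))
```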
The document Cheatsheet: Descriptive Statistics is a part of the PE Exam Course Engineering Fundamentals Revision for PE.