Important Formulas: Statistics

Why Statistics is Important for CSAT?

Statistics is a critical component of the quantitative aptitude section of CSAT (Civil Services Aptitude Test), which is Paper II of the UPSC Civil Services Preliminary Examination. Understanding statistics is essential for aspiring civil servants for several reasons:

  • High Scoring Potential: Statistics questions in CSAT are typically straightforward and formula-based. With proper preparation and practice, these questions can be solved quickly and accurately, making them excellent scoring opportunities.
  • Data Interpretation Questions: A significant portion of CSAT consists of data interpretation questions involving tables, graphs, and charts. Statistical knowledge helps you analyze this data efficiently and extract meaningful insights within time constraints.
  • Real-World Application: As a civil servant, you will frequently encounter statistical data in reports, surveys, census data, economic indicators, and policy documents. Understanding measures of central tendency, dispersion, and probability helps in making informed administrative decisions.
  • Scoring in Limited Time: CSAT is known for its time pressure (120 minutes for 80 questions). Statistics questions, when you know the formulas, can be solved in 1-2 minutes, allowing you to allocate more time to challenging comprehension passages.
  • Consistent Question Pattern: Statistics questions in CSAT follow a predictable pattern year after year. Topics like mean, median, mode, range, and basic probability appear regularly, making preparation more focused and result-oriented.
  • Gateway to Data Analysis: Many CSAT questions combine statistics with logical reasoning and data sufficiency. A strong foundation in statistics helps you tackle these integrated questions with confidence.

Exam Strategy: CSAT is a qualifying paper in which you need a minimum of 33% (66 marks out of 200, which is roughly 27 correct answers out of 80 if you make no wrong attempts, since wrong answers carry negative marking). Statistics questions can form your core scoring area, typically contributing 8-12 questions in the paper. Mastering these formulas helps you secure these marks reliably.

Bottom Line: Statistics is not just about numbers; it's about understanding patterns, making comparisons, and drawing conclusions from data. For CSAT success, treat statistics as your scoring fortress - predictable, manageable, and highly rewarding with the right preparation.

1. Measures of Central Tendency

1.1 Arithmetic Mean (Average)
$$\bar{x} = \frac{\text{Sum of all observations}}{\text{Number of observations}}$$
$$\bar{x} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n}$$
$$\bar{x} = \frac{\sum x}{n}$$
Where:
\(\bar{x}\) = Mean
\(\sum x\) = Sum of all observations
\(n\) = Number of observations
Example: For data: 10, 20, 30, 40, 50
Mean = \(\frac{10+20+30+40+50}{5} = \frac{150}{5} = 30\)
1.2 Weighted Mean
$$\text{Weighted Mean} = \frac{\sum (w \times x)}{\sum w}$$
Where:
\(w\) = weight assigned to each value
\(x\) = value of observation
\(\sum w\) = Sum of all weights
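As a rough illustration of the two formulas above, the following Python sketch computes a simple mean and a weighted mean. The function names, and the marks and weights in the second example, are made up for illustration only.

```python
def arithmetic_mean(values):
    """Sum of all observations divided by the number of observations."""
    if not values:
        raise ValueError("need at least one observation")
    return sum(values) / len(values)


def weighted_mean(values, weights):
    """Sum of (weight x value) divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for w, x in zip(weights, values)) / total_weight


# Data from the mean example: 10, 20, 30, 40, 50
simple_result = arithmetic_mean([10, 20, 30, 40, 50])    # 150 / 5 = 30.0

# Hypothetical marks 60, 70, 80 carrying weights 1, 2, 3
weighted_result = weighted_mean([60, 70, 80], [1, 2, 3])  # 440 / 6, about 73.33
```

Note that the weighted mean (about 73.33) is pulled above the simple mean of the same marks (70) because the largest value carries the largest weight.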
1.3 Median
For odd number of observations:
$$\text{Median} = \left(\frac{n + 1}{2}\right)^{\text{th}} \text{ term when arranged in order}$$
For even number of observations:
$$\text{Median} = \frac{\left(\frac{n}{2}\right)^{\text{th}} \text{ term} + \left(\frac{n}{2} + 1\right)^{\text{th}} \text{ term}}{2}$$
Note: Always arrange data in ascending or descending order before finding median.
Example: For 5, 3, 8, 1, 9
Arranged: 1, 3, 5, 8, 9
Median = 5 (middle value)
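A minimal Python sketch of both median cases, assuming the data fits in a list. The helper name `median_of` and the six-value dataset are illustrative additions, not part of the original notes.

```python
def median_of(values):
    """Middle value for odd n; average of the two middle values for even n."""
    if not values:
        raise ValueError("need at least one observation")
    ordered = sorted(values)          # always arrange the data first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]           # the ((n + 1) / 2)-th term
    # average of the (n / 2)-th and (n / 2 + 1)-th terms
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_median = median_of([5, 3, 8, 1, 9])       # sorted: 1, 3, 5, 8, 9 -> 5
even_median = median_of([5, 3, 8, 1, 9, 4])   # sorted: 1, 3, 4, 5, 8, 9 -> (4 + 5) / 2 = 4.5
```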
1.4 Mode
$$\text{Mode} = \text{Most frequently occurring value in the dataset}$$
Note: A dataset can have one mode (unimodal), two modes (bimodal), multiple modes (multimodal), or no mode.
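The mode can be found by counting how often each value occurs. The sketch below is one possible way to do this in Python; the function name `modes_of` is hypothetical, and treating "every value occurs once" as "no mode" is a convention assumed here.

```python
from collections import Counter


def modes_of(values):
    """Return all most-frequent values, or an empty list if nothing repeats."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        return []                      # assumed convention: no repeats means no mode
    return sorted(v for v, c in counts.items() if c == highest)


unimodal = modes_of([2, 3, 3, 5, 7])       # [3]
bimodal = modes_of([1, 1, 2, 2, 3])        # [1, 2]
no_mode = modes_of([4, 5, 6])              # []
```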
1.5 Relationship between Mean, Median, and Mode
For moderately skewed distribution:
$$\text{Mode} = 3 \times \text{Median} - 2 \times \text{Mean}$$
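A quick numeric check of this empirical relationship, using made-up values for the mean and median. The result is only an approximation that holds for moderately skewed data.

```python
def estimated_mode(mean, median):
    """Empirical estimate: Mode = 3 x Median - 2 x Mean."""
    return 3 * median - 2 * mean


# Hypothetical distribution with mean 20 and median 22
approx_mode = estimated_mode(mean=20, median=22)   # 66 - 40 = 26
```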

2. Measures of Dispersion

2.1 Range
$$\text{Range} = \text{Maximum Value} - \text{Minimum Value}$$
Example: For data: 5, 12, 3, 18, 7
Range = \(18 - 3 = 15\)
2.2 Variance (Population)
$$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$$
Where:
\(\sigma^2\) = Population variance
\(x\) = Individual observation
\(\mu\) = Population mean
\(N\) = Total number of observations
2.3 Variance (Sample)
$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$
Where:
\(s^2\) = Sample variance
\(\bar{x}\) = Sample mean
\(n\) = Sample size
2.4 Standard Deviation (Population)
$$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$$
Note: Standard deviation is the square root of variance.
2.5 Standard Deviation (Sample)
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
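The only difference between the population and sample formulas is the divisor (N versus n - 1). The following sketch makes that explicit; the function names and the eight-value dataset are illustrative assumptions.

```python
import math


def variance_of(values, sample=False):
    """Population variance by default; sample variance if sample=True."""
    n = len(values)
    divisor = n - 1 if sample else n
    if divisor <= 0:
        raise ValueError("not enough observations")
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return squared_deviations / divisor


def std_dev_of(values, sample=False):
    """Standard deviation is the square root of the variance."""
    return math.sqrt(variance_of(values, sample))


data = [2, 4, 4, 4, 5, 5, 7, 9]          # mean = 5, squared deviations sum to 32

population_variance = variance_of(data)               # 32 / 8 = 4.0
population_sd = std_dev_of(data)                      # sqrt(4.0) = 2.0
sample_variance = variance_of(data, sample=True)      # 32 / 7, about 4.57
sample_sd = std_dev_of(data, sample=True)             # about 2.14
```

The sample values are always slightly larger than the population values for the same data, because the sum is divided by a smaller number.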
2.6 Coefficient of Variation
$$CV = \left(\frac{\sigma}{\mu}\right) \times 100\%$$
Where:
\(CV\) = Coefficient of Variation
\(\sigma\) = Standard deviation
\(\mu\) = Mean
Note: Used to compare variability between datasets with different units or means.
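To see why the coefficient of variation is useful, the sketch below compares two hypothetical datasets measured in different units (heights in cm and weights in kg). All numbers are invented for illustration.

```python
def coefficient_of_variation(std_dev, mean):
    """CV as a percentage: (standard deviation / mean) x 100."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return std_dev * 100 / mean


# Hypothetical summary statistics
cv_heights = coefficient_of_variation(std_dev=8, mean=160)   # 5.0 (%)
cv_weights = coefficient_of_variation(std_dev=9, mean=60)    # 15.0 (%)

# The weights are relatively more variable, even though the two
# standard deviations (8 cm and 9 kg) cannot be compared directly.
more_variable = "weights" if cv_weights > cv_heights else "heights"
```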

3. Probability

3.1 Basic Probability
$$P(A) = \frac{\text{Number of favorable outcomes}}{\text{Total number of possible outcomes}}$$
$$P(A) = \frac{n(A)}{n(S)}$$
Where:
\(P(A)\) = Probability of event A
\(n(A)\) = Number of favorable outcomes
\(n(S)\) = Total number of outcomes in sample space
Note: \(0 \leq P(A) \leq 1\)
3.2 Complementary Events
$$P(A') = 1 - P(A)$$
$$P(A) + P(A') = 1$$
Where:
\(P(A')\) = Probability of event A not occurring
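The basic formula and the complement rule can be checked by listing the sample space. This sketch assumes a single roll of a fair six-sided die; the helper `probability` is a hypothetical name.

```python
from fractions import Fraction

SAMPLE_SPACE = list(range(1, 7))        # outcomes of one fair die: 1..6


def probability(event):
    """Favorable outcomes divided by total outcomes (equally likely)."""
    favorable = sum(1 for outcome in SAMPLE_SPACE if event(outcome))
    return Fraction(favorable, len(SAMPLE_SPACE))


p_six = probability(lambda outcome: outcome == 6)        # 1/6
p_not_six = 1 - p_six                                    # complement rule: 5/6
p_not_six_counted = probability(lambda outcome: outcome != 6)  # counted directly: 5/6
```

Using `Fraction` keeps the answers exact, which matches how probability answers are usually written in exam options.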
3.3 Addition Rule (Mutually Exclusive Events)
$$P(A \text{ or } B) = P(A) + P(B)$$
Note: Use when events cannot occur simultaneously.
3.4 Addition Rule (Non-Mutually Exclusive Events)
$$P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)$$
$$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$
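Both addition rules can be verified on a standard 52-card deck. The sketch below builds the deck explicitly; the helper names are illustrative, and the card events are examples chosen here rather than taken from the notes.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["hearts", "diamonds", "clubs", "spades"]
DECK = list(product(RANKS, SUITS))      # 52 (rank, suit) pairs


def probability(event):
    """Fraction of cards in the deck for which the event is true."""
    favorable = sum(1 for card in DECK if event(card))
    return Fraction(favorable, len(DECK))


def is_king(card):
    return card[0] == "K"


def is_queen(card):
    return card[0] == "Q"


def is_heart(card):
    return card[1] == "hearts"


# Mutually exclusive: a card cannot be both a king and a queen
p_king_or_queen = probability(is_king) + probability(is_queen)   # 8/52 = 2/13

# Not mutually exclusive: the king of hearts is counted twice, so subtract it once
p_king_and_heart = probability(lambda c: is_king(c) and is_heart(c))          # 1/52
p_king_or_heart = probability(is_king) + probability(is_heart) - p_king_and_heart  # 16/52 = 4/13

# Direct count as a cross-check
p_king_or_heart_counted = probability(lambda c: is_king(c) or is_heart(c))
```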
3.5 Multiplication Rule (Independent Events)
$$P(A \text{ and } B) = P(A) \times P(B)$$
$$P(A \cap B) = P(A) \times P(B)$$
Note: Use when the occurrence of one event does not affect the other.
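As an illustration, tossing a coin and rolling a die are independent, so the multiplication rule should agree with a direct count over all 12 combined outcomes. The setup below is an assumed example.

```python
from fractions import Fraction
from itertools import product

COIN = ["H", "T"]
DIE = [1, 2, 3, 4, 5, 6]
OUTCOMES = list(product(COIN, DIE))     # 12 equally likely (coin, die) pairs

p_head = Fraction(1, 2)
p_six = Fraction(1, 6)

# Multiplication rule for independent events
p_head_and_six_rule = p_head * p_six    # 1/12

# Direct count over the combined sample space
favorable = sum(1 for coin, die in OUTCOMES if coin == "H" and die == 6)
p_head_and_six_counted = Fraction(favorable, len(OUTCOMES))   # 1/12
```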
3.6 Conditional Probability
$$P(A|B) = \frac{P(A \cap B)}{P(B)}$$
Where:
\(P(A|B)\) = Probability of A given that B has occurred
\(P(B) \neq 0\)
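A small sketch of conditional probability on one roll of a fair die, where A is "the roll is 2" and B is "the roll is even". The events and helper names are assumptions made for this example.

```python
from fractions import Fraction

SAMPLE_SPACE = list(range(1, 7))        # one fair die


def probability(event):
    favorable = sum(1 for outcome in SAMPLE_SPACE if event(outcome))
    return Fraction(favorable, len(SAMPLE_SPACE))


def is_two(outcome):
    return outcome == 2


def is_even(outcome):
    return outcome % 2 == 0


p_b = probability(is_even)                                        # 3/6 = 1/2
p_a_and_b = probability(lambda o: is_two(o) and is_even(o))       # 1/6
p_a_given_b = p_a_and_b / p_b                                     # (1/6) / (1/2) = 1/3
```

Knowing that the roll is even shrinks the sample space to {2, 4, 6}, which is why the probability of a 2 rises from 1/6 to 1/3.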

4. Permutations and Combinations

4.1 Factorial
$$n! = n \times (n-1) \times (n-2) \times \cdots \times 3 \times 2 \times 1$$
$$0! = 1$$
Example: \(5! = 5 \times 4 \times 3 \times 2 \times 1 = 120\)
4.2 Permutation (Arrangement)
$$^nP_r = \frac{n!}{(n - r)!}$$
Where:
\(n\) = Total number of items
\(r\) = Number of items being arranged
Order matters
Example: \(^5P_3 = \frac{5!}{(5-3)!} = \frac{120}{2} = 60\)
4.3 Combination (Selection)
$$^nC_r = \frac{n!}{r! \times (n - r)!}$$
Where:
\(n\) = Total number of items
\(r\) = Number of items being selected
Order does not matter
Example: \(^5C_3 = \frac{5!}{3! \times 2!} = \frac{120}{6 \times 2} = 10\)
4.4 Relationship between Permutation and Combination
$$^nP_r = {^nC_r} \times r!$$
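The worked examples above can be reproduced with Python's standard `math` module (`math.perm` and `math.comb` require Python 3.8 or later). This is only a checking aid; in the exam the values must be computed by hand.

```python
import math

n, r = 5, 3

factorial_n = math.factorial(n)                      # 5! = 120

# Permutations: order matters
permutations = math.perm(n, r)                       # 5! / 2! = 60

# Combinations: order does not matter
combinations = math.comb(n, r)                       # 5! / (3! x 2!) = 10

# Same values from the factorial formulas
permutations_formula = math.factorial(n) // math.factorial(n - r)
combinations_formula = math.factorial(n) // (math.factorial(r) * math.factorial(n - r))

# Relationship: nPr = nCr x r!
relationship_holds = permutations == combinations * math.factorial(r)
```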

5. Correlation and Regression

5.1 Correlation Coefficient (Pearson's r)
$$r = \frac{\sum[(x - \bar{x})(y - \bar{y})]}{\sqrt{\sum(x - \bar{x})^2 \times \sum(y - \bar{y})^2}}$$
Where:
\(r\) = Correlation coefficient
\(-1 \leq r \leq 1\)
\(r = 1\): Perfect positive correlation
\(r = -1\): Perfect negative correlation
\(r = 0\): No linear correlation
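The following sketch applies the Pearson formula term by term to a small made-up dataset. The function name and the data are illustrative assumptions.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: sum of cross-deviations over the root of the product of squared deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_sq_x = sum((x - mean_x) ** 2 for x in xs)
    sum_sq_y = sum((y - mean_y) ** 2 for y in ys)
    if sum_sq_x == 0 or sum_sq_y == 0:
        raise ValueError("r is undefined when one variable is constant")
    return cross / math.sqrt(sum_sq_x * sum_sq_y)


# Hypothetical paired data: means are 3 and 4
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = pearson_r(xs, ys)                     # 6 / sqrt(10 x 6), about 0.77
r_perfect = pearson_r(xs, [2, 4, 6, 8, 10])   # exact straight line, so r is 1 (up to rounding)
```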
5.2 Linear Regression (Slope)
$$b = \frac{\sum[(x - \bar{x})(y - \bar{y})]}{\sum(x - \bar{x})^2}$$
Where:
\(b\) = Slope of regression line
5.3 Linear Regression Equation
$$y = a + bx$$
$$a = \bar{y} - b\bar{x}$$
Where:
\(a\) = y-intercept
\(b\) = slope
\(\bar{x}\) = mean of x values
\(\bar{y}\) = mean of y values
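A minimal sketch of fitting the regression line with the slope and intercept formulas above. The data points and the function name `fit_line` are made up for illustration.

```python
def fit_line(xs, ys):
    """Return (a, b) for the line y = a + b x using the slope and intercept formulas."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sum_sq_x = sum((x - mean_x) ** 2 for x in xs)
    if sum_sq_x == 0:
        raise ValueError("slope is undefined when all x values are equal")
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = cross / sum_sq_x                 # slope
    a = mean_y - b * mean_x              # y-intercept
    return a, b


# Hypothetical paired data: means are 3 and 4
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = fit_line(xs, ys)                  # b = 6 / 10 = 0.6, a = 4 - 0.6 x 3, about 2.2
predicted_at_6 = a + b * 6               # about 5.8
```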

6. Additional Important Formulas

6.1 Percentile
$$\text{Position of } k^{\text{th}} \text{ percentile} = \left(\frac{k}{100}\right) \times n$$
Where:
\(k\) = percentile value (0 to 100)
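\(n\) = number of observations
Note: Arrange the data in ascending order first. Conventions differ between textbooks; some use \(\left(\frac{k}{100}\right) \times (n + 1)\) for the position.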
6.2 Quartiles
$$Q_1 \text{ (First Quartile)} = 25^{\text{th}} \text{ percentile}$$
$$Q_2 \text{ (Second Quartile)} = 50^{\text{th}} \text{ percentile} = \text{Median}$$
$$Q_3 \text{ (Third Quartile)} = 75^{\text{th}} \text{ percentile}$$
6.3 Interquartile Range (IQR)
$$IQR = Q_3 - Q_1$$
Note: IQR measures the spread of the middle 50% of the data and is resistant to outliers.
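Quartile values depend on the convention used for the position, so different textbooks and tools can give slightly different answers. The sketch below uses Python's `statistics.quantiles` (Python 3.8 or later) with its default "exclusive" method, which corresponds to a position of \(\frac{k}{100} \times (n + 1)\); the dataset is a made-up example.

```python
import statistics

# Hypothetical dataset of 7 values (sorted here for readability)
data = [2, 4, 6, 8, 10, 12, 14]

# n=4 splits the data into four parts and returns the three cut points Q1, Q2, Q3
q1, q2, q3 = statistics.quantiles(data, n=4)    # default method is "exclusive"

iqr = q3 - q1

# Q2 should match the median
median_value = statistics.median(data)
```

With 7 values, the positions 0.25 x 8 = 2, 0.5 x 8 = 4 and 0.75 x 8 = 6 are whole numbers, so Q1, Q2 and Q3 are simply the 2nd, 4th and 6th values (4, 8 and 12), giving an IQR of 8.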
6.4 Z-Score (Standard Score)
$$z = \frac{x - \mu}{\sigma}$$
Where:
\(z\) = standard score
\(x\) = individual value
\(\mu\) = population mean
\(\sigma\) = population standard deviation
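A z-score says how many standard deviations a value lies above or below the mean. The numbers in this sketch are hypothetical.

```python
def z_score(x, mean, std_dev):
    """Standard score: (value - mean) / standard deviation."""
    if std_dev <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / std_dev


# Hypothetical test with mean 60 and standard deviation 10
z_above = z_score(75, mean=60, std_dev=10)    # (75 - 60) / 10 = 1.5
z_below = z_score(45, mean=60, std_dev=10)    # (45 - 60) / 10 = -1.5
z_at_mean = z_score(60, mean=60, std_dev=10)  # 0.0
```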
6.5 Expected Value
$$E(X) = \sum [x \times P(x)]$$
Where:
\(E(X)\) = Expected value
\(x\) = value of random variable
\(P(x)\) = probability of x
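For example, the expected value of one roll of a fair die can be computed directly from the formula. The sketch assumes each face has probability 1/6; the function name is illustrative.

```python
from fractions import Fraction


def expected_value(distribution):
    """Sum of x times P(x) over a dict mapping each value x to its probability."""
    if sum(distribution.values()) != 1:
        raise ValueError("probabilities must sum to 1")
    return sum(x * p for x, p in distribution.items())


# Fair six-sided die: each face 1..6 has probability 1/6
fair_die = {face: Fraction(1, 6) for face in range(1, 7)}

die_expectation = expected_value(fair_die)    # (1 + 2 + ... + 6) / 6 = 7/2
```

The result, 3.5, is not a face of the die. The expected value is a long-run average, not necessarily an outcome that can actually occur.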

7. Quick Tips for CSAT Statistics

✓ Time Management: Spend no more than 1.5 minutes per statistics question.
✓ Formula Sheet: Memorize all basic formulas, especially mean, median, mode, range, and basic probability.
✓ Data Arrangement: For median and quartiles, always arrange data first.
✓ No Calculator: Calculators are not permitted in the exam, so practice mental math and approximation techniques.
✓ Common Mistakes: Watch out for confusing permutation with combination, forgetting to arrange data before finding the median, and miscounting favorable or total outcomes in probability.
✓ Practice Areas: Focus on data interpretation tables, bar graphs, pie charts, and line graphs with statistical calculations.

Best wishes for your CSAT preparation! 

Practice regularly and master these formulas for success.

© 2026 CSAT Preparation Guide

The document Important Formulas: Statistics is a part of the UPSC Course CSAT Preparation.

FAQs on Important Formulas: Statistics

1. Why are measures of central tendency important in statistics for CSAT?
Ans. Measures of central tendency, such as mean, median, and mode, are crucial for summarising a set of data points. They provide a single value that represents the entire dataset, helping candidates understand the general trend or typical value. In CSAT, this understanding is necessary for interpreting data and making informed decisions based on statistical information.
2. What are measures of dispersion and why do they matter in CSAT statistics?
Ans. Measures of dispersion, including range, variance, and standard deviation, indicate how spread out the values in a dataset are. They help assess the variability or consistency of the data. In CSAT, a good grasp of these measures allows candidates to evaluate the reliability of data, which is essential for drawing accurate conclusions and comparisons.
3. How is probability used in statistics for the CSAT?
Ans. Probability is a fundamental concept in statistics that quantifies the likelihood of an event occurring. In the context of the CSAT, understanding probability helps candidates make predictions and informed choices based on statistical data. It is particularly useful in questions involving risk assessment and decision-making under uncertainty.
4. What is the significance of permutations and combinations in CSAT statistics?
Ans. Permutations and combinations are important for calculating the number of possible arrangements or selections of a set of items. In CSAT, these concepts are vital for solving problems related to probability, counting principles, and combinatorial analysis. Mastery of these topics allows candidates to tackle complex statistical problems effectively.
5. Can you explain correlation and regression in the context of CSAT statistics?
Ans. Correlation and regression are statistical methods used to analyse the relationship between two or more variables. Correlation measures the strength and direction of a linear relationship, while regression fits a model for predicting one variable from another. In CSAT, these concepts are useful for interpreting data trends and making predictions based on statistical relationships.