All questions of Statistics for UPSC CSE Exam

If the mean of 5 observations x, x + 2, x + 4, x + 6 and x + 8 is 11, then the mean of last 3 observations is 
  • a)
    13
  • b)
    15
  • c)
    17
  • d)
    11
Correct answer is option 'A'. Can you explain this answer?

Rajiv Reddy answered
Concept used:
Mean of the observations = Sum of the observations / Total number of observations
Calculations:
The mean of the 5 observations x, x + 2, x + 4, x + 6 and x + 8 is 11
⇒ (x + x + 2 + x + 4 + x + 6 + x + 8)/5 = (5x + 20)/5 = 11
⇒ x + 4 = 11
⇒ x = 7
Mean of the last 3 observations = (x + 4 + x + 6 + x + 8)/3 = (3x + 18)/3 = x + 6 = 7 + 6 = 13
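
The result can be double-checked numerically. The short Python sketch below is only an illustration (the helper name `observations` is made up for this example); it rebuilds the five observations from x and averages the last three.

```python
from statistics import mean

def observations(x):
    """Return the five observations x, x+2, x+4, x+6, x+8."""
    return [x, x + 2, x + 4, x + 6, x + 8]

target_mean = 11
x = target_mean - 4              # the mean of all five observations is x + 4
obs = observations(x)

mean_all = mean(obs)             # should reproduce the given mean
mean_last_three = mean(obs[-3:])  # mean of x+4, x+6, x+8

print(x, mean_all, mean_last_three)
```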

The appropriate average for calculating average percentage increase in population is____________.
  • a)
    Arithmetic Mean
  • b)
    Harmonic Mean
  • c)
    Mode
  • d)
    Geometric Mean
Correct answer is option 'D'. Can you explain this answer?

Utkarsh Joshi answered
When calculating the average percentage increase, it is important to consider the compounding nature of the changes. The geometric mean is well-suited for this purpose because it captures the growth rate over multiple periods.
The geometric mean is calculated by taking the nth root of the product of n values. For percentage changes, the values to multiply are the growth factors (1 + rate), not the percentages themselves; the average rate is the geometric mean of the factors minus 1.
For example, let's consider a population that experienced the following percentage changes over five years: +10%, +5%, -3%, +8%, and +12%. The growth factors are 1.10, 1.05, 0.97, 1.08, and 1.12. Their product is about 1.355, and its fifth root is about 1.063, so the average increase is roughly 6.3% per year, slightly below the arithmetic mean of 6.4%.
Using the arithmetic mean would not be appropriate in this case because it does not account for the compounding effect of the growth rates. The arithmetic mean would treat each growth rate equally, regardless of the compounding nature.
The geometric mean, on the other hand, considers the relative changes in the population size and provides an average growth rate that reflects the compounding effect over time. It is particularly useful when analyzing growth rates, financial returns, or any situation involving multiplicative changes.
In summary, the appropriate average for calculating the average percentage increase in population is the geometric mean.
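
As a rough illustration, the five yearly changes from the example can be averaged in Python. This is a sketch under the assumption that each percentage applies to the population at the start of its year; the variable names are chosen for this example only.

```python
import math

pct_changes = [10, 5, -3, 8, 12]                # yearly changes in percent
factors = [1 + p / 100 for p in pct_changes]    # growth factors 1.10, 1.05, ...

overall = math.prod(factors)                    # total growth over five years
avg_factor = overall ** (1 / len(factors))      # geometric mean of the factors
avg_growth_pct = (avg_factor - 1) * 100         # average yearly increase in percent

arithmetic_pct = sum(pct_changes) / len(pct_changes)  # for comparison only

print(round(avg_growth_pct, 2), arithmetic_pct)
```

Compounding the geometric-mean factor for five years reproduces the overall growth exactly, which the arithmetic mean of the percentages does not.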

The observations 4, 1, 4, 3, 6, 2, 1, 3, 4, 5, 1, 6 are the outcomes of 12 dice thrown simultaneously. If m and M are the means of the lowest 8 observations and the highest 4 observations respectively, then what is (2m + M) equal to?
  • a)
    10
  • b)
    12
  • c)
    17
  • d)
    21
Correct answer is option 'A'. Can you explain this answer?

Kaavya Gupta answered
To find the value of (2m + M), we need to calculate the mean of the lowest 8 observations (m) and the mean of the highest 4 observations (M). Let's break down the problem step by step:

Step 1: Sorting the observations
First, let's sort the given observations in ascending order:
1, 1, 1, 2, 3, 3, 4, 4, 4, 5, 6, 6

Step 2: Calculating the mean of the lowest 8 observations (m)
To find the mean (average), we sum up all the lowest 8 observations and divide it by 8:
m = (1 + 1 + 1 + 2 + 3 + 3 + 4 + 4) / 8
m = 19 / 8
m = 2.375

Step 3: Calculating the mean of the highest 4 observations (M)
To find the mean (average), we sum up all the highest 4 observations and divide it by 4:
M = (4 + 5 + 6 + 6) / 4
M = 21 / 4
M = 5.25

Step 4: Calculating (2m + M)
Now, we can substitute the values of m and M into the equation:
(2m + M) = (2 * 2.375 + 5.25)
(2m + M) = (4.75 + 5.25)
(2m + M) = 10

Therefore, (2m + M) is equal to 10, which corresponds to option A in the given options.
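
The four steps above can be sketched in a few lines of Python; the variable names are illustrative.

```python
rolls = [4, 1, 4, 3, 6, 2, 1, 3, 4, 5, 1, 6]

ordered = sorted(rolls)          # Step 1: ascending order
m = sum(ordered[:8]) / 8         # Step 2: mean of the lowest 8 observations
M = sum(ordered[8:]) / 4         # Step 3: mean of the highest 4 observations
result = 2 * m + M               # Step 4

print(m, M, result)
```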

If mode of a grouped data is 10 and mean is 4, then median will be
  • a)
    1
  • b)
    4
  • c)
    6
  • d)
    8
Correct answer is option 'C'. Can you explain this answer?

Nilesh Patel answered
Concept used:
For a moderately skewed distribution, the mean, median, and mode are linked by the empirical (approximate) relationship:
Mode = 3(Median) - 2(Mean)
Calculations:
Median = (Mode + 2 × Mean) / 3
Median = (10 + 2 × 4) / 3 = 18/3 = 6
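
The rearranged relation is easy to wrap in a small helper. The function name below is hypothetical, and the relation itself is an empirical approximation rather than an exact identity.

```python
def median_from_mode_and_mean(mode, mean):
    """Solve Mode = 3*Median - 2*Mean for the median (empirical relation)."""
    return (mode + 2 * mean) / 3

median = median_from_mode_and_mean(mode=10, mean=4)
print(median)
```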

Which of the following describes the middle part of a group of numbers?________.
  • a)
    The Measure of Variability
  • b)
    The Measure of Central Tendency
  • c)
    The Measure of Association
  • d)
    The Measure of Shape
Correct answer is option 'B'. Can you explain this answer?

Rhea Kulkarni answered
The Measure of Central Tendency

The measure of central tendency refers to a statistical measure that represents the middle or typical value of a group of numbers. It provides us with a single value that summarizes the entire set of data. There are several measures of central tendency, including the mean, median, and mode. Among these options, the measure of central tendency is the one that best describes the middle part of a group of numbers.

Mean
The mean is the most commonly used measure of central tendency. It is calculated by summing up all the values in a data set and dividing it by the total number of values. The mean is influenced by outliers, which are extreme values that can distort the overall average.

Median
The median is another measure of central tendency. It is the middle value in a data set when the values are arranged in ascending or descending order. If there is an even number of values, the median is calculated by taking the average of the two middle values. The median is far less affected by outliers than the mean, making it a better measure of central tendency when dealing with skewed data.

Mode
The mode is the value that appears most frequently in a data set. It is useful for describing the most common value or category in a set of data. Unlike the mean and median, the mode can be used with both numerical and categorical data.

Conclusion
In summary, the measure of central tendency is the statistical measure that describes the middle part of a group of numbers. It provides a single value that summarizes the data and represents the typical value in the set. The mean, median, and mode are different measures of central tendency, with each having its own advantages and uses depending on the nature of the data.

To find the average speed of a journey, the appropriate measure of central tendency is____________
  • a)
    Mean
  • b)
    Geometric Mean
  • c)
    Harmonic Mean
  • d)
    Weighted Mean
Correct answer is option 'C'. Can you explain this answer?

Meghana Roy answered
The appropriate measure of central tendency to find the average speed of a journey is the Harmonic Mean. Let's understand why the Harmonic Mean is the correct choice.

Mean:
The mean is a measure of central tendency that is commonly used to find the average of a set of values. It is calculated by summing up all the values and dividing by the total number of values. However, the mean is not suitable for finding the average speed of a journey because it does not take into account the distance traveled and the time taken.

Geometric Mean:
The geometric mean is another measure of central tendency that is used when dealing with values that are proportional or in a multiplicative relationship. It is calculated by taking the nth root of the product of n values. However, the geometric mean is not appropriate for finding the average speed of a journey because it does not consider the different distances traveled and the corresponding time taken.

Weighted Mean:
The weighted mean is a measure of central tendency that assigns weights to each value in the set, reflecting their importance or significance. It is calculated by multiplying each value by its corresponding weight, summing up the weighted values, and dividing by the sum of the weights. While the weighted mean can be useful in certain situations, it may not be appropriate for finding the average speed of a journey unless there are specific weights assigned to different distances or time intervals.

Harmonic Mean:
The harmonic mean is a measure of central tendency suited to averaging rates or speeds. It is the reciprocal of the arithmetic mean of the reciprocals of the values:

Harmonic Mean = n / (1/x1 + 1/x2 + ... + 1/xn)

The average speed of a journey is always (Total distance traveled) / (Total time taken). When each leg of the journey covers the same distance, this ratio equals the harmonic mean of the speeds on the individual legs. For example, travelling out at 40 km/h and back over the same road at 60 km/h gives an average speed of 2 / (1/40 + 1/60) = 48 km/h, not the 50 km/h suggested by the arithmetic mean.

Therefore, the appropriate measure of central tendency to find the average speed of a journey is the Harmonic Mean.
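
A small Python sketch can confirm that the harmonic mean of the speeds matches total distance divided by total time when the legs are equally long. The speeds (40 and 60 km/h) and the leg length (120 km) are assumed values chosen for illustration.

```python
def harmonic_mean(values):
    """Reciprocal of the arithmetic mean of the reciprocals."""
    return len(values) / sum(1 / v for v in values)

speeds = [40, 60]        # km/h on the outward and return legs (assumed)
leg_distance = 120       # km per leg; both legs are equally long (assumed)

total_distance = leg_distance * len(speeds)
total_time = sum(leg_distance / s for s in speeds)   # hours

average_speed = total_distance / total_time
hm_of_speeds = harmonic_mean(speeds)
am_of_speeds = sum(speeds) / len(speeds)             # overstates the average speed

print(average_speed, hm_of_speeds, am_of_speeds)
```

If the legs were of unequal length, the plain harmonic mean would no longer apply and a distance-weighted version would be needed.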

A symmetrical distribution has mean equal to 4. Its mode will be______.
  • a)
    Equal to 4
  • b)
    Less than 4
  • c)
    Greater than 4
  • d)
    Not equal to 4
Correct answer is option 'A'. Can you explain this answer?

Suresh Reddy answered
In a symmetrical distribution, the data is evenly distributed around the central value, resulting in a mirror image when the distribution is folded along its center. The mean, median, and mode are all equal in a perfectly symmetrical distribution with a single peak.
Given that the mean is equal to 4 in this case, the values on both sides of the center balance each other when calculating the mean. The median is therefore also 4, since it is the center point that divides the distribution into two equal halves.
A symmetrical distribution has no skewness towards either side. When it has a single peak (is unimodal), that peak must sit at the center of symmetry, so the most frequently occurring value, the mode, coincides with the mean and median. Since the mean is 4, the mode will also be equal to 4.
In summary, in a symmetrical distribution with a mean equal to 4, the mode will be equal to 4.

The sum of squares of the deviations about the mean is_______.
  • a)
    Maximum
  • b)
    Minimum
  • c)
    Zero
  • d)
    None of these
Correct answer is option 'B'. Can you explain this answer?

The sum of squares of the deviations about the mean is Minimum

The sum of the squares of deviations about the mean is a measure of the variability or dispersion of a set of data points around their mean. Dividing it by the number of observations gives the variance.

The statement in the question is the least-squares property of the mean: if deviations are measured from any value a, the sum of their squares is smallest when a is the arithmetic mean. It is not zero unless all observations are equal, and it has no maximum, because it grows without bound as a moves away from the data.

Explanation:

For observations x1, x2, ..., xn with mean x̄ and any reference value a:

Σ(xi - a)² = Σ(xi - x̄)² + n(x̄ - a)²

The cross term vanishes because the deviations about the mean sum to zero, Σ(xi - x̄) = 0. The second term on the right, n(x̄ - a)², is never negative and is zero only when a = x̄.

Therefore the sum of squared deviations taken about any value other than the mean is larger than the sum taken about the mean.

Conclusion:

In conclusion, the sum of squares of deviations is minimum when the deviations are taken about the arithmetic mean. This is the same property that underlies the least-squares method used in fitting regression models.
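
The minimising property of the mean can be checked numerically. The sketch below uses a made-up data set and a few arbitrary reference values; it compares the sum of squared deviations about the mean with the sum about each other reference value.

```python
def sum_sq_dev(data, a):
    """Sum of squared deviations of the data about the reference value a."""
    return sum((x - a) ** 2 for x in data)

data = [2, 4, 4, 7, 8]                  # illustrative values
mean = sum(data) / len(data)            # 25 / 5

about_mean = sum_sq_dev(data, mean)
candidates = [3, 4, 4.5, 5.5, 6, 7]     # other reference values to compare
about_others = {a: sum_sq_dev(data, a) for a in candidates}

print(mean, about_mean)
print(about_others)
```

Every entry in `about_others` exceeds `about_mean` by n times the squared distance between that reference value and the mean.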

If any value in the data set is zero, then it is not possible (i.e. impossible) to compute_________.
  • a)
    Mode
  • b)
    Median
  • c)
    Mean
  • d)
    Harmonic Mean
Correct answer is option 'D'. Can you explain this answer?

Jaideep Sen answered
Introduction
In statistics, various measures are used to describe and analyze data sets. Some of the commonly used measures include the mode, median, mean, and harmonic mean. However, if any value in the data set is zero, it becomes impossible to compute the harmonic mean.

Explanation
The harmonic mean is a measure of central tendency that is used when dealing with rates or ratios. It is defined as the reciprocal of the arithmetic mean of the reciprocals of the individual values in the data set. Mathematically, the harmonic mean (H) is calculated as:

H = n / (∑(1/x))

where n is the number of values in the data set and x represents each individual value.

Reasoning
When any value in the data set is zero, it poses a problem in calculating the harmonic mean because division by zero is undefined. Since the harmonic mean involves taking the reciprocals of the individual values, the presence of a zero value would result in division by zero. As a result, the harmonic mean cannot be computed.

Example
Let's consider a simple example to illustrate this point. Suppose we have a data set with the values [2, 4, 0, 6, 8]. To calculate the harmonic mean, we would need to take the reciprocals of each value and sum them up. However, when we encounter the zero value, we cannot proceed with the calculation as division by zero is undefined. Hence, in this case, it is impossible to compute the harmonic mean.

Conclusion
In summary, if any value in the data set is zero, it becomes impossible to compute the harmonic mean. This is because the harmonic mean involves taking the reciprocals of the individual values, and division by zero is undefined. It is important to note that this limitation only applies to the harmonic mean, while other measures such as the mode, median, and mean can still be calculated even if zero values are present in the data set.
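
A minimal Python sketch of the formula H = n / ∑(1/x) makes the problem concrete. The function below is written for this example and simply refuses data containing a zero; library implementations may handle that case differently.

```python
def harmonic_mean(values):
    """H = n / sum(1/x); undefined if any x is zero."""
    if any(v == 0 for v in values):
        raise ValueError("harmonic mean is undefined when a value is zero")
    return len(values) / sum(1 / v for v in values)

print(harmonic_mean([2, 4, 8]))          # every reciprocal exists

try:
    harmonic_mean([2, 4, 0, 6, 8])       # the data set from the example above
except ValueError as err:
    print("cannot compute:", err)
```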

If mean is less than mode, the distribution will be__________.
  • a)
    Positively skewed
  • b)
    Negatively skewed
  • c)
    Symmetrical
  • d)
    None of these
Correct answer is option 'B'. Can you explain this answer?

Suresh Reddy answered
Skewness refers to the asymmetry or lack of symmetry in a distribution. It describes the extent to which the data points are skewed to the left or right of the central value.
When the mean is less than the mode, it indicates that the tail of the distribution is longer on the left side, pulling the mean towards the left. This implies that the distribution is negatively skewed, also known as left-skewed.
In a negatively skewed distribution, the tail extends towards the left side, while the bulk of the data is concentrated towards the right side. The mode represents the most frequently occurring value, and when it is greater than the mean, it suggests that the peak of the distribution is on the right side.
To visualize this, imagine a dataset representing the incomes of a group of individuals. If the distribution is negatively skewed and the mean is less than the mode, it means that there are a few individuals with extremely low incomes, which extends the tail towards the left. The majority of the individuals have higher incomes, which creates the peak or mode on the right side.
In summary, if the mean is less than the mode, the distribution will be negatively skewed. The tail of the distribution is longer on the left side, indicating a concentration of values towards the right side.
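
To see this on numbers, the sketch below uses a small made-up income-like data set with one very low value. It compares the mean with the mode and computes the moment coefficient of skewness, which is negative for a left-skewed set.

```python
from statistics import mean, mode, pstdev

incomes = [1, 7, 8, 9, 9, 9, 10]     # one very low value stretches the left tail

avg = mean(incomes)
peak = mode(incomes)                 # most frequent value
sd = pstdev(incomes)                 # population standard deviation

# Moment coefficient of skewness: average cubed deviation divided by sd cubed
skew = sum((x - avg) ** 3 for x in incomes) / (len(incomes) * sd ** 3)

print(avg < peak, skew < 0)
```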

The calculation of mean and variance is based on________.
  • a)
    Small values only
  • b)
    Large values only
  • c)
    Extreme values only
  • d)
    All values
Correct answer is option 'D'. Can you explain this answer?

Jaya Nair answered
Both the mean and variance are statistical measures that provide insights into different aspects of a dataset.
The mean, also known as the arithmetic mean or average, is calculated by summing all the values in the dataset and dividing by the total number of values. It represents the central tendency or average value of the dataset. To obtain an accurate mean, all values in the dataset are considered and included in the calculation.
The variance is a measure of the dispersion or spread of the dataset. It quantifies the average squared deviation from the mean. To calculate the variance, the mean is subtracted from each value in the dataset, the differences are squared and summed, and the total is divided by the number of values (or by one less than that number for a sample variance). Again, all values in the dataset are taken into account in the variance calculation.
Both the mean and variance require consideration of all values in the dataset to provide meaningful and accurate results. Excluding any values would lead to an incomplete representation of the data and could potentially introduce biases or inaccuracies in the calculations.
Therefore, the calculation of mean and variance is based on all values in the dataset. It is important to include all values to obtain reliable and comprehensive measures of central tendency and dispersion.
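
As an illustration, the sketch below computes the mean and the population variance of a made-up data set, then changes a single middle (non-extreme) value to show that both measures respond to it.

```python
from statistics import mean, pvariance

data = [2, 4, 4, 4, 5, 5, 7, 9]      # illustrative values

avg = mean(data)                     # sum of all values / number of values
var = pvariance(data)                # mean of squared deviations (divides by n)

changed = data.copy()
changed[3] = 6                       # alter one middle value, not an extreme one

print(avg, var)
print(mean(changed), pvariance(changed))
```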

Which of the following cannot be less than zero (negative)?
  • a)
    Median
  • b)
    Geometric Mean
  • c)
    Arithmetic Mean
  • d)
    Harmonic Mean
Correct answer is option 'B'. Can you explain this answer?

The geometric mean is a measure of central tendency that is commonly used for a set of positive numbers. It is calculated by taking the nth root of the product of n positive values.
Since the geometric mean is the positive nth root of a product of positive values, it is always positive and cannot be negative. If any value is zero, the product, and hence the geometric mean, is zero; if values are negative, the geometric mean is not defined in the usual sense.
On the other hand, the median, arithmetic mean, and harmonic mean can be negative under certain circumstances. For example, if a dataset contains negative values, the median and arithmetic mean can be negative if the negative values outweigh the positive values.
Therefore, among the options given, the measure that cannot be less than zero (negative) is the geometric mean. It is specifically designed for positive values and does not yield negative results.

What is the mean of the range, mode and median of the data given below?
5, 10, 3, 6, 4, 8, 9, 3, 15, 2, 9, 4, 19, 11, 4
  • a)
    10
  • b)
    12
  • c)
    8
  • d)
    9
Correct answer is option 'D'. Can you explain this answer?

Nidhi Pillai answered
Step 1: Calculate the Range
- Definition: The range is the difference between the maximum and minimum values in a dataset.
- Data: 5, 10, 3, 6, 4, 8, 9, 3, 15, 2, 9, 4, 19, 11, 4
- Max Value: 19
- Min Value: 2
- Calculation: Range = Max - Min = 19 - 2 = 17
Step 2: Calculate the Mode
- Definition: The mode is the number that appears most frequently in the dataset.
- Frequency Count:
  - 2: 1 time
  - 3: 2 times
  - 4: 3 times
  - 5: 1 time
  - 6: 1 time
  - 8: 1 time
  - 9: 2 times
  - 10: 1 time
  - 11: 1 time
  - 15: 1 time
  - 19: 1 time
- Most Frequent: 4 (occurs 3 times)
Step 3: Calculate the Median
- Definition: The median is the middle value when the data is arranged in ascending order.
- Sorted Data: 2, 3, 3, 4, 4, 4, 5, 6, 8, 9, 9, 10, 11, 15, 19
- Middle Value: With 15 values, the median is the 8th value.
- Median: 6
Step 4: Calculate the Mean of Range, Mode, and Median
- Values: Range = 17, Mode = 4, Median = 6
- Calculation: Mean = (Range + Mode + Median) / 3 = (17 + 4 + 6) / 3 = 27 / 3 = 9
Conclusion
- The mean of the range, mode, and median is 9, which confirms that the correct answer is option 'D'.
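
The same four steps can be reproduced with Python's standard `statistics` module, as a quick cross-check of the hand calculation.

```python
from statistics import median, mode

data = [5, 10, 3, 6, 4, 8, 9, 3, 15, 2, 9, 4, 19, 11, 4]

data_range = max(data) - min(data)   # Step 1: max - min
data_mode = mode(data)               # Step 2: most frequent value
data_median = median(data)           # Step 3: middle of the sorted data
answer = (data_range + data_mode + data_median) / 3   # Step 4

print(data_range, data_mode, data_median, answer)
```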

If mean and mode of some data are 4 & 10 respectively, its median will be:
  • a)
    1.5
  • b)
    5.3
  • c)
    16
  • d)
    6
Correct answer is option 'D'. Can you explain this answer?

Utkarsh Joshi answered
Concept:
Mean: The mean or average of a data set is found by adding all numbers in the data set and then dividing by the number of values in the set.
Mode: The mode is the value that appears most frequently in a data set.
Median: The median is a numeric value that separates the higher half of a set from the lower half. 
Relation between mean, mode and median:
Mode = 3(Median) - 2(Mean)
Calculation:
Given that,
mean of data = 4 and mode of data = 10
We know that
Mode = 3(Median) - 2(Mean)
⇒ 10 = 3(median) - 2(4)
⇒ 3(median) = 18
⇒ median = 6
Hence, the median of data will be 6.

Data must be arranged either in ascending or descending order if someone wants to compute________.
  • a)
    Mode
  • b)
    Mean
  • c)
    Harmonic Mean
  • d)
    Median
Correct answer is option 'D'. Can you explain this answer?

Bhavya Gupta answered



Explanation:

Median Calculation:
- The median is the middle value in a set of numbers when they are arranged in either ascending or descending order.
- To calculate the median, the data must be arranged in a specific order so that the middle value can be identified.
- If the data is not arranged, it would be challenging to determine the middle value accurately.

Importance of Arranging Data:
- Arranging data in ascending or descending order is crucial for calculating the median accurately.
- If the data is not arranged, the median calculation may result in errors or incorrect values.
- The correct order of data ensures that the middle value is correctly identified, leading to an accurate median calculation.

Conclusion:
- In conclusion, arranging data in ascending or descending order is essential for computing the median accurately.
- Without proper arrangement, the calculation of the median may lead to errors and inaccuracies.
- Therefore, ensuring that the data is correctly ordered is crucial for obtaining the correct median value.

If any value in the data set is negative, then it is impossible to compute___________
  • a)
    Arithmetic Mean
  • b)
    Harmonic Mean
  • c)
    Geometric Mean
  • d)
    Mode
Correct answer is option 'C'. Can you explain this answer?

Explanation:

To compute the geometric mean, all values in the data set must be positive. The geometric mean is a measure of central tendency that is calculated by taking the nth root of the product of n numbers. It is commonly used when dealing with numbers that are related to each other multiplicatively, such as growth rates or ratios.

Arithmetic Mean:
The arithmetic mean is the sum of all values in a data set divided by the number of values. It is used to find the average value of a set of numbers. The presence of negative values in the data set does not prevent the calculation of the arithmetic mean; the result is simply smaller, and may itself be negative.

Harmonic Mean:
The harmonic mean is the reciprocal of the arithmetic mean of the reciprocals of a set of numbers. It is used to find the average rate or speed when dealing with rates or speeds. Negative values do not make the arithmetic impossible, since the reciprocal of a negative value is still a real number, although the result is rarely meaningful and the harmonic mean is normally applied to positive values only.

Geometric Mean:
The geometric mean is the nth root of the product of n numbers. It is used to find the average value of a set of numbers that are related to each other multiplicatively. Negative values break this calculation. The product of the values can be negative, and an even root of a negative number is not a real number; even when the product happens to be positive, the result does not describe the data sensibly. For this reason the geometric mean is defined only for positive values, and if any value in the data set is negative it is treated as impossible to compute.

Mode:
The mode is the value that appears most frequently in a data set. Unlike the arithmetic mean, harmonic mean, and geometric mean, the mode is not affected by the presence of negative values. The mode can be computed regardless of the presence of negative numbers in the data set.

In conclusion, if any value in a data set is negative, it is impossible to compute the geometric mean. The arithmetic mean, harmonic mean, and mode can still be calculated even if negative values are present.
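
The restriction to positive values shows up naturally when the geometric mean is computed through logarithms. The function below is a sketch written for this example; rejecting zero as well as negative values is a choice made here, since the logarithm of zero is also undefined.

```python
import math

def geometric_mean(values):
    """nth root of the product, computed as exp(mean of logs)."""
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean needs strictly positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))

print(geometric_mean([2, 8]))            # square root of 16

try:
    geometric_mean([2, -8])              # product is negative: no real square root
except ValueError as err:
    print("cannot compute:", err)
```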

Find the median of the following data: 160, 180, 200, 280, 300, 320, 400_______.
  • a)
    140
  • b)
    300
  • c)
    180
  • d)
    280
Correct answer is option 'D'. Can you explain this answer?

Athul Khanna answered
To find the median of a set of data, we need to arrange the data in ascending order and find the middle value. In this case, the data set is: 160, 180, 200, 280, 300, 320, 400.

Arranging the data set in ascending order:
160, 180, 200, 280, 300, 320, 400

Finding the middle value:
Since there are 7 values in the data set, the middle value will be the 4th value when arranged in ascending order.

So, the median of the given data set is 280.

Therefore, the correct answer is option D) 280.
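
The procedure described above, including the even-count case that does not arise here, can be sketched as follows.

```python
data = [160, 180, 200, 280, 300, 320, 400]

ordered = sorted(data)               # the data must be in order first
n = len(ordered)

if n % 2 == 1:
    # odd count: the single middle value (4th of 7 here)
    median = ordered[n // 2]
else:
    # even count: average of the two middle values
    median = (ordered[n // 2 - 1] + ordered[n // 2]) / 2

print(n, median)
```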

In a unimodal distribution, if the mode is less than the mean, then the distribution will be_________.
  • a)
    Symmetrical
  • b)
    Normal
  • c)
    Positively skewed
  • d)
    Negatively skewed
Correct answer is option 'C'. Can you explain this answer?

Nandini Bose answered
Explanation:

Unimodal Distribution:
A unimodal distribution is a type of probability distribution that has only one peak or mode.

Mode, Mean, and Skewness:
- The mode is the value that appears most frequently in a data set.
- The mean is the average of all the values in a data set.
- Skewness is a measure of the asymmetry of a distribution.

Relationship between Mode, Mean, and Skewness:
- If the mode is less than the mean, the data is skewed to the right, or positively skewed.
- In a positively skewed distribution, the tail is longer on the right side of the peak, and the few large values in that tail pull the mean above the mode.
- The reverse case, mean less than mode, is the negatively skewed one, with the longer tail on the left.

Conclusion:
Therefore, if the mode is less than the mean in a unimodal distribution, the distribution will be positively skewed. The data is concentrated on the left side, around the peak, with a longer tail on the right.

If the mode of the following data is 7, then the value of k in the data set 3, 8, 6, 7, 1, 6, 10, 6, 7, 2k + 5, 9, 7, and 13 is:
  • a)
    3
  • b)
    7
  • c)
    4
  • d)
    1
Correct answer is option 'D'. Can you explain this answer?

Utkarsh Joshi answered
Concept:
Mode is the value that occurs most often in the data set of values.
Calculation:
Given data values are 3, 8, 6, 7, 1, 6, 10, 6, 7, 2k + 5, 9, 7, and 13
Leaving aside 2k + 5, the values 6 and 7 each occur 3 times, more often than any other value.
But it is given that the mode is 7.
So, 7 must occur more often than 6.
Hence the term 2k + 5 must be equal to 7
⇒ 2k + 5 = 7
⇒ 2k = 2
∴ k = 1
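
The answer can also be confirmed by trying each option. The sketch below uses a hypothetical helper, `unique_mode`, which returns no mode when two values tie for the highest frequency.

```python
from collections import Counter

known = [3, 8, 6, 7, 1, 6, 10, 6, 7, 9, 7, 13]   # all terms except 2k + 5

def unique_mode(values):
    """Return the single most frequent value, or None if there is a tie."""
    counts = Counter(values).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]

options = [3, 7, 4, 1]                            # candidate values of k
valid = [k for k in options if unique_mode(known + [2 * k + 5]) == 7]

print(valid)
```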

Statistics are aggregates of______________
  • a)
    Methods
  • b)
    Calculations
  • c)
    Facts
  • d)
    Data
Correct answer is option 'D'. Can you explain this answer?

Utkarsh Joshi answered
Statistics is a branch of mathematics that deals with the collection, analysis, interpretation, presentation, and organization of data. Data is the raw information or facts that are collected from various sources, such as surveys, experiments, observations, or databases.
Statistics take this raw data and transform it into meaningful information by applying various methods and calculations. These methods and calculations include techniques for summarizing and describing data, making inferences or predictions, testing hypotheses, and drawing conclusions.
While methods and calculations are used in the field of statistics, they are tools or techniques employed to analyze and process the data. They are not the aggregates themselves. Similarly, facts are individual pieces of information, whereas statistics involve the systematic and structured analysis of data to draw broader conclusions or make generalizations.
Therefore, statistics are aggregates of data. Data forms the foundation of statistical analysis, and statistics provide insights, summaries, and interpretations of the data, enabling us to better understand and draw conclusions about the phenomena or populations being studied.

The shape of symmetrical distribution is _______
  • a)
    U shaped
  • b)
    Bell Shaped
  • c)
    J Shaped
  • d)
    None of these
Correct answer is option 'B'. Can you explain this answer?

Rajiv Reddy answered
A symmetrical distribution is typically pictured as a bell-shaped curve, the best-known example being the normal (Gaussian) distribution. This shape is characterized by a smooth, symmetric, and unimodal pattern. The curve is highest at the center and tapers off towards the tails on both sides.
The bell-shaped curve is defined by its mean, which represents the center of the distribution, and its standard deviation, which determines the width or spread of the curve. In a perfectly symmetrical distribution, the mean, median, and mode are all located at the center of the curve.
The bell-shaped curve is a fundamental concept in statistics and probability theory. It is commonly observed in natural and social phenomena, where numerous independent factors contribute to the observed values. Examples of variables that often exhibit a bell-shaped distribution include heights, weights, test scores, and measurement errors.
The bell-shaped curve is significant because it allows for the application of many statistical methods and hypothesis tests that assume normality. Additionally, it provides a reference distribution against which other distributions can be compared or standardized.
Therefore, the shape of a symmetrical distribution is bell-shaped.

Which mean is most affected by extreme values?
  • a)
    Geometric Mean
  • b)
    Harmonic Mean
  • c)
    Arithmetic mean
  • d)
    Trimmed Mean
Correct answer is option 'C'. Can you explain this answer?

Amit Kumar answered
The arithmetic mean is the most commonly used measure of central tendency. It is calculated by summing all the values in a dataset and dividing it by the total number of values. The arithmetic mean is sensitive to extreme values because it takes into account the magnitude of each data point.
When there are extreme values in a dataset, they can significantly affect the arithmetic mean. Extreme values have a disproportionate impact on the sum of the values used in the calculation, pulling the mean towards them.
For example, let's consider a dataset representing the incomes of a group of individuals. If there is an outlier with an extremely high income, the arithmetic mean will be greatly influenced by this value. The presence of the extreme value can inflate the arithmetic mean, making it higher than the typical income for the majority of individuals in the dataset.
To address the issue of extreme values affecting the arithmetic mean, alternative measures like trimmed mean or winsorized mean can be used. These methods involve removing or downweighting a certain percentage of extreme values before calculating the mean. This helps mitigate the impact of outliers on the resulting value.
In summary, the arithmetic mean is the measure of central tendency that is most affected by extreme values in a dataset.
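
A quick experiment illustrates the difference in sensitivity. The data set and the outlier value below are made up for this sketch; it measures how far each average moves when one very large value is added.

```python
from statistics import mean, median, geometric_mean, harmonic_mean

base = [10, 12, 14, 16, 18]          # illustrative values
with_outlier = base + [1000]         # add one extreme high value

averages = {
    "arithmetic": mean,
    "geometric": geometric_mean,
    "harmonic": harmonic_mean,
    "median": median,
}

# How much each measure shifts once the outlier is included
shifts = {name: f(with_outlier) - f(base) for name, f in averages.items()}

for name, shift in shifts.items():
    print(name, round(shift, 2))
```

For a very large outlier like this one, the arithmetic mean moves by far the most, the geometric mean less, and the harmonic mean and median only slightly.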

What is the mean of first 99 natural numbers?
  • a)
    100
  • b)
    50.5
  • c)
    50
  • d)
    99
Correct answer is option 'C'. Can you explain this answer?

Anjali Rao answered
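Suppose there are 'n' observations {x1, x2, x3, …, xn}. Their mean is (x1 + x2 + x3 + … + xn) / n.
Calculation:
To find: Mean of the first 99 natural numbers
As we know, Sum of first n natural numbers = n(n + 1)/2
⇒ Sum of first 99 natural numbers = (99 × 100)/2 = 4950
⇒ Mean = 4950/99 = 50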
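
A short Python check, taking the natural numbers to start at 1, compares the closed-form sum n(n + 1)/2 with a direct summation and then computes the mean.

```python
n = 99

direct_sum = sum(range(1, n + 1))    # 1 + 2 + ... + 99
formula_sum = n * (n + 1) // 2       # closed form for the same sum
mean = direct_sum / n                # equals (n + 1) / 2

print(direct_sum, formula_sum, mean)
```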

A set of values is said to be relatively uniform if it has_______.
  • a)
    High Dispersion
  • b)
    Zero Dispersion
  • c)
    Low Dispersion
  • d)
    Negative Dispersion
Correct answer is option 'C'. Can you explain this answer?

Roshni Sarkar answered
Explanation:

To understand why a set of values is said to be relatively uniform if it has low dispersion, let's first define what dispersion means in the context of statistics. Dispersion refers to the degree of spread or variability in a dataset. It provides information about how much the values deviate from the central tendency (mean, median, or mode) of the dataset.

Low Dispersion

When a set of values has low dispersion, it means that the values are closely clustered around the central tendency. In other words, there is little variation or spread among the values in the dataset. This can be visualized by a narrow distribution or a small range of values.

Relatively Uniform

When we say that a set of values is relatively uniform, we mean that the values are similar to one another, so the dataset is homogeneous. In this context, uniformity refers to how alike the values are, not to every category or bin of a histogram having the same frequency.

Connection between Low Dispersion and Relatively Uniform

Now the connection between low dispersion and relative uniformity becomes evident. If a set of values has low dispersion, the values are closely clustered and differ little from one another, which is what relatively uniform means. Zero dispersion would mean that all the values are identical (perfectly uniform rather than relatively uniform), high dispersion means the values are widely scattered, and measures such as the range or standard deviation cannot be negative.

Answer: Option C - Low Dispersion

Therefore, a set of values is said to be relatively uniform if it has low dispersion. This implies that the values lie close to one another, with little variation or spread among them.

The measures of dispersion are changed by the change of__________.
  • a)
    Scale
  • b)
    Origin
  • c)
    Unit
  • d)
    None of these
Correct answer is option 'A'. Can you explain this answer?

Utkarsh Joshi answered
Measures of dispersion are statistical indicators that quantify the spread or variability of a dataset. They provide information about how the values in a dataset are dispersed or scattered around a central value, such as the mean or median.
When we talk about changing the scale, we refer to altering the magnitude or size of the values in the dataset. This can be achieved by multiplying or dividing the values by a constant factor.
When the scale of the data changes, the measures of dispersion will also change. Let's consider some common measures of dispersion:
  1. Range: The range is the simplest measure of dispersion and is calculated as the difference between the maximum and minimum values in a dataset. Multiplying or dividing the data by a constant changes the range by the same factor, since both the maximum and the minimum are rescaled.
  2. Variance and Standard Deviation: These measures quantify how far the data points lie from the mean, using the squared differences between each data point and the mean. Multiplying every value by a constant c multiplies the standard deviation by |c| and the variance by c².
  3. Interquartile Range (IQR): The IQR is the difference between the 75th and 25th percentiles of the dataset. Changing the scale rescales both percentiles, so the IQR changes by the same factor.
In all these cases, changing the scale of the data results in corresponding changes in the measures of dispersion. By contrast, a change of origin (adding or subtracting a constant) shifts every value by the same amount, so the differences between values, and hence these measures, stay the same.
Therefore, the measures of dispersion are changed by the change of scale.
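The difference between a change of origin and a change of scale can be seen on a small dataset. The sketch below uses hypothetical values, a shift of 100 and a scale factor of 3, all chosen only for illustration, and the population versions of variance and standard deviation from Python's `statistics` module.

```python
from statistics import pstdev, pvariance

data = [2, 4, 6, 8, 10]            # hypothetical values
shifted = [x + 100 for x in data]  # change of origin: add a constant
scaled = [x * 3 for x in data]     # change of scale: multiply by a constant

def value_range(values):
    """Difference between the largest and smallest value."""
    return max(values) - min(values)

for name, values in [("original", data), ("shifted", shifted), ("scaled", scaled)]:
    print(name, value_range(values), pvariance(values), pstdev(values))
```

The shifted data gives the same range, variance and standard deviation as the original, while the scaled data gives a range and standard deviation three times as large and a variance nine times as large.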

The extreme values in negatively skewed distribution lie in the_____.
  • a)
    Middle
  • b)
    Right Tail
  • c)
    Left Tail
  • d)
    Whole Curve
Correct answer is option 'C'. Can you explain this answer?

Niharika Shah answered
Negatively Skewed Distribution
In statistics, a negatively skewed distribution, also known as a left-skewed distribution, is a type of distribution where the tail on the left side of the distribution is longer or fatter than the tail on the right side. This means that the majority of the data points are concentrated towards the right side of the distribution, while the extreme values are located in the left tail.

Extreme Values
Extreme values, also known as outliers, are observations that are significantly different from the other values in a dataset. In a negatively skewed distribution, the extreme values are located in the left tail of the distribution.

Explanation
To understand why the extreme values in a negatively skewed distribution lie in the left tail, let's consider a hypothetical example. Suppose we have a dataset of exam scores ranging from 0 to 100, where most students scored between 70 and 90, but a few students scored very low (e.g., 20 or 30).

In this scenario, the distribution of exam scores would be negatively skewed because the tail on the left side (representing low scores) would be longer or fatter than the tail on the right side. The majority of students would fall within the range of 70 to 90, which is towards the right side of the distribution. However, the few students who scored very low (the extreme values) would be located in the left tail of the distribution.

The reason for this lies in the definition of skewness. Skewness measures the asymmetry of a distribution. In a negatively skewed distribution, the mean is less than the median, indicating that the tail on the left side is longer. This means that there are more extreme values in the left tail than in the right tail.

Therefore, the extreme values in a negatively skewed distribution lie in the left tail because the tail on the left side is longer or fatter, indicating a higher concentration of extreme values in that region.
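The exam-score scenario can be sketched numerically. The scores below are hypothetical, picked to match the description (most between 70 and 90, two very low), and the skewness is the simple moment-based coefficient, which is only one of several definitions in use.

```python
from statistics import mean, median

# Hypothetical exam scores: most between 70 and 90, two very low scores.
scores = [20, 30, 70, 75, 80, 80, 85, 85, 90, 90]

m = mean(scores)
med = median(scores)

# Moment-based skewness: average cubed deviation divided by the cubed
# population standard deviation. A negative value indicates a longer left tail.
n = len(scores)
variance = sum((x - m) ** 2 for x in scores) / n
skewness = (sum((x - m) ** 3 for x in scores) / n) / variance ** 1.5

print(m, med, skewness)
```

Here the two low scores drag the mean below the median and make the skewness coefficient negative, which is consistent with the extreme values sitting in the left tail.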

Which of the following Measure of Averages is not based on all the values given in the data set___________
  • a)
    Arithmetic Mean
  • b)
    Geometric Mean
  • c)
    Median
  • d)
    Mode
Correct answer is option 'C'. Can you explain this answer?

Kavita Shah answered
The mode is the value or values that occur most frequently in the data set. It represents the most common observation(s) or the peak of the distribution.
Unlike the arithmetic mean, geometric mean, and median, the mode does not take into account all the values in the data set. Instead, it focuses solely on identifying the value(s) with the highest frequency.
For example, consider the following data set: 2, 4, 4, 6, 6, 6, 8, 8, 8. In this case, the modes are 6 and 8, because each occurs three times, which is more frequently than any other value. The mode is determined by counting the occurrences of each value, rather than considering the entire range of values.
On the other hand:
  • The arithmetic mean is calculated by summing all the values in the data set and dividing by the total number of values. It incorporates all the values in the calculation.
  • The geometric mean is calculated by taking the nth root of the product of n values. It also considers all the values in the data set.
  • The median represents the middle value when the data set is arranged in ascending or descending order. It uses the ordering of all the values to identify the middle observation(s).
Therefore, among the options given, the measure of average that is not based on all the values given in the data set is the mode. It focuses on identifying the most frequently occurring value(s) rather than considering all the values in the data set.

The distribution in which mean = 60 and mode = 50, will be ________
  • a)
    Symmetrical
  • b)
    Positive skewed
  • c)
    Negative skewed
  • d)
    None of these
Correct answer is option 'B'. Can you explain this answer?

Explanation:

To understand why the given distribution is positively skewed, let's first discuss what skewness is.

Skewness:
Skewness is a measure of the asymmetry of a probability distribution. It tells us whether the data is concentrated more on one side of the distribution or the other. Skewness can be positive, negative, or zero.

Positive Skewness:
A distribution is positively skewed when the tail on the right side of the distribution is longer or fatter than the left side. In other words, the mean is greater than the median and mode.

Mean, Median, and Mode:
The mean, median, and mode are measures of central tendency.

- Mean: The mean is the average of all the values in the distribution. It is calculated by summing up all the values and dividing by the total number of values.
- Median: The median is the middle value of the distribution when the data is arranged in ascending or descending order. It divides the data into two equal halves.
- Mode: The mode is the value that appears most frequently in the distribution.

Given Information:
- Mean = 60
- Mode = 50

Analysis:
In a positively skewed distribution, the mean is greater than the median and mode. Since the given mean is 60 and the mode is 50, we can conclude that the distribution is positively skewed.

Example:
Let's consider an example to understand this better. Assume we have the following dataset: 50, 50, 50, 60, 70, 80.

- Mean: (50 + 50 + 50 + 60 + 70 + 80) / 6 = 360 / 6 = 60
- Median: 55 (average of the two middle values, 50 and 60)
- Mode: 50 (most frequently occurring value)

In this example, the mean is 60, which is greater than the median (55) and mode (50). Therefore, the distribution is positively skewed.

Conclusion:
Based on the given information, the distribution in which the mean is 60 and the mode is 50 is positively skewed.
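The example dataset can be checked with Python's `statistics` module. This is a small sketch; it relies on the rule of thumb used above (mode < median < mean suggests positive skew), which is a common guideline rather than a strict theorem.

```python
from statistics import mean, median, mode

data = [50, 50, 50, 60, 70, 80]

data_mean = mean(data)
data_median = median(data)  # even count: average of the two middle values
data_mode = mode(data)

print(data_mean, data_median, data_mode)  # 60 55.0 50

# Rule of thumb from the explanation: mode < median < mean.
print(data_mode < data_median < data_mean)  # True
```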

Which of the following Measure of averages is affected by extreme (very small or very large) values in the data set?
  • a)
    Geometric Mean
  • b)
    Median
  • c)
    Arithmetic Mean
  • d)
    Harmonic Mean
Correct answer is option 'C'. Can you explain this answer?

The arithmetic mean, also known as the mean, is calculated by summing all the values in the data set and dividing by the total number of values. It represents the balance point or center of the data.
Extreme values in the data set can have a significant impact on the arithmetic mean because they contribute to the overall sum. If there are extreme values that are very small or very large, they can pull the mean towards those extreme values.
For example, consider the following data set: 1, 2, 3, 4, 1000. The arithmetic mean of this data set is (1 + 2 + 3 + 4 + 1000) / 5 = 202. If we remove the extreme value of 1000, the mean becomes (1 + 2 + 3 + 4) / 4 = 2.5. The presence of the extreme value significantly affects the arithmetic mean.
On the other hand, the geometric mean, median, and harmonic mean are less influenced by extreme values.
  • The geometric mean is calculated by taking the nth root of the product of n values. Since extreme values contribute to the product rather than the sum, their effect is mitigated.
  • The median represents the middle value when the data set is arranged in ascending or descending order. Extreme values do not impact the position of the middle value, making the median less affected by them.
  • The harmonic mean is calculated by taking the reciprocal of each value, finding their arithmetic mean, and then taking the reciprocal of that result. Very large values have a smaller influence on the harmonic mean due to the reciprocal operations involved, although very small values pull it down noticeably.
In summary, the measure of average that is affected by extreme values in the data set is the arithmetic mean. Extreme values can significantly alter the mean due to their contribution to the overall sum.
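The comparison can be reproduced on the dataset from the example. The sketch below assumes Python 3.8 or later, where the `statistics` module provides `geometric_mean` and `harmonic_mean`, and prints each average without and with the value 1000.

```python
from statistics import geometric_mean, harmonic_mean, mean, median

with_outlier = [1, 2, 3, 4, 1000]
without_outlier = [1, 2, 3, 4]

averages = [
    ("arithmetic", mean),
    ("median", median),
    ("geometric", geometric_mean),
    ("harmonic", harmonic_mean),
]

# How far each average moves when the outlier is added.
shifts = {}
for name, func in averages:
    before = func(without_outlier)
    after = func(with_outlier)
    shifts[name] = after - before
    print(name, before, after)
```

The arithmetic mean jumps from 2.5 to 202, while the other three averages move by a much smaller amount on this data.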

In a symmetrical distribution, mean is ____________ mode.
  • a)
    Equal to
  • b)
    Less than
  • c)
    Greater than
  • d)
    Not equal to
Correct answer is option 'A'. Can you explain this answer?

Deepak Kapoor answered
In a symmetrical distribution, the data points are evenly distributed around a central value, resulting in a mirror image when the distribution is folded along its center. This means that the left and right sides of the distribution are symmetrically balanced.
Given this symmetry, the mean, median, and mode will all have the same value in a perfectly symmetrical distribution.
The mean represents the average value of the dataset, calculated by summing all the values and dividing by the total number of values. The mode represents the most frequently occurring value in the dataset.
Since a symmetrical distribution has equal frequencies on both sides of the mode, the mode will be the value that occurs most often and therefore represents the highest peak in the distribution.
In a symmetrical distribution, the balance of the data on both sides of the mode implies that the mean will be the same as the mode.
Therefore, in a symmetrical distribution, the mean is equal to the mode.

Which among the following are the measures of Central Tendency or Measures of Location?
A. Mean
B. Range
C. Mode
D. Median
E. Variance
Choose the most appropriate answer from the options given below:
  • a)
    A, B, C and E only
  • b)
    C, D and E only
  • c)
    A, C and D only
  • d)
    B, C and D only
Correct answer is option 'C'. Can you explain this answer?

Ojasvi Mehta answered
A. Mean: The mean is a measure of central tendency that represents the average value of a set of data. It is calculated by summing all the values in the dataset and dividing by the number of observations.
C. Mode: The mode is a measure of central tendency that represents the most frequently occurring value in a dataset. It is the value that appears with the highest frequency.
D. Median: The median is a measure of central tendency that represents the middle value in a dataset when it is arranged in ascending or descending order. It divides the dataset into two equal halves.
B. Range: The range is not a measure of central tendency. It is a measure of dispersion that represents the difference between the maximum and minimum values in a dataset. It provides information about the spread of the data but does not give insight into the central tendency.
E. Variance: The variance is not a measure of central tendency. It is a measure of dispersion that quantifies the spread of data points around the mean. It provides information about the variability of the dataset but does not directly represent the central tendency.
Therefore, the measures of central tendency or measures of location are A. Mean, C. Mode, and D. Median.

Which of the following is not a value of central tendency?
  • a)
    Mode
  • b)
    Median
  • c)
    Mean
  • d)
    Standard deviation
Correct answer is option 'D'. Can you explain this answer?

Suresh Reddy answered
  • The mode is the most frequently occurring value.
  • The median is the middle value when the data is arranged in ascending or descending order.
  • The mean is the average value, calculated by summing all values and dividing by the number of values.
  • Mode, median, and mean are all measures of central tendency.
  • The standard deviation is not a value of central tendency; it is a measure of dispersion or variability in a dataset.
∴ Standard deviation is the required answer.

The most repeated (popular) value in a data set is called_______.
  • a)
    Median
  • b)
    Mean
  • c)
    Mode
  • d)
    Geometric Mean
Correct answer is option 'C'. Can you explain this answer?

Kavita Shah answered
The mode is a measure of central tendency that represents the value or values in a data set that occur most frequently. It is the observation(s) with the highest frequency.
In other words, the mode represents the most popular or commonly occurring value in the data set. It is the value that appears more often than any other value.
For example, consider the following data set: 3, 5, 5, 7, 7, 7, 9, 9, 9. In this case, the modes are 7 and 9, because each occurs three times, which is the highest frequency. Both 7 and 9 are the most repeated values in the data set.
The mode is particularly useful when you want to identify the value(s) that have the highest occurrence or when you are interested in the most typical observation in the data set.
On the other hand:
  • The median represents the middle value when the data set is arranged in ascending or descending order.
  • The mean, also known as the arithmetic mean, is calculated by summing all the values and dividing by the total number of values.
  • The geometric mean is calculated by taking the nth root of the product of n positive values.
Therefore, among the options given, the most repeated (popular) value in a data set is called the mode. It represents the value(s) that occur with the highest frequency in the data set.
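Finding the mode amounts to counting how often each value occurs. A minimal sketch on the data set from the example, using `Counter` for the counts and `statistics.multimode` (available from Python 3.8) to return every value that shares the highest frequency:

```python
from collections import Counter
from statistics import multimode

data = [3, 5, 5, 7, 7, 7, 9, 9, 9]

counts = Counter(data)   # frequency of each distinct value
modes = multimode(data)  # all values tied for the highest frequency

print(counts)
print(modes)  # [7, 9]
```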

The values of mean, median and mode can be________.
  • a)
    Sometimes equal
  • b)
    Never equal
  • c)
    Always equal
  • d)
    None of these
Correct answer is option 'A'. Can you explain this answer?

Arun Khatri answered
The mean, median, and mode are three measures of central tendency that provide insights into the location or center of a dataset. While they can be equal in certain situations, it is not a universal rule that they will always be equal.
In some distributions, the mean, median, and mode may have the same value. This occurs in perfectly symmetrical distributions, such as the normal distribution, where the data points are evenly distributed around a central value.
However, in many distributions, the mean, median, and mode can have different values. This is especially true in distributions that are skewed or have multiple modes.
For example, in a positively skewed distribution, the mean will be greater than the median, and both of these may differ from the mode. Similarly, in a negatively skewed distribution, the mean will be less than the median, and the mode may be different as well.
It's important to consider the shape and characteristics of the specific distribution when determining the relationship between the mean, median, and mode. While they can be equal in some cases, it is not a guaranteed or universal outcome.
Therefore, the values of mean, median, and mode can sometimes be equal, but this is not always the case.

The middle value of an ordered array of numbers is the________.
  • a)
    Mode
  • b)
    Mean
  • c)
    Median
  • d)
    Mid-Point
Correct answer is option 'C'. Can you explain this answer?

Arun Khatri answered
The median is a measure of central tendency that represents the middle value in a dataset when the values are arranged in ascending or descending order. It divides the dataset into two equal halves, with an equal number of values above and below it.
To find the median in an ordered array of numbers, you simply identify the value located at the center position. If the total number of values in the array is odd, there will be one middle value that is the median. If the total number of values is even, the median is typically calculated as the average of the two middle values.
For example, consider the ordered array [2, 4, 7, 9, 12]. The middle value is 7, so 7 would be the median of this array.
The median is particularly useful when dealing with skewed distributions or datasets with outliers. Unlike the mean, which is influenced by extreme values, the median provides a robust measure of central tendency that is less affected by outliers.
In summary, the middle value of an ordered array of numbers is the median. It represents the central value that divides the dataset into two equal halves when the values are arranged in ascending or descending order.
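Both cases, an odd and an even number of values, can be sketched with `statistics.median`. The odd-length array is the one from the example; the even-length array is a hypothetical extension with one extra value appended.

```python
from statistics import median

odd_array = [2, 4, 7, 9, 12]        # the array from the example
even_array = [2, 4, 7, 9, 12, 15]   # hypothetical: one extra value appended

odd_median = median(odd_array)      # single middle value
even_median = median(even_array)    # average of the two middle values, 7 and 9

print(odd_median)   # 7
print(even_median)  # 8.0
```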

The Geometric Mean of -2, 4, 03, 6, 0 will be__________.
  • a)
    -3
  • b)
    0
  • c)
    Cannot be Computed
  • d)
    None of these
Correct answer is option 'C'. Can you explain this answer?

Kavita Shah answered
The geometric mean is calculated by taking the nth root of the product of n positive values. However, it is important to note that the geometric mean is only defined for positive values. It cannot be calculated when negative values or zero are present in the dataset.
In the given values -2, 4, 03, 6, 0, we have a negative value (-2) and a zero (0). Since the geometric mean cannot be computed with negative values or zero, we cannot find the geometric mean for this dataset.
Therefore, the geometric mean of -2, 4, 03, 6, 0 cannot be computed.
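The restriction to positive values can be sketched in code. The `geometric_mean` helper below is a hypothetical illustration, not a library function; it averages logarithms, which is why zero and negative inputs are rejected. The failing call uses only the negative value, the zero and two of the positive values from the question.

```python
import math

def geometric_mean(values):
    """nth root of the product of n strictly positive values."""
    if not values:
        raise ValueError("empty dataset")
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean requires strictly positive values")
    # Average the logarithms, then exponentiate.
    return math.exp(sum(math.log(v) for v in values) / len(values))

print(geometric_mean([2, 8]))  # approximately 4.0

try:
    geometric_mean([-2, 4, 6, 0])
except ValueError as err:
    print("cannot be computed:", err)
```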

If mean, median, and mode are all equal then distribution will be________
  • a)
    Positive Skewed
  • b)
    Negative Skewed
  • c)
    Symmetrical
  • d)
    None of these
Correct answer is option 'C'. Can you explain this answer?

Arun Khatri answered
In a symmetrical distribution, the data points are evenly distributed around a central value, resulting in a mirror image when the distribution is folded along its center. This means that the left and right sides of the distribution are symmetrically balanced.
When the mean, median, and mode are equal, it indicates that the distribution is perfectly balanced and there is no skewness or bias towards either side. Each side of the distribution has an equal number of data points, resulting in a symmetrical shape.
The normal (Gaussian) distribution is a well-known example of a symmetrical distribution. It follows a characteristic bell-shaped curve, where the mean, median, and mode are all located at the center of the distribution.
In summary, if the mean, median, and mode are all equal in a distribution, it indicates that the distribution is symmetrical. The balance of the data points on both sides of the distribution implies a lack of skewness or bias towards either side.

________ is the measure of average which can have more than one value.
  • a)
    Mean
  • b)
    Median
  • c)
    Harmonic Mean
  • d)
    Mode
Correct answer is option 'D'. Can you explain this answer?

Arun Khatri answered
The mode is the value or values in a dataset that occur most frequently. In some cases, there may be multiple values with the same highest frequency, resulting in multiple modes. When this occurs, the dataset is described as having multiple modes or being multimodal.
For example, consider a dataset of exam scores: 75, 80, 85, 90, 90, 95, 95, 95. In this dataset, the value 95 occurs three times, which is the highest frequency, so the dataset has a single mode of 95. If one more score of 90 were added, 90 and 95 would each occur three times and the dataset would have two modes.
However, in some cases, a dataset may not have any repeated values, or all values may have the same frequency. In such cases, the dataset is considered to have no mode.
On the other hand, the mean, median, and harmonic mean are measures of central tendency that typically yield a single value. The mean is the average calculated by summing all the values and dividing by the total number of values. The median is the middle value when the dataset is arranged in ascending or descending order. The harmonic mean is a type of average used for rates or ratios.
Therefore, the measure of average that can have more than one value is the mode.

In symmetrical distribution, mean, median, and mode are__________
  • a)
    Equal
  • b)
    Different
  • c)
    Zero
  • d)
    None of these
Correct answer is option 'A'. Can you explain this answer?

Deepak Kapoor answered
In a perfectly symmetrical distribution, the data points are evenly distributed around a central value, resulting in a mirror image when the distribution is folded along its center. This means that the left and right sides of the distribution are symmetrically balanced.
Given this symmetry, the mean, median, and mode will all have the same value in a perfectly symmetrical distribution.
The mean represents the average value of the dataset, calculated by summing all the values and dividing by the total number of values. The median represents the middle value when the dataset is arranged in ascending or descending order. The mode represents the value or values that occur most frequently in the dataset.
In a perfectly symmetrical distribution, the balance of the data on both sides of the mode implies that the mean and median will be located at the center of the distribution, which is also where the mode will be located.
Therefore, in a symmetrical distribution, the mean, median, and mode are equal. They all have the same value, reflecting the central tendency and balance of the data points in the distribution.
