MEASURES OF CENTRAL TENDENCY
Central tendency refers to the middle point of a distribution, and measures of central tendency describe a data set in terms of its central location. The three most important measures are the mean (average), the median and the mode. They can be calculated for both ungrouped and grouped data, and the formulae differ accordingly. Please refer to the slides for the relevant formulae.
Ungrouped data
Ungrouped data is raw data that has not been organised into groups and consists of a list of numbers. Examples include daily prices of stocks listed on stock markets like the Bombay Stock Exchange (BSE) or National Stock Exchange (NSE), monthly values of the Wholesale Price Index (WPI), monthly wages of workers, etc.
Example
The marks of seven students in an economics test out of a total of 20 marks are:
Student Roll No. | Marks in Economics (out of 20) |
1 | 19 |
2 | 18 |
3 | 15 |
4 | 13 |
5 | 17 |
6 | 12 |
7 | 11 |
Grouped data
Data that has been organised into groups as a frequency distribution is grouped data. For example, a large set of ungrouped monthly WPI values (e.g. 1000 months) can be grouped into class intervals of the index.
Example:
The frequency distribution of marks obtained by 200 students in an economics test out of 20 marks:
Marks Interval | Number of Students (Frequency) |
1 – 5 | 59 |
6 – 10 | 39 |
11 – 15 | 42 |
16 – 20 | 60 |
Arithmetic Mean (or Average)
Arithmetic mean is the average of a numerical set and is found by dividing the sum of the values by the number of values in the set. The data set can be either ungrouped or grouped.
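As a minimal sketch, both cases can be computed in Python for the two example tables above. For grouped data the sketch assumes the common class-midpoint approximation, in which every class is represented by the midpoint of its interval; the slides may present the formula differently. The function names are illustrative only.

```python
def ungrouped_mean(values):
    """Sum of the observations divided by the number of observations."""
    return sum(values) / len(values)


def grouped_mean(classes):
    """Approximate mean of a frequency distribution.

    `classes` is a list of (lower, upper, frequency) tuples. Each class is
    represented by its midpoint (an assumption of this sketch).
    """
    total_frequency = sum(freq for _, _, freq in classes)
    weighted_sum = sum((lower + upper) / 2 * freq for lower, upper, freq in classes)
    return weighted_sum / total_frequency


# Ungrouped example: marks of seven students
marks = [19, 18, 15, 13, 17, 12, 11]
# Grouped example: (lower limit, upper limit, number of students)
marks_distribution = [(1, 5, 59), (6, 10, 39), (11, 15, 42), (16, 20, 60)]

print(ungrouped_mean(marks))             # 15.0
print(grouped_mean(marks_distribution))  # 10.575
```

The grouped result is only an approximation, because the individual marks inside each interval are not known.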
Median
The median is the value that divides a numerical set into two equal halves, so that as many values lie above it as below it. Before calculating the median of ungrouped data, the data should be arranged in ascending order.
Mode
The value of a numerical set that appears with the greatest frequency is known as the mode.
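The median and mode of ungrouped data can be sketched with Python's standard `statistics` module. The marks are those of the seven students above; the list `repeated_marks` is a hypothetical data set introduced only to show a mode, since no mark is repeated in the original example.

```python
import statistics

marks = [19, 18, 15, 13, 17, 12, 11]

# statistics.median sorts internally; sorting here only makes the middle value visible.
print(sorted(marks))             # [11, 12, 13, 15, 17, 18, 19]
print(statistics.median(marks))  # 15

# With an even number of observations the median is the average of the two middle values.
print(statistics.median([11, 12, 13, 15, 17, 18]))  # 14.0

# Hypothetical marks with one repeated value, to illustrate the mode.
repeated_marks = [12, 15, 15, 17, 18]
print(statistics.multimode(repeated_marks))  # [15]

# Every mark in the original example occurs exactly once,
# so multimode returns all seven values and no single mode stands out.
print(len(statistics.multimode(marks)))  # 7
```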
Relationship between Mean-Median-Mode
When the mean, median and mode derived from a data set coincide (mean = median = mode), the distribution of the data is taken to be symmetric. Symmetric data is evenly balanced on either side of the centre. A qualitative picture of symmetry is one's reflection in a mirror, which displays an exact image of how one looks, or the continuous rotation of a giant wheel. Similarly, in statistics the left half of a symmetric distribution is the mirror image of the right half.
The relationship between the mean, median and mode can also be used to determine whether the data is asymmetric (skewed). When the mean is greater than the median, which in turn is greater than the mode, the distribution is positively skewed: its longer tail points in the positive direction (i.e. to the right). For example, if a test was difficult and almost everyone in the class performed poorly, the resulting distribution would most likely be positively skewed.
When the mean is less than the median, which in turn is less than the mode, the distribution is negatively skewed: its longer tail points in the negative direction (i.e. to the left). For example, if most students performed well in an essay test while very few performed poorly, the distribution would be negatively skewed.
For a moderately skewed distribution, the empirical relationship between mean, median and mode is: Mean – Mode = 3(Mean – Median)
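The empirical relationship can be rearranged to estimate any one of the three measures from the other two. The sketch below does this in Python; the function names and the numbers are hypothetical, and the result is only an approximation that applies to moderately skewed distributions.

```python
def mean_from_median_mode(median, mode):
    # Mean - Mode = 3(Mean - Median)  =>  Mean = (3*Median - Mode) / 2
    return (3 * median - mode) / 2


def mode_from_mean_median(mean, median):
    # Mean - Mode = 3(Mean - Median)  =>  Mode = 3*Median - 2*Mean
    return 3 * median - 2 * mean


# Hypothetical values chosen only for illustration
median, mode = 50, 44
mean = mean_from_median_mode(median, mode)

print(mean)                                 # 53.0
print(mean - mode == 3 * (mean - median))   # True
print(mode_from_mean_median(mean, median))  # 44.0
```

Here mean > median > mode, so the hypothetical distribution would be described as positively skewed.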
MEASURES OF DISPERSION
Dispersion refers to the variation within a data set. Accordingly, measures of dispersion indicate how much the values in a data set vary or differ from one another. Calculating them generally starts from the measures of central tendency, and the tools considered here are the range, interquartile range, variance and standard deviation.
Range
The quickest measure of dispersion is the range, which is calculated as the difference between the maximum (highest) and minimum (lowest) values in a data set. The range, however, ignores how the remaining values are distributed and can therefore give a distorted or incomplete view of the data.
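A short Python sketch of the range, using the marks of the seven students from the ungrouped example. The two extra lists are hypothetical and are included only to show that data sets with very different spreads can share the same range.

```python
def value_range(values):
    """Difference between the highest and lowest values."""
    return max(values) - min(values)


marks = [19, 18, 15, 13, 17, 12, 11]
print(value_range(marks))  # 8

# Hypothetical data sets: same extremes, different behaviour in between
clustered = [11, 15, 15, 15, 15, 15, 19]
spread_out = [11, 12, 14, 15, 16, 18, 19]
print(value_range(clustered), value_range(spread_out))  # 8 8
```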
Interquartile range
Interquartile range is an extension of the range that considers quartiles within a data set. Quartiles are the three points that divide an ordered data set into four equal parts: the first quartile Q_{1}, below which the initial 25% of the data lie; the second quartile Q_{2} (the median), below which the initial 50% lie; and the third quartile Q_{3}, below which the initial 75% lie. The interquartile range is the difference between Q_{3} and Q_{1}. It summarizes the spread of the middle 50% of the values around the median. However, like the range, it provides incomplete information about the data.
Example (with even number of observations):
Data set: 59, 60, 64, 67, 68, 69, 70, 71, 72, 73
The data is already in ascending order.
Step 1: Split the data into 2 parts: 1st part is between 59 and 68, i.e. 59, 60, 64, 67, 68 and 2nd part is between 69 and 73, i.e. 69, 70, 71, 72, 73
Step 2: Identify the mid-point in the 1st part which is 64 and the mid-point in the 2nd part which is 71.
Step 3: Find out Q_{1} = 64 and Q_{3} = 71, so the interquartile range is Q_{3} - Q_{1} = 71 - 64 = 7
Example (with odd number of observations):
Data set: 6, 47, 49, 15, 42, 41, 7, 39, 43, 40, 36
Data in ascending order: 6, 7, 15, 36, 39, 40, 41, 42, 43, 47, 49
Step 1: Find out the median = 40
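Step 2: Find out Q_{1} = 25.5 [the mid-point of the lower half, from the 1^{st} observation 6 up to and including the median 40, i.e. 6, 7, 15, 36, 39, 40, is (15+36)/2]
Step 3: Find out Q_{3} = 42.5 [the mid-point of the upper half, from the median 40 up to the last observation 49, i.e. 40, 41, 42, 43, 47, 49, is (42+43)/2]
The interquartile range is therefore Q_{3} - Q_{1} = 42.5 - 25.5 = 17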
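The two worked examples can be reproduced with a small Python sketch. It assumes the "halves" method that the examples appear to follow: with an even number of observations the data is split into two equal parts, and with an odd number the median is counted in both parts. Other quartile conventions exist and can give slightly different values; the function name is illustrative.

```python
import statistics


def quartiles(values):
    """Return (Q1, Q2, Q3) using the halves method of the worked examples.

    For an odd number of observations the median is included in both halves
    (an assumption inferred from the odd-numbered example).
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    if n % 2 == 0:
        lower, upper = ordered[:half], ordered[half:]
    else:
        lower, upper = ordered[:half + 1], ordered[half:]
    return (statistics.median(lower),
            statistics.median(ordered),
            statistics.median(upper))


even_data = [59, 60, 64, 67, 68, 69, 70, 71, 72, 73]
odd_data = [6, 47, 49, 15, 42, 41, 7, 39, 43, 40, 36]

q1, q2, q3 = quartiles(even_data)
print(q1, q3, q3 - q1)  # 64 71 7

q1, q2, q3 = quartiles(odd_data)
print(q1, q2, q3, q3 - q1)  # 25.5 40 42.5 17.0
```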
Variance and Standard Deviation
The variance and standard deviation describe how far from or close to the mean (or average) the observations of a data set lie. Variance is the average of the squared deviations of the data points from their mean value. Standard deviation is the square root of the variance; it is expressed in the same units as the data and indicates the typical extent of deviation from the average of the data set. For example, if the average wage earned by a group of 100 workers is Rs 20000 per month and the standard deviation is Rs 5000, then the workers' wages typically lie about Rs 5000 above or below the average. The standard deviation in this example measures the level of disparity in wages among the 100 workers.
To determine how far an individual worker's wage deviates from the average, we calculate the standard score, which is the difference between that worker's wage and the average wage across all workers, divided by the standard deviation. For example, if a worker's wage is Rs 17000, the standard score is (17000 - 20000)/5000 = -0.6, which indicates that the wage lies 0.6 standard deviations below the mean: (-0.6) multiplied by the standard deviation of Rs 5000 equals minus Rs 3000, i.e. Rs 3000 less than the average.
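The calculations described above can be sketched in Python. The sketch assumes the population form of the variance (dividing by the number of observations N); the slides may instead use the sample form with N - 1. The marks are those of the seven students from the ungrouped example, and the function names are illustrative.

```python
import math


def population_variance(values):
    """Average of the squared deviations from the mean (divisor N)."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / len(values)


def standard_score(value, mean, std_dev):
    """Number of standard deviations by which `value` differs from `mean`."""
    return (value - mean) / std_dev


marks = [19, 18, 15, 13, 17, 12, 11]   # mean is 15
variance = population_variance(marks)  # squared deviations sum to 58, so 58 / 7
std_dev = math.sqrt(variance)
print(round(variance, 2), round(std_dev, 2))  # 8.29 2.88

# Wage example from the text: mean Rs 20000, standard deviation Rs 5000
z = standard_score(17000, 20000, 5000)
print(z)         # -0.6
print(z * 5000)  # -3000.0
```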
Chebyshev’s Theorem
Please refer to the slides for the explanation
Coefficient of Variation (CV)
Coefficient of variation is a relative measure used to compare the variability of two data sets that have different means and standard deviations. It is calculated as the standard deviation divided by the mean, multiplied by 100. Thus, CV measures the amount of variation in data groups that have different means. Suppose a teacher wishes to evaluate the relative variation in marks (out of 100 marks) in the "Business Environment" subject for two classes of students, Class A and Class B. Class A's average marks are 40 and the standard deviation is 5, whereas Class B's average marks are 70 and the standard deviation is 7.
Coefficient of Variation for Class A = (5/40)*100 = 12.5%
Coefficient of Variation for Class B = (7/70)*100 = 10.0%
Class B has less relative variation in marks than Class A because its coefficient of variation (10.0%) is lower than that of Class A (12.5%), even though its standard deviation is higher in absolute terms.
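The comparison of the two classes can be written as a short Python sketch; the function name is illustrative.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation as a percentage of the mean."""
    return 100 * std_dev / mean


cv_a = coefficient_of_variation(5, 40)  # Class A
cv_b = coefficient_of_variation(7, 70)  # Class B

print(cv_a)  # 12.5
print(cv_b)  # 10.0
print("Class B" if cv_b < cv_a else "Class A", "has less relative variation")
```

The class with the smaller coefficient of variation is the one whose marks vary less relative to its own average.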
Problems related to measures of central tendency and measures of dispersion
1. Mr. X purchased equity shares of a company in 4 successive months as given below. Find the average price per share.
Month | No. of Shares | Price per Share (in Rs) |
Dec-91 | 100 | 200 |
Jan-92 | 150 | 250 |
Feb-92 | 200 | 280 |
Mar-92 | 125 | 300 |
2. The frequency distribution of weights (in grams) of mangoes of a particular variety is given below. Calculate the arithmetic mean.
Weights (in grams) | Number of Mangoes |
410-419 | 14 |
420-429 | 20 |
430-439 | 42 |
440-449 | 52 |
450-459 | 45 |
460-469 | 18 |
470-479 | 7 |
3. The following table gives the daily profits (in Rs) of 195 shops of a town. Calculate mean, median and mode.
Profits Interval (in Rs) | Number of Shops (Frequency) |
50-60 | 15 |
60-70 | 20 |
70-80 | 32 |
80-90 | 35 |
90-100 | 33 |
100-110 | 22 |
110-120 | 20 |
120-130 | 10 |
130-140 | 8 |
4. For a moderately skewed distribution, the median price of men's shoes is Rs 380 and the modal price is Rs 350. Calculate the mean price of shoes.
5. Calculate the mean and standard deviation for the following series of workers' wages.
Weekly Wages | No. of Workers |
200-249 | 7 |
250-299 | 13 |
300-349 | 15 |
350-399 | 24 |
400-449 | 36 |
450-499 | 50 |
500-549 | 25 |
550-599 | 10 |
600-649 | 8 |
650-699 | 6 |
700-749 | 4 |
750-799 | 2 |
6. Bassart Electronics is considering employing one of two training programs. Two groups were trained for the same task: Group 1 was trained by Program A and Group 2 by Program B. For the first group, the times required to train the employees had an average of 32.11 hours and a variance of 68.09. In the second group, the average was 19.75 hours and the variance was 71.14. Which training program has less relative variability in its performance?