Table of contents
What is Skewness?
Assessing Skewness in Data
Types of Skewness
Measurement of Skewness
Interpreting Skewness
What is Skewness?
Skewness is a statistical measure that describes the asymmetry (lack of symmetry) of the probability distribution of a dataset. It quantifies how far the data deviate from a perfectly symmetrical distribution, such as the normal (bell-shaped) distribution. Skewness is useful because it reveals the shape of a dataset's distribution: knowing whether data are positively or negatively skewed matters in fields such as finance, economics, and data analysis, because it affects how the data are interpreted and which statistical techniques are appropriate.
Assessing Skewness in Data
To determine the skewness of a dataset, several statistical tests and methods can be used. These approaches help identify whether a dataset exhibits positive skewness, negative skewness, or is approximately symmetric. Here are some common techniques:
Visual Inspection: One of the simplest ways to evaluate skewness is to plot a histogram or density plot of the data. If the plot shows a tail extending to the left, the distribution is negatively skewed. Conversely, if the tail extends to the right, the distribution is positively skewed. A roughly symmetric plot suggests little or no skewness; note that symmetry alone does not mean the data are normally distributed.
Skewness Coefficient (Pearson's First Coefficient of Skewness): This numerical measure gauges skewness from the gap between the mean and the mode, scaled by the standard deviation σ. It is calculated using:
$$Sk = \frac{\text{Mean} - \text{Mode}}{\sigma}$$
Quartile Measure: Skewness can also be assessed by comparing the distances of the upper and lower quartiles from the median:
$$B = \frac{Q_3 + Q_1 - 2Q_2}{Q_3 - Q_1}$$
Where Q1, Q2 and Q3 represent the first quartile, the median and the third quartile, respectively. Positive values indicate positive skewness, while negative values indicate negative skewness.
Types of Skewness
Positive Skewness (Right Skew): In a positively skewed distribution, the right tail (larger values) is longer than the left tail (smaller values). Most data points are concentrated on the left, with some extreme values on the right. For positively skewed data:
Mean > Median > Mode
Examples include income distribution (where most people earn moderate incomes but a few earn very high incomes), scores on a difficult exam (where most students score in a low-to-moderate range but a few score exceptionally high), and long-run returns on individual stocks (where most stocks earn modest or even negative returns but a few earn very large ones).
Negative Skewness (Left Skew): In a negatively skewed distribution, the left tail (smaller values) is longer than the right tail (larger values). Most data points are clustered on the right, with a few extreme values on the left. For negatively skewed data:
Mean < Median < Mode
Examples include test scores on an easy exam (where most students score high but a few score very low), retirement age (where most people retire at a typical age but a few retire exceptionally early), and gestational age at birth (where most babies are born at full term but a few are born prematurely).
Zero Skewness (Symmetrical Distribution): A perfectly symmetrical distribution has zero skewness, and if it has a single peak, its mean, median, and mode are all equal. In such distributions, data points are spread evenly around the central value.
Example: The normal distribution, or a small dataset such as 2, 4, 4, 6, which mirrors itself around 4 (mean = median = mode = 4).
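As a quick check on the orderings above, the sketch below computes the three averages with Python's standard `statistics` module and reports which pattern they follow. The function name `mean_median_mode_order` and the small right-skewed and symmetric samples are illustrative assumptions (the exam scores come from the Karl Pearson example below), and the orderings are typical for single-peaked data rather than guaranteed.

```python
from statistics import mean, median, mode

def mean_median_mode_order(data):
    """Return (mean, median, mode, pattern) for a dataset.

    Assumes one clear mode: since Python 3.8, statistics.mode returns the
    first mode it meets when several values tie.
    """
    m, md, mo = mean(data), median(data), mode(data)
    if m > md > mo:
        pattern = "Mean > Median > Mode (positive skew)"
    elif m < md < mo:
        pattern = "Mean < Median < Mode (negative skew)"
    elif m == md == mo:  # exact comparison; fine for small integer samples
        pattern = "Mean = Median = Mode (symmetric)"
    else:
        pattern = "no clear ordering"
    return m, md, mo, pattern

samples = {
    "right-skewed sample": [1, 2, 2, 3, 4, 10],
    "exam scores": [85, 88, 92, 94, 96, 98, 100, 100, 100, 100],
    "symmetric sample": [2, 4, 4, 6],
}
for name, data in samples.items():
    print(name, "->", mean_median_mode_order(data)[3])
```

Comparing the averages is only a rough guide; the coefficients in the next section turn the same idea into a number.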
Measurement of Skewness
Karl Pearson's Measure of Skewness quantifies the asymmetry of a dataset's distribution using its mean, median or mode, and standard deviation. Its coefficient is a dimensionless number that shows both the direction and the degree of skewness in the data. It is used across statistics and data analysis, where the degree of skewness can influence later modeling and analytical decisions.
Formula for Karl Pearson’s Skewness
Skewness Calculation Using Mean and Mode:
Skewness = Mean − Mode
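Coefficient of Skewness:
Based on Mean and Median:
$$Sk = \frac{3(\text{Mean} - \text{Median})}{\sigma}$$
Based on Mean and Mode:
$$Sk = \frac{\text{Mean} - \text{Mode}}{\sigma}$$
Where:
Sk = Karl Pearson's coefficient of skewness
σ = standard deviation of the data
The mean-and-median form is preferred when the mode is ill-defined; because |Mean − Median| ≤ σ, it always lies between −3 and +3.
Interpretation of Karl Pearson's Skewness Coefficient:
Sk > 0: the distribution is positively (right) skewed.
Sk < 0: the distribution is negatively (left) skewed.
Sk = 0: the distribution is symmetric.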
Example of Karl Pearson’s Measure
Calculate Pearson’s skewness coefficient for a dataset of exam scores: 85, 88, 92, 94, 96, 98, 100, 100, 100, 100.
Sol:
Step 1: Calculation of Mean
Mean = (85 + 88 + 92 + 94 + 96 + 98 + 100 + 100 + 100 + 100)/10 = 953/10 = 95.3
Step 2: Calculation of Median
Since there are 10 data points, the median is the average of the 5th and 6th values when sorted in ascending order:
Median = (96 + 98)/2 = 97
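Step 3: Calculation of standard deviation
The squared deviations from the mean, (x − 95.3)², are 106.09, 53.29, 10.89, 1.69, 0.49, 7.29 and 22.09 (the last one four times, for the four scores of 100). Their sum is 268.10. Treating the ten scores as the whole population:
$$\sigma^2 = \frac{\sum (x - \bar{x})^2}{n} = \frac{268.10}{10} = 26.81$$
Thus, σ = √26.81 ≈ 5.18
Step 4: Calculation of mode
It is clear from the data set that 100 is the most frequently occurring value in the data. Hence, the mode of the given data is 100.
Step 5: Substitute the values in the formulae
A. With respect to Mean and Median
$$Sk = \frac{3(95.3 - 97)}{5.18} = \frac{-5.1}{5.18} \approx -0.98$$
B. With respect to Mean and Mode
$$Sk = \frac{95.3 - 100}{5.18} = \frac{-4.7}{5.18} \approx -0.91$$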
Interpretation: Both coefficients are negative, so the distribution of exam scores is negatively skewed, and with values close to −1 the skew is clear rather than slight. The distribution has a longer tail on the left (the low scores of 85 and 88), while most scores cluster on the right, above the mean.
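The worked example can be reproduced with Python's standard `statistics` module. This sketch assumes, as the example does, the population standard deviation (`statistics.pstdev`, dividing by n) and a single clear mode; the function name `pearson_coefficients` is illustrative. Keeping σ unrounded (≈ 5.18) gives coefficients of about −0.98 and −0.91; rounding σ to 5 before dividing gives −1.02 and −0.94.

```python
import statistics

def pearson_coefficients(data):
    """Return (sigma, mean-median coefficient, mean-mode coefficient).

    Uses the population standard deviation (divide by n) and assumes
    the data have one clear mode.
    """
    mu = statistics.mean(data)
    med = statistics.median(data)
    mo = statistics.mode(data)
    sigma = statistics.pstdev(data)        # population standard deviation
    sk_median = 3 * (mu - med) / sigma     # based on mean and median
    sk_mode = (mu - mo) / sigma            # based on mean and mode
    return sigma, sk_median, sk_mode

scores = [85, 88, 92, 94, 96, 98, 100, 100, 100, 100]
sigma, sk_median, sk_mode = pearson_coefficients(scores)
print(round(sigma, 2), round(sk_median, 2), round(sk_mode, 2))  # 5.18 -0.98 -0.91
```

If the scores were instead treated as a sample, `statistics.stdev` (dividing by n − 1) would give σ ≈ 5.46 and slightly smaller coefficients in magnitude.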
Bowley's Skewness Coefficient, named after the British statistician and economist Arthur Lyon Bowley, measures the skewness of a distribution using quartiles. Unlike measures that rely on moments or on deviations from the mean, Bowley's measure compares how far the upper and lower quartiles lie from the median, which makes it intuitive and resistant to extreme values. It is therefore useful for datasets that may not follow a normal distribution or when a robust measure of skewness is needed.
$$B = \frac{Q_3 + Q_1 - 2Q_2}{Q_3 - Q_1}$$
Where Q1 and Q3 are the first and third quartiles and Q2 is the median. Because the median lies between the two quartiles, B always falls between −1 and +1.
Example of Bowley’s Measure:
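Calculate Bowley's Measure of Skewness for the following dataset representing the ages of a group of people in a sample: 20, 24, 28, 32, 35, 40, 42, 45, 50.
Sol:
Step 1: Find the median (Q2)
The data are already in ascending order. With 9 values, the median is the 5th value:
Q2 = 35
Step 2: Calculate the first quartile (Q1)
To find Q1, consider the values to the left of the median: 20, 24, 28, 32.
Q1 = (24 + 28)/2 = 26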
Step 3: Calculate the third quartile (Q3)
To find Q3, consider the values to the right of the median: 40, 42, 45, 50.
Q3 = (42 + 45)/2 = 43.5
Step 4: Substitute the above values in the formula
$$B = \frac{43.5 + 26 - 2(35)}{43.5 - 26} = \frac{-0.5}{17.5} \approx -0.03$$
Interpretation: Since B is negative (B < 0), the distribution is negatively skewed (left-skewed): the distance from Q1 to the median (35 − 26 = 9 years) is slightly larger than the distance from the median to Q3 (43.5 − 35 = 8.5 years). Because B is so close to zero, however, the skew is very mild and the middle half of the data is nearly symmetric.
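The Bowley calculation can be sketched in Python as follows. The helper names `halves_quartiles` and `bowley` are hypothetical, and the quartile rule (take the median of each half, leaving the overall median out of both halves when n is odd) is assumed because it matches the worked example; other quartile conventions, including those in spreadsheet and statistics software, can give different Q1 and Q3.

```python
import statistics

def halves_quartiles(data):
    """Return (Q1, Q2, Q3) using the 'median of each half' rule.

    For an odd number of values the overall median is excluded from
    both halves, as in the worked example.
    """
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]
    upper = xs[half + 1:] if n % 2 else xs[half:]
    return statistics.median(lower), statistics.median(xs), statistics.median(upper)

def bowley(data):
    """Bowley's coefficient (Q3 + Q1 - 2*Q2) / (Q3 - Q1)."""
    q1, q2, q3 = halves_quartiles(data)
    return (q3 + q1 - 2 * q2) / (q3 - q1)

ages = [20, 24, 28, 32, 35, 40, 42, 45, 50]
print(halves_quartiles(ages))     # (26.0, 35, 43.5)
print(round(bowley(ages), 4))     # -0.0286
```

The exact value is −0.5/17.5 = −1/35 ≈ −0.0286, which rounds to −0.03.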
Kelly's measure of skewness quantifies the degree of skewness in a distribution by comparing certain percentiles, typically the 10th, 50th, and 90th percentiles (equivalently the first, fifth, and ninth deciles D1, D5, and D9). Specifically, it compares the median (50th percentile) with the average of the 10th and 90th percentiles: if that average lies above the median, the distribution is positively skewed; if it lies below, negatively skewed. Because it uses points farther out in the tails than the quartiles, it covers more of the distribution than Bowley's measure while still ignoring the most extreme 10% at each end.
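Skewness as per Kelly's Measure
$$Sk = P_{90} + P_{10} - 2P_{50}$$
Coefficient of Skewness as per Kelly's Measure
$$SK_L = \frac{P_{90} + P_{10} - 2P_{50}}{P_{90} - P_{10}}$$
Coefficient of Kelly's Measure (using deciles, where D1 = P10, D5 = P50 and D9 = P90)
$$SK_L = \frac{D_9 + D_1 - 2D_5}{D_9 - D_1}$$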
Example of Kelly’s Measure:
Calculate Kelly’s Coefficient of Skewness for the following data: 5, 7, 8, 9, 10, 12, 15, 16, 18, 20.
Sol:
Step 1: Find the 10th Percentile
To find the 10th percentile, we arrange the data in ascending order and find the value at or below which 10% of the data falls. Using the simple rule that the pth percentile is the value at position (p/100) × n (percentile conventions vary between textbooks and software), the 10th percentile is at position 0.10 × 10 = 1. So, the 10th percentile is 5.
P10 = 5
Step 2: Find the 50th Percentile (Median)
Since there are 10 data points, the median is the average of the 5th and 6th values when sorted in ascending order:
P50 = (10 + 12)/2 = 11
Step 3: Find the 90th Percentile
To find the 90th percentile, we identify the value at or below which 90% of the data falls. In this dataset, the 90th percentile corresponds to the value at position 9, since 90% of 10 data points is 9. So, the 90th percentile is 18.
P90 = 18
Step 4: Substitute the values in the formula.
$$SK_L = \frac{18 + 5 - 2(11)}{18 - 5} = \frac{1}{13} \approx 0.08$$
Interpretation: The positive Kelly's coefficient indicates a slight positive skew: the 90th percentile lies a little farther above the median (18 − 11 = 7) than the 10th percentile lies below it (11 − 5 = 6), so the right tail is somewhat longer than the left.
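Here is a Python sketch of Kelly's coefficient under the same percentile rule as the worked example: the pth percentile is taken as the value at 1-based position ⌈p × n / 100⌉, while the median averages the two middle values. The helper names `position_percentile` and `kelly` are hypothetical, and this position rule is only one of several percentile conventions.

```python
import math
import statistics

def position_percentile(sorted_xs, p):
    """p-th percentile as the value at 1-based position ceil(p * n / 100).

    Hypothetical helper mirroring the simple rule in the worked example;
    other percentile conventions give different results.
    """
    n = len(sorted_xs)
    pos = max(1, math.ceil(p * n / 100))
    return sorted_xs[pos - 1]

def kelly(data):
    """Kelly's coefficient (P90 + P10 - 2*P50) / (P90 - P10)."""
    xs = sorted(data)
    p10 = position_percentile(xs, 10)
    p50 = statistics.median(xs)        # averages the 5th and 6th values for n = 10
    p90 = position_percentile(xs, 90)
    return (p90 + p10 - 2 * p50) / (p90 - p10)

values = [5, 7, 8, 9, 10, 12, 15, 16, 18, 20]
print(round(kelly(values), 4))   # 0.0769
```

The convention matters for small samples: with NumPy's default `numpy.percentile` (linear interpolation), P10 = 6.8 and P90 = 18.2 for these data, and the coefficient becomes 3/11.4 ≈ 0.26 instead of 1/13 ≈ 0.08.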
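Interpreting Skewness
Interpreting skewness involves understanding both the direction and the magnitude of the skew. The magnitude is the distance of the coefficient from zero: the farther it is, the stronger the asymmetry. The direction is given by the sign: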
Direction of Skewness:
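Negative Skewness (Left Skewed): When skewness is negative, the distribution is skewed to the left. In a left-skewed distribution:
The left tail (smaller values) is longer than the right tail.
Most data points are concentrated on the right.
The mean is typically less than the median.
Positive Skewness (Right Skewed): Positive skewness indicates a rightward skew. In a right-skewed distribution:
The right tail (larger values) is longer than the left tail.
Most data points are concentrated on the left.
The mean is typically greater than the median.
Zero Skewness (Symmetric): A skewness value close to zero suggests an approximately symmetric distribution, with data spread evenly around the center.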
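To make the sign rule concrete, here is a small Python helper that labels the direction of any of the coefficients above. The function name `describe_skew` and the tolerance `tol = 0.05` for deciding what counts as "close to zero" are hypothetical choices rather than a standard cut-off, and the values in the loop are purely illustrative.

```python
def describe_skew(coefficient, tol=0.05):
    """Label the direction of skew from the sign of a skewness coefficient.

    tol is a hypothetical threshold for treating a value as "close to zero";
    there is no universal cut-off, so choose one suited to your data.
    """
    if coefficient > tol:
        return "positive (right-skewed)"
    if coefficient < -tol:
        return "negative (left-skewed)"
    return "approximately symmetric"

# Illustrative coefficient values only
for value in (1.2, -0.4, 0.01):
    print(f"{value:+.2f} -> {describe_skew(value)}")
```

A non-zero tolerance reflects the point made above: values very near zero say little about direction, especially in small samples.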
Skewness is an important statistical measure that reveals the asymmetry of a distribution. It can be positive, negative, or zero, indicating the direction and extent of the skew. Skewness measures, including Karl Pearson's, Bowley's, and Kelly's coefficients, quantify this asymmetry. Understanding and interpreting skewness is crucial for analyzing data distributions and making well-informed decisions.