Table of contents
Introduction
Raw Data
Classification of Data
Variables: Continuous and Discrete
What is a Frequency Distribution?
Frequency Curve
Bivariate Frequency Distribution

Once data is collected, it needs to be organised systematically so that it can be easily understood and analysed. This process is called classification — grouping data into categories based on specific criteria. Just like a junk dealer sorts items or a student organises books by subject, classification brings order, saves time, and makes retrieval simple. In this chapter, we explore how to organise raw data into meaningful groups for effective statistical analysis.
Chronological Classification
In this method, data is classified according to time, such as years, months, or days. For example, arranging the population of India by census year.
Spatial Classification
In this method, data is classified based on geographical locations such as countries, states, cities, or districts. For example, categorizing the population of different states in India.
Qualitative Classification
This classification is used when data cannot be measured numerically but can be sorted by qualities or attributes such as nationality, literacy, gender, or religion. For example, a population can be classified by gender (male, female) and then further by marital status (married, unmarried).
Quantitative Classification
This method is used for characteristics that can be measured numerically, such as height, weight, age, income, or marks. The data is grouped into classes to make analysis easier.
A variable is something that can change or vary and can be measured or counted. It’s like a container that holds different values depending on the situation. For example, your age, height, income, and even the number of chocolates in a box are all variables because their values can change.
Variables are broadly classified into two types: Continuous Variables and Discrete Variables.
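Continuous Variables
These variables can take any numerical value within a certain range, including whole numbers, fractions, and decimals. Their values change smoothly, without jumps.
Examples: height, weight, time, and temperature.
Discrete Variables
These variables can take only specific, separate values, usually whole numbers or certain fractions. They change in jumps or steps rather than smoothly.
Examples: the number of students in a class, the number of children in a family, and the number of chocolates in a box.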
A frequency distribution is a way of organizing and summarizing raw data to make it easier to understand. It helps us see how often different values appear in a dataset.
For example, if we have a list of marks scored by students in a math test, we can group these marks into classes such as 0–10, 10–20, 20–30, and so on, and then count how many marks fall into each class.
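The grouping and counting described above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the list of marks is hypothetical, the function name `frequency_distribution` is ours, integer class limits are assumed, and the classes follow the exclusive form (a mark of 10 goes to 10–20), with the maximum mark of 100 placed in the last class.

```python
# Hypothetical marks of 15 students (illustrative data, not from the text).
marks = [12, 45, 67, 23, 89, 34, 56, 78, 90, 10, 20, 5, 99, 100, 42]


def frequency_distribution(data, lower=0, upper=100, width=10):
    """Count observations in exclusive classes lower-(lower+width), ...

    Integer limits are assumed. The upper limit of each class is excluded
    (10 goes to 10-20), except that the maximum value `upper` is counted
    in the last class.
    """
    if (upper - lower) % width:
        raise ValueError("the range must be a whole number of class widths")
    edges = list(range(lower, upper + 1, width))  # 0, 10, 20, ..., 100
    classes = list(zip(edges, edges[1:]))         # (0, 10), (10, 20), ...
    freq = {c: 0 for c in classes}
    for x in data:
        if not lower <= x <= upper:
            raise ValueError(f"{x} is outside {lower}-{upper}")
        # Position of the class; the top value is clamped into the last class.
        i = min(int((x - lower) // width), len(classes) - 1)
        freq[classes[i]] += 1
    return freq


for (a, b), f in frequency_distribution(marks).items():
    print(f"{a}-{b}: {f}")
```

The frequencies always add up to the number of observations, which is a quick check that no mark was missed or counted twice.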
Let's understand important terms related to this concept:
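Class Limits: The two ends of a class, called the lower class limit and the upper class limit.
Class Mark: The midpoint of a class, found as (lower class limit + upper class limit) / 2.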
A Frequency Curve is a graphical representation of a frequency distribution. It shows how the frequencies of different classes are spread out over a range of values.
(Figure: Frequency Curve)
Once we organize data into classes, we don't have to work with individual numbers anymore. Instead, we use the class marks to represent each class. It makes analyzing data much simpler and clearer.
When making a frequency distribution, we need to think about these five important points:
Equal or Unequal Sized Class Intervals:
Should all classes be of the same size (e.g., 0-10, 10-20) or different sizes?
Number of Classes:
How many classes should we make? This depends on the range of the data and how detailed we want the grouping to be.
Size of Each Class:
What should be the width of each class? For example, should each class cover a range of 10 marks or 20 marks?
Determining Class Limits:
Deciding the lower and upper limits of each class. For example, if we choose the class 20–30, then 20 is the lower limit and 30 is the upper limit.
Calculating Frequencies:
Counting how many data points fall into each class.
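The number of classes and the size of each class are linked through the range of the data. A common rule of thumb, assumed here rather than taken from the text, is to divide the range by the chosen number of classes and round up to a whole number. The sketch below (the function name `make_classes` is hypothetical) builds equal-width classes that way and computes each class mark as the midpoint of its limits.

```python
import math


def make_classes(min_value, max_value, num_classes):
    """Return a class width and equal-width classes with their class marks.

    Width = range / number of classes, rounded up (an assumed rule of thumb).
    Each class is returned as (lower limit, upper limit, class mark).
    """
    width = math.ceil((max_value - min_value) / num_classes)
    classes = []
    for k in range(num_classes):
        lower = min_value + k * width
        upper = lower + width
        mark = (lower + upper) / 2  # class mark = midpoint of the class
        classes.append((lower, upper, mark))
    return width, classes


width, classes = make_classes(0, 100, 10)
print("class width:", width)
for lower, upper, mark in classes:
    print(f"{lower}-{upper}  class mark = {mark}")
```

With marks from 0 to 100 and ten classes, the maximum value 100 lands exactly on the last upper limit, so under the upper-limit-excluded convention it has to be placed in the last class explicitly.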
There are two situations where unequal class intervals are used:
1. Wide Range of Data:
2. Concentrated Data:
In all other cases, equal-sized intervals are used.
Class limits are the boundaries that define each class in a frequency distribution. There are two methods of setting them:
1. Inclusive Class Intervals
2. Exclusive Class Intervals
Example
Let's understand the difference between Inclusive and Exclusive Class Intervals using an example of marks scored by students in a test.
Inclusive Method
Here, both the lower and the upper limits are included in the class. This method suits discrete variables whose values are exact numbers (such as marks without fractions).
Example (Marks ranging from 0 to 100 with 10 class intervals):
0 – 10 (Includes both 0 and 10)
11 – 20 (Includes both 11 and 20)
21 – 30 (Includes both 21 and 30)
...
91 – 100 (Includes both 91 and 100)
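Exclusive Method
Here, the upper limit of one class is the lower limit of the next (0–10, 10–20, ..., 90–100), so there is no gap between classes, and a value falling exactly on a shared limit must be placed in only one of them.
Special Cases
In the exclusive method, we need to decide how to handle values that fall on a boundary (e.g., 10, 20, 30).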
Case of Lower Limit Excluded: Values like 10 or 30 are included in the previous interval (0 to 10, 20 to 30).
Case of Upper Limit Excluded (Most Common): Values like 10 or 30 are included in the next interval (10 to 20, 30 to 40).
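The two special cases differ only in which class receives a boundary value. As a minimal sketch, assuming the classes 0–10, 10–20, ..., 90–100, Python's standard `bisect` module expresses both conventions: `bisect_right` gives classes that exclude the upper limit, and `bisect_left` gives classes that exclude the lower limit. The function names are illustrative.

```python
from bisect import bisect_left, bisect_right

# Class boundaries for 0-10, 10-20, ..., 90-100 (exclusive method).
edges = list(range(0, 101, 10))


def class_upper_excluded(x):
    """Upper limit excluded: classes are [lower, upper), so 10 goes to 10-20."""
    if not edges[0] <= x < edges[-1]:
        raise ValueError(f"{x} falls outside [{edges[0]}, {edges[-1]})")
    i = bisect_right(edges, x) - 1
    return edges[i], edges[i + 1]


def class_lower_excluded(x):
    """Lower limit excluded: classes are (lower, upper], so 10 goes to 0-10."""
    if not edges[0] < x <= edges[-1]:
        raise ValueError(f"{x} falls outside ({edges[0]}, {edges[-1]}]")
    i = bisect_left(edges, x) - 1
    return edges[i], edges[i + 1]


for x in (10, 30, 35):
    print(x, class_upper_excluded(x), class_lower_excluded(x))
```

Values that are not on a boundary, such as 35, land in the same class under both conventions; only values like 10 or 30 move.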
When using the Inclusive Method for continuous data, there is often a gap between class limits. For example, if the first class is 0–899 and the second class is 900–1799, there's a gap of 1 between 899 and 900. To ensure continuity, we need to adjust the class intervals as follows:
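1. Find the Difference: Subtract the upper limit of a class from the lower limit of the next class (900 - 899 = 1).
2. Divide the Difference by Two: 1 ÷ 2 = 0.5.
3. Adjust the Lower Limits: Subtract 0.5 from the lower limit of every class (0 becomes -0.5, 900 becomes 899.5).
4. Adjust the Upper Limits: Add 0.5 to the upper limit of every class (899 becomes 899.5, 1799 becomes 1799.5).
After adjusting the class intervals, the formula to find the class mark becomes:
$$\text{Class mark} = \frac{\text{Adjusted upper class limit} + \text{Adjusted lower class limit}}{2}$$
For the first class, this gives (899.5 + (-0.5)) / 2 = 449.5.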
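The four adjustment steps can be written as a short Python sketch. It assumes the gap is the same between every pair of neighbouring classes (as in 0–899, 900–1799); the function name `make_continuous` and the third class 1800–2699 in the usage line are hypothetical additions for illustration.

```python
def make_continuous(classes):
    """Adjust inclusive class limits so that the classes become continuous.

    classes: list of (lower, upper) inclusive limits, e.g. [(0, 899), (900, 1799)].
    Assumes the same gap between every pair of neighbouring classes.
    Returns a list of (adjusted lower, adjusted upper, class mark).
    """
    if len(classes) < 2:
        raise ValueError("need at least two classes to measure the gap")
    # Step 1: lower limit of the 2nd class minus upper limit of the 1st.
    gap = classes[1][0] - classes[0][1]
    # Step 2: half of that difference.
    half = gap / 2
    adjusted = []
    for lower, upper in classes:
        new_lower = lower - half  # Step 3: subtract from every lower limit
        new_upper = upper + half  # Step 4: add to every upper limit
        mark = (new_upper + new_lower) / 2
        adjusted.append((new_lower, new_upper, mark))
    return adjusted


for row in make_continuous([(0, 899), (900, 1799), (1800, 2699)]):
    print(row)
```

After the adjustment, each class's upper limit equals the next class's lower limit (899.5, 1799.5), so no value can fall into a gap.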
The frequency of an observation is just the number of times it appears in the data.
Tally marking is a way to keep track of numbers using a simple counting method.
The usual way to represent tally marks is in groups of five: four strokes, with the fifth drawn across them.
Tally marks were commonly used in the past for counting things, especially personal belongings.
For example, it was helpful for counting domestic animals like goats and cows, which could be hard to track.
Example
Imagine you are a teacher with 100 students who just completed a math test. You want to see how the marks are spread out. The marks range from 0 to 100.
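Prepare the Class Intervals (Exclusive Form): 0–10, 10–20, 20–30, ..., 90–100.
Marking the Tally: For each student's mark, put a tally mark (/) against the corresponding class interval. The fifth tally in each group is drawn across the previous four (////), so that the tallies are bundled in fives.
Count the Tallies: Count the tally marks in each class interval; this count is the frequency of that class.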
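The tally procedure above can be mimicked in Python as a minimal sketch. Plain text cannot draw a stroke across four others, so this sketch writes a full bundle of five as `////\` (an assumed notation, with the backslash standing for the crossing stroke). The marks and the helper names `tally` and `tally_table` are hypothetical.

```python
def tally(count):
    """Return tally marks for `count`, bundled in fives.

    A full bundle is written '////\\': the backslash stands for the fifth,
    crossing stroke. Leftover tallies are written as single '/' strokes.
    """
    bundles, rest = divmod(count, 5)
    groups = ["////\\"] * bundles
    if rest:
        groups.append("/" * rest)
    return " ".join(groups)


def tally_table(data, edges):
    """Put one tally per value in its class [lower, upper), then count.

    Assumes every value lies in [edges[0], edges[-1]); others raise ValueError.
    """
    counts = [0] * (len(edges) - 1)
    for x in data:
        for i in range(len(edges) - 1):
            if edges[i] <= x < edges[i + 1]:
                counts[i] += 1
                break
        else:
            raise ValueError(f"{x} is outside the classes")
    return [(edges[i], edges[i + 1], tally(c), c) for i, c in enumerate(counts)]


marks = [3, 7, 12, 15, 18, 11, 14, 19, 25]  # hypothetical marks
for lower, upper, strokes, frequency in tally_table(marks, [0, 10, 20, 30]):
    print(f"{lower}-{upper}  {strokes:<10} {frequency}")
```

Bundling in fives is what makes the final count quick: the class 10–20 shows one bundle and one extra stroke, which reads as 6 at a glance.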
When we organize data into a frequency distribution table, we simplify and summarize the raw data. However, this also causes a loss of detail.
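Why is Information Lost?
Once values are grouped, the table shows only how many observations fall in each class, not their actual values. In later calculations, every observation in a class is represented by its class mark.
Example
Let's say we have the following scores: 25, 25, 20, 22, 25, 28. All six fall in the class 20–30, so the table records only a frequency of 6 for that class, and each score is treated as the class mark, 25. The individual values 20, 22, and 28 can no longer be recovered from the table.
Why is This Important?
Calculations based on grouped data, such as an average computed from class marks, are only approximations of the values the raw data would give.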
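The loss can be made concrete with the six scores 25, 25, 20, 22, 25, 28. The sketch below assumes, as in that example, that all of them are grouped into the class 20–30 and represented by its class mark; it compares the mean from the raw data with the mean implied by the grouped data.

```python
scores = [25, 25, 20, 22, 25, 28]

# Raw data: every individual value is available.
raw_total = sum(scores)             # 145
raw_mean = raw_total / len(scores)  # about 24.17

# Grouped data: the table only says "20-30: frequency 6", so each score
# is replaced by the class mark of 20-30.
class_mark = (20 + 30) / 2                  # 25.0
grouped_total = class_mark * len(scores)    # 150.0
grouped_mean = grouped_total / len(scores)  # 25.0

# How far each actual score is from the value that now represents it.
deviations = [x - class_mark for x in scores]  # [0.0, 0.0, -5.0, -3.0, 0.0, 3.0]

print(f"raw mean: {raw_mean:.2f}, grouped mean: {grouped_mean:.2f}")
```

The grouped mean (25.00) differs from the raw mean (24.17) because the scores 20, 22, and 28 have been replaced by 25.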
Sometimes, dividing data into equal class intervals may not effectively represent the data. In such cases, we use unequal class intervals to make the representation more meaningful.
Why Use Unequal Class Intervals?
1. Concentration of Data:
2. Deviation from Class Marks:
How to Create Unequal Classes:
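One way to see the "Deviation from Class Marks" reason is to measure how far observations lie from the class marks that replace them. The sketch below is only an illustration under stated assumptions: the marks are hypothetical and bunched between 40 and 50, both classifications use five classes, and the unequal boundaries were chosen by hand to be narrow where the data are concentrated.

```python
from bisect import bisect_right


def mean_deviation_from_class_marks(data, edges):
    """Average |x - class mark| when each x is replaced by its class mark.

    Classes are [edges[i], edges[i+1]); the last class also includes
    edges[-1]. Assumes every value lies within [edges[0], edges[-1]].
    """
    total = 0.0
    for x in data:
        i = min(bisect_right(edges, x) - 1, len(edges) - 2)
        mark = (edges[i] + edges[i + 1]) / 2
        total += abs(x - mark)
    return total / len(data)


# Hypothetical marks, bunched between 40 and 50 (illustrative only).
data = [41, 42, 43, 44, 46, 47, 48, 49, 5, 95]

equal = [0, 20, 40, 60, 80, 100]    # five classes of width 20
unequal = [0, 40, 45, 50, 60, 100]  # five classes, narrow where data bunch up

print(mean_deviation_from_class_marks(data, equal))    # 5.0
print(mean_deviation_from_class_marks(data, unequal))  # 3.8
```

With equal classes, the eight bunched marks all share the class mark 50; the narrower classes 40–45 and 45–50 keep them much closer to their representative values, at the cost of wider classes where there are few observations.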
A Frequency Array is used to organize data related to discrete variables. Discrete variables are those that take specific, separate values and not intermediate fractional values between them.
The table shows the Size of Households and the corresponding Number of Households.
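Because a discrete variable takes separate values, a frequency array simply lists each distinct value with the number of times it occurs; no class intervals are needed. The sketch below uses Python's standard `collections.Counter` on a hypothetical list of household sizes (not the data from the table, which is not reproduced here).

```python
from collections import Counter

# Hypothetical household sizes from a small survey (illustrative only).
household_sizes = [4, 3, 5, 4, 2, 6, 4, 3, 5, 4, 1, 3]

# Frequency array: each distinct size with the number of households of that size.
frequency_array = sorted(Counter(household_sizes).items())

print("Size of Household | Number of Households")
for size, count in frequency_array:
    print(f"{size:>17} | {count}")
```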
Univariate Frequency Distribution
A statistical series that shows the frequency of only one variable is called a univariate frequency distribution, for example, the income of people or the marks scored by students.
Multivariate Distribution
Multivariate distributions describe two or more variables together, including the relationships (such as correlation) among them. Many univariate distributions have a multivariate counterpart. For example, the univariate normal distribution generalises to the multivariate normal distribution, which is the most commonly used model for examining multivariate data. Other multivariate distributions include the multivariate lognormal distribution and the multinomial distribution, which generalises the binomial distribution.
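The table of contents lists a bivariate frequency distribution, the frequency-table counterpart of the two-variable case: observations come in pairs, and the table counts how many pairs fall into each combination of classes. The sketch below is a hypothetical illustration with made-up pairs of marks in two subjects, exclusive classes 10–20, 20–30, and 30–40 for both variables, and an illustrative function name `bivariate_table`.

```python
from collections import Counter


def bivariate_table(pairs, x_edges, y_edges):
    """Count (x, y) pairs in a two-way table of exclusive classes.

    Rows are classes of x, columns are classes of y; each class is
    [lower, upper). Values outside the edges raise ValueError.
    """
    def class_index(v, edges):
        for i in range(len(edges) - 1):
            if edges[i] <= v < edges[i + 1]:
                return i
        raise ValueError(f"{v} is outside {edges[0]}-{edges[-1]}")

    counts = Counter((class_index(x, x_edges), class_index(y, y_edges))
                     for x, y in pairs)
    return [[counts[(r, c)] for c in range(len(y_edges) - 1)]
            for r in range(len(x_edges) - 1)]


# Hypothetical (maths, statistics) marks of six students.
pairs = [(12, 18), (25, 22), (27, 35), (35, 38), (15, 11), (32, 24)]
edges = [10, 20, 30, 40]
for row in bivariate_table(pairs, edges, edges):
    print(row)
```

Adding up each row gives the univariate frequency distribution of the first variable, and adding up each column gives that of the second.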