Facts or figures, numerical or otherwise, that are collected with a definite purpose are called data.
Data is therefore a collection of facts, such as numbers, words, measurements, and observations.
Suppose the following image shows the performance of the Indian cricket team in Tests in the last 2 years against major Test-playing nations.
Types of data based on the collection of facts
Qualitative data: It is descriptive data. For example:
Quantitative data: It is numerical information. For example:
Continuous data: It does not take a single fixed value but can take any value within a range. For example, in the figure below, the heights of the 3 persons lie between 3 feet and 5 feet.
Statistics involves the study of the collection, analysis, interpretation, presentation, and organization of data. In other words, it is a mathematical discipline for collecting and summarizing data.
Types of data on the basis of the collection of data
Primary data: It is the data that is collected by a researcher from first-hand sources, using methods like surveys, interviews, or experiments. For example, the following data is collected by a student for a thesis research project.
Secondary data: It is the data that has already been collected by someone, and then it is updated, tailored or modified for a specific purpose.
For example, in a school, the class teachers of the respective sections record attendance on a daily basis.
The data recorded by a class teacher is an example of primary data. On a given day, the principal of the school asks for the attendance of all students of each section, in order to collate the total number of students present in the school on that day. The data collected by the school principal is an example of secondary data.
Such an arrangement is called the presentation of data. The raw data can be arranged in any of the following ways:
When the raw data is arranged in ascending or descending order, then the data is called an array or arrayed data.
Suppose that the marks obtained by 10 students of class 9th in a mathematics test, out of 50 marks and listed according to their roll numbers, are: 39, 45, 33, 19, 21, 41, 21, 19, 40, 41.
The data in this form are called raw data or ungrouped data. The above raw data can be arranged in serial order (by roll number) as follows:
Suppose we want to find out who scored the maximum or minimum marks in the test. The data in the given form does not give us a clear understanding of the performance of the students.
If we arrange the marks scored in ascending or descending order, it gives us a better understanding of the given data. Also, we can easily identify the minimum and maximum values in the data.
In ascending order, the data looks as follows:
19, 19, 21, 21, 33, 39, 40, 41, 41, 45. ⇒ Array or arrayed data.
In descending order, the data looks as follows:
45, 41, 41, 40, 39, 33, 21, 21, 19, 19. ⇒ Array or arrayed data.
From the above array or arrayed data, we can easily identify that the minimum marks are 19 and maximum marks are 45.
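The arrangement above can be reproduced with a short Python sketch. The variable names are illustrative and are not part of the original text.

```python
# Marks of the 10 students, listed by roll number (raw data from the text).
raw_marks = [39, 45, 33, 19, 21, 41, 21, 19, 40, 41]

# Arrayed data: the same marks in ascending and in descending order.
ascending = sorted(raw_marks)
descending = sorted(raw_marks, reverse=True)

print("Ascending: ", ascending)
print("Descending:", descending)

# Once the data is arrayed, the extremes sit at the two ends.
print("Minimum marks:", ascending[0])
print("Maximum marks:", ascending[-1])
```

Sorting once and reading the two ends of the list mirrors what we do by hand with arrayed data.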
Frequency distributions are of two types: Ungrouped frequency distribution table and Grouped frequency distribution table.
Let us consider a larger data set, such as the marks obtained (out of 100 marks) by 40 students of class 9th of a school.
50, 60, 70, 21, 19, 33, 39, 21, 92, 88, 80, 70, 72, 19, 40,
41, 92, 50, 50, 56, 60, 70, 60, 60, 88, 41, 45, 92, 88, 95,
70, 40, 39, 33, 19, 21, 41, 45, 70, 80.
If we arrange them in ascending order, it gives us a slightly better picture.
19, 19, 19, 21, 21, 21, 33, 33, 39, 39, 40, 40, 41, 41,
41, 45, 45, 50, 50, 50, 56, 60, 60, 60, 60, 70, 70, 70, 70,
70, 72, 80, 80, 88, 88, 88, 92, 92, 92, 95.
But in the data arranged above, we still cannot easily find how many students scored 41 marks or 60 marks; we have to count them.
To make data easily understandable and clear, we can tabulate data of 40 students as shown below.
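As a rough illustration, such an ungrouped frequency table can be built in Python by counting how often each mark occurs. This is a minimal sketch using the 40 marks listed above; the variable names are assumptions made for the example.

```python
from collections import Counter

# Marks (out of 100) of the 40 students, as listed in the text.
marks = [
    50, 60, 70, 21, 19, 33, 39, 21, 92, 88, 80, 70, 72, 19, 40,
    41, 92, 50, 50, 56, 60, 70, 60, 60, 88, 41, 45, 92, 88, 95,
    70, 40, 39, 33, 19, 21, 41, 45, 70, 80,
]

# Count how many times each distinct mark occurs.
frequency = Counter(marks)

# Print one row per distinct mark, in ascending order of marks.
print("Marks  Frequency")
for mark in sorted(frequency):
    print(f"{mark:>5}  {frequency[mark]:>9}")
print("Total ", sum(frequency.values()))
```

With the table in hand, questions such as "how many students scored 41 marks?" (three) or "how many scored 60 marks?" (four) can be answered by reading a single row.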
In the above table, we can observe how many students scored the same marks, but we have had to record 16 distinct marks. For data even larger than this, we may have to draw much bigger tables, which makes our work cumbersome and time-consuming. To overcome this limitation, we can represent the data in classes or groups as shown below:
In the table above, we have grouped the marks obtained by students into groups, which are called classes or class intervals, and their size is called the class size or class width. Class 11-20 means the marks obtained between 11 and 20, including both 11 and 20. The number of observations falling in a particular class is called the frequency of that class or the class frequency.
We can prepare Grouped frequency distributions by two methods:
I. Inclusive Method
II. Exclusive Method
Inclusive Method: In this method, the classes are so formed that the upper limit of a class is included in that class. For example: In the class 11-20 of marks obtained by students, a student who has obtained 20 marks is included in this class.
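The inclusive method can be sketched in Python as follows. The classes 11-20, 21-30, ..., 91-100 are an assumption extended from the class 11-20 mentioned in the text, and the marks are the 40 marks listed earlier.

```python
# Marks (out of 100) of the 40 students, as listed in the text.
marks = [
    50, 60, 70, 21, 19, 33, 39, 21, 92, 88, 80, 70, 72, 19, 40,
    41, 92, 50, 50, 56, 60, 70, 60, 60, 88, 41, 45, 92, 88, 95,
    70, 40, 39, 33, 19, 21, 41, 45, 70, 80,
]

# Assumed inclusive classes: 11-20, 21-30, ..., 91-100.
classes = [(lower, lower + 9) for lower in range(11, 100, 10)]

# Inclusive method: both the lower and the upper limit belong to the class.
grouped = {
    (lower, upper): sum(lower <= m <= upper for m in marks)
    for lower, upper in classes
}

print("Class    Frequency")
for (lower, upper), freq in grouped.items():
    print(f"{lower:>3}-{upper:<4} {freq:>9}")
```

The test `lower <= m <= upper` is what makes the method inclusive: a mark equal to the upper limit, such as 20, is counted in the class 11-20.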
Grouped frequency distribution table 1, shown below, is arranged by the inclusive method.
Suppose two new students whose marks are 20.5 and 30.5 are admitted to class 9th. We cannot add them to the class intervals 11-20 and 21-30, as these values are not included in any of these class intervals. To include the marks 20.5 and 30.5, we use the exclusive method for preparing the grouped frequency distribution: we need class intervals such that the upper limit of one class interval is the same as the lower limit of the next class interval.
Exclusive Method: The class intervals are formed such that the upper limit of a class interval is the same as the lower limit of the next class interval. This method is called the exclusive method of classification.
Grouped frequency distribution table 2, shown below, is arranged by the exclusive method.
In this method, the upper limit of a class is not included in the class.
For example, if a student scores 20 marks, then it is included in the class 20-30 but not in the class 10-20. So, any observation that is common to two class intervals is counted in the higher class interval.
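The rule for the exclusive method can be sketched as a small Python function. The function name `exclusive_class` and the choice of class limits at multiples of the class width (10-20, 20-30, and so on) are assumptions made for this illustration.

```python
import math


def exclusive_class(mark, width=10):
    """Return the class interval (lower, upper) with lower <= mark < upper.

    Class limits are assumed to be multiples of `width`: 10-20, 20-30, ...
    """
    lower = math.floor(mark / width) * width
    return (lower, lower + width)


# A mark on a shared limit goes to the higher class interval.
print(exclusive_class(20))    # class 20-30, not 10-20
# Marks that fell between the inclusive classes now have a home.
print(exclusive_class(20.5))  # class 20-30
print(exclusive_class(30.5))  # class 30-40
```

Because each class contains its lower limit but not its upper limit, every mark, whole or fractional, belongs to exactly one class.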
Example: Given below are the ages of 25 students of class 9th in a school. Prepare an ungrouped frequency distribution table and a grouped frequency distribution table.
15, 16, 16, 17, 17, 16, 15, 15, 16, 16, 17, 15, 16, 16, 14, 16, 15, 14, 16, 15, 14, 15, 16, 16, 15, 14, 15.
In the given data the observations are only 14, 15, 16 and 17. These ages are repeated multiple times. So, 14, 15, 16 and 17 are variates of data.
The frequency distribution of the ages of the 25 students is given below.
For a grouped frequency distribution table, we decide the class intervals according to our own convenience.
The grouped frequency distribution table of the ages of the 25 students, prepared by the exclusive method, is given below.
Bar Graphs: A bar graph consists of bars of uniform width, drawn horizontally or vertically with equal spacing between them, in which the length of each bar represents the given number. Such a method of representing data is called a bar diagram or a bar graph.
For a clear representation of categorical data or any ungrouped discrete frequency observations, we generally use bar graphs.
Example 1: Consider the modes of transport of 30 students of class 9th, given below:
In order to draw the bar graph for the data above, we prepare the frequency table as given below.
Now, we can represent this data using a bar graph, by following the steps as shown below:
- First, we draw two axes, viz. the x–axis and the y–axis. Then, we decide what each axis of the graph represents. By convention, the variates being measured go on the horizontal axis (x–axis) and the frequency goes on the vertical axis (y–axis).
- Next, decide on a numeric scale for the frequency axis. This axis represents the frequency in each category by its height. It must start at zero and include the largest frequency.
- Having decided on a range for the frequency axis we need to decide on a suitable number scale to label this axis. This should have sensible values, for example, 0, 1, 2, . . . , or 0, 10, 20 . . . , or other such values as to make sense given the data.
- Draw the axes and label them appropriately.
- Draw a bar for each category. When drawing the bars it is essential to ensure the following:
  - the width of each bar is the same
  - the bars are separated from each other by equally sized gaps.
Using this bar graph, we can easily see that the most popular mode of transport is the metro. Bar graphs provide a simple method of quickly spotting patterns within a discrete data set.
Histograms
The histogram was first introduced by Karl Pearson in 1891. Bar charts have their limitations; for instance, they cannot be used to represent continuous data. When dealing with continuous variables, a different kind of graph is used. This type of graph is called a histogram.
At first sight, a histogram looks similar to a bar chart. However, there are two critical differences:
Example 2: Consider the weights of 20 students of class 9th as given below:
Now, arranging the data in ascending order:
40, 41, 42, 42, 43, 46, 46, 47, 52, 53, 53, 55, 57, 57, 58, 59, 60, 61, 62, 64.
In order to draw the histogram for the data above, we prepare the frequency table as given below.
We can represent this information using a histogram by following the steps shown below:
- Find the maximum frequency and draw the vertical (y–axis) from zero to this value.
- The range of the horizontal (x–axis) needs to include a full range of the class intervals from the frequency table.
- Draw a bar for each group in your frequency table. These should be the same width and touch each other (unless there are no data in one particular class).
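The steps above can be sketched in Python for the 20 weights of Example 2. The class intervals used here (40-45, 45-50, ..., 60-65, formed by the exclusive method with a class width of 5) are an assumption made for illustration, and the bars are drawn as rows of `#` characters rather than as a real chart.

```python
# Weights of the 20 students, as listed in Example 2.
weights = [40, 41, 42, 42, 43, 46, 46, 47, 52, 53,
           53, 55, 57, 57, 58, 59, 60, 61, 62, 64]

WIDTH = 5   # assumed class width
START = 40  # assumed lower limit of the first class

# Exclusive classes covering the full range of the data: 40-45, ..., 60-65.
classes = [(lower, lower + WIDTH) for lower in range(START, 65, WIDTH)]

# Frequency table: each class contains its lower limit but not its upper limit.
table = {
    (lower, upper): sum(lower <= w < upper for w in weights)
    for lower, upper in classes
}

# Text histogram: one bar per class, equal class widths, no gaps between classes.
for (lower, upper), freq in table.items():
    print(f"{lower}-{upper} | {'#' * freq} ({freq})")
```

The longest bar gives the maximum frequency, which fixes how far the frequency axis has to extend.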
Frequency Polygon
It is a natural extension of the histogram. In a frequency polygon, rather than drawing bars, each class is represented by one point, and these points are joined together by straight lines. We draw a frequency polygon in a way similar to drawing a histogram.
Example 3: Consider the weights of 20 students of class 9th as given below:
Now, arranging the data in ascending order:
40, 41, 42, 42, 43, 46, 46, 47, 52, 53, 53, 55, 57, 57, 58, 59, 60, 61, 62, 64.
In order to draw the frequency polygon for the data above, we prepare the frequency table as given below.
We can then present this information as a frequency polygon by following the steps shown below:
- Prepare a frequency table.
- Find the maximum frequency and draw the vertical (y–axis) from zero to this value.
- The range of the horizontal (x–axis) needs to include all class intervals from the frequency table.
- Draw bars for each class interval in the frequency table. These bars should be of the same width and adjacent to each other (unless there are no data in one particular class).
- Connect the midpoints of the top side of each bar by a dotted line as shown below.
Frequency polygons can also be drawn independently without drawing histograms. For this, we require the mid-points of the class intervals. These mid-points of the class intervals are called class marks.
Class mark = (Upper Limit + Lower Limit) / 2
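As a quick check of the formula, the class mark can be computed with a short Python function. The function name `class_mark` is illustrative; the intervals 10-20 and 11-20 are the ones mentioned earlier in the text.

```python
def class_mark(lower_limit, upper_limit):
    """Return the mid-point of a class interval."""
    # The two limits are added first, and only then is the sum halved.
    return (upper_limit + lower_limit) / 2


print(class_mark(10, 20))  # class 10-20 (exclusive method)
print(class_mark(11, 20))  # class 11-20 (inclusive method)
```

The parentheses matter: halving only the lower limit before adding would give 25 for the class 10-20 instead of its mid-point, 15.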
There are three main measures of central tendency: the mean, the median, and the mode.
Mean (Average): It is calculated by dividing the sum of all observations in the data by the number of observations. So, if a data set has n observations x1, x2, ..., xn, the sample mean, usually denoted by x̅ (read as x bar), is:
x̅ = (x1 + x2 + ... + xn) / n
This formula is usually written in a slightly different manner using the Greek capital letter Σ, read as "sigma", which means "sum of":
x̅ = Σx / n
where Σx = x1 + x2 + ... + xn.
Example 1: Find the mean of the marks obtained by 20 students of class 9th of a school:
20, 30, 30, 10, 40, 45, 30, 20, 25, 45, 10, 25, 35, 45, 40, 20, 30, 25, 20, 10.
Suppose that x1= 20, x2 = 30, x3 = 30, x4 = 10, x5 = 40, x6 = 45, x7 = 30, x8 = 20, x9 = 25, x10 = 45, x11 = 10, x12 = 25, x13 = 35, x14 = 45, x15 = 40, x16 = 20, x17 = 30, x18 = 25, x19 = 20 & x20 = 10.
Therefore, x̅ = Σx / 20,
where
Σx = 20 + 30 + 30 + 10 + 40 + 45 + 30
+ 20 + 25 + 45 + 10 + 25 + 35
+ 45 + 40 + 20 + 30 + 25 + 20
+ 10
= 555
Therefore, x̅ = 555/20 = 27.75.
So, the mean of the marks obtained by 20 students of class 9th = 27.75.
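The same calculation can be checked with a few lines of Python. This is only a sketch of the arithmetic above; the variable names are illustrative.

```python
# Marks of the 20 students from Example 1.
marks = [20, 30, 30, 10, 40, 45, 30, 20, 25, 45,
         10, 25, 35, 45, 40, 20, 30, 25, 20, 10]

total = sum(marks)  # the sum of all observations
n = len(marks)      # the number of observations
mean = total / n    # the mean is the sum divided by the count

print("Sum of observations:", total)
print("Number of observations:", n)
print("Mean:", mean)
```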
Median: The median is the middle observation for a set of data that has been arranged in either ascending or descending order. Median is that observation that splits the arranged data into two halves.
For an even and odd number of observations in ungrouped data, we have different approaches to find the median.
Example 2: The heights (in cm) of 11 students of class 9th are as follows:
155, 160, 140, 130, 145, 135, 150, 152, 160, 142, 144.
First of all, we arrange the data in ascending order, as follows:
130, 135, 140, 142, 144, 145, 150, 152, 155, 160, 160.
Since the number of students is 11, an odd number, we find the median by finding the height of the ((n + 1)/2)th = ((11 + 1)/2)th = 6th student, which is 145 cm [where n is the number of students].
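A minimal Python sketch of the median for ungrouped data is given below. The odd case follows the example above. For an even number of observations, the sketch assumes the usual convention of taking the mean of the (n/2)th and (n/2 + 1)th observations, which is not worked out in the text. The function name `median` is illustrative.

```python
def median(observations):
    """Return the median of ungrouped data."""
    ordered = sorted(observations)  # arrange in ascending order first
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd n: the ((n + 1)/2)th observation, i.e. index n // 2 when counting from 0.
        return ordered[middle]
    # Even n (assumed convention): mean of the two middle observations.
    return (ordered[middle - 1] + ordered[middle]) / 2


# Heights (in cm) of the 11 students from the example above.
heights = [155, 160, 140, 130, 145, 135, 150, 152, 160, 142, 144]
print("Median height:", median(heights), "cm")
```

Python's standard library also provides `statistics.median`, which follows the same convention.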
Mode: It is defined as the most frequently occurring observation in the data. That is, the observation with the maximum frequency is called the mode.
Example 4: The heights (in cm) of 12 students of class 9th are as follows:
155, 160, 140, 130, 145, 135, 150, 152, 160, 142, 144, 160.
We can arrange the given data in ascending order:
130, 135, 140, 142, 144, 145, 150, 152, 155, 160, 160, 160.
Here 160 cm occurs most frequently, i.e. three times. So, the mode is 160 cm.
Example 5: The heights (in cm) of 12 students of class 9th are as follows:
150, 160, 140, 144, 143, 153, 153, 155, 160, 160, 155, 155.
We can arrange the given data in ascending order:
140, 143, 144, 150, 153, 153, 155, 155, 155, 160, 160, 160.
Here, 155 cm and 160 cm both occur most frequently (three times each).
So, the modes are 155 cm and 160 cm.
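Both examples can be checked with a short Python sketch that returns every observation having the maximum frequency. The function name `modes` is illustrative and not part of the original text.

```python
from collections import Counter


def modes(observations):
    """Return all observations with the maximum frequency, in ascending order."""
    frequency = Counter(observations)
    highest = max(frequency.values())
    return sorted(value for value, count in frequency.items() if count == highest)


# Heights (in cm) from Example 4 and Example 5.
example_4 = [155, 160, 140, 130, 145, 135, 150, 152, 160, 142, 144, 160]
example_5 = [150, 160, 140, 144, 143, 153, 153, 155, 160, 160, 155, 155]

print("Example 4:", modes(example_4))
print("Example 5:", modes(example_5))
```

Returning a list rather than a single value allows for data sets, like the one in Example 5, that have more than one mode.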