FUNDAMENTAL CHARACTERISTICS OF STATISTICS
Statistics have the following important characteristics:
(i) Statistics are aggregates of facts, not a single observation.
(ii) Statistics are expressed quantitatively.
(iii) Statistics in an investigation are related to each other and comparable, and they can be classified into various groups.
(iv) Statistics are collected for a pre-determined purpose.
(v) In the collection of statistics, a reasonable standard of accuracy must be maintained.
LIMITATIONS OF STATISTICS
Statistics have the following limitations:
(i) Statistics is not suited to the direct study of qualitative phenomena such as honesty, intelligence and poverty.
(ii) Statistics deals with groups and does not study individuals.
(iii) The laws of statistics are not exact; they hold only on average.
(iv) Data collected for a definite purpose may not be suitable for another purpose.
Statistical data are facts collected for the purpose of an investigation. There are two types of statistical data:
(i) Primary data: Data collected by an investigator for the first time for his own purpose are called primary data. Because primary data are collected by the user of the data, they are more reliable and relevant.
(ii) Secondary data: Data collected by someone else and used by the investigator for his own purpose are called secondary data. For example, the score of a cricket match noted from a newspaper is secondary data.
Thus, data that are primary in the hands of one investigator become secondary in the hands of another.
Data collected from any source can also be divided into the following two types:
(i) Raw Data: Raw data are data obtained from the original source but not arranged numerically. They are also called ‘ungrouped data’. For example, the marks of 10 students in maths are:
75, 96, 25, 32, 89, 62, 40, 79, 35, 55
An ‘array’ is an arrangement of raw numerical data in ascending or descending order of magnitude. The above data, written as an array in ascending order, are:
25, 32, 35, 40, 55, 62, 75, 79, 89, 96
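Arranging raw data into an array amounts to sorting it. A minimal Python sketch using the marks above (the variable names are illustrative):

```python
# Marks of 10 students in maths (raw data from the example above)
raw_marks = [75, 96, 25, 32, 89, 62, 40, 79, 35, 55]

# An array is the raw data arranged in order of magnitude
ascending_array = sorted(raw_marks)
descending_array = sorted(raw_marks, reverse=True)

print(ascending_array)   # [25, 32, 35, 40, 55, 62, 75, 79, 89, 96]
print(descending_array)  # [96, 89, 79, 75, 62, 55, 40, 35, 32, 25]
```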
(ii) Grouped data: An array can be arranged systematically in groups or classes. For example, the above data can be grouped as follows:
Marks       | Marks obtained  | Number of students
0 to 20     | (none)          | 0
21 to 40    | 25, 32, 35, 40  | 4
41 to 60    | 55              | 1
61 to 80    | 62, 75, 79      | 3
81 to 100   | 89, 96          | 2
Total       |                 | 10
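The grouping in the table can be reproduced by checking which inclusive class each mark falls in. A short Python sketch, assuming the class limits shown in the table (both limits belong to the class):

```python
marks_array = [25, 32, 35, 40, 55, 62, 75, 79, 89, 96]

# Inclusive classes from the table above: (lower limit, upper limit)
classes = [(0, 20), (21, 40), (41, 60), (61, 80), (81, 100)]

grouped = {}
for lower, upper in classes:
    # An inclusive class contains both of its limits
    grouped[(lower, upper)] = [m for m in marks_array if lower <= m <= upper]

for (lower, upper), members in grouped.items():
    print(f"{lower} to {upper}: {members} -> {len(members)} students")

total = sum(len(members) for members in grouped.values())
print("Total number of students:", total)
```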
SOME BASIC DEFINITIONS
(i) Variate: Variate is a quantity that may vary from observation to observation.
(ii) Range: Range is the difference between the maximum and minimum observations.
(iii) Class Interval: When data are divided into groups, each group is called a class interval.
(iv) Class Limit: Every class interval has two limits. The lowest value that can belong to the interval is called its lower limit, and the highest value is called its upper limit.
(v) Class Mark: The mid value of any class is called its class mark.
$\text{Class Mark} = \dfrac{\text{Upper limit} + \text{Lower limit}}{2}$
(vi) Class Size: Class size is the difference between two successive class marks. It is also the difference between the upper and lower limits of a class in an exclusive distribution, or between its true upper and true lower limits in an inclusive one.
(vii) Frequency: The number of observations that fall in a particular class is called its frequency, or class frequency.
(viii) Cumulative Frequency: The cumulative frequency of a class is obtained by successively adding the frequencies of all the classes up to and including that class, i.e. it is the sum of all frequencies up to that class.
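The definitions above can be checked numerically on the marks data. In this Python sketch the exclusive classes 0 to 20, 20 to 40, ..., 80 to 100 are an assumption chosen for illustration (they are not the table above); a value equal to a common limit is counted in the higher class:

```python
from itertools import accumulate

marks = [75, 96, 25, 32, 89, 62, 40, 79, 35, 55]

# Range: maximum observation minus minimum observation
data_range = max(marks) - min(marks)

# Illustrative exclusive classes (assumed for this sketch): lower <= x < upper
classes = [(0, 20), (20, 40), (40, 60), (60, 80), (80, 100)]

# Class mark = (upper limit + lower limit) / 2
class_marks = [(lower + upper) / 2 for lower, upper in classes]

# Class size = difference between two successive class marks
class_size = class_marks[1] - class_marks[0]

# Frequency: number of observations in each class (40 goes to 40 to 60)
frequencies = [sum(1 for m in marks if lower <= m < upper) for lower, upper in classes]

# Cumulative frequency: running total of frequencies up to each class
cumulative = list(accumulate(frequencies))

print(data_range, class_marks, class_size, frequencies, cumulative)
```

Because the upper limit is excluded, a mark of exactly 100 would not be counted in any of these classes; a real table would need a last class that includes it.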
Inclusive and Exclusive distributions:
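Inclusive Distribution: When the upper limit of a class does not coincide with the lower limit of the next class, the distribution is called an inclusive distribution, e.g. the grouped marks table above, with classes 0 to 20, 21 to 40, 41 to 60 and so on, or a table of the number of students grouped by height (in cm) in such classes.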
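Exclusive Distribution: An exclusive distribution is one in which the upper limit of each class coincides with the lower limit of the next class, e.g. a table of the number of students with classes 10 to 20, 20 to 30, 30 to 40 and so on. An observation equal to a common limit, such as 20, is counted in the higher class (here 20 to 30).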
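True Class Limit: In the case of exclusive classes, the upper and lower limits of a class are themselves its true upper and true lower limits.
In the case of inclusive classes, the true lower and upper limits are obtained by subtracting half the gap between consecutive classes from the lower limit and adding it to the upper limit. For whole-number limits such as 21 to 40 and 41 to 60 the gap is 1, so 0.5 is subtracted and added, giving 20.5 to 40.5.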
True upper limits and true lower limits are also known as boundaries of the class.
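The conversion from inclusive classes to true limits can be sketched in Python. This assumes at least two classes with equal gaps between them; the classes from 21 to 40 onwards of the marks table are used, and the function name `true_limits` is illustrative:

```python
# Inclusive classes from the marks table: (lower limit, upper limit)
inclusive_classes = [(21, 40), (41, 60), (61, 80), (81, 100)]

def true_limits(classes):
    """Convert inclusive classes into class boundaries (true limits).

    Half the gap between the upper limit of one class and the lower limit
    of the next is subtracted from each lower limit and added to each upper
    limit. With whole-number limits the gap is 1, so the adjustment is 0.5.
    """
    gap = classes[1][0] - classes[0][1]   # e.g. 41 - 40 = 1
    half_gap = gap / 2
    return [(lower - half_gap, upper + half_gap) for lower, upper in classes]

boundaries = true_limits(inclusive_classes)
print(boundaries)
# [(20.5, 40.5), (40.5, 60.5), (60.5, 80.5), (80.5, 100.5)]
```

After the conversion the upper boundary of each class equals the lower boundary of the next, so the inclusive table has become an exclusive one.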
Tally: The tally method is used to minimise the chance of error in counting. A bar (|), called a tally mark, is put against an item each time it occurs. The fifth occurrence of an item is shown by drawing a diagonal stroke (/) across the first four tallies, so that the marks fall into easily counted groups of five.
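Grouping tallies in fives can be mimicked in plain text. In this Python sketch the string "||||/" is an assumed ASCII stand-in for four bars crossed by a diagonal stroke, and the function name `tally` is illustrative:

```python
def tally(count):
    """Return a plain-text tally for a count.

    Each complete group of five is written as four bars followed by a
    slash standing in for the diagonal stroke across them; groups are
    separated by spaces.
    """
    groups, remainder = divmod(count, 5)
    parts = ["||||/"] * groups
    if remainder:
        parts.append("|" * remainder)
    return " ".join(parts)

for n in (3, 5, 7, 12):
    print(n, tally(n))
```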
FREQUENCY DISTRIBUTION TABLE
The tabular arrangement of data showing the frequency of each item is called a frequency distribution table. It presents raw data in a form from which the information they contain can easily be understood.
Frequency distributions are of two types:
(i) Discrete frequency distribution: In this type of frequency distribution, the first column of the table lists all possible values of the variable from the lowest to the highest, the second column holds the tally marks, and the third column shows the frequency of each value. The data are not divided into groups or classes. This method is suitable when the values in the raw data repeat a great deal and the difference between the greatest and the smallest observations is not very large.
(ii) Continuous or Grouped Frequency Distribution: In this frequency distribution, the data are divided into groups or classes. This method is used when the number of observations is large and the difference between the greatest and the smallest observations is large.
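A discrete frequency distribution can be built with `collections.Counter`. The data below are hypothetical (the number of children in 15 families, not taken from the text), and this sketch shows only the value and frequency columns, leaving out the tally column:

```python
from collections import Counter

# Hypothetical data (not from the text): number of children in 15 families
children = [2, 1, 3, 2, 0, 2, 1, 4, 2, 3, 1, 2, 0, 3, 2]

counts = Counter(children)

# Every value from the lowest to the highest, including any value that
# does not occur (Counter returns 0 for a missing key)
table = [(value, counts[value]) for value in range(min(children), max(children) + 1)]

print("Value  Frequency")
for value, frequency in table:
    print(f"{value:>5}  {frequency:>9}")
```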
PREPARATION OF A FREQUENCY DISTRIBUTION TABLE:
The following steps are taken to prepare a frequency distribution table:
(i) First of all we arrange the data in an array.
(ii) Then draw a table with 3 columns: the first column for the classes, the second for the tally marks and the third for the frequencies.
(iii) In the first column, write the classes so that they cover the lowest and the highest scores.
(iv) In the second column, put a tally mark against the appropriate class for each score.
(v) Count the tally marks and write the frequency of each class in the third column.
(vi) The figures in the first and third columns taken together form the frequency distribution table.
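The steps above can be sketched in Python for the marks data, assuming the inclusive classes of the grouped marks table. The names `tally` and `frequency_table` are illustrative, and "||||/" is a plain-text stand-in for a crossed group of five:

```python
raw_marks = [75, 96, 25, 32, 89, 62, 40, 79, 35, 55]

def tally(count):
    """Plain-text tally: '||||/' stands for a crossed group of five."""
    groups, remainder = divmod(count, 5)
    parts = ["||||/"] * groups + (["|" * remainder] if remainder else [])
    return " ".join(parts)

def frequency_table(data, classes):
    """Build (class, tally, frequency) rows for inclusive classes."""
    array = sorted(data)                      # step (i): arrange in an array
    rows = []
    for lower, upper in classes:              # steps (iii) to (v)
        frequency = sum(1 for x in array if lower <= x <= upper)
        rows.append((f"{lower} to {upper}", tally(frequency), frequency))
    return rows

classes = [(0, 20), (21, 40), (41, 60), (61, 80), (81, 100)]
rows = frequency_table(raw_marks, classes)
for class_label, tally_marks, frequency in rows:
    print(f"{class_label:<10} {tally_marks:<8} {frequency}")
```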
CUMULATIVE FREQUENCY TABLE
A cumulative frequency table is obtained from an ordinary frequency table by successively adding the frequencies. Thus, to form a cumulative frequency table, we add a column of cumulative frequencies to the frequency distribution table. The cumulative frequency of the last class equals the sum of the frequencies of all the classes.
Cumulative frequency series are of two types:
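(i) Less than series: the cumulative frequency of each class gives the number of observations up to its upper limit, obtained by adding the frequencies from the first class onwards.
(ii) More than series: the cumulative frequency of each class gives the number of observations from its lower limit upwards, obtained by adding the frequencies from the last class backwards.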
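Both series can be computed from the frequencies of the grouped marks table. A Python sketch, assuming those inclusive classes and frequencies; the "more than" figure for a class is the total minus the frequencies of all earlier classes:

```python
from itertools import accumulate

# Grouped marks of 10 students (inclusive classes) from the table above
classes = [(0, 20), (21, 40), (41, 60), (61, 80), (81, 100)]
frequencies = [0, 4, 1, 3, 2]
total = sum(frequencies)

# Less than series: running total from the first class onwards,
# i.e. number of students scoring up to each upper limit
less_than = list(accumulate(frequencies))

# More than series: total minus the running total of the earlier classes,
# i.e. number of students scoring at least each lower limit
more_than = [total - (cf - f) for cf, f in zip(less_than, frequencies)]

for (lower, upper), lt, mt in zip(classes, less_than, more_than):
    print(f"up to {upper}: {lt:>2}    {lower} or more: {mt:>2}")
```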
GRAPHICAL REPRESENTATION OF DATA:
Data can be represented graphically. There are various methods of graphically representing a frequency distribution; here we shall study four of them.
The frequency distribution of a discrete variable is best represented by a bar graph. The height of each bar is proportional to the frequency of the corresponding variate-value. In a bar graph the bars must be kept separate to show that the variate-values are distinct. The bars are of equal width and are drawn with equal spacing between them along the x-axis, which shows the values of the variable. The frequencies are shown on the y-axis.
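Without a plotting library, a bar graph can still be drawn as text to show its layout: values along the x-axis, frequencies up the y-axis, and equal-width bars with equal gaps. The frequencies below are hypothetical, and `text_bar_graph` is an illustrative name:

```python
# Hypothetical discrete data (not from the text): value -> frequency
frequency_of = {0: 2, 1: 3, 2: 6, 3: 3, 4: 1}

def text_bar_graph(freqs, bar="###", gap="  "):
    """Draw a vertical bar graph as text.

    Variate values run along the x-axis (bottom row) and frequencies up the
    y-axis. Every bar has the same width and is separated from its neighbour
    by the same gap, so the bars stay distinct.
    """
    values = sorted(freqs)
    blank = " " * len(bar)
    lines = []
    for level in range(max(freqs.values()), 0, -1):   # y-axis: frequency
        cells = [bar if freqs[v] >= level else blank for v in values]
        lines.append(f"{level:>2} |" + gap + gap.join(cells))
    # x-axis labels: one variate value centred under each bar
    labels = gap.join(f"{v:^{len(bar)}}" for v in values)
    lines.append("   +" + "-" * (len(gap) + len(labels)))
    lines.append("    " + gap + labels)
    return "\n".join(lines)

print(text_bar_graph(frequency_of))
```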
A histogram is a graphical representation of a grouped frequency distribution with continuous classes. It consists of a set of rectangles whose heights are proportional to the class frequencies when the class intervals are equal. There is no gap between two successive rectangles. Each rectangle is drawn with its base equal to the class size and its height representing the class frequency.
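The geometry of a histogram follows directly from the class boundaries. This Python sketch assumes the true limits of the marks classes from 21 to 40 onwards (the empty class 0 to 20 is left out) and equal class sizes; `histogram_rectangles` is an illustrative name:

```python
# Classes of the marks table written with their true limits (boundaries),
# so that consecutive classes meet with no gap
boundaries = [(20.5, 40.5), (40.5, 60.5), (60.5, 80.5), (80.5, 100.5)]
frequencies = [4, 1, 3, 2]

def histogram_rectangles(boundaries, frequencies):
    """Return (left edge, width, height) for each rectangle of a histogram.

    The base of each rectangle is its class size and the height is the
    class frequency (valid when all classes have the same size).
    """
    return [(lower, upper - lower, f) for (lower, upper), f in zip(boundaries, frequencies)]

rectangles = histogram_rectangles(boundaries, frequencies)

# Successive rectangles touch: each right edge is the next left edge
for (left, width, _), (next_left, _, _) in zip(rectangles, rectangles[1:]):
    assert left + width == next_left

for left, width, height in rectangles:
    print(f"base {left} to {left + width} (width {width}), height {height}")
```

Using true limits rather than the printed inclusive limits is what removes the gaps between the rectangles.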