Introduction
The term "statistics" means different things to different people. To some, it is a single number that summarizes a dataset; to others, it refers to numerical measurements or counts. Mathematicians use a statistic to summarize data succinctly in a single value, treating it as a summary of an event. For example, the number of observations, denoted 'n', is a statistic that indicates the size of a dataset. Statistical knowledge also applies to many aspects of daily life, helping people make decisions based on the information available to them. In the behavioral sciences, however, "statistics" plays a more specific role: it is used primarily to draw inferences about populations from the quantitative and qualitative information at hand.
- The term "statistics" can be defined in two ways. In the singular, "statistics" refers to what are commonly called statistical methods; in the plural, it denotes "data."
- In this unit, we use the term "statistics" in the singular sense. So defined, statistics is the branch of science concerned with the collection, classification, analysis, and interpretation of statistical data.
The discipline of statistics can be broadly categorized into two main branches:
(i) Descriptive Statistics, and (ii) Inferential Statistics
- Descriptive Statistics: Most observations vary, particularly those related to human behavior; attributes such as attitude, intelligence, and personality are widely acknowledged to differ from one individual to another. To define a group meaningfully, or to identify it on the basis of its observations or scores, these observations must be expressed accurately. Descriptive statistics is the branch of statistics that describes the data collected, and these descriptions allow particular groups to be defined by their characteristics. Descriptive statistics includes the classification and tabulation of data, its diagrammatic and graphical presentation, and measures of central tendency and variability. These measures help researchers discern patterns in the data or scores and so describe the phenomenon under study. Parameters of a distribution, single values that summarize it (such as its mean or standard deviation), are fundamental to describing that distribution comprehensively.
Essentially, descriptive statistics involves two primary operations:
(i) Organization of data, and (ii) Summarization of data
Try yourself: What does descriptive statistics focus on?
Explanation:
- Descriptive statistics focuses on providing descriptions of acquired data.
- It involves processes such as classification, tabulation, diagrammatic and graphical presentation of data, as well as measures of central tendency and variability.
- The main goal of descriptive statistics is to accurately express observations and define specific population groups based on their characteristics.
- Therefore, descriptive statistics primarily focuses on organizing and summarizing data so that it can be described accurately.
Organisation of Data
There are four primary statistical methods for organizing data:
1. Classification
- Classification involves arranging data into groups based on similarities. The result is typically a frequency distribution, which summarizes how often each individual score, or each range of scores, occurs for a variable. In its simplest form, a frequency distribution lists each value of a variable alongside the number of times it occurs.
- Once data are collected, organizing them facilitates drawing conclusions and making informed decisions. A clearer understanding of data emerges when raw data are organized into a frequency distribution, which illustrates the number of cases falling within specific class intervals or score ranges.
Frequency Distribution with Ungrouped Data and Grouped Data:
- Ungrouped Frequency Distribution: Ungrouped data can be represented by listing all score values and tallying the occurrences of each score.
- Grouped Frequency Distribution: When there is a wide range of score values, making it challenging to visualize the data clearly, a grouped frequency distribution is constructed. This method organizes data into classes, showing the number of observations that fall within each class interval.
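Tallying an ungrouped frequency distribution can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the scores are made up, `ungrouped_frequency_distribution` is a hypothetical helper built on the standard library's `collections.Counter`, and listing scores from highest to lowest is simply the order chosen here.

```python
from collections import Counter


def ungrouped_frequency_distribution(scores):
    """Tally each distinct score; return (score, frequency) pairs, highest score first."""
    counts = Counter(scores)  # maps each score value to the number of times it occurs
    return sorted(counts.items(), reverse=True)


# Hypothetical scores for ten individuals
scores = [12, 15, 12, 18, 15, 12, 20, 18, 15, 12]
for score, freq in ungrouped_frequency_distribution(scores):
    print(f"{score:>3}  {freq}")
```

The frequencies always add up to the number of scores, which is a quick check that no score was missed in the tally.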
Construction of Frequency Distribution:
- To prepare a frequency distribution, several factors must be determined:
- Range of the given data, calculated as the difference between the highest and lowest scores.
- Number of class intervals, typically ranging between 5 and 30.
- Size (width) of each class interval, denoted by 'i'. All class intervals should have the same width, and the width should be a convenient number such as 2, 3, 5, 10, or 20.
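The construction steps above can be sketched in Python for whole-number scores. This is a minimal sketch, not a prescribed procedure: `grouped_frequency_distribution` is a hypothetical helper, the scores are made up, the width `i` is chosen by hand, classes use inclusive limits (for example 20-24, 25-29), and starting the lowest class at a multiple of `i` is a common convenience rather than a rule.

```python
def grouped_frequency_distribution(scores, width):
    """Group whole-number scores into inclusive class intervals of equal width.

    Returns the range (highest - lowest) and a list of
    (lower limit, upper limit, frequency) tuples in ascending order.
    """
    low, high = min(scores), max(scores)
    data_range = high - low
    # Assumption: start the lowest class at a multiple of the width.
    lower = (low // width) * width
    classes = []
    while lower <= high:
        upper = lower + width - 1  # inclusive upper limit, e.g. 20-24 when width = 5
        freq = sum(lower <= s <= upper for s in scores)
        classes.append((lower, upper, freq))
        lower += width
    return data_range, classes


# Hypothetical test scores
scores = [23, 45, 31, 38, 27, 49, 35, 41, 30, 22, 44, 36]
data_range, classes = grouped_frequency_distribution(scores, width=5)
print("Range:", data_range)  # Range: 27
for lower, upper, freq in reversed(classes):  # highest class first
    print(f"{lower}-{upper}: {freq}")
```

With a range of 27 and i = 5, six class intervals are produced, which falls within the suggested 5 to 30.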
Methods for Describing Class Limits:
Three methods for describing class limits are:
- Exclusive Method: The upper limit of one class is the lower limit of the next (e.g., 10-15, 15-20), and the upper limit is excluded from its class, so a score of 15 is counted in the 15-20 class.
- Inclusive Method: Class limits do not overlap (e.g., 10-14, 15-19), and a score equal to the upper limit of a class is included in that class. This method is preferred when measurements are whole numbers.
- True or Actual Class Method: Class limits are defined mathematically on a continuum, extending 0.5 units below the lower limit and 0.5 units above the upper limit of each class (e.g., the class 10-14 has limits of 9.5 and 14.5). These are called the true or actual class limits.
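The three ways of stating class limits can be compared with a few small Python helpers. These are hypothetical functions written for illustration; the `unit` parameter is an assumption that generalizes the 0.5 rule to data recorded in other units (the true limits lie half a unit of measurement beyond the stated limits).

```python
def exclusive_limits(lower, upper, unit=1):
    """Exclusive form of an inclusive class: its upper limit is the next class's lower limit."""
    return lower, upper + unit


def in_exclusive_class(score, lower, upper):
    """Under the exclusive method a score equal to the upper limit belongs to the next class."""
    return lower <= score < upper


def true_limits(lower, upper, unit=1):
    """True (actual) limits: half a unit of measurement below and above the stated limits."""
    half = unit / 2
    return lower - half, upper + half


print(exclusive_limits(10, 14))        # (10, 15)
print(in_exclusive_class(15, 10, 15))  # False: 15 is counted in the 15-20 class
print(true_limits(10, 14))             # (9.5, 14.5)
```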
Types of Frequency Distributions
Frequency distributions can be organized in several ways, depending on the requirements of the analysis or study. Three common forms are described below:
- Relative Frequency Distribution: A relative frequency distribution represents the proportion of the total number of cases observed at each score value or within score value intervals.
- Cumulative Frequency Distribution: In some cases, investigators may want to determine the number of observations less than a specific value. This can be achieved by calculating the cumulative frequency, which sums up the frequencies for a particular class interval and all preceding intervals.
- Cumulative Relative Frequency Distribution: A cumulative relative frequency distribution expresses the cumulative frequency of any score or class interval as a proportion of the total number of cases.
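The three derived distributions can be computed from ordinary class frequencies, as in the Python sketch below. The frequencies are made up, `frequency_table` is a hypothetical helper, and the running totals come from the standard library's `itertools.accumulate`. Classes must be listed from the lowest scores to the highest so that each cumulative frequency counts a class and all classes below it.

```python
from itertools import accumulate


def frequency_table(classes):
    """Add relative, cumulative, and cumulative relative frequencies.

    classes: list of (label, frequency) pairs, ordered from lowest to highest scores.
    Returns (label, f, relative f, cumulative f, cumulative relative f) rows.
    """
    n = sum(f for _, f in classes)
    cumulative = accumulate(f for _, f in classes)  # running totals of f
    return [
        (label, f, f / n, cf, cf / n)
        for (label, f), cf in zip(classes, cumulative)
    ]


# Hypothetical grouped frequencies (N = 12)
classes = [("20-24", 2), ("25-29", 1), ("30-34", 2),
           ("35-39", 3), ("40-44", 2), ("45-49", 2)]
print("Class   f   rf     cf  crf")
for label, f, rf, cf, crf in frequency_table(classes):
    print(f"{label}  {f:>2}  {rf:.3f}  {cf:>2}  {crf:.3f}")
```

The last cumulative frequency equals N, and the last cumulative relative frequency equals 1.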
Try yourself: How is data organized in a grouped frequency distribution?
Explanation:
- To organize data in a grouped frequency distribution, the range of the given data and the number of class intervals need to be determined.
- The range is calculated as the difference between the highest and lowest scores.
- The number of class intervals typically ranges between 5 and 30.
- Class intervals should be of uniform width and divisible by convenient numbers like 2, 3, 5, 10, or 20.
- By determining the range and number of class intervals, data can be grouped effectively for a clearer visualization and analysis.
2. Tabulation
Data can be presented in the form of a table or a graph, with tabulation being the process of organizing classified data into a table. Tabular presentation enhances the comprehensibility of data and makes it suitable for further statistical analysis. A table consists of several components:
- Table Number: When multiple tables are included in an analysis, each should be assigned a unique number for reference and identification. The number is typically centered at the top of the table.
- Title of the Table: Every table should have a clear and concise title that describes its content. The title is placed centrally at the top of the table or just below/after the table number.
- Caption: Captions are concise headings for the columns and may include sub-headings. They are centred above their columns and identify the data categories, such as gender, location, or socioeconomic status.
- Stub: Stubs are brief headings for rows, providing context for the data presented in each row.
- Body of the Table: The main section of the table contains the numerical data arranged according to the captions and stubs.
- Head Note: This note, positioned at the extreme right below the title, explains the units of measurement used in the table.
- Footnote: Footnotes are qualifying statements placed below the table, providing additional information or clarifications not covered in the title, caption, or stubs.
- Source of Data: It is important to mention the source of the data used in the table, typically placed at the end of the table to provide credibility and transparency.
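The parts of a table can be seen together in a plain-text layout generated by Python. This is only an illustration of where each component sits: `render_table` is a hypothetical helper, the column width of 12 characters is arbitrary, and the figures are invented, not drawn from any real survey.

```python
def render_table(number, title, head_note, stub_heading, captions, rows,
                 footnote, source, col=12):
    """Lay out a plain-text table with the components listed above.

    rows: list of (stub, values) pairs; values line up under the captions.
    """
    width = col * (len(captions) + 1)
    rule = "-" * width
    lines = [
        f"Table {number}".center(width),  # table number, centred at the top
        title.center(width),              # title, centred below the number
        head_note.rjust(width),           # head note: units, at the right below the title
        rule,
        stub_heading.ljust(col) + "".join(c.center(col) for c in captions),  # captions
        rule,
    ]
    for stub, values in rows:  # each row: stub followed by the body values
        lines.append(stub.ljust(col) + "".join(str(v).center(col) for v in values))
    lines += [rule, f"Note: {footnote}", f"Source: {source}"]  # footnote and source
    return "\n".join(lines)


print(render_table(
    number=1,
    title="Students by gender and location",
    head_note="(Number of students)",
    stub_heading="Gender",
    captions=["Urban", "Rural"],
    rows=[("Male", [40, 35]), ("Female", [38, 37])],
    footnote="Figures are hypothetical.",
    source="Illustrative data, not a real survey.",
))
```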
3. Graphical Representation of Data
- A frequency distribution provides a structured way of interpreting data. To make that interpretation easier still, the information in a frequency distribution is often presented in graphical or diagrammatic form. Graphical presentation plots the frequencies on a graph, a framework formed by horizontal and vertical lines.
- A graph is constructed on two perpendicular lines, the X- and Y-axes, each marked with an appropriate scale. The horizontal axis, called the abscissa, represents the variable (scores or class intervals), while the vertical axis, the ordinate, represents the corresponding frequencies.
Various types of graphs are used to convey statistical information effectively, including histograms, frequency polygons, frequency curves, and cumulative frequency curves.
- Histogram: This method is widely used for illustrating continuous frequency distributions graphically. A histogram consists of a series of rectangles, one per class interval, whose width represents the class interval and whose height represents the corresponding frequency. Because the upper limit of each class interval serves as the lower limit of the next, the rectangles stand side by side with no gaps between them.
- Frequency Polygon: To construct a frequency polygon, draw the abscissa from point 'O' to point 'X' and the ordinate from 'O' to 'Y'. Mark the class intervals on the abscissa, indicating either their exact limits or their midpoints. For each class interval, plot a point above its midpoint at a height equal to its frequency, then join consecutive points with straight lines to form the polygon. The polygon is usually closed by extending it to the baseline at the midpoints of the empty class intervals just below the lowest class and just above the highest class.
- Frequency Curve: A frequency curve is a smooth, freehand curve drawn through the points of a frequency polygon. Its purpose is to reduce random or erratic fluctuations present in the data, providing a clearer representation of the distribution.
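The coordinates behind a histogram and a frequency polygon can be computed before anything is drawn. The Python sketch below assumes whole-number scores grouped into inclusive classes of equal width; `histogram_bars` and `frequency_polygon` are hypothetical helpers, the classes are made up, and closing the polygon with zero-frequency points one class width beyond each end is a common convention. The resulting points could be passed to any plotting tool.

```python
def histogram_bars(classes):
    """(left edge, right edge, height) for each bar, using true limits (0.5 below/above)."""
    return [(lo - 0.5, hi + 0.5, f) for lo, hi, f in classes]


def frequency_polygon(classes):
    """(midpoint, frequency) points, closed with zero-frequency points at both ends."""
    width = classes[0][1] - classes[0][0] + 1  # class width i for inclusive whole-number classes
    points = [((lo + hi) / 2, f) for lo, hi, f in classes]
    first_mid, last_mid = points[0][0], points[-1][0]
    return [(first_mid - width, 0)] + points + [(last_mid + width, 0)]


# Hypothetical inclusive classes: (lower limit, upper limit, frequency)
classes = [(20, 24, 2), (25, 29, 1), (30, 34, 2), (35, 39, 3), (40, 44, 2), (45, 49, 2)]
print(histogram_bars(classes)[:2])     # [(19.5, 24.5, 2), (24.5, 29.5, 1)]
print(frequency_polygon(classes)[:3])  # [(17.0, 0), (22.0, 2), (27.0, 1)]
```

Each bar's right edge equals the next bar's left edge, which is why histogram rectangles touch.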
Cumulative Frequency Curve or Ogive
The graph representing a cumulative frequency distribution is called a cumulative frequency curve or ogive. There are two types of ogives based on the type of cumulative frequencies:
- 'Less Than' Ogive: In this type, the cumulative frequencies less than each class boundary are plotted against the upper class boundaries. It is an increasing curve sloping upwards from left to right.
- 'More Than' Ogive: Here, the cumulative frequencies greater than each class boundary are plotted against the lower class boundaries. It is a decreasing curve sloping downwards from left to right.
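The points for both ogives follow directly from the class frequencies, as the Python sketch below shows. It assumes whole-number scores in inclusive classes, so the class boundaries are the true limits 0.5 beyond the stated limits; `less_than_ogive` and `more_than_ogive` are hypothetical helpers, the classes are made up, and the `initial` argument of `itertools.accumulate` requires Python 3.8 or later.

```python
from itertools import accumulate


def less_than_ogive(classes):
    """Points (upper boundary, number of cases in that class and all classes below it)."""
    running = accumulate(f for _, _, f in classes)
    return [(hi + 0.5, cf) for (_, hi, _), cf in zip(classes, running)]


def more_than_ogive(classes):
    """Points (lower boundary, number of cases in that class and all classes above it)."""
    n = sum(f for _, _, f in classes)
    below = accumulate((f for _, _, f in classes), initial=0)  # cases below each class
    return [(lo - 0.5, n - b) for (lo, _, _), b in zip(classes, below)]


# Hypothetical inclusive classes: (lower limit, upper limit, frequency)
classes = [(20, 24, 2), (25, 29, 1), (30, 34, 2), (35, 39, 3), (40, 44, 2), (45, 49, 2)]
print(less_than_ogive(classes))  # rises from (24.5, 2) to (49.5, 12)
print(more_than_ogive(classes))  # falls from (19.5, 12) to (44.5, 2)
```

The 'less than' values increase from left to right and the 'more than' values decrease, matching the shapes of the two curves.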
Try yourself: What is the purpose of tabulating data?
Explanation:
- Tabulating data involves organizing classified data into a table, which enhances its comprehensibility.
- Tabulation also makes the data suitable for further statistical analysis.
- Therefore, tabulating data serves both purposes: it makes the data easier to understand and prepares it for further statistical analysis.
4. Diagrammatic Representation of Data
A diagram is a visual tool for presenting statistical data in a simple, easily understood form. Diagrammatic presentation serves only to display the data visually, whereas graphic presentation can also be used for further analysis. Common forms of diagrams include:
- Bar Diagram: This type of diagram is particularly useful for representing categorical data. Each bar corresponds to a category, with the categories displayed along the horizontal axis and the frequency on the vertical axis. Bars are of equal width and separated by equal gaps, and the height of each bar represents the frequency or value for its category.
- Subdivided Bar Diagram: Subdivided bar diagrams are employed for studying sub-classifications within a dataset. Each bar is divided and shaded according to the sub-categories of the data. The proportion of each sub-class is reflected by the portion of the bar it occupies.
- Multiple Bar Diagram: Multiple bar diagrams are used to compare two or more sets of related phenomena or variables. For each category, the bars representing the different sets are drawn side by side without gaps, a space is left between categories, and different colors or shades distinguish the sets.
- Pie Diagram: Also known as an angular diagram, a pie diagram is a circle divided into sectors, one for each component of the distribution. The size of each sector is proportional to the frequency it represents: the full circle of 360 degrees is divided in proportion to each component's share of the total, so a component's angle equals its percentage multiplied by 3.6 degrees. After calculating the angle for each component, draw the sectors in the circle and distinguish them with different colors or shades.
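The angle calculation for a pie diagram amounts to (frequency / total) × 360 degrees for each component. A minimal Python sketch follows; `pie_angles` is a hypothetical helper and the three frequencies are invented for illustration.

```python
def pie_angles(frequencies):
    """Map each component to its sector angle in degrees: (f / total) * 360."""
    total = sum(frequencies.values())
    return {name: f * 360 / total for name, f in frequencies.items()}


# Hypothetical frequencies for three categories
angles = pie_angles({"A": 30, "B": 45, "C": 25})
print(angles)  # {'A': 108.0, 'B': 162.0, 'C': 90.0}
```

Because the frequencies here total 100, each angle is also the category's percentage multiplied by 3.6, and the angles sum to 360 degrees.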
Summary of Data
In the preceding sections, we discussed the tabulation and graphical representation of data. In research, however, tabulating data is often not enough, especially when several series of the same type must be compared to identify trends in variables. Such comparisons require a closer look at the characteristics of the data, which is achieved through summary statistics. Frequency distributions of collected data can differ in their central value and in how widely the values are spread around that central value. Summary statistics therefore have two main components: measures of central tendency and measures of dispersion.
Measures of Central Tendency
Central tendency refers to the middle point of a distribution, where values tend to cluster around a central value. Measures of central tendency aim to capture this tendency accurately. A good measure of central tendency should be clearly defined, easy to comprehend, based on all observations, and resistant to fluctuations in sampling. The three most commonly used measures of central tendency are:
- Arithmetic Mean: This is the average obtained by dividing the sum of all values by the total number of values. It is widely used and useful for further statistical analysis.
- Median: The median is the middle value of a dataset arranged in order, dividing it into two equal halves. Unlike the mean, it is largely unaffected by extreme values.
- Mode: The mode is the value in a distribution with the highest frequency, representing the most typical value.
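The three measures can be computed on a small set of made-up scores with Python's standard `statistics` module (`statistics.median` and `statistics.mode`); the mean is written out as sum divided by count to mirror the definition above. Adding one extreme score illustrates why the median is described as resistant to extreme values.

```python
import statistics

# Hypothetical scores
scores = [12, 15, 12, 18, 15, 12, 20, 18, 15, 12]

mean = sum(scores) / len(scores)    # arithmetic mean: sum of values / number of values
median = statistics.median(scores)  # middle value of the ordered data
mode = statistics.mode(scores)      # most frequent value
print(mean, median, mode)           # 14.9 15.0 12

# Adding one extreme score pulls the mean up but leaves the median unchanged.
with_outlier = scores + [100]
print(round(sum(with_outlier) / len(with_outlier), 2), statistics.median(with_outlier))  # 22.64 15
```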
Measures of Dispersion
Knowing only the central tendency of data is insufficient for a complete understanding. Measures of dispersion quantify the spread or variability of data. The most commonly used measures of dispersion include:
- Range: This is the difference between the largest and smallest values in the distribution.
- Average Deviation: It is the arithmetic mean of the absolute differences between each score and the mean. Signs are ignored because the signed deviations from the mean always sum to zero.
- Standard Deviation: This is the most stable index of variability, calculated as the square root of the variance, which is the mean of the squared deviations from the mean. Standard deviation is less affected by sampling fluctuations compared to other measures of dispersion.
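The three measures of dispersion can be computed directly from their definitions, as in the Python sketch below. `dispersion` is a hypothetical helper and the scores are made up (chosen so the mean is 5). The variance divides by N, matching "the mean of the squared deviations" as defined above.

```python
import math


def dispersion(scores):
    """Return range, average deviation, variance, and standard deviation (dividing by N)."""
    n = len(scores)
    mean = sum(scores) / n
    data_range = max(scores) - min(scores)
    # Average deviation uses absolute deviations; signed deviations always sum to zero.
    avg_dev = sum(abs(x - mean) for x in scores) / n
    variance = sum((x - mean) ** 2 for x in scores) / n  # mean of squared deviations
    return data_range, avg_dev, variance, math.sqrt(variance)


scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores, mean = 5
print(dispersion(scores))  # (7, 1.5, 4.0, 2.0)
```

When a sample's standard deviation is used to estimate that of a population, many texts divide by N - 1 instead of N; Python's `statistics.pstdev` and `statistics.stdev` compute the N and N - 1 versions respectively.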
Try yourself: What type of diagram is particularly useful for representing categorical data?
Explanation:
- A bar diagram is particularly useful for representing categorical data.
- Each bar corresponds to a category, with the categories displayed along the horizontal axis and the frequency on the vertical axis.
- The height of each bar represents the frequency or value of the variable.
- This type of diagram provides a clear visual representation of the distribution of categorical data.