Introduction
- Data management is crucial for organizations, such as schools, to store and handle information electronically.
- Example: A school maintains student data, including attendance records and guardian details.
- Class teachers manually mark daily attendance in a register, recording 'P' for present or 'A' for absent against each student's roll number.
- For a class of 50 students and 26 working days in a month, teachers record 50 × 26 = 1300 entries manually, which becomes tedious as data volume increases.
- Limitations of manual record keeping include:
- Repetitive entry of student details (e.g., roll number, name) when students are promoted to the next class.
- Inconsistencies in writing student details each month, such as incorrect names or skipped records.
- Data loss if the attendance register is lost or damaged.
- Errors in manual consolidation of attendance records.
- Office staff manually maintain student details (roll number, name, date of birth) and guardian details (guardian name, contact number, address) for correspondence.
- Manual systems make it difficult to find, delete, or modify information from large volumes of paper records.
- Computerized systems allow:
- Copying student details to new attendance files when students are promoted.
- Easy retrieval of student or guardian data.
- Adding details for new students.
- Modifying existing student or guardian details.
- Deleting data when a student leaves the school.
File System
- A file is a container for storing data on a computer, saved on storage devices.
- File contents can include text, program code, comma-separated values (CSV), pictures, audio/videos, or web pages.
- Files can be accessed directly and searched for specific data.
- To access file data via software (e.g., displaying a monthly attendance report on a school website), custom programs must be written.
- Example: A school stores student and attendance data in two files:
- STUDENT file(Table 7.1) contains:
- RollNumber: Student's roll number.
- SName: Student's name.
- SDateofBirth: Student's date of birth.
- GName: Guardian's name.
- GPhone: Guardian's phone number.
- GAddress: Guardian's address.
- ATTENDANCE file(Table 7.2) contains:
- AttendanceDate: Date of attendance.
- RollNumber: Student's roll number.
- SName: Student's name.
- AttendanceStatus: 'P' (present) or 'A' (absent).
Limitations of a File System
- File systems become difficult to manage as the number of files and data volume grows.
Specific limitations include
Difficulty in Access:
- Files lack built-in data retrieval mechanisms; application programs are required.
- Developers may not anticipate all access needs, making it hard to retrieve data in desired formats.
- Additional programs must be written for specific data access requirements.
Data Redundancy:
- Redundancy occurs when the same data is duplicated across files.
- Example: Student names appear in both STUDENT and ATTENDANCE files.
- Example: Guardian details (e.g., Himanshu Shah) are repeated for multiple students in the STUDENT file.
- Redundancy wastes storage and may lead to data inconsistency.
Data Inconsistency:
- Inconsistency arises when duplicated data across files do not match.
- Example: If a student's name is updated in one file but not another, the data becomes inconsistent.
- Example: If a student leaves, their details must be deleted from both files; failure to update one file causes inconsistency.
- Inconsistency is likely when different people maintain separate files.
Data Isolation:
- STUDENT and ATTENDANCE files are related but lack a direct link or mapping.
- Separate programs are needed to access each file.
- In complex systems, files created by different people at different times may have varying formats, complicating data retrieval.
- Developers must understand each file's structure to write retrieval programs.
Data Dependence:
- Data is stored in a specific format; changing the format requires updating all related application programs.
- Without updates, programs may fail to access the file correctly.
- This dependency complicates file structure modifications.
Controlled Data Sharing:
- Different users (e.g., teachers, office staff, guardians) require different access levels.
- Example: Guardians and staff should only view attendance data, while teachers can update it.
- File systems struggle to enforce such access controls through application programs.
Database Management System
- A Database Management System (DBMS) is software for creating and managing databases, overcoming file system limitations.
- DBMS allows users to create, store, manage, update, and retrieve data efficiently.
- Examples of DBMS include MySQL, Oracle, PostgreSQL, SQL Server, Microsoft Access, and MongoDB.
- DBMS hides storage and maintenance details, providing an abstract view of data.
- It includes programs for accessing, modifying, and retrieving data.
- DBMS acts as an interface between the database and users or applications.
- Data retrieval is done through querying, using special commands.
- Users can modify the database structure via the DBMS.
- Applications of databases include:
- Banking: Customer info, accounts, loans, transactions.
- Crop Loan: Farmer data, land details, loan history.
- Inventory Management: Products, customers, orders, deliveries.
- Organization Resource Management: Employee records, salaries, departments.
- Online Shopping: Item descriptions, user logins, preferences.
File System to DBMS
Converting the school's STUDENT and ATTENDANCE files into a database requires:
- Removing SName from the ATTENDANCE file, as it's redundant with the STUDENT file; RollNumber links the files.
- Splitting the STUDENT file into STUDENT and GUARDIAN files to avoid redundant guardian data (e.g., for siblings).
- Adding a unique GUID (Guardian ID) column to the GUARDIAN file to distinguish guardians, as names may be duplicated.
- Using GUID in the STUDENT file to link to the GUARDIAN file, since phone numbers may change.
The resulting database, STUDENTATTENDANCE, includes three tables:
- STUDENT: RollNumber, SName, SDateofBirth, GUID.
- GUARDIAN: GUID, GName, GPhone, GAddress.
- ATTENDANCE: AttendanceDate, RollNumber, AttendanceStatus.
DBMS maintains a centralized data repository, accessible by multiple users (e.g., office staff, teachers) simultaneously.
Key Concepts in DBMS
Database Schema:
- The design of a database, representing table names, columns, data types, constraints, and relationships.
- Also called the visual or logical architecture, showing data organization.
Data Constraint:
- Restrictions on data in table columns, defined during table creation.
- Example: Mobile number column restricted to 10-digit non-negative integers.
- Example: RollNumber column set to NOT NULL and UNIQUE to ensure unique student IDs.
- Constraints ensure data accuracy and reliability.
Meta-data or Data Dictionary:
- Database schema and constraints stored in a database catalog, called meta-data (data about data).
Database Instance:
- The state of a database at a given time, initially empty when the schema is defined.
- Data loading creates an instance; queries and manipulations (insert, update, delete) change the instance.
Query:
- A request to retrieve information from one or more tables.
- Example: "Find names of students present on 2000-01-02."
- Written in a query language (e.g., SQL, discussed in Chapter 8).
Data Manipulation:
- Includes Insertion, Deletion, and Update operations.
- Example: Adding a new student's details to STUDENT and GUARDIAN tables (Insertion).
- Example: Removing a student's data from all tables when they leave (Deletion).
- Example: Updating a guardian's phone number in the GUARDIAN table (Update).
Database Engine:
- The underlying component of a DBMS that creates databases and handles queries for data retrieval and manipulation.
Relational Data Model
A data model defines the database structure, data representation, relationships, and constraints.
- The Relational Data Model is the most common, used in Relational DBMS (RDBMS).
- Other models include object-oriented, entity-relationship, document, and hierarchical models.
In the relational model:
- Tables are called relations, storing data in columns.
- Each table has unique column names.
- Each row (tuple) represents a related set of values for an entity.
- Example: Each row in the GUARDIAN table represents a guardian's details (GUID, name, phone, address).
Relations are interconnected, not independent:
- ATTENDANCE's RollNumber links to STUDENT records.
- STUDENT's GUID links to GUARDIAN records.
- Linking attributes ensure database integrity and valid data retrieval.
Key terminologies:
- Relation: A table (e.g., GUARDIAN).
- Tuple: A row in a table, representing a record.
- Attribute: A column in a table, representing a feature.
- Domain: The set of permissible values for an attribute (e.g., 10-digit numbers for GPhone).
- Degree: Number of attributes in a relation (e.g., GUARDIAN has degree 4).
- Cardinality: Number of tuples in a relation (e.g., GUARDIAN has cardinality 5).
Three Important Properties of a Relation
Property 1 (Attributes):
- Each attribute in a relation has a unique name.
- The sequence of attributes is immaterial.
Property 2 (Tuples):
- Each tuple is distinct; no two tuples can have identical values for all attributes.
- Tuples are uniquely identified by their contents.
- The sequence of tuples is immaterial, despite appearing ordered in tables.
Property 3 (State of Relation):
- All values in an attribute must be from the same domain (same data type).
- Values must be atomic (indivisible into meaningful subparts), e.g., GPhone is a single 10-digit number.
- No attribute can have multiple values in one tuple, e.g., GPhone cannot list multiple numbers.
- NULL represents unknown or non-applicable values, e.g., GPhone is NULL if not provided.
Keys in a Relational Database
Tuples must be distinct; at least one attribute must have unique, non-NULL values to identify each tuple.
- Keys impose constraints on attribute values and inter-relation references, defined during database design.
Candidate Key
- Attributes that take distinct values and can uniquely identify tuples.
- Multiple attributes may qualify as candidate keys.
- Example: In the GUARDIAN table, GUID and GPhone are unique, making them candidate keys.
Primary Key
- The candidate key chosen by the designer to uniquely identify tuples.
- Remaining candidate keys are alternate keys.
- Example: If GUID is chosen as the primary key for GUARDIAN, GPhone becomes an alternate key.
Composite Primary Key
- If no single attribute uniquely identifies tuples, multiple attributes are combined as the primary key.
- Example: In the ATTENDANCE table, neither RollNumber nor AttendanceDate alone is unique, but their combination is unique (as a student is marked once per day).
- Thus, {RollNumber, AttendanceDate} forms the composite primary key.
Foreign Key
- An attribute in one relation whose values are derived from the primary key of another relation.
- Links two relations, with the referencing relation called the foreign relation and the referenced relation called the primary or master relation.
- Foreign keys can take NULL values if not part of the foreign table's primary key.
- Example: In the STUDENTATTENDANCE database:
- RollNumber in ATTENDANCE references RollNumber in STUDENT.
- GUID in STUDENT references GUID in GUARDIAN.
- Foreign keys are shown in schema diagrams as arrows from the foreign key to the primary key of the referenced table.