Normalize data to third normal form (3NF)

Topic 10 – Database & File Concepts (AS Level)

Scope of this lesson

  • 10.1 Creating a database – data types, field sizes, key fields, referential integrity, flat vs relational files, file‑access methods, DBMS categories, MIS overview.
  • 10.2 Entity‑Relationship Diagrams (ERDs) – conceptual, logical, physical levels.
  • 10.3 Normalisation to Third Normal Form (3NF) – UNF → 1NF → 2NF → 3NF, functional dependencies, advantages/disadvantages, when to de‑normalise.
  • 10.4 Data dictionary – required columns and examples.

Cross‑Reference to Cambridge AS Syllabus (9626)

Notes Section Syllabus Bullet(s)
Creating a Database – Data types & field sizes 10.1.1 Data types and field sizes
Creating a Database – Key fields (PK, FK, composite, compound) 10.1.2 Primary, foreign, composite and compound keys
Creating a Database – Referential integrity 10.1.3 Referential integrity
Creating a Database – Flat vs relational files 10.1.4 Flat‑file and relational‑file concepts
Creating a Database – File‑access methods 10.1.5 Indexed‑sequential, direct and random access files
Creating a Database – DBMS categories & MIS 10.1.6 Types of DBMS; role of MIS
ERDs – Conceptual, logical, physical 10.2.1 Levels of ER diagrams
Functional Dependencies 10.3.1 Functional dependency notation and analysis
Normalisation – Definitions & check‑lists 10.3.2 Definitions of UNF, 1NF, 2NF, 3NF
Normalisation – Step‑by‑step procedure 10.3.3 Process of normalising to 3NF
Normalisation – Advantages / disadvantages 10.3.4 Benefits and limits of normalisation; when to de‑normalise
Data Dictionary – Required columns 10.4.1 Data‑dictionary structure and contents

1. Creating a Database

1.1 Data Types & Field Sizes

  • Numeric: INT, SMALLINT, BIGINT, DECIMAL(p,s), FLOAT
  • Character: CHAR(n) (fixed length), VARCHAR(n) (variable length)
  • Date/Time: DATE, TIME, DATETIME, TIMESTAMP
  • Boolean: BIT / BOOLEAN
  • Binary: BLOB, VARBINARY

Choose the smallest type that can safely hold the required data – this minimises storage and improves performance.

1.2 Key Fields

  • Primary Key (PK) – uniquely identifies each record; never NULL.
  • Foreign Key (FK) – references a PK in another table; enforces referential integrity.
  • Composite Key – PK formed from two or more attributes (e.g., (StudentID, CourseCode)).
  • Compound Key – a key that may combine single‑field and multi‑field parts (used rarely in the Cambridge syllabus).

1.3 Referential Integrity

The DBMS must prevent:

  • Insertion of a FK value that does not exist in the referenced PK.
  • Deletion or update of a PK value while related FK rows still exist (unless cascade or set null rules are defined).

1.4 Flat vs. Relational Files

  • Flat file: a single table (often a CSV or text file). May contain repeating groups, leading to redundancy and update anomalies.
  • Relational file: a collection of related tables linked by PK/FK pairs. Eliminates most redundancy and supports integrity checks.

1.5 File‑Access Methods

MethodHow it worksTypical use
Sequential AccessRecords are read/written in order from the start of the file.Batch processing, log files.
Indexed‑Sequential AccessAn index (often a B‑tree) points to the location of each record, allowing fast search while preserving sequential order.Most DBMS tables, file‑based systems.
Direct (Random) AccessRecords can be retrieved directly via a physical address or hash value.High‑performance transaction systems.

1.6 Types of DBMS & MIS Overview

  • File‑based DBMS – stores data in flat or indexed files; limited multi‑user support.
  • Relational DBMS (RDBMS) – uses tables, PK/FK, SQL; supports ACID properties.
  • Object‑oriented DBMS – stores objects; useful for CAD/CAM, multimedia.
  • Distributed DBMS – data spread across multiple sites; provides transparency.

A Management Information System (MIS) extracts, processes and presents data from one or more databases to support decision‑making.


2. Entity‑Relationship Diagrams (ERDs)

  • Conceptual ERD – only entities and relationships; no attributes, PK/FK symbols.
  • Logical ERD – adds attributes, underlines primary keys, marks foreign keys (FK).
  • Physical ERD – includes data types, field sizes, indexes, and any integrity rules.
Logical ERD for a student‑course registration system
Figure 1 – Logical ERD (PKs underlined, FK arrows indicate referential integrity).

3. Functional Dependencies (FDs)

An FD X → Y means that whenever two rows have the same value(s) for the attribute set X, they must also have the same value(s) for every attribute in Y.

FDInterpretation
StudentID → StudentName, DateOfBirthStudent name and DOB are determined solely by the student identifier.
CourseCode → CourseTitle, LecturerIDEach course code uniquely determines its title and the lecturer teaching it.
(StudentID, CourseCode) → Semester, GradeThe combination of student and course determines the semester taken and the grade obtained.
LecturerID → LecturerName, DepartmentLecturer details depend only on the lecturer identifier.

Why FDs matter

  • They reveal partial dependencies (an attribute depends on part of a composite key).
  • They expose transitive dependencies** (non‑key attribute depends on another non‑key attribute).
  • Identifying all FDs is the first step in systematic normalisation.

4. Normalisation to Third Normal Form (3NF)

4.1 Definitions & Check‑lists

Normal FormDefinitionChecklist
UNF (Unnormalised Form) Data stored without any logical structure; may contain repeating groups or multi‑valued fields. Identify repeating groups → move to 1NF.
1NF All attribute values are atomic; each field holds a single, indivisible value; no repeating groups.
  • Every column contains indivisible values.
  • No arrays, lists or nested tables.
2NF Relation is in 1NF and every non‑key attribute is fully functionally dependent on the whole of every candidate key (no partial dependencies).
  • Identify all candidate keys.
  • Remove attributes that depend on only part of a composite key.
3NF Relation is in 2NF and has no transitive dependencies (no non‑key attribute depends on another non‑key attribute).
  • Identify any A → B and B → C where A is a key and C is non‑key.
  • Move B (and its dependents) to a separate table.

4.2 Advantages of Normalising to 3NF

  • Eliminates most update, insert and delete anomalies.
  • Reduces data redundancy → saves storage and improves consistency.
  • Improves data integrity; queries become more predictable.
  • Facilitates easier maintenance and future extensions.

4.3 Disadvantages / When to De‑Normalise

  • More tables mean more JOIN operations, which can affect performance on very large data sets.
  • In data‑warehousing or reporting systems, controlled redundancy may be introduced deliberately for speed.
  • Real‑time systems sometimes keep a denormalised copy for rapid lookup.

4.4 Step‑by‑Step Procedure to Reach 3NF

  1. Verify the relation is in 1NF (all values atomic, no repeating groups).
  2. List **all candidate keys** and **all functional dependencies**.
  3. Identify and remove **partial dependencies** → decompose into 2NF tables.
  4. Examine each 2NF table for **transitive dependencies** → decompose further into 3NF tables.
  5. Assign primary keys to the new tables, add foreign keys to preserve relationships, and enforce referential integrity.

5. Worked Example – Student‑Course Registration System

5.1 Unnormalised Form (UNF)

StudentIDStudentNameCourseCodeCourseTitleLecturerSemesterGrade
1001Alice BrownC101Database SystemsDr. SmithFall 2024A
1001Alice BrownC102Web DevelopmentProf. LeeFall 2024B+
1002Bob CarterC101Database SystemsDr. SmithFall 2024B

5.2 1NF Check

All fields contain single, indivisible values; there are no repeating groups. The table already satisfies 1NF.

5.3 Functional Dependencies (complete set)

FDExplanation
StudentID → StudentNameStudent name is determined by the student identifier.
CourseCode → CourseTitle, LecturerEach course code uniquely defines its title and the lecturer teaching it.
(StudentID, CourseCode) → Semester, GradeThe combination of student and course determines the semester and the grade.

5.4 Decompose to 2NF (remove partial dependencies)

Composite key = (StudentID, CourseCode). The dependencies on StudentID alone and CourseCode alone are partial.

Student (2NF)
StudentID (PK)StudentNameOtherStudentDetails
1001Alice Brown
1002Bob Carter
Course (2NF)
CourseCode (PK)CourseTitleLecturer
C101Database SystemsDr. Smith
C102Web DevelopmentProf. Lee
Registration (2NF)
StudentID (FK)CourseCode (FK)SemesterGrade
1001C101Fall 2024A
1001C102Fall 2024B+
1002C101Fall 2024B

All three tables are now in 2NF – every non‑key attribute depends on the whole primary key of its table.

5.5 Identify Transitive Dependencies (move to 3NF)

Assume the Course table also stores the lecturer’s department, giving:

  • CourseCode → Lecturer
  • Lecturer → Department

Since Department depends on Lecturer (a non‑key attribute), we have a transitive dependency.

5.6 Decompose to 3NF

Lecturer (3NF)
LecturerID (PK)LecturerNameDepartment
L01Dr. SmithComputer Science
L02Prof. LeeInformation Systems
Course (3NF)
CourseCode (PK)CourseTitleLecturerID (FK)OtherCourseDetails
C101Database SystemsL01
C102Web DevelopmentL02

The Student and Registration tables remain unchanged (they already satisfy 3NF). All tables now meet the 3NF criteria: every non‑key attribute is directly dependent on the primary key, and no transitive dependencies exist.


6. Expanded Data Dictionary

A Cambridge‑style data dictionary must contain at least the columns shown below. Additional columns such as Default and Validation Rules are useful for practical design.

Field Name Data Type Size Description Allowed Values / Format Default Validation Rules Key Type
StudentID INT 5 Unique identifier for each student 00001‑99999 Auto‑increment Not NULL, unique PK
StudentName VARCHAR 50 Full name of the student Alphabetic, spaces, hyphens Not NULL
CourseCode CHAR 4 Code assigned to a course Pattern Cnnn (e.g., C101) Not NULL, unique in Course PK (Course) / FK (Registration)
CourseTitle VARCHAR 100 Descriptive title of the course Alphanumeric, spaces Not NULL
LecturerID CHAR 3 Identifier for a lecturer Pattern Lnn (e.g., L01) Not NULL, unique in Lecturer PK (Lecturer) / FK (Course)
Department VARCHAR 50 Academic department of the lecturer Alphabetic Not NULL
Semester VARCHAR 10 Semester in which the course was taken Fall YYYY, Spring YYYY Check against list of valid semesters
Grade CHAR 2 Result achieved for the course A, A‑, B+, B, …, F Must match grade list

7. Summary Checklist – Is the Design in 3NF?

  • ✅ All tables are in 2NF (no partial dependencies).
  • ✅ No transitive dependencies remain – every non‑key attribute depends only on its table’s primary key.
  • ✅ Primary keys are defined for every table; foreign keys correctly reference the related primary keys.
  • ✅ Referential integrity rules (ON UPDATE/DELETE) are documented.
  • ✅ Data dictionary fully describes each field, including allowed values and validation.

Create an account or Login to take a Quiz

46 views
0 improvement suggestions

Log in to suggest improvements to this note.