rembrembdocs

Drizzle Relations Fundamentals

In the world of databases, especially relational databases, the concept of relations is absolutely fundamental. Think of “relations” as the connections and links between different pieces of data. Just like in real life, where people have relationships with each other, or objects are related to categories, databases use relations to model how different types of information are connected and work together.

Normalization

Normalization is the process of organizing data in your database to reduce redundancy (duplication) and improve data integrity (accuracy and consistency). Think of it like tidying up a messy filing cabinet. Instead of having all sorts of papers crammed into one folder, you organize them into logical folders and categories to make everything easier to find and manage.

Why is Normalization Important?

Normalization is often described in terms of “normal forms” (1NF, 2NF, 3NF, and beyond). While the details can get quite technical, the core ideas are straightforward:

1NF (First Normal Form): Atomic Values

Goal: Each column should hold a single, indivisible value. No repeating groups of data within a single cell

Example: Instead of having a single address column that stores 123 Main St, City, USA, you’d break it down into separate columns: street_address, city, state, zip_code.

-- Unnormalized (violates 1NF)
CREATE TABLE Customers_Unnormalized (
    customer_id INT PRIMARY KEY,
    name VARCHAR(255),
    address VARCHAR(255) -- Problem: Multiple pieces of info in one column
);

-- Normalized to 1NF
CREATE TABLE Customers_1NF (
    customer_id INT PRIMARY KEY,
    name VARCHAR(255),
    street_address VARCHAR(255),
    city VARCHAR(255),
    state VARCHAR(255),
    zip_code VARCHAR(10)
);

2NF (Second Normal Form): Eliminate Redundant Data Dependent on Part of the Key

Goal: Applies when you have a table with a composite primary key (a primary key made up of two or more columns). 2NF ensures that all non-key attributes are fully dependent on the entire composite primary key, not just part of it.

Imagine we have a table called order_items. This table tracks items within orders, and we use a composite primary key (order_id, product_id) because a single order can have multiple of the same product (though in this simplified example, let’s assume each product appears only once per order for clarity, but the composite key logic still applies).

Expand for visual example

CREATE TABLE OrderItems_Unnormalized (
    order_id INT,
    product_id VARCHAR(10),
    product_name VARCHAR(100),
    product_price DECIMAL(10, 2),
    quantity INT,
    order_date DATE,
    PRIMARY KEY (order_id, product_id) -- Composite Primary Key
);

INSERT INTO OrderItems_Unnormalized (order_id, product_id, product_name, product_price, quantity, order_date) VALUES
(101, 'A123', 'Laptop', 1200.00, 1, '2023-10-27'),
(101, 'B456', 'Mouse', 25.00, 2, '2023-10-27'),
(102, 'A123', 'Laptop', 1200.00, 1, '2023-10-28'),
(103, 'C789', 'Keyboard', 75.00, 1, '2023-10-29');
+------------------------------------------------------------------------------------+
| OrderItems_Unnormalized                                                            |
+------------------------------------------------------------------------------------+
| PK (order_id, product_id) | product_name | product_price | quantity | order_date   |
+------------------------------------------------------------------------------------+
| 101, A123               | Laptop       | 1200.00       | 1        | 2023-10-27     |
| 101, B456               | Mouse        | 25.00         | 2        | 2023-10-27     |
| 102, A123               | Laptop       | 1200.00       | 1        | 2023-10-28     |
| 103, C789               | Keyboard     | 75.00         | 1        | 2023-10-29     |
+------------------------------------------------------------------------------------+

Problem: Notice that product_name and product_price are repeated whenever the same product_id appears in different orders. These attributes are only dependent on product_id, which is part of the composite primary key (order_id, product_id), but not the entire key. This is a partial dependency.

To achieve 2NF, we need to remove the partially dependent attributes (product_name, product_price) and place them in a separate table where they are fully dependent on the primary key of that new table.

Normalization to 2NF: Visual explanation

+-------------------+     1:M     +---------------------------+
| Products          | <---------- | OrderItems_2NF            |
+-------------------+             +---------------------------+
| PK product_id     |             | PK (order_id, product_id) |
| product_name      |             | quantity                  |
| product_price     |             | order_date                |
+-------------------+             | FK product_id             |
                                  +---------------------------+
CREATE TABLE Products (
    product_id VARCHAR(10) PRIMARY KEY,
    product_name VARCHAR(100),
    product_price DECIMAL(10, 2)
);

CREATE TABLE OrderItems_2NF (
    order_id INT,
    product_id VARCHAR(10),
    quantity INT,
    order_date DATE,
    PRIMARY KEY (order_id, product_id), -- Composite Primary Key remains
    FOREIGN KEY (product_id) REFERENCES Products(product_id) -- Foreign Key to Products
);

-- Insert data into Products
INSERT INTO Products (product_id, product_name, product_price) VALUES
('A123', 'Laptop', 1200.00),
('B456', 'Mouse', 25.00),
('C789', 'Keyboard', 75.00);

-- Insert data into OrderItems_2NF (referencing Products)
INSERT INTO OrderItems_2NF (order_id, product_id, quantity, order_date) VALUES
(101, 'A123', 1, '2023-10-27'),
(101, 'B456', 2, '2023-10-27'),
(102, 'A123', 1, '2023-10-28'),
(103, 'C789', 1, '2023-10-29');

3NF (Third Normal Form): Eliminate Redundant Data Dependent on Non-Key Attributes

Goal: Remove data that is dependent on other non-key attributes. This is about eliminating transitive dependencies.

Problem: Let’s say we have a suppliers table. We store supplier information, including their zip_code, city, and state. supplier_id is the primary key.

CREATE TABLE suppliers (
    supplier_id VARCHAR(10) PRIMARY KEY,
    supplier_name VARCHAR(255),
    zip_code VARCHAR(10),
    city VARCHAR(100),
    state VARCHAR(50)
);

INSERT INTO suppliers (supplier_id, supplier_name, zip_code, city, state) VALUES
('S1', 'Acme Corp', '12345', 'Anytown', 'NY'),
('S2', 'Beta Inc', '67890', 'Otherville', 'CA'),
('S3', 'Gamma Ltd', '12345', 'Anytown', 'NY');
+---------------------------------------------------------------+
| suppliers                                                     |
+---------------------------------------------------------------+
| PK supplier_id | supplier_name | zip_code | city      | state |
+---------------------------------------------------------------+
| S1             | Acme Corp     | 12345    | Anytown    | NY   |
| S2             | Beta Inc      | 67890    | Otherville | CA   |
| S3             | Gamma Ltd     | 12345    | Anytown    | NY   |
+---------------------------------------------------------------+

Solution: To achieve 3NF, we remove the attributes dependent on the non-key attribute (city, state dependent on zip_code) and put them into a separate table keyed by the non-key attribute itself (zip_code).

Normalization to 3NF: Visual explanation

+-------------------+     1:M     +--------------------+
| zip_codes         | <---------- | suppliers          |
+-------------------+             +--------------------+
| PK zip_code       |             | PK supplier_id     |
| city              |             | supplier_name      |
| state             |             | FK zip_code        |
+-------------------+             +--------------------+
CREATE TABLE zip_codes (
    zip_code VARCHAR(10) PRIMARY KEY,
    city VARCHAR(100),
    state VARCHAR(50)
);

CREATE TABLE suppliers (
    supplier_id VARCHAR(10) PRIMARY KEY,
    supplier_name VARCHAR(255),
    zip_code VARCHAR(10), -- Foreign Key to zip_codes
    FOREIGN KEY (zip_code) REFERENCES zip_codes(zip_code)
);

-- Insert data into zip_codes
INSERT INTO zip_codes (zip_code, city, state) VALUES
('12345', 'Anytown', 'NY'),
('67890', 'Otherville', 'CA');

-- Insert data into suppliers (referencing zip_codes)
INSERT INTO suppliers (supplier_id, supplier_name, zip_code) VALUES
('S1', 'Acme Corp', '12345'),
('S2', 'Beta Inc', '67890'),
('S3', 'Gamma Ltd', '12345');

Good to know

There are additional normal forms, such as 4NF, 5NF, 6NF, EKNF, ETNF, and DKNF. We won’t cover these here, but we will create a dedicated set of tutorials for them in our guides and tutorials section.

Database Relationships

One-to-One

In a one-to-one relationship, each record in table A is related to at most one record in table B, and each record in table B is related to at most one record in table A. It’s a very direct, exclusive pairing.

  1. User Profiles and User Account Details: Think of a website. Each user account (in a Users table) might have exactly one user profile (in a UserProfiles table) containing more detailed information.
  2. Employees and Parking Spaces: An Employees table and a ParkingSpaces table. Each employee might be assigned at most one parking space, and each parking space is assigned to at most one employee.
  3. Splitting Tables for Organization: Sometimes, you might split a very wide table into two for better organization or security reasons, maintaining a 1-1 relationship between them.
Table A (One Side)      Table B (One Side)
+---------+             +---------+
| PK (A)  | <---------> | FK (A)  | (Foreign Key referencing Table A)
| ...     |             | ...     |
+---------+             +---------+

One-to-Many

In a one-to-many relationship, one record in table A can be related to many records in table B, but each record in table B is related to at most one record in table A. Think of it as a “parent-child” relationship.

  1. Customers and Orders: One customer can place many orders, but each order belongs to only one customer.
  2. Authors and Books: One author can write many books, but (let’s simplify for now and say) each book is written by one primary author.
  3. Departments and Employees: One department can have many employees, but each employee belongs to only one department.
Table A (One Side)      Table B (Many Side)
+---------+             +---------+
| PK (A)  | ----------> | FK (A)  | (Foreign Key referencing Table A)
| ...     |             | ...     |
+---------+             +---------+
     (One)                  (Many)

Many-to-Many

In a many-to-many relationship, one record in table A can be related to many records in table B, and one record in table B can be related to many records in table A. It’s a more complex, bidirectional relationship.

  1. Students and Courses: One student can enroll in many courses, and one course can have many students enrolled.
  2. Products and Categories: One product can belong to multiple categories (e.g., a “T-shirt” can be in “Clothing” and “Summer Wear” categories), and one category can contain many products.
  3. Authors and Books: A book can be written by multiple authors, and an author can write multiple books.
Table A (Many Side)    Junction Table      Table B (Many Side)
+---------+          +-------------+     +---------+
| PK (A)  | -------->| FK (A)      | <----| FK (B)  |
| ...     |          | FK (B)      |     | ...     |
+---------+          +-------------+     +---------+
     (Many)             (Junction)          (Many)

Many-to-many relationships are not directly implemented with foreign keys between the two main tables. Instead, you need a junction table (also called an associative table or bridging table). This table acts as an intermediary to link records from both tables.

-- Table for Students (Many side)
CREATE TABLE students (
    iid INT PRIMARY KEY,
    name VARCHAR(255)
);

-- Table for Courses (Many side)
CREATE TABLE courses (
    id INT PRIMARY KEY,
    name VARCHAR(255),
    credits INT
);

-- Junction Table: Enrollments (Connects Students and Courses - M-M relationship)
CREATE TABLE enrollments (
    id INT PRIMARY KEY AUTO_INCREMENT, -- Optional, but good practice for junction tables
    student_id INT,
    course_id INT,
    enrollment_date DATE,
    -- Composite Foreign Keys (often part of a composite primary key or unique constraint)
    FOREIGN KEY (student_id) REFERENCES students(id),
    FOREIGN KEY (course_id) REFERENCES courses(id),
    UNIQUE KEY (student_id, course_id) -- Prevent duplicate enrollments for the same student and course
);

Why Foreign Keys?

You might think of foreign key constraints as simply a way to validate data - ensuring that when you enter a value in a foreign key column, that value actually exists in the primary key column of another table. And you’d be partially right! This value checking is the mechanism foreign keys use.

But it’s crucial to understand that this validation is not the end goal, it’s the means to a much larger purpose. Foreign key constraints are fundamentally about:

1. Explicitly Defining and Enforcing Relationships

We’ve discussed relationships like One-to-Many between Customers and Orders. A foreign key is the SQL language’s way of telling the database:

Hey database, I want to enforce a 1-M relationship here. Every value in the customer_id column of the Orders table must correspond to a valid customer_id in the Customers table.

It’s not just a suggestion; it’s a constraint the database actively enforces. The database becomes relationship-aware because of the foreign key.

2. Maintaining Referential Integrity

Example:

Without a foreign key, you could accidentally delete a customer from the Customers 
table while their orders still exist in the Orders table. Suddenly, you have orders that point to 
a customer that no longer exists! A foreign key constraint prevents this data inconsistency.

3. Facilitating Database Design and Understanding

In essence, foreign key constraints are not just about checking values; they are about:

  1. Defining the rules of your data relationships
  2. Actively enforcing those rules at the database level
  3. Guaranteeing data integrity and consistency within those relationships
  4. Making your database more robust, reliable, and understandable

Why NOT Foreign Keys?

While highly beneficial, there are some scenarios where you might reconsider or use Foreign Keys with caution. These are typically edge cases and often involve trade-offs.

1. Performance Overhead in Very High-Write Environments

2. Distributed Database Systems and Cross-Node Foreign Keys:

3. Legacy Systems and Data Integration with Non-Relational Data:

You can also check out some great explanations from the PlanetScale team in their article

Polymorphic Relations

Polymorphic relationships are a more advanced concept that allows a single relationship to point to different types of entities or tables. It’s about creating more flexible and adaptable relationships when you have different kinds of data that share some commonality.

Imagine you have an activities log. An activity could be a comment a like or a share. Each of these activity types has different details. Instead of creating separate tables and relationships for each activity type and the things they relate to, you might use a polymorphic approach.

Common Scenarios & Examples

+---------------------+
| **Comments**        |
+---------------------+
| PK comment_id       |
| commentable_type    | ------>  [Polymorphic Relationship]
| commentable_id      | -------->
| user_id             |
| comment_text        |
| ...                 |
+---------------------+
          ^
          |
+---------------------+    +---------------------+    +---------------------+
| **Articles**        |    | **Products**        |    | **Videos**          |
+---------------------+    +---------------------+    +---------------------+
| PK article_id       |    | PK product_id       |    | PK video_id         |
| ...                 |    | ...                 |    | ...                 |
+---------------------+    +---------------------+    +---------------------+
+----------------------+
| **Notifications**    |
+----------------------+
| PK notification_id  |
| notifiable_type     | ------>  [Polymorphic Relationship]
| notifiable_id       | -------->
| user_id             |
| message             |
| ...                  |
+----------------------+
           ^
           |
+---------------------+    +---------------------+    +-----------------------+
| **Users**           |    | **Orders**          |    | **System Events**     |
+---------------------+    +---------------------+    +-----------------------+
| PK user_id          |    | PK order_id         |    | PK event_id           |
| ...                 |    | ...                 |    | ...                   |
+---------------------+    +---------------------+    +-----------------------+

Polymorphic relationships are more complex and are often handled at the application level or using more advanced database features (depending on the specific database system). Standard SQL doesn’t have direct, built-in support for enforcing polymorphic foreign key constraints in the same way as regular foreign keys.