Class 28: Advanced SQL & Database Design

In the previous class, we covered the basics of relational databases and fundamental SQL queries. Today, we'll expand our SQL knowledge with more advanced DML operations and, more importantly, dive into the crucial topic of database design, including relationships and normalization.

SQL Data Manipulation Language (DML) - Continued

`INSERT INTO`: Adding New Rows

Used to add new records to a table.

-- Insert a single new book
INSERT INTO books (title, author, year)
VALUES ('The Lord of the Rings', 'J.R.R. Tolkien', 1954);

-- Insert multiple books in one statement
INSERT INTO books (title, author, year) VALUES
('Pride and Prejudice', 'Jane Austen', 1813),
('To the Lighthouse', 'Virginia Woolf', 1927);

`UPDATE`: Modifying Existing Rows

Used to modify existing records in a table. Always use a WHERE clause to specify which rows to update; otherwise, all rows will be updated!

-- Update the year of a specific book
UPDATE books
SET year = 1980
WHERE id = 1;

-- Update multiple columns for a book
UPDATE books
SET title = 'Nineteen Eighty-Four', author = 'George Orwell'
WHERE id = 2;

`DELETE FROM`: Removing Rows

Used to delete existing records from a table. Again, always use a WHERE clause to specify which rows to delete; otherwise, all rows will be deleted!

-- Delete a specific book
DELETE FROM books
WHERE id = 3;

-- Delete all books published before 1900
DELETE FROM books
WHERE year < 1900;

Aggregate Functions

Aggregate functions perform a calculation on a set of rows and return a single value.

COUNT(): Returns the number of rows.

SELECT COUNT(*) FROM books; -- Total number of books
SELECT COUNT(year) FROM books; -- Number of books with a non-NULL year

SUM(): Calculates the sum of a numeric column.
```
SELECT SUM(price) FROM products;
```
AVG(): Calculates the average of a numeric column.
```
SELECT AVG(rating) FROM reviews;
```

MIN(): Returns the minimum value in a column.

SELECT MIN(year) FROM books; -- Oldest publication year

MAX(): Returns the maximum value in a column.

SELECT MAX(year) FROM books; -- Newest publication year

`GROUP BY`: Grouping Rows

Used with aggregate functions to group rows that have the same values in specified columns into summary rows.

-- Count books by author
SELECT author, COUNT(*) AS num_books
FROM books
GROUP BY author;

-- Calculate average price per category
SELECT category, AVG(price) AS average_price
FROM products
GROUP BY category;

`HAVING`: Filtering Groups

Used to filter groups based on a specified condition. It's like WHERE, but for groups created by GROUP BY.

-- Find authors who have written more than 1 book
SELECT author, COUNT(*) AS num_books
FROM books
GROUP BY author
HAVING COUNT(*) > 1;

-- Find categories where the average product price is over $500
SELECT category, AVG(price) AS average_price
FROM products
GROUP BY category
HAVING AVG(price) > 500;

Joining Tables

The power of relational databases lies in their ability to establish relationships between tables. Joins are used to combine rows from two or more tables based on a related column between them.

Consider these two tables:

-- Authors Table
CREATE TABLE authors (
    author_id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT NOT NULL
);

-- Books Table (with a foreign key to authors)
CREATE TABLE books (
    book_id INTEGER PRIMARY KEY AUTOINCREMENT,
    title TEXT NOT NULL,
    publication_year INTEGER,
    author_id INTEGER,
    FOREIGN KEY (author_id) REFERENCES authors(author_id)
);

-- Sample Data
INSERT INTO authors (name) VALUES ('Douglas Adams'), ('George Orwell'), ('Harper Lee');
INSERT INTO books (title, publication_year, author_id) VALUES
('The Hitchhiker''s Guide to the Galaxy', 1979, 1),
('1984', 1949, 2),
('To Kill a Mockingbird', 1960, 3),
('The Restaurant at the End of the Universe', 1980, 1);

`INNER JOIN`:

Returns rows when there is a match in both tables. It's the most common type of join.

-- Get book titles along with their author names
SELECT b.title, a.name AS author_name
FROM books AS b
INNER JOIN authors AS a ON b.author_id = a.author_id;

-- Result:
-- title                                  | author_name
-- ---------------------------------------|-------------
-- The Hitchhiker's Guide to the Galaxy   | Douglas Adams
-- 1984                                   | George Orwell
-- To Kill a Mockingbird                  | Harper Lee
-- The Restaurant at the End of the Universe| Douglas Adams

`LEFT JOIN` (or `LEFT OUTER JOIN`):

Returns all rows from the left table, and the matched rows from the right table. If there's no match, NULLs are returned for the right table's columns.

-- Find all authors and any books they've written (even if they haven't written any)
SELECT a.name AS author_name, b.title
FROM authors AS a
LEFT JOIN books AS b ON a.author_id = b.author_id;

`RIGHT JOIN` (or `RIGHT OUTER JOIN`):

Returns all rows from the right table, and the matched rows from the left table. If there's no match, NULLs are returned for the left table's columns. (Less common, as you can often achieve the same with a LEFT JOIN by swapping tables).

-- Find all books and their authors (even if a book has no author_id or author doesn't exist)
SELECT b.title, a.name AS author_name
FROM authors AS a
RIGHT JOIN books AS b ON a.author_id = b.author_id;

`FULL JOIN` (or `FULL OUTER JOIN`):

Returns all rows when there is a match in one of the tables. If there's no match, NULLs are returned for the columns of the table that doesn't have a match.

-- Get all authors and all books, matching where possible
SELECT a.name AS author_name, b.title
FROM authors AS a
FULL JOIN books AS b ON a.author_id = b.author_id;

Database Modeling and Design

Database design is the process of creating a detailed data model of a database. A well-designed database is crucial for data integrity, performance, and maintainability.

Entities and Attributes:

Entities: The main objects or concepts about which you want to store information (e.g., Users, Products, Orders, Authors, Books). These typically become your tables.
Attributes: The properties or characteristics of an entity (e.g., for a "Book" entity: Title, Author, Publication Year, ISBN). These become your columns.

Relationships:

How entities are connected to each other.

One-to-One (1:1):
Each record in Table A relates to exactly one record in Table B, and vice-versa.

Example: Users to UserProfiles (where a user has exactly one profile, and a profile belongs to exactly one user).
One-to-Many (1:M):
Each record in Table A can relate to one or more records in Table B, but each record in Table B relates to only one record in Table A.

Example: Authors to Books (one author can write many books, but each book is written by one author). This is implemented using a Foreign Key.
Many-to-Many (M:M):
Each record in Table A can relate to one or more records in Table B, and each record in Table B can relate to one or more records in Table A.

Example: Students to Courses (a student can take many courses, and a course can have many students). This requires a third table, often called a junction table or pivot table, to resolve the many-to-many relationship into two one-to-many relationships.

Entity-Relationship Diagrams (ERDs):

ERDs are visual representations of database schemas. They show entities (tables) and the relationships between them, helping in the design process.

Database Normalization

Normalization is a process of organizing the columns and tables of a relational database to minimize data redundancy and improve data integrity. It involves breaking down a large table into smaller, related tables.

Purpose:
- Reduce data redundancy (avoid storing the same data multiple times).
- Improve data integrity (ensure data is consistent and accurate).
- Simplify maintenance (updates, deletions are easier).
Normal Forms:
- 1NF (First Normal Form): Each column must contain atomic (indivisible) values, and there are no repeating groups of columns.
- 2NF (Second Normal Form): Must be in 1NF, and all non-key attributes must be fully dependent on the primary key. (Applies to tables with composite primary keys).
- 3NF (Third Normal Form): Must be in 2NF, and all non-key attributes must not have transitive dependencies on the primary key (i.e., they shouldn't depend on other non-key attributes).
Denormalization:
Sometimes, for performance reasons (e.g., faster reads), you might intentionally introduce some redundancy. This is called denormalization, and it's a trade-off between read performance and write performance/data integrity.

Indexes

An index is a special lookup table that the database search engine can use to speed up data retrieval. Think of it like an index in a book.

How indexes improve query performance: When you query a column that has an index, the database can quickly find the relevant rows without scanning the entire table. This is especially beneficial for large tables and frequently queried columns (like IDs, foreign keys, or columns used in WHERE clauses).
Trade-offs:
- Space: Indexes consume disk space.
- Write Performance: Every time data is inserted, updated, or deleted in the indexed columns, the index also needs to be updated, which adds overhead to write operations.

With this deeper understanding of SQL and database design, you're well-equipped to start connecting your Node.js/Express applications to a real relational database. In the next class, we'll do exactly that, moving beyond in-memory data!

Class 28: Advanced SQL & Database Design

SQL Data Manipulation Language (DML) - Continued

INSERT INTO: Adding New Rows

UPDATE: Modifying Existing Rows

DELETE FROM: Removing Rows