Class 30: NoSQL Databases - MongoDB Basics

In the previous classes, we focused on relational databases and SQL. While powerful and widely used, relational databases are not always the best fit for every type of data or application, especially when an application handles large volumes of unstructured or semi-structured data or needs extreme horizontal scalability.

Today, we introduce NoSQL databases, a diverse category of databases that provide alternatives to the traditional relational model. We'll specifically focus on MongoDB, a popular document-oriented NoSQL database.

Introduction to NoSQL Databases

What is NoSQL?
The term "NoSQL" is usually read as "Not only SQL." It's a broad category of databases that do not use the traditional tabular relational model. Many of them offer flexible schemas, horizontal scalability, and high performance for specific use cases.
Why NoSQL?
- Scalability: Designed for horizontal scaling (distributing data across many servers), making them suitable for big data and high traffic.
- Flexibility: Often schema-less or have flexible schemas, allowing for rapid development and easy evolution of data models.
- Performance: Can offer superior performance for certain types of queries and data access patterns compared to relational databases.
- Variety of Data Models: Different NoSQL databases are optimized for different data structures and access patterns.
CAP Theorem:
The CAP theorem states that a distributed data store cannot simultaneously provide more than two out of three guarantees:
- Consistency (C): All clients see the same data at the same time, regardless of which node they connect to.
- Availability (A): Every request receives a non-error response, though not necessarily one that reflects the most recent write.
- Partition Tolerance (P): The system continues to operate despite arbitrary numbers of messages being dropped (or delayed) by the network between nodes.
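Because network partitions cannot be ruled out in a distributed system, the practical choice is between consistency and availability while a partition lasts. Many NoSQL databases prioritize Availability and Partition Tolerance over strong Consistency, especially in large distributed systems, which leads to "eventual consistency": replicas may briefly disagree but converge once updates have propagated.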
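To make eventual consistency concrete, here is a minimal sketch in plain JavaScript. It is a toy model, not how any real database is implemented: the `ToyCluster` and `Replica` classes, the `replicate()` method, and the `stock:1984` key are all made up for illustration, and replication is triggered by hand instead of running in the background.

```javascript
// Toy model of two replicas with asynchronous replication (not a real database).
class Replica {
  constructor(name) {
    this.name = name;
    this.data = new Map();
  }
  read(key) {
    return this.data.get(key);
  }
}

class ToyCluster {
  constructor() {
    this.primary = new Replica("primary");
    this.secondary = new Replica("secondary");
    this.pending = []; // writes accepted by the primary but not yet replicated
  }
  // The write is acknowledged as soon as the primary has stored it.
  write(key, value) {
    this.primary.data.set(key, value);
    this.pending.push([key, value]);
  }
  // Replication happens later; in this sketch we trigger it manually.
  replicate() {
    for (const [key, value] of this.pending) {
      this.secondary.data.set(key, value);
    }
    this.pending = [];
  }
}

const cluster = new ToyCluster();
cluster.write("stock:1984", 3);

const staleRead = cluster.secondary.read("stock:1984"); // undefined: not replicated yet
cluster.replicate();
const freshRead = cluster.secondary.read("stock:1984"); // 3: the replicas have converged

console.log({ staleRead, freshRead });
```

Between the write and the replication step, a client reading from the secondary sees old data. The system stays available the whole time, which is the trade-off that "eventual consistency" describes.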

Categories of NoSQL Databases

NoSQL databases come in various types, each optimized for different data models:

Document Databases:
Store data in flexible, semi-structured documents (often JSON-like). Ideal for hierarchical data and content management.
Examples: MongoDB, Couchbase.
Key-Value Stores:
The simplest NoSQL databases, storing data as a collection of key-value pairs. Highly performant for simple lookups.
Examples: Redis, Amazon DynamoDB.
Column-Family Stores:
Store data in columns organized into column families. Optimized for large datasets and high write throughput.
Examples: Apache Cassandra, HBase.
Graph Databases:
Store data as nodes and edges, representing entities and their relationships. Excellent for highly connected data like social networks or recommendation engines.
Examples: Neo4j.

Focus on MongoDB (Document Database)

MongoDB is a leading document-oriented NoSQL database. Its Community Server is free to use and source-available under the Server Side Public License (SSPL). It stores data in flexible, JSON-like documents, meaning fields can vary from document to document and the data structure can change over time.

Why MongoDB?
- Schema-less (Flexible Schema): Documents in a collection do not need to have the same set of fields or structure. This allows for rapid iteration and schema evolution.
- Horizontal Scalability: Designed to scale out by distributing data across multiple servers (sharding).
- Developer-friendly: Uses JSON-like documents, which map naturally to JavaScript objects, making it intuitive for JavaScript developers.
- Rich Query Language: Supports a powerful query language with aggregation pipelines for complex data processing.
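To give a feel for what an aggregation pipeline computes, the sketch below mirrors a two-stage pipeline (`$match`, then `$group`) using ordinary JavaScript arrays. The sample documents are illustrative (the "Animal Farm" entry is added here only so that one author has two books), and the plain-JavaScript version is an approximation of the idea, not MongoDB's implementation.

```javascript
// Sample documents, shaped like the ones used in this class (illustrative data).
const books = [
  { title: "1984", author: "George Orwell", year: 1949 },
  { title: "Animal Farm", author: "George Orwell", year: 1945 },
  { title: "To Kill a Mockingbird", author: "Harper Lee", year: 1960 },
  { title: "Moby Dick", author: "Herman Melville", year: 1851 }
];

// Roughly what this mongosh pipeline asks the server to compute:
//   db.books.aggregate([
//     { $match: { year: { $gte: 1900 } } },
//     { $group: { _id: "$author", count: { $sum: 1 } } }
//   ]);

// Stage 1 ($match): keep only documents that satisfy the condition.
const matched = books.filter((book) => book.year >= 1900);

// Stage 2 ($group): one output document per author, with a running count.
const counts = new Map();
for (const book of matched) {
  counts.set(book.author, (counts.get(book.author) || 0) + 1);
}
const grouped = [...counts].map(([author, count]) => ({ _id: author, count }));

console.log(grouped);
```

Each stage receives the output of the previous one, which is why the filter runs before the grouping. In MongoDB the order of the grouped results is not guaranteed unless a `$sort` stage is added.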

MongoDB Core Concepts

Documents:
The basic unit of data in MongoDB. Documents are JSON-like records stored as BSON (Binary JSON), a binary format that adds types such as dates and ObjectId. They can contain fields, arrays, and even other embedded documents.
```
// Example MongoDB Document (for a book)
{
  "_id": ObjectId("60c72b2f9b1d8b001c8e4d7a"), // Unique identifier, automatically generated
  "title": "The Hitchhiker's Guide to the Galaxy",
  "author": "Douglas Adams",
  "year": 1979,
  "genres": ["Science Fiction", "Comedy"],
  "publisher": {
    "name": "Pan Books",
    "location": "London"
  }
}
```
- Flexible Schema: Unlike SQL tables, documents in the same collection can have different fields.
- Embedded documents and arrays: MongoDB allows embedding related data within a single document, reducing the need for joins (common in SQL).
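The difference between joining and embedding can be sketched with plain JavaScript objects. The `publishers` and `bookRows` arrays below stand in for relational tables and are hypothetical; the embedded document reuses the book example above.

```javascript
// Relational style: two "tables" linked by a foreign key.
const publishers = [{ id: 1, name: "Pan Books", location: "London" }];
const bookRows = [
  { id: 10, title: "The Hitchhiker's Guide to the Galaxy", publisher_id: 1 }
];

// Reading the publisher name requires a join-like lookup across both tables.
const row = bookRows[0];
const joinedName = publishers.find((p) => p.id === row.publisher_id).name;

// Document style: the publisher is embedded, so a single read returns everything.
const bookDoc = {
  title: "The Hitchhiker's Guide to the Galaxy",
  publisher: { name: "Pan Books", location: "London" }
};
const embeddedName = bookDoc.publisher.name;

console.log(joinedName === embeddedName); // true
```

Embedding has a cost: if many books embed the same publisher, a change to the publisher's details has to be applied to every one of those documents.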
Collections:
A grouping of documents. Collections are analogous to tables in relational databases. By default they do not enforce a schema, although optional validation rules can be added.
Databases:
A container for collections. A MongoDB server can host multiple databases.
_id field:
Every document in MongoDB has an _id field that is unique within its collection and acts as the primary key. If you don't provide one, MongoDB automatically generates an ObjectId.
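An ObjectId is not purely random: it is a 12-byte value whose first 4 bytes hold its creation time in seconds since the Unix epoch. The sketch below decodes that timestamp from the hex string used in the example document. The helper name `objectIdTimestamp` is made up for this illustration; in mongosh, the built-in `getTimestamp()` method on an ObjectId provides the same information.

```javascript
// Extract the creation time stored in the first 4 bytes of an ObjectId.
function objectIdTimestamp(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("expected a 24-character hex string");
  }
  // The first 8 hex characters are a big-endian count of seconds since the Unix epoch.
  const seconds = parseInt(hex.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

const created = objectIdTimestamp("60c72b2f9b1d8b001c8e4d7a");
console.log(created.toISOString()); // 2021-06-14T10:10:55.000Z
```

Because the timestamp comes first, sorting by _id roughly sorts documents by creation time.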

Setting Up MongoDB Locally

You can install MongoDB Community Server on your machine or use Docker, which is often preferred for development.

Installation (Native): Follow the installation instructions in the official MongoDB documentation.

Using Docker (Recommended for development):

```
# Pull the MongoDB image
docker pull mongo

# Run a MongoDB container
# -d: detached mode
# --name: name your container
# -p 27017:27017: map host port 27017 to container port 27017 (default MongoDB port)
# mongo: image name
docker run --name my-mongo -p 27017:27017 -d mongo
```

Using MongoDB Compass (GUI):

MongoDB Compass is a free, official GUI for MongoDB. It allows you to visually explore your data, run ad-hoc queries, and interact with your database. Download it from the MongoDB website.

Basic MongoDB Operations (CLI/Compass)

You can interact with MongoDB using the mongosh shell (CLI) or MongoDB Compass. Here are some basic operations:

1. Creating Databases and Collections:

In MongoDB, databases and collections are created implicitly when you first store data in them.

```
// In mongosh or the Compass "Shell" tab
// Switch to 'mybookstore'; the database is only created once data is stored in it
use mybookstore
// Insert a document; this implicitly creates the 'books' collection
db.books.insertOne({ title: "Moby Dick", author: "Herman Melville" });
```

2. Inserting Documents:

db.collection.insertOne(): Inserts a single document.

```
db.books.insertOne({
  title: "The Great Gatsby",
  author: "F. Scott Fitzgerald",
  year: 1925,
  genres: ["Classic", "Fiction"]
});
```

db.collection.insertMany(): Inserts multiple documents.

```
db.books.insertMany([
  { title: "To Kill a Mockingbird", author: "Harper Lee", year: 1960 },
  { title: "1984", author: "George Orwell", year: 1949 }
]);
```

3. Finding Documents:

db.collection.find(): Retrieves all documents in a collection or documents matching a query.

```
db.books.find(); // Find all books
db.books.find({ author: "George Orwell" }); // Find books by a specific author
db.books.find({ year: { $gte: 1950 } }); // Find books published in 1950 or later ($gte is "greater than or equal")
```

db.collection.findOne(): Retrieves a single document matching a query.
```
db.books.findOne({ title: "1984" });
```
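A query filter is just a document describing the conditions a match must satisfy. The sketch below is a deliberately simplified matcher in plain JavaScript that handles only top-level equality and four comparison operators. The `matches` and `find` helpers and the `sampleBooks` array are hypothetical and exist only to show the idea; MongoDB's real matching rules (arrays, nested fields, many more operators) are far richer.

```javascript
// Comparison operators supported by this toy matcher (MongoDB has many more).
const comparisons = new Map([
  ["$gt", (value, bound) => value > bound],
  ["$gte", (value, bound) => value >= bound],
  ["$lt", (value, bound) => value < bound],
  ["$lte", (value, bound) => value <= bound]
]);

// Returns true when `doc` satisfies every condition in `filter`.
function matches(doc, filter) {
  return Object.entries(filter).every(([field, condition]) => {
    const value = doc[field];
    if (condition !== null && typeof condition === "object") {
      // Operator form, e.g. { year: { $gte: 1950 } }
      return Object.entries(condition).every(([op, bound]) => {
        const compare = comparisons.get(op);
        if (!compare) throw new Error(`unsupported operator: ${op}`);
        // A missing field never satisfies a comparison.
        return value !== undefined && compare(value, bound);
      });
    }
    // Plain form, e.g. { author: "George Orwell" }
    return value === condition;
  });
}

const sampleBooks = [
  { title: "Moby Dick", author: "Herman Melville" },
  { title: "The Great Gatsby", author: "F. Scott Fitzgerald", year: 1925 },
  { title: "To Kill a Mockingbird", author: "Harper Lee", year: 1960 },
  { title: "1984", author: "George Orwell", year: 1949 }
];

// An empty filter matches every document, like db.books.find().
const find = (filter = {}) => sampleBooks.filter((doc) => matches(doc, filter));

console.log(find({ year: { $gte: 1950 } }).map((doc) => doc.title));
console.log(find({ author: "George Orwell" }).map((doc) => doc.title));
console.log(find().length);
```

Note that "Moby Dick" has no `year` field here, so it is skipped by the `$gte` query instead of causing an error. This is the flexible schema at work.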

4. Updating Documents:

db.collection.updateOne(): Updates a single document matching the filter.

```
db.books.updateOne(
  { title: "Moby Dick" }, // Filter
  { $set: { year: 1851, genres: ["Adventure", "Classic"] } } // Update operator
);
```

db.collection.updateMany(): Updates multiple documents matching the filter.
```
db.books.updateMany(
  { author: "Harper Lee" },
  { $set: { status: "available" } }
);
```
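The key property of `$set` is that it adds or overwrites only the listed fields and leaves the rest of the document untouched. The plain JavaScript sketch below imitates that behavior for the first matching document. `updateOneSet` is a hypothetical helper that supports only equality filters; it is not part of any MongoDB API, although the `matchedCount` field mirrors one that MongoDB's real update result contains.

```javascript
const books = [
  { title: "Moby Dick", author: "Herman Melville" },
  { title: "To Kill a Mockingbird", author: "Harper Lee", year: 1960 }
];

// Toy stand-in for updateOne with a $set update and an equality-only filter.
function updateOneSet(docs, filter, fieldsToSet) {
  // Like updateOne, stop at the first document that matches the filter.
  const target = docs.find((doc) =>
    Object.entries(filter).every(([field, value]) => doc[field] === value)
  );
  if (!target) return { matchedCount: 0 };
  Object.assign(target, fieldsToSet); // add or overwrite only the listed fields
  return { matchedCount: 1 };
}

const result = updateOneSet(
  books,
  { title: "Moby Dick" },
  { year: 1851, genres: ["Adventure", "Classic"] }
);

console.log(result, books[0]);
```

After the call, the first book still has its original `title` and `author`, and has gained `year` and `genres`. A filter that matches nothing leaves every document unchanged and reports a `matchedCount` of 0.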

5. Deleting Documents:

db.collection.deleteOne(): Deletes a single document matching the filter.
```
db.books.deleteOne({ title: "The Great Gatsby" });
```
db.collection.deleteMany(): Deletes multiple documents matching the filter.
```
db.books.deleteMany({ year: { $lt: 1900 } }); // Delete books published before 1900
```

Advantages and Disadvantages of NoSQL vs. SQL

| Feature | SQL (Relational) | NoSQL (Non-Relational) |
| --- | --- | --- |
| Schema | Strict, predefined schema. Changes require migrations. | Flexible, dynamic schema (often described as schema-less). Easy to evolve. |
| Data Model | Tables with rows and columns. Relationships via foreign keys. | Various: Document, Key-Value, Column-Family, Graph. |
| Scalability | Primarily vertical scaling (more powerful server). Horizontal scaling (sharding) is complex. | Primarily horizontal scaling (distribute across many servers). |
| ACID Compliance | Strong ACID (Atomicity, Consistency, Isolation, Durability) guarantees for transactions. | Often BASE (Basically Available, Soft state, Eventual consistency). Transactions are more complex or limited, although some systems (MongoDB since version 4.0, for example) support multi-document ACID transactions. |
| Query Language | SQL (Structured Query Language), powerful for complex joins. | Database-specific query languages and APIs. No standard language. |
| Use Cases | Complex transactions, strong data integrity, structured data (e.g., financial systems, traditional ERP). | Large volumes of unstructured/semi-structured data, rapid development, real-time apps, flexible data models (e.g., social media, IoT, content management). |

When to choose which type of database:

Choose SQL (Relational) when:
- Your data is highly structured and relationships are critical.
- You need strong ACID compliance (e.g., banking, e-commerce transactions).
- You need complex queries involving many joins.
Choose NoSQL (Non-Relational) when:
- You need rapid iteration and a flexible schema.
- You need to handle very large volumes of data and traffic (horizontal scalability).
- Your data is unstructured or semi-structured (e.g., documents, logs, social media posts).
- Your application requires very high read/write throughput for specific data access patterns.

Many modern applications use a combination of both SQL and NoSQL databases (polyglot persistence), choosing the best tool for each specific data storage need.

In the next class, we'll connect our Node.js/Express API to MongoDB using Mongoose, an ODM (Object Data Modeling) library, and perform CRUD operations.