About Database Normalization and Denormalization

A normalized database minimizes data duplication: each piece of data is stored once, in related tables that together form a single source of truth. This keeps the data consistent and clean, but queries become more complex because they often have to join several tables.

# models.py - normalized version
from django.db import models

class Author(models.Model):
    first_name = models.CharField(max_length=100)
    last_name = models.CharField(max_length=100)

    def __str__(self):
        return f"{self.first_name} {self.last_name}"

class Book(models.Model):
    title = models.CharField(max_length=255)
    isbn = models.CharField(max_length=13, unique=True)
    publication_date = models.DateField()
    # Many-to-many: a book can have several authors, an author several books
    authors = models.ManyToManyField(Author, related_name="books")

    def __str__(self):
        return self.title
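
With this normalized schema, listing books alongside their authors means joining through the many-to-many table. A minimal sketch of what that looks like in the ORM, assuming the models above live in an app importable as myapp (a placeholder name):

# Example queries against the normalized models
from myapp.models import Book  # "myapp" is a placeholder app name

# Filtering on a related field makes Django emit a SQL JOIN
tolkien_books = Book.objects.filter(authors__last_name="Tolkien")

# prefetch_related batches the author lookup into one extra query
# instead of issuing one query per book
for book in Book.objects.prefetch_related("authors"):
    print(book.title, "-", ", ".join(str(a) for a in book.authors.all()))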

A denormalized database adds redundant or aggregated fields so that reads can skip joins entirely. This improves read speed but complicates writes: every update must recalculate or resync the duplicated data, and a missed update leaves the copies inconsistent.

# models.py - denormalized version
from django.db import models

class Book(models.Model):
    title = models.CharField(max_length=255)
    isbn = models.CharField(max_length=13, unique=True)
    publication_date = models.DateField()

    # Denormalized author fields: copies of data the normalized
    # schema kept in separate Author rows
    author_names = models.JSONField(default=list)  # list of author name strings
    primary_author_first_name = models.CharField(max_length=100)
    primary_author_last_name = models.CharField(max_length=100)

    def __str__(self):
        return self.title
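
The price of those fast reads is paid on every write. Below is a minimal sketch of one way to keep the duplicated fields in sync; the set_authors helper and its (first_name, last_name) tuple convention are illustrative, not part of any library:

# Example: resyncing the denormalized fields on write (a sketch)
def set_authors(book, authors):
    """authors is a list of (first_name, last_name) tuples,
    with the primary author first."""
    book.author_names = [f"{first} {last}" for first, last in authors]
    book.primary_author_first_name, book.primary_author_last_name = authors[0]
    book.save(update_fields=[
        "author_names",
        "primary_author_first_name",
        "primary_author_last_name",
    ])

Reads, by contrast, never touch another table:

books = Book.objects.filter(primary_author_last_name="Tolkien")  # no join

Every code path that changes author data has to go through logic like set_authors; a single missed call is exactly how the copies drift apart.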

In short:

  • Normalization focuses on keeping data clean and organized but can slow down queries.
  • Denormalization makes queries faster but adds storage costs and can be tricky to maintain.
