Denormalization - When Breaking the Rules Makes Sense
Remember all those database normalization rules we learned in school? The ones about avoiding redundancy at all costs and keeping our data neat and tidy? Sometimes we need to throw those rules out the window. Enter denormalization: the art of strategically making our database “messier” for the sake of performance.
What is Denormalization? #
Simply put, denormalization is the process of adding redundant data to our database to speed up read operations. Instead of joining multiple tables to get the information we need, we replicate some data in multiple places.
So, instead of enduring the latency of join operations on every query, all the data can be retrieved in a single row, even though in theory it would have been split across multiple tables.
It's like a kitchen where, in theory, every utensil has its perfect spot in a drawer. In reality, the most commonly used ones are better off kept on the counter.
When Should You Denormalize? #
Here are the main scenarios where denormalization makes sense:
- When you have read-heavy workloads where performance is crucial (a rough way to check this is sketched after this list)
- When certain joins are expensive and frequently performed
- When you need to avoid complex queries that touch multiple tables
- When the data rarely changes, making maintenance of redundancy less problematic
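If you're unsure whether a workload is read-heavy, the database itself can give you a hint. As a minimal sketch, assuming PostgreSQL, the statistics collector exposes per-table scan and write counts that you can turn into a rough read/write ratio:
-- Assuming PostgreSQL: approximate reads vs. writes per table.
-- A table with far more reads than writes is a better denormalization candidate.
SELECT relname,
       seq_scan + COALESCE(idx_scan, 0)   AS reads,
       n_tup_ins + n_tup_upd + n_tup_del  AS writes
FROM pg_stat_user_tables
ORDER BY reads DESC;
Other databases expose similar counters; the point is to measure before you denormalize rather than guess.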
A Simple Example #
Let’s say we have an e-commerce application with orders and users tables. In a normalized database, to get order information with user details, we’d need:
SELECT orders.*, users.name, users.email
FROM orders
JOIN users ON orders.user_id = users.id
WHERE orders.id = 123;
With denormalization, we might store the user’s name and email directly in the orders table:
SELECT * FROM orders WHERE id = 123;
Much simpler, right? Of course, now when a user changes their email, we need to update it in multiple places - but if reads vastly outnumber updates in our system, this trade-off might be worth it.
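To make that concrete, here is a minimal sketch of what the denormalized schema and the write path could look like. The column names user_name and user_email (and the example id 42) are illustrative, not part of the original example:
-- Redundant copies of data that also lives in the users table.
ALTER TABLE orders ADD COLUMN user_name  TEXT;
ALTER TABLE orders ADD COLUMN user_email TEXT;

-- The cost shows up on writes: changing an email now touches two tables.
-- Wrapping both updates in one transaction keeps the copies consistent.
BEGIN;
UPDATE users  SET email = 'new@example.com' WHERE id = 42;
UPDATE orders SET user_email = 'new@example.com' WHERE user_id = 42;
COMMIT;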
The Trade-offs #
Like everything in software engineering, denormalization comes with its costs:
- Storage space: You’re storing the same data multiple times
- Write complexity: Updates need to happen in multiple places
- Data consistency risks: If updates fail in one place but succeed in another
- Maintenance overhead: More complex schema to manage and maintain
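One common way to contain the write complexity and the consistency risk is to let the database propagate changes for you with a trigger. A minimal sketch, assuming PostgreSQL and the denormalized orders columns from the example above:
-- Propagate name/email changes from users into the redundant copies in orders.
CREATE OR REPLACE FUNCTION sync_order_user_details() RETURNS trigger AS $$
BEGIN
  UPDATE orders
     SET user_name  = NEW.name,
         user_email = NEW.email
   WHERE user_id = NEW.id;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER users_sync_denormalized
AFTER UPDATE OF name, email ON users
FOR EACH ROW
EXECUTE FUNCTION sync_order_user_details();
This trades some write latency for consistency that application code doesn't have to remember to enforce; application-level updates or periodic backfills are alternatives with different failure modes.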
When to Avoid It #
Don’t rush to denormalize if:
- Your data changes frequently
- Storage space is a concern
- Write performance is more critical than read performance
- Your application can’t tolerate occasional data inconsistencies