A caching story: where does a cache's speed come from?

Adding a cache to an application means adding one more component to maintain, monitor, and keep alive. And yet it’s one of the most cost-effective performance levers there is. How can a simple intermediary between your application and its data source cut response times tenfold, or even a hundredfold? There is nothing magical about a cache’s speed: it follows from a handful of well-identified physical and algorithmic principles.

In the first installment of this series, we laid the foundations of caching applied to the web: what a cache is, where it sits, and why we need one. This second installment asks why a cache is fast. Understanding where the performance gain really comes from is what lets you size your cache properly and anticipate the pitfalls it introduces.

The four sources of a cache’s performance

A cache’s speed rests on the combination of four factors: the nature of the storage, the location of the data, the access algorithms, and compression. Taken together, they explain why a well-designed cache consistently outperforms direct access to the primary data source.

The nature of the storage: memory over disk

The first factor is where the data is physically stored. Accessing RAM is several orders of magnitude faster than performing an input/output (I/O) operation on a disk. A read from a spinning hard drive takes milliseconds, and even an SSD read typically takes tens of microseconds or more. A memory access, by contrast, takes on the order of a hundred nanoseconds. This is precisely why caches favor in-memory storage.

Redis and Memcached are the best-known examples of in-memory stores: servers that keep all (or most) of their data in RAM. Redis adds a full feature set on top: varied data structures, automatic expiration, replication, and optional persistence to disk. Memcached deliberately sticks to a simpler key-value model with expiration.

But you don’t always need a dedicated server. In very specific cases, or simply to experiment, you can manage your own in-memory data structures directly inside your program. A plain hash map (a HashMap in Java, a Map in JavaScript) allocated on the heap already lives in RAM and offers near-instant access:

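// A minimalist application cache, on the heap in memory
const cache = new Map<string, unknown>();

// Hypothetical stand-in for a real database query (disk/network I/O)
function fetchUserFromDatabase(id: string): unknown {
  return { id };
}

function getUser(id: string): unknown {
  if (cache.has(id)) {
    return cache.get(id); // memory access: near-instant
  }

  const user = fetchUserFromDatabase(id); // disk/network I/O: expensive
  cache.set(id, user); // later calls for this id are served from memory
  return user;
}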

This kind of local cache is unbeatable on latency, but it has its limits: it lives and dies with the process, isn’t shared across multiple instances, and consumes your application’s memory. That’s where a dedicated store like Redis comes into its own at scale.

The location of the data: the closer, the faster

The second factor is distance. The closer a piece of data is to whoever consumes it, the faster its access. It’s a physical constraint: information doesn’t travel faster than light, and every kilometer of network adds latency.

That’s the whole purpose of CDNs (Content Delivery Networks). A CDN is a set of servers replicated all over the world, tasked with distributing data as close as possible to users. When a visitor in Asia requests a resource originally hosted in Europe, the CDN serves it from a local point of presence. Latency is never zero, but it’s drastically reduced compared to a transcontinental round trip.
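To put numbers on this, here is a back-of-the-envelope sketch. It assumes light travels through optical fiber at roughly 200,000 km/s, about two thirds of its speed in a vacuum. It also uses two hypothetical distances: a 10,000 km path for the transcontinental trip and 100 km to a nearby point of presence. The result is only the physical lower bound on a round trip. Real latency adds routing detours, queuing, and protocol handshakes on top.

```typescript
// Approximate speed of light in optical fiber, in km per second
// (an assumption: roughly two thirds of its speed in a vacuum).
const FIBER_SPEED_KM_PER_S = 200_000;

// Lower bound on a network round trip, in milliseconds, for a given one-way
// distance. Ignores routing detours, queuing, and handshakes.
function minRoundTripMs(distanceKm: number): number {
  // There and back (2 × distance), converted from seconds to milliseconds.
  return (2 * distanceKm * 1000) / FIBER_SPEED_KM_PER_S;
}

// Hypothetical distances, for illustration only.
const transcontinental = minRoundTripMs(10_000); // Asia ↔ origin in Europe
const nearbyPop = minRoundTripMs(100); // regional CDN point of presence

console.log(`Origin round trip: at least ${transcontinental} ms`); // 100 ms
console.log(`CDN round trip: at least ${nearbyPop} ms`); // 1 ms
```

A page that needs several sequential round trips multiplies that gap. Ten round trips cost at least a full second against the distant origin, but only around 10 ms against the nearby point of presence.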

And there’s something even faster than the CDN: the local environment. The highest-performing caching is the kind done directly on the user’s machine. No network round trip is involved at all: the only remaining latency is the time it takes to read from the machine’s own memory or disk. This is exactly what your browser does: Chrome, for instance, maintains a local web cache to avoid re-downloading resources it already knows. We can summarize this hierarchy as follows:

Cache location	Typical latency
Local cache (browser, application memory)	lowest
CDN (regional point of presence)	low to moderate
Origin server / database	highest

Designing a caching strategy is largely about deciding at which level of this hierarchy to place each piece of data.
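One way to picture this hierarchy in code is a read-through lookup. It tries the fastest level first, falls back to slower ones, and copies the value upward on the way back. The sketch below is illustrative only. The `CacheLevel` interface and the in-memory `MapLevel` stand-ins are hypothetical. A real deployment would put a browser cache, a CDN, and a database behind them rather than plain Maps.

```typescript
// Hypothetical interface for one level of the cache hierarchy.
interface CacheLevel {
  name: string;
  get(key: string): string | undefined;
  set(key: string, value: string): void;
}

// In-memory stand-in for a level (a real CDN or browser cache is not a Map).
class MapLevel implements CacheLevel {
  private readonly store = new Map<string, string>();
  constructor(public readonly name: string) {}
  get(key: string): string | undefined {
    return this.store.get(key);
  }
  set(key: string, value: string): void {
    this.store.set(key, value);
  }
}

// Check levels from fastest to slowest. On a hit at a slower level, copy the
// value into every faster level so the next read is served closer to the user.
function tieredGet(
  levels: CacheLevel[],
  key: string,
  loadFromOrigin: (key: string) => string,
): { value: string; servedBy: string } {
  for (let i = 0; i < levels.length; i++) {
    const value = levels[i].get(key);
    if (value !== undefined) {
      for (let j = 0; j < i; j++) levels[j].set(key, value); // backfill faster levels
      return { value, servedBy: levels[i].name };
    }
  }
  // Miss everywhere: go to the origin, then populate every level.
  const value = loadFromOrigin(key);
  for (const level of levels) level.set(key, value);
  return { value, servedBy: "origin" };
}

const localLevel = new MapLevel("local");
const cdnLevel = new MapLevel("cdn");
const loadResource = (key: string) => `content of ${key}`;

console.log(tieredGet([localLevel, cdnLevel], "/logo.svg", loadResource).servedBy); // origin
console.log(tieredGet([localLevel, cdnLevel], "/logo.svg", loadResource).servedBy); // local
```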

The algorithms: indexing, hashing, ranking

A fast cache doesn’t just store in memory: it must also know how to retrieve a piece of data instantly and manage its contents intelligently. That’s the job of the algorithms.

For lookups, caches rely on indexing and hashing. A hash table makes it possible to locate an entry in constant time on average, regardless of how many elements are stored: you compute the key’s fingerprint (its hash) and go directly to the corresponding slot.
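To make the "fingerprint, then slot" idea concrete, here is a deliberately small hash table with separate chaining. It uses the 32-bit FNV-1a hash as the fingerprint function, which is a choice made for this sketch, not something caches are required to use. It also keeps a fixed number of buckets. Production hash tables, including the built-in `Map`, also resize as they grow so that chains stay short. That resizing is what keeps lookups constant-time on average.

```typescript
// 32-bit FNV-1a hash of a string: the key's "fingerprint".
// It hashes UTF-16 code units, which is fine for illustration.
function fnv1a(key: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < key.length; i++) {
    hash ^= key.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // FNV prime, 32-bit multiply
  }
  return hash >>> 0; // force an unsigned 32-bit result
}

// Minimal hash table with separate chaining and a fixed bucket count.
class TinyHashTable<V> {
  private readonly buckets: Array<Array<[string, V]>>;

  constructor(private readonly bucketCount = 64) {
    this.buckets = Array.from({ length: bucketCount }, (): Array<[string, V]> => []);
  }

  // Fingerprint → slot index: no scan over the whole table.
  private slot(key: string): Array<[string, V]> {
    return this.buckets[fnv1a(key) % this.bucketCount];
  }

  set(key: string, value: V): void {
    const bucket = this.slot(key);
    const entry = bucket.find(([k]) => k === key);
    if (entry) entry[1] = value; // overwrite an existing key
    else bucket.push([key, value]);
  }

  get(key: string): V | undefined {
    // Only the few entries sharing this slot are compared.
    return this.slot(key).find(([k]) => k === key)?.[1];
  }
}

const table = new TinyHashTable<number>();
table.set("user:42", 42);
console.log(table.get("user:42")); // 42
console.log(table.get("user:7")); // undefined
```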

For content management, eviction policies keep the cache at a bounded size by continuously tracking usage and removing expired or rarely used entries. Two classic policies serve as reference points:

LRU (Least Recently Used): evict first the entry whose last access is the oldest. The assumption is that data not accessed for a long time is unlikely to be accessed again soon.
LFU (Least Frequently Used): evict first the entry accessed the fewest times, counting the number of accesses rather than how recent they are.

These algorithms favor access to the most useful data while preventing the cache from growing indefinitely. They sit at the heart of invalidation and eviction strategies: a topic in its own right that we’ll explore in detail in the third and final installment of the series.
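Both LRU and LFU fit in a few lines of TypeScript. The sketches below are minimal, single-threaded illustrations with a hypothetical `capacity` parameter. The LRU version relies on the fact that a JavaScript `Map` iterates in insertion order. Re-inserting a key on every access keeps the least recently used entry at the front. The LFU version keeps a hit counter per key and scans for the smallest one on eviction. That scan is O(n); more efficient implementations group keys by frequency to evict in constant time.

```typescript
// LRU: Map iteration order is insertion order, so the first key is the
// least recently used one, as long as every access re-inserts its key.
class LruCache<K, V> {
  private readonly entries = new Map<K, V>();
  constructor(private readonly capacity: number) {}

  get(key: K): V | undefined {
    if (!this.entries.has(key)) return undefined;
    const value = this.entries.get(key) as V;
    this.entries.delete(key); // move the key to the "most recent" end
    this.entries.set(key, value);
    return value;
  }

  set(key: K, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.capacity) {
      const oldest = this.entries.keys().next().value as K;
      this.entries.delete(oldest); // evict the least recently used entry
    }
  }

  keys(): K[] {
    return Array.from(this.entries.keys());
  }
}

// LFU: count accesses per key and evict the key with the fewest hits.
class LfuCache<K, V> {
  private readonly entries = new Map<K, { value: V; hits: number }>();
  constructor(private readonly capacity: number) {}

  get(key: K): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    entry.hits++;
    return entry.value;
  }

  set(key: K, value: V): void {
    const existing = this.entries.get(key);
    if (existing) {
      existing.value = value;
      existing.hits++;
      return;
    }
    if (this.entries.size >= this.capacity) {
      // O(n) scan for the least frequently used key (ties: oldest inserted).
      let victim: K | undefined;
      let fewest = Infinity;
      for (const [k, e] of Array.from(this.entries.entries())) {
        if (e.hits < fewest) {
          fewest = e.hits;
          victim = k;
        }
      }
      if (victim !== undefined) this.entries.delete(victim);
    }
    this.entries.set(key, { value, hits: 1 });
  }

  has(key: K): boolean {
    return this.entries.has(key);
  }
}

const lru = new LruCache<string, number>(2);
lru.set("a", 1);
lru.set("b", 2);
lru.get("a"); // "a" becomes the most recently used
lru.set("c", 3); // evicts "b"
console.log(lru.keys()); // keys "a" and "c"

const lfu = new LfuCache<string, number>(2);
lfu.set("a", 1);
lfu.set("b", 2);
lfu.get("a"); // "a" now has 2 hits, "b" has 1
lfu.set("c", 3); // evicts "b"
console.log(lfu.has("b")); // false
```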

Compression: transfer and store less

The fourth factor is data compression. Compressing data before caching it (and transferring it) produces three cumulative benefits:

a faster transfer, since there are fewer bytes to move across the network;
reduced storage space, which lets you keep more data in the cache for the same amount of memory;
lower bandwidth consumption, which eases the infrastructure and reduces costs.

The cost of compression (a bit of CPU on every write and every read) is usually far outweighed by the gains on large payloads or data sent over the network. For very small values, it may not pay off.
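As an illustration, the sketch below stores gzip-compressed values in an in-memory cache. It uses the `gzipSync` and `gunzipSync` functions from Node.js's built-in `node:zlib` module. The `CompressedCache` class is hypothetical. The CPU cost described above shows up directly: every write compresses and every read decompresses.

```typescript
import { gzipSync, gunzipSync } from "node:zlib";

// Hypothetical cache that keeps values gzip-compressed in memory.
class CompressedCache {
  private readonly entries = new Map<string, Buffer>();

  set(key: string, value: string): void {
    // Spend some CPU on write to store fewer bytes.
    this.entries.set(key, gzipSync(Buffer.from(value, "utf8")));
  }

  get(key: string): string | undefined {
    const compressed = this.entries.get(key);
    // Spend some CPU on read to get the original bytes back.
    return compressed ? gunzipSync(compressed).toString("utf8") : undefined;
  }

  storedBytes(key: string): number {
    return this.entries.get(key)?.length ?? 0;
  }
}

// Repetitive payloads (JSON, HTML) tend to compress well.
const payload = JSON.stringify(
  Array.from({ length: 200 }, (_, i) => ({ id: i, status: "active" })),
);

const compressedCache = new CompressedCache();
compressedCache.set("users", payload);

console.log(`original: ${Buffer.byteLength(payload)} bytes`);
console.log(`stored:   ${compressedCache.storedBytes("users")} bytes`);
console.log(compressedCache.get("users") === payload); // true
```

For very small values, the gzip header and trailer alone can make the stored entry larger than the original. This is why it is common to compress only values above some size threshold.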

The flip side of the coin: the single source of truth

Everything above explains why a cache is fast. But performance is never free: a cache also introduces a fundamental issue that has to be considered from the design stage.

In the vast majority of cases, a cache complements a primary data source rather than replacing it. Yet adding a cache means storing data in a new place, and therefore creating a potential second source of truth. The same information inevitably ends up in two places, which puts the single source of truth (SSOT) principle to the test.

The risk is one of consistency. The last thing you want is for some users to fetch, from the cache, data that’s stale or out of sync with the primary source. Data that’s been modified in the database but still sits unchanged in the cache is an inconsistency served to your users.

To keep this risk under control, a cache must therefore offer two capabilities:

invalidate data, that is, remove it from the cache when it’s no longer up to date;
configure expiration durations (TTL), beyond which an entry is automatically considered stale.

These two mechanisms are what separate a useful cache from a dangerous one. Without them, you trade performance for consistency bugs that are often insidious and hard to diagnose.
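A minimal sketch of both capabilities follows. Each entry carries an expiration timestamp computed from a TTL, and `invalidate` removes an entry explicitly, for instance right after the primary source has been updated. The `TtlCache` class and its injectable `now` clock are assumptions made for this example; the clock makes expiration easy to test. Dedicated stores such as Redis provide expiration natively.

```typescript
// Hypothetical cache with per-entry TTL and explicit invalidation.
class TtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();

  // `now` is injectable so expiration can be tested without waiting.
  constructor(private readonly now: () => number = Date.now) {}

  set(key: string, value: V, ttlMs: number): void {
    this.entries.set(key, { value, expiresAt: this.now() + ttlMs });
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // stale: drop it lazily on read
      return undefined;
    }
    return entry.value;
  }

  // Explicit invalidation, e.g. after the primary source changed.
  invalidate(key: string): void {
    this.entries.delete(key);
  }
}

// Simulated clock for the example, in milliseconds.
let clock = 0;
const users = new TtlCache<string>(() => clock);

users.set("user:1", "Alice", 60_000); // valid for 60 s
clock = 30_000;
console.log(users.get("user:1")); // Alice
clock = 60_000;
console.log(users.get("user:1")); // undefined: TTL elapsed

users.set("user:1", "Alice", 60_000);
users.invalidate("user:1"); // e.g. right after updating the database
console.log(users.get("user:1")); // undefined
```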

Conclusion

A cache’s performance is no accident: it comes from memory rather than disk, from proximity rather than distance, from efficient access and eviction algorithms, and from compression. These four levers, combined, explain why a well-designed cache radically transforms an application’s response times.

But this speed has a counterpart: by duplicating data, the cache weakens the single source of truth and introduces a consistency risk into your system. Keeping that risk in check requires a sound invalidation and expiration strategy. That is precisely the subject of the third and final installment of this series, where we’ll dig into the details of cache invalidation and eviction.