The DataLoader pattern: solving N+1 in GraphQL

In the first part of this series, we saw how the N+1 problem arises and why GraphQL is particularly exposed to it: a query that looks innocent on the client side translates, on the server side, into an avalanche of database queries. The question that remains is how to stem this flood. The answer lies in a pattern as simple as it is effective: the DataLoader.

One query instead of N

The intuition is elementary. Instead of running N additional queries, why not run just one?

On the SQL side, the problem is in fact trivial to express. Rather than hammering the database with one query per identifier:

SELECT * FROM reviews WHERE book_id = 1;
SELECT * FROM reviews WHERE book_id = 2;
-- ...
SELECT * FROM reviews WHERE book_id = N;

We can retrieve everything in a single pass with an IN clause:

SELECT * FROM reviews WHERE book_id IN (1, 2, ..., N);

Note: there are other ways to handle this kind of query in SQL (joins, subqueries, window functions), but let’s keep this case simple to illustrate the pattern.

From a SQL standpoint, then, it’s almost effortless: we get exactly the same result in a single query, without the performance penalty that comes from multiplying round trips.
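As a minimal sketch of how application code might build that IN query, the helper below generates one placeholder per identifier rather than concatenating values into the SQL string. The name buildReviewsByBookIdsQuery, the ParameterizedQuery shape and the PostgreSQL-style $1, $2 placeholders are assumptions made for illustration; other drivers expect ? or named parameters.

```typescript
// Hypothetical shape: SQL text plus the values bound to its placeholders.
interface ParameterizedQuery {
  text: string;
  values: number[];
}

// Hypothetical helper, not part of any library.
function buildReviewsByBookIdsQuery(bookIds: number[]): ParameterizedQuery {
  // Drop duplicate ids so each one is bound only once.
  const uniqueIds = Array.from(new Set(bookIds));

  // "IN ()" is a syntax error in most SQL dialects, so return a query
  // that matches no rows when there is nothing to look up.
  if (uniqueIds.length === 0) {
    return { text: "SELECT * FROM reviews WHERE 1 = 0", values: [] };
  }

  // One placeholder per id: $1, $2, ... (PostgreSQL style, an assumption).
  const placeholders = uniqueIds.map((_, index) => `$${index + 1}`).join(", ");
  return {
    text: `SELECT * FROM reviews WHERE book_id IN (${placeholders})`,
    values: uniqueIds,
  };
}

const query = buildReviewsByBookIdsQuery([1, 2, 2, 3]);
console.log(query.text);
console.log(query.values);
```

For the input [1, 2, 2, 3], the generated text is SELECT * FROM reviews WHERE book_id IN ($1, $2, $3), bound to the values 1, 2 and 3.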

The catch is that in a GraphQL context, we never write this IN query directly. Resolvers run in isolation, independently of one another: by default, the resolver responsible for fetching a book’s reviews is invoked N times, once for each book in the list, and no invocation has any knowledge of the other N-1. That is precisely what reproduces the N+1 problem.
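To make that isolation tangible, here is a deliberately simplified simulation. The in-memory table, the queryCount counter and the synchronous resolveReviews function are all hypothetical; a real resolver is asynchronous and also receives arguments and a context.

```typescript
interface Book {
  id: number;
  title: string;
}

interface Review {
  book_id: number;
  text: string;
}

// Hypothetical in-memory stand-in for the reviews table.
const reviewRows: Review[] = [
  { book_id: 1, text: "Great" },
  { book_id: 1, text: "Dense" },
  { book_id: 3, text: "Fun" },
];

// Counts the simulated database round trips.
let queryCount = 0;

// Simulates: SELECT * FROM reviews WHERE book_id = <bookId>;
function findReviewsByBookId(bookId: number): Review[] {
  queryCount += 1;
  return reviewRows.filter((row) => row.book_id === bookId);
}

// Naive field resolver: it sees a single parent book and nothing else.
function resolveReviews(book: Book): Review[] {
  return findReviewsByBookId(book.id);
}

const books: Book[] = [
  { id: 1, title: "First" },
  { id: 2, title: "Second" },
  { id: 3, title: "Third" },
];

// The executor calls the resolver once per book in the list.
const reviewsPerBook = books.map((book) => resolveReviews(book));
console.log(queryCount); // 3: one simulated query per book
```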

The whole challenge then becomes the following: how do we group these N independent invocations into a single query, equipped with all the necessary parameters?

The DataLoader pattern

This is exactly the problem the DataLoader pattern solves: it collects data-fetching calls that were issued independently and serves them together. Popularized in the GraphQL ecosystem, it remains entirely applicable outside it, anywhere you want to aggregate scattered data accesses.

The DataLoader rests on two complementary mechanisms: batching and caching.

Batching: defer to better group

The loader is lazy. Rather than running each query eagerly, at the precise moment a resolver asks for it, it queues the calls. Within a single GraphQL request, it collects the calls emitted by the various resolvers, groups them, and executes them all at once.

Concretely, this mechanism shifts:

from a model of N invocations, 1 parameter (one query per book);
to a model of 1 invocation, N parameters (a single query for all the books).

It’s this reversal that dissolves the N+1 problem: the N individual queries merge into a single query carrying all the identifiers.
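The deferral can be sketched in a few lines. This is a simplified illustration, not the implementation used by the dataloader library or by Mercurius: createBatchLoader, BatchFn and PendingCall are hypothetical names, and the flush is scheduled on a microtask, which is only one possible scheduling choice.

```typescript
type BatchFn<K, V> = (keys: K[]) => Promise<V[]>;

interface PendingCall<K, V> {
  key: K;
  resolve: (value: V) => void;
  reject: (reason: unknown) => void;
}

// Hypothetical helper: returns a load(key) function whose calls made in the
// same tick are answered by a single batchFn invocation.
function createBatchLoader<K, V>(batchFn: BatchFn<K, V>): (key: K) => Promise<V> {
  let pending: PendingCall<K, V>[] = [];

  function flush(): void {
    const batch = pending;
    pending = [];
    batchFn(batch.map((call) => call.key)).then(
      // batchFn must return its values in the same order as the keys.
      (values) => batch.forEach((call, index) => call.resolve(values[index])),
      (error) => batch.forEach((call) => call.reject(error))
    );
  }

  return function load(key: K): Promise<V> {
    return new Promise<V>((resolve, reject) => {
      pending.push({ key, resolve, reject });
      // The first call of a tick schedules the flush; later calls only queue up.
      if (pending.length === 1) {
        Promise.resolve().then(flush);
      }
    });
  };
}

// Usage: three independent calls end up in one batch.
async function demo(): Promise<{ batches: number[][]; values: string[] }> {
  const batches: number[][] = [];
  const loadReviews = createBatchLoader<number, string>(async (bookIds) => {
    batches.push(bookIds);
    return bookIds.map((id) => `reviews of book ${id}`);
  });
  const values = await Promise.all([loadReviews(1), loadReviews(2), loadReviews(3)]);
  return { batches, values };
}
```

In demo, the three load calls happen synchronously, so they are all queued before the microtask runs and the batch function is called once with the keys 1, 2 and 3.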

Caching: never ask for the same thing twice

The DataLoader also ships with a caching mechanism. A Loader memoizes the results of calls already made, which lets us avoid redundant queries and deduplicate parameters within a single request. If two resolvers ask for the reviews of the same book, the data is loaded only once.
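A minimal sketch of that memoization, assuming a hypothetical withRequestCache wrapper around any key-based load function. It stores promises rather than resolved values, so two callers asking for the same key while the first load is still in flight share one result. The cache is meant to live for a single request only; a fresh wrapper would be created for each incoming request.

```typescript
// Hypothetical wrapper: memoizes load(key) for the lifetime of one request.
function withRequestCache<K, V>(
  load: (key: K) => Promise<V>
): (key: K) => Promise<V> {
  const cache = new Map<K, Promise<V>>();

  return (key: K): Promise<V> => {
    const cached = cache.get(key);
    if (cached !== undefined) {
      // Same key already requested: reuse the pending or settled promise.
      return cached;
    }
    const promise = load(key);
    cache.set(key, promise);
    return promise;
  };
}

// Usage: two resolvers asking for the reviews of the same book.
let databaseCalls = 0;
const loadReviews = withRequestCache(async (bookId: number): Promise<string[]> => {
  databaseCalls += 1;
  return [`review of book ${bookId}`];
});

const first = loadReviews(1);
const second = loadReviews(1);
console.log(first === second, databaseCalls); // true 1
```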

In practice with Mercurius

Let’s see how this pattern takes shape in a Node.js GraphQL server. The example below relies on Mercurius, a Fastify adapter for building GraphQL servers. Loaders are built into Mercurius and carry over the concepts of the DataLoader pattern, which originated in the dataloader library.

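/**
 * Mercurius is a Fastify adapter for building GraphQL servers.
 * The `loaders` concept is built in natively and carries over the concepts
 * of the DataLoader pattern (which originated in the dataloader library).
 *
 * Assumptions: `groupBy` is taken to be lodash's; `Context` and the entity
 * types are application types defined elsewhere.
 */
import groupBy from "lodash/groupBy";
import type { MercuriusLoaders } from "mercurius";

const loaders: MercuriusLoaders<Context> = {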
  Book: {
    /**
     * The `reviews` loader receives every batched `Book.reviews` lookup as a
     * single array: one entry per parent book, with the book in `obj`.
     */
    reviews: async (queries: Array<{ obj: Book }>, context) => {
      /**
       * We collect the identifiers of the batched books (duplicates included,
       * since the output must have one entry per input), which lets us fetch
       * all the reviews in a single query.
       * e.g. 'SELECT * FROM reviews WHERE book_id IN (1, 2, 3);'
       */
      const batchedBookIds = queries.map(({ obj }) => obj.id);
      const reviews =
        await context.dependencies.reviewsRepository.findByBookIds(
          batchedBookIds
        );
      /**
       * For the loader to map the batched books to the reviews fetched per
       * book, we must preserve the original order of the batch, so that each
       * set of reviews is associated with the right book.
       * e.g. [                   [
       *   book1,      ->           [review1, review2],
       *   book2,      ->           [review3],
       *   book3       ->           [review4, review5, review6]
       * ]                        ]
       */
      const reviewsByBookId = groupBy(reviews, "book_id");
      return batchedBookIds.map((id) => reviewsByBookId[id] ?? []);
    },
  },
};

Two details in this code are worth highlighting, because they sit at the heart of the DataLoader’s contract.

First, the loader’s signature. Where a classic resolver receives a single book, the loader receives an array of queries (the queries parameter): this is batching made concrete. Instead of one call per book, Mercurius invokes the loader with the full set of books it has collected for the request. We then extract all their identifiers (batchedBookIds) and hand them to the repository in one go, which runs the single IN query.

Second, and this is the most subtle point: the preservation of order. The loader must return an array with exactly one element per entry it received, in the same order. Since the database returns a flat list of reviews, in no guaranteed order, we group it by book_id (here via a groupBy), then rebuild the final result by iterating over batchedBookIds in order. A book with no reviews must still occupy its slot with an empty array (?? []) rather than be skipped: if the output were shorter or shifted, reviews would be associated with the wrong book.
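The alignment step can be isolated and checked on its own. The sketch below is a dependency-free equivalent of the groupBy-then-map logic; alignReviewsToBooks is a hypothetical helper name, and the Review shape is reduced to the two fields the example needs.

```typescript
interface Review {
  id: number;
  book_id: number;
}

// Hypothetical helper: returns one array of reviews per requested book id,
// in the same order as bookIds, whatever order the rows came back in.
function alignReviewsToBooks(bookIds: number[], reviews: Review[]): Review[][] {
  // Group the flat list of rows by book_id.
  const byBookId = new Map<number, Review[]>();
  for (const review of reviews) {
    const bucket = byBookId.get(review.book_id);
    if (bucket) {
      bucket.push(review);
    } else {
      byBookId.set(review.book_id, [review]);
    }
  }
  // Rebuild the output in input order; books without reviews get [].
  return bookIds.map((id) => byBookId.get(id) ?? []);
}

// Rows deliberately returned out of order, and book 2 has no reviews.
const aligned = alignReviewsToBooks(
  [1, 2, 3],
  [
    { id: 10, book_id: 3 },
    { id: 11, book_id: 1 },
    { id: 12, book_id: 3 },
  ]
);
// book 1 -> [11], book 2 -> [], book 3 -> [10, 12]
console.log(aligned.map((group) => group.map((review) => review.id)));
```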

Conclusion

The DataLoader pattern is a simple yet remarkably effective way to coalesce independent data accesses and optimize performance in GraphQL. By grouping data accesses through batching and eliminating redundant calls through caching, it collapses the N follow-up queries of the N+1 problem into a single, controlled query.

The key takeaway: the N+1 problem is not an inevitability of GraphQL, but a consequence of the isolation of its resolvers. The DataLoader reconciles that isolation with the efficiency of grouped data access, without sacrificing any of the schema’s readability. If there’s a single reflex to keep in mind when facing a relation resolver, it’s this one: put it behind a loader.