
MongoDB skip and limit order is important in aggregation

Written by Codemzy on November 23rd, 2023

Here's something that caught me out this week - skip and limit order in MongoDB. The order of skip and limit isn't important in chaining (like `find` commands) but it is important in aggregation.

First, a quick summary: You don't need to worry about the order of .skip() and .limit() (chained methods), but you do need to worry about the order of $skip and $limit (aggregation stages).

In MongoDB, you might use helpers like find and updateMany to perform simple database operations. These helpers use chaining - e.g. .skip() and .limit(). But for more complex database operations, you might use an aggregation pipeline. Aggregation uses stages - e.g. $skip and $limit.
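For example, the same paginated query could look like this in each style (the status filter here is just a made-up example):

// chained helper methods on a cursor
db.collection.find({ status: "published" }).skip(20).limit(10);

// the equivalent aggregation pipeline, built from stages
db.collection.aggregate([
  { $match : { status: "published" } },
  { $skip : 20 },
  { $limit : 10 },
]);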

Chaining (.skip() and .limit()) and stages ($skip and $limit) behave differently, which can mess you up if you don't get them in the right order.

In MongoDB chaining, the order of .skip() and .limit() is not important.

When you're running a helper like db.collection.find() to fetch documents from a single collection, you'll probably use the chained methods .skip() and .limit() to get paginated data from MongoDB.

This:

db.collection.find({}).limit(20).skip(60);

Will give you the same results as this:

db.collection.find({}).skip(60).limit(20);

You'll get 20 results, skipping the first 60 (e.g. page 4 if you fetch 20 results per page).

When you chain skip() and limit(), the method chaining order does not affect the results. The server always applies the skip operation based on the sort order before it applies the limit on how many documents to return.

— MongoDB - Using skip() with limit()
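As a rough sketch, a hypothetical getPage helper (the name is made up for illustration) shows the page maths - the skip value is just (page - 1) * pageSize:

// hypothetical helper - skip past the earlier pages, then take one page of results
function getPage(page, pageSize) {
  return db.collection.find({}).skip((page - 1) * pageSize).limit(pageSize).toArray();
}

getPage(4, 20); // skips 60 documents and returns 20 - the same page 4 as above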

So I got used to the order of skip() and limit() not being that important. And that all worked great... until I needed an aggregation pipeline.

For more complex database queries, involving things like data transformation, you might need aggregation, and if you're switching from MongoDB CRUD helpers (like find), you'll need to get your stages in order!

MongoDB skip and limit order in aggregation

When calling skip() and limit() on a cursor in MongoDB (as we just covered), the order doesn't matter. Because the server always applies the skip before it applies the limit - even if you put limit first.

But when you’re using an aggregation pipeline, order is important. The server isn't going to put skip and limit in the right order for you anymore. Because when you use aggregation, you're in charge!

Order is important in aggregation because each stage of the aggregation pipeline only gets the documents passed to it from the previous stage.

Let’s say you have 240 results. And you’re returning a page of 60 results to the client.

For a cursor, both .skip(60).limit(60) and .limit(60).skip(60) work the same. You'll skip the first 60 results and get the next 60. Order doesn't matter.

But in aggregation, if you put $limit before $skip and run the following pipeline:

db.collection.aggregate([
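  // with 240 documents: $limit passes only 60 on, then $skip throws away all 60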
  { $limit : 60 },
  { $skip : 60 },
]);

You’re going to get no results. Nothing. Zero. Even though you know results are sitting in your database collection waiting to be fetched.

The $limit stage only passes 60 results on to the $skip stage. Then you tell the pipeline to skip 60 results, so it skips the very 60 results it just received. You're skipping everything the $limit stage passed along - and that's why you get no results! 😬

We need to switch those around and put the $limit stage after $skip.

db.collection.aggregate([
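  // with 240 documents: $skip drops the first 60, then $limit returns 60 of the remaining 180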
  { $skip : 60 },
  { $limit : 60 },
]);

Now you are skipping the first 60 results, and $skip will pass the remaining 180 results to the $limit stage. $limit can then limit those remaining 180 results to 60, and return 60 results as a page of data!

$skip before $limit.

Adding sort() for consistency

For simplicity, I've only focused on skip and limit, but there's another method you will need too - sort!

If you're returning paginated data from MongoDB, you really do need to sort it. And this applies to both chaining (.sort()) and aggregation ($sort).

Let's go back to my earlier example:

db.collection.aggregate([
  { $skip : 60 },
  { $limit : 60 },
]);

Which 60 documents are we skipping? The first 60? But what are the first 60 documents? We don't know because they are not sorted in any particular order.

So when we come back for page 2 of the data ({ $skip : 60 }), the documents might get returned in a different order. We might end up with some of the same documents from page 1, and others might be missed altogether.

Not cool!

So don't forget to sort your queries! At least on the _id field.

db.collection.aggregate([
  { $sort : { _id: 1 } },
  { $skip : 60 },
  { $limit : 60 },
]);

Or, if you're sorting on another field that isn't unique - say, a title field sorted alphabetically - add the _id field too as a tie-breaker. This makes sure the sort order stays consistent.

db.collection.aggregate([
  { $sort : { title: 1, _id: 1 } },
  { $skip : 60 },
  { $limit : 60 },
]);

Or for chained methods...

db.collection.find({}).sort({ title: 1, _id: 1 }).skip(60).limit(60);
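Putting it all together, the earlier getPage sketch could look something like this (again, the helper name and the title sort are just for illustration) - sort first, then skip, then limit:

// hypothetical helper - consistent sort, skip past earlier pages, limit to one page
function getPage(page, pageSize) {
  return db.collection.aggregate([
    { $sort : { title: 1, _id: 1 } },
    { $skip : (page - 1) * pageSize },
    { $limit : pageSize },
  ]).toArray();
}

getPage(2, 60); // skips the first 60 and returns documents 61-120, in a consistent order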

Going forward, I'll try to get my .skip() and .limit() in order on chained methods too, for consistency, and so I don't forget when I need to convert a find query to an aggregation pipeline!