Tutorial: MongoDB Aggregate Query Analysis

Avatar

By squashlabs, Last Updated: October 26, 2023

Tutorial: MongoDB Aggregate Query Analysis

Overview of MongoDB Aggregation Pipeline

The MongoDB Aggregation Pipeline is a useful feature that allows developers to perform complex data analysis and transformations on MongoDB data. It is a framework that enables users to process data in a pipeline of stages, with each stage transforming the data in some way.

The aggregation pipeline consists of a series of stages, where each stage performs a specific operation on the data. These stages can be chained together to create a multi-step data processing pipeline. The output of one stage becomes the input for the next stage, allowing for a flexible and efficient way to analyze and manipulate data.

Related Article: Tutorial: Using Python to Interact with MongoDB Collections

The MongoDB Aggregation Framework

The MongoDB Aggregation Framework is the foundation of the aggregation pipeline. It provides a set of operators that can be used to perform various operations on the data, such as filtering, grouping, sorting, and projecting.

The framework includes a wide range of operators, each designed to handle a specific task. Some of the commonly used operators include $match, $group, $project, $sort, $unwind, $limit, and $skip. These operators can be combined in different ways to create useful and complex queries.

Aggregate Queries

Aggregate queries are constructed using the aggregation pipeline. Each query consists of one or more stages, with each stage specifying a particular operation to be performed on the data.

Let’s consider an example where we have a collection of documents representing sales transactions. We want to find the total sales amount for each product category. We can achieve this using the aggregation pipeline with the $group stage:

db.sales.aggregate([
  { $group: { _id: "$category", totalSales: { $sum: "$amount" } } }
])

In this example, the $group stage groups the documents by the category field and calculates the total sales amount for each category using the $sum operator.

Group Stage

The $group stage in MongoDB aggregation is used to group documents based on a specified key and perform aggregations on the grouped data. It is one of the most commonly used stages in the aggregation pipeline.

Let’s consider an example where we have a collection of documents representing books. We want to find the average price of books in each genre. We can achieve this using the $group stage:

db.books.aggregate([
  { $group: { _id: "$genre", averagePrice: { $avg: "$price" } } }
])

In this example, the $group stage groups the documents by the genre field and calculates the average price for each genre using the $avg operator.

Related Article: Exploring MongoDB: Does it Load Documents When Querying?

Project Stage

The $project stage in MongoDB aggregation is used to reshape the documents in the pipeline. It allows developers to include or exclude fields, create new fields, and perform various transformations on the data.

Let’s consider an example where we have a collection of documents representing employees. We want to retrieve only the first name and last name of each employee. We can achieve this using the $project stage:

db.employees.aggregate([
  { $project: { _id: 0, firstName: 1, lastName: 1 } }
])

In this example, the $project stage includes only the firstName and lastName fields in the output documents while excluding the _id field.

Match Stage

The $match stageis used to filter documents based on specific criteria. It allows developers to include or exclude documents from the pipeline based on their field values.

Let’s consider an example where we have a collection of documents representing orders. We want to find orders that were placed after a certain date. We can achieve this using the $match stage:

db.orders.aggregate([
  { $match: { date: { $gte: ISODate("2022-01-01") } } }
])

In this example, the $match stage filters out orders that have a date field value greater than or equal to January 1, 2022.

Sort Stage

The $sort stage is used to sort the documents in the pipeline based on one or more fields. It allows developers to control the order in which the documents are processed.

Let’s consider an example where we have a collection of documents representing students. We want to sort the students based on their scores in descending order. We can achieve this using the $sort stage:

db.students.aggregate([
  { $sort: { score: -1 } }
])

In this example, the $sort stage sorts the documents based on the score field in descending order.

Related Article: How to Add a Field with a Blank Value in MongoDB

Unwind Stage

The $unwind stage in MongoDB aggregation is used to create a new document for each element in an array field. It allows developers to flatten the array and perform operations on the individual elements.

Let’s consider an example where we have a collection of documents representing orders, where each order can have multiple items. We want to find the total quantity sold for each item. We can achieve this using the $unwind stage:

db.orders.aggregate([
  { $unwind: "$items" },
  { $group: { _id: "$items.name", totalQuantity: { $sum: "$items.quantity" } } }
])

In this example, the $unwind stage creates a new document for each item in the items array field. The subsequent $group stage groups the documents by the name field and calculates the total quantity sold for each item.

Limit Stage

The $limit is used to limit the number of documents in the pipeline output. It allows developers to control the amount of data returned by the query.

Let’s consider an example where we have a collection of documents representing products. We want to retrieve only the top 10 products by sales. We can achieve this using the $limit stage:

db.products.aggregate([
  { $sort: { sales: -1 } },
  { $limit: 10 }
])

In this example, the $sort stage sorts the documents based on the sales field in descending order. The subsequent $limit stage limits the output to the first 10 documents.

When to Utilize the MongoDB Skip Stage

The $skip stage is used to skip a specified number of documents in the pipeline. It allows developers to skip over a certain amount of data and retrieve the remaining documents.

Let’s consider an example where we have a collection of documents representing users. We want to retrieve users in batches of 10, skipping the first 20 users. We can achieve this using the $skip stage:

db.users.aggregate([
  { $skip: 20 },
  { $limit: 10 }
])

In this example, the $skip stage skips the first 20 documents in the pipeline. The subsequent $limit stage limits the output to the next 10 documents.

Related Article: How to Use Range Queries in MongoDB

Code Snippet: MongoDB Aggregation Pipeline Example

Here’s an example of a MongoDB aggregation pipeline that demonstrates the usage of multiple stages:

db.sales.aggregate([
  { $match: { date: { $gte: ISODate("2022-01-01") } } },
  { $group: { _id: "$category", totalSales: { $sum: "$amount" } } },
  { $sort: { totalSales: -1 } },
  { $limit: 5 }
])

In this example, we start with the $match stage to filter out sales that occurred after January 1, 2022. Then, we use the $group stage to group the sales by category and calculate the total sales amount for each category. Next, we use the $sort stage to sort the categories based on the total sales amount in descending order. Finally, we use the $limit stage to retrieve only the top 5 categories.

Code Snippet: MongoDB Aggregation Framework Example

Here’s an example of a MongoDB aggregation framework query that demonstrates the usage of various operators:

db.books.aggregate([
  { $match: { price: { $gte: 20 } } },
  { $project: { _id: 0, title: 1, author: 1, price: 1 } },
  { $sort: { price: -1 } },
  { $limit: 10 }
])

In this example, we start with the $match stage to filter out books that have a price greater than or equal to 20. Then, we use the $project stage to include only the title, author, and price fields in the output documents. Next, we use the $sort stage to sort the books based on the price in descending order. Finally, we use the $limit stage to retrieve only the top 10 books.

More Articles from the NoSQL Databases Guide series:

Crafting Query Operators in MongoDB

This tutorial provides a practical guide on creating and implementing query operators in MongoDB. The article covers various topics such as understanding MongoDB query... read more

Using Multi-Indexes with MongoDB Queries

MongoDB queries can benefit from the usage of multiple indexes, allowing for improved performance and optimization. This article explores various aspects of multi-index... read more

MongoDB Queries Tutorial

MongoDB is a powerful NoSQL database that offers flexibility and scalability. In this article, we delve into the modifiability of MongoDB queries, investigating whether... read more

How to Run Geospatial Queries in Nodejs Loopback & MongoDB

Executing geospatial queries with Loopback MongoDB is a crucial skill for software engineers. This article provides insight into geospatial queries, how Loopback... read more

How to Improve the Speed of MongoDB Queries

In this article, we take an in-depth look at the speed and performance of queries in MongoDB. We delve into evaluating query performance, analyzing query speed,... read more

Declaring Variables in MongoDB Queries

Declaring variables in MongoDB queries allows for more flexibility and customization in your data retrieval. This article provides a step-by-step guide on how to use... read more