Schema Patterns - MongoDB - Part 3

Shanmukh
6 min read · Aug 6, 2020

If you are here, I hope you have gone through Part 1 and Part 2 of this series. If not, I highly recommend going through them before proceeding further.

Continuing from the last article, we will start with the next category of schema patterns.

Grouping Patterns

These are some cool patterns that can be used to scale your schema very quickly and efficiently. These types of patterns are highly recommended for read-intensive use cases.

Computed Pattern

Bucket Pattern

Outlier Pattern

Computed Pattern

These patterns are extremely useful when your use case is read-intensive and does computations/calculations on every request.

Let’s dive into an example. Continuing with the base example of building a movie database, let’s say we need to implement a feature that shows the total number of people who have rated a particular movie (not the average rating), for instance a count of 3375 ratings next to the movie.

In this case, every time the page requests the movie details, we need to return that count (3375 here).

One basic approach is to keep a ratings field in the movie record that stores all the ratings. But if a movie has a lot of ratings, that field can grow huge. This is where we use the subset pattern discussed in the previous article, keeping the full set of ratings in a separate Ratings collection.

So to calculate the overall count, you have to rely on the Ratings collection and run a count query on it with the movie as the filter. For this you may have to create an index on the movie field, and scanning for the count on every request is still expensive.

Running the count query on every request is clearly not practical. What we can do instead is pre-calculate the total number of ratings, store it in the movie record, and recalculate it either once in a while or every time a rating is given.

Doing this, we avoid running the computation on the collection at read time.

Here, we store only the top_ratings, which are displayed on the details page, along with total_ratings, which is pre-calculated to avoid computation at run time. Similarly, we can store the average rating in the record. Whenever a new rating is given, an event is triggered that recalculates the average rating in the background.
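To make the idea concrete, here is a minimal sketch in plain Python, with a dictionary standing in for the movie document. The top_ratings and total_ratings fields come from the example above; ratings_sum, average_rating, TOP_RATINGS_LIMIT, the function names, and the rule that "top" means highest score are all assumptions made for this illustration, not a definitive implementation.

```python
TOP_RATINGS_LIMIT = 3  # hypothetical cap on the embedded top_ratings subset


def new_movie(movie_id: str, title: str) -> dict:
    """Build an in-memory stand-in for a movie document."""
    return {
        "_id": movie_id,
        "title": title,
        "top_ratings": [],      # subset shown on the details page
        "total_ratings": 0,     # pre-computed count of ratings
        "ratings_sum": 0,       # assumed helper field used to derive the average
        "average_rating": 0.0,  # pre-computed average
    }


def add_rating(movie: dict, user_id: str, score: int) -> None:
    """Apply one new rating and refresh the computed fields at write time."""
    movie["total_ratings"] += 1
    movie["ratings_sum"] += score
    movie["average_rating"] = movie["ratings_sum"] / movie["total_ratings"]
    # Keep only the highest-scoring ratings embedded in the movie document.
    movie["top_ratings"].append({"user_id": user_id, "score": score})
    movie["top_ratings"].sort(key=lambda r: r["score"], reverse=True)
    del movie["top_ratings"][TOP_RATINGS_LIMIT:]


def recompute_from_source(movie: dict, ratings: list) -> None:
    """Rebuild the computed fields from the full ratings data if they drift."""
    scores = [r["score"] for r in ratings if r["movie_id"] == movie["_id"]]
    movie["total_ratings"] = len(scores)
    movie["ratings_sum"] = sum(scores)
    movie["average_rating"] = sum(scores) / len(scores) if scores else 0.0
```

The details page then reads total_ratings and average_rating straight from the movie record without touching the Ratings collection. Against a real MongoDB deployment, the counter would typically be bumped with the `$inc` update operator rather than a read-modify-write in application code.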

The above is a very simple use case, but in real-world scenarios this pattern has a lot of applications.

The advantage of this pattern is that it saves a lot of CPU and avoids unnecessary indexes.

The disadvantages lie in maintaining consistency and in the extra writes to the DB when a new rating is given, although writes are far fewer than reads. If an inconsistency is observed, we can always recalculate the value from the source.

The computed pattern should be used when your use case is read-intensive and does computations/calculations on every request to return data.

Bucket Pattern

This pattern solves a lot of complex real-world issues and is extremely helpful and efficient. As the name suggests, we bucket (group) the data together, which makes it easier to organize and faster to retrieve.

Historical or time-series data is a perfect fit for the Bucket Pattern.

We have already seen some examples, like the crew list or ratings, where we group the data together and store it in one record under one field (bucketed), but let’s look at another example.

You need to implement a feature where the user can add movies to a watch list.

One way is to create a UserWatchList collection with one document per movie a user saves. At scale, though, the collection could become very large after a while (1 million users × 50 saved movies on average ≈ 50 million documents).

Retrieving documents from such a huge collection becomes costly because of the index size, and sorting by the most recently saved movie makes it costlier still.

This is where bucketing comes into the picture. We can store all the watchlist movie_ids in a list as an attribute of the user.
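A minimal sketch of this bucketed layout in plain Python, with a dictionary standing in for the user document. The watch_list field name follows the article; the function names, the page size, and the choice to treat the end of the list as the most recently saved movie are assumptions made for this illustration.

```python
def new_user(user_id: str) -> dict:
    """Build an in-memory stand-in for a user document with a bucketed watch list."""
    return {"_id": user_id, "watch_list": []}


def add_to_watch_list(user: dict, movie_id: str) -> bool:
    """Append movie_id to the user's bucket; return False if it was already saved."""
    if movie_id in user["watch_list"]:
        return False
    user["watch_list"].append(movie_id)  # newest entries sit at the end
    return True


def watch_list_page(user: dict, page: int, page_size: int = 2) -> list:
    """Return one page of movie ids, most recently saved first."""
    newest_first = user["watch_list"][::-1]
    start = page * page_size
    return newest_first[start:start + page_size]
```

One read of the user document now returns the whole watch list, already in insertion order, so no separate sort over millions of documents is needed. With a real MongoDB driver, the duplicate check could be delegated to the `$addToSet` update operator.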

Advantages:

  1. Fewer documents
  2. Smaller index size, so improved performance
  3. We can also use this bucket for pagination and other computations

The disadvantage is the restriction on the watch_list: we can’t have an unbounded or very large number of movies in it because of MongoDB’s document size limit of 16 MB. We will learn how to overcome this in the next pattern.

Bucket pattern is used in cases where we want to retrieve similar data that can be grouped. This is a very flexible pattern that can be used in multiple places with some tweaks to fit the use-case.

Outlier Pattern

This pattern applies to data that falls outside the norm, i.e. an outlier.

Let’s look at it using the example above. We saw a feature where a user can add a movie to their watchlist, and the bucket pattern fits that use case well, but one of its disadvantages is the document size limit of 16 MB.

Suppose a user has added thousands of movies to their watchlist. For that user the bucket pattern may not be a good fit, but such users are a very small share, maybe around 0.1%. The outlier pattern is used for these kinds of cases.

Put simply, the outlier pattern means using a different approach for the 0.1% of users above than the normal pattern used for the other 99.9%.

Why do we do this?

As discussed above, we have 2 approaches to solve this issue:

  1. UserWatchList collection
  2. Bucket pattern

The bucket pattern is efficient for 99.9% of the users and may not scale well for the remaining 0.1%, but that doesn’t mean we shouldn’t use the bucket pattern.

In fact, we should use the bucket pattern because of the advantages it provides. But what about the other 0.1%?

We can use 2 approaches: one for the 99.9% and the other for the remaining 0.1%. All we need to do is flag the user as an outlier, which requires an additional boolean field, say outlier.

Based on this field, the application decides which approach to use.

In the above example, let's say the user has added 10,000 movies to their watchlist and your document cannot accommodate more than 5000 ids. Once you reach 5000 ids, you add an outlier flag to the user. Now, when the application receives a request to add a new movie to the watch list, it sees that the user is flagged True for outlier and switches to a different approach.

That could be the UserWatchList collection approach, or a new document in a new collection that references the original one (note: a separate collection helps reduce the index size while retrieving). New movie ids keep going into that collection, and the logic for retrieving these documents and their movie ids is handled on the application side.
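The switch described above might look like the following plain-Python sketch. The outlier flag and the 5000-id threshold come from the example; the overflow list (standing in for a separate collection), the function names, and the limit parameter are assumptions made for this illustration.

```python
MAX_EMBEDDED_IDS = 5000  # threshold taken from the example above


def add_movie(user: dict, overflow: list, movie_id: str,
              limit: int = MAX_EMBEDDED_IDS) -> None:
    """Add a movie id, routing outlier users to the overflow store."""
    if user.get("outlier"):
        # Outlier path: one entry per saved movie, outside the user document.
        overflow.append({"user_id": user["_id"], "movie_id": movie_id})
        return
    # Normal path: keep the id in the bucket embedded in the user document.
    user["watch_list"].append(movie_id)
    if len(user["watch_list"]) >= limit:
        user["outlier"] = True  # later additions go to the overflow store


def get_watch_list(user: dict, overflow: list) -> list:
    """Return all saved ids, merging the overflow entries for outlier users."""
    ids = list(user["watch_list"])
    if user.get("outlier"):
        ids.extend(row["movie_id"] for row in overflow
                   if row["user_id"] == user["_id"])
    return ids
```

Users who never reach the threshold are served entirely from their own document, and only flagged users pay for the extra lookup.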

We can observe this kind of pattern in a lot of applications. On Twitter, for example, the way updates are sent to followers when a post is created differs between a normal user and a verified user because of the scale. A verified user may have more than a million followers, so the approach to sending their updates differs from that for a normal user whose followers range from 0 to 100k. This is what the outlier pattern is.

I hope the above examples helped in understanding the outlier pattern much better.

The above three patterns are advanced and should be used only when you understand the scale and use-case in depth. This article marks the end of this series.

Hope this article was helpful. Thanks for reading, and let me know what you think about these schema patterns in the comments.

My Linkedin ☺️

Shanmukh

Senior Backend Engineer who loves to explore technologies.