Schema Patterns - MongoDB - Part 2

Shanmukh
4 min readJul 31, 2020

If you are here, I hope you have gone through Part 1 of this series. If not, I would highly recommend going through it before proceeding further.

Cool. As discussed in the earlier article, we will start with the next category of schema patterns.

Frequency of Access Patterns

These are a few advanced patterns that should be used when your use-case is read-intensive.

Subset Pattern

Approximation Pattern

Subset Pattern

This is a very interesting and useful pattern. As the name suggests, we create a subset of the data for the fields that are accessed frequently, instead of working with the full set of data.

OK, doesn't make much sense yet, right? Let's look into it with an example.

So, you remember we used a movie database in the previous article where each record is a movie.

Now let's say we need to add a crew field to that movie record that will have details of the actors and a summary of their roles.

All these details will be shown when the movie page is loaded, but we need not list the whole crew. Let's say we list 5–10 crew members and then give users an option to load more.

So what is the advantage of this load-more option? Most people don't go through the whole crew list; when they search for a particular movie, they just want to see some details about it and its main crew.

So, instead of maintaining the whole crew list in the movie record, we can store only the main crew details in that record and maintain a different table/collection to store the rest of the crew. This is the subset pattern: we keep the frequently used data separate from the data that is rarely requested.

Let’s modify the existing schema
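The revised schema might look something like this. This is a minimal sketch using plain Python dicts to represent the MongoDB documents; the field names (`main_cast`, `movie_id`) and the separate `movie_crew` collection are illustrative assumptions, not from the original article.

```python
# Movie document: keeps only the subset of crew shown on the details page.
movie = {
    "_id": "tt0111161",
    "title": "The Shawshank Redemption",
    "year": 1994,
    # Subset: just the 5-10 main crew members listed up front.
    "main_cast": [
        {"name": "Tim Robbins", "role": "Andy Dufresne"},
        {"name": "Morgan Freeman", "role": "Ellis Boyd 'Red' Redding"},
    ],
}

# Separate movie_crew document (in its own collection): holds the full
# crew list, fetched only when the user clicks "load more".
movie_crew = {
    "_id": "crew-tt0111161",
    "movie_id": "tt0111161",  # back-reference to the movie document
    "crew": [
        {"name": "Tim Robbins", "role": "Andy Dufresne"},
        {"name": "Morgan Freeman", "role": "Ellis Boyd 'Red' Redding"},
        {"name": "Bob Gunton", "role": "Warden Norton"},
        # ... the rest of the (potentially very long) crew list
    ],
}
```

The details-page query touches only the small `movie` document; the big `movie_crew` document is loaded on demand.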

Here, instead of storing the whole crew's details, we store just the main crew members to be listed on the details page.

So you might be thinking: what's the harm in storing and sending the whole crew list?

  1. If you create an index, its size can grow huge based on the crew data.
  2. When the data is requested, it must be loaded into memory to be served. So: less data -> less RAM -> fewer page faults.
  3. Less data means shorter disk access times.
  4. Less data transfer over the network.
  5. Lower response time for the movie-details fetch API.

So, I think it's clear from the example that the subset pattern helps a lot and should be used in read-intensive cases where we can partition the data based on frequency of access.

Approximation Pattern

This pattern can be used when the use-case is write-intensive but the correctness of the result set is not a priority.

Yeah, I understand it doesn't make much sense yet 😅. Let me explain it using an example.

So let's say we need to implement a view counter that shows the number of people who have visited a page. Logically, the counter need not be accurate; it can be an approximation, i.e. instead of 390,223 it can show 390,000, which is fine since accuracy is not critical and we can survive with approximate data.

The general implementation would be: every time the page is visited, you make an API request that increments the counter in the record.
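The naive approach can be sketched as follows. This uses an in-memory dict as a stand-in for a real MongoDB collection (the `pages` name and `views` field are assumptions); in a real implementation each call would be one `update_one` with `$inc`, as the disadvantages below discuss.

```python
# In-memory stand-in for a MongoDB collection; a real implementation
# would call: collection.update_one({"_id": page_id}, {"$inc": {"views": 1}})
pages = {"movie-page-42": {"views": 0}}

def record_view(page_id):
    """Naive approach: one DB write per page view."""
    pages[page_id]["views"] += 1  # equivalent of {"$inc": {"views": 1}}

# Every visit triggers a separate write: 5 visits -> 5 writes.
for _ in range(5):
    record_view("movie-page-42")
```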

Disadvantages of the above approach:

  1. The number of requests per second is high.
  2. Huge load on the servers.
  3. More CPU is required to consume these requests, and they can keep more important, business-critical requests in a wait state.
  4. Also, when you increment the counter in the DB you will use the $inc operator, for which the DB generally takes a lock to update the data sequentially and maintain consistency, so concurrent increments queue up.
  5. More writes on the DB.

What could be done? Since the data need not be accurate, you can choose to:

  1. Create events and, instead of processing one event at a time, bulk-process them: calculate the count in the application layer and update the DB in one write (e.g. increment by 500 in one write instead of 500 writes incrementing by 1). This will decrease your load by a big factor (note: the tradeoff, of course, is more load on the application).
  2. The above approach, however, requires queuing events and maintaining and consuming them. Instead, you can leave it to the client, which keeps track of the pages the user has visited and how many times, and makes an API request once per 10-minute window. The application then need not maintain and consume events. On the other side, we may lose some data if the user clears their cache, which is still fine since our use case is not critical.
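Option 1 above can be sketched as a small batching layer in the application. This is pure Python with an in-memory stand-in for the DB counter; the batch size of 500 is an illustrative assumption, and a real flush would be a single `update_one` with `$inc` by the whole pending amount.

```python
class ApproximateCounter:
    """Buffers view events in the application layer and flushes them
    to the database in one write per batch (approximation pattern)."""

    def __init__(self, flush_every=500):
        self.flush_every = flush_every
        self.pending = 0    # views counted but not yet written to the DB
        self.db_views = 0   # stand-in for the counter stored in MongoDB
        self.db_writes = 0  # how many DB writes we actually issued

    def record_view(self):
        self.pending += 1
        if self.pending >= self.flush_every:
            self.flush()

    def flush(self):
        # One write increments by the whole batch, e.g.
        # collection.update_one({"_id": page_id},
        #                       {"$inc": {"views": self.pending}})
        self.db_views += self.pending
        self.db_writes += 1
        self.pending = 0

counter = ApproximateCounter(flush_every=500)
for _ in range(1200):
    counter.record_view()

# 1200 views -> only 2 DB writes; 200 views are still pending in memory.
# That slight undercount is acceptable because the result only needs to
# be approximate.
```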

So, I hope the example explained the pattern in a better way than the definition.

We should use the Approximation pattern when the use-case is write-intensive and when we can afford less correctness/accuracy of data.

In the next part, we will discuss the next category of schema patterns, i.e. grouping patterns, which are more advanced and fun to learn.

Hope this article was helpful. Thanks for reading, and let me know what you think about these schema patterns in the comments.

My Linkedin ☺️

Part-3


Shanmukh

Senior Backend Engineer who loves to explore technologies.