How to design your schema in MongoDB
What are Schema Patterns?
Schema patterns are very similar to design patterns (Gang of Four), but they focus on designing the schema. They are building blocks that developers have identified over years of experience, and they help you design the schema of your tables/collections so that the application scales better for reads and writes.
There are no perfect schema patterns. These are just guidelines and some industry standards you can follow to build a better schema.
You can define your own schema pattern based on your application’s use-case.
You can also use multiple schema patterns together to solve one particular use-case.
Let me repeat: no pattern can be applied as plug and play.
You need to modify it to fit your use-case.
Schema patterns are not limited to one particular database; they apply to all of them. For the purposes of this article, though, most of the examples and schemas are defined using MongoDB.
Before designing any schema, you need to estimate the following:
- The scale you expect.
- Whether the use-case is write-intensive or read-intensive.
- The cost of the database.
We will discuss each pattern with examples from a movie database. (Who doesn’t love movies?)
You might already be using a lot of these patterns in your application, but it is good to know the vocabulary of these schema designs/patterns.
Categories
We can categorize the patterns into 3 categories.
- Representation
- Frequency of Access
- Grouping
Representation Patterns
These patterns focus more on the representation of the schema.
Attribute Pattern
Document Versioning Pattern
Polymorphic Pattern
1. Attribute Pattern
The attribute pattern is well suited to collections/tables whose documents share a subset of similar fields, and where you query on those fields.
Let’s say we have a movie which is released in different countries and on different dates.
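As a rough sketch of that starting point, here is what such a document might look like with one top-level field per country. The field names release_india and release_dubai come from the discussion here; the title, the release_usa field, and all dates are made up for illustration.

```python
from datetime import datetime

# Hypothetical movie document: one top-level field per release country.
movie = {
    "title": "Example Movie",
    "release_india": datetime(2020, 1, 10),
    "release_dubai": datetime(2020, 1, 17),
    "release_usa": datetime(2020, 2, 7),
}

# Each country filter targets a different field name.
query_india = {"release_india": {"$gt": datetime(2020, 1, 1)}}
query_dubai = {"release_dubai": {"$gt": datetime(2020, 1, 1)}}

# One single-field index specification per country field.
indexes_needed = [[(field, 1)] for field in movie if field.startswith("release_")]
```

The number of index specifications grows with the number of countries, which is the maintenance problem this pattern tries to remove.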
The disadvantage of the above schema shows up when you filter. Say you want all the movies released in India after a particular date: you need to create an index on release_india. And if you also filter on Dubai, you have to create another index on release_dubai.
Let’s modify the schema so that everything lives in a single field, releases.
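A possible shape for this intermediate step, assuming releases is a sub-document keyed by country (the title, country keys, and dates are illustrative only):

```python
from datetime import datetime

# Hypothetical document: all release dates grouped under one sub-document.
movie = {
    "title": "Example Movie",
    "releases": {
        "india": datetime(2020, 1, 10),
        "dubai": datetime(2020, 1, 17),
        "usa": datetime(2020, 2, 7),
    },
}

# A country filter still has to name a country-specific path.
query_india = {"releases.india": {"$gt": datetime(2020, 1, 1)}}

# The distinct paths a per-country filter would touch.
country_paths = sorted(f"releases.{country}" for country in movie["releases"])
```

The fields are grouped, but every country is still its own key, so the query paths remain country-specific.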
Now all the releases come under one field, and it is tempting to create a single index on releases. That does not help, though: an index on an embedded document only supports queries that match the whole sub-document exactly, so a filter on one country would still need its own index, such as releases.india. Now let's modify the schema and make it more generic.
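One way the more generic shape could look is a list of sub-documents, each with a country and a date attribute. This is only a sketch: the sample data is invented, and released_after is a hypothetical pure-Python stand-in for what the $elemMatch query would do on a real server.

```python
from datetime import datetime

# Hypothetical documents: each release is an element with common attributes.
movies = [
    {
        "title": "Example Movie",
        "releases": [
            {"country": "India", "date": datetime(2020, 1, 10)},
            {"country": "Dubai", "date": datetime(2020, 1, 17)},
        ],
    },
    {
        "title": "Another Movie",
        "releases": [
            {"country": "India", "date": datetime(2019, 6, 1)},
        ],
    },
]

# Query shape: $elemMatch requires one array element to satisfy both conditions.
query = {
    "releases": {
        "$elemMatch": {"country": "India", "date": {"$gt": datetime(2020, 1, 1)}}
    }
}

# A single index specification now serves every country.
index_spec = [("releases.country", 1)]

def released_after(movie, country, after):
    """Pure-Python stand-in for the $elemMatch query above."""
    return any(
        r["country"] == country and r["date"] > after for r in movie["releases"]
    )

matches = [
    m["title"] for m in movies if released_after(m, "India", datetime(2020, 1, 1))
]
```

Adding a new country is now just another array element; neither the query shape nor the index has to change.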
For the above, you need just one index, on releases.country, which is much easier to manage (if you also filter on the date, a compound index on releases.country and releases.date covers both). We have also defined the attributes of each release, i.e. date and country. This is what the attribute pattern is all about: define your schema so that similar fields are grouped into one field with some common attributes.
2. Document Versioning Pattern
As the name suggests, we store versions of the documents. But how do we do that efficiently, and why would we want to store the versions? The answer is simple: we can use the versions for analytics, or to revert to a previous document (especially as a fallback when you are migrating data).
How do we do it? It’s simple: we can maintain versions of the document, just like git does.
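A minimal sketch of this naive approach, where every version is stored as its own document in the same collection. The revision and available_languages fields follow the discussion here; movie_id, the _id values, and the data are assumptions, and latest_revision is a hypothetical helper that mimics sorting on revision in descending order and taking the first result.

```python
# Two hypothetical revisions of the same movie stored in one collection.
# Each revision is a separate document, so each one gets its own _id.
movie_versions = [
    {"_id": 1, "movie_id": "m100", "revision": 1, "title": "Example Movie",
     "available_languages": ["English"]},
    {"_id": 2, "movie_id": "m100", "revision": 2, "title": "Example Movie",
     "available_languages": ["English", "Hindi"]},
]

def latest_revision(docs, movie_id):
    """Return the newest revision of a movie (sort by revision, take the top)."""
    candidates = [d for d in docs if d["movie_id"] == movie_id]
    return max(candidates, key=lambda d: d["revision"])

latest = latest_revision(movie_versions, "m100")
```

Every read of the current state has to go through this "find the highest revision" step.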
Here, as you can see, we have modified available_languages, so the newer version of the document is marked with a new revision value.
The issue is that to retrieve the latest document you have to sort on the version and take the newest one, which can hurt performance and consume a lot of RAM. The _id also changes with every version, which may cause a lot of issues.
To avoid this, we create a new collection to log the versions and keep only the latest version in the original collection. For analytics and other purposes, we can use the log collection.
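The two-collection idea can be sketched with in-memory stand-ins. Everything here is hypothetical: movies and movie_versions play the role of the main and log collections, and save_movie is an illustrative helper, not a MongoDB API. The sketch writes to the log synchronously for simplicity; in a real system that write could be done asynchronously.

```python
import copy

movies = {}          # stand-in for the main collection, keyed by _id
movie_versions = []  # stand-in for the log collection

def save_movie(doc_id, changes):
    """Archive the current document in the log, then update it in place."""
    current = movies.get(doc_id)
    if current is not None:
        # Keep the outgoing version for analytics or rollback.
        movie_versions.append(copy.deepcopy(current))
    revision = current["revision"] + 1 if current else 1
    movies[doc_id] = {
        **(current or {}), **changes, "_id": doc_id, "revision": revision,
    }
    return movies[doc_id]

save_movie("m100", {"title": "Example Movie", "available_languages": ["English"]})
save_movie("m100", {"available_languages": ["English", "Hindi"]})
```

The main collection always holds exactly one document per movie with a stable _id, so reading the latest version needs no sort.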
The advantage of this pattern is that, with two collections, there are fewer documents in the original collection, which helps read performance. The sync to the log collection can also be done asynchronously, which helps reduce response time.
The disadvantages are that you write to multiple collections and that the two can become inconsistent, since both have to be maintained.
3. Polymorphic Pattern
This pattern is used when we have documents that have more similarities than differences. In such cases it's better to accommodate all such documents in one collection.
For example, let's consider a Profiles collection in a movie database. Profiles can be those of directors, actors, actresses, and other technicians. Instead of creating a separate collection for each category, we keep all of them in one collection, since they have more similarities (name, age, years of experience) than differences (private albums for musicians; physical attributes such as chest, height, and weight for actors and actresses, which are not generally recorded for the others). The example may not be great, but I am sure you have understood the basic idea: use-cases with more similarities and fewer differences are a good fit for the polymorphic pattern.
This pattern fits MongoDB well because it does not enforce a fixed schema on a collection by default, so one collection can hold multiple profiles with different attributes.
The advantage of this pattern is that when we want to retrieve all profiles, we can get them from one collection instead of fetching from multiple collections and aggregating them, which would be costly.
The disadvantage is that the schema gets complicated over time as we add more profile categories, and validating that kind of schema on the application side becomes difficult.
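To make this concrete, here is a small sketch of what a single Profiles collection might hold. All names, values, and field names (including the type discriminator) are invented for illustration, and extra_fields is a hypothetical helper.

```python
# Hypothetical profiles of different kinds stored in one collection.
profiles = [
    {"name": "Director One", "age": 45, "years_of_experience": 20,
     "type": "director"},
    {"name": "Actor One", "age": 32, "years_of_experience": 8,
     "type": "actor", "height_cm": 180, "weight_kg": 75},
    {"name": "Musician One", "age": 38, "years_of_experience": 15,
     "type": "musician", "private_albums": ["First Album"]},
]

# Fields every profile shares, regardless of its type.
COMMON_FIELDS = {"name", "age", "years_of_experience", "type"}

def extra_fields(profile):
    """Return the fields that are specific to this kind of profile."""
    return sorted(set(profile) - COMMON_FIELDS)

# Listing everyone is a single pass over one collection.
all_names = [p["name"] for p in profiles]
actor_extras = extra_fields(profiles[1])
```

A type field like this is one common way to tell the application which extra attributes to expect on each document.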
From the above, we can see that a lot of attributes or fields are common.
Key points to remember
- Schema patterns are not limited to MongoDB.
- No pattern is plug and play. It needs to be tweaked to fit your use-case.
- We can use a combination of patterns to solve the use-case.
- There can be a lot of other patterns but these are some standard patterns.
I like articles to be short, so I will wrap up here; we will discuss more advanced schema patterns in the next parts.
I hope this article was helpful. Thanks for reading, and let me know what you think about schema patterns in the comments.