MongoDB® Schema Design: There Is Always A Schema

When MongoDB was introduced a few years ago, one of its most touted features was being “schemaless”. What does this mean for your documents?

By default, MongoDB does not enforce any schema on the documents stored in a collection. It essentially stores JSON documents, and each document can have any structure you want. Consider some examples from our “contacts” collection below. Here is one document that you can store:

  {
    'name': 'user1',
    'address': '1 mountain view',
    'phone': '123-324-3308',
    'SSN': '123-45-7891'
  }

Now the second document stored in the collection can be of this format:

  {
    'name': 'user2',
    'employeeid': 546789
  }

It’s pretty cool that you can store both of these documents in the same collection. The problem, however, starts when you need to retrieve them. How do you tell whether a retrieved document is in format 1 or format 2? You can check whether the retrieved document contains the ‘SSN’ field and then decide. Another option is to store the type of the document in the document itself:

  {
    'type': xxx,
    'name': ....
  }

In both cases, you have moved schema enforcement from the database to the application:

There is always a schema; it is just a question of where it is implemented.
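Here is a minimal sketch of what this application-side enforcement can look like, in plain JavaScript. The helper name `contactFormat` and the format labels `"person"` and `"employee"` are hypothetical, chosen only for illustration; the field names come from the sample documents above.

```javascript
// Hypothetical helper: work out which contact format a retrieved document uses.
// The labels "person" and "employee" are illustrative, not part of any MongoDB API.
function contactFormat(doc) {
  // Second option from the text: trust an explicit 'type' field when present.
  if (typeof doc.type === "string") {
    return doc.type;
  }
  // First option from the text: infer the format from the fields present.
  if ("SSN" in doc) {
    return "person";
  }
  if ("employeeid" in doc) {
    return "employee";
  }
  // Neither rule matched: the application's implicit schema was violated.
  throw new Error("Unrecognized contact document");
}

console.log(contactFormat({ name: "user1", SSN: "123-45-7891" })); // person
console.log(contactFormat({ name: "user2", employeeid: 546789 })); // employee
```

Every code path that reads from the collection needs a check along these lines, which is the schema living in the application rather than in the database.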

The right queries and indexes alleviate the problem to a certain extent. If the majority of your queries are by ‘employeeid’, you know that the retrieved document is always of the second format. However, the rest of your code that does not query by this field will still have the problem mentioned above. Also, if you are using an ODM like Mongoose, it already enforces a schema for you on top of MongoDB.

Several kinds of applications benefit from this flexibility. One scenario that comes to mind is a schema with a large number of optional fields. In MongoDB, there is no penalty for omitting fields: each document can contain only the fields it needs.

Document Validation

Starting with version 3.2, MongoDB supports schema validation through the “validator” option. It offers several levels of validation, so you can choose the one that works for you. If you don’t specify a validator, you get the previous schemaless behaviour by default. Typically, you create the validator when you create the collection:

   db.createCollection("contacts",
      { validator: { $or: [
            { employeeid: { $exists: true } },
            { SSN: { $exists: true } }
      ] } }
   )
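To see which documents this rule accepts, the check can be mimicked in plain JavaScript. This is only an illustration of the `$or` / `$exists` logic: MongoDB evaluates the real validator on the server. The function name `passesContactValidator` and the `user3` document are hypothetical.

```javascript
// Plain-JavaScript stand-in for the validator above (illustration only).
// $exists: true matches when the field is present, whatever its value.
function passesContactValidator(doc) {
  const has = (field) => Object.prototype.hasOwnProperty.call(doc, field);
  return has("employeeid") || has("SSN");
}

const user1 = { name: "user1", address: "1 mountain view", phone: "123-324-3308", SSN: "123-45-7891" };
const user2 = { name: "user2", employeeid: 546789 };
const user3 = { name: "user3" }; // hypothetical document with neither field

console.log(passesContactValidator(user1)); // true
console.log(passesContactValidator(user2)); // true
console.log(passesContactValidator(user3)); // false
```

With the validator in place and the default settings, an insert of a document like `user3` is rejected by the server instead of reaching your application code.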

Existing Collections

Existing collections can be updated using the ‘collMod’ command:

  db.runCommand({
    collMod: "contacts",
    validator: { $or: [ { employeeid: { $exists: true } }, { SSN: { $exists: true } } ] }
  })

Validation Level

MongoDB supports the concept of a ‘validationLevel’. The default validation level is ‘strict’, which means that validation applies to all inserts and updates, and they fail if the document does not meet the validation criteria. If the validation level is ‘moderate’, validation applies to inserts and to updates of existing documents that already meet the validation criteria. Updates to existing documents that don’t meet the criteria are not validated. While convenient, the ‘moderate’ validation level can get you into trouble down the line, so use it with care.
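As a sketch, the level can be changed on the “contacts” collection with the `collMod` command and its `validationLevel` option. The snippet builds the command as a plain object and sends it only when the shell's global `db` handle is available; the variable name `setModerateLevel` is an assumption for illustration.

```javascript
// Command document that relaxes validation on the "contacts" collection.
const setModerateLevel = {
  collMod: "contacts",
  validationLevel: "moderate" // other accepted values: "strict" (default), "off"
};

// In the mongo shell, `db` is a global; outside the shell this step is skipped.
if (typeof db !== "undefined") {
  db.runCommand(setModerateLevel);
}
```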

Validation Action

By default, the validation action is ‘error’: if a document fails validation, the insert or update fails. However, you can also set the validation action to ‘warn’, which logs the schema violation in the MongoDB log but does not fail the insert or update.
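The action is set the same way, through the `validationAction` option of `collMod`. Again a minimal sketch: the variable name `setWarnAction` is hypothetical, and the command is only sent when run inside the shell.

```javascript
// Command document that switches "contacts" from rejecting to logging violations.
const setWarnAction = {
  collMod: "contacts",
  validationAction: "warn" // the default is "error"
};

// Only runs inside the mongo shell, where `db` is defined.
if (typeof db !== "undefined") {
  db.runCommand(setWarnAction);
}
```

Running with ‘warn’ for a while can be a gentle way to find out how many writes would fail before switching to ‘error’.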

What schema design examples would help you on your next project? Let us know!
