When to Use GridFS on MongoDB®?

Feb 25, 2014

GridFS is a simple file system abstraction on top of MongoDB®. If you’re familiar with Amazon S3, GridFS will feel similar: you store and retrieve whole files by name. So why does a document-oriented database like MongoDB provide a file layer? It turns out there are some very good reasons:

Storing user-generated file content

A large number of web applications allow users to upload files. Historically, with relational databases, these user-generated files were stored on the file system, separate from the database. This creates a number of problems. How do you replicate the files to all of the servers that need them? How do you delete every copy when a file is deleted? How do you back up the files for safety and disaster recovery? GridFS solves these problems by storing the files in the database itself, so your database backup also backs up your files. Thanks to MongoDB replication, a copy of each file is also stored on every replica. Deleting a file is as easy as deleting an object in the database.

Accessing portions of file content

When a file is uploaded to GridFS, it is split into chunks (256 KB each by default in the version shown here; newer MongoDB releases default to 255 KB), and each chunk is stored as a separate document. So, when you need to read only a certain range of bytes, only the chunks covering that range are brought into memory, not the whole file. This is extremely useful when dealing with large media content that needs to be selectively read or edited.
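
To make the range-read idea concrete, here is a minimal sketch of the arithmetic involved. The function name chunksForRange is hypothetical (it is not part of any driver), and it assumes the 256 KB chunk size used in this article; drivers do this bookkeeping for you.

```javascript
// Sketch: which GridFS chunks cover a given byte range?
// chunkSize defaults to 262144 bytes (256 KB), the value used in this article.
function chunksForRange(offset, length, chunkSize = 262144) {
  if (offset < 0 || length <= 0) {
    throw new RangeError("offset must be >= 0 and length must be > 0");
  }
  // Chunk numbers start at 0, so integer division gives the chunk index.
  const first = Math.floor(offset / chunkSize);
  const last = Math.floor((offset + length - 1) / chunkSize);
  return { first, last, count: last - first + 1 };
}

// Reading 100,000 bytes starting at byte 300,000 touches only chunk 1.
console.log(chunksForRange(300000, 100000));
```

For a large video file, this is the difference between loading a handful of chunks and loading the entire file.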

Storing documents greater than 16MB in MongoDB

MongoDB caps the size of a single BSON document at 16MB. So, if you have content larger than 16MB, you can store it using GridFS, which splits it into chunks that each fit well under that limit.
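
As a rough illustration, the check below decides whether a payload is too large for a single document. The helper name needsGridFS is hypothetical, and the comparison is only approximate, since a real document also spends some bytes on field names and other BSON overhead.

```javascript
// Sketch: does a payload exceed the 16MB BSON document limit?
const MAX_BSON_DOCUMENT_BYTES = 16 * 1024 * 1024; // 16777216 bytes

function needsGridFS(payloadBytes) {
  // Approximate: ignores the extra bytes used by the enclosing document.
  return payloadBytes > MAX_BSON_DOCUMENT_BYTES;
}

console.log(needsGridFS(1644981));          // false
console.log(needsGridFS(20 * 1024 * 1024)); // true
```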

Overcoming file system limitations

If you’re storing a large number of files, you’ll need to consider file system limitations such as the maximum number of files per directory. With GridFS, you don’t need to worry about these limits. Also, with GridFS and MongoDB sharding, you can distribute your files across different servers without significantly increasing the operational complexity.

GridFS – Behind the scenes

GridFS uses two collections to store the data:

> show collections;
fs.chunks
fs.files
system.indexes
>

The fs.files collection contains metadata about the files, and the fs.chunks collection stores the actual 256k chunks. If the chunks collection is sharded, the chunks are distributed across different servers, and you might get better performance than a filesystem!

> db.fs.files.findOne();
{
  "_id" : ObjectId("530cf1bf96038f5cb6df5f39"),
  "filename" : "./conn.log",
  "chunkSize" : 262144,
  "uploadDate" : ISODate("2014-02-25T19:40:47.321Z"),
  "md5" : "6515e95f8bb161f6435b130a0e587ccd",
  "length" : 1644981
}
>
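
The length and chunkSize fields are enough to work out how the file is laid out in fs.chunks. The sketch below uses a hypothetical helper, chunkLayout, with the values copied from the findOne() output above.

```javascript
// Sketch: derive the chunk layout from an fs.files document.
function chunkLayout(fileDoc) {
  const { length, chunkSize } = fileDoc;
  const chunkCount = Math.ceil(length / chunkSize);
  // The final chunk holds whatever is left after the full-size chunks.
  const lastChunkBytes =
    chunkCount === 0 ? 0 : length - (chunkCount - 1) * chunkSize;
  return { chunkCount, lastChunkBytes };
}

// Values taken from the sample fs.files document.
console.log(chunkLayout({ length: 1644981, chunkSize: 262144 }));
// { chunkCount: 7, lastChunkBytes: 72117 }
```

So the 1.6MB log file is stored as six full 256k chunks plus one smaller final chunk.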

MongoDB also creates a compound index on files_id and the chunk number (the n field) to help quickly access the chunks:

> db.fs.chunks.getIndexes();
[
  {
    "v" : 1,
    "key" : {
      "_id" : 1
    },
    "ns" : "files.fs.chunks",
    "name" : "_id_"
  },
  {
    "v" : 1,
    "key" : {
      "files_id" : 1,
      "n" : 1
    },
    "ns" : "files.fs.chunks",
    "name" : "files_id_1_n_1"
  }
]
>
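
To see why this index matters, consider the shape of a query that fetches a run of chunks for one file. The sketch below only builds the filter and sort objects; the helper name chunkQuery is hypothetical, and the file ID is passed as a plain string for illustration where a real query would use the file's ObjectId.

```javascript
// Sketch: filter and sort for fetching chunks first..last of a single file.
// Both the equality on files_id and the range on n are covered by the
// { files_id: 1, n: 1 } compound index.
function chunkQuery(filesId, first, last) {
  return {
    filter: { files_id: filesId, n: { $gte: first, $lte: last } },
    sort: { n: 1 },
  };
}

// Placeholder ID string; in the shell this would be an ObjectId value.
const q = chunkQuery("530cf1bf96038f5cb6df5f39", 1, 3);
console.log(JSON.stringify(q));
```

In the mongo shell, objects like these could be passed along the lines of db.fs.chunks.find(q.filter).sort(q.sort), although in practice your driver's GridFS API issues the equivalent query for you.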

MongoDB GridFS Examples

MongoDB ships with a command-line utility called “mongofiles” that lets you exercise the GridFS scenarios. To use GridFS from your application, please refer to your driver’s documentation.

Put
# mongofiles -h <hostname> -u <username> -p <password> --db files put /conn.log
connected to: 127.0.0.1
added file: { _id: ObjectId('530cf1009710ca8fd47d7d5d'), filename: "./conn.log", chunkSize: 262144, uploadDate: new Date(1393357057021), md5: "6515e95f8bb161f6435b130a0e587ccd", length: 1644981 }
done!

Get
# mongofiles -h <hostname> -u <username> -p <password> --db files get /conn.log
connected to: 127.0.0.1
done write to: ./conn.log

List
# mongofiles -h <hostname> -u <username> -p <password> --db files list
connected to: 127.0.0.1
/conn.log 1644981

Delete
# mongofiles -h <hostname> -u <username> -p <password> --db files delete /conn.log
connected to: 127.0.0.1
done!

GridFS Modules

If you’d like to serve the file data stored in MongoDB directly from your web server or file system, there are several GridFS plugin modules available:

GridFS-Fuse – Plugs GridFS into the filesystem
GridFS-Nginx – Plugin to serve GridFS files directly from Nginx

GridFS Limitations

Working Set

Serving files along with your database content can significantly churn your memory working set. If you don’t want to disturb your working set, it might be best to serve your files from a different MongoDB server.

Performance

The file serving performance will be slower than natively serving the file from your web server and filesystem. However, the added management benefits might be worth the slowdown.

Atomic update

GridFS does not provide a way to update a file atomically. If you need this, you’ll have to maintain multiple versions of your files and pick the right version when reading.
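
One way to approach the versioning workaround is to upload each new version under the same filename and treat the most recent uploadDate as current. The sketch below assumes that convention; the helper name latestVersion and the sample metadata are hypothetical.

```javascript
// Sketch: choose the newest upload among fs.files-style documents
// that share a filename.
function latestVersion(files, filename) {
  const matches = files.filter((f) => f.filename === filename);
  if (matches.length === 0) return null;
  // Date objects compare by timestamp, so > picks the later upload.
  return matches.reduce((newest, f) =>
    f.uploadDate > newest.uploadDate ? f : newest
  );
}

// Hypothetical metadata: two uploads of the same file name.
const uploads = [
  { _id: "v1", filename: "./conn.log", uploadDate: new Date("2014-02-25T19:40:47.321Z") },
  { _id: "v2", filename: "./conn.log", uploadDate: new Date("2014-02-26T08:00:00.000Z") },
];
console.log(latestVersion(uploads, "./conn.log")._id); // v2
```

Readers keep seeing the old version until the new upload has finished writing, and older versions can be deleted once they are no longer needed.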

