Redis® hashes are (intuitively enough!) hashes that map string names to string values. They are essentially named containers of unique fields and their values. They are the perfect way to represent an object as a Redis data structure. As expected, they provide constant-time basic operations like get, set, exists, etc. A bunch of advanced operations are also provided. The full list of hash commands is here.
Let’s take it for a spin from the redis-cli.
# hmset key field value [field value ...] : Insert elements in a hash. O(N), N is # of field being set 127.0.0.1:6379> hmset std:101 name "John Smith" dob "01-01-2000" gender M active 0 cgpa 2.9 OK # hgetall key : key all keys and values in the hash. O(N), N is size of hash 127.0.0.1:6379> hgetall std:101 1) "name" 2) "John Smith" 3) "dob" 4) "01-01-2000" 5) "gender" 6) "M" 7) "active" 8) "0" 9) "cgpa" 10) "2.9" 127.0.0.1:6379> hmset std:102 name "Jane" name "Ann" OK # If duplicates are found, only the last set is valid 127.0.0.1:6379> hgetall std:102 1) "name" 2) "Ann" # hexists key field: does field exist in the hash with key. O(1) 127.0.0.1:6379> hexists std:102 cgpa (integer) 0 # hincrby key field increment: Increment the integer field by increment. O(1) 127.0.0.1:6379> hincrby std:101 active 1 (integer) 1 # hget key field : the value for field in the hash stored at key. O(1) 127.0.0.1:6379> hget std:101 active 1) "1" # If field doesn't exist, hincrby sets it to 0 and then applies increment 127.0.0.1:6379> hincrby std:102 active 2 (integer) 2 # hmget key field [field ...]: the values of the fields requested for the hash with key. O(N), N is # of keys requested 127.0.0.1:6379> hmget std:102 active 1) "2" # hincrbyfloat key field increment: Increment the float value in field by increment. O(1) 127.0.0.1:6379> HINCRBYFLOAT std:101 cgpa 1.0 "3.9" # HSETNX key field value: set field to value if not alread set. O(1) 127.0.0.1:6379> hsetnx std:102 cgpa 4.0 (integer) 1 127.0.0.1:6379> hget std:102 cgpa "4.0" # hlen key: number of fields in the hash. O(1) 127.0.0.1:6379> hlen std:101 (integer) 5 # hkeys key : all fields in the hash. O(N), N is size of hash 127.0.0.1:6379> hkeys std:101 1) "name" 2) "dob" 3) "gender" 4) "active" 5) "cgpa"
As we have come to expect from our hosting for Redis™* as a data structure server, Redis provides fairly useful and advanced operations on hashes.
Internals
Like Redis Sets, Redis hashes too are implemented as dictionaries. Dictionaries in Redis are implemented as hash tables that use the hash function MurmurHash2 and grow via incremental resizing. Hash collisions are handled by chaining. More details can be found in the Redis dictionary implementation at dict.c.
As with Sets, storage optimization is made for smaller hashes. This data structure is called ziplist (Hashes were optimized using a different data structure called zip map before Redis 2.6) in the Redis implementation. It is essentially a specially encoded doubly linked list optimized for memory savings.
The data and the pointers are stored inline. Ziplist is also used to optimize storage of smaller sorted sets and lists. When flattened into such a list, a hash looks something like [key1, value1, key2, value2, …]. How is this more efficient than plain keys? Hashes with few keys can be packed cleverly into this linear array-like structure (i.e., the ziplist) while guaranteeing amortized O(1) performance for get and set. This can’t keep up as the hash fields increase.
As the hash grows, it is converted into the standard dictionary structure to maintain O(1) performance, and the space savings are lost. Redis config parameters controlling this transformation are:
- list-max-ziplist-entries default (512): Change to standard representation if the hash grows larger than this limit.
- list-max-ziplist-value default (64): Change to standard representation if the largest element in the hash exceeds this limit.
The code and comments in the implementation found here provide more details. The memory savings from using this special optimization are significant, and we will discuss them in more detail in the next section.
Memory Optimization
One of the well-known recommendations for memory savings while using Redis is to use hashes instead of plain strings. This is an important use case for utilizing the power of Redis hashes in real-world applications. From the official Redis documentation on memory optimization:
Use hashes when possible
Small hashes are encoded in a very small space, so you should try representing your data using hashes every time it is possible. For instance if you have objects representing users in a web application, instead of using different keys for name, surname, email, password, use a single hash with all the required fields.
That post proposes one way to map a range of objects into a set of hashes to maximize memory savings. In a popular blog post, Instagram describes using a similar technique that helped them achieve orders of magnitude of potential savings.
Applications
- Redis Hashes are naturally suited to store objects, such as sessions, users, visitors, etc. This makes them one of Redis’s key data structures.
- In its memory-optimized form, it is an excellent choice for caching large amounts of data.
An Object Address Store
Since memory optimization is an important use case for hashes, let’s discuss an example similar to the Instagram deployment to show how to utilize the memory-saving features of Redis hashes. Let’s say we have a huge Content-addressable storage (CAS) deployment with hundreds of millions of objects stored. Each object’s location is a hash string. We intend to develop a lookup system to find out the object’s location given its ID. The typical way to do this in Redis will be to use a string.
set object:14590860 "007f80f0a62408..."
set object:11678 "009f80abcd0a60..."
...
This approach works just fine. However, since the number of objects we have is huge, we will need a lot of memory for Redis. We want to do better. Let’s take the memory-optimized hash approach to this problem.
We will have to pick the right values for list-max-ziplist-entries and list-max-ziplist-value. The correct value for list-max-ziplist-value is whatever the max length of the storage address hash string can be. The value of list-max-ziplist-entries has to be kept low enough and will depend on the number of total hash buckets we wish to create.
It will be best figured out empirically. E.g. for 100 million objects, we could choose to use 100k hashes. The max entries in that case will be 100m / 100k = 1000. The application’s logic to decide which hash an object’s storage address goes into can be dividing the object ID by 100k and discarding the remainder. Thus, object ID 14590860 will go into hash (14590860/100k) = 145 i.e.
hset object:145 14590860 "007f80f0a62408..."
hget object:145 14590860
> "007f80f0a62408..."
This implementation will not just be much lighter on memory but should also provide good cache locality.
Here are some of our introductory posts in the Redis data structures series: