Redis hashes are (intuitively enough!) hashes that map string names to string values. They are essentially named containers of unique fields and their values. They are the perfect way to represent an object as a Redis data structure. As expected, they provide constant time basic operations like get, set, exists etc. A bunch of advanced operations are also provided. The full list of hash commands is here.
Let’s take it for a spin from the redis-cli.
# hmset key field value [field value ...] : Insert elements in a hash. O(N), N is # of field being set 127.0.0.1:6379> hmset std:101 name "John Smith" dob "01-01-2000" gender M active 0 cgpa 2.9 OK # hgetall key : key all keys and values in the hash. O(N), N is size of hash 127.0.0.1:6379> hgetall std:101 1) "name" 2) "John Smith" 3) "dob" 4) "01-01-2000" 5) "gender" 6) "M" 7) "active" 8) "0" 9) "cgpa" 10) "2.9" 127.0.0.1:6379> hmset std:102 name "Jane" name "Ann" OK # If duplicates are found, only the last set is valid 127.0.0.1:6379> hgetall std:102 1) "name" 2) "Ann" # hexists key field: does field exist in the hash with key. O(1) 127.0.0.1:6379> hexists std:102 cgpa (integer) 0 # hincrby key field increment: Increment the integer field by increment. O(1) 127.0.0.1:6379> hincrby std:101 active 1 (integer) 1 # hget key field : the value for field in the hash stored at key. O(1) 127.0.0.1:6379> hget std:101 active 1) "1" # If field doesn't exist, hincrby sets it to 0 and then applies increment 127.0.0.1:6379> hincrby std:102 active 2 (integer) 2 # hmget key field [field ...]: the values of the fields requested for the hash with key. O(N), N is # of keys requested 127.0.0.1:6379> hmget std:102 active 1) "2" # hincrbyfloat key field increment: Increment the float value in field by increment. O(1) 127.0.0.1:6379> HINCRBYFLOAT std:101 cgpa 1.0 "3.9" # HSETNX key field value: set field to value if not alread set. O(1) 127.0.0.1:6379> hsetnx std:102 cgpa 4.0 (integer) 1 127.0.0.1:6379> hget std:102 cgpa "4.0" # hlen key: number of fields in the hash. O(1) 127.0.0.1:6379> hlen std:101 (integer) 5 # hkeys key : all fields in the hash. O(N), N is size of hash 127.0.0.1:6379> hkeys std:101 1) "name" 2) "dob" 3) "gender" 4) "active" 5) "cgpa"
As we have come to expect from our hosting for Redis™* as a data structure server, we see that Redis provides fairly useful and advanced operations on hashes.
Like Redis Sets, Redis hashes too are implemented as dictionaries. Dictionaries in Redis are implemented as hash tables that use the hash function MurmurHash2 and grow via incremental resizing. Hash collisions are handled by chaining. More details can be found in the Redis implementation of the dictionary at dict.c.
As with Sets, there is storage optimization made for smaller hashes. This data structure is called ziplist (Hashes were optimization using a different data structure called zipmap before Redis 2.6) in the Redis implementation. It is essentially a specially encoded doubly linked list that is optimized for memory savings. The data, as well as the pointers are stored inline. Ziplist is also used to optimize storage of smaller sorted sets and lists. A hash when flattened into such a list looks something like, [key1, value1, key2, value2, ...]. How is this more efficient than plain keys? Hashes with few keys can be packed cleverly into this linear array like structure (i.e. the ziplist) while still guaranteeing amortized O(1) performance for get and set. Obviously this can’t keep up as the hash fields increase. As the hash grows, it is converted into the standard dictionary structure to maintain O(1) performance and the space savings are lost. Redis config parameters controlling this transformation are:
- list-max-ziplist-entries default (512): Change to standard representation if hash grows larger than this limit.
- list-max-ziplist-value default (64): Change to standard representation if the largest element in the hash becomes larger than this limit.
More details can be understood from the code and comments in the implementation found here. The memory savings from using this special optimization are significant. We will talk about in more details in the next section.
One of the well known recommendations for memory savings while using Redis is to use hashes instead of plain strings. This is an important use-case for utilizing the power of Redis hashes in real world applications. From the official Redis documentation on memory optimization:
Use hashes when possible
Small hashes are encoded in a very small space, so you should try representing your data using hashes every time it is possible. For instance if you have objects representing users in a web application, instead of using different keys for name, surname, email, password, use a single hash with all the required fields.
That post then goes on to propose one way to map a range of objects into a set of hashes to take advantage of the memory savings. Instagram, in a very popular blog post, describes using a similar technique which helped them achieve orders of magnitude of potential savings. Another blog that attempts to measure the benefits of the optimization is this.
- Redis Hashes are naturally suited to store objects: sessions, users, visitors etc. This makes it one of key data structures provided by Redis.
- In its memory optimized form, it is an excellent choice for caching large amounts of data.
An Object Address Store
Since, memory optimization is an important use case for hashes, let’s discuss an example similar to the Instagram deployment to show how to utilize memory saving features of Redis hashes. Let’s say, we have a huge Content-addressable storage (CAS) deployment with hundreds of millions of objects stored. Each object’s location is a hash string. We intend to develop a lookup system to find out the object’s location given it’s ID. The typical way to do this in Redis will be to use a string.
set object:14590860 "007f80f0a62408..."
set object:11678 "009f80abcd0a60..."
This approach works just fine. However since the number of objects we have is huge, we will end up needing a lot of memory for Redis. We want to do better. Let’s take the memory optimized hash approach to this problem. We will have to pick the right values for list-max-ziplist-entries and list-max-ziplist-value. The correct value for list-max-ziplist-value is whatever the max length of the storage address hash string can be. The value of list-max-ziplist-entries has to be kept low enough and will depend on the number of total hash buckets we wish to create. It will be best figured out empirically. For e.g. for 100 million objects we could choose to use 100k hashes. The max entries in that case will be 100m / 100k = 1000. The application’s logic to decide which hash an object’s storage address goes into can be: divide the object ID by 100k and discard the remainder. Thus object ID 14590860 will go into hash (14590860/100k) = 145 i.e.
hset object:145 14590860 "007f80f0a62408..."
hget object:145 14590860
This implementation will not just be much lighter on memory but should also provide good cache locality.
Here are our other posts in the Redis data structure series.