Redis (Remote Dictionary Server) is an immensely popular in-memory key-value store which also provides for optional durability, partitioning, replication, and a host of other features. It’s currently the most popular key-value database and is known for it’s simplicity, low memory footprint, and having a low learning curve. Redis is also referred to as a data structure server and has support for atomic operations on data structures like hashes, lists, sets, sorted sets, bitmaps and hyperloglogs. In this post, we’ll look at the set data type provided by Redis along with its usage and real life use cases.
Redis Sets are unordered collections of strings (a string is the basic Redis value which can contain almost anything) that provides constant time addition, removal and membership checks. Redis also supports reasonably fast union, intersection, and subtraction operations between sets. As expected, it does not allow repeated values.
Here are some examples of Redis sets in action from the redis-cli. Here’s a summary of key representations in the example below:
- users:all represents a set of all users registered on a website.
- users:actv represents active users.
- users:inactv represents inactive users (users who haven’t visited the site for a while).
# sadd key member [member ...] : add members to a set 127.0.0.1:6379> sadd users:all 11 12 13 14 15 Rocket Pocket Socket (integer) 8 127.0.0.1:6379> type users:all set # smembers key: get all members of a set 127.0.0.1:6379> smembers users:all 1) "Pocket" 2) "11" 3) "Socket" 4) "13" 5) "14" 6) "Rocket" 7) "12" 8) "15" # sismember key member: is the given value a member of the set? 127.0.0.1:6379> sismember users:all 00 (integer) 0 127.0.0.1:6379> sismember users:all 11 (integer) 1 127.0.0.1:6379> sismember users:all Socket (integer) 1 127.0.0.1:6379> sadd users:inactv 11 12 13 (integer) 3 # sinter key [key ...]: Intersection of multiple sets # Similar: sinterstore stores the result of intersection of sets to a new set 127.0.0.1:6379> sinter users:all users:inactv 1) "11" 2) "12" 3) "13" # scard key: cardinality of the set i.e. number of members present in the set 127.0.0.1:6379> scard users:all (integer) 8 # sdiff key [key ...] : Subtract multiple sets # Similar: sdiffstore: subtract and store result in a new destination set 127.0.0.1:6379> sdiff users:all 1) "Pocket" 2) "11" 3) "Socket" 4) "13" 5) "14" 6) "12" 7) "Rocket" 8) "15" 127.0.0.1:6379> sdiff users:all users:inactv 1) "14" 2) "Pocket" 3) "Rocket" 4) "Socket" 5) "15" # sdiffstore destination key [key ...] 127.0.0.1:6379> sdiffstore users:actv users:all users:inactv (integer) 5 127.0.0.1:6379> smembers users:actv 1) "14" 2) "Pocket" 3) "Rocket" 4) "Socket" 5) "15" 127.0.0.1:6379> sdiff users:all users:actv users:inactv (empty list or set) # smove source destination member: move a member from source set to destination. 127.0.0.1:6379> smove users:inactv users:actv 11 (integer) 1 127.0.0.1:6379> smembers users:actv 1) "Pocket" 2) "11" 3) "Socket" 4) "14" 5) "Rocket" 6) "15"
Other important set commands include:
- SUNION – set union
- SPOP – randomly remove an element
- SREM – remove one or more elements
The complete list of set related Redis commands can be found here.
Redis internally stores sets as dictionaries. Dictionaries in Redis are implemented as hash tables that use the hash function MurmurHash2 and grow via incremental resizing. Hash collisions are handled by chaining. Sets have a special encoding for small sets when all members of a set are in radix 10 in the range # of 64 bit signed integers called IntSets. This is essentially a sorted array of integers. Searches within the array are performed through binary search. Obviously, this implementation is efficient for very small sets. The size up to which this encoding is used is governed by set-max-intset-entries config parameter. Default value is 512. A good description of internal data structures used by Redis can be found here.
Here’s a small list of a few of the possible Redis Set applications:
- As a set, it can be used to track unique items:
- All unique IP addresses visiting your site.
- All unique jobs instances currently in a given state, etc.
- Again, as a set, it can be used to denote as “belongs to” or similar relationship:
- All SKUs belonging to a particular category.
- All objects with a particular tag, etc.
- Sets can only be used to combine relationships, i.e. union/intersection/subtraction of sets:
- All SKUs that belong to category of t-shirts, but not a subcategory of polo necks.
Let’s pick up a real life example and explore set related use cases further.
Visual Profiler for Your eBook Store
Let’s say you’re the owner for a very large online book store that lists millions of titles. Your primary database is MongoDB and it performs fairly well for most of your use cases with the correct use of indexing, sharding, etc. Here’s a partial DB document schema for the books collections:
... sku: SKU, pid: Product ID, title: String, auth: String, pub: String, isbn: ISBN, edition: Int, price: Float, rating: Float, cats: [String, String ...], tags: [String, String ...], ...
You also record transactions in a collection called txns which may look like:
txnid: TxnID, cid: CustomerID, amnt: Float, curr: Currency, sku: [sku1, sku2, ...], time: TimeStamp, ...
And, a collection for views called views:
time: TimeStamp, # daily aggregated cid: CustomerID, sku: [sku1, sku2, ...], ...
Obviously, this is a highly simplified example which makes broad assumptions. Our intent is to show a Redis set example in a (near) real-world scenario.
Ok, so now you, as the store manager, want a Visual Profiler tool to analyze the relationships and customer behavior across different categories. For example:
- What’s the most popular category?
- Do people looking at or buying Science Fiction also look at Non-Fiction?
You want to be able to do this in real-time, i.e. the profiler’s UI will tick boxes, buttons that let you change parameters, and view results (almost) immediately.
Doing such operations on MongoDB will entail performing rather involved queries to join various categories, tags, and other data that you might care about. With a working set that doesn’t fit in-memory, these wouldn’t be the fastest operations. For example:
- Finding all books sold today that were Fiction, but not Science Fiction will involve querying the txn collection of today’s transactions. Then, iterating over the SKUs for collecting their categories and then doing $in/$nin operations.
Let’s see how this would be handled by bringing Redis into the mix. At the end of each day, daily scheduled jobs can run over these MongoDB collections to create Redis sets. The kind of sets you want to create will depend on the kind of filters you want to support on your frontend. For example, say you’d like to support category-related queries, we’ll want to create sets like:
cat:type:catID cat:sku:fiction cat:sku:nonfiction cat:sku:scfiction cat:sku:history cat:sku:milhistory cat:sku:military ... cat:cid:fiction cat:cid:nonfiction cat:cid:scfiction cat:cid:history cat:cid:milhistory cat:cid:military ...
The cat:sku:* sets will contain SKUs of books sold/viewed today in that category. Similarly, cat:cid:* will contain the CID of customers who bought/sold books in that category. A sample of queries we can answer with these sets are:
- Customers (or # of customers) who viewed/bought Fiction (single category) books today: smembers cat:cid:fiction
- Customers who viewed/bought History but not Military History today: sdiff cat:cid:history cat:cid:milhistory
- (Number of) Books sold today which were Science Fiction and Military i.e. military science fiction: sinter cat:sku:scfiction cat:sku:military
- Any number of such union, intersection and difference operations that you care for.
This itself gives us very powerful querying capabilities. Let’s add more sets! Say, we create additional sets based on book ratings. For example:
rat:sku:5str: Set of books with ratings > 4.5 rat:sku:4str: Set of books with ratings > 3.5 && <= 4.5 rat:sku:3str: ... rat:sku:2str: ... ... rat:cid:5str: Set of customer who bought books with ratings as mentioned above. rat:cid:4str: ... ..
Equipped with these sets, you are now able to quickly find out things like:
- 4 star and above rated fiction books purchased today: sunionstore rat:sku:4strabv rat:sku:5str rat:sku:5str/sinter rat:sku:4strabv cat:sku:fiction
- Customer who bought 3 star and above books in history: sunionstore rat:cid:5strabv rat:cid:3str rat:cid:5str rat:cid:5str/sinter cat:cid:history rat:cid:5strabv
Now, let’s say you want to send a discount coupon to all of your customers who bought an Astrology book today with a rating of 2 or below (as an apology for the bad experience of having to read that book!). You can export that list of CustomerIDs out and send it to your email application. Similarly, you could create sets for other things you may wish to expose as filters in your Visual Profiler like tags, price ranges, etc.
The advantages of using Redis Sets are obvious here. The in-memory store will lead to really fast access so that the frontend feels snappy. Additionally, Redis set operations are either constant time or linear.
In this post, we introduced with examples one of the most useful Redis Data Structures: Sets. Here are some of our other posts in the Redis data structures series: