Using JSONB in PostgreSQL®: How to Effectively Store & Index JSON Data in PostgreSQL

24 min read
Using JSONB in PostgreSQL®: How to Effectively Store & Index JSON Data in PostgreSQL

SHARE THIS ARTICLE

JSON stands for JavaScript Object Notation. It is an open standard format that organizes data into key/value pairs and arrays detailed in RFC 7159. JSON is the most common format web services use to exchange data, store documents, unstructured data, etc. In this post, we will show you tips and techniques for effectively storing and indexing JSON data in PostgreSQL.

json popularity

You can also check out our Working with JSON Data in PostgreSQL vs. MongoDB webinar in partnership with PostgresConf to learn more about the topic. Also, check out our SlideShare page to download the slides.

Why Store JSON in PostgreSQL?

Why should a relational database even care about unstructured data? There are a few scenarios where it is useful.

Schema flexibility

Schema flexibility is one of the main reasons for storing data using the JSON format. Storing your data in JSON is useful when your schema is fluid and is changing frequently. Storing each of the keys as columns will result in frequent DML operations. This can be difficult when your data set is large, such as event tracking, analytics, tags, etc. Note: If a particular key is always present in your document, storing it as a first-class column might make sense. We discuss more about this approach in the section “JSON Patterns & Antipatterns” below.

Nested objects

If your data set has nested objects (single or multi-level), it may be easier to handle them in JSON instead of denormalizing the data into columns or multiple tables.

Syncing with external data sources

Often, an external system provides data as JSON, so it might be a temporary store before data is ingested into other parts of the system. For example, Stripe transactions.

Timeline of JSON Support in PostgreSQL

JSON support in PostgreSQL was introduced in 9.2 and has steadily improved in every release going forward.

Wave 1: PostgreSQL 9.2  (2012) added support for JSON data type

JSON database in 9.2 was fairly limited (and probably overhyped at that point) – a glorified string with some JSON validation thrown in. It is useful for validating incoming JSON and storing it in the database. More details are provided below.

Wave 2: PostgreSQL 9.4 (2014) added support for JSONB data type

JSONB stands for “JSON Binary” or “JSON Better”, depending on whom you ask. It is a decomposed binary format to store JSON. JSONB supports indexing the JSON data and is very efficient at parsing and querying it. In most cases, when you work with JSON in PostgreSQL, you should be using JSONB.

Wave 3: PostgreSQL 12 (2019) added support for SQL/JSON standard and JSONPATH queries

JSONPath brings a powerful JSON query engine to PostgreSQL.
PostgreSQL Version Timeline for JSON Support - ScaleGrid Blog

When Should You Use JSON vs. JSONB?

In most cases, you should be using JSONB. However, there are some specific cases where JSON works better:

  • JSON preserves the original formatting (a.k.a whitespace) and ordering of the keys.
  • JSON preserves duplicate keys.
  • JSON is faster to ingest than JSONB; however, if you do any further processing, JSONB will be faster.

For example, if you’re just ingesting JSON logs and not querying them in any way, JSON might be a better option. For this blog, when we refer to JSON support in PostgreSQL, we will refer to JSONB from now on.

JSONB Patterns & Antipatterns

If PostgreSQL supports JSONB, why do we need columns anymore? Why not just create a table with a JSONB blob and get rid of all columns like the schema below:

CREATE TABLE test(id int, data JSONB, PRIMARY KEY (id));

Columns are still the most efficient technique for working with your data. JSONB storage has some drawbacks vs. traditional columns:

PostgreSQL does not store column statistics for JSONB columns

PostgreSQL maintains statistics about the distributions of values in each column of the table – most common values (MCV), NULL entries, and histogram of distribution. Based on this data, the PostgreSQL query planner makes smart decisions on the plan to use for the query. At this point, PostgreSQL does not store any stats for JSONB columns or keys. This can sometimes result in poor choices like nested loop vs. hash joins, etc.

JSONB storage results in a larger storage footprint

JSONB storage does not deduplicate the key names in the JSON. This can result in a considerably larger storage footprint than MongoDB BSON on WiredTiger or traditional column storage. I ran a simple test with the below JSONB model storing about 10 million rows of data, and here are the results – In some ways, this is similar to the MongoDB MMAPV1 storage model where the keys in JSONB were stored as-is without any compression.

One long-term fix is to move the key names to a table-level dictionary and refer to this dictionary instead of storing the key names repeatedly. Until then, the workaround might be using more compact names (unix-style) instead of more descriptive ones. For example, if you’re storing millions of instances of a particular key, it would be better storage-wise to name it “pb” instead of “publisherName”.

The most efficient way to leverage JSONB in PostgreSQL is to combine columns and JSONB. If a key appears frequently in your JSONB blobs, it is probably better off being stored as a column. Use JSONB as a “catch-all” to handle the variable parts of your schema while leveraging traditional columns for more stable fields.

JSONB Data Structures

JSON Implementation Structure

JSONB and MongoDB BSON are essentially tree structures that store the parsed JSONB data using multi-level nodes. MongoDB BSON has a very similar structure.

Images source

JSONB Implementation Structure

JSONB & TOAST

Another important consideration for storage is how JSONB interacts with TOAST (The Oversize Attribute Storage Technique). Typically, when the size of your column exceeds the TOAST_TUPLE_THRESHOLD (2kb default), PostgreSQL will attempt to compress the data and fit it in 2kb. If that doesn’t work, the data is moved to out-of-line storage. This is what they call “TOASTing” the data. When the data is fetched, the reverse process of “deTOASTting” must happen. You can also control the TOAST storage strategy:

  • Extended – Allows for out-of-line storage and compression (using pglz). This is the default option.
  • External – Allows for out-of-line storage, but not compression.

If you’re experiencing delays due to the TOAST compression or decompression, one option is to set the column storage to ‘EXTENDED proactively’. For all of the details, please refer to this PostgreSQL doc.

JSONB Operators & Functions

PostgreSQL provides a variety of operators to work on JSONB. From the docs:

 

Operator Description
-> Get JSON array element (indexed from zero, negative integers count from the end)
-> Get JSON object field by key.
->> Get JSON array element as text.
->> Get JSON object field as text.
#> Get JSON object at the specified path.
#>> Get the JSON object on the specified path as text.
@> Does the left JSON value contain the right path/value entries at the top level?
<@ Are the left JSON path/value entries at the top level within the right JSON value?
? Does the string exist as a top-level key within the JSON value?
?| Do any of these array strings exist as top-level keys?
?& Do all of these array strings exist as top-level keys?
|| Concatenate two JSONB values into a new JSONB value.
Delete key/value pair or string element from the left operand. Key/value pairs are matched based on their key value.
Delete multiple key/value pairs or string elements from the left operand. Key/value pairs are matched based on their key value.
Delete the array element with the specified index (Negative integers count from the end). It throws an error if the top-level container is not an array.
#- Delete the field or element with the specified path (for JSON arrays, negative integers count from the end)
@? Does the JSON path return any item for the specified JSON value?
@@ Returns the result of the JSON path predicate check for the specified JSON value. Only the first item of the result is taken into account. If the result is not Boolean, then null is returned.

PostgreSQL also provides a variety of Creation Functions and Processing Functions to work with the JSONB data.

JSONB Indexes

JSONB provides a wide array of options to index your JSON data. At a high level, we will dig into 3 different types of indexes – GIN, BTREE, and HASH. Not all index types support all operator classes, so planning is needed to design your indexes based on the type of operators and queries that you plan on using.

GIN Indexes

GIN stands for “Generalized Inverted indexes”. From the docs:

“GIN is designed for handling cases where the items to be indexed are composite values, and the queries to be handled by the index need to search for element values that appear within the composite items. For example, the items could be documents, and the queries could be searches for documents containing specific words.”

PostgreSQL GIN Structure - ScaleGrid Blog

GIN supports two operator classes:

  • jsonb_ops (default) – ?, ?|, ?&, @>, @@, @? [Index each key and value in the JSONB element]
  • jsonb_pathops – @>, @@, @? [Index only the values in the JSONB element]
CREATE INDEX datagin ON books USING gin (data);

Existence Operators (?, ?|, ?& )

These operators can be used to check for the existence of top-level keys in the JSONB. Let’s create a GIN index on the data JSONB column. For example, find all books that are available in Braille. The JSON looks something like this:

&quot;{&quot;tags&quot;: {&quot;nk594127&quot;: {&quot;ik71786&quot;: &quot;iv678771&quot;}}, &quot;braille&quot;: false, &quot;keywords&quot;: [&quot;abc&quot;, &quot;kef&quot;, &quot;keh&quot;], &quot;hardcover&quot;: true, &quot;publisher&quot;: &quot;EfgdxUdvB0&quot;, &quot;criticrating&quot;: 1}
demo=# select * from books where data ? 'braille';
id | author | isbn | rating | data

---------+-----------------+------------+--------+------------------------------------------------------------------------------------------------------------------------------------------------------
------------------
1000005 | XEI7xShT8bPu6H7 | 2kD5XJDZUF | 0 | {&quot;tags&quot;: {&quot;nk455671&quot;: {&quot;ik937456&quot;: &quot;iv506075&quot;}}, &quot;braille&quot;: true, &quot;keywords&quot;: [&quot;abc&quot;, &quot;kef&quot;, &quot;keh&quot;], &quot;hardcover&quot;: false, &quot;publisher&quot;: &quot;zSfZIAjGGs&quot;, &quot;
criticrating&quot;: 4}
.....

demo=# explain analyze select * from books where data ? 'braille';
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on books (cost=12.75..1005.25 rows=1000 width=158) (actual time=0.033..0.039 rows=15 loops=1)
Recheck Cond: (data ? 'braille'::text)
Heap Blocks: exact=2
-&gt; Bitmap Index Scan on datagin (cost=0.00..12.50 rows=1000 width=0) (actual time=0.022..0.022 rows=15 loops=1)
Index Cond: (data ? 'braille'::text)
Planning Time: 0.102 ms
Execution Time: 0.067 ms
(7 rows)

As you can see from the explain output, the GIN index we created is being used for the search. What if we wanted to find books that were in braille or hardcover?

demo=# explain analyze select * from books where data ?| array['braille','hardcover'];
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on books (cost=16.75..1009.25 rows=1000 width=158) (actual time=0.029..0.035 rows=15 loops=1)
Recheck Cond: (data ?| '{braille,hardcover}'::text[])
Heap Blocks: exact=2
-&gt; Bitmap Index Scan on datagin (cost=0.00..16.50 rows=1000 width=0) (actual time=0.023..0.023 rows=15 loops=1)
Index Cond: (data ?| '{braille,hardcover}'::text[])
Planning Time: 0.138 ms
Execution Time: 0.057 ms
(7 rows)

The GIN index only supports the “existence” operators on “top-level” keys. If the key is not at the top level, then the index will not be used. It will result in a sequential scan:

demo=# select * from books where data-&gt;'tags' ? 'nk455671';
id | author | isbn | rating | data

---------+-----------------+------------+--------+------------------------------------------------------------------------------------------------------------------------------------------------------
------------------
1000005 | XEI7xShT8bPu6H7 | 2kD5XJDZUF | 0 | {&quot;tags&quot;: {&quot;nk455671&quot;: {&quot;ik937456&quot;: &quot;iv506075&quot;}}, &quot;braille&quot;: true, &quot;keywords&quot;: [&quot;abc&quot;, &quot;kef&quot;, &quot;keh&quot;], &quot;hardcover&quot;: false, &quot;publisher&quot;: &quot;zSfZIAjGGs&quot;, &quot;
criticrating&quot;: 4}
685122 | GWfuvKfQ1PCe1IL | jnyhYYcF66 | 3 | {&quot;tags&quot;: {&quot;nk455671&quot;: {&quot;ik615925&quot;: &quot;iv253423&quot;}}, &quot;publisher&quot;: &quot;b2NwVg7VY3&quot;, &quot;criticrating&quot;: 0}
(2 rows)

demo=# explain analyze select * from books where data-&gt;'tags' ? 'nk455671';
QUERY PLAN
----------------------------------------------------------------------------------------------------------
Seq Scan on books (cost=0.00..38807.29 rows=1000 width=158) (actual time=0.018..270.641 rows=2 loops=1)
Filter: ((data -&gt; 'tags'::text) ? 'nk455671'::text)
Rows Removed by Filter: 1000017
Planning Time: 0.078 ms
Execution Time: 270.728 ms
(5 rows)

To check for existence in nested docs, use “expression indexes”. Let’s create an index on data->tags:

CREATE INDEX datatagsgin ON books USING gin (data-&gt;'tags');
demo=# select * from books where data-&gt;'tags' ? 'nk455671';
id | author | isbn | rating | data

---------+-----------------+------------+--------+------------------------------------------------------------------------------------------------------------------------------------------------------
------------------
1000005 | XEI7xShT8bPu6H7 | 2kD5XJDZUF | 0 | {&quot;tags&quot;: {&quot;nk455671&quot;: {&quot;ik937456&quot;: &quot;iv506075&quot;}}, &quot;braille&quot;: true, &quot;keywords&quot;: [&quot;abc&quot;, &quot;kef&quot;, &quot;keh&quot;], &quot;hardcover&quot;: false, &quot;publisher&quot;: &quot;zSfZIAjGGs&quot;, &quot;
criticrating&quot;: 4}
685122 | GWfuvKfQ1PCe1IL | jnyhYYcF66 | 3 | {&quot;tags&quot;: {&quot;nk455671&quot;: {&quot;ik615925&quot;: &quot;iv253423&quot;}}, &quot;publisher&quot;: &quot;b2NwVg7VY3&quot;, &quot;criticrating&quot;: 0}
(2 rows)

demo=# explain analyze select * from books where data-&gt;'tags' ? 'nk455671';
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on books (cost=12.75..1007.75 rows=1000 width=158) (actual time=0.031..0.035 rows=2 loops=1)
Recheck Cond: ((data -&gt;'tags'::text) ? 'nk455671'::text)
Heap Blocks: exact=2
-&gt; Bitmap Index Scan on datatagsgin (cost=0.00..12.50 rows=1000 width=0) (actual time=0.021..0.021 rows=2 loops=1)
Index Cond: ((data -&gt;'tags'::text) ? 'nk455671'::text)
Planning Time: 0.098 ms
Execution Time: 0.061 ms
(7 rows)

Note: An alternative here is to use the @> operator:

select * from books where data @&gt; '{&quot;tags&quot;:{&quot;nk455671&quot;:{}}}'::jsonb;

However, this only works if the value is an object. So, if you’re unsure if the value is an object or a primitive value, it could lead to incorrect results.

Path Operators @>, <@

The “path” operator can be used for multi-level queries of your JSONB data. Let’s use it similar to the ? operator above:

select * from books where data @&gt; '{&quot;braille&quot;:true}'::jsonb;
demo=# explain analyze select * from books where data @&gt; '{&quot;braille&quot;:true}'::jsonb;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on books (cost=16.75..1009.25 rows=1000 width=158) (actual time=0.040..0.048 rows=6 loops=1)
Recheck Cond: (data @&gt; '{&quot;braille&quot;: true}'::jsonb)
Rows Removed by Index Recheck: 9
Heap Blocks: exact=2
-&gt; Bitmap Index Scan on datagin (cost=0.00..16.50 rows=1000 width=0) (actual time=0.030..0.030 rows=15 loops=1)
Index Cond: (data @&gt; '{&quot;braille&quot;: true}'::jsonb)
Planning Time: 0.100 ms
Execution Time: 0.076 ms
(8 rows)

The path operators support querying nested objects or top-level objects:

demo=# select * from books where data @&gt; '{&quot;publisher&quot;:&quot;XlekfkLOtL&quot;}'::jsonb;
id | author | isbn | rating | data
-----+-----------------+------------+--------+-------------------------------------------------------------------------------------
346 | uD3QOvHfJdxq2ez | KiAaIRu8QE | 1 | {&quot;tags&quot;: {&quot;nk88&quot;: {&quot;ik37&quot;: &quot;iv161&quot;}}, &quot;publisher&quot;: &quot;XlekfkLOtL&quot;, &quot;criticrating&quot;: 3}
(1 row)

demo=# explain analyze select * from books where data @&gt; '{&quot;publisher&quot;:&quot;XlekfkLOtL&quot;}'::jsonb;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on books (cost=16.75..1009.25 rows=1000 width=158) (actual time=0.491..0.492 rows=1 loops=1)
Recheck Cond: (data @&gt; '{&quot;publisher&quot;: &quot;XlekfkLOtL&quot;}'::jsonb)
Heap Blocks: exact=1
-&gt; Bitmap Index Scan on datagin (cost=0.00..16.50 rows=1000 width=0) (actual time=0.092..0.092 rows=1 loops=1)
Index Cond: (data @&gt; '{&quot;publisher&quot;: &quot;XlekfkLOtL&quot;}'::jsonb)
Planning Time: 0.090 ms
Execution Time: 0.523 ms

The queries can be multi-level as well:

demo=# select * from books where data @&gt; '{&quot;tags&quot;:{&quot;nk455671&quot;:{&quot;ik937456&quot;:&quot;iv506075&quot;}}}'::jsonb;
id | author | isbn | rating | data

---------+-----------------+------------+--------+------------------------------------------------------------------------------------------------------------------------------------------------------
------------------
1000005 | XEI7xShT8bPu6H7 | 2kD5XJDZUF | 0 | {&quot;tags&quot;: {&quot;nk455671&quot;: {&quot;ik937456&quot;: &quot;iv506075&quot;}}, &quot;braille&quot;: true, &quot;keywords&quot;: [&quot;abc&quot;, &quot;kef&quot;, &quot;keh&quot;], &quot;hardcover&quot;: false, &quot;publisher&quot;: &quot;zSfZIAjGGs&quot;, &quot;
criticrating&quot;: 4}
(1 row)

GIN Index “pathops” Operator Class

GIN also supports a “pathops” option to reduce the size of the GIN index. When you use the pathops option, the only operator support is the “@>” so you must be careful with your queries. From the docs:

“The technical difference between a jsonb_ops and a jsonb_path_ops GIN index is that the former creates independent index items for each key and value in the data, while the latter creates index items only for each value in the data”

You can create a GIN pathops index as follows:

CREATE INDEX dataginpathops ON books USING gin (data jsonb_path_ops);

On my small dataset of 1 million books, you can see that the pathops GIN index is smaller – you should test with your dataset to understand the savings:

public | dataginpathops | index | sgpostgres | books | 67 MB |
public | datatagsgin | index | sgpostgres | books | 84 MB |

Let’s rerun our query from before with the pathops index:

demo=# select * from books where data @&gt; '{&quot;tags&quot;:{&quot;nk455671&quot;:{&quot;ik937456&quot;:&quot;iv506075&quot;}}}'::jsonb;
id | author | isbn | rating | data

---------+-----------------+------------+--------+------------------------------------------------------------------------------------------------------------------------------------------------------
------------------
1000005 | XEI7xShT8bPu6H7 | 2kD5XJDZUF | 0 | {&quot;tags&quot;: {&quot;nk455671&quot;: {&quot;ik937456&quot;: &quot;iv506075&quot;}}, &quot;braille&quot;: true, &quot;keywords&quot;: [&quot;abc&quot;, &quot;kef&quot;, &quot;keh&quot;], &quot;hardcover&quot;: false, &quot;publisher&quot;: &quot;zSfZIAjGGs&quot;, &quot;
criticrating&quot;: 4}
(1 row)

demo=# explain select * from books where data @&gt; '{&quot;tags&quot;:{&quot;nk455671&quot;:{&quot;ik937456&quot;:&quot;iv506075&quot;}}}'::jsonb;
QUERY PLAN
-----------------------------------------------------------------------------------------
Bitmap Heap Scan on books (cost=12.75..1005.25 rows=1000 width=158)
Recheck Cond: (data @&gt; '{&quot;tags&quot;: {&quot;nk455671&quot;: {&quot;ik937456&quot;: &quot;iv506075&quot;}}}'::jsonb)
-&gt; Bitmap Index Scan on dataginpathops (cost=0.00..12.50 rows=1000 width=0)
Index Cond: (data @&gt; '{&quot;tags&quot;: {&quot;nk455671&quot;: {&quot;ik937456&quot;: &quot;iv506075&quot;}}}'::jsonb)
(4 rows)

However, as mentioned above, the “pathops” option does not support all of the scenarios the default operator class supports. With a “pathops” GIN index, all these queries cannot leverage the GIN index. To summarize, you have a smaller index, but it supports a more limited use case.

select * from books where data ? 'tags'; =&gt; Sequential scan
select * from books where data @&gt; '{&quot;tags&quot; :{}}'; =&gt; Sequential scan
select * from books where data @&gt; '{&quot;tags&quot; :{&quot;k7888&quot;:{}}}' =&gt; Sequential scan

B-Tree indexes

B-tree indexes are the most common index type in relational databases. However, if you index an entire JSONB column with a B-tree index, the only useful operators are “=”, <, <=, >, >=. This can only be used for whole object comparisons, which has a very limited use case.

A more common scenario is to use B-tree “expression indexes”. For a primer, refer here – Indexes on Expressions. B-tree expression indexes can support the common comparison operators ‘=’, ‘<‘, ‘>’, ‘>=’, ‘<=’. As you might recall, GIN indexes don’t support these operators. Let’s consider when we want to retrieve all books with data -> criticrating > 4. So, you would build a query something like this:

demo=# select * from books where data-&gt;'criticrating' &gt; 4;
ERROR: operator does not exist: jsonb &gt;= integer
LINE 1: select * from books where data-&gt;'criticrating'  &gt;= 4;
^
HINT: No operator matches the given name and argument types. You might need to add explicit type casts.

That doesn’t work since the ‘->’ operator returns a JSONB type. So we need to use something like this:

demo=# select * from books where (data-&gt;'criticrating')::int4 &gt; 4;

If you’re using a version before PostgreSQL 11, it gets ugly. You need to first query as text and then cast it to an integer:

demo=# select * from books where (data-&gt;'criticrating')::int4 &gt; 4;

For expression indexes, the index needs to match the query expression exactly. So, our index would look something like this:

demo=# CREATE INDEX criticrating ON books USING BTREE (((data-&gt;'criticrating')::int4));
CREATE INDEX

demo=# explain analyze select * from books where (data-&gt;'criticrating')::int4 = 3;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
Index Scan using criticrating on books (cost=0.42..4626.93 rows=5000 width=158) (actual time=0.069..70.221 rows=199883 loops=1)
Index Cond: (((data -&gt; 'criticrating'::text))::integer = 3)
Planning Time: 0.103 ms
Execution Time: 79.019 ms
(4 rows)

demo=# explain analyze select * from books where (data-&gt;'criticrating')::int4 = 3;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
Index Scan using criticrating on books (cost=0.42..4626.93 rows=5000 width=158) (actual time=0.069..70.221 rows=199883 loops=1)
Index Cond: (((data -&gt; 'criticrating'::text))::integer = 3)
Planning Time: 0.103 ms
Execution Time: 79.019 ms
(4 rows)
[code language=SQL]
From above we can see that the BTREE index is being used as expected.

Hash Indexes

Hash indexes become interesting if you are only interested in the “=” operator. Consider the case when we are looking for a particular tag on a book. The element to be indexed can be a top-level element or deeply nested.

E.g. tags->publisher = XlekfkLOtL

CREATE INDEX publisherhash ON books USING HASH ((data-&gt;'publisher'));

Hash indexes also tend to be smaller than B-tree or GIN indexes. Of course, this ultimately depends on your data set.

demo=# select * from books where data-&gt;'publisher' = 'XlekfkLOtL'
demo-# ;
id | author | isbn | rating | data
-----+-----------------+------------+--------+-------------------------------------------------------------------------------------
346 | uD3QOvHfJdxq2ez | KiAaIRu8QE | 1 | {&quot;tags&quot;: {&quot;nk88&quot;: {&quot;ik37&quot;: &quot;iv161&quot;}}, &quot;publisher&quot;: &quot;XlekfkLOtL&quot;, &quot;criticrating&quot;: 3}
(1 row)

demo=# explain analyze select * from books where data-&gt;'publisher' = 'XlekfkLOtL';
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
Index Scan using publisherhash on books (cost=0.00..2.02 rows=1 width=158) (actual time=0.016..0.017 rows=1 loops=1)
Index Cond: ((data -&gt; 'publisher'::text) = 'XlekfkLOtL'::text)
Planning Time: 0.080 ms
Execution Time: 0.035 ms
(4 rows)

Special Mention: GIN Trigram Indexes

PostgreSQL supports string matching using trigram indexes. Trigram indexes work by breaking up text into trigrams. Trigrams are words broken up into sequences of 3 letters. More information can be found in the documentation. GIN indexes support the “gin_trgm_ops” class that can be used to index the data in JSONB. You can choose to use expression indexes to build the trigram index on a particular column.

CREATE EXTENSION pg_trgm;
CREATE INDEX publisher ON books USING GIN ((data-&gt;'publisher') gin_trgm_ops);

demo=# select * from books where data-&gt;'publisher' LIKE '%I0UB%';
 id |     author      |    isbn    | rating |                                      data
----+-----------------+------------+--------+---------------------------------------------------------------------------------
  4 | KiEk3xjqvTpmZeS | EYqXO9Nwmm |      0 | {&quot;tags&quot;: {&quot;nk3&quot;: {&quot;ik1&quot;: &quot;iv1&quot;}}, &quot;publisher&quot;: &quot;MI0UBqZJDt&quot;, &quot;criticrating&quot;: 1}
(1 row)

As you can see in the query above, we can search for any arbitrary string occurring at any potion. Unlike the B-tree indexes, we are not restricted to left-anchored expressions.

demo=# explain analyze select * from books where data-&gt;'publisher' LIKE '%I0UB%';
                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on books  (cost=9.78..111.28 rows=100 width=158) (actual time=0.033..0.033 rows=1 loops=1)
   Recheck Cond: ((data -&gt; 'publisher'::text) ~~ '%I0UB%'::text)
   Heap Blocks: exact=1
   -&gt;  Bitmap Index Scan on publisher  (cost=0.00..9.75 rows=100 width=0) (actual time=0.025..0.025 rows=1 loops=1)
         Index Cond: ((data -&gt; 'publisher'::text) ~~ '%I0UB%'::text)
 Planning Time: 0.213 ms
 Execution Time: 0.058 ms
(7 rows)

Special Mention: GIN Array Indexes

JSONB has great built-in support for indexing arrays. Let’s consider an example of indexing an array of strings using a GIN index in the case when our JSONB data contains a “keyword” element, and we would like to find rows with particular keywords:

{&quot;tags&quot;: {&quot;nk780341&quot;: {&quot;ik397357&quot;: &quot;iv632731&quot;}}, &quot;keywords&quot;: [&quot;abc&quot;, &quot;kef&quot;, &quot;keh&quot;], &quot;publisher&quot;: &quot;fqaJuAdjP5&quot;, &quot;criticrating&quot;: 2}

CREATE INDEX keywords ON books USING GIN ((data-&gt;'keywords') jsonb_path_ops);

demo=# select * from books where data-&gt;'keywords' @&gt; '[&quot;abc&quot;, &quot;keh&quot;]'::jsonb;
   id    |     author      |    isbn    | rating |                                                               data
---------+-----------------+------------+--------+-----------------------------------------------------------------------------------------------------------------------------------
 1000003 | zEG406sLKQ2IU8O | viPdlu3DZm |      4 | {&quot;tags&quot;: {&quot;nk263020&quot;: {&quot;ik203820&quot;: &quot;iv817928&quot;}}, &quot;keywords&quot;: [&quot;abc&quot;, &quot;kef&quot;, &quot;keh&quot;], &quot;publisher&quot;: &quot;7NClevxuTM&quot;, &quot;criticrating&quot;: 2}
 1000004 | GCe9NypHYKDH4rD | so6TQDYzZ3 |      4 | {&quot;tags&quot;: {&quot;nk780341&quot;: {&quot;ik397357&quot;: &quot;iv632731&quot;}}, &quot;keywords&quot;: [&quot;abc&quot;, &quot;kef&quot;, &quot;keh&quot;], &quot;publisher&quot;: &quot;fqaJuAdjP5&quot;, &quot;criticrating&quot;: 2}
(2 rows)

demo=# explain analyze select * from books where data-&gt;'keywords' @&gt; '[&quot;abc&quot;, &quot;keh&quot;]'::jsonb;
                                                     QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on books  (cost=54.75..1049.75 rows=1000 width=158) (actual time=0.026..0.028 rows=2 loops=1)
   Recheck Cond: ((data -&gt; 'keywords'::text) @&gt; '[&quot;abc&quot;, &quot;keh&quot;]'::jsonb)
   Heap Blocks: exact=1
   -&gt;  Bitmap Index Scan on keywords  (cost=0.00..54.50 rows=1000 width=0) (actual time=0.014..0.014 rows=2 loops=1)
         Index Cond: ((data -&gt; 'keywords'::text) @&amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;gt; '[&quot;abc&quot;, &quot;keh&quot;]'::jsonb)
 Planning Time: 0.131 ms
 Execution Time: 0.063 ms
(7 rows)

The order of the items in the array on the right does not matter. For example, the following query would return the same result as the previous one:

demo=# explain analyze select * from books where data-&gt;'keywords' @&gt; '[&quot;keh&quot;,&quot;abc&quot;]'::jsonb;

All elements in the right side array of the containment operator need to be present – basically like an “AND” operator. If you want “OR” behavior, you can construct it in the WHERE clause:

demo=# explain analyze select * from books where (data-&gt;'keywords' @&gt; '[&quot;abc&quot;]'::jsonb OR data-&gt;'keywords' @&gt; '[&quot;keh&quot;]'::jsonb);

More details on the behavior of the containment operators with arrays can be found in the documentation.

SQL/JSON & JSONPath

SQL standard added support for JSON  in SQL – SQL/JSON Standard-2016. With the PostgreSQL 12/13 releases, PostgreSQL has one of the best implementations of the SQL/JSON standard. For more details refer to the PostgreSQL 12 announcement.

One of the core features of SQL/JSON is support for the JSONPath language when querying JSONB data. JSONPath lets you specify an expression (using a syntax similar to the property access notation in Javascript) to query your JSONB data. This makes it simple and intuitive but is also very powerful to query your JSONB data. Think of  JSONPath as the logical equivalent of XPath for XML.

.key Returns an object member with the specified key.
[*] Wildcard array element accessor that returns all array elements.
.* Wildcard member accessor that returns the values of all members located at the top level of the current object.
.** Recursive wildcard member accessor that processes all levels of the JSON hierarchy of the current object and returns all the member values, regardless of their nesting level.

Refer to JSONPath documentation for the full list of operators. JSONPath also supports a variety of filter expressions.

JSONPath Functions

PostgreSQL 12 provides several functions for using JSONPath to query your JSONB data. Learn more about PostgreSQL EXPLAIN and query costs.

From the docs:

  • jsonb_path_exists – Checks whether the JSONB path returns any item for the specified JSON value.
  • jsonb_path_match – Returns the result of the JSONB path predicate check for the specified JSONB value. Only the first item of the result is taken into account. If the result is not Boolean, then null is returned.
  • jsonb_path_query – Gets all JSONB items returned by JSONB path for the specified JSONB value. There are also a couple of other variants of this function that handle arrays of objects.

Let’s start with a simple query – finding books by publisher:

demo=# select * from books where data @@ '$.publisher == &quot;ktjKEZ1tvq&quot;';
id | author | isbn | rating | data
---------+-----------------+------------+--------+----------------------------------------------------------------------------------------------------------------------------------
1000001 | 4RNsovI2haTgU7l | GwSoX67gLS | 2 | {&quot;tags&quot;: {&quot;nk542369&quot;: {&quot;ik55240&quot;: &quot;iv305393&quot;}}, &quot;keywords&quot;: [&quot;abc&quot;, &quot;def&quot;, &quot;geh&quot;], &quot;publisher&quot;: &quot;ktjKEZ1tvq&quot;, &quot;criticrating&quot;: 0}
(1 row)

demo=# explain analyze select * from books where data @@ '$.publisher == &quot;ktjKEZ1tvq&quot;';
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on books (cost=21.75..1014.25 rows=1000 width=158) (actual time=0.123..0.124 rows=1 loops=1)
Recheck Cond: (data @@ '($.&quot;publisher&quot; == &quot;ktjKEZ1tvq&quot;)'::jsonpath)
Heap Blocks: exact=1
-&gt; Bitmap Index Scan on datagin (cost=0.00..21.50 rows=1000 width=0) (actual time=0.110..0.110 rows=1 loops=1)
Index Cond: (data @@ '($.&quot;publisher&quot; == &quot;ktjKEZ1tvq&quot;)'::jsonpath)
Planning Time: 0.137 ms
Execution Time: 0.194 ms
(7 rows)

You can rewrite this expression as a JSONPath filter:

demo=# select * from books where jsonb_path_exists(data,'$.publisher ?(@ == &quot;ktjKEZ1tvq&quot;)');

You can also use very complex query expressions. For example, let’s select books where print style = hardcover and price = 100:

select * from books where jsonb_path_exists(data, '$.prints[*] ?(@.style==&quot;hc&quot; &amp;amp;amp;amp;amp;amp;&amp;amp;amp;amp;amp;amp; @.price == 100)');

However, index support for JSONPath is very limited at this point, making it dangerous to use JSONPath in the where clause. JSONPath support for indexes will be improved in subsequent releases.

demo=# explain analyze select * from books where jsonb_path_exists(data,'$.publisher ?(@ == &quot;ktjKEZ1tvq&quot;)');
QUERY PLAN
------------------------------------------------------------------------------------------------------------
Seq Scan on books (cost=0.00..36307.24 rows=333340 width=158) (actual time=0.019..480.268 rows=1 loops=1)
Filter: jsonb_path_exists(data, '$.&quot;publisher&quot;?(@ == &quot;ktjKEZ1tvq&quot;)'::jsonpath, '{}'::jsonb, false)
Rows Removed by Filter: 1000028
Planning Time: 0.095 ms
Execution Time: 480.348 ms
(5 rows)

Projecting Partial JSON

Another great use case for JSONPath is projecting partial JSONB from the row that matches. Consider the following sample JSONB:

demo=# select jsonb_pretty(data) from books where id = 1000029;
jsonb_pretty
-----------------------------------
{
 &quot;tags&quot;: {
 &quot;nk678947&quot;: {
      &quot;ik159670&quot;: &quot;iv32358
 }
 },
 &quot;prints&quot;: [
     {
         &quot;price&quot;: 100,
         &quot;style&quot;: &quot;hc&quot;
     },
     {
        &quot;price&quot;: 50,
        &quot;style&quot;: &quot;pb&quot;
     }
 ],
 &quot;braille&quot;: false,
 &quot;keywords&quot;: [
     &quot;abc&quot;,
     &quot;kef&quot;,
     &quot;keh&quot;
 ],
 &quot;hardcover&quot;: true,
 &quot;publisher&quot;: &quot;ppc3YXL8kK&quot;,
 &quot;criticrating&quot;: 3
}

Select only the publisher field:

demo=# select jsonb_path_query(data, '$.publisher') from books where id = 1000029;
jsonb_path_query
------------------
&quot;ppc3YXL8kK&quot;
(1 row)

Select the prints field (which is an array of objects):

demo=# select jsonb_path_query(data, '$.prints') from books where id = 1000029;
jsonb_path_query
---------------------------------------------------------------
[{&quot;price&quot;: 100, &quot;style&quot;: &quot;hc&quot;}, {&quot;price&quot;: 50, &quot;style&quot;: &quot;pb&quot;}]
(1 row)

Select the first element in the array prints:

demo=# select jsonb_path_query(data, '$.prints[0]') from books where id = 1000029;
jsonb_path_query
-------------------------------
{&quot;price&quot;: 100, &quot;style&quot;: &quot;hc&quot;}
(1 row)

Select the last element in the array prints:

demo=# select jsonb_path_query(data, '$.prints[$.size()]') from books where id = 1000029;
jsonb_path_query
------------------------------
{&quot;price&quot;: 50, &quot;style&quot;: &quot;pb&quot;}
(1 row)

Select only the hardcover prints from the array:

demo=# select jsonb_path_query(data, '$.prints[*] ?(@.style==&quot;hc&quot;)') from books where id = 1000029;
       jsonb_path_query
-------------------------------
 {&quot;price&quot;: 100, &quot;style&quot;: &quot;hc&quot;}
(1 row)

We can also chain the filters:

demo=# select jsonb_path_query(data, '$.prints[*] ?(@.style==&quot;hc&quot;) ?(@.price ==100)') from books where id = 1000029;
jsonb_path_query
-------------------------------
{&quot;price&quot;: 100, &quot;style&quot;: &quot;hc&quot;}
(1 row)

In summary, PostgreSQL provides a powerful and versatile platform to store and process JSON data. You need to be aware of several gotcha’s, but we are optimistic that it will be fixed in future releases.

More tips for you

Which Is the Best PostgreSQL GUI?

PostgreSQL graphical user interface (GUI) tools help these open-source database users to manage, manipulate, and visualize their data. This post discusses the top 5 GUI tools for administering your PostgreSQL deployments. Learn more


Managing High Availability in PostgreSQL

Managing high availability in your PostgreSQL hosting is crucial to ensuring your clusters maintain exceptional uptime and strong operational performance so your data is always available to your application.


PostgreSQL Connection Pooling: Part 1 – Pros & Cons

In modern apps, clients open a lot of connections. Developers are discouraged from holding a database connection while other operations take place. “Open a connection as late as possible, close as soon as possible”. Read more about PostgreSQL Connection Pooling.

For more information, please visit www.scalegrid.io. Connect with ScaleGrid on LinkedIn, X, Facebook, and YouTube.
Table of Contents

Stay Ahead with ScaleGrid Insights

Dive into the world of database management with our monthly newsletter. Get expert tips, in-depth articles, and the latest news, directly to your inbox.

Related Posts

blog-feature-img_whats-new-at-scalegrid

What’s New at ScaleGrid – September 2024

At ScaleGrid, we’re always pushing the boundaries to offer more flexibility and scalability to our customers. Over the past few...

Managing PostgreSQL® High Availability – Part I: PostgreSQL Automatic Failover

Managing High Availability (HA) in your PostgreSQL hosting is very important to ensuring your database deployment clusters maintain exceptional uptime...

RabbitMQ Security Compliance - ScaleGrid

RabbitMQ Security and Compliance

Follow fundamental procedures in authentication, encryption, and commitment to RabbitMQ security protocols to protect your RabbitMQ system and secure messages....

NEWS

Add Headline Here