Group by JSONB array in PostgreSQL

Recently, Nate Berkopec has been running an online webinar about profiling Rails applications. During the workshop, he was investigating the performance of a platform to recommend Spotify artists based on a listing history. PostgreSQL database was storing artists’ genres in an array structure using JSONB field.

An interesting case appeared in the middle of the session when Nate wanted to optimize a Ruby code to find the most popular genres based on the number of artists associated with them. As it turned out, it’s possible to delegate such a query to the database for better performance.

Let’s start with a schema definition and some sample data to play with:

CREATE TABLE artists (
  id SERIAL PRIMARY KEY,
  name TEXT,
  genres jsonb
);
id name genres
1 Kraftwerk [“electronic”]
2 Daft Punk [“electronic”, “house”]
3 Maroon 5 [“pop”, “rock”]
4 One Republic [“pop”, “rock”]
5 Lil Nas X [“hip-hop”, “country rap”, “rap”]
6 Post Malone [“hip-hop”, “rap”, “trap”]
7 Drake [“hip-hop”, “rap”]

To extract genres out of artists table, jsonb_array_elements function can be used:

SELECT jsonb_array_elements(artists.genres) AS genre
FROM artists;
genre
“electronic”
“electronic”
“house”
“pop”
“rock”
“pop”
“rock”
“hip-hop”
“country rap”
“rap”
“hip-hop”
“rap”
“trap”
“hip-hop”
“rap”

Then the only part that left is to group, count and sort:

SELECT jsonb_array_elements(a.genres) AS genre, COUNT(1) AS popularity
FROM artists AS a
GROUP BY genre
ORDER BY popularity DESC;

Which gives the most popular genres:

genre count
“rap” 3
“hip-hop” 3
“pop” 2
“rock” 2
“electronic” 2
“house” 1
“country rap” 1
“trap” 1

Full example can be found on dbfidle.