Finding most recent children for a record in PostgreSQL

Sometimes we need to find a most recent / popular, or oldest (those cases are similar, and differ in a condition clause only) for a specific record (i.e. most recent post for each user).

This can be achieved with at least 2 methods which we will compare through the example of user notes:

id user_id content created_at
1 1 Alice Note 1 2021-01-01 15:30:00
2 2 Bob Note 1 2021-01-02 13:00:00
3 3 Charlie Note 1 2021-01-01 10:00:00
4 1 Alice Note 2 2021-02-01 11:00:00
5 2 Bob Note 2 2021-01-20 12:00:00
6 1 Alice Note 3 2021-03-01 13:00:00
id name
1 Alice
2 Bob
3 Charlie

notes table contains compound index on user_id, created_at columns to improve the performance.

Sub-select join

SELECT latest_notes.*
FROM notes AS latest_notes
LEFT JOIN notes AS user_notes
ON latest_notes.user_id = user_notes.user_id
AND user_notes.created_at > latest_notes.created_at
WHERE user_notes.id IS NULL;

Relies on loading notes from index twice (for user_notes, and latest_notes), and combining them together (with filters on top) - merge left join.

Left join

SELECT notes.*
FROM (
  SELECT user_id, MAX(created_at) AS created_at
  FROM notes
  GROUP BY notes.user_id
) AS latest_notes
INNER JOIN notes
ON notes.user_id = latest_notes.user_id
AND notes.created_at = latest_notes.created_at

Relies on scanning all notes into hash table, and probing them quickly with results from sub-select - hash join.

This will not work, if more than one user note can has the same created_at (if creation date is assigned automatically, we can use MAX(id) instead)

Performance

Method Cost Actual Time
Sub-select join 33.05.. 59.37 0.040..0.043
Left join 0.30.. 231.27 0.018..0.024

We see that estimated cost of Sub-select join is less fluctual, and might lead to better result but the actual time favors Left join in our case (keep in mind our data-set is very small, and definitely does not represent real-world scenario).

Full examples from the note can be found on dbfiddle.