Having the following table (conversations):
id | record_id | is_response | text |
---+------------+---------------+------------
Here's my take:
SELECT
    record_id,
    string_agg(text, ' ' ORDER BY id) AS context
FROM (
    SELECT
        *,
        -- number of conversation ends among the preceding rows of the same record
        coalesce(sum(incl::integer) OVER (PARTITION BY record_id ORDER BY id ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0) AS grp
    FROM (
        SELECT *, is_response AND text IN (SELECT text FROM responses) AS incl
        FROM conversations
    ) c
) c1
GROUP BY record_id, grp
HAVING bool_or(incl)  -- keep only groups that contain a conversation end
ORDER BY max(id);
This will scan the table conversations once, but I am not sure whether it will perform better than your solution. The basic idea is to use a window function to count how many preceding rows within the same record end a conversation. We can then group by that number and record_id, and discard incomplete conversations.
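To make the grouping idea concrete, here is a minimal Python sketch of the same logic, not of how PostgreSQL executes the query. The sample data in ROWS and RESPONSES and the function name conversations_forward are hypothetical, and the count is kept per record_id, as described above.

```python
from collections import defaultdict

# Hypothetical sample data: (id, record_id, is_response, text)
ROWS = [
    (1, 1, False, "Hi"),
    (2, 1, True, "Hello"),
    (3, 1, True, "Done"),
    (4, 1, False, "More"),
    (5, 1, True, "Bye"),
    (6, 1, False, "Hm"),
    (7, 2, False, "Hey"),
    (8, 2, True, "Done"),
]
RESPONSES = {"Done", "Bye"}  # stands in for the responses table


def conversations_forward(rows, responses):
    ends_seen = defaultdict(int)  # per record_id: ends among the preceding rows
    groups = {}                   # (record_id, grp) -> aggregated state
    for id_, record_id, is_response, text in sorted(rows):
        incl = is_response and text in responses
        grp = ends_seen[record_id]  # the current row itself is not counted
        g = groups.setdefault((record_id, grp), {"texts": [], "done": False})
        g["texts"].append(text)
        g["done"] = g["done"] or incl  # plays the role of bool_or(incl)
        g["max_id"] = id_              # rows arrive in id order, so this ends as max(id)
        if incl:
            ends_seen[record_id] += 1
    kept = [
        (g["max_id"], record_id, " ".join(g["texts"]))
        for (record_id, _), g in groups.items()
        if g["done"]  # HAVING: drop groups without a conversation end
    ]
    return [(record_id, context) for _, record_id, context in sorted(kept)]
```

The trailing "Hm" row of record 1 lands in a group of its own that contains no conversation end, so it is dropped.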
There is a simple and fast solution:
SELECT record_id, string_agg(text, ' ') AS context
FROM (
    SELECT c.*, count(r.text) OVER (PARTITION BY c.record_id ORDER BY c.id DESC) AS grp
    FROM conversations c
    LEFT JOIN responses r ON r.text = c.text AND c.is_response
    ORDER BY record_id, id
) sub
WHERE grp > 0  -- ignore conversation part that does not end with a response
GROUP BY record_id, grp
ORDER BY record_id, grp;
count() only counts non-null values. r.text is NULL if the LEFT JOIN to responses comes up empty.
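As a rough illustration of that join condition, the following Python sketch uses None for NULL. The helper names and the sample rows are hypothetical, and it assumes that text is unique in responses, so the join never duplicates a row.

```python
def left_join_response_text(rows, responses):
    """Mimic LEFT JOIN responses r ON r.text = c.text AND c.is_response.

    Returns r.text for each row, or None where the join finds no match.
    """
    return [
        text if (is_response and text in responses) else None
        for _id, _record_id, is_response, text in rows
    ]


def count_non_null(values):
    """Mimic count(expr): None (NULL) values are not counted."""
    return sum(v is not None for v in values)


# Hypothetical rows: (id, record_id, is_response, text)
SAMPLE = [
    (1, 1, False, "Done"),   # matching text, but not a response -> NULL
    (2, 1, True, "Hello"),   # a response, but text not in responses -> NULL
    (3, 1, True, "Done"),    # both conditions hold -> r.text = 'Done'
]
R_TEXT = left_join_response_text(SAMPLE, {"Done"})
```

Only the third row contributes to the count, because both parts of the join condition have to hold.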
The value in grp (short for "group") is only increased when a new output row is triggered. All rows belonging to the same output row end up with the same grp number. It's then easy to aggregate in the outer SELECT.
The special trick is to count conversation ends in reverse order. Everything after the last end (coming first when starting from the end) gets grp = 0 and is removed in the outer SELECT.
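The reverse count can be sketched in plain Python as follows. This is only an illustration of the idea under assumed sample data. ROWS, RESPONSES and conversations_reverse are hypothetical names, and the running count includes the current row, as the default window frame does.

```python
from collections import defaultdict

# Hypothetical sample data: (id, record_id, is_response, text)
ROWS = [
    (1, 1, False, "Hi"),
    (2, 1, True, "Hello"),
    (3, 1, True, "Done"),
    (4, 1, False, "More"),
    (5, 1, True, "Bye"),
    (6, 1, False, "Hm"),
    (7, 2, False, "Hey"),
    (8, 2, True, "Done"),
]
RESPONSES = {"Done", "Bye"}  # stands in for the responses table


def conversations_reverse(rows, responses):
    ends_seen = defaultdict(int)  # per record_id: ends seen so far, walking backwards
    groups = defaultdict(list)    # (record_id, grp) -> texts in descending id order
    for _id, record_id, is_response, text in sorted(rows, reverse=True):
        if is_response and text in responses:
            ends_seen[record_id] += 1  # the end row starts a new group
        grp = ends_seen[record_id]
        if grp > 0:  # grp = 0: trailing rows after the last end are skipped
            groups[(record_id, grp)].append(text)
    # Restore ascending id order inside each group before joining the texts.
    return [
        (record_id, " ".join(reversed(texts)))
        for (record_id, _grp), texts in sorted(groups.items())
    ]
```

Because the count runs backwards, grp = 1 is the most recent complete conversation of each record, so ordering by record_id and grp lists the latest conversation first.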