Suppose I am storing events associated with users in a table as follows (with dt standing in for the timestamp of the event):
With Postgres 9.x this is actually quite easy:
select userid,
string_agg(event, '' order by dt) as event_sequence
from events
group by userid;
Using that result you can now apply a regular expression on the event_sequence:
select *
from (
select userid,
string_agg(event, '' order by dt) as event_sequence
from events
group by userid
) t
where event_sequence ~ 'A.*B'
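
To make the logic of the two queries above easier to check by hand, here is a minimal Python sketch of the same idea: build one string per user from the events in dt order, then keep the users whose string matches the pattern. The sample rows, the single-character event codes, and the function names event_sequences and matching_users are assumptions for illustration only; the original table is not shown here.

```python
import re
from collections import defaultdict

# Hypothetical sample rows as (userid, dt, event); deliberately not in dt order.
rows = [
    (1, 3, "B"), (1, 1, "A"), (1, 2, "D"), (1, 5, "B"), (1, 4, "C"),
    (2, 2, "B"), (2, 1, "B"), (2, 4, "A"), (2, 3, "A"), (2, 5, "C"),
]


def event_sequences(rows):
    """Concatenate each user's events in dt order,
    roughly what string_agg(event, '' order by dt) ... group by userid does."""
    by_user = defaultdict(list)
    # Sort by (userid, dt) so events are appended in timestamp order.
    for userid, dt, event in sorted(rows, key=lambda r: (r[0], r[1])):
        by_user[userid].append(event)
    return {userid: "".join(events) for userid, events in by_user.items()}


def matching_users(rows, pattern):
    """Keep users whose sequence contains a match anywhere,
    like the unanchored `event_sequence ~ pattern` filter."""
    regex = re.compile(pattern)
    return {u: s for u, s in event_sequences(rows).items() if regex.search(s)}


sequences = event_sequences(rows)
matches = matching_users(rows, "A.*B")
print(sequences)
print(matches)
```

With these assumed rows, only the user whose sequence has a B somewhere after an A is kept, which is what the pattern 'A.*B' asks for.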
With Postgres 8.x you need to find a replacement for the string_agg() function (just google for it, there are a lot of examples out there), and you need a sub-select to ensure the ordering of the aggregate, as 8.x does not support an order by in an aggregate function.
For Oracle (version 11g R2):
If you happen to be using Oracle DB 11g R2, take a look at listagg. The code below should work, but I haven't tested it. The point is: you can use listagg.
select userid,
       listagg(event, '') within group (order by dt) as events
from events
group by userid
order by userid;

Expected output:

USERID EVENTS
------ --------------------
1      ADBCB
2      BBAAC
In prior versions you can do this with a CONNECT BY clause. See the Oracle documentation on listagg for more details.
I'm not at a computer to write code for this answer, but here's how I would go about a RegEx-based solution in SQL Server:
This should ultimately give you the functionality in SQL Server that your original question asks for. However, if you're analyzing a very large dataset, it could be quite slow, and there may be better ways to accomplish what you're looking for.