I have a need for a hive query that I\'m having difficulty figuring out.
I have a time series that looks like this:
time source
This would be a naive solution:
select *
from messages c
cross join messages m
where m.time between c.time - interval '0.001' second
and c.time + interval '0.001' second
and c.word1 = '2B3B'
and m.word2 = 'ABAA'
;
+----------------------------+--------+-------+-------+----------------------------+--------+-------+-------+
| time | source | word1 | word2 | time | source | word1 | word2 |
+----------------------------+--------+-------+-------+----------------------------+--------+-------+-------+
| 2012-02-01 23:43:16.998824 | 0001 | 2B3B | FAF0 | 2012-02-01 23:43:16.999356 | 0002 | 2326 | ABAA |
+----------------------------+--------+-------+-------+----------------------------+--------+-------+-------+
This is the solution with the good performance
select *
from messages c
join messages m
on floor (cast(c.time as decimal(37,7)) / (2 * 0.001)) =
floor (cast(m.time as decimal(37,7)) / (2 * 0.001))
where m.time between c.time - interval '0.001' second
and c.time + interval '0.001' second
and c.word1 = '2B3B'
and m.word2 = 'ABAA'
union all
select *
from messages c
join messages m
on floor ((cast(c.time as decimal(37,7)) + 0.001) / (2 * 0.001)) =
floor ((cast(m.time as decimal(37,7)) + 0.001) / (2 * 0.001))
where floor (cast(c.time as decimal(37,7)) / (2 * 0.001)) <>
floor (cast(m.time as decimal(37,7)) / (2 * 0.001))
and m.time between c.time - interval '0.001' second
and c.time + interval '0.001' second
and c.word1 = '2B3B'
and m.word2 = 'ABAA'
+----------------------------+--------+-------+-------+----------------------------+-------+-------+-------+
| time | source | word1 | word2 | _col4 | _col5 | _col6 | _col7 |
+----------------------------+--------+-------+-------+----------------------------+-------+-------+-------+
| 2012-02-01 23:43:16.998824 | 0001 | 2B3B | FAF0 | 2012-02-01 23:43:16.999356 | 0002 | 2326 | ABAA |
+----------------------------+--------+-------+-------+----------------------------+-------+-------+-------+
Events A and B are going to be caught by the upper part of the UNION ALL.
Events B and C are going to be caught by the lower part of the UNION ALL.
0 0.002 0.004 0.006 0.008 0.01
| | | | | |
-------------------------------------------------------
| |
| |
A B C
| |
| |
-------------------------------------------------------
| | | | |
0.001 0.003 0.005 0.007 0.009