Temporal Join in Hive query (events in close proximity in time)

醉酒成梦 2021-01-23 06:14

I need a Hive query that I'm having difficulty figuring out.

I have a time series that looks like this:

time                          source          


        
1 Answer
  • 2021-01-23 06:49

    A naive solution cross-joins the table with itself and then filters on the time window:

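    -- Compare every row with every other row (cross join), then keep the
    -- pairs whose timestamps are within 0.001 s of each other.
    select  *
    
    from                    messages c
            cross join      messages m
    
    where   m.time  between c.time - interval '0.001' second
                    and     c.time + interval '0.001' second
                    
        and c.word1 = '2B3B'
        and m.word2 = 'ABAA'
        
    ;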
    

    +----------------------------+--------+-------+-------+----------------------------+--------+-------+-------+
    |            time            | source | word1 | word2 |            time            | source | word1 | word2 |
    +----------------------------+--------+-------+-------+----------------------------+--------+-------+-------+
    | 2012-02-01 23:43:16.998824 |   0001 | 2B3B  | FAF0  | 2012-02-01 23:43:16.999356 |   0002 |  2326 | ABAA  |
    +----------------------------+--------+-------+-------+----------------------------+--------+-------+-------+
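    To make the cost of this approach concrete, here is a minimal Python sketch of what the cross join plus WHERE clause computes. The `messages` list and the `naive_window_join` function are hypothetical in-memory stand-ins for the Hive table and query, using the two rows from the output above. Every row is tested against every row, so the work grows as n × n.

```python
from datetime import datetime, timedelta

# Hypothetical in-memory stand-in for the `messages` table:
# (time, source, word1, word2), using the two rows from the output above.
messages = [
    (datetime(2012, 2, 1, 23, 43, 16, 998824), "0001", "2B3B", "FAF0"),
    (datetime(2012, 2, 1, 23, 43, 16, 999356), "0002", "2326", "ABAA"),
]

WINDOW = timedelta(milliseconds=1)  # interval '0.001' second


def naive_window_join(rows, window):
    """Mimic CROSS JOIN + WHERE: test every (c, m) combination."""
    pairs = []
    for c in rows:        # n outer iterations ...
        for m in rows:    # ... times n inner iterations = n * n comparisons
            in_window = c[0] - window <= m[0] <= c[0] + window
            if in_window and c[2] == "2B3B" and m[3] == "ABAA":
                pairs.append((c, m))
    return pairs


for c, m in naive_window_join(messages, WINDOW):
    print(c[1], c[0].time(), "->", m[1], m[0].time())
# 0001 23:43:16.998824 -> 0002 23:43:16.999356
```

    With millions of rows the nested loop is what makes the cross join impractical, even though the filter keeps only a handful of pairs.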
    

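    A better-performing solution replaces the cross join with equi-joins on time buckets. Each timestamp is mapped to a bucket of width 2 * 0.001 s, and only rows in the same bucket are compared. Two events within 0.001 s of each other share a bucket either in this grid or in a second grid shifted by 0.001 s, so the query runs one equi-join per grid and combines them with UNION ALL; the second part skips pairs that the first part already returned: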

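    -- Upper part: put every timestamp into a bucket of width 2 * 0.001 s and
    -- equi-join on the bucket number, so only rows in the same bucket meet.
    select  *
    
    from                    messages c
    
            join            messages m
            
            on              floor (cast(c.time as decimal(37,7)) / (2 * 0.001))   =
                            floor (cast(m.time as decimal(37,7)) / (2 * 0.001))
    
    -- The bucket join only narrows the candidates; this is the exact window test.
    where   m.time  between c.time - interval '0.001' second
                    and     c.time + interval '0.001' second
                    
        and c.word1 = '2B3B'
        and m.word2 = 'ABAA'
        
    union all
    
    -- Lower part: same buckets shifted by 0.001 s, to catch pairs that
    -- straddle a bucket boundary of the upper part.
    select  *
    
    from                    messages c
    
            join            messages m
            
            on              floor ((cast(c.time as decimal(37,7)) + 0.001) / (2 * 0.001))   =
                            floor ((cast(m.time as decimal(37,7)) + 0.001) / (2 * 0.001))
    
    -- Skip pairs that share an unshifted bucket (the upper part returned them).
    where   floor (cast(c.time as decimal(37,7)) / (2 * 0.001))     <>
            floor (cast(m.time as decimal(37,7)) / (2 * 0.001))
            
        and m.time  between c.time - interval '0.001' second
                    and     c.time + interval '0.001' second
                    
        and c.word1 = '2B3B'
        and m.word2 = 'ABAA'
    ;
    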

    +----------------------------+--------+-------+-------+----------------------------+-------+-------+-------+
    |            time            | source | word1 | word2 |           _col4            | _col5 | _col6 | _col7 |
    +----------------------------+--------+-------+-------+----------------------------+-------+-------+-------+
    | 2012-02-01 23:43:16.998824 |   0001 | 2B3B  | FAF0  | 2012-02-01 23:43:16.999356 |  0002 |  2326 | ABAA  |
    +----------------------------+--------+-------+-------+----------------------------+-------+-------+-------+
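    The two-pass bucketing can be sketched in Python to show why it scales better: each side is indexed by bucket id and only values sharing a bucket are compared, analogous to how an equi-join lets Hive route rows with equal keys together. This is a minimal sketch under assumptions: `bucket_window_join` is a hypothetical name, timestamps are non-negative numbers (integer microseconds in the example, which keeps the bucket arithmetic exact), and the `word1`/`word2` filters are left out.

```python
from collections import defaultdict


def bucket_window_join(left, right, delta):
    """Return every (x, y) with |x - y| <= delta, each pair exactly once.

    left, right: non-negative numeric timestamps (hypothetical stand-ins
    for cast(c.time as decimal) and cast(m.time as decimal)).
    """
    width = 2 * delta

    def grid1(t):
        # Upper query: floor(t / (2 * delta))
        return t // width

    def grid2(t):
        # Lower query: floor((t + delta) / (2 * delta)), i.e. shifted by delta
        return (t + delta) // width

    def equi_join(key):
        # Index the right side by bucket id, then probe with each left value,
        # so only values that share a bucket are ever compared.
        index = defaultdict(list)
        for y in right:
            index[key(y)].append(y)
        for x in left:
            for y in index.get(key(x), []):
                yield x, y

    result = []
    # Upper part of the UNION ALL: same bucket in the unshifted grid.
    for x, y in equi_join(grid1):
        if abs(x - y) <= delta:
            result.append((x, y))
    # Lower part: same bucket in the shifted grid, skipping pairs that also
    # share an unshifted bucket (the upper part already returned them).
    for x, y in equi_join(grid2):
        if grid1(x) != grid1(y) and abs(x - y) <= delta:
            result.append((x, y))
    return result


print(bucket_window_join([4900], [5500], 1000))  # [(4900, 5500)]
```

    Choosing the bucket width as twice the window is what guarantees that any pair at most `delta` apart shares a bucket in at least one of the two grids; the `grid1(x) != grid1(y)` check mirrors the `<>` condition in the lower query and keeps pairs from being reported twice.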
    

    Illustration

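    Events A and B fall in the same bucket of the unshifted grid (boundaries at 0, 0.002, 0.004, ...), so they are caught by the upper part of the UNION ALL.
    Events B and C fall in the same bucket only in the grid shifted by 0.001 (boundaries at 0.001, 0.003, 0.005, ...), so they are caught by the lower part of the UNION ALL.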

        0        0.002    0.004    0.006    0.008    0.01      
        |        |        |        |        |        |
    -------------------------------------------------------
                          |        |
                          |        |
                              A  B  C
                               |        |
                               |        |
    -------------------------------------------------------
             |        |        |        |        |                
             0.001    0.003    0.005    0.007    0.009
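    The drawing can be checked numerically. The positions of A, B and C below are hypothetical values read off the sketch (in microseconds, so the integer arithmetic is exact), and `caught_by` is an illustrative helper that reports which part of the UNION ALL would return a given pair.

```python
# Hypothetical positions in microseconds, read off the drawing above:
# A-B is 0.6 ms, B-C is 0.7 ms, A-C is 1.3 ms.
events = {"A": 4900, "B": 5500, "C": 6200}
DELTA = 1000       # the 0.001 s window, in microseconds
WIDTH = 2 * DELTA  # bucket width used by both halves of the UNION ALL


def upper_bucket(t):
    # Top axis: boundaries at 0, 0.002, 0.004, ...
    return t // WIDTH


def lower_bucket(t):
    # Bottom axis: boundaries at 0.001, 0.003, 0.005, ...
    return (t + DELTA) // WIDTH


def caught_by(x, y):
    """Name the part of the UNION ALL that returns the pair, or None."""
    tx, ty = events[x], events[y]
    if abs(tx - ty) > DELTA:
        return None  # outside the window: neither part returns it
    if upper_bucket(tx) == upper_bucket(ty):
        return "upper"
    if lower_bucket(tx) == lower_bucket(ty):
        return "lower"  # reached only when the upper buckets differ
    return "missed"  # unreachable for integer times with WIDTH == 2 * DELTA


for pair in [("A", "B"), ("B", "C"), ("A", "C")]:
    print(pair, caught_by(*pair))
# ('A', 'B') upper
# ('B', 'C') lower
# ('A', 'C') None
```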
    