ERROR: subquery in FROM cannot refer to other relations of same query level

我在风中等你 2021-02-06 16:38

I'm working with PostgreSQL 9 and I want to find the nearest neighbor inside table RP for all tuples in RQ, comparing the dates (t), but

2 answers
  • 2021-02-06 16:38

    Update:

    LATERAL joins allow that and were introduced with Postgres 9.3. Details:

    • What is the difference between LATERAL and a subquery in PostgreSQL?
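
    As a rough sketch of what the LATERAL variant could look like on Postgres 9.3 or later: the subquery in the FROM list is allowed to reference rq because of the LATERAL keyword. The column names id and t and the date type of t are assumptions carried over from the rest of this answer, and the output aliases are only illustrative.

    ```sql
    -- Sketch: nearest rp row for every rq row via LATERAL (PostgreSQL 9.3+).
    -- Assumes both tables have columns id and t, with t of type date.
    SELECT rq.id AS rq_id, rq.t AS rq_t, ra.id AS rp_id, ra.t AS rp_t
    FROM   rq
    LEFT   JOIN LATERAL (
       SELECT rp.id, rp.t
       FROM   rp
       ORDER  BY abs(rp.t - rq.t)   -- date - date yields an integer number of days
       LIMIT  1
       ) ra ON true;                -- LEFT JOIN ... ON true keeps rq rows even if rp is empty
    ```

    Like the correlated subquery, this still sorts by a computed expression, so it is a convenience for returning several columns rather than a performance fix.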

    The reason is in the error message. One element of the FROM list cannot refer to another element of the FROM list on the same level. It is not visible to a peer on the same level. You could solve this with a correlated subquery:

    SELECT *, (SELECT t FROM rp ORDER BY abs(rp.t - rq.t) LIMIT 1) AS ra
    FROM   rq
    

    Obviously, you don't care which row from RP you pick from a set of equally close rows, so I do the same.

    However, a subquery expression in the SELECT list can only return one column. If you want more than one or all columns from the table RP, select the whole row as a composite value and decompose it in an outer query. I assume the existence of a primary key id in both tables:

    SELECT id, t, (ra).*
    FROM (
        SELECT *, (SELECT rp FROM rp ORDER BY abs(rp.t - rq.t) LIMIT 1) AS ra
        FROM   rq
        ) x;
    

    Correlated subqueries are infamous for bad performance. This kind of query, while computing what you want, performs especially poorly because the sort expression abs(rp.t - rq.t) cannot use an index: every row of RP has to be evaluated and sorted for every row of RQ. Performance will deteriorate drastically with bigger tables.


    This rewritten query should be able to utilize an index on RP.t, which should perform much faster with big tables.

    WITH x AS (
        SELECT * 
             ,(SELECT t
               FROM   rp
               WHERE  rp.t <  rq.t
               ORDER  BY rp.t DESC
               LIMIT  1) AS t_pre
    
             ,(SELECT t
               FROM   rp
               WHERE  rp.t >= rq.t
               ORDER  BY rp.t
               LIMIT  1) AS t_post
        FROM   rq
        )
    SELECT id, t
          ,CASE WHEN (t_post - t) < (t - t_pre)
                THEN t_post
                ELSE COALESCE(t_pre, t_post) END AS ra
    FROM   x;
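
    For the two range subqueries to benefit, an index on rp.t has to exist. A minimal sketch, where the index name rp_t_idx and the date literal are made up for illustration:

    ```sql
    -- Plain B-tree index on the column used in WHERE and ORDER BY above.
    CREATE INDEX rp_t_idx ON rp (t);

    -- Inspect the plan of one of the subqueries with a literal in place of rq.t.
    -- On a very small table the planner may still prefer a sequential scan.
    EXPLAIN
    SELECT t
    FROM   rp
    WHERE  rp.t < DATE '2012-01-20'
    ORDER  BY rp.t DESC
    LIMIT  1;
    ```

    A B-tree index can be scanned in both directions, so the same index should serve the descending "previous" lookup and the ascending "next" lookup.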
    

    Again, if you want the whole row:

    WITH x AS (
        SELECT * 
             ,(SELECT rp
               FROM   rp
               WHERE  rp.t <  rq.t
               ORDER  BY rp.t DESC
               LIMIT  1) AS t_pre
    
             ,(SELECT rp
               FROM   rp
               WHERE  rp.t >= rq.t
               ORDER  BY rp.t
               LIMIT  1) AS t_post
        FROM   rq
        ), y AS (
        SELECT id, t
              ,CASE WHEN ((t_post).t - t) < (t - (t_pre).t)
                    THEN t_post
                    ELSE COALESCE(t_pre, t_post) END AS ra
        FROM   x
        )
    SELECT id AS rq_id, t AS rq_t, (ra).*
    FROM   y 
    ORDER  BY 2;
    

    Note the use of parentheses with composite types! No paren is redundant here. More about that in the manual here and here.

    Tested with PostgreSQL 9.1. Demo on sqlfiddle.
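
    To try the queries above locally, a small test setup along these lines should be enough. The table layout (id as primary key, t as date) follows the assumptions stated earlier; the sample values are invented.

    ```sql
    -- Hypothetical sample data for experimenting with the queries in this answer.
    CREATE TABLE rq (id serial PRIMARY KEY, t date NOT NULL);
    CREATE TABLE rp (id serial PRIMARY KEY, t date NOT NULL);

    INSERT INTO rq (t) VALUES ('2012-01-05'), ('2012-01-20'), ('2012-03-01');
    INSERT INTO rp (t) VALUES ('2012-01-01'), ('2012-01-15'), ('2012-02-10');
    ```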

  • 2021-02-06 16:42

    The correlated subqueries, without an index, are going to do a cross join anyway. So, another way of expressing the query is:

    select rp.*, min(abs(rp.t - rq.t)) as closest_value
    from rp cross join
         rq
    group by rp.id  -- assumes id is the primary key of rp (PostgreSQL 9.1+); otherwise list every rp column
    

    There is another method, which is a bit more complicated. This requires using the cumulative sum.

    Here is the idea. Combine all the rp and rq values together and order them by t. Create a flag that is 1 for rp rows and 0 for rq rows, and take the cumulative sum of that flag. As a result, all the rq values between two rp values share the same rp index, namely the index of the rp value just before them.

    The closest rp value to a given rq value has an rp index equal to that of the rq value or one more. The rp_index is calculated with the cumulative sum.
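
    To make the numbering concrete, here is a small standalone sketch of the cumulative sum on inline values. Integers stand in for the dates, and the names src, is_rp and rp_index are chosen only for this illustration.

    ```sql
    -- Running count of rp rows, ordered by t (rp rows first on ties).
    SELECT src, t,
           sum(is_rp) OVER (ORDER BY t, is_rp DESC) AS rp_index
    FROM  (VALUES ('rp', 10, 1),
                  ('rq', 12, 0),
                  ('rq', 17, 0),
                  ('rp', 20, 1),
                  ('rq', 25, 0)) AS u(src, t, is_rp)
    ORDER  BY t, is_rp DESC;
    -- src | t  | rp_index
    -- rp  | 10 | 1
    -- rq  | 12 | 1
    -- rq  | 17 | 1
    -- rp  | 20 | 2
    -- rq  | 25 | 2
    ```

    The rq rows at 12 and 17 carry index 1, so their candidates are the rp rows with index 1 (t = 10) and index 2 (t = 20). The rq row at 25 carries index 2 and has no rp row with index 3, so only the previous rp value is a candidate.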

    The following query puts this together:

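    -- Only the t column is carried through here; to return more rp columns,
    -- add them to the rp branch and pad the rq branch with NULLs.
    -- Assumes rp.t values are distinct, so every rp row gets its own rp_index.
    with t as (select u.*, sum(isRP) over (order by t, isRP desc) as rp_index
               from (select rq.t, 0 as isRP   -- rows to find a neighbor for
                     from rq
                     union all
                     select rp.t, 1 as isRP   -- candidate neighbors
                     from rp
                    ) u
              )
    select rq.t,
           (case when rpnext.t is null or abs(rpprev.t - rq.t) <= abs(rpnext.t - rq.t)
                 then abs(rpprev.t - rq.t)
                 else abs(rpnext.t - rq.t)
            end) as closest_value
    from (select *
          from t
          where isRP = 0
         ) rq left join          -- left joins keep rq rows before the first or after the last rp value
         (select *
          from t
          where isRP = 1
         ) rpprev
         on rq.rp_index = rpprev.rp_index left join
         (select *
          from t
          where isRP = 1
         ) rpnext
         on rq.rp_index + 1 = rpnext.rp_index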
    

    The advantage of this approach is that there is no cross join and no correlated subqueries.
