I'm working with PostgreSQL 9 and I want to find the nearest neighbor inside table RP for all tuples in RQ, comparing the dates (t), but
LATERAL joins allow that and were introduced with Postgres 9.3.
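As a rough illustration of the LATERAL remark, the nearest-neighbor lookup might be written like this on Postgres 9.3 or later. This is only a sketch: it assumes tables rq and rp that each have columns id and t, with t of type date, and the alias ra is chosen here for illustration.

```sql
-- Sketch only: requires PostgreSQL 9.3+.
-- Assumes rq(id, t) and rp(id, t), with t of type date.
SELECT rq.id AS rq_id, rq.t AS rq_t, ra.*
FROM   rq
LEFT   JOIN LATERAL (
   SELECT *
   FROM   rp
   ORDER  BY abs(rp.t - rq.t)   -- the subquery may reference rq because of LATERAL
   LIMIT  1
   ) ra ON true;
```

The LEFT JOIN ... ON true keeps rows from rq even if rp is empty; a plain CROSS JOIN LATERAL would drop them.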
The reason is in the error message. One element of the FROM list cannot refer to another element of the FROM list on the same level. It is not visible for a peer on the same level.
You could solve this with a correlated subquery:
    SELECT *, (SELECT t FROM rp ORDER BY abs(rp.t - rq.t) LIMIT 1) AS ra
    FROM   rq;
Obviously, you don't care which row from RP you pick from a set of equally close rows, so I do the same.
However, a subquery expression in the SELECT list can only return one column. If you want more than one or all columns from the table RP, use something like the following subquery construct. I assume the existence of a primary key id in both tables:
    SELECT id, t, (ra).*
    FROM  (
       SELECT *, (SELECT rp FROM rp ORDER BY abs(rp.t - rq.t) LIMIT 1) AS ra
       FROM   rq
       ) x;
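To try the queries in this answer, a setup along these lines could be used. It is only a sketch: the column types (integer id, date t) and the sample rows are assumptions for illustration, not taken from the question.

```sql
-- Hypothetical schema: the answer only assumes a primary key id and a date column t.
CREATE TABLE rp (id int PRIMARY KEY, t date NOT NULL);
CREATE TABLE rq (id int PRIMARY KEY, t date NOT NULL);

-- Made-up sample rows
INSERT INTO rp (id, t) VALUES
  (1, '2013-01-01'),
  (2, '2013-01-10'),
  (3, '2013-01-20');

INSERT INTO rq (id, t) VALUES
  (1, '2013-01-03'),
  (2, '2013-01-12'),
  (3, '2013-01-19');
```

With these rows, the nearest rp.t for the three rq rows is 2013-01-01, 2013-01-10 and 2013-01-20, respectively.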
Correlated subqueries are infamous for bad performance. This kind of query, while obviously computing what you want, will suck in particular, because the expression rp.t - rq.t cannot use an index. Performance will deteriorate drastically with bigger tables.
This rewritten query should be able to utilize an index on RP.t, which should perform much faster with big tables.
    WITH x AS (
       SELECT *
            ,(SELECT t
              FROM   rp
              WHERE  rp.t < rq.t
              ORDER  BY rp.t DESC
              LIMIT  1) AS t_pre
            ,(SELECT t
              FROM   rp
              WHERE  rp.t >= rq.t
              ORDER  BY rp.t
              LIMIT  1) AS t_post
       FROM   rq
       )
    SELECT id, t
          ,CASE WHEN (t_post - t) < (t - t_pre)
                THEN t_post
                ELSE COALESCE(t_pre, t_post) END AS ra
    FROM   x;
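The query above only pays off if the index on rp.t actually exists. A minimal sketch of creating it and checking the plan for one of the two lookups follows; the index name rp_t_idx and the literal date are made up for illustration.

```sql
-- Hypothetical index name. A plain b-tree index can be scanned in both
-- directions, so it can serve ORDER BY t as well as ORDER BY t DESC.
CREATE INDEX rp_t_idx ON rp (t);

-- Inspect the plan for the "closest earlier value" lookup with a sample date.
EXPLAIN
SELECT t
FROM   rp
WHERE  t < DATE '2013-01-15'
ORDER  BY t DESC
LIMIT  1;
```

Whether the planner picks the index depends on table size and statistics, so the plan on a tiny test table may still show a sequential scan.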
Again, if you want the whole row:
    WITH x AS (
       SELECT *
            ,(SELECT rp
              FROM   rp
              WHERE  rp.t < rq.t
              ORDER  BY rp.t DESC
              LIMIT  1) AS t_pre
            ,(SELECT rp
              FROM   rp
              WHERE  rp.t >= rq.t
              ORDER  BY rp.t
              LIMIT  1) AS t_post
       FROM   rq
       ), y AS (
       SELECT id, t
             ,CASE WHEN ((t_post).t - t) < (t - (t_pre).t)
                   THEN t_post
                   ELSE COALESCE(t_pre, t_post) END AS ra
       FROM   x
       )
    SELECT id AS rq_id, t AS rq_t, (ra).*
    FROM   y
    ORDER  BY 2;
Note the use of parentheses with composite types! No paren is redundant here. More about that in the manual.
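To make the point about parentheses concrete, here is a small sketch. It assumes rp has columns id and t; the aliases x and ra are only for illustration.

```sql
-- ra is a composite value of rp's row type.
-- The parentheses tell the parser that x.ra is a column, not a table.
SELECT (x.ra).id, (x.ra).t
FROM  (SELECT (SELECT rp FROM rp ORDER BY t LIMIT 1) AS ra) x;

-- Without parentheses, ra.t is read as "column t of a table named ra"
-- and the statement fails, because no such table is in the FROM list:
-- SELECT ra.t FROM (SELECT (SELECT rp FROM rp ORDER BY t LIMIT 1) AS ra) x;
```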
Tested with PostgreSQL 9.1. Demo on sqlfiddle.
The correlated subqueries, without an index, are going to do a cross join anyway. So, another way of expressing the query is:
    select rp.*, min(abs(rp.t - rq.t))
    from rp cross join
         rq
    group by <rp.*> -- <== need to replace with all columns
There is another method, which is a bit more complicated. This requires using the cumulative sum.
Here is the idea. Combine all the rp and rq values together. Now, enumerate them by the closest rp value. That is, create a flag for rp and take the cumulative sum. As a result, all the rq values between two rp values have the same rp index.
The closest value to a given rq value has an rp index the same as the rq value or one more. Calculating the rp_index uses the cumulative sum.
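The numbering step on its own can be sketched with a handful of made-up values, where isRP = 1 marks a value that came from rp and isRP = 0 one that came from rq:

```sql
-- Made-up sample values; the running sum of the flag numbers the rp rows
-- and gives every rq row the number of the closest earlier rp row.
select t, isRP, sum(isRP) over (order by t) as rp_index
from (values (date '2013-01-01', 1),
             (date '2013-01-03', 0),
             (date '2013-01-04', 0),
             (date '2013-01-10', 1),
             (date '2013-01-12', 0)
     ) as v(t, isRP)
order by t;

-- t           isrp  rp_index
-- 2013-01-01  1     1
-- 2013-01-03  0     1
-- 2013-01-04  0     1
-- 2013-01-10  1     2
-- 2013-01-12  0     2
```

The rq value 2013-01-03 gets rp_index 1, so its two candidates are the rp rows with rp_index 1 (2013-01-01) and rp_index 2 (2013-01-10).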
The following query puts this together:
    with rqi as (select u.*,
                        -- running count of rp rows up to and including this t
                        sum(isRP) over (order by t) as rp_index
                 from (select rq.t, 0 as isRP, <NULL for each rp column>
                       from rq
                       union all
                       -- rp.* also contains t, so alias the placeholder columns
                       -- above such that none of them is named t as well
                       select rp.t, 1 as isRP, rp.*
                       from rp
                      ) u
                )
    select rq.*,
           (case when abs(rpprev.t - rq.t) < abs(rpnext.t - rq.t)
                 then abs(rpprev.t - rq.t)
                 else abs(rpnext.t - rq.t)
            end) as closest_value
    -- inner joins: rq rows without an rp value on both sides are dropped
    from (select *
          from rqi
          where isRP = 0
         ) rq join
         (select *
          from rqi
          where isRP = 1
         ) rpprev
         on rq.rp_index = rpprev.rp_index join
         (select *
          from rqi
          where isRP = 1
         ) rpnext
         on rq.rp_index + 1 = rpnext.rp_index
The advantage of this approach is that there is no cross join and no correlated subqueries.