I have a table avl_pool, and I have a function that finds the link on the map nearest to that (x, y) position.
The performance of this select
I have done things like this, and it works relatively well. Note that each connection can run only one query at a time, so each partition of your query needs its own connection. In C#, you could use one thread per connection to drive them.
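Here is a minimal sketch of the thread-per-connection idea. It is written in Python rather than C# so that all examples here use one language. It assumes the work can be split by id % n_parts. The in-memory ROWS list and the run_partition function are hypothetical stand-ins for the real table and for running one partition on its own database connection.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the table being queried.
ROWS = [{"id": i, "value": i * i} for i in range(100)]


def run_partition(part: int, n_parts: int) -> list[dict]:
    """Simulate one partition of the query, i.e. the equivalent of
    SELECT ... WHERE id % n_parts = part, run on a dedicated connection.
    A real version would open (or borrow) its own connection here."""
    return [row for row in ROWS if row["id"] % n_parts == part]


def parallel_query(n_parts: int) -> list[dict]:
    # One worker thread per partition, so no connection is ever shared.
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        futures = [pool.submit(run_partition, p, n_parts) for p in range(n_parts)]
        results: list[dict] = []
        for future in futures:
            results.extend(future.result())
    # Partitions come back grouped by partition; restore a stable order.
    return sorted(results, key=lambda row: row["id"])
```

Calling parallel_query(4) returns the same rows as a single unpartitioned scan. The merge step at the end is where the application pays for the split.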
But another option would be to use asynchronous queries and have a single thread manage and poll your entire connection pool (this sometimes simplifies data manipulation on the application side). In that case, make sure every poll cycle ends with a sleep or other yield point, so the polling thread does not spin at full CPU.
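The single-thread polling variant can be sketched with asyncio. This is an assumption-laden illustration: fake_query is hypothetical and simulates server-side execution time with asyncio.sleep. A real version would issue each partition on its own asynchronous connection (for example, an async driver such as psycopg 3's AsyncConnection). The explicit poll loop mirrors the "poll, then yield" pattern described above.

```python
import asyncio


async def fake_query(part: int, n_parts: int) -> list[int]:
    """Hypothetical async query for one partition; the sleep stands in
    for the time the server spends executing it."""
    await asyncio.sleep(0.01 * (part + 1))
    return [i for i in range(20) if i % n_parts == part]


async def poll_all(n_parts: int) -> list[int]:
    # Start every partition; all of them are managed by this one thread.
    pending = {asyncio.create_task(fake_query(p, n_parts)) for p in range(n_parts)}
    results: list[int] = []
    while pending:
        finished = {task for task in pending if task.done()}
        for task in finished:
            results.extend(task.result())
        pending -= finished
        # Yield point after each poll cycle, so the loop does not busy-wait.
        await asyncio.sleep(0.005)
    return sorted(results)
```

Running asyncio.run(poll_all(4)) collects all four partitions on a single thread.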
Note also that how much this speeds up the query depends on your disk I/O subsystem and on how many CPU cores are available. You cannot simply split a query into more pieces and expect a proportional speed-up.
I've done this with SSIS by creating a script that buckets each server into one of 7 "@Mode" values. In my case, @Mode is assigned from the last three digits of each server's IP address, which creates fairly evenly distributed buckets:
(CONVERT(int, RIGHT(dbserver, 3)) % @stages) + 1 AS Mode
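The bucketing rule can be checked outside the database. This is a minimal Python sketch that assumes dbserver holds an IPv4 address and that @stages is 7. It parses the last octet instead of taking the last three characters. That substitution is mine: it also handles octets shorter than three digits, which RIGHT(dbserver, 3) would turn into a string such as "0.5".

```python
from collections import Counter


def assign_mode(ip: str, stages: int = 7) -> int:
    """Return a bucket in 1..stages, mirroring (N % @stages) + 1.

    Assumption: N is the last octet of an IPv4 address rather than the
    last three characters of the dbserver string.
    """
    last_octet = int(ip.rsplit(".", 1)[1])
    return last_octet % stages + 1


def bucket_sizes(ips: list[str], stages: int = 7) -> Counter:
    """Count how many servers land in each @Mode bucket."""
    return Counter(assign_mode(ip, stages) for ip in ips)
```

For 140 hypothetical servers, 10.0.0.1 through 10.0.0.140, bucket_sizes reports 20 servers in each of the 7 modes. Real IP ranges are rarely that regular, so it is worth checking the actual distribution.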
In SSIS, I have 7 sets of the same 14 large queries running. Each set is assigned a different @Mode number, which is passed to the stored procedure.
Essentially, this allows 7 simultaneous queries that never run on the same server, cutting the runtime by approximately 85%.
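As a rough sanity check, assume the 7 buckets carry equal work and the servers do not contend for any shared resource. The ideal 7-way split then gives:

```latex
T_{\text{parallel}} \approx \frac{T_{\text{serial}}}{7},
\qquad
1 - \frac{T_{\text{parallel}}}{T_{\text{serial}}} \approx 1 - \frac{1}{7} = \frac{6}{7} \approx 85.7\%
```

This is consistent with the roughly 85% reduction reported. Uneven buckets or shared bottlenecks would push the real figure lower.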
So, create an SSIS package whose first step refreshes the @Mode table.
Then create a container that holds 7 containers. Within each of those 7 containers, execute your SQL queries with a Parameter Mapping to @Mode. I point everything to stored procs, so in my case the SQLStatement field reads something like "EXEC StoredProc ?". The ? placeholder is bound to @Mode through the Parameter Mapping you created.
Finally, make sure the SQL query (in my case, the stored procedure) uses @Mode to decide which servers it runs against.
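Here is the shape of the package expressed as a Python sketch, for readers who want to reason about it outside SSIS. The procedure names and exec_stored_proc are hypothetical placeholders. The sketch also assumes that the 14 queries inside each container run one after another (chained by precedence constraints), while the 7 containers run concurrently, one per @Mode value.

```python
from concurrent.futures import ThreadPoolExecutor

STAGES = 7
# Hypothetical names for the 14 large queries.
QUERIES = [f"StoredProc{i}" for i in range(1, 15)]


def exec_stored_proc(proc: str, mode: int) -> str:
    """Hypothetical stand-in for 'EXEC <proc> ?' with ? bound to @Mode."""
    return f"{proc}@mode{mode}"


def run_container(mode: int) -> list[str]:
    # Inside one container, the 14 queries run sequentially.
    return [exec_stored_proc(proc, mode) for proc in QUERIES]


def run_package() -> dict[int, list[str]]:
    # Step 1 (refreshing the @Mode table) would run first; omitted here.
    # Step 2: the 7 containers run concurrently, one per @Mode value.
    modes = range(1, STAGES + 1)
    with ThreadPoolExecutor(max_workers=STAGES) as pool:
        return dict(zip(modes, pool.map(run_container, modes)))
```

Because every container receives a different @Mode, no two concurrent workers target the same server. That is the property the SSIS layout relies on.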
Consider marking your map.get_near_link function as PARALLEL SAFE. This tells the database engine that it is allowed to try to generate a parallel plan when executing the function:
PARALLEL UNSAFE indicates that the function can't be executed in parallel mode and the presence of such a function in an SQL statement forces a serial execution plan. This is the default. PARALLEL RESTRICTED indicates that the function can be executed in parallel mode, but the execution is restricted to parallel group leader. PARALLEL SAFE indicates that the function is safe to run in parallel mode without restriction.
Several settings can prevent the query planner from generating a parallel plan under any circumstances. See these sections of the PostgreSQL documentation:
15.4. Parallel Safety
15.2. When Can Parallel Query Be Used?
On my reading, you may be able to achieve a parallel plan if you refactor your function like this:
CREATE OR REPLACE FUNCTION map.get_near_link(
    x NUMERIC,
    y NUMERIC,
    azim NUMERIC)
RETURNS TABLE
    (Link_ID INTEGER, Distance INTEGER, Sentido TEXT, Geom GEOGRAPHY)
AS
$$
    SELECT
        S.Link_ID,
        -- distance to the input point, scaled by 100000 and truncated
        TRUNC(ST_Distance(ST_GeomFromText('POINT(' || x || ' ' || y || ')', 4326), S.geom) * 100000)::INTEGER AS distance,
        S.sentido,
        v.geom
    FROM (
        -- keep segments heading within 30 degrees of azim (wraps at 360)
        SELECT *
        FROM map.vzla_seg AS seg
        WHERE ABS(azim - seg.azimuth) NOT BETWEEN 30 AND 330
    ) S
    INNER JOIN map.vzla_rto v
        ON S.link_id = v.link_id
    WHERE
        ST_Distance(ST_GeomFromText('POINT(' || x || ' ' || y || ')', 4326), S.geom) * 100000 < 50
    ORDER BY
        S.geom <-> ST_GeomFromText('POINT(' || x || ' ' || y || ')', 4326)
    LIMIT 1
$$
LANGUAGE SQL
PARALLEL SAFE -- allow the planner to consider a parallel plan
;
If the query optimiser does generate a parallel plan when executing this function, you won't need to implement your own parallelisation logic.