postgresql-performance

Can PostgreSQL array be optimized for join?

ⅰ亾dé卋堺 submitted on 2019-12-02 07:34:37
I see that a Postgres array performs well when the array's elements are the data themselves, e.g., tags: http://shon.github.io/2015/12/21/postgres_array_performance.html What if I instead use an array to store integer foreign keys? Setting aside the lack of foreign-key constraints on array elements, is it advisable to store foreign keys in an integer array? The app is optimized for reporting and analytics, so it will end up joining the array back to the referenced table most of the time, e.g., to show the label/title/name behind each foreign key. Is an array still an acceptable way to store foreign keys? Would the performance be …
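
A minimal sketch of the join pattern the question describes, assuming a hypothetical posts table whose int[] column tag_ids holds keys into tags(id) (all names are illustrative, not from the question):

-- Hypothetical schema: tag_ids stores integer foreign keys as an array.
CREATE TABLE tags  (id int PRIMARY KEY, name text NOT NULL);
CREATE TABLE posts (id int PRIMARY KEY, title text, tag_ids int[]);

-- Showing the name behind each key requires unnesting the array first;
-- each unnested element can then join against tags via its primary key.
SELECT p.id, p.title, t.name
FROM   posts p
CROSS  JOIN LATERAL unnest(p.tag_ids) AS u(tag_id)
JOIN   tags t ON t.id = u.tag_id;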

Slow LEFT JOIN on CTE with time intervals

∥☆過路亽.° submitted on 2019-12-02 01:07:31
I am trying to debug a query in PostgreSQL that I've built to bucket market data into time buckets of arbitrary intervals. Here is my table definition: CREATE TABLE historical_ohlcv ( exchange_symbol TEXT NOT NULL, symbol_id TEXT NOT NULL, kafka_key TEXT NOT NULL, open NUMERIC, high NUMERIC, low NUMERIC, close NUMERIC, volume NUMERIC, time_open TIMESTAMP WITH TIME ZONE NOT NULL, time_close TIMESTAMP WITH TIME ZONE, CONSTRAINT historical_ohlcv_pkey PRIMARY KEY (exchange_symbol, symbol_id, time_open) ); CREATE INDEX symbol_id_idx ON historical_ohlcv (symbol_id); CREATE INDEX open_close …
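
A common way to avoid a slow LEFT JOIN of every row against a CTE of generated intervals is to compute each row's bucket arithmetically and group on it; a sketch against the historical_ohlcv table above, with the 5-minute bucket width and one-day range chosen purely for illustration:

-- Assign each row to a 5-minute bucket with epoch arithmetic, then aggregate.
SELECT symbol_id,
       to_timestamp(floor(extract(epoch FROM time_open) / 300) * 300) AS bucket,
       min(low)    AS low,
       max(high)   AS high,
       sum(volume) AS volume
FROM   historical_ohlcv
WHERE  time_open >= now() - interval '1 day'   -- illustrative range
GROUP  BY symbol_id, bucket
ORDER  BY symbol_id, bucket;

Buckets with no rows simply don't appear; if gap-filling is required, a LEFT JOIN from generate_series() onto this aggregate is far cheaper than joining the raw rows interval by interval.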

Why does a slight change in the search term slow down the query so much?

百般思念 submitted on 2019-12-01 19:11:41
Question: I have the following query in PostgreSQL (9.5.1): select e.id, (select count(id) from imgitem ii where ii.tabid = e.id and ii.tab = 'esp') as imgs, e.ano, e.mes, e.dia, cast(cast(e.ano as varchar(4))||'-'||right('0'||cast(e.mes as varchar(2)),2)||'-'|| right('0'||cast(e.dia as varchar(2)),2) as varchar(10)) as data, pl.pltag, e.inpa, e.det, d.ano anodet, coalesce(p.abrev,'')||' ('||coalesce(p.prenome,'')||')' determinador, d.tax, coalesce(v.val,v.valf)||' '||vu.unit as altura, coalesce(v1.val …
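
The per-row correlated count in this query is a typical cost driver; a hedged sketch of an index that usually serves it, assuming the column types implied by the excerpt:

-- Lets the subquery (count of imgitem rows per tab/tabid pair) be answered
-- by an index scan instead of scanning imgitem for every outer row.
CREATE INDEX imgitem_tab_tabid_idx ON imgitem (tab, tabid);

Separately, the cast-and-concatenate expression building "data" can be written as make_date(e.ano, e.mes, e.dia) on Postgres 9.4+, which is clearer and less error-prone.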

How to delete many rows from frequently accessed table

喜欢而已 submitted on 2019-12-01 17:56:46
I need to delete the majority (say, 90%) of a very large table (say, 5M rows). The other 10% of this table is frequently read, but not written to. From "Best way to delete millions of rows by ID", I gather that I should drop any index on the 90% I'm deleting to speed up the process (except the index I'm using to select the rows for deletion). From "PostgreSQL locking mode", I see that this operation will acquire a ROW EXCLUSIVE lock on the entire table. But since I'm only reading the other 10%, this ought not to matter. So, is it safe to delete everything in one command (i.e. DELETE FROM …
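
A pattern often recommended here is to rebuild rather than delete in place: copy the 10% of rows to keep into a new table and swap names. A sketch using illustrative identifiers and an illustrative keep-condition (none of these names are from the question):

-- Rebuild: write the surviving 10% to a fresh table, then swap.
BEGIN;
CREATE TABLE big_table_new (LIKE big_table INCLUDING ALL);
INSERT INTO big_table_new SELECT * FROM big_table WHERE keep;
ALTER TABLE big_table     RENAME TO big_table_old;
ALTER TABLE big_table_new RENAME TO big_table;
COMMIT;
DROP TABLE big_table_old;  -- once readers are confirmed on the new table

The renames take a brief ACCESS EXCLUSIVE lock, but the expensive copying happens without blocking readers of the original table, and the result carries no dead tuples to vacuum.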

Optimize performance for queries on recent rows of a large table

时间秒杀一切 submitted on 2019-12-01 09:02:34
I have a large table: CREATE TABLE "orders" ( "id" serial NOT NULL, "person_id" int4, "created" int4, CONSTRAINT "orders_pkey" PRIMARY KEY ("id") ); 90% of all requests are for orders from the last 2-3 days by person_id, like: select * from orders where person_id = 1 and created >= extract(epoch from current_timestamp)::int - 60 * 60 * 24 * 3; How can I improve performance? I know about partitioning, but what about existing rows? And it looks like I would need to create INHERITS tables manually every 2-3 days. A partial, multicolumn index on (person_id, created) with a pseudo-IMMUTABLE …
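
A sketch of the partial, multicolumn index with a pseudo-IMMUTABLE function that the excerpt breaks off at; the cutoff constant and names are illustrative:

-- The function is declared IMMUTABLE (a deliberate lie) so it can appear in an
-- index predicate; replace the constant and recreate the index periodically.
CREATE OR REPLACE FUNCTION f_orders_cutoff()
  RETURNS int LANGUAGE sql IMMUTABLE AS
$$SELECT 1500000000$$;  -- illustrative epoch cutoff

CREATE INDEX orders_recent_idx ON orders (person_id, created)
WHERE created >= f_orders_cutoff();

-- Queries must repeat the predicate so the planner can match the partial index:
SELECT *
FROM   orders
WHERE  person_id = 1
AND    created >= f_orders_cutoff()
AND    created >= extract(epoch FROM current_timestamp)::int - 60 * 60 * 24 * 3;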

Spatial query on large table with multiple self joins performing slow

心不动则不痛 submitted on 2019-12-01 06:59:39
I am working on queries on a large table in Postgres 9.3.9. It is a spatial dataset and it is spatially indexed. Say I need to find three types of objects: A, B and C. The criterion is that B and C are both within a certain distance of A, say 500 meters. My query is like this: select school.osm_id as school_osm_id, school.name as school_name, school.way as school_way, restaurant.osm_id as restaurant_osm_id, restaurant.name as restaurant_name, restaurant.way as restaurant_way, bar.osm_id as bar_osm_id, bar.name as bar_name, bar.way as bar_way from ( select osm_id, name, amenity, way, way_geo …
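
Queries like this usually hinge on whether the distance test is index-assisted; a minimal sketch using ST_DWithin, which can be driven by the spatial GiST index where ST_Distance(...) < 500 cannot (the table name is illustrative; 500 is in the units of the layer's SRID, i.e. meters for a metric projection):

-- Each ST_DWithin pair can use the spatial index.
SELECT s.osm_id AS school_osm_id,
       r.osm_id AS restaurant_osm_id,
       b.osm_id AS bar_osm_id
FROM   planet_osm_point s              -- illustrative table name
JOIN   planet_osm_point r ON ST_DWithin(s.way, r.way, 500)
JOIN   planet_osm_point b ON ST_DWithin(s.way, b.way, 500)
WHERE  s.amenity = 'school'
AND    r.amenity = 'restaurant'
AND    b.amenity = 'bar';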

Reuse computed select value

旧街凉风 submitted on 2019-12-01 06:50:02
I'm trying to use ST_SnapToGrid and then GROUP BY the grid cells (x, y). Here is what I did first: SELECT COUNT(*) AS n, ST_X(ST_SnapToGrid(geom, 50)) AS x, ST_Y(ST_SnapToGrid(geom, 50)) AS y FROM points GROUP BY x, y I don't want to recompute ST_SnapToGrid for both x and y, so I changed it to use a sub-query: SELECT COUNT(*) AS n, ST_X(geom) AS x, ST_Y(geom) AS y FROM ( SELECT ST_SnapToGrid(geom, 50) AS geom FROM points ) AS tmp GROUP BY x, y But when I run EXPLAIN, both of these queries have exactly the same execution plan: GroupAggregate (...) -> Sort (...) Sort Key: (st_x(st_snaptogrid …
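
The identical plans suggest the planner inlined the sub-query, folding the snap back into both expressions; one way to make the reuse explicit is a LATERAL join that computes the snapped geometry once per row, sketched against the points table from the question:

-- LATERAL computes ST_SnapToGrid once per row; ST_X/ST_Y then read from it.
SELECT count(*)     AS n,
       ST_X(g.geom) AS x,
       ST_Y(g.geom) AS y
FROM   points p
CROSS  JOIN LATERAL (SELECT ST_SnapToGrid(p.geom, 50) AS geom) AS g
GROUP  BY g.geom;

Whether this actually saves measurable work depends on the planner; ST_SnapToGrid is cheap, so the duplicated call is rarely the real bottleneck.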
