Efficiently querying a huge time series table for one row every 15 minutes

I have two tables, conttagtable (t) and contfloattable (cf). T has about 43k rows. CF has over 9 billion.

I created an index on both tables on the tagindex column on both tables. This column can be thought of as a unique identifier for conttagtable and as a foreign key into conttagtable for confloattable. I didn't explicitly create a PK or foreign key on either table relating to the other, although this data is logically related by the tagindex column on both tables as if conttagtable.tagindex were a PRIMARY KEY and contfloattable.tagindex where a FOREIGN KEY (tagindex) REFERENCES conttagtable(tagindex). The data came from a microsoft access dump and I didn't know if I could trust tagindex to be unique, so "uniqueness" is not enforced.

The data itself is extremely large.

I need to obtain single arbitrarily selected row from contfloattable for each 15-minute contfloattable.dateandtime interval for each conttagtable.tagid. So, if the contfloattable for a given tagid has 4000 samples spanning 30 minutes, I need a sample from the 0-14 minute range and a sample from the 15-30 minute range. Any one sample within the 15 minute range is acceptable; 1st, last, random, whatever.

In a nutshell, I need to get a sample every 15 minutes but only one sample per t.tagname. The samples right now are recorded every 5 seconds and the data spans two years. This is a big data problem and way over my head in terms of sql. All of the time interval solutions I have tried from googling or searching on SO have yielded query times that are so long that they are not practical.

Are my indexes sufficient for a fast join? (they appear to be when leaving out the time interval part)
Would I benefit by the addition of any other indexes?
What's the best/fastest query that accomplish the above goals?

Here's an SQLFiddle containing the schema and some sample data: http://sqlfiddle.com/#!1/c7d2f/2

Schema:

        Table "public.conttagtable" (t)
   Column    |  Type   | Modifiers
-------------+---------+-----------
 tagname     | text    |
 tagindex    | integer |
 tagtype     | integer |
 tagdatatype | integer |
Indexes:
    "tagindex" btree (tagindex)


             Table "public.contfloattable" (CF)
   Column    |            Type             | Modifiers
-------------+-----------------------------+-----------
 dateandtime | timestamp without time zone |
 millitm     | integer                     |
 tagindex    | integer                     |
 Val         | double precision            |
 status      | text                        |
 marker      | text                        |
Indexes:
    "tagindex_contfloat" btree (tagindex)

The output i'd like to see is something like this:

cf.dateandtime      |cf."Val"|cf.status|t.tagname
--------------------------------------------------
2012-11-16 00:00:02  45       S         SuperAlpha
2012-11-16 00:00:02  45       S         SuperBeta
2012-11-16 00:00:02  45       S         SuperGamma
2012-11-16 00:00:02  45       S         SuperDelta
2012-11-16 00:15:02  45       S         SuperAlpha
2012-11-16 00:15:02  45       S         SuperBeta
2012-11-16 00:15:02  45       S         SuperGamma
2012-11-16 00:15:02  45       S         SuperDelta
2012-11-16 00:30:02  45       S         SuperAlpha
2012-11-16 00:30:02  45       S         SuperBeta
2012-11-16 00:30:02  45       S         SuperGamma
2012-11-16 00:30:02  45       S         SuperDelta
2012-11-16 00:45:02  42       S         SuperAlpha

...etc etc...

As suggested by Clodoaldo, this is my latest attempt, any suggestions to speed it up?

with i as (
    select cf.tagindex, min(dateandtime) dateandtime
    from contfloattable cf
    group by
        floor(extract(epoch from dateandtime) / 60 / 15),
        cf.tagindex
)
select cf.dateandtime, cf."Val", cf.status, t.tagname
from
    contfloattable cf
    inner join
    conttagtable t on cf.tagindex = t.tagindex
    inner join
    i on i.tagindex = cf.tagindex and i.dateandtime = cf.dateandtime
order by floor(extract(epoch from cf.dateandtime) / 60 / 15), cf.tagindex

Query plan from the above: http://explain.depesz.com/s/loR

Clodoaldo Neto

For 15 minutes intervals:

with i as (
    select cf.tagindex, min(dateandtime) dateandtime
    from contfloattable cf
    group by
        floor(extract(epoch from dateandtime) / 60 / 15),
        cf.tagindex
)
select cf.dateandtime, cf."Val", cf.status, t.tagname
from
    contfloattable cf
    inner join
    conttagtable t on cf.tagindex = t.tagindex
    inner join
    i on i.tagindex = cf.tagindex and i.dateandtime = cf.dateandtime
order by cf.dateandtime, t.tagname

Show the explain output for this query (if it works) so we can try to optimize. You can post it in this answer.

Explain Output

"Sort  (cost=15102462177.06..15263487805.24 rows=64410251271 width=57)"
"  Sort Key: cf.dateandtime, t.tagname"
"  CTE i"
"    ->  HashAggregate  (cost=49093252.56..49481978.32 rows=19436288 width=12)"
"          ->  Seq Scan on contfloattable cf  (cost=0.00..38528881.68 rows=1408582784 width=12)"
"  ->  Hash Join  (cost=270117658.06..1067549320.69 rows=64410251271 width=57)"
"        Hash Cond: (cf.tagindex = t.tagindex)"
"        ->  Merge Join  (cost=270117116.39..298434544.23 rows=1408582784 width=25)"
"              Merge Cond: ((i.tagindex = cf.tagindex) AND (i.dateandtime = cf.dateandtime))"
"              ->  Sort  (cost=2741707.02..2790297.74 rows=19436288 width=12)"
"                    Sort Key: i.tagindex, i.dateandtime"
"                    ->  CTE Scan on i  (cost=0.00..388725.76 rows=19436288 width=12)"
"              ->  Materialize  (cost=267375409.37..274418323.29 rows=1408582784 width=21)"
"                    ->  Sort  (cost=267375409.37..270896866.33 rows=1408582784 width=21)"
"                          Sort Key: cf.tagindex, cf.dateandtime"
"                          ->  Seq Scan on contfloattable cf  (cost=0.00..24443053.84 rows=1408582784 width=21)"
"        ->  Hash  (cost=335.74..335.74 rows=16474 width=44)"
"              ->  Seq Scan on conttagtable t  (cost=0.00..335.74 rows=16474 width=44)"

It looks like you need this index:

create index cf_tag_datetime on contfloattable (tagindex, dateandtime)

Run analyze after creating it. Now notice that any index on a big table will have a significant performance impact on data changes (insert etc) as it will have to be updated at each change.

Update

I added the cf_tag_datetime index (tagindex,dateandtime) and here's the new explain:

"Sort  (cost=15349296514.90..15512953953.25 rows=65462975340 width=57)"
"  Sort Key: cf.dateandtime, t.tagname"
"  CTE i"
"    ->  HashAggregate  (cost=49093252.56..49490287.76 rows=19851760 width=12)"
"          ->  Seq Scan on contfloattable cf  (cost=0.00..38528881.68 rows=1408582784 width=12)"
"  ->  Hash Join  (cost=270179293.86..1078141313.22 rows=65462975340 width=57)"
"        Hash Cond: (cf.tagindex = t.tagindex)"
"        ->  Merge Join  (cost=270178752.20..298499296.08 rows=1408582784 width=25)"
"              Merge Cond: ((i.tagindex = cf.tagindex) AND (i.dateandtime = cf.dateandtime))"
"              ->  Sort  (cost=2803342.82..2852972.22 rows=19851760 width=12)"
"                    Sort Key: i.tagindex, i.dateandtime"
"                    ->  CTE Scan on i  (cost=0.00..397035.20 rows=19851760 width=12)"
"              ->  Materialize  (cost=267375409.37..274418323.29 rows=1408582784 width=21)"
"                    ->  Sort  (cost=267375409.37..270896866.33 rows=1408582784 width=21)"
"                          Sort Key: cf.tagindex, cf.dateandtime"
"                          ->  Seq Scan on contfloattable cf  (cost=0.00..24443053.84 rows=1408582784 width=21)"
"        ->  Hash  (cost=335.74..335.74 rows=16474 width=44)"
"              ->  Seq Scan on conttagtable t  (cost=0.00..335.74 rows=16474 width=44)"

It seems to have gone up in time :( However, if I remove the order by clause (not exactly what i need, but would work), this is what happens, big reduction:

"Hash Join  (cost=319669581.62..1127631600.98 rows=65462975340 width=57)"
"  Hash Cond: (cf.tagindex = t.tagindex)"
"  CTE i"
"    ->  HashAggregate  (cost=49093252.56..49490287.76 rows=19851760 width=12)"
"          ->  Seq Scan on contfloattable cf  (cost=0.00..38528881.68 rows=1408582784 width=12)"
"  ->  Merge Join  (cost=270178752.20..298499296.08 rows=1408582784 width=25)"
"        Merge Cond: ((i.tagindex = cf.tagindex) AND (i.dateandtime = cf.dateandtime))"
"        ->  Sort  (cost=2803342.82..2852972.22 rows=19851760 width=12)"
"              Sort Key: i.tagindex, i.dateandtime"
"              ->  CTE Scan on i  (cost=0.00..397035.20 rows=19851760 width=12)"
"        ->  Materialize  (cost=267375409.37..274418323.29 rows=1408582784 width=21)"
"              ->  Sort  (cost=267375409.37..270896866.33 rows=1408582784 width=21)"
"                    Sort Key: cf.tagindex, cf.dateandtime"
"                    ->  Seq Scan on contfloattable cf  (cost=0.00..24443053.84 rows=1408582784 width=21)"
"  ->  Hash  (cost=335.74..335.74 rows=16474 width=44)"
"        ->  Seq Scan on conttagtable t  (cost=0.00..335.74 rows=16474 width=44)"

I have not yet tried this index...will do so though. standby.

Now looking at it again I think the inverse index could be even better as it can be used not only in the Merge Join but also in the final Sort:

create index cf_tag_datetime on contfloattable (dateandtime, tagindex)

Here's another formulation. I'll be quite curious to see how it scales on the full data set. Create this index first:

CREATE INDEX contfloattable_tag_and_timeseg
ON contfloattable(tagindex, (floor(extract(epoch FROM dateandtime) / 60 / 15) ));

then run this with as much work_mem as you can afford:

SELECT 
  (first_value(x) OVER (PARTITION BY x.tagindex, floor(extract(epoch FROM x.dateandtime) / 60 / 15))).*,
  (SELECT t.tagname FROM conttagtable t WHERE t.tagindex = x.tagindex) AS tagname
FROM contfloattable x ORDER BY dateandtime, tagname;

Sneaky Wombat: Explain from above sql on the full data set (without suggested index): http://explain.depesz.com/s/kGo

Alternately, here's one where only one sequential pass across contfloattable should be required, with values collected into a tuplestore that's then JOINed against to get the tag name. It requires lots of work_mem:

SELECT cf.dateandtime, cf.dataVal, cf.status, t.tagname
FROM 
  (
    SELECT (first_value(x) OVER (PARTITION BY x.tagindex, floor(extract(epoch FROM x.dateandtime) / 60 / 15))).*
    FROM contfloattable x
  ) cf
  INNER JOIN
  conttagtable t ON cf.tagindex = t.tagindex
ORDER BY cf.dateandtime, t.tagname;

Sneaky Wombat: Explain from above sql on the full data set (without suggested index): http://explain.depesz.com/s/57q

If it works you'll want to throw as much work_mem as you can afford at the query. You haven't mentioned your system's RAM, but you'll want a decent chunk of it; try:

SET work_mem = '500MB';

... or more if you have at least 4GB of RAM and are on a 64-bit CPU. Again, I'd be really interested to see how it works on the full data set.

BTW, for correctness of these queries I'd advise you to ALTER TABLE conttagtable ADD PRIMARY KEY (tagindex); then DROP INDEX t_tagindex;. It'll take some time as it'll be building a unique index. Most of the queries mentioned here are assuming that t.tagindex is unique in conttagtable, and that really should be enforced. The unique index can be used for additional optimisations that the old non-unique t_tagindex cannot, and it produces much better statistics estimates.

Also, when comparing query plans, note that cost isn't necessarily strictly proportional to real-world execution time. If the estimates are good then it should roughly correlate, but the estimates are only that. Sometimes you'll see a high-cost plan execute faster than a supposedly low-cost plan due to things like bad rowcount estimates or index selectivity estimates, limitations in the query planner's ability to infer relationships, unexpected correlations, or cost parameters like random_page_cost and seq_page_cost that don't match the real system.

来源：https://stackoverflow.com/questions/12921598/efficiently-querying-a-huge-time-series-table-for-one-row-every-15-minutes

标签

sql

postgresql

aggregate

group-by

postgresql-performance