I\'m partitioning a very large table that contains temporal data, and considering to what granularity I should make the partitions. The Postgres partition documentation claims
If you don't want to trust the PostgreSQL developers who wrote the code, then I recommend that you simply try it yourself and run a few example queries with explain analyze and time them using different partition schemes. Your specific hardware and software configuration is likely to dominate any answer in any case.
I'm assuming that the row optimization cache which the query optimizer uses to determine what joins and restrictions to use is stored with each partition, so it probably needs to load and read parts of each partition to plan the query.
"large numbers of partitions are likely to increase query planning time considerably" and recommends that partitioning be used with "up to perhaps a hundred" partitions.
Because every extra partition will usually be tied to check constraints, and this will lead the planner to wonder which of the partitions need to be queried against. In a best case scenario, the planner identifies that you're only hitting a single partition and gets rid of the append
step altogether.
In terms of rows, and as DNS and Seth have pointed out, your milage will vary with the hardware. Generally speaking, though, there's no significant difference between querying a 1M row table and a 10M row table -- especially if your hard drives allow for fast random access and if it's clustered (see the cluster
statement) using the index that you're most frequently hitting.
The query planner has to do a linear search of the constraint information for every partition of tables used in the query, to figure out which are actually involved--the ones that can have rows needed for the data requested. The number of query plans the planner considers grows exponentially as you join more tables. So the exact spot where that linear search adds up to enough time to be troubling really depends on query complexity. The more joins, the worse you will get hit by this. The "up to a hundred" figure came from noting that query planning time was adding up to a non-trivial amount of time even on simpler queries around that point. On web applications in particular, where latency of response time is important, that's a problem; thus the warning.
Can you support 500? Sure. But you are going to be searching every one of 500 check constraints for every query plan involving that table considered by the optimizer. If query planning time isn't a concern for you, then maybe you don't care. But most sites end up disliking the proportion of time spent on query planning with that many partitions, which is one reason why monthly partitioning is the standard for most data sets. You can easily store 10 years of data, partitioned monthly, before you start crossing over into where planning overhead starts to be noticeable.
Each Table Partition takes up an inode on the file system. "Very large" is a relative term that depends on the performance characteristics of your file system of choice. If you want explicit performance benchmarks, you could probably look at various performance benchmarks of mails systems from your OS and FS of choice. Generally speaking, I wouldn't worry about it until you get in to the tens of thousands to hundreds of thousands of table spaces (using dirhash on FreeBSD's UFS2 would be win). Also note that this same limitation applies to DATABASES, TABLES or any other filesystem backed database object in PostgreSQL.