问题
I'm testing MonetDB and getting an amazing performance while querying millions of rows on my laptop.
I expect to work with billions in production and I need to update the data as often as possible, let say each 1 minute or 5 minutes worst case. Just updating existing records or adding new ones, deletion can be scheduled once a day.
I've seen a good performance for the updates on my tests, but i'm a bit worried about same operations over three of four times more data.
About BULK insert, got 1 million rows in 5 secs, so good enough performance right now as well. I have not tried deletion.
Everything works fine unless you run queries at the same time you update the data, in this case all seems to be frozen for a long-long-long time.
So, what's the best strategy to get MonetDB updated in background? Thanks
回答1:
You could do each load in a new table with the same schema, then create a VIEW
that unions them all together. Queries will run on the view, and dropping and recreating that view is very fast.
However, it would probably be best to merge some of these smaller tables together every now and then. For example, a nightly job could combine all load tables from the previous day(s) into a new table (runs independently, no problem) and then recreate the view again.
Alternatively, you could use the BINARY COPY INTO
to speed up the loading process in the first place.
回答2:
There is a new merge table functionnality that could replace the view in Hannes Mühleisen answer and would be more idiomatic. You can attach / detach partitions using:
alter table mergedTable ADD/DROP table partitionTable
It will be problematic for updates as they must be made directly to the partition tables easier if you have a partitionning key (date/...) But it was the same with the previous solution.
来源:https://stackoverflow.com/questions/28122178/monetdb-refresh-data-in-background-best-strategy-with-active-connections-making