SQL Server Efficiently dropping a group of rows with millions and millions of rows

遇见更好的自我 2021-02-04 12:26

I recently asked this question: MS SQL share identity seed amongst tables (Many people wondered why)

I have the following layout of a table:

Table: Star

13 answers
  • 2021-02-04 13:24

    Taking your idea of many tables, here is one way you could implement it.

    What about using dynamic SQL?

    1. Create a categories table with an identity category_id column.
    2. Create an insert trigger on this table; in it, create a stars table whose name is built dynamically from the category_id.
    3. Create a delete trigger; in it, drop the corresponding stars table, again with dynamically built SQL.
    4. To select the stars of a particular category, use a function that returns a table. It takes category_id as a parameter and returns the result through a dynamic query. (In SQL Server this would in practice have to be a stored procedure, since user-defined functions cannot execute dynamic SQL.)
    5. To insert stars of a new category, first insert a new row into the categories table and then insert the stars into the corresponding table.
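
    The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses Python with an in-memory SQLite database as a stand-in for SQL Server, the table and column names (categories, stars_<id>, star_id, star_name) are hypothetical, and the table creation and drop happen in application code rather than in triggers, because SQLite triggers cannot run DDL.

```python
import sqlite3

def add_category(conn, name):
    """Insert a category row and create its dedicated stars table."""
    cur = conn.execute("INSERT INTO categories (name) VALUES (?)", (name,))
    category_id = cur.lastrowid
    # The table name is built only from the integer id, never from user input.
    conn.execute(
        f"CREATE TABLE stars_{int(category_id)} "
        "(star_id INTEGER PRIMARY KEY, star_name TEXT NOT NULL)"
    )
    return category_id

def add_star(conn, category_id, star_name):
    """Insert one star into the table that belongs to its category."""
    conn.execute(
        f"INSERT INTO stars_{int(category_id)} (star_name) VALUES (?)",
        (star_name,),
    )

def stars_for_category(conn, category_id):
    """Return all stars of one category via a dynamically built query."""
    return conn.execute(
        f"SELECT star_id, star_name FROM stars_{int(category_id)} ORDER BY star_id"
    ).fetchall()

def drop_category(conn, category_id):
    """Dropping the table removes every star of the category in one step."""
    conn.execute(f"DROP TABLE stars_{int(category_id)}")
    conn.execute("DELETE FROM categories WHERE category_id = ?", (category_id,))

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE categories "
    "(category_id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT NOT NULL)"
)
giants = add_category(conn, "giants")
dwarfs = add_category(conn, "dwarfs")
add_star(conn, giants, "Betelgeuse")
add_star(conn, giants, "Antares")
add_star(conn, dwarfs, "Sirius B")

drop_category(conn, giants)  # no row-by-row delete is needed
remaining_tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' AND name LIKE 'stars_%'"
    )
]
```

    The point of the design is that removing a category becomes a DROP TABLE instead of a large DELETE; the cost is that every query has to build the table name at run time.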

    Another direction I would research is using an XML-typed column to store the star data. The main idea is that if you only ever operate on stars by category, then why not store all the stars of a given category in a single cell of the table, in XML format? Unfortunately, I cannot predict what the performance of such a design would be.

    Both of these variants are just brainstorming ideas.

  • 2021-02-04 13:25

    I know this is a bit of a tangent, but is SQL Server (or any relational database) really a good tool for this job? What relational database features are you actually using?

    If you are dropping whole categories at a time, you can't have much referential integrity depending on them. The data is read-only, so you don't need ACID guarantees for data updates.

    It sounds to me like you are only using basic SELECT query features.

  • 2021-02-04 13:28

    I didn't get an answer to my comment on the original post, so I am going under some assumptions...

    Here's my idea: use multiple databases, one for each category.

    You can use the ESE database engine, which ships with every version of Windows, for free, through its managed (.NET) wrapper.

    Use the PersistentDictionary object, and keep track of the starid, starname pairs that way. If you need to delete a category, just delete the PersistentDictionary object for that category.

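    using Microsoft.Isam.Esent.Collections.Generic; // namespace of PersistentDictionary in the ManagedEsent wrapper
    PersistentDictionary<int, string> starsForCategory = new PersistentDictionary<int, string>("Category1");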
    

    This will create a database called "Category1", on which you can use the standard .NET dictionary operations (Add, ContainsKey, foreach, etc.).

  • 2021-02-04 13:31

    It sounds like the transaction log is struggling with the size of the delete. The transaction log grows in fixed increments, and each growth takes time while more disk space is allocated.

    It is not possible to delete rows from a table without enlisting a transaction, although it is possible to empty a table using the TRUNCATE TABLE command. However, this removes all rows in the table unconditionally.

    I can offer the following suggestions:

    1. Switch to a non-transactional database or possibly flat files. It doesn't sound like you need the atomicity of a transactional database.

    2. Attempt the following: after every x deletes (depending on size), issue this statement:

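    BACKUP LOG <database_name> WITH TRUNCATE_ONLY;

    This simply truncates the transaction log; the space remains allocated for the log to refill. However, I'm not sure how much time this will add to the operation. Note that the TRUNCATE_ONLY option was removed in SQL Server 2008; on later versions the usual substitute is the SIMPLE recovery model.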

  • 2021-02-04 13:31

    What do you do with the star data? If you only look at data for one category at any given time, this might work, but it is hard to maintain. Every time you have a new category, you will have to build a new table. If you want to query across categories, it becomes more complex and possibly more expensive in terms of time. If you do this and do want to query across categories, a view is probably best (but do not pile views on top of views).

    If you are looking for data on a particular star, would you know which table to query? If not, how are you going to determine which table, or are you going to query them all? When entering data, how will the application decide which table to put the data into? How many categories will there be? And incidentally, regarding each table having a separate id: use bigint identities and combine the identity with the category type for your unique identifier.

    Do you truly need to delete the whole category, or only the stars whose data changed? And do you need to delete at all? Maybe you only need to update the information.

    Have you tried deleting in batches (1000 records or so at a time, in a loop)? This is often much faster than deleting a million records in one delete statement. It often keeps the table from getting locked during the delete as well.
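
    A minimal sketch of such a batch loop, under assumptions: it uses Python with an in-memory SQLite database as a stand-in for SQL Server, and the Star table layout (StarId, StarName, CategoryId) is hypothetical. In T-SQL the same loop is usually written with DELETE TOP (n) repeated until no rows are affected.

```python
import sqlite3

def delete_category_in_batches(conn, category_id, batch_size=1000):
    """Delete all rows of one category in small, separately committed batches."""
    total = 0
    while True:
        cur = conn.execute(
            "DELETE FROM Star WHERE StarId IN ("
            "  SELECT StarId FROM Star WHERE CategoryId = ? LIMIT ?)",
            (category_id, batch_size),
        )
        conn.commit()  # each batch is its own short transaction
        if cur.rowcount == 0:
            break  # nothing left to delete for this category
        total += cur.rowcount
    return total

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Star (StarId INTEGER PRIMARY KEY, StarName TEXT, CategoryId INTEGER)"
)
# Sample data: 9000 rows spread evenly over categories 0, 1 and 2.
conn.executemany(
    "INSERT INTO Star (StarName, CategoryId) VALUES (?, ?)",
    [(f"star-{i}", i % 3) for i in range(9000)],
)
conn.commit()

deleted = delete_category_in_batches(conn, category_id=1, batch_size=1000)
remaining = conn.execute(
    "SELECT COUNT(*) FROM Star WHERE CategoryId = 1"
).fetchone()[0]
total_left = conn.execute("SELECT COUNT(*) FROM Star").fetchone()[0]
```

    Committing after each batch keeps every transaction small, which is what limits log growth and lock duration; the batch size is something to tune by measurement.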

    Another technique is to mark the records for deletion. Then you can run a batch process when usage is low to delete those records, and your queries can run against a view that excludes the records marked for deletion.
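
    The mark-then-purge idea might look like the sketch below. It is an illustration only: Python with an in-memory SQLite database stands in for SQL Server, and the IsDeleted column, the ActiveStar view, and the Star table layout are assumed names, not taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Star (
        StarId     INTEGER PRIMARY KEY,
        StarName   TEXT NOT NULL,
        CategoryId INTEGER NOT NULL,
        IsDeleted  INTEGER NOT NULL DEFAULT 0
    );
    -- Readers query the view, so marked rows disappear for them immediately.
    CREATE VIEW ActiveStar AS
        SELECT StarId, StarName, CategoryId FROM Star WHERE IsDeleted = 0;
""")
conn.executemany(
    "INSERT INTO Star (StarName, CategoryId) VALUES (?, ?)",
    [("Vega", 1), ("Altair", 1), ("Deneb", 2)],
)
conn.commit()

def mark_category_deleted(conn, category_id):
    """Flag the rows instead of removing them; no rows are physically deleted."""
    conn.execute("UPDATE Star SET IsDeleted = 1 WHERE CategoryId = ?", (category_id,))
    conn.commit()

def purge_marked_rows(conn):
    """Physical delete, meant to run later during a quiet period."""
    cur = conn.execute("DELETE FROM Star WHERE IsDeleted = 1")
    conn.commit()
    return cur.rowcount

mark_category_deleted(conn, 1)
visible = conn.execute("SELECT StarName FROM ActiveStar ORDER BY StarId").fetchall()
still_stored = conn.execute("SELECT COUNT(*) FROM Star").fetchone()[0]
purged = purge_marked_rows(conn)
```

    The purge step can itself be run in batches; the view only decouples when users stop seeing the rows from when the rows are actually removed.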

    Given your answers, I think your proposal may be reasonable.

  • 2021-02-04 13:32

    Must you delete them? Often it is better to just set an IsDeleted bit column to 1, and then do the actual deletion asynchronously during off hours.

    Edit:

    This is a shot in the dark, but adding a clustered index on CategoryId may speed up deletes. It may also impact other queries adversely. Is this something you can test?
