I recently asked this question: MS SQL share identity seed amongst tables (Many people wondered why)
I have the following layout of a table:
Table: Star
Just taking your idea of many tables - how can you realise that...
What about using dynamic queries.
Another direction in which I would make some researches is using xml typed column for storing stars data. The main idea here is if you need to operate stars only by categories than why not to store all stars of concrete category in one cell of the table in xml format. Unfortunately I absolutely cannot imaging what will be the performance of such decision.
Both this variants are just like ideas in brainstorm.
I know this is a bit of a tangent, but is SQL Server (or any relational database) really a good tool for this job? What relation database features are you actually using?
If you are dropping whole categories at a time, you can't have much referential integrity depending on it. The data is read only, so you don't need ACID for data updates.
Sounds to me like you are using basic SELECT query features?
I didn't get an answer to my comment on the original post, so I am going under some assumptions...
Here's my idea: use multiple databases, one for each category.
You can use the managed ESE database that ships with every version of Windows, for free.
Use the PersistentDictionary object, and keep track of the starid, starname pairs that way. If you need to delete a category, just delete the PersistentDictionary object for that category.
PersistentDictionary<int, string> starsForCategory = new PersistentDictionary<int, string>("Category1");
This will create a database called "Category1", on which you can use standard .NET dictionary methods (add, exists, foreach, etc).
It sounds like the transaction log is struggling with the size of the delete. The transaction log grows in units, and this takes time whilst it allocates more disk space.
It is not possible to delete rows from a table without enlisting a transaction, although it is possible to truncate a table using the TRUNCATE command. However this will remove all rows in the table without condition.
I can offer the following suggestions:
Switch to a non-transactional database or possibly flat files. It doesn't sound like you need atomicity of a transactional database.
Attempt the following. After every x deletes (depending on size) issue the following statement
BACKUP LOG WITH TRUNCATE_ONLY;
This simply truncates the transaction log, the space remains for the log to refill. However Im not sure howmuch time this will add to the operation.
What do you do with the star data? If you only look at data for one category at any given time this might work, but it is hard to maintain. Every time you have a new category, you will have to build a new table. If you want to query across categories, it becomes more complex and possibly more expensive in terms of time. If you do this and do want to query across categories a view is probably best (but do not pile views on top of views). If you are looking for data on a particular star, would you know which table to query? If not then how are you going to determine which table or are you goign to query them all? When entering data, how will the application decide which table to put the data into? How many categories will there be? And incidentally relating to each having a separate id, use the bigint identities and combine the identity with the category type for your unique identifier.
Truly do you need to delete the whole category or only the star that the data changed for? And do you need to delete at all, maybe you only need to update information.
Have you tried deleting in batches (1000 records or so at a time in a loop). This is often much faster than deleting a million records in one delete statement. It often keeps the table from getting locked during the delete as well.
Another technique is mark the record for deletion. Then you can run a batch process when usage is low to delete those records and your queries can run on a view that excludes the records marked for deletion.
Given your answers, I think your proposal may be reasonable.
Must you delete them? Often it is better to just set an IsDeleted
bit column to 1, and then do the actual deletion asynchronously during off hours.
Edit:
This is a shot in the dark, but adding a clustered index on CategoryId
may speed up deletes. It may also impact other queries adversely. Is this something you can test?