If I stop a long running query, does it rollback?

忘了有多久 2020-12-06 04:23

A query that loops through 17 million records to remove duplicates has been running for about 16 hours, and I wanted to know: if the query is stopped, will the deletes it has already performed be rolled back?

12 Answers
  • 2020-12-06 04:57

    No, SQL Server will not roll back the deletes it has already performed if you stop query execution. Oracle requires an explicit COMMIT for DML statements or the changes get rolled back, but SQL Server's default autocommit mode does not.

    With SQL Server it will not roll back unless you are specifically running in the context of a transaction and you roll back that transaction, or the connection closes without the transaction having been committed. But I don't see a transaction context in your query.

    You could also try restructuring your query to make the deletes a little more efficient, but essentially, if the specs of your box are not up to snuff, you might be stuck waiting it out.

    Going forward, you should create a unique index on the table to keep yourself from having to go through this again.
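    That guard might look as follows (the table and column names here are assumptions based on the question's description, not the actual schema):

    ```sql
    -- Hypothetical table/columns: dbo.Business (Longitude, Latitude, BusinessName, PhoneNumber).
    -- Once the existing duplicates are removed, a unique index rejects any
    -- future INSERT that would create another duplicate of this key.
    CREATE UNIQUE INDEX IX_Business_NoDupes
        ON dbo.Business (Longitude, Latitude, BusinessName, PhoneNumber);
    ```

    Note that the index can only be built after the existing duplicates have been deleted; SQL Server will refuse to create it otherwise.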

  • 2020-12-06 04:57

    I think this query would be much more efficient if it were re-written as a single-pass algorithm using a cursor. You would order your cursor table by longitude, latitude, BusinessName, and @phoneNumber, then step through the rows one at a time. If a row has the same longitude, latitude, business name, and phone number as the previous row, delete it.
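    That single-pass idea might be sketched like this (dbo.Business, its columns, and the Id key are hypothetical names standing in for the question's table):

    ```sql
    -- Walk the table once in duplicate-key order; delete any row whose key
    -- matches the previous row's key. On the first row the @prev variables
    -- are NULL, so the comparison is false and the row is kept.
    DECLARE @id int, @lon float, @lat float, @name varchar(200), @phone varchar(50);
    DECLARE @prevLon float, @prevLat float, @prevName varchar(200), @prevPhone varchar(50);

    DECLARE dup_cursor CURSOR FAST_FORWARD FOR
        SELECT Id, Longitude, Latitude, BusinessName, PhoneNumber
        FROM dbo.Business
        ORDER BY Longitude, Latitude, BusinessName, PhoneNumber;

    OPEN dup_cursor;
    FETCH NEXT FROM dup_cursor INTO @id, @lon, @lat, @name, @phone;
    WHILE @@FETCH_STATUS = 0
    BEGIN
        IF @lon = @prevLon AND @lat = @prevLat
           AND @name = @prevName AND @phone = @prevPhone
            DELETE FROM dbo.Business WHERE Id = @id;  -- duplicate of previous row
        SELECT @prevLon = @lon, @prevLat = @lat, @prevName = @name, @prevPhone = @phone;
        FETCH NEXT FROM dup_cursor INTO @id, @lon, @lat, @name, @phone;
    END
    CLOSE dup_cursor;
    DEALLOCATE dup_cursor;
    ```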

  • 2020-12-06 05:00

    If you don't do anything explicit about transactions, then the connection will be in autocommit mode. In this mode, every individual SQL statement is treated as its own transaction.

    The question is whether this means the individual SQL statements are transactions, and are therefore being committed as you go, or whether the outer WHILE loop counts as a transaction.

    There doesn't seem to be any discussion of this in the description of the WHILE construct on MSDN. However, since a WHILE statement can't directly modify the database, it seems logical that it doesn't start an autocommit transaction.
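    The difference between per-statement autocommit and an explicit transaction can be made concrete in a small sketch (the table name is hypothetical):

    ```sql
    -- In autocommit mode, each DELETE commits individually the moment it
    -- completes; cancelling only rolls back the statement in flight.
    -- Wrapped in an explicit transaction, nothing is permanent until COMMIT:
    BEGIN TRANSACTION;
        DELETE FROM dbo.Business WHERE Id = 1;  -- not yet permanent
        DELETE FROM dbo.Business WHERE Id = 2;  -- not yet permanent
    ROLLBACK TRANSACTION;  -- undoes both deletes
    ```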

  • 2020-12-06 05:01

    I inherited a system which had logic something like yours implemented in SQL. In our case, we were trying to link together rows using fuzzy matching that had similar names/addresses, etc, and that logic was done purely in SQL. At the time I inherited it we had about 300,000 rows in the table and according to the timings, we calculated it would take A YEAR to match them all.

    As an experiment to see how much faster I could do it outside of SQL, I wrote a program to dump the db table into flat files, read the flat files into a C++ program, build my own indexes, and do the fuzzy logic there, then reimport the flat files into the database. What took A YEAR in SQL took about 30 seconds in the C++ app.

    So, my advice is, don't even try what you are doing in SQL. Export, process, re-import.
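    For SQL Server, the export and re-import legs of that approach can be done with the bcp utility (server, database, table, and file names below are placeholders):

    ```shell
    # Export the table to a character-format flat file over a trusted connection.
    bcp MyDb.dbo.Business out business.txt -c -T -S myserver

    # ...dedupe business.txt with your external program...

    # Load the cleaned file back into a staging table.
    bcp MyDb.dbo.Business_Clean in business_clean.txt -c -T -S myserver
    ```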

  • 2020-12-06 05:04

    As a loop your query will struggle to scale well, even with appropriate indexes. The query should be rewritten to a single statement, as per the suggestions in your previous question on this.
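    For reference, on SQL Server 2005 and later that single statement is commonly written with ROW_NUMBER over the duplicate key (the table, column, and Id names here are assumptions from the question):

    ```sql
    -- Keep one row per (Longitude, Latitude, BusinessName, PhoneNumber) group
    -- and delete the rest in a single set-based statement.
    WITH Ranked AS (
        SELECT ROW_NUMBER() OVER (
                   PARTITION BY Longitude, Latitude, BusinessName, PhoneNumber
                   ORDER BY Id) AS rn
        FROM dbo.Business
    )
    DELETE FROM Ranked WHERE rn > 1;
    ```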

    If you're not running it explicitly within a transaction it will only roll back the executing statement.

  • 2020-12-06 05:05

    Your query is not wrapped in a transaction, so it won't rollback the changes already made by the individual delete statements.

    I specifically tested this myself on my own SQL Server using the following query, and the ApplicationLog table was empty even though I cancelled the query:

    DECLARE @count int;
    SELECT @count = 5;
    WHILE @count > 0
    BEGIN
        PRINT @count;
        DELETE FROM ApplicationLog;
        WAITFOR TIME '20:00';
        SELECT @count = @count - 1;
    END
    

    However, your query is likely to take many days or weeks, much longer than 15 hours. Your estimate that you can process 2000 records every 6 seconds is wrong, because each iteration of your WHILE loop will take significantly longer with 17 million rows than it does with 2000 rows. So unless your query takes significantly less than a second for 2000 rows, it will take days for all 17 million.

    You should ask a new question on how you can delete duplicate rows efficiently.
