If I stop a long running query, does it rollback?

忘了有多久 2020-12-06 04:23

A query that loops through 17 million records to remove duplicates has now been running for about 16 hours, and I wanted to know: if the query is stopped, will the deletes it has already performed be rolled back?

12 Answers
  • 2020-12-06 05:05

    DELETES that have been performed up to this point will not be rolled back.
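
    If you want the option to undo the whole cleanup, a minimal sketch (not part of the original answer) is to wrap the work in an explicit transaction; until COMMIT is issued, stopping the session rolls everything back:

    BEGIN TRANSACTION;

    -- ... run the duplicate-removal statements here ...

    -- Stopping the session before this point undoes all of the deletes.
    COMMIT TRANSACTION;

    Keep in mind that holding one transaction open across 17 million rows grows the transaction log and holds locks, so committing in batches is usually preferred on a table this size.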


    As the original author of the code in question, and having issued the caveat that performance will be dependent on indexes, I would propose the following to speed this up.

    RecordId had better be the PRIMARY KEY. I don't mean IDENTITY, I mean PRIMARY KEY. Confirm this using sp_help:
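
    For example (assuming the table is named MyTable, as in the query below):

    EXEC sp_help 'MyTable';  -- the index/constraint section should show a PRIMARY KEY on RecordId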

    Some index should be used in evaluating this query. Figure out which of these four columns has the fewest repeats and index it; a sketch for checking this follows below.

    SELECT *
    FROM MyTable
    WHERE @long = longitude
      AND @lat = latitude
      AND @businessname = BusinessName
      AND @phoneNumber = Phone
    

    Before and after adding this index, check the query plan to confirm the index is actually being used (you want an index seek rather than a table scan).
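
    As a rough sketch (the index name and the winning column are illustrative, not from the original answer), you can compare the selectivity of the four columns and then index the most selective one:

    -- The column with the most distinct values (fewest repeats) is the
    -- best single-column index candidate.
    SELECT COUNT(DISTINCT longitude)    AS longitude_values,
           COUNT(DISTINCT latitude)     AS latitude_values,
           COUNT(DISTINCT BusinessName) AS businessname_values,
           COUNT(DISTINCT Phone)        AS phone_values
    FROM MyTable;

    -- For example, if Phone wins:
    CREATE NONCLUSTERED INDEX IX_MyTable_Phone ON MyTable (Phone);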

  • 2020-12-06 05:06

    Implicit transactions

    If 'implicit transactions' has not been set, then each iteration of your loop committed its changes as it ran.

    Any SQL Server connection can be set to use 'implicit transactions'; the setting is OFF by default. You can also enable implicit transactions in the properties of a particular query inside Management Studio (right-click in the query pane > Query Options), through the client's default connection settings, or with a SET statement:

    SET IMPLICIT_TRANSACTIONS ON;
    

    Either way, if that were the case, you would still need to issue an explicit COMMIT or ROLLBACK, regardless of whether the query execution was interrupted.
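
    A minimal illustration of that behavior (the table name and key value are hypothetical):

    SET IMPLICIT_TRANSACTIONS ON;

    DELETE FROM MyTable WHERE RecordId = 42; -- implicitly opens a transaction
    SELECT @@TRANCOUNT;                      -- returns 1: the transaction is still open

    ROLLBACK;                                -- undoes the DELETE; COMMIT would make it permanent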


    Implicit transactions reference:

    http://msdn.microsoft.com/en-us/library/ms188317.aspx

    http://msdn.microsoft.com/en-us/library/ms190230.aspx

  • 2020-12-06 05:06

    Also try thinking about another method to remove duplicate rows:

    delete t1 from table1 as t1 where exists (
        select * from table1 as t2 where
            t1.column1=t2.column1 and
            t1.column2=t2.column2 and
            t1.column3=t2.column3 and
            -- add other columns if any
            t1.id>t2.id
    )
    

    I suppose that you have an integer id column in your table.
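
    For comparison (not from the original answer), a common alternative on SQL Server 2005 and later deletes through a ROW_NUMBER() CTE, keeping the lowest id in each group of duplicates:

    with numbered as (
        select row_number() over (
                   partition by column1, column2, column3
                   order by id) as rn
        from table1
    )
    delete from numbered where rn > 1;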

  • 2020-12-06 05:10

    I'm pretty sure the answer is no. Otherwise, what would be the point of transactions?

  • 2020-12-06 05:18

    I think you need to seriously reconsider your methodology. You need to start thinking in sets (although for performance you may need batch processing, not row-by-row against a 17 million record table).

    First, do all of your records have duplicates? I suspect not, so the first thing you want to do is limit your processing to only those records which have duplicates. Since this is a large table, and you may need to do the deletes in batches over time depending on what other processing is going on, first pull the records you want to deal with into a table of their own, which you then index. You can use a temp table if you are able to do this all at once without stopping; otherwise, create a table in your database and drop it at the end.

    Something like this (note I didn't write the create index statements in the original; a sketch of one is included in the listing below):

    SELECT min(m.RecordID) as RecordID,
           m.longitude, m.latitude, m.businessname, m.phone
    into #RecordsToKeep
    FROM MyTable m
    join
    (select longitude, latitude, businessname, phone
     from MyTable
     group by longitude, latitude, businessname, phone
     having count(*) > 1) a
    on a.longitude = m.longitude and a.latitude = m.latitude and
       a.businessname = m.businessname and a.phone = m.phone  -- was b.businessname/b.phone: no alias b exists
    group by m.longitude, m.latitude, m.businessname, m.phone
    -- The original ORDER BY (preferring rows with webAddress/caption1/caption2
    -- populated) is not valid alongside this GROUP BY and has no effect on
    -- SELECT ... INTO anyway; to keep the most complete row per group rather
    -- than the lowest RecordID, use ROW_NUMBER() with those CASE expressions.
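
    -- (Sketch, not in the original answer.) Create the index the answer
    -- mentions on the working table before batching:
    CREATE CLUSTERED INDEX IX_RecordsToKeep
        ON #RecordsToKeep (longitude, latitude, businessname, phone);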
    
    
    
    while (select count(*) from #RecordsToKeep) > 0
    begin
        select top 1000 *
        into #Batch
        from #RecordsToKeep

        -- Remove every duplicate except the row we are keeping
        Delete m
        from mytable m
        join #Batch b
            on b.longitude = m.longitude and b.latitude = m.latitude and
               b.businessname = m.businessname and b.phone = m.phone  -- was b = b, which is always true
        where m.recordid <> b.recordID                                -- was r.recordid: r is not in scope here

        -- Drop the processed batch from the worklist
        Delete r
        from #RecordsToKeep r
        join #Batch b on r.recordid = b.recordid

        drop table #Batch  -- required: select ... into fails if #Batch already exists
    end
    
    -- By this point the loop above has emptied #RecordsToKeep, so this final
    -- pass matches nothing; it serves as a safety net if the loop is cut short.
    Delete m
    from mytable m
    join #RecordsToKeep r
        on r.longitude = m.longitude and r.latitude = m.latitude and
           r.businessname = m.businessname and r.phone = m.phone  -- was b.businessname/b.phone
    where r.recordid <> m.recordID
    
  • 2020-12-06 05:19

    If your machine doesn't have very advanced hardware, it may take SQL Server a very long time to complete that command. I don't know for sure how this operation is performed under the hood, but based on my experience it could be done more efficiently by bringing the records out of the database and into memory for a program that uses a tree structure with a remove-duplicate rule on insertion. Try reading the entire table in chunks (say 10,000 rows at a time) into a C++ program using ODBC. Once in the C++ program, use a std::map where the key is the unique key and the value is a struct holding the rest of the data. Loop over all the records and insert them into the map; the map's insert function will reject duplicates. Since lookup in a map takes O(log n) time, finding duplicates takes far less time than your while loop. You can then delete the entire table and add the tuples back into the database from the map, either by forming INSERT queries and executing them via ODBC or by building a text-file script and running it in Management Studio.
