If I stop a long running query, does it rollback?

忘了有多久 2020-12-06 04:23

A query that loops through 17 million records to remove duplicates has now been running for about 16 hours, and I wanted to know: if the query is stopped, will the deletes it has already performed be rolled back?

12 Answers
  • 2020-12-06 05:05

    DELETES that have been performed up to this point will not be rolled back.
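
    If you want the option to undo the whole cleanup, a minimal sketch (not part of the original answer) is to wrap the work in an explicit transaction; until COMMIT is issued, stopping the session rolls everything back:

    BEGIN TRANSACTION;

    -- ... run the duplicate-removal statements here ...

    -- Stopping the session before this point undoes all of the deletes.
    COMMIT TRANSACTION;

    Keep in mind that holding one transaction open across 17 million rows grows the transaction log and holds locks, so committing in batches is usually preferred on a table this size.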


    As the original author of the code in question, and having issued the caveat that performance will be dependent on indexes, I would propose the following to speed this up.

    RecordId had better be the PRIMARY KEY. I don't mean IDENTITY, I mean PRIMARY KEY. Confirm this using sp_help:
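
    For example (assuming the table is named MyTable, as in the query below):

    EXEC sp_help 'MyTable';  -- the index/constraint section should show a PRIMARY KEY on RecordId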

    Some index should be used in evaluating this query. Figure out which of these four columns has the fewest repeats and index it; a sketch for checking this follows below.

    SELECT *
    FROM MyTable
    WHERE @long = longitude
      AND @lat = latitude
      AND @businessname = BusinessName
      AND @phoneNumber = Phone
    

    Before and after adding this index, check the query plan to confirm the index is actually being used (you want an index seek rather than a table scan).
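
    As a rough sketch (the index name and the winning column are illustrative, not from the original answer), you can compare the selectivity of the four columns and then index the most selective one:

    -- The column with the most distinct values (fewest repeats) is the
    -- best single-column index candidate.
    SELECT COUNT(DISTINCT longitude)    AS longitude_values,
           COUNT(DISTINCT latitude)     AS latitude_values,
           COUNT(DISTINCT BusinessName) AS businessname_values,
           COUNT(DISTINCT Phone)        AS phone_values
    FROM MyTable;

    -- For example, if Phone wins:
    CREATE NONCLUSTERED INDEX IX_MyTable_Phone ON MyTable (Phone);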

  • 2020-12-06 05:06

    Implicit transactions

    If 'implicit transactions' has not been set, then each iteration of your loop committed its changes as it ran.

    Any SQL Server connection can be set to use 'implicit transactions'; the setting is OFF by default. You can also enable implicit transactions in the properties of a particular query inside Management Studio (right-click in the query pane > Query Options), through the client's default connection settings, or with a SET statement:

    SET IMPLICIT_TRANSACTIONS ON;
    

    Either way, if that were the case, you would still need to issue an explicit COMMIT or ROLLBACK, regardless of whether the query execution was interrupted.
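
    A minimal illustration of that behavior (the table name and key value are hypothetical):

    SET IMPLICIT_TRANSACTIONS ON;

    DELETE FROM MyTable WHERE RecordId = 42; -- implicitly opens a transaction
    SELECT @@TRANCOUNT;                      -- returns 1: the transaction is still open

    ROLLBACK;                                -- undoes the DELETE; COMMIT would make it permanent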


    Implicit transactions reference:

    http://msdn.microsoft.com/en-us/library/ms188317.aspx

    http://msdn.microsoft.com/en-us/library/ms190230.aspx

  • 2020-12-06 05:06

    Also try thinking about another method to remove duplicate rows:

    delete t1 from table1 as t1 where exists (
        select * from table1 as t2 where
            t1.column1=t2.column1 and
            t1.column2=t2.column2 and
            t1.column3=t2.column3 and
            -- add other columns if any
            t1.id>t2.id
    )
    

    I suppose that you have an integer id column in your table.
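
    For comparison (not from the original answer), a common alternative on SQL Server 2005 and later deletes through a ROW_NUMBER() CTE, keeping the lowest id in each group of duplicates:

    with numbered as (
        select row_number() over (
                   partition by column1, column2, column3
                   order by id) as rn
        from table1
    )
    delete from numbered where rn > 1;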

  • 2020-12-06 05:10

    I'm pretty sure the answer is no. Otherwise, what would be the point of transactions?

  • 2020-12-06 05:18

    I think you need to seriously reconsider your methodology. You need to start thinking in sets (although for performance you may need batch processing, not row-by-row against a 17 million record table).

    First, do all of your records have duplicates? I suspect not, so the first thing you want to do is limit your processing to only those records which have duplicates. Since this is a large table, and you may need to do the deletes in batches over time depending on what other processing is going on, first pull the records you want to deal with into a table of their own, which you then index. You can use a temp table if you are able to do this all at once without stopping; otherwise, create a table in your database and drop it at the end.

    Something like this (note I didn't write the create index statements in the original; a sketch of one is included in the listing below):

    SELECT min(m.RecordID) as RecordID,
           m.longitude, m.latitude, m.businessname, m.phone
    into #RecordsToKeep
    FROM MyTable m
    join
    (select longitude, latitude, businessname, phone
     from MyTable
     group by longitude, latitude, businessname, phone
     having count(*) > 1) a
    on a.longitude = m.longitude and a.latitude = m.latitude and
       a.businessname = m.businessname and a.phone = m.phone  -- was b.businessname/b.phone: no alias b exists
    group by m.longitude, m.latitude, m.businessname, m.phone
    -- The original ORDER BY (preferring rows with webAddress/caption1/caption2
    -- populated) is not valid alongside this GROUP BY and has no effect on
    -- SELECT ... INTO anyway; to keep the most complete row per group rather
    -- than the lowest RecordID, use ROW_NUMBER() with those CASE expressions.
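
    -- (Sketch, not in the original answer.) Create the index the answer
    -- mentions on the working table before batching:
    CREATE CLUSTERED INDEX IX_RecordsToKeep
        ON #RecordsToKeep (longitude, latitude, businessname, phone);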
    
    
    
    while (select count(*) from #RecordsToKeep) > 0
    begin
        select top 1000 *
        into #Batch
        from #RecordsToKeep

        -- Remove every duplicate except the row we are keeping
        Delete m
        from mytable m
        join #Batch b
            on b.longitude = m.longitude and b.latitude = m.latitude and
               b.businessname = m.businessname and b.phone = m.phone  -- was b = b, which is always true
        where m.recordid <> b.recordID                                -- was r.recordid: r is not in scope here

        -- Drop the processed batch from the worklist
        Delete r
        from #RecordsToKeep r
        join #Batch b on r.recordid = b.recordid

        drop table #Batch  -- required: select ... into fails if #Batch already exists
    end
    
    -- By this point the loop above has emptied #RecordsToKeep, so this final
    -- pass matches nothing; it serves as a safety net if the loop is cut short.
    Delete m
    from mytable m
    join #RecordsToKeep r
        on r.longitude = m.longitude and r.latitude = m.latitude and
           r.businessname = m.businessname and r.phone = m.phone  -- was b.businessname/b.phone
    where r.recordid <> m.recordID
    
  • 2020-12-06 05:19

    If your machine doesn't have very advanced hardware, it may take SQL Server a very long time to complete that command. I don't know for sure how this operation is performed under the hood, but based on my experience it could be done more efficiently by bringing the records out of the database and into memory for a program that uses a tree structure with a remove-duplicate rule on insertion. Try reading the entire table in chunks (say 10,000 rows at a time) into a C++ program using ODBC. Once in the C++ program, use a std::map where the key is the unique key and the value is a struct holding the rest of the data. Loop over all the records and insert them into the map; the map's insert function will reject duplicates. Since lookup in a map takes O(log n) time, finding duplicates takes far less time than your while loop. You can then delete the entire table and add the tuples back into the database from the map, either by forming INSERT queries and executing them via ODBC or by building a text-file script and running it in Management Studio.
