I have a situation where I need to do an update on a very large set of rows that I can only identify by their ID (since the target records are selected by the user and have noth
How do you generate the IN clause?
If there is another SELECT statement that generates those values, you could simply plug that into the UPDATE like so:
UPDATE TARGET_TABLE T
SET
SOME_VALUE = 'Whatever'
WHERE T.ID_NUMBER IN(
SELECT ID_NUMBER --this SELECT generates your ID #s.
FROM SOURCE_TABLE
WHERE SOME_CONDITIONS
)
In some RDBMSes, you'll get better performance by using the EXISTS syntax, which would look like this:
UPDATE TARGET_TABLE T
SET
SOME_VALUE = 'Whatever'
WHERE EXISTS (
SELECT ID_NUMBER --this SELECT generates your ID #s.
FROM SOURCE_TABLE S
WHERE SOME_CONDITIONS
AND S.ID_NUMBER = T.ID_NUMBER
)
In general there are several things to consider.
We recently changed our system to limit the size of the in-clauses and always use bound variables because this reduced the number of different SQL statements and thus improved performance. Basically we generate our SQL statements and execute multiple statements if the in-clause exceeds a certain size. We don't do this for updates so we haven't had to worry about the locking. You will.
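A minimal sketch of that kind of batching, not our actual code: it uses Python's standard-library sqlite3 module as a stand-in for whatever driver you have, and reuses the TARGET_TABLE / ID_NUMBER / SOME_VALUE names from the examples above. The helper name update_in_chunks and the chunk size of 500 are assumptions made up for illustration.

```python
import sqlite3


def update_in_chunks(conn, ids, new_value, chunk_size=500):
    """Run one UPDATE per chunk of IDs, using bound variables throughout."""
    ids = list(ids)
    updated = 0
    for start in range(0, len(ids), chunk_size):
        chunk = ids[start:start + chunk_size]
        # Only the number of "?" placeholders varies, so every full-size
        # chunk produces exactly the same statement text.
        placeholders = ", ".join("?" * len(chunk))
        sql = (
            "UPDATE TARGET_TABLE SET SOME_VALUE = ? "
            f"WHERE ID_NUMBER IN ({placeholders})"
        )
        cur = conn.execute(sql, [new_value, *chunk])
        updated += cur.rowcount
    # Commit once at the end so all chunks belong to a single transaction.
    conn.commit()
    return updated


# Demo against an in-memory database (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TARGET_TABLE (ID_NUMBER INTEGER PRIMARY KEY, SOME_VALUE TEXT)"
)
conn.executemany(
    "INSERT INTO TARGET_TABLE VALUES (?, 'old')",
    [(i,) for i in range(1, 2001)],
)
selected_ids = list(range(1, 2001, 2))  # the 1000 odd IDs
rows_updated = update_in_chunks(conn, selected_ids, "Whatever")
```

With a fixed chunk size, at most two distinct statement texts are generated (the full chunk and the final partial one). Committing once at the end keeps the update atomic, but it also holds locks for the whole run; committing per chunk is the alternative if that matters more to you.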
Using a temp table may not improve performance because you have to populate the temp table with the IDs. Experimentation and performance tests can tell you the answer here.
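For comparison, here is a rough sketch of the temp-table variant, again with Python's sqlite3 module standing in for the real database. The temp table name SELECTED_IDS and the sample IDs are made up for illustration; the other names come from the examples above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TARGET_TABLE (ID_NUMBER INTEGER PRIMARY KEY, SOME_VALUE TEXT)"
)
conn.executemany(
    "INSERT INTO TARGET_TABLE VALUES (?, 'old')",
    [(i,) for i in range(1, 1001)],
)

selected_ids = [3, 7, 465, 987]  # hypothetical user selection

# Populating the temp table is the extra cost to weigh against a long IN list.
conn.execute("CREATE TEMP TABLE SELECTED_IDS (ID_NUMBER INTEGER PRIMARY KEY)")
conn.executemany(
    "INSERT INTO SELECTED_IDS VALUES (?)",
    [(i,) for i in selected_ids],
)

# The UPDATE itself no longer depends on how many IDs were selected.
cur = conn.execute(
    "UPDATE TARGET_TABLE SET SOME_VALUE = 'Whatever' "
    "WHERE ID_NUMBER IN (SELECT ID_NUMBER FROM SELECTED_IDS)"
)
temp_table_rows_updated = cur.rowcount
conn.commit()
```

Timing both this and the plain IN list against realistic data is the only way to know which one wins on your system.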
A single IN clause is very easy to understand and maintain. This is probably what you should worry about first. If you find that the performance of the queries is poor you might want to try a different strategy and see if it helps, but don't optimize prematurely. The IN-clause is semantically correct so leave it alone if it isn't broken.
I don't know the type of values in your IN list. If they are most of the values from 1 to 10,000, you might be able to process them to get something like:
WHERE MyID BETWEEN 1 AND 10000 AND MyID NOT IN (3,7,4656,987)
Or, if the NOT IN list would still be long, you could process the list and generate a series of BETWEEN conditions joined with OR:
WHERE MyID BETWEEN 1 AND 343 OR MyID BETWEEN 344 AND 400 ...
And so forth.
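The list processing could be sketched like this in Python, assuming the IDs are integers. The function names collapse_to_ranges and ranges_to_where are invented for illustration, and the column name is interpolated into the SQL text, so it must come from your own code and never from user input.

```python
def collapse_to_ranges(ids):
    """Collapse integer IDs into sorted, inclusive (low, high) runs."""
    ranges = []
    for value in sorted(set(ids)):
        if ranges and value == ranges[-1][1] + 1:
            # Extends the current run by one.
            ranges[-1] = (ranges[-1][0], value)
        else:
            # Gap found (or first value): start a new run.
            ranges.append((value, value))
    return ranges


def ranges_to_where(column, ranges):
    """Build a WHERE fragment with "?" placeholders plus its parameter list."""
    parts, params = [], []
    for low, high in ranges:
        if low == high:
            parts.append(f"{column} = ?")
            params.append(low)
        else:
            parts.append(f"{column} BETWEEN ? AND ?")
            params.extend([low, high])
    if not parts:
        # No IDs at all: return a condition that matches nothing.
        return "1 = 0", []
    return "(" + " OR ".join(parts) + ")", params


ranges = collapse_to_ranges([1, 2, 3, 5, 7, 8, 9])
where_sql, where_params = ranges_to_where("MyID", ranges)
```

Whether this beats a plain IN list depends entirely on how clustered the IDs are; for scattered IDs it degenerates into one condition per value.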
Last of all, you don't have to worry about how Jet will process an IN clause if you use a passthrough query. You can't do that in code, but you could have a saved QueryDef that is defined as a passthrough and alter the WHERE clause in code at runtime to use your IN list. Then it's all passed off to SQL Server, and SQL Server will decide best how to process it.
I would use a table-variable / temp-table; insert the values into this, and join to it. Then you can use the same set multiple times. This works especially well if you are (for example) passing down a CSV of IDs as varchar. As a SQL Server example:
DECLARE @ids TABLE (id int NOT NULL)
INSERT @ids
SELECT value
FROM dbo.SplitCsv(@arg) -- need to define separately
UPDATE t
SET t. -- etc
FROM [TABLE] t
INNER JOIN @ids #i ON #i.id = t.id
In Oracle there is a limit on the number of values you can put into a single IN clause (1,000 expressions per list). So you are better off using OR, as in x = 1 OR x = 2 ...; those are not limited, as far as I know.
If you were on Oracle, I'd recommend using table functions, similar to Marc Gravell's post.
-- first create a user-defined collection type, a table of numbers
create or replace type tbl_foo as table of number;
declare
temp_foo tbl_foo;
begin
-- this could be passed in as a parameter, for simplicity I am hardcoding it
temp_foo := tbl_foo(7369, 7788);
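-- here I use a table function to treat my temp_foo variable as a table,
-- and I join it to the emp table as an alternative to a massive "IN" clause;
-- a SELECT inside PL/SQL needs a target, so a cursor FOR loop reads the rows
for r in (
select e.*
from emp e,
table(temp_foo) foo
where e.empno = foo.column_value
) loop
dbms_output.put_line(r.empno);
end loop;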
end;