Selecting multiple rows by ID: is there a faster way than WHERE IN?

Backend · Unresolved · 3 answers · 1660 views
太阳男子 2021-01-04 05:46

I have a SQL table and I would like to select multiple rows by ID. For example, I would like to get the rows with IDs 1, 5, and 9 from my table.

I have been doing this:
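Something like the following (the original snippet is missing here; this is a reconstruction from the title, with the [dbo].[Entities] table name borrowed from the answer below):

    SELECT *
    FROM [dbo].[Entities]
    WHERE id IN (1, 5, 9)    -- one literal per wanted ID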

3 Answers
  • 2021-01-04 05:55

    I guess if you joined your table with a table variable indexed by a primary key, such as:

    declare @tbl table (ids int primary key)
    

    you could fill this table with the IDs you need and perform an optimized inner join.

    The problem could be the time it would take to fill it. I guess you could either have a linked server for that, or maybe use the BCP utility to fill a temporary table and then delete it.
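    For example, a minimal sketch of the idea, with [dbo].[Entities] (the table name from the answer below) standing in for the real table:

    declare @tbl table (ids int primary key)

    -- fill the table variable with the IDs you need
    insert into @tbl (ids) values (1), (5), (9)

    -- both sides of the join are now indexed
    select e.*
    from [dbo].[Entities] e
    inner join @tbl t on t.ids = e.id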

  • 2021-01-04 06:10

    First, I think it is a stretch to claim that your data is suggestive of O(n log(n)). (It is great that you did the performance test, by the way.) Here is the time per value:

    n        time per value
    1000     0.046
    2000     0.047
    3000     0.083
    4000     0.079
    5000     0.078
    6000     0.078
    7000     0.079
    8000     0.081
    9000     0.083
    10000    0.085
    

    Although there is a slight increase as n goes up, the jump from 2000 to 3000 is much, much more prominent. If this is reproducible, the question to me is why there is such a discontinuity.

    To me, this is more suggestive of O(n) than of O(n log(n)). BUT, empirical measurements are a poor way to pin down theoretical complexity, so the exact limit is not so important.

    I would expect performance to be O(n) (where n is the number of values in the in-list, not their bit-length as in some theoretical analyses). My understanding is that IN behaves like a giant set of ORs. Most records fail the test, so they have to go through all the comparisons. Hence the O(n).
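    That is, a filter like the one below is evaluated as the equivalent chain of ORs, so a non-matching row is compared against up to all n values:

    -- these two predicates are logically equivalent:
    WHERE id IN (1, 5, 9)
    WHERE id = 1 OR id = 5 OR id = 9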

    The next question is whether you have an index on the id field. In that case, you can get the set of matching ids in O(n log(n)) time (log(n) for traversing the index, and n for doing it for each value). This seems worse, but we have left out the factor for the size of the original table. This should be a big win.

    As Andre suggests, you can load the IDs into a temporary table and do a join. I would leave out the index on it, because you are probably better off using the index on the larger table. This should get you O(n log(n)) -- with no (significant) dependency on the size of the original table. If neither table has a usable index, you instead have O(n * m), where m is the size of the original table. I think any index built on the temporary table gets you back to O(n log(n)) performance (assuming the data is not presorted).
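    As a minimal sketch of that variant, again assuming [dbo].[Entities] has an index on id:

    -- no index on the temp table; rely on the index on the big table
    create table #ids (id int)
    insert into #ids (id) values (1), (5), (9)

    select e.*
    from [dbo].[Entities] e
    inner join #ids i on i.id = e.id

    drop table #ids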

    Placing everything in a query has a similar, unstated problem -- parsing the query. This takes longer as the string gets longer.

    In short, I commend you for doing performance measurements, but not for coming to conclusions about algorithmic complexity. I don't think your data supports your conclusion. Also, the handling of queries is a bit more complicated than you suggest, and you have left out the size of the larger table -- which can have a dominant effect. And, I'm quite curious what is happening between 2000 and 3000 rows.

  • 2021-01-04 06:12

    OK, so I got it going really fast by defining a table type, passing that type directly into the query, and joining onto it.

    In SQL:

    CREATE TYPE [dbo].[IntTable] AS TABLE(
        [value] [int] NULL
    )
    

    In C#:

    // Assumes: con is an open SqlConnection and toSelect is the collection
    // of IDs to fetch.
    var results = new List<int>();

    // Build an in-memory DataTable whose single column matches dbo.IntTable
    DataTable dataTable = new DataTable("mythang");
    dataTable.Columns.Add("value", typeof(Int32));

    // Copy the wanted IDs into the DataTable, one row each
    toSelect.ToList().ForEach(selectItem => dataTable.Rows.Add(selectItem));

    using (SqlCommand command = new SqlCommand(
        @"SELECT *
        FROM [dbo].[Entities] e
        INNER JOIN @ids i ON e.id = i.value", con))
    {
        // Pass the DataTable as a table-valued parameter of type dbo.IntTable
        var parameter = command.Parameters.AddWithValue("@ids", dataTable);
        parameter.SqlDbType = System.Data.SqlDbType.Structured;
        parameter.TypeName = "dbo.IntTable";

        using (SqlDataReader reader = command.ExecuteReader())
        {
            while (reader.Read())
            {
                results.Add(reader.GetInt32(0));
            }
        }
    }
    

    This produces the following results:

    Querying for 1 random keys (passed in table value) took 2ms
    Querying for 1000 random keys (passed in table value) took 3ms
    Querying for 2000 random keys (passed in table value) took 4ms
    Querying for 3000 random keys (passed in table value) took 6ms
    Querying for 4000 random keys (passed in table value) took 8ms
    Querying for 5000 random keys (passed in table value) took 9ms
    Querying for 6000 random keys (passed in table value) took 11ms
    Querying for 7000 random keys (passed in table value) took 13ms
    Querying for 8000 random keys (passed in table value) took 17ms
    Querying for 9000 random keys (passed in table value) took 16ms
    Querying for 10000 random keys (passed in table value) took 18ms
    