Does SQLite optimize a query with multiple AND conditions in the WHERE clause?

礼貌的吻别 2021-01-14 05:41

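In SQL databases (I use Python+SQLite), how can I make sure that, if we have 1 million rows, the query

SELECT * FROM mytable WHERE myfunction(description) < 500 AND column2 < 1000

is optimized so that the 1st condition (CPU-expensive) is only tested if the easy-to-test second condition is already True?
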
    野趣味 2021-01-14 06:27

    (Updated answer based on comments and subsequent testing.)

    The actual answer to your question

    how to make sure that, if we have 1 million rows, the query ... is optimized so that the 1st condition (CPU-expensive) is only tested if the easy-to-test second condition is already True?

    depends on

    • the actual conditions in the WHERE clause, and
    • how clever the SQLite query optimizer is in estimating the cost of those conditions.

    A simple test should tell you whether your query would be sufficiently "optimized" for your needs. The good news is that SQLite will apply the easy (inexpensive) condition first, at least under certain circumstances.
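    One quick way to run such a test without executing the query at all is to ask SQLite for its plan with EXPLAIN QUERY PLAN. The sketch below assumes the same schema as this answer's test table, but uses an in-memory database instead of a file and registers a stand-in myfunction only so the statement compiles; it prints the plan's detail column.

```python
import sqlite3

# In-memory stand-in for the test database (assumption: same schema as the answer).
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE mytable (
        description TEXT(50) NOT NULL,
        column2 INTEGER NOT NULL,
        CONSTRAINT mytable_PK PRIMARY KEY (column2)
    )"""
)
# myfunction must be registered for the statement to compile; EXPLAIN QUERY PLAN
# only plans the query, so this stand-in is never actually called here.
conn.create_function("myfunction", 1, lambda thing: int(thing[-6:]))

sql = """SELECT COUNT(*) AS n FROM mytable
WHERE myfunction(description) < 500 AND column2 < 1000"""

# The last column of each EXPLAIN QUERY PLAN row holds the readable description.
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]
for detail in plan:
    print(detail)
conn.close()
```

    On recent SQLite versions the detail typically reads something like SEARCH mytable USING INTEGER PRIMARY KEY (rowid<?), i.e. the cheap condition drives the scan; the exact wording varies between SQLite versions.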

    For a test table "mytable"

    CREATE TABLE mytable (
        description TEXT(50) NOT NULL,
        column2 INTEGER NOT NULL,
        CONSTRAINT mytable_PK PRIMARY KEY (column2)
    );
    

    containing a million rows

    description  column2
    -----------  -------
    row000000          0
    row000001          1
    row000002          2
    ...
    row999999     999999
    

    the Python test code

    import sqlite3
    import time

    log_file_spec = r'C:\Users\Gord\Desktop\log_file.txt'

    def myfunc(thing):
        # Log every call so we can count afterwards how often SQLite invoked us.
        with open(log_file_spec, 'a') as log:
            log.write('HODOR\n')
        # 'row000123' -> 123
        return int(thing[-6:])


    with open(log_file_spec, 'w'):
        pass  # just empty the file
    cnxn = sqlite3.connect(r'C:\__tmp\SQLite\test.sqlite')
    # Expose the Python function to SQL as myfunction(x).
    cnxn.create_function("myfunction", 1, myfunc)
    crsr = cnxn.cursor()
    t0 = time.time()
    sql = """\
    SELECT COUNT(*) AS n FROM mytable
    WHERE myfunction(description) < 500 AND column2 < 1000
    """
    crsr.execute(sql)
    num_rows = crsr.fetchone()[0]
    print(f"{num_rows} rows found in {(time.time() - t0):.1f} seconds")

    cnxn.close()
    

    returns

    500 rows found in 1.2 seconds
    

    and counting the lines in log_file.txt we see

    C:\Users\Gord>find /C "HODOR" Desktop\log_file.txt
    
    ---------- DESKTOP\LOG_FILE.TXT: 1000
    

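    indicating that our function was called only one thousand times, not one million times. SQLite clearly applied the column2 < 1000 condition first and then applied the myfunction(description) < 500 condition only to the subset of rows returned by the first condition. (Note that column2 is declared as the table's INTEGER PRIMARY KEY, which makes it an alias for the rowid, so SQLite can satisfy column2 < 1000 with a range search on the rowid instead of scanning the whole table.)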
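    The same measurement can be reproduced without a log file on disk. This is a minimal sketch under a few assumptions: an in-memory database stands in for the answer's test.sqlite file, the table is filled with the million rows described above ('row000000', 0) through ('row999999', 999999), and a global counter replaces the 'HODOR' log lines.

```python
import sqlite3

calls = 0  # number of times SQLite has invoked myfunc


def myfunc(thing):
    """Same result as the answer's myfunc ('row000123' -> 123), but it
    counts calls in memory instead of appending 'HODOR' to a log file."""
    global calls
    calls += 1
    return int(thing[-6:])


conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE mytable (
        description TEXT(50) NOT NULL,
        column2 INTEGER NOT NULL,
        CONSTRAINT mytable_PK PRIMARY KEY (column2)
    )"""
)
# One million rows: ('row000000', 0) ... ('row999999', 999999).
conn.executemany(
    "INSERT INTO mytable (description, column2) VALUES (?, ?)",
    ((f"row{i:06d}", i) for i in range(1_000_000)),
)
conn.commit()
conn.create_function("myfunction", 1, myfunc)

num_rows = conn.execute(
    """SELECT COUNT(*) AS n FROM mytable
       WHERE myfunction(description) < 500 AND column2 < 1000"""
).fetchone()[0]
print(f"{num_rows} rows found; myfunction was called {calls} times")
conn.close()
```

    If the planner behaves as in the answer's test, this should report 500 rows and 1000 calls rather than one million calls.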


    (Original "off the cuff" answer.)

    The actual answer to your question depends on how clever the query optimizer is. A simple test should tell you whether your query would be sufficiently "optimized" for your needs.

    However, you do have a couple of options if your tests find that your original approach is too slow:

    Option 1: Try doing the simple comparison "first"

    Changing the order might affect the query plan, e.g.

    ... WHERE column2 < 1000 AND myfunction(description) < 500
    

    might turn out to be faster than

    ... WHERE myfunction(description) < 500 AND column2 < 1000
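    Whether reordering matters can be checked by counting function calls for both orderings. This sketch assumes an in-memory copy of the test table (100,000 rows, enough to cover column2 < 1000) and a hypothetical helper run() that resets a global call counter before each query. Because column2 is a rowid alias in this schema, both orderings would be expected to call myfunction the same number of times; with a cheap condition that cannot use an index the result may differ, so measure your own case.

```python
import sqlite3

calls = 0  # reset by run() before each query


def myfunc(thing):
    # Stand-in for the expensive function; counts its own invocations.
    global calls
    calls += 1
    return int(thing[-6:])


def run(conn, where_clause):
    """Hypothetical helper: return (matching rows, myfunction calls)."""
    global calls
    calls = 0
    sql = "SELECT COUNT(*) FROM mytable WHERE " + where_clause
    n = conn.execute(sql).fetchone()[0]
    return n, calls


conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE mytable (
        description TEXT(50) NOT NULL,
        column2 INTEGER NOT NULL,
        CONSTRAINT mytable_PK PRIMARY KEY (column2)
    )"""
)
conn.executemany(
    "INSERT INTO mytable (description, column2) VALUES (?, ?)",
    ((f"row{i:06d}", i) for i in range(100_000)),
)
conn.commit()
conn.create_function("myfunction", 1, myfunc)

results = {
    "expensive condition first": run(
        conn, "myfunction(description) < 500 AND column2 < 1000"),
    "cheap condition first": run(
        conn, "column2 < 1000 AND myfunction(description) < 500"),
}
for label, (n, c) in results.items():
    print(f"{label}: {n} rows, {c} calls")
conn.close()
```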
    

    Option 2: Try forcing the order using a subquery

    Again, depending on the cleverness of the query optimizer

    SELECT easy.* 
    FROM 
        (SELECT * FROM mytable WHERE column2 < 1000) easy
    WHERE myfunction(easy.description) < 500
    

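    might apply the inexpensive condition first, then apply the expensive condition on the resulting subset of rows. (However, a comment indicates that SQLite is too sophisticated to fall for that ploy: its query optimizer can flatten a simple subquery like this one into the outer query, so the rewrite does not necessarily force an evaluation order.)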
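    The subquery rewrite can be measured the same way. This sketch assumes an in-memory copy of the test table (100,000 rows) and a hypothetical helper count_with_calls() that returns (rows, myfunction calls) for a given statement; it runs the plain query and the subquery form side by side.

```python
import sqlite3

calls = 0  # reset by count_with_calls() before each query

PLAIN = """SELECT COUNT(*) FROM mytable
WHERE myfunction(description) < 500 AND column2 < 1000"""

SUBQUERY = """SELECT COUNT(*) FROM
    (SELECT * FROM mytable WHERE column2 < 1000) easy
WHERE myfunction(easy.description) < 500"""


def myfunc(thing):
    # Stand-in for the expensive function; counts its own invocations.
    global calls
    calls += 1
    return int(thing[-6:])


def count_with_calls(conn, sql):
    """Hypothetical helper: return (matching rows, myfunction calls)."""
    global calls
    calls = 0
    n = conn.execute(sql).fetchone()[0]
    return n, calls


conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE mytable (
        description TEXT(50) NOT NULL,
        column2 INTEGER NOT NULL,
        CONSTRAINT mytable_PK PRIMARY KEY (column2)
    )"""
)
conn.executemany(
    "INSERT INTO mytable (description, column2) VALUES (?, ?)",
    ((f"row{i:06d}", i) for i in range(100_000)),
)
conn.commit()
conn.create_function("myfunction", 1, myfunc)

plain = count_with_calls(conn, PLAIN)
subquery = count_with_calls(conn, SUBQUERY)
print(f"plain: {plain}, subquery: {subquery}")
conn.close()
```

    On this schema both forms would be expected to report 500 rows and 1000 calls, so the subquery buys nothing here; it is only worth trying when your own measurements show the plain query evaluating the expensive condition too often.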
