In what order are MySQL JOINs evaluated?

情书的邮戳 2020-11-30 09:26

I have the following query:

SELECT c.*
FROM companies AS c
JOIN users AS u USING(companyid)
JOIN jobs AS j USING(userid)
JOIN useraccounts AS us USING(userid         


        
7 answers
  • 2020-11-30 10:11

    1) USING is not exactly the same as ON, but it is shorthand for the case where both tables have a column with the same name that you are joining on. See: http://www.java2s.com/Tutorial/MySQL/0100__Table-Join/ThekeywordUSINGcanbeusedasareplacementfortheONkeywordduringthetableJoins.htm

    In my opinion it is harder to read, so I would spell out the joins with ON.

    3) It is not clear from this query, but I would guess it does not.

    2) Assuming you are joining through the other tables (not all directly on companies), the order in this query does matter. See the comparisons below:

    Original:

    SELECT c.* 
        FROM companies AS c 
        JOIN users AS u USING(companyid) 
        JOIN jobs AS j USING(userid) 
        JOIN useraccounts AS us USING(userid) 
    WHERE j.jobid = 123
    

    What I think it most likely means:

    SELECT c.* 
        FROM companies AS c 
        JOIN users AS u on u.companyid = c.companyid
        JOIN jobs AS j on j.userid = u.userid
        JOIN useraccounts AS us on us.userid = u.userid 
    WHERE j.jobid = 123
    

    You could swap the lines joining jobs and useraccounts here.

    What it would look like if everything joined on company:

    SELECT c.* 
        FROM companies AS c 
        JOIN users AS u on u.companyid = c.companyid
        JOIN jobs AS j on j.userid = c.userid
        JOIN useraccounts AS us on us.userid = c.userid
    WHERE j.jobid = 123
    

    This doesn't really make logical sense... unless each user has their own company.

    4) The magic of SQL is that you can show only certain columns, yet all of them are there for sorting and filtering.

    If you returned

    SELECT c.*, j.jobid....  
    

    you could clearly see what it was filtering on, but the database server doesn't care whether a column is in the output when it filters.

  • 2020-11-30 10:14

    In MySQL, it's often interesting to ask the query optimizer what it plans to do, with:

    EXPLAIN SELECT [...]
    

    See "7.2.1 Optimizing Queries with EXPLAIN"
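
    As a minimal sketch, here is EXPLAIN applied to the query from the question (with the WHERE j.jobid = 123 filter quoted in the other answers). EXPLAIN prints one row per table, in the order MySQL would read them, so the top-to-bottom order of the output is the join order the optimizer chose. That order need not match the order written in FROM, and it depends on your indexes and data.

    ```sql
    -- Ask the optimizer for its plan instead of running the query.
    -- The output has one row per table; read it top to bottom to see the
    -- order in which MySQL intends to access the four tables.
    EXPLAIN
    SELECT c.*
      FROM companies AS c
      JOIN users AS u USING (companyid)
      JOIN jobs AS j USING (userid)
      JOIN useraccounts AS us USING (userid)
     WHERE j.jobid = 123;
    ```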

  • 2020-11-30 10:18

    Here is a more detailed answer on JOIN precedence. In your case, the JOINs are all commutative. Let's try one where they aren't.

    Build schema:

    CREATE TABLE users (
      name text
    );
    
    CREATE TABLE orders (
      order_id text,
      user_name text
    );
    
    CREATE TABLE shipments (
      order_id text,
      fulfiller text
    );
    

    Add data:

    INSERT INTO users VALUES ('Bob'), ('Mary');
    
    INSERT INTO orders VALUES ('order1', 'Bob');
    
    INSERT INTO shipments VALUES ('order1', 'Fulfilling Mary');
    

    Run query:

    SELECT *
      FROM users
           LEFT OUTER JOIN orders
           ON orders.user_name = users.name
           JOIN shipments
           ON shipments.order_id = orders.order_id
    

    Result:

    Only the Bob row is returned.

    Analysis:

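    In this query the LEFT OUTER JOIN was evaluated first, and the JOIN was evaluated on the composite result of the LEFT OUTER JOIN. Mary's row comes out of the LEFT OUTER JOIN with a NULL orders.order_id, which cannot match any shipment, so the inner JOIN discards it.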

    Second query:

    SELECT *
      FROM users
           LEFT OUTER JOIN (
             orders
             JOIN shipments
             ON shipments.order_id = orders.order_id)
             ON orders.user_name = users.name
    

    Result:

    One row for Bob (with the order and fulfillment data) and one row for Mary with NULLs for the order and fulfillment data.

    Analysis:

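    The parentheses changed the evaluation order: orders and shipments are joined first, and users is then left-joined to that result, so Mary's row is preserved.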
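
    To make the default grouping visible, the first query can be rewritten with explicit parentheses around the LEFT OUTER JOIN. This is a sketch against the same three tables and should again return only Bob. If the goal is to keep users without orders without nesting, chaining a second LEFT OUTER JOIN is a common alternative. It is not identical to the nested form: an order without a shipment still appears with NULL shipment columns, whereas the nested version drops that order before users is joined.

    ```sql
    -- Same as the first query, with the implicit left-to-right grouping spelled out.
    SELECT *
      FROM (users
            LEFT OUTER JOIN orders
            ON orders.user_name = users.name)
           JOIN shipments
           ON shipments.order_id = orders.order_id;

    -- Alternative that keeps Mary without nesting: make both joins outer joins.
    SELECT *
      FROM users
           LEFT OUTER JOIN orders
           ON orders.user_name = users.name
           LEFT OUTER JOIN shipments
           ON shipments.order_id = orders.order_id;
    ```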


    Further MySQL documentation is at https://dev.mysql.com/doc/refman/5.5/en/nested-join-optimization.html

  • 2020-11-30 10:19

    I can't answer the bit about the USING syntax. That's weird. I've never seen it before, having always used an ON clause instead.

    But what I can tell you is that the order of JOIN operations is determined dynamically by the query optimizer when it constructs its query plan, based on a system of optimization heuristics, some of which are:

    1. Is the JOIN performed on a primary key field? If so, this gets high priority in the query plan.

    2. Is the JOIN performed on a foreign key field? This also gets high priority.

    3. Does an index exist on the joined field? If so, bump the priority.

    4. Is a JOIN operation performed on a field in the WHERE clause? Can the WHERE clause expression be evaluated by examining the index (rather than by performing a table scan)? This is a major optimization opportunity, so it gets a major priority bump.

    5. What is the cardinality of the joined column? Columns with high cardinality give the optimizer more opportunities to discriminate against false matches (those that don't satisfy the WHERE clause or the ON clause), so high-cardinality joins are usually processed before low-cardinality joins.

    6. How many actual rows are in the joined table? Joining against a table with only 100 values is going to create less of a data explosion than joining against a table with ten million rows.

    Anyhow... the point is... there are a LOT of variables that go into the query execution plan. If you want to see how MySQL optimizes its queries, use the EXPLAIN syntax.
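
    If you want to compare the optimizer's choice with the order you actually wrote, MySQL's STRAIGHT_JOIN modifier forces the tables to be read in the order listed in FROM. Below is a rough sketch using the tables from the question, with the ON conditions and the WHERE filter taken from the other answers. Running the same EXPLAIN without the modifier and comparing the two outputs shows whether the optimizer reordered anything.

    ```sql
    -- STRAIGHT_JOIN tells MySQL to join the tables in the order they appear
    -- in FROM: companies, users, jobs, useraccounts.
    EXPLAIN
    SELECT STRAIGHT_JOIN c.*
      FROM companies AS c
      JOIN users AS u ON u.companyid = c.companyid
      JOIN jobs AS j ON j.userid = u.userid
      JOIN useraccounts AS us ON us.userid = u.userid
     WHERE j.jobid = 123;
    ```

    Forcing the order is usually only worth trying when EXPLAIN shows a clearly poor plan.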

    And here's a good article to read:

    http://www.informit.com/articles/article.aspx?p=377652


    ON EDIT:

    To answer your 4th question: You aren't querying the "companies" table. You're querying the joined result of ALL four tables named in your FROM clause.

    "j.jobid" is just the qualified name, via the alias j, of one of the columns in that joined collection of tables.

  • 2020-11-30 10:22

    SEE http://dev.mysql.com/doc/refman/5.0/en/join.html

    AND start reading here:


    Join Processing Changes in MySQL 5.0.12

    Beginning with MySQL 5.0.12, natural joins and joins with USING, including outer join variants, are processed according to the SQL:2003 standard. The goal was to align the syntax and semantics of MySQL with respect to NATURAL JOIN and JOIN ... USING according to SQL:2003. However, these changes in join processing can result in different output columns for some joins. Also, some queries that appeared to work correctly in older versions must be rewritten to comply with the standard.

    These changes have five main aspects:

    • The way that MySQL determines the result columns of NATURAL or USING join operations (and thus the result of the entire FROM clause).

    • Expansion of SELECT * and SELECT tbl_name.* into a list of selected columns.

    • Resolution of column names in NATURAL or USING joins.

    • Transformation of NATURAL or USING joins into JOIN ... ON.

    • Resolution of column names in the ON condition of a JOIN ... ON.
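
    A small sketch of the first point, using two throwaway tables (t1 and t2 are hypothetical names, not from the question). In MySQL 5.0.12 and later, SELECT * lists the shared column once when the join uses USING, and once per table when the equivalent condition is written with ON.

    ```sql
    CREATE TABLE t1 (id INT, a INT);
    CREATE TABLE t2 (id INT, b INT);

    INSERT INTO t1 VALUES (1, 10);
    INSERT INTO t2 VALUES (1, 20);

    -- USING coalesces the join column: the result columns are id, a, b.
    SELECT * FROM t1 JOIN t2 USING (id);

    -- ON keeps both copies: the result columns are id, a, id, b.
    SELECT * FROM t1 JOIN t2 ON t1.id = t2.id;
    ```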

  • 2020-11-30 10:23
    1. USING (fieldname) is a shorthand way of saying ON table1.fieldname = table2.fieldname.

    2. SQL doesn't define the 'order' in which JOINS are done because it is not the nature of the language. Obviously an order has to be specified in the statement, but an INNER JOIN can be considered commutative: you can list them in any order and you will get the same results.

      That said, when constructing a SELECT ... JOIN, particularly one that includes LEFT JOINs, I've found it makes sense to regard the second JOIN as joining the new table to the result of the first JOIN, the third JOIN as joining the new table to the result of the second JOIN, and so on.

      More rarely, the specified order can influence the behaviour of the query optimizer, due to the way it influences the heuristics.

    3. No. The way the query is assembled, it requires that companies and users both have a companyid, jobs has a userid and a jobid, and useraccounts has a userid. However, only one of companies or users needs a userid for the JOIN to work.

    4. The WHERE clause is filtering the whole result -- i.e. all JOINed columns -- using a column provided by the jobs table.
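
    Points 2 and 4 can be sketched together. Because every join here is an INNER JOIN, listing the tables in a different order should return the same rows, and the WHERE clause can filter on j.jobid even though only the companies columns are selected. This is an illustrative rewrite with explicit ON conditions, assuming userid exists only in users, jobs and useraccounts:

    ```sql
    -- Start from jobs instead of companies. The result set matches the
    -- original query because inner joins can be reordered freely.
    SELECT c.*
      FROM jobs AS j
      JOIN users AS u ON u.userid = j.userid
      JOIN useraccounts AS us ON us.userid = u.userid
      JOIN companies AS c ON c.companyid = u.companyid
     WHERE j.jobid = 123;  -- filters on a column that is not in the select list
    ```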
