Is there any rule of thumb for constructing an SQL query from a human-readable description?

Question

Whenever we are given the description of a query, we apply heuristics and brainstorming to construct it.

Is there any systematic step-by-step or mathematical way to construct an SQL query from a given human-readable description?

For instance, how do we determine whether an SQL query needs a join rather than a subquery, whether it requires a GROUP BY, whether it requires an IN clause, and so on?

For example, anyone who has studied digital electronics knows methods like the Karnaugh map or the Quine-McCluskey method. These are systematic approaches to simplifying digital logic.

Is there any method like these for analyzing SQL queries manually, so that we can avoid brainstorming each time?

Answer 1:

Is there any systematic step-by-step or mathematical way to construct an SQL query from a given human-readable description?

Yes, there is.

It turns out that natural language expressions, logical expressions, relational algebra expressions and SQL expressions (a hybrid of the last two) correspond in a rather direct way. (What follows assumes no duplicate rows and no nulls.)

Each table (base or query result) has an associated predicate: a natural-language, fill-in-the-(named-)blanks statement template parameterized by column names. For example:

[liker] likes [liked]

A table holds every row that, using the row's column values to fill in the (named) blanks, makes a true statement aka proposition.

liker  | liked
--------------
Bob    | Dex    /* Bob likes Dex */
Bob    | Alice  /* Bob likes Alice */
Alice  | Carol  /* Alice likes Carol */

Each proposition from filling a predicate with the values from a row in a table is true. And each proposition from filling a predicate with the values from a row not in a table is false.

/*
    Alice likes Carol
AND NOT Alice likes Alice
AND NOT Alice likes Bob
AND NOT Alice likes Dex
AND NOT Alice likes Ed
...
AND Bob likes Alice
AND Bob likes Dex
AND NOT Bob likes Bob
AND NOT Bob likes Carol
AND NOT Bob likes Ed
...
AND NOT Carol likes Alice
...
AND NOT Dex likes Alice
...
AND NOT Ed likes Alice
...
*/
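A minimal sketch of this closed-world reading, assuming Python's built-in `sqlite3` module and an in-memory database. The helper `likes` is hypothetical, and the five-person domain is taken from the names in the comment above; a proposition counts as true exactly when its row is present in the table.

```python
import sqlite3
from itertools import product

# Build the example Likes table in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Likes (liker TEXT, liked TEXT, PRIMARY KEY (liker, liked))")
conn.executemany(
    "INSERT INTO Likes VALUES (?, ?)",
    [("Bob", "Dex"), ("Bob", "Alice"), ("Alice", "Carol")],
)


def likes(liker: str, liked: str) -> bool:
    """Truth value of the proposition '[liker] likes [liked]'.

    Closed-world reading: true if the row is in the table, false otherwise.
    """
    row = conn.execute(
        "SELECT 1 FROM Likes WHERE liker = ? AND liked = ?", (liker, liked)
    ).fetchone()
    return row is not None


# Hypothetical domain of values: the people named in the example.
people = ["Alice", "Bob", "Carol", "Dex", "Ed"]

# One proposition per (liker, liked) pair, prefixed with NOT when it is false.
propositions = [
    ("" if likes(a, b) else "NOT ") + f"{a} likes {b}"
    for a, b in product(people, repeat=2)
]
print(propositions[:3])  # ['NOT Alice likes Alice', 'NOT Alice likes Bob', 'Alice likes Carol']
```

Only the three rows in the table yield true propositions; the other 22 pairs yield false ones.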

The DBA gives the predicate for each base table. The SQL syntax for a table declaration is a lot like the traditional logic shorthand for the natural language version of a given predicate.

/* (liker, liked) rows where [liker] likes [liked] */
/* (liker, liked) rows where Likes(liker, liked) */
SELECT * FROM Likes

An SQL query (sub)expression transforms argument table values to a new table value holding the rows that make a true statement from a new predicate. The new table predicate can be expressed in terms of the argument table predicate(s) according to the (sub)expression's relational/table operators. A query is an SQL expression whose predicate is the predicate for the table of rows we want.

Inside a SELECT statement:
• A base table named T with alias A has predicate / is rows where T(A.C,...).
• R CROSS JOIN S & R INNER JOIN S have predicate / are rows where the predicate of R AND the predicate of S. (Rows that are a combination of a row from each argument aliased A after renaming its columns C,... to A.C,....)
• R ON condition & R WHERE condition have predicate / are rows where the predicate of R AND condition.
• SELECT DISTINCT A.C AS D,... FROM R (maybe with implicit A. and/or implicit AS D) has predicate / is rows where FOR SOME values for the dropped columns, the predicate of R with A.C,... replaced by D,.... (Dropped columns are not parameters of the new predicate.)
• Equivalently, SELECT DISTINCT A.C AS D,... FROM R has predicate / is rows where FOR SOME A.*,..., A.C=D AND ... AND the predicate of R. (This can be less compact but looks more like the SQL.)
• (X,...) IN (R) means predicate of R with columns C,... replaced by X,....
• So (...) IN (SELECT * FROM T) means T(...).
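These correspondences can be spot-checked mechanically. The sketch below assumes Python's `sqlite3` module (SQLite 3.15 or later, which is needed for the row-value `IN`) and the example `Likes` data. It evaluates each predicate directly over Python sets and compares the result with what SQLite returns; the helper names are illustrative.

```python
import sqlite3
from itertools import product

rows = [("Bob", "Dex"), ("Bob", "Alice"), ("Alice", "Carol")]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Likes (liker TEXT, liked TEXT, PRIMARY KEY (liker, liked))")
conn.executemany("INSERT INTO Likes VALUES (?, ?)", rows)
likes = set(rows)  # the same relation as a Python set
domain = {v for row in rows for v in row}  # every value that appears in the table

# INNER JOIN ... ON: Likes(l1.*) AND Likes(l2.*) AND l1.liked = l2.liker
join_sql = set(conn.execute(
    "SELECT l1.liker, l1.liked, l2.liker, l2.liked "
    "FROM Likes l1 INNER JOIN Likes l2 ON l1.liked = l2.liker"
))
join_logic = {(a, b, c, d) for (a, b), (c, d) in product(likes, repeat=2) if b == c}

# SELECT DISTINCT liker AS person: FOR SOME liked, Likes(person, liked)
proj_sql = {person for (person,) in conn.execute("SELECT DISTINCT liker AS person FROM Likes")}
proj_logic = {p for p in domain if any((p, y) in likes for y in domain)}


# (x, y) IN (SELECT * FROM Likes): Likes(x, y)
def in_likes(x: str, y: str) -> bool:
    (flag,) = conn.execute("SELECT (?, ?) IN (SELECT * FROM Likes)", (x, y)).fetchone()
    return flag == 1


print(join_sql == join_logic, proj_sql == proj_logic)  # True True
```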

Here are the natural language version and the shorthand for (person, liked) rows where [person] is Bob and Bob likes someone who likes [liked] but who doesn't like Ed.

/* (person, liked) rows where
for some value for x,
    [person] likes [x]
and [x] likes [liked]
and [person] = 'Bob'
and not [x] likes 'Ed'
*/

/* (person, liked) rows where
FOR SOME [value for] x,
        Likes(person, x)
    AND Likes(x, liked)
    AND person = 'Bob'
    AND NOT Likes(x, 'Ed')
*/
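Because this predicate uses only the base predicate `Likes`, equality, AND, NOT and FOR SOME, it can be evaluated by brute force before any SQL is written. A minimal sketch in Python, assuming the three example rows and a domain made of the five people named earlier; `predicate` is a hypothetical helper that follows the shorthand line by line.

```python
from itertools import product

# The example Likes relation as a set of (liker, liked) pairs.
likes = {("Bob", "Dex"), ("Bob", "Alice"), ("Alice", "Carol")}
# Assumed domain of values: everyone mentioned in the example.
domain = {"Alice", "Bob", "Carol", "Dex", "Ed"}


def predicate(person: str, liked: str) -> bool:
    # FOR SOME x: Likes(person, x) AND Likes(x, liked)
    #             AND person = 'Bob' AND NOT Likes(x, 'Ed')
    return any(
        (person, x) in likes
        and (x, liked) in likes
        and person == "Bob"
        and (x, "Ed") not in likes
        for x in domain
    )


# The table value is every (person, liked) row that makes the predicate true.
result = {(p, l) for p, l in product(domain, repeat=2) if predicate(p, l)}
print(result)  # {('Bob', 'Carol')}
```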

Rewrite this using the predicates of our base tables, then in SQL.

/* (person, liked) rows where
FOR SOME [values for] l1.*, l2.*,
        person = l1.liker AND liked = l2.liked
    AND Likes(l1.liker, l1.liked)
    AND Likes(l2.liker, l2.liked)
    AND l1.liked = l2.liker
    AND person = 'Bob'
    AND NOT Likes(l1.liked, 'Ed')
*/
SELECT l1.liker AS person, l2.liked AS liked
FROM
    /* (l1.liker, l1.liked, l2.liker, l2.liked) rows where
        Likes(l1.liker, l1.liked)
    AND Likes(l2.liker, l2.liked)
    AND l1.liked = l2.liker
    AND l1.liker = 'Bob'
    AND NOT Likes(l1.liked, 'Ed')
    */
Likes l1 INNER JOIN Likes l2
ON l1.liked = l2.liker
WHERE l1.liker = 'Bob'
AND NOT (l1.liked, 'Ed') IN (SELECT * FROM Likes)
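To check that the SQL returns the rows the predicate describes, the query can be run unchanged in SQLite, which accepts the row-value `NOT (...) IN (SELECT ...)` form from version 3.15 on. This sketch assumes Python's `sqlite3` module and the three example rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Likes (liker TEXT, liked TEXT, PRIMARY KEY (liker, liked))")
conn.executemany(
    "INSERT INTO Likes VALUES (?, ?)",
    [("Bob", "Dex"), ("Bob", "Alice"), ("Alice", "Carol")],
)

# The query from the answer, verbatim apart from the comment.
query = """
SELECT l1.liker AS person, l2.liked AS liked
FROM Likes l1 INNER JOIN Likes l2
ON l1.liked = l2.liker
WHERE l1.liker = 'Bob'
AND NOT (l1.liked, 'Ed') IN (SELECT * FROM Likes)
"""
result = conn.execute(query).fetchall()
print(result)  # [('Bob', 'Carol')]
```

If a row ('Alice', 'Ed') is added, the NOT ... IN condition becomes false for x = Alice and the query returns no rows, just as the predicate says.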

• R UNION CORRESPONDING S has predicate / is rows where the predicate of R OR the predicate of S.
• R EXCEPT S has predicate / is rows where the predicate of R AND NOT the predicate of S.
• VALUES (X,...),... with columns C,... has predicate / is rows where (C = X AND ...) OR ....
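A small sketch of the EXCEPT and VALUES rules, again assuming Python's `sqlite3` module and the example data. The two queries are made up for illustration; each is compared with its predicate evaluated directly.

```python
import sqlite3

rows = [("Bob", "Dex"), ("Bob", "Alice"), ("Alice", "Carol")]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Likes (liker TEXT, liked TEXT, PRIMARY KEY (liker, liked))")
conn.executemany("INSERT INTO Likes VALUES (?, ?)", rows)
likes = set(rows)
domain = {v for row in rows for v in row}

# (person) rows where (FOR SOME y, Likes(person, y)) AND NOT (FOR SOME x, Likes(x, person))
except_sql = {p for (p,) in conn.execute(
    "SELECT liker FROM Likes EXCEPT SELECT liked FROM Likes"
)}
except_logic = {
    p for p in domain
    if any((p, y) in likes for y in domain)
    and not any((x, p) in likes for x in domain)
}

# VALUES ('Dex'), ('Ed') is rows where person = 'Dex' OR person = 'Ed', so this is
# (person) rows where (FOR SOME x, Likes(x, person)) AND NOT (person = 'Dex' OR person = 'Ed')
values_sql = {p for (p,) in conn.execute(
    "SELECT liked FROM Likes EXCEPT VALUES ('Dex'), ('Ed')"
)}
values_logic = {
    p for p in domain
    if any((x, p) in likes for x in domain)
    and not (p == "Dex" or p == "Ed")
}

print(sorted(except_sql), sorted(values_sql))  # ['Bob'] ['Alice', 'Carol']
```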

/* (person) rows where
    (FOR SOME liked, Likes(person, liked))
OR  person = 'Bob'
*/
    SELECT liker AS person
    FROM Likes
UNION
    VALUES ('Bob')
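The UNION example can be run in SQLite as well, with one change: SQLite does not support CORRESPONDING, so this sketch uses plain UNION, which matches columns by position. That is harmless here because both sides have the single column `person`. It assumes Python's `sqlite3` module and the example data.

```python
import sqlite3

rows = [("Bob", "Dex"), ("Bob", "Alice"), ("Alice", "Carol")]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Likes (liker TEXT, liked TEXT, PRIMARY KEY (liker, liked))")
conn.executemany("INSERT INTO Likes VALUES (?, ?)", rows)

query = """
    SELECT liker AS person
    FROM Likes
UNION
    VALUES ('Bob')
"""
result = {person for (person,) in conn.execute(query)}

# The same predicate evaluated directly:
# (FOR SOME liked, Likes(person, liked)) OR person = 'Bob'
likes = set(rows)
domain = {v for row in rows for v in row}
expected = {p for p in domain if any((p, y) in likes for y in domain) or p == "Bob"}

print(sorted(result))  # ['Alice', 'Bob']
```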

So if we express the rows we want in terms of the given base tables' natural language statement templates, which rows make true or false (and so are returned or not), then we can translate that expression into an SQL query that nests logic shorthands and operators and/or table names and operators. The DBMS then converts the whole thing into table operations to compute the rows that make our predicate true.

See "How to get matching data from another SQL table for two different columns: Inner Join and/or Union?" for applying this to SQL (another self-join).
See "Relational algebra for banking scenario" for more on natural language phrasings (in a relational algebra context).

Answer 2:

Here's what I do in non-grouped queries:

I put into the FROM clause the table for which I expect zero or one output row per row of that table. Often you want something like "all customers with certain properties"; then the customer table goes into the FROM clause.

Use joins to add columns and filter rows. Joins should not duplicate rows: a join should find zero or one row, never more. That keeps things intuitive, because you can say that "a join adds columns and filters out some rows".
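A minimal sketch of this rule, assuming Python's `sqlite3` module; the `customers` and `countries` tables and their rows are hypothetical. Because the join condition targets `countries.id`, a primary key, each customer matches at most one country row, so the join can add the `country` column and drop customers without a match, but it cannot duplicate a customer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE countries (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL, country_id INTEGER);
INSERT INTO countries VALUES (1, 'France'), (2, 'Japan');
INSERT INTO customers VALUES (10, 'Ann', 1), (11, 'Ben', 2), (12, 'Cal', NULL);
""")

# customers goes in FROM: we expect zero or one output row per customer.
# The join hits a primary key, so it adds a column and filters rows (Cal has
# no country and drops out) without ever producing two rows for one customer.
rows = conn.execute("""
SELECT c.id, c.name, k.name AS country
FROM customers c
INNER JOIN countries k ON k.id = c.country_id
ORDER BY c.id
""").fetchall()

print(rows)  # [(10, 'Ann', 'France'), (11, 'Ben', 'Japan')]
```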

Subqueries are to be avoided if a join can replace them. Joins look nicer, are more general and often are more efficient (due to common query optimizer weaknesses).
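As an illustration (hypothetical tables again, Python's `sqlite3` module assumed), here is an `IN` subquery and the join that replaces it. The rewrite is equivalent only because the join targets the primary key `countries.id`; if the joined side could match several rows per customer, the join would duplicate customers, which is exactly what the rule above forbids.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE countries (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL, country_id INTEGER);
INSERT INTO countries VALUES (1, 'France'), (2, 'Japan');
INSERT INTO customers VALUES (10, 'Ann', 1), (11, 'Ben', 2), (12, 'Dee', 1);
""")

# Subquery form: customers whose country is France.
with_subquery = conn.execute("""
SELECT c.name FROM customers c
WHERE c.country_id IN (SELECT k.id FROM countries k WHERE k.name = 'France')
ORDER BY c.id
""").fetchall()

# Join form: countries.id is a primary key, so the join adds no duplicates
# and can replace the subquery one for one.
with_join = conn.execute("""
SELECT c.name FROM customers c
INNER JOIN countries k ON k.id = c.country_id
WHERE k.name = 'France'
ORDER BY c.id
""").fetchall()

print(with_subquery == with_join)  # True
```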

Using WHERE and projections is then straightforward.

Source: https://stackoverflow.com/questions/33947260/is-there-any-rule-of-thumb-to-construct-sql-query-from-a-human-readable-descript

Tags

sql

database

relational-database

heuristics

human-readable