Can MySQL be convinced of functional dependency when HAVING COUNT(*) = 1?

Submitted by 与世无争的帅哥 on 2021-02-11 04:59:39

Question


I'm trying to find orders with only one item in a database running on MySQL 5.7.23 on Ubuntu 18.04 LTS. But somehow MySQL can't infer that COUNT(*) = 1 implies a functional dependence.

The following 2-table database of orders with order items illustrates the failure:

DROP TABLE IF EXISTS t_o, t_oi;
CREATE TABLE t_o (
  order_id INTEGER UNSIGNED PRIMARY KEY,
  placed_on DATE NOT NULL,
  INDEX (placed_on)
);
INSERT INTO t_o (order_id, placed_on) VALUES
(1, '2018-10-01'),
(2, '2018-10-02');
CREATE TABLE t_oi (
  item_id INTEGER UNSIGNED PRIMARY KEY AUTO_INCREMENT,
  order_id INTEGER UNSIGNED NOT NULL,
  sku VARCHAR(31) CHARACTER SET ascii COLLATE ascii_general_ci NOT NULL,
  qty INTEGER UNSIGNED NOT NULL,
  unit_price INTEGER UNSIGNED NOT NULL,
  INDEX (sku),
  FOREIGN KEY (order_id) REFERENCES t_o (order_id)
    ON DELETE CASCADE ON UPDATE CASCADE
);
INSERT INTO t_oi (order_id, sku, qty, unit_price) VALUES
(1, 'SO', 1, 599),
(1, 'SF', 2, 399),
(2, 'SU', 1, 399);

SELECT t_oi.order_id, t_o.placed_on, t_oi.sku, t_oi.qty, t_oi.unit_price
FROM t_o
INNER JOIN t_oi ON t_o.order_id = t_oi.order_id
GROUP BY t_oi.order_id
HAVING COUNT(*) = 1

I expect this to return (2, '2018-10-02', 'SU', 1, 399) because it is the only order with only one item. I don't want any rows where order_id = 1 because that order has more than one item. But instead, MySQL gives the following error:

#1055 - Expression #3 of SELECT list is not in GROUP BY clause and contains nonaggregated column 'phs_apps.t_oi.sku' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by

The manual explains "functionally dependent". But is there a way to express this functional dependence to MySQL that's cleaner than slinging MIN() around each output column for which MySQL complains? If at all possible, I'd prefer a solution that does not involve joining to t_oi twice, once to find relevant t_o.order_id values and once to append the details of each such order's sole item, as including a table twice in a single query is incompatible with use of TEMPORARY TABLE because of a 13-year-old "Can't reopen table" bug.


Answer 1:


No, I don't think it's possible to convince MySQL to recognize the functional dependency with the special condition in the HAVING clause.

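The ONLY_FULL_GROUP_BY check is a static one, made when the statement is resolved, and it works from things like primary keys, unique NOT NULL keys, and equalities in the WHERE clause or join conditions. The HAVING clause gets evaluated much later, during query execution, after the rows have been accessed, after the GROUP BY operation, after the aggregates, etc., so its condition cannot feed into that check.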


We could remove ONLY_FULL_GROUP_BY from sql_mode. That would allow MySQL to process the query without throwing the error. But that's just going old school with a MySQL-specific non-standard extension to GROUP BY behavior. That doesn't mean that MySQL is convinced of functional dependency.
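If that route is taken anyway, the change can at least be limited to the current session and undone afterwards. The following is a minimal sketch against the question's tables; the variable name @old_sql_mode is only an illustration.

```sql
-- Remember the session's current mode list so it can be restored later.
SET @old_sql_mode = @@SESSION.sql_mode;

-- Remove ONLY_FULL_GROUP_BY from the comma-separated list.
-- Wrapping the list in commas avoids leaving a stray comma behind.
SET SESSION sql_mode = TRIM(BOTH ',' FROM
    REPLACE(CONCAT(',', @@SESSION.sql_mode, ','), ',ONLY_FULL_GROUP_BY,', ','));

-- The query from the question is now accepted, but MySQL is only
-- trusting us: it picks an arbitrary row's values for each group.
SELECT t_oi.order_id, t_o.placed_on, t_oi.sku, t_oi.qty, t_oi.unit_price
FROM t_o
INNER JOIN t_oi ON t_o.order_id = t_oi.order_id
GROUP BY t_oi.order_id
HAVING COUNT(*) = 1;

-- Put the original mode list back.
SET SESSION sql_mode = @old_sql_mode;
```

Because HAVING COUNT(*) = 1 leaves only one-row groups, the "arbitrary" row is the only row, which is why the result is still well defined here.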




Answer 2:


You can use function ANY_VALUE():

MySQL 8.0 Reference Manual / Functions and Operators / Miscellaneous Functions
12.22 Miscellaneous Functions

  • ANY_VALUE(arg)

    This function is useful for GROUP BY queries when the ONLY_FULL_GROUP_BY SQL mode is enabled, for cases when MySQL rejects a query that you know is valid for reasons that MySQL cannot determine. The function return value and type are the same as the return value and type of its argument, but the function result is not checked for the ONLY_FULL_GROUP_BY SQL mode.
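Applied to the question's query, this might look like the sketch below. It assumes the question's t_o and t_oi tables and a server that has ANY_VALUE() (it is available in the question's 5.7.23).

```sql
-- placed_on needs no wrapper: it is determined by the grouping column
-- through the join to t_o's primary key. The t_oi columns are wrapped
-- in ANY_VALUE() to skip the ONLY_FULL_GROUP_BY check for them.
SELECT t_oi.order_id,
       t_o.placed_on,
       ANY_VALUE(t_oi.sku)        AS sku,
       ANY_VALUE(t_oi.qty)        AS qty,
       ANY_VALUE(t_oi.unit_price) AS unit_price
FROM t_o
INNER JOIN t_oi ON t_o.order_id = t_oi.order_id
GROUP BY t_oi.order_id
-- Only one-row groups survive, so "any value" is the sole item's value.
HAVING COUNT(*) = 1;
```

With the sample data from the question, only order 2 has a single item, so that is the only order that should come back.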

Or just take MIN() of each non-grouped column, and comment it. There will always be cases that the DBMS can't prove, whether statically for given literals and functions or at runtime, so you need a solution like MIN() in your toolbox. You have to rearrange the query or code somehow, since there's no way to give the DBMS a proof or an override. You could consider clearing ONLY_FULL_GROUP_BY to be that override, but wouldn't you have to comment the clearing and restoring too, because it's not obvious?
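A commented MIN() version of the question's query could look like this sketch, which assumes the question's tables and nothing else:

```sql
SELECT t_oi.order_id,
       t_o.placed_on,
       -- MIN() is used only to satisfy ONLY_FULL_GROUP_BY. Every group
       -- that passes the HAVING filter has exactly one row, so MIN()
       -- simply returns that row's value.
       MIN(t_oi.sku)        AS sku,
       MIN(t_oi.qty)        AS qty,
       MIN(t_oi.unit_price) AS unit_price
FROM t_o
INNER JOIN t_oi ON t_o.order_id = t_oi.order_id
GROUP BY t_oi.order_id
HAVING COUNT(*) = 1;
```

MAX() would work equally well; the point of the comment is to tell the next reader that the aggregate is a formality rather than a real minimum.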

You could assign the subquery to a table with an appropriate PK (primary key) or UNIQUE NOT NULL constraint. But you'd still want to comment why. Since the DBMS doesn't know about the FD (functional dependency) we can expect the assignment to not be optimized either. We can expect minimal overhead from something like MIN().
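One possible reading of "assign the subquery to a table" is sketched below. The table name single_item_orders is hypothetical. Each statement mentions t_oi only once, which matters if t_oi is itself a TEMPORARY TABLE.

```sql
-- Hypothetical helper table; its primary key tells MySQL that each
-- order_id appears at most once.
CREATE TEMPORARY TABLE single_item_orders (
  order_id INTEGER UNSIGNED PRIMARY KEY
);

-- Why: MySQL cannot infer from HAVING COUNT(*) = 1 that the remaining
-- t_oi columns are single-valued, so the filter is materialized first.
INSERT INTO single_item_orders (order_id)
SELECT order_id
FROM t_oi
GROUP BY order_id
HAVING COUNT(*) = 1;

-- No GROUP BY is needed here: each listed order has exactly one item,
-- so the join to t_oi yields one row per order.
SELECT s.order_id, t_o.placed_on, t_oi.sku, t_oi.qty, t_oi.unit_price
FROM single_item_orders AS s
INNER JOIN t_o  ON t_o.order_id  = s.order_id
INNER JOIN t_oi ON t_oi.order_id = s.order_id;

DROP TEMPORARY TABLE single_item_orders;
```

This costs an extra pass over t_oi compared with the MIN() approach, in line with the expectation above that the assignment will not be optimized away.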

Indeed that manual section goes on to say:

There are multiple ways to cause MySQL to accept the query:

  • Alter the table to make [the GROUP BY column] a primary key or a unique NOT NULL column. [...]

  • Use ANY_VALUE() [...]

  • Disable ONLY_FULL_GROUP_BY. [...]




Answer 3:


In the query "SELECT t_oi.order_id, t_o.placed_on, t_oi.sku, t_oi.qty, t_oi.unit_price", you are grouping on the first column, so you have to tell MySQL what to do with the other columns. You can apply GROUP_CONCAT to the sku column, or take the first entry per order from the t_oi table with a ranking function, in which case no GROUP BY is needed anymore.

Try this, with ranking. I'm not sure about it and it is untested. Note that rnk = 1 keeps the first item of every order, including orders with several items, so a filter on the item count is still needed to get single-item orders only.

SELECT t_o.order_id, t_o.placed_on, t_oi2.sku, t_oi2.qty, t_oi2.unit_price
FROM t_o
INNER JOIN (
    SELECT t_oi.order_id, t_oi.sku, t_oi.qty, t_oi.unit_price,
        -- rnk restarts at 1 whenever order_id changes
        @rank := CASE WHEN @cur_order_id = t_oi.order_id THEN @rank + 1 ELSE 1 END AS rnk,
        @cur_order_id := t_oi.order_id AS cur_order_id
    FROM t_oi, (SELECT @cur_order_id := 0, @rank := 0) tmp
    ORDER BY t_oi.order_id
    ) t_oi2 ON t_o.order_id = t_oi2.order_id AND t_oi2.rnk = 1;
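As a sketch of the GROUP_CONCAT idea, assuming the question's tables: every t_oi column gets an explicit aggregate, and the HAVING filter then leaves groups whose lists contain a single value.

```sql
SELECT t_oi.order_id,
       t_o.placed_on,
       -- With one row per surviving group, each list has one element.
       GROUP_CONCAT(t_oi.sku)        AS sku,
       GROUP_CONCAT(t_oi.qty)        AS qty,
       GROUP_CONCAT(t_oi.unit_price) AS unit_price
FROM t_o
INNER JOIN t_oi ON t_o.order_id = t_oi.order_id
GROUP BY t_oi.order_id
HAVING COUNT(*) = 1;
```

One caveat: GROUP_CONCAT returns a string, so qty and unit_price come back as text rather than integers.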
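On MySQL 8.0 or later, which has window functions, the per-order item count can be attached to each row directly. This does not apply to the asker's 5.7.23 server; it is a sketch under the assumption that an upgrade is an option, and the aliases x and item_count are made up for illustration.

```sql
SELECT x.order_id, x.placed_on, x.sku, x.qty, x.unit_price
FROM (
    SELECT t_oi.order_id, t_o.placed_on, t_oi.sku, t_oi.qty, t_oi.unit_price,
           -- Number of items in the same order, repeated on every row.
           COUNT(*) OVER (PARTITION BY t_oi.order_id) AS item_count
    FROM t_o
    INNER JOIN t_oi ON t_o.order_id = t_oi.order_id
) AS x
-- Keep only the rows belonging to single-item orders.
WHERE x.item_count = 1;
```

There is no GROUP BY, so ONLY_FULL_GROUP_BY never comes into play, and t_oi is referenced only once.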



Answer 4:


I believe your assumption about functional dependence to be wrong.

"If R is a relation with attributes X and Y, a functional dependency between the attributes is represented as X->Y, which specifies Y is functionally dependent on X. Here X is a determinant set and Y is a dependent attribute. Each value of X is associated with precisely one Y value." (Techopedia)

These 2 columns are functionally dependent (and the query works). NB: each value of t_oi.order_id is associated with precisely one t_o.placed_on value.

SELECT t_oi.order_id, t_o.placed_on
FROM t_o
INNER JOIN t_oi ON t_o.order_id = t_oi.order_id
GROUP BY t_oi.order_id
HAVING COUNT(*) = 1

These are NOT functionally dependent (and the query will not work unless you remove ONLY_FULL_GROUP_BY)

SELECT t_oi.order_id, t_o.placed_on, t_oi.sku, t_oi.qty, t_oi.unit_price
FROM t_o
INNER JOIN t_oi ON t_o.order_id = t_oi.order_id
GROUP BY t_oi.order_id
HAVING COUNT(*) = 1

Any of these t_oi.sku, t_oi.qty, t_oi.unit_price columns can hold any valid value for their data types. So they are NOT pre-determined by the relationship involved in the query.

select @@sql_mode;

| @@sql_mode                                                                                                            |
| :-------------------------------------------------------------------------------------------------------------------- |
| ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION |

/* functionally dependent columns only */
SELECT t_oi.order_id, t_o.placed_on
FROM t_o
INNER JOIN t_oi ON t_o.order_id = t_oi.order_id
GROUP BY t_oi.order_id
HAVING COUNT(*) = 1

order_id | placed_on 
-------: | :---------
       2 | 2018-10-02

/* any columns some not functionally dependent */
SELECT t_oi.order_id, t_o.placed_on, t_oi.sku, t_oi.qty, t_oi.unit_price
FROM t_o
INNER JOIN t_oi ON t_o.order_id = t_oi.order_id
GROUP BY t_oi.order_id
HAVING COUNT(*) = 1

Expression #3 of SELECT list is not in GROUP BY clause and 
contains nonaggregated column 'fiddle_YRLHCAMPBMVSWYXFQGUD.t_oi.sku' 
which is not functionally dependent on columns in GROUP BY clause; 
this is incompatible with sql_mode=only_full_group_by

SET sql_mode = 'STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION'

select @@sql_mode

| @@sql_mode                                                                                         |
| :------------------------------------------------------------------------------------------------- |
| STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION |

/* any columns some not functionally dependent */
SELECT t_oi.order_id, t_o.placed_on, t_oi.sku, t_oi.qty, t_oi.unit_price
FROM t_o
INNER JOIN t_oi ON t_o.order_id = t_oi.order_id
GROUP BY t_oi.order_id
HAVING COUNT(*) = 1

order_id | placed_on  | sku | qty | unit_price
-------: | :--------- | :-- | --: | ---------:
       2 | 2018-10-02 | SU  |   1 |        399

db<>fiddle here



Source: https://stackoverflow.com/questions/52841443/can-mysql-be-convinced-of-functional-dependency-when-having-count-1
