Sybase: HAVING operates on rows?

Posted by 假装没事ソ on 2019-12-02 06:12:39

My understanding: yes, fundamentally, HAVING operates on rows. When GROUP BY is omitted, it operates on all result rows within a single "supergroup" rather than on rows within groups. Read the section "How group by and having queries with aggregates work" in your originally-linked Sybase docco:

How group by and having queries with aggregates work

  • The where clause excludes rows that do not meet its search conditions; its function remains the same for grouped or nongrouped queries.
  • The group by clause collects the remaining rows into one group for each unique value in the group by expression. Omitting group by creates a single group for the whole table.
  • Aggregate functions specified in the select list calculate summary values for each group. For scalar aggregates, there is only one value for the table. Vector aggregates calculate values for the distinct groups.
  • The having clause excludes groups from the results that do not meet its search conditions. Even though the having clause tests only rows, the presence or absence of a group by clause may make it appear to be operating on groups:
    • When the query includes group by, having excludes result group rows. This is why having seems to operate on groups.
    • When the query has no group by, having excludes result rows from the (single-group) table. This is why having seems to operate on rows (the results are similar to where clause results).

Secondly, a brief summary appears in the section "How the having, group by, and where clauses interact":

How the having, group by, and where clauses interact

When you include the having, group by, and where clauses in a query, the sequence in which each clause affects the rows determines the final results:

  • The where clause excludes rows that do not meet its search conditions.
  • The group by clause collects the remaining rows into one group for each unique value in the group by expression.
  • Aggregate functions specified in the select list calculate summary values for each group.
  • The having clause excludes rows from the final results that do not meet its search conditions.
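
The four steps above can be sketched in plain Python. This is only an illustration of the logical order described in the docs, not of how Sybase executes a query; the sample rows and the `grouped_query` helper are made up for the example.

```python
from collections import defaultdict

# Hypothetical sample rows (id, ts); not taken from the Sybase docs.
rows = [(1, 2), (1, 10), (1, 20), (2, 5), (2, 15), (3, 30)]

def grouped_query(rows, where, having):
    """Mimic: select id, max(ts) from rows where ... group by id having ..."""
    # 1. where excludes rows that do not meet its search condition
    kept = [r for r in rows if where(r)]
    # 2. group by collects the remaining rows into one group per id
    groups = defaultdict(list)
    for id_, ts in kept:
        groups[id_].append(ts)
    # 3. the aggregate computes one summary value per group
    summary = [(id_, max(ts_list)) for id_, ts_list in sorted(groups.items())]
    # 4. having excludes result rows that fail its search condition
    return [s for s in summary if having(s)]

result = grouped_query(rows, where=lambda r: r[1] <= 11, having=lambda s: s[1] > 5)
print(result)  # [(1, 10)]
```

Because where runs first, the rows with ts above 11 never reach the aggregate, so max(ts) for id 1 is 10 rather than 20.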

@SQLGuru's explanation is an illustration of this.

Edit...

On a related point, I was surprised by the behaviour of non-ANSI-conforming queries that utilise TSQL "extended columns". Sybase handles the extended columns (i) after the WHERE clause has been applied, (ii) by creating extra joins back to the original tables, and (iii) without applying the WHERE clause in those joins. Such queries can therefore return more rows than expected, and the HAVING clause then needs additional conditions to filter them out.

See examples b, c and d under "Transact-SQL extensions to group by and having" on the page of your originally-linked docco. I found it useful to install the pubs2 sample database from Sybase to play along with the examples.

I haven't used Sybase since it shared code with MS SQL Server (the '90s), but my interpretation of what you are doing is this:

First, the list is filtered to the rows where ts <= 11:

id   ts
1    2
1    10
2    5

Everything else is filtered out.

Next, you are filtering the list to the rows where TS = the Max(TS) for that group.

id   ts
1    10
2    5

10 is the Max(TS) for group 1 and 5 is the Max(TS) for group 2. Those two rows are the ones that remain. What result would you expect otherwise?
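
The same two steps can be written as a small Python simulation. It follows the interpretation given here, not Sybase's actual execution; the two rows with ts above 11 are invented to stand in for "everything else", and `having_ts_equals_max` is a hypothetical helper name.

```python
from collections import defaultdict

# Rows (id, ts). The first three are the ones shown above; the two with
# ts > 11 are made up to represent the rows that get filtered out.
rows = [(1, 2), (1, 10), (2, 5), (1, 12), (2, 20)]

def having_ts_equals_max(rows, time, by_id=True):
    # where ts <= @time
    kept = [(i, ts) for i, ts in rows if ts <= time]
    # max(ts) per id; without group by there is one group for all rows
    max_ts = defaultdict(lambda: float("-inf"))
    for i, ts in kept:
        key = i if by_id else None
        max_ts[key] = max(max_ts[key], ts)
    # having ts = max(ts), checked row by row against the row's group
    return [(i, ts) for i, ts in kept if ts == max_ts[i if by_id else None]]

grouped = having_ts_equals_max(rows, time=11)
ungrouped = having_ts_equals_max(rows, time=11, by_id=False)
print(grouped)    # [(1, 10), (2, 5)]
print(ungrouped)  # [(1, 10)]
```

With by_id=False the whole filtered list is treated as a single group, so only the row holding the overall maximum survives.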

If you read the Sybase documentation, it seems that Sybase's handling of columns in the having clause that don't appear in the group by clause is different from MySQL's.

The example they give has this explanation:

The Transact-SQL extended column, price (in the select list, but not an aggregate and not in the group by clause), causes all qualified rows to display in each qualified group, even though a standard group by clause produces a single row per group. The group by still affects the vector aggregate, which computes the average price per group displayed on each row of each group (they are the same values that were computed for example a):

So, ts = max(ts) essentially does this:

select *
from (select t.*,
             max(ts) over (partition by id) as maxts
      from #t
      where ts <= @time
     ) t
where ts = maxts

The subquery is important: the where clause has to be applied before max() is calculated, and the comparison ts = maxts can only be made in the outer query, once the per-id maximum is available.

I find this behavior rather confusing and non-standard. I would replace it with more typical constructs. These are about the same level of complexity and seem clearer to a larger audience.
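
As one possible example of such a construct, the per-id maximum can be computed in a derived table and joined back to the base table. The sketch below runs the query in SQLite through Python's sqlite3 module purely so it can be tried anywhere; the table name t, the sample rows and the :time parameter are assumptions standing in for #t and @time, and SQLite is not Sybase.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (id integer, ts integer)")
# Sample data is made up; the rows with ts > 11 should be ignored.
conn.executemany(
    "insert into t values (?, ?)",
    [(1, 2), (1, 10), (2, 5), (1, 12), (2, 20)],
)

# Standard SQL: aggregate per id in a derived table, then join back
# to pick up the full row that carries the maximum ts.
query = """
select t.id, t.ts
from t
join (select id, max(ts) as maxts
      from t
      where ts <= :time
      group by id) m
  on m.id = t.id and m.maxts = t.ts
order by t.id
"""
latest = conn.execute(query, {"time": 11}).fetchall()
print(latest)  # [(1, 10), (2, 5)]
```

If only id and the maximum ts are needed, the inner select on its own is enough; the join back is useful when other columns of the matching row have to be returned as well.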
