Sybase: HAVING operates on rows?

Question

I came across the following Sybase SQL:

-- Setup first
create table #t (id int, ts int)
go

insert into #t values (1, 2)
insert into #t values (1, 10)
insert into #t values (1, 20)
insert into #t values (1, 30)

insert into #t values (2, 5)
insert into #t values (2, 13) 
insert into #t values (2, 25)
go

declare @time int select @time=11
-- This is the SQL I am asking about
select * from (select * from #t where ts <= @time) t group by id having ts = max(ts)
go

The results of this SQL are

 id          ts          
 ----------- ----------- 
           1          10 
           2           5

This looks like the HAVING condition is applied to rows rather than groups. Can someone please point me to the place in the Sybase 15.5 documentation where this case is described? All I see is "HAVING operates on groups". The closest I see in the docs is:

The having clause can include columns or expressions that are not in the select list and not in the group by clause.

(Quote from here).

However, they don't exactly explain what happens when you do that.

Answer 1:

My understanding: Yes, fundamentally, HAVING operates on rows. When GROUP BY is omitted, it operates on all result rows within a single "supergroup" rather than on rows within groups. Read the section "How group by and having queries with aggregates work" in your originally-linked Sybase docco:-

How group by and having queries with aggregates work

The where clause excludes rows that do not meet its search conditions; its function remains the same for grouped or nongrouped queries.

The group by clause collects the remaining rows into one group for each unique value in the group by expression. Omitting group by creates a single group for the whole table.

Aggregate functions specified in the select list calculate summary values for each group. For scalar aggregates, there is only one value for the table. Vector aggregates calculate values for the distinct groups.

The having clause excludes groups from the results that do not meet its search conditions. Even though the having clause tests only rows, the presence or absence of a group by clause may make it appear to be operating on groups:

When the query includes group by, having excludes result group rows. This is why having seems to operate on groups.

When the query has no group by, having excludes result rows from the (single-group) table. This is why having seems to operate on rows (the results are similar to where clause results).

Secondly, a brief summary appears in the section "How the having, group by, and where clauses interact":-

How the having, group by, and where clauses interact

When you include the having, group by, and where clauses in a query, the sequence in which each clause affects the rows determines the final results:

The where clause excludes rows that do not meet its search conditions.

The group by clause collects the remaining rows into one group for each unique value in the group by expression.

Aggregate functions specified in the select list calculate summary values for each group.

The having clause excludes rows from the final results that do not meet its search conditions.

@SQLGuru's explanation is an illustration of this.
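
The sequence quoted above can be traced by hand on the question's data. What follows is only a plain-Python sketch of the documented steps, not Sybase code; the variable names are made up for illustration, and it assumes, as the quoted text says, that having tests individual rows once the per-group aggregate has been computed.

```python
from collections import defaultdict

# Rows of #t from the question, as (id, ts) pairs.
rows = [(1, 2), (1, 10), (1, 20), (1, 30), (2, 5), (2, 13), (2, 25)]
time_limit = 11  # plays the role of @time

# 1. where: drop rows that fail the search condition.
filtered = [(id_, ts) for id_, ts in rows if ts <= time_limit]

# 2. group by: collect the remaining rows per id.
groups = defaultdict(list)
for id_, ts in filtered:
    groups[id_].append(ts)

# 3. aggregate: compute max(ts) once per group.
max_ts = {id_: max(ts_list) for id_, ts_list in groups.items()}

# 4. having: test each remaining row against its group's aggregate.
result = [(id_, ts) for id_, ts in filtered if ts == max_ts[id_]]

print(result)  # [(1, 10), (2, 5)]
```

The final list matches the two rows reported in the question.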

Edit...

On a related point, I was surprised by the behaviour of non-ANSI-conforming queries that utilise TSQL "extended columns". Sybase handles the extended columns (i) after the WHERE clause has been applied, (ii) by creating extra joins to the original tables, and (iii) without using the WHERE clause in those joins. Such queries might return more rows than expected, and the HAVING clause then requires additional conditions to filter these out.

See examples b, c and d under "Transact-SQL extensions to group by and having" on the page of your originally-linked docco. I found it useful to install the pubs2 sample database from Sybase to play along with the examples.
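
As a rough illustration of that description, here is a Python sketch (not Sybase code) of a hypothetical query `select id, ts, max(ts) from #t where ts <= @time group by id` on the question's data. It assumes the behaviour described above: the where clause feeds the aggregate, but the grouped result is then joined back to the unfiltered table. The query and all variable names are assumptions made for the example.

```python
# Rows of #t from the question, as (id, ts) pairs.
rows = [(1, 2), (1, 10), (1, 20), (1, 30), (2, 5), (2, 13), (2, 25)]
time_limit = 11  # plays the role of @time

# The where clause is applied before the per-group aggregate is computed.
max_ts = {}
for id_, ts in rows:
    if ts <= time_limit:
        max_ts[id_] = max(ts, max_ts.get(id_, ts))

# The extended column ts is filled in by joining the grouped result back to
# the original table on id; the where condition is NOT re-applied here.
joined = [(id_, ts, max_ts[id_]) for id_, ts in rows if id_ in max_ts]

# Repeating the where condition in having removes the unexpected rows.
kept = [row for row in joined if row[1] <= time_limit]

print(len(joined))  # 7
print(kept)         # [(1, 2, 10), (1, 10, 10), (2, 5, 5)]
```

Under these assumptions all seven rows come back before any having condition is applied, which is the "more rows than expected" effect.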

Answer 2:

I haven't used Sybase since it shared code with MS SQL Server (the '90s), but my interpretation of what you are doing is this:

First, the list is filtered to ts <= 11, which leaves the rows (1, 2), (1, 10) and (2, 5).

Everything else is filtered out.

Next, you are filtering the list to the rows where TS = the Max(TS) for that group.

id   ts
1    10
2    5

10 is the Max(TS) for group 1 and 5 is the Max(TS) for group 2. Those two rows are the ones that remain. What result would you expect otherwise?

Answer 3:

If you read the documentation here, it seems that Sybase's handling of columns in the having clause that don't appear in the group by clause differs from MySQL's.

The example they give has this explanation:

The Transact-SQL extended column, price (in the select list, but not an aggregate and not in the group by clause), causes all qualified rows to display in each qualified group, even though a standard group by clause produces a single row per group. The group by still affects the vector aggregate, which computes the average price per group displayed on each row of each group (they are the same values that were computed for example a):

So, ts = max(ts) essentially does this:

select *
from (select t.*,
             max(ts) over (partition by id) as maxts
      from #t
      where ts <= @time
     ) t
where ts = maxts

The subquery is important because, with extended columns, the where clause gets used only for the max() calculation; without the subquery, rows that fail the where condition could still be returned.
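
The window-function form can be checked on the question's data without a Sybase server. The sketch below runs it through Python's sqlite3 module, so it only shows what the rewritten query returns, not how ASE evaluates the original. The permanent table `t`, the alias `f`, and the `?` parameter are stand-ins for `#t` and `@time`, and window functions need SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (id int, ts int)")
conn.executemany(
    "insert into t values (?, ?)",
    [(1, 2), (1, 10), (1, 20), (1, 30), (2, 5), (2, 13), (2, 25)],
)

# Same shape as the query above: filter first, compute max(ts) per id over
# the filtered rows, then keep the rows that equal their group's maximum.
query = """
select id, ts
from (select t.*,
             max(ts) over (partition by id) as maxts
      from t
      where ts <= ?
     ) f
where ts = maxts
order by id
"""
window_rows = conn.execute(query, (11,)).fetchall()
print(window_rows)  # [(1, 10), (2, 5)]
```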

I find this behavior rather confusing and non-standard. I would replace it with more typical constructs. These are about the same level of complexity and seem clearer to a larger audience.
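
One such construct is a join against a grouped derived table. This is a minimal sketch, not necessarily the rewrite the answer had in mind, and it is run through Python's sqlite3 module rather than Sybase. The table `t`, the alias `m`, and the `?` parameter are stand-ins for `#t` and `@time`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (id int, ts int)")
conn.executemany(
    "insert into t values (?, ?)",
    [(1, 2), (1, 10), (1, 20), (1, 30), (2, 5), (2, 13), (2, 25)],
)

# The derived table m holds one row per id: the largest ts not exceeding the
# limit. Joining on both id and ts picks the matching rows from t.
query = """
select t.id, t.ts
from t
join (select id, max(ts) as maxts
      from t
      where ts <= ?
      group by id
     ) m on m.id = t.id and m.maxts = t.ts
order by t.id
"""
join_rows = conn.execute(query, (11,)).fetchall()
print(join_rows)  # [(1, 10), (2, 5)]
```

Like the original query, this returns every tied row if several rows in a group share the maximum ts.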

Source: https://stackoverflow.com/questions/15507825/sybase-having-operates-on-rows

Tags

sql

sybase-ase