Count equal date parts of the timestamp column ACROSS the groups by the said column

问题

I would like to count duplicate date values. I already know that my field "WHEN" is a timestamp, so I have to cast it to type of date.

my actual query looks like this:

SELECT 
    u.USERNAME,  
    r."WHEN",
    r.UPDATEINOUT,
    case (r.UPDATEINOUT) when 0 then  0 when 1 then 1 else r.INOUT end INOUT
FROM 
    ATTENDANT r 
LEFT JOIN  
    USERS u ON r.USERID = u.ID 
where 
    u.USERNAME = 'rk' and (r.UPDATEINOUT = 1 or r.UPDATEINOUT = 0 or r.UPDATEINOUT is null)
group by
    r."WHEN",
    INOUT,
    u.USERNAME,
    r.UPDATEINOUT
order by 
    r."WHEN"

And this is the result:

Username     WHEN             UPDATEINOUT  INOUT

rk  09.04.2018, 14:59:45.000    [null]  0
rk  09.04.2018, 14:59:51.000    [null]  1
rk  11.04.2018, 08:31:02.000    [null]  0
rk  11.04.2018, 12:06:52.000    [null]  1
rk  11.04.2018, 12:10:29.000    [null]  0
rk  11.04.2018, 12:23:09.000    [null]  1
rk  11.04.2018, 12:43:47.000    [null]  0
rk  11.04.2018, 17:07:40.000    [null]  1

Now I would like to count the duplicate dates, the result should look like this:

Username     WHEN             UPDATEINOUT  INOUT  Count

    rk  09.04.2018, 14:59:45.000    [null]  0       2
    rk  09.04.2018, 14:59:51.000    [null]  1       2
    rk  11.04.2018, 08:31:02.000    [null]  0       6
    rk  11.04.2018, 12:06:52.000    [null]  1       6
    rk  11.04.2018, 12:10:29.000    [null]  0       6
    rk  11.04.2018, 12:23:09.000    [null]  1       6
    rk  11.04.2018, 12:43:47.000    [null]  0       6
    rk  11.04.2018, 17:07:40.000    [null]  1       6

When I add

count(cast(r."WHEN" as date))

then it shows me only one.

回答1:

You could upgrade to Firebird 3.0 and use window functions.

Alternatively, you can use CTEs

with t as ( < your query here > )
select t.*, tw.cnt
from t join
     (select cast("WHEN" as date) as wdate, count(*) as cnt
      from t
      group by cast("WHEN" as date)
     ) tw
     on cast(t."WHEN" as date) = tw.wdate;

回答2:

It seems there are answers suggesting "what to do" but not trying to explain why the results are like what they are. What is happening. It seems like the topic starter is being given a fish, not a fishing rod.

When I add count(cast(r."WHEN" as date)) then it shows me only one.

This is a bit funny. This line shows novice misconception between natural human language like English, and mathematical sets languages like SQL.

R.Kut reads this his added line as "i want to count how many [distinct] values of the given expression are there".

But that is not what this command actually means in SQL. In SQL it means "I want to count how many rows are there, where the given expression is not null."

And so there is actually NO DIFFERENCE between count(cast(r."WHEN" as date)) and count(r."WHEN") - those two parameters-expressions are either both NULL or both NOT NULL. Hence the value of count of those equally nullable parameters is equal too.

The very attempt to truncate a value of the parameter of the aggregate function, like if it could alter nullability, is a misconception. I was there too. It takes time to get used what aggregates over mathematical sets really mean, and that you do not reading English when you are reading SQL.

Frankly, you could do just count (1) here removing not only the typecast but the column itself - it would still be the same, as those are rows, not values that are being asked to be counted. Unless there are rows where "WHEN" IS NULL - that would be accounted for by group by but not by count. After you read and think through the next section, come back and toy with http://sqlfiddle.com/#!9/ee09a/7

Now, there is another parameter to the count function that I kind of mentioned above. It is that "distinct" parameter.

Remark: One may say distinct is a keyword of the SQL language, not a parameter for a function, but de facto, not de jure, it changes the way the function works, so in my perception it is the parameter, in that unusual SQLish way the parameters are often given to functions. Or, another way to reason about this, it may be seen a part of the function name, if to imagine have we two functions to choose from, count and count-distinct.

So, the topic starter could had added count(distinct cast(r."WHEN" as date)) instead, and...

....and see that nothing changed. Because while this time he would really say to the server to count row with not-null (always only not-null!) and distinct values - the counting goes within the group.

And what those groups are?

group by
    r."WHEN",
    INOUT,
    u.USERNAME,
    r.UPDATEINOUT

See, across the groups we have rows with distinct time-and-date values of "WHEN". And other columns too, but I am not focusing on them. The thing here is that within every group both time and date parts of "WHEN" are kept the same. And "being the same" means "there is one distinct value, repeated again and again". And if there is only one distinct time-and-date value then the reduced values of only-time or only-date would have the same counts (reducing a value can only make to previously different values equal, but not make previously equal values now different).

Granted, in other situations, when the counting is going over columns not included in the group (or when there is not group by clause at all), the result may differ. There count(distinct ...) might do what the topic starter is expecting it to do.

http://sqlfiddle.com/#!9/0d65bf/7 - example.

However one has to keep in mind:

that happens at the expense of the extra sorting and grouping work done by a server, potentially making the query slow or consuming lot of memory
still it would work within the group (just groups then would start to contain different values for the counted columns).
....just sometimes the group is the whole query resultset (most simple variant - the whole table), if the programmer did not set it to be different by adding group by clause :-D

The code of the example, linked above:

create table X(a integer, b integer);

insert into X values (1,1);
insert into X values (1,2);
insert into X values (1,2);

commit;

select count(distinct b) from x group by a
-- Result: 1 row: 2

-- or if the whole table is the group
select count(distinct b) from x 
-- Result: 1 row: 2

-- but if the group includes the counted column
-- then every group would contain EXACTLY ONE
-- row with a not-null distinct value
select count(distinct b) from x group by b
-- Result: 2 rows: 1 and 1

Here we come to yet another modification to toy with.

group by
    cast(r."WHEN" as date),   -- <====
    INOUT,
    u.USERNAME,
    r.UPDATEINOUT

Now, this time we are telling the server to assemble the groups, in which only "date" part of "WHEN" is the same, while "time" part might differ.

However....

I think not every SQL server supports expressions in group by clause.
Unless there is an index made exactly by that expression, the server would have to do much extra work, probably resorting to natural scan and temporary files grouping, which would make the query heavy and slow.
And of course you would HAVE to modify the columns list accordingly.

SELECT 
    u.USERNAME,  
    cast(r."WHEN" as date),  -- <=== no more raw r."WHEN"
    r.UPDATEINOUT,

Because, well, you just cannot have "One True Value" for WHEN in the row, if you explicitly asked server to group different values of it.

Here you may come to the conclusion, that there is no simple straightforward way to have both groups and across-the-groups aggregates. Or to put it differently, to have two different sets of group by in one query (I do not want to mention UNION here, mkay?).

You would need ONE set of group by criteria for counting rows with the same date part (but maybe different time part), and yet ANOTHER group by criteria for picking and rendering groups different by both date-and-time.

In plain and simple SQL 1999 that means you would have to make TWO selects to make both groupings one way or another, and here Yogesh's and Gordon's answer kick in.

Like Gordon mentioned in post-99 SQL there came Window Functions which could let you have those criteria sets in one select, but they are not available in Firebird 2.x anyway.

回答3:

I think you need subquery :

SELECT  u.USERNAME, r."WHEN", r.UPDATEINOUT,
        case (r.UPDATEINOUT) when 0 then  0 when 1 then 1 else r.INOUT end INOUT,
        (SELECT COUNT(*) 
         FROM ATTENDANT r1 
         WHERE cast(r1."WHEN" as date)) = cast(r."WHEN" as date)
        ) as Count
FROM ATTENDANT r LEFT JOIN  
     USERS u 
     ON r.USERID = u.ID 
WHERE u.USERNAME = 'rk' AND 
    (r.UPDATEINOUT = 1 or r.UPDATEINOUT = 0 or r.UPDATEINOUT is null)
GROUP BY r."WHEN", INOUT, u.USERNAME, r.UPDATEINOUT
ORDER BY r."WHEN";

来源：https://stackoverflow.com/questions/51152947/count-equal-date-parts-of-the-timestamp-column-across-the-groups-by-the-said-col

标签

sql

count

firebird

firebird2.5