Question
I have this table named hr_holidays_by_calendar. I just want to filter out the rows where the same employee has two leaves on the same day.
Table hr_holidays_by_calendar: (table contents not included)
Query I tried (it wasn't anywhere near solving this):
select hol1.employee_id, hol1.leave_date, hol1.no_of_days, hol1.leave_state
from hr_holidays_by_calendar hol1
inner join
    (select employee_id, leave_date
     from hr_holidays_by_calendar hol1
     group by employee_id, leave_date
     having count(*)>1) sub
    on hol1.employee_id=sub.employee_id and hol1.leave_date=sub.leave_date
where hol1.leave_state != 'refuse'
order by hol1.employee_id, hol1.leave_date
Answer 1:
This returns all rows where a duplicate exists:
SELECT employee_id, leave_date, no_of_days, leave_state
FROM hr_holidays_by_calendar h
WHERE EXISTS (
    SELECT -- select list can be empty for this
    FROM hr_holidays_by_calendar
    WHERE employee_id = h.employee_id
    AND leave_date = h.leave_date
    AND leave_state <> 'refuse'
    AND ctid <> h.ctid
    )
AND leave_state <> 'refuse'
ORDER BY employee_id, leave_date;
It's unclear where leave_state <> 'refuse' should be applied. You would have to define requirements. My example ignores rows with leave_state = 'refuse' (and leave_state IS NULL with it!) completely.
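The remark about NULL can be checked with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL (an assumption made only so the example runs anywhere; both engines evaluate NULL <> 'refuse' to NULL). The column types, the sample rows, and the 'validate' state are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hr_holidays_by_calendar ("
    "employee_id INTEGER, leave_date TEXT, no_of_days REAL, leave_state TEXT)"
)
# Hypothetical sample rows: one accepted, one refused, one with NULL state.
conn.executemany(
    "INSERT INTO hr_holidays_by_calendar VALUES (?, ?, ?, ?)",
    [
        (1, "2018-10-01", 1.0, "validate"),
        (1, "2018-10-02", 1.0, "refuse"),
        (1, "2018-10-03", 1.0, None),
    ],
)


def states(where_clause):
    """Return the leave_state values that survive the given WHERE clause."""
    sql = (
        "SELECT leave_state FROM hr_holidays_by_calendar "
        f"WHERE {where_clause} ORDER BY leave_date"
    )
    return [row[0] for row in conn.execute(sql)]


# NULL <> 'refuse' evaluates to NULL, so the NULL row is dropped as well.
plain = states("leave_state <> 'refuse'")

# Spelling out the NULL case keeps that row.
with_null = states("(leave_state <> 'refuse' OR leave_state IS NULL)")

print(plain)      # ['validate']
print(with_null)  # ['validate', None]
```

In PostgreSQL itself, leave_state IS DISTINCT FROM 'refuse' is a shorter way to write the second condition.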
ctid is a poor man's surrogate for your undeclared (undefined?) primary key.
Related:
- How do I (or can I) SELECT DISTINCT on multiple columns?
- What is easier to read in EXISTS subqueries?
Answer 2:
I assume you just need to reverse your logic. You could use NOT EXISTS:
select h1.employee_id, h1.leave_date, h1.no_of_days, h1.leave_state
from hr_holidays_by_calendar h1
where
    h1.leave_state <> 'refuse'
    and not exists (
        select 1
        from hr_holidays_by_calendar h2
        where
            h1.employee_id = h2.employee_id
            and h1.leave_date = h2.leave_date
        group by employee_id, leave_date
        having count(*) > 1
    )
This will discard every (employee, date) pair where they have more than one row (leave on the same day).
I did not take the number of days into account, since that seems to be wrong anyway: you can't have a leave twice on the same day that lasts for a different number of days. If your application allows it, consider applying additional logic. Also, you shouldn't let these records get into the table in the first place :-)
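One possible way to keep such records out of the table is a partial unique index on (employee_id, leave_date) that skips refused leaves. The sketch below is not from the original answer. It runs against SQLite through Python's sqlite3 module as a stand-in for PostgreSQL, which accepts the same CREATE UNIQUE INDEX ... WHERE form. The index name, column types, and sample states are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hr_holidays_by_calendar ("
    "employee_id INTEGER, leave_date TEXT, no_of_days REAL, leave_state TEXT)"
)
# Hypothetical index name. Only rows that are not refused are indexed,
# so only those rows take part in the uniqueness check.
conn.execute(
    "CREATE UNIQUE INDEX hr_holidays_one_leave_per_day "
    "ON hr_holidays_by_calendar (employee_id, leave_date) "
    "WHERE leave_state <> 'refuse'"
)

insert = "INSERT INTO hr_holidays_by_calendar VALUES (?, ?, ?, ?)"
conn.execute(insert, (1, "2018-10-01", 1.0, "validate"))
# Allowed: a refused leave on the same day is not covered by the index.
conn.execute(insert, (1, "2018-10-01", 1.0, "refuse"))

try:
    # Second non-refused leave for the same employee and day.
    conn.execute(insert, (1, "2018-10-01", 0.5, "confirm"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute(
    "SELECT count(*) FROM hr_holidays_by_calendar"
).fetchone()[0]
print(duplicate_rejected, row_count)  # True 2
```

Rows with a NULL leave_state are not covered by this predicate either, so whether they should count as active leaves would have to be decided separately.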
Answer 3:
I believe that a simple GROUP BY can do the job for you:
select hol1.employee_id, hol1.leave_date, max(hol1.no_of_days)
from hr_holidays_by_calendar hol1
where hol1.leave_state != 'refuse'
group by hol1.employee_id, hol1.leave_date
It is not clear what should happen if two rows have different no_of_days.
Answer 4:
If you want the complete rows, one method uses window functions:
select hc.*
from (select hc.*, count(*) over (partition by employee_id, leave_date) as cnt
      from hr_holidays_by_calendar hc
     ) hc
where cnt >= 2;
Aggregation is appropriate if you just want the employee id and dates.
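A minimal sketch of that aggregation variant, returning only the employee id, the date, and the number of rows for each duplicated pair. It is run through Python's sqlite3 module as a stand-in for PostgreSQL. The column types and sample rows are hypothetical, and like the window-function query it does not filter on leave_state.

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hr_holidays_by_calendar ("
    "employee_id INTEGER, leave_date TEXT, no_of_days REAL, leave_state TEXT)"
)
# Hypothetical sample rows: employee 1 has two leaves on 2018-10-01.
conn.executemany(
    "INSERT INTO hr_holidays_by_calendar VALUES (?, ?, ?, ?)",
    [
        (1, "2018-10-01", 1.0, "validate"),
        (1, "2018-10-01", 0.5, "confirm"),
        (1, "2018-10-02", 1.0, "validate"),
        (2, "2018-10-01", 1.0, "validate"),
    ],
)

# One output row per (employee_id, leave_date) pair that occurs at least twice.
duplicates = conn.execute(
    "SELECT employee_id, leave_date, count(*) AS cnt "
    "FROM hr_holidays_by_calendar "
    "GROUP BY employee_id, leave_date "
    "HAVING count(*) >= 2 "
    "ORDER BY employee_id, leave_date"
).fetchall()

print(duplicates)  # [(1, '2018-10-01', 2)]
```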
Source: https://stackoverflow.com/questions/52659545/postgresql-group-by-for-multiple-lines