What is the difference between UNION and UNION ALL?
Here is an example.
UNION merges the results and removes duplicates, which is slower because the rows have to be compared (in Oracle SQL Developer, select the query and press F10 to see the cost analysis).
UNION ALL merges the results without removing duplicates, which is faster.
-- Returns one row: the duplicate is removed
SELECT TRUNC(sysdate) FROM dual
UNION
SELECT TRUNC(sysdate) FROM dual;
and
-- Returns two rows: the duplicate is kept
SELECT TRUNC(sysdate) FROM dual
UNION ALL
SELECT TRUNC(sysdate) FROM dual;
UNION returns distinct records, while UNION ALL returns all the records, including duplicates.
UNION is typically a blocking operator, since it has to see every row before it can remove duplicates, so I personally prefer JOINs over set operators (UNION, INTERSECT, etc.) whenever a join can produce the same result.
To illustrate why UNION performs poorly in comparison to UNION ALL, check out the following example.
CREATE TABLE #T1 (data VARCHAR(10))
INSERT INTO #T1
SELECT 'abc'
UNION ALL
SELECT 'bcd'
UNION ALL
SELECT 'cde'
UNION ALL
SELECT 'def'
UNION ALL
SELECT 'efg'
CREATE TABLE #T2 (data VARCHAR(10))
INSERT INTO #T2
SELECT 'abc'
UNION ALL
SELECT 'cde'
UNION ALL
SELECT 'efg'
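Following are the results of the UNION ALL and UNION operations on #T1 and #T2: UNION ALL returns all 8 rows, with 'abc', 'cde' and 'efg' each appearing twice, while UNION returns the 5 distinct values.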
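To reproduce the comparison without a SQL Server instance, here is a minimal sketch using Python's built-in sqlite3 module. SQLite and the table names t1 and t2 are assumptions standing in for the temp tables #T1 and #T2; the data is the same as in the example above.

```python
import sqlite3

# In-memory SQLite stand-in (an assumption) for the temp tables #T1 and #T2.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (data VARCHAR(10))")
conn.execute("CREATE TABLE t2 (data VARCHAR(10))")
conn.executemany("INSERT INTO t1 VALUES (?)",
                 [("abc",), ("bcd",), ("cde",), ("def",), ("efg",)])
conn.executemany("INSERT INTO t2 VALUES (?)",
                 [("abc",), ("cde",), ("efg",)])

# UNION ALL keeps every row from both inputs, duplicates included.
union_all_rows = conn.execute(
    "SELECT data FROM t1 UNION ALL SELECT data FROM t2").fetchall()

# UNION returns each distinct value only once.
union_rows = conn.execute(
    "SELECT data FROM t1 UNION SELECT data FROM t2").fetchall()

print(len(union_all_rows))  # 8
print(len(union_rows))      # 5
```

The three values present in both tables account for the difference of three rows.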
A UNION statement effectively does a SELECT DISTINCT on the result set. If you know that all the records returned by your union are unique, use UNION ALL instead; it gives faster results.
Using UNION results in a Distinct Sort operation in the execution plan, as comparing the plans for the two queries shows.
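If you want to look at plans yourself and only have SQLite at hand, something like the sketch below may help. It is not the SQL Server plan discussed above: SQLite, the tables t1 and t2, and the helper plan() are all assumptions for illustration, and the exact plan wording varies between SQLite versions. On the builds I know of, the UNION plan mentions a temporary B-tree used for duplicate removal, while the UNION ALL plan does not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (data TEXT)")  # hypothetical tables
conn.execute("CREATE TABLE t2 (data TEXT)")

def plan(sql):
    """Return the human-readable 'detail' column of SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[-1] for row in rows]  # detail is the last column

union_plan = plan("SELECT data FROM t1 UNION SELECT data FROM t2")
union_all_plan = plan("SELECT data FROM t1 UNION ALL SELECT data FROM t2")

# Print both plans side by side; wording depends on the SQLite version.
print("UNION:    ", union_plan)
print("UNION ALL:", union_all_plan)
```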
UNION removes duplicate records; UNION ALL, on the other hand, does not. However, one needs to consider the volume of data that is going to be processed, and the number of columns and their data types must be the same in both queries.
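A quick sketch of the column requirement, using Python's sqlite3 module with two hypothetical tables a and b (both names are assumptions). Note that SQLite only enforces the column count; it is dynamically typed, so the data type check that stricter databases perform is not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (id INTEGER, name TEXT)")  # hypothetical tables
conn.execute("CREATE TABLE b (id INTEGER)")

# Two columns on the left, one on the right: the statement is rejected.
try:
    conn.execute("SELECT id, name FROM a UNION ALL SELECT id FROM b")
    mismatch_error = None
except sqlite3.OperationalError as exc:
    mismatch_error = str(exc)

print(mismatch_error)

# With one column on each side the same statement is accepted.
matched = conn.execute(
    "SELECT id FROM a UNION ALL SELECT id FROM b").fetchall()
```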
Since UNION internally applies DISTINCT behavior when selecting the rows, it is more costly in terms of time and performance. For example:
select project_id from t_project
union
select project_id from t_project_contact
This gives me 2020 records.
On the other hand,
select project_id from t_project
union all
select project_id from t_project_contact
gives me more than 17402 rows.
In terms of precedence, both operators have the same precedence.
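Equal precedence means that mixed chains are evaluated left to right, so the order of the operators matters. A small sketch in Python with SQLite (an assumption; other databases may differ, so check yours):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Grouped as (1 UNION ALL 1) UNION 1: the trailing UNION
# deduplicates everything collected so far.
all_then_union = conn.execute(
    "SELECT 1 UNION ALL SELECT 1 UNION SELECT 1").fetchall()

# Grouped as (1 UNION 1) UNION ALL 1: the trailing UNION ALL
# appends one more row after the deduplication.
union_then_all = conn.execute(
    "SELECT 1 UNION SELECT 1 UNION ALL SELECT 1").fetchall()

print(all_then_union)  # [(1,)]
print(union_then_all)  # [(1,), (1,)]
```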
As a habit, always use UNION ALL. Use UNION only in the special cases where you need to eliminate duplicates, which can be extremely messy; you can read all about that in the other comments here.
UNION removes duplicates, whereas UNION ALL does not.
In order to remove duplicates the result set must be sorted, and this may have an impact on the performance of the UNION, depending on the volume of data being sorted and the settings of various RDBMS parameters (for Oracle, PGA_AGGREGATE_TARGET with WORKAREA_SIZE_POLICY=AUTO, or SORT_AREA_SIZE and SORT_AREA_RETAINED_SIZE if WORKAREA_SIZE_POLICY=MANUAL).
Basically, the sort is faster if it can be carried out in memory, but the same caveat about the volume of data applies.
Of course, if you need data returned without duplicates then you must use UNION, depending on the source of your data.
I would have commented on the first post to qualify the "is much less performant" comment, but have insufficient reputation (points) to do so.
If there is no ORDER BY, a UNION ALL may bring rows back as it goes, whereas a UNION would make you wait until the very end of the query before giving you the whole result set at once. This can make a difference in a time-out situation - a UNION ALL keeps the connection alive, as it were.
So if you have a time-out issue, and there's no sorting, and duplicates aren't an issue, UNION ALL may be rather helpful.
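The streaming difference can be pictured with plain Python generators. This is only an analogy, not how any particular database implements it: source(), union_all() and union_sorted() are hypothetical helpers, and the sort-based duplicate removal is an assumption about the strategy used.

```python
from itertools import chain

reads = {"count": 0}  # number of source rows read so far

def source(rows):
    """Yield rows one at a time, counting each read like a table scan."""
    for row in rows:
        reads["count"] += 1
        yield row

def union_all(left, right):
    """Stream rows through unchanged; nothing has to be buffered."""
    return chain(left, right)

def union_sorted(left, right):
    """Sort-based duplicate removal: all input is read before any output."""
    previous = object()  # sentinel that equals no real row
    for row in sorted(chain(left, right)):
        if row != previous:
            yield row
        previous = row

reads["count"] = 0
first_all = next(union_all(source(["abc", "bcd"]), source(["abc"])))
read_before_first_all = reads["count"]

reads["count"] = 0
first_union = next(union_sorted(source(["abc", "bcd"]), source(["abc"])))
read_before_first_union = reads["count"]

print(read_before_first_all)    # 1
print(read_before_first_union)  # 3
```

The first row of the UNION ALL is available after a single read, while the sort-based UNION has to read all three input rows before it can return anything.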