How can I pick only the non-matching elements between two arrays?
Example:
base_array [12,3,5,7,8]
temp_array [3,7,8]
So here I want the elements of base_array that are not in temp_array, i.e. 12 and 5.
The contrib/intarray module provides this functionality (its - operator removes the right-hand array's elements from the left-hand array), but only for arrays of integers. For other data types, you may have to write your own functions (or modify the ones provided with intarray).
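A minimal sketch of the intarray route, assuming PostgreSQL 9.1 or later (where contrib modules are installed with CREATE EXTENSION) and that the intarray module is available on the server. The int[] - int[] operator only accepts one-dimensional integer arrays without NULLs.

```sql
-- Assumes the intarray contrib module is installed on the server.
CREATE EXTENSION IF NOT EXISTS intarray;

-- int[] - int[]: drop every element of the right array from the left array.
-- Inputs must be integer arrays that contain no NULLs.
SELECT ARRAY[12,3,5,7,8] - ARRAY[3,7,8] AS diff;
```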
I'd approach this with the ARRAY() constructor and EXCEPT:
select array(select unnest(:arr1) except select unnest(:arr2));
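The same query with the question's literal arrays in place of the psql variables :arr1 and :arr2. EXCEPT makes no promise about row order, so this sketch adds ORDER BY 1 inside the ARRAY() constructor to get a deterministic, sorted result.

```sql
-- Elements of the first array that do not appear in the second.
-- ORDER BY 1 applies to the whole EXCEPT result and sorts it ascending.
SELECT ARRAY(
  SELECT unnest(ARRAY[12,3,5,7,8])
  EXCEPT
  SELECT unnest(ARRAY[3,7,8])
  ORDER BY 1
) AS diff;  -- {5,12}
```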
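If every element of :arr1 also appears in :arr2 (so the difference is empty), using array_agg() instead of the ARRAY() constructor returns NULL rather than an empty array.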
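A small illustration of that edge case, using made-up inputs where the first array is a subset of the second: the ARRAY() constructor yields an empty array, array_agg() yields NULL, and coalesce() is one way to turn that NULL back into '{}'.

```sql
-- Hypothetical inputs: {3,7} is contained in {3,7,8}, so the difference is empty.
SELECT
  ARRAY(SELECT unnest(ARRAY[3,7]) EXCEPT SELECT unnest(ARRAY[3,7,8])) AS via_constructor,  -- {}
  (SELECT array_agg(e)
     FROM (SELECT unnest(ARRAY[3,7]) EXCEPT SELECT unnest(ARRAY[3,7,8])) t(e)) AS via_array_agg,  -- NULL
  (SELECT coalesce(array_agg(e), '{}')
     FROM (SELECT unnest(ARRAY[3,7]) EXCEPT SELECT unnest(ARRAY[3,7,8])) t(e)) AS via_array_agg_coalesced;  -- {}
```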
I've constructed a set of functions to deal specifically with these types of issues: https://github.com/JDBurnZ/anyarray
The best part is that these functions work across ALL data types, not JUST integers, which is all intarray supports.
After loading the functions defined in those SQL files from GitHub, all you'd need to do is:
SELECT
ANYARRAY_DIFF(
ARRAY[12, 3, 5, 7, 8],
ARRAY[3, 7, 8]
)
Returns something similar to: ARRAY[12, 5]
If you also need to return the values sorted:
SELECT
ANYARRAY_SORT(
ANYARRAY_DIFF(
ARRAY[12, 3, 5, 7, 8],
ARRAY[3, 7, 8]
)
)
Returns exactly: ARRAY[5, 12]
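If loading an external library isn't an option, the unnest()/EXCEPT approach from the earlier answer can itself be wrapped in a polymorphic SQL function, since EXCEPT works for any element type that supports equality comparison. This is a minimal sketch assuming PostgreSQL 9.2 or later (for named parameters in SQL functions); array_except is a hypothetical name, not part of PostgreSQL or of the anyarray project, and the ORDER BY 1 assumes the element type is sortable.

```sql
-- Hypothetical helper: elements of a that are not in b, de-duplicated and sorted.
-- Declaring the parameters as anyarray lets it accept int[], text[], date[], etc.
CREATE OR REPLACE FUNCTION array_except(a anyarray, b anyarray)
RETURNS anyarray
LANGUAGE sql IMMUTABLE
AS $$
  SELECT ARRAY(
    SELECT unnest(a)
    EXCEPT
    SELECT unnest(b)
    ORDER BY 1
  )
$$;

SELECT array_except(ARRAY[12, 3, 5, 7, 8], ARRAY[3, 7, 8]);  -- {5,12}
SELECT array_except(ARRAY['a', 'b', 'c'], ARRAY['b']);        -- {a,c}
```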
An extension to Denis' answer that returns the symmetric difference (elements that appear in exactly one of the two arrays), regardless of which array is given first. It's not the most concise query; maybe someone has a tidier way.
select array_cat(
(select array(select unnest(a.b::int[]) except select unnest(a.c::int[]))),
(select array(select unnest(a.c::int[]) except select unnest(a.b::int[]))))
from (select '{1,2}'::int[] b,'{1,3}'::int[] c) as a;
Returns:
{2,3}
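One possibly tidier formulation, sketched with the same sample data: the symmetric difference is the union of both arrays minus their intersection, which fits in a single ARRAY() constructor. The ORDER BY 1 is an addition here to make the output order deterministic.

```sql
-- (b UNION c) EXCEPT (b INTERSECT c): values present in exactly one array.
SELECT ARRAY(
  (SELECT unnest(b) UNION SELECT unnest(c))
  EXCEPT
  (SELECT unnest(b) INTERSECT SELECT unnest(c))
  ORDER BY 1
) AS sym_diff  -- {2,3}
FROM (SELECT '{1,2}'::int[] AS b, '{1,3}'::int[] AS c) AS a;
```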
select array_agg(elements)
from (
select unnest(array[12,3,5,7,8])
except
select unnest(array[3,7,8])
) t (elements)
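One property of these EXCEPT-based queries worth knowing: EXCEPT removes duplicates, so a value repeated in the first array comes back only once. If repeats should survive, EXCEPT ALL is the multiset version. A sketch with hypothetical input arrays, using array_agg(... ORDER BY ...) so the output order is fixed:

```sql
-- {1,1,2,3} minus {3}: EXCEPT collapses the two 1s, EXCEPT ALL keeps both.
SELECT
  (SELECT array_agg(e ORDER BY e)
     FROM (SELECT unnest(ARRAY[1,1,2,3]) EXCEPT SELECT unnest(ARRAY[3])) t(e)) AS except_distinct,  -- {1,2}
  (SELECT array_agg(e ORDER BY e)
     FROM (SELECT unnest(ARRAY[1,1,2,3]) EXCEPT ALL SELECT unnest(ARRAY[3])) t(e)) AS except_all;    -- {1,1,2}
```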
Let's benchmark the unnest() / EXCEPT approach:
EXPLAIN ANALYZE SELECT array(select unnest(ARRAY[1,2,3,n]) EXCEPT SELECT unnest(ARRAY[2,3,4,n])) FROM generate_series( 1,10000 ) n;
Function Scan on generate_series n (cost=0.00..62.50 rows=1000 width=4) (actual time=1.373..140.969 rows=10000 loops=1)
SubPlan 1
-> HashSetOp Except (cost=0.00..0.05 rows=1 width=0) (actual time=0.011..0.011 rows=1 loops=10000)
-> Append (cost=0.00..0.04 rows=2 width=0) (actual time=0.002..0.008 rows=8 loops=10000)
-> Subquery Scan "*SELECT* 1" (cost=0.00..0.02 rows=1 width=0) (actual time=0.002..0.003 rows=4 loops=10000)
-> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.002 rows=4 loops=10000)
-> Subquery Scan "*SELECT* 2" (cost=0.00..0.02 rows=1 width=0) (actual time=0.001..0.003 rows=4 loops=10000)
-> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.002 rows=4 loops=10000)
Total runtime: 142.531 ms
And the intarray-specific - operator:
EXPLAIN ANALYZE SELECT ARRAY[1,2,3,n] - ARRAY[2,3,4,n] FROM generate_series( 1,10000 ) n;
Function Scan on generate_series n (cost=0.00..15.00 rows=1000 width=4) (actual time=1.338..11.381 rows=10000 loops=1)
Total runtime: 12.306 ms
Baseline (building the same arrays without computing a difference):
EXPLAIN ANALYZE SELECT ARRAY[1,2,3,n], ARRAY[2,3,4,n] FROM generate_series( 1,10000 ) n;
Function Scan on generate_series n (cost=0.00..12.50 rows=1000 width=4) (actual time=1.357..7.139 rows=10000 loops=1)
Total runtime: 8.071 ms
Time per array difference (total runtime minus baseline, divided by the 10,000 rows):
intarray -: 0.4 µs
unnest() / EXCEPT: 13.4 µs
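The per-call figures come from subtracting the baseline runtime from each total and dividing by the 10,000 rows produced by generate_series:

$$
\begin{aligned}
t_{\text{intarray}} &= \frac{12.306\ \text{ms} - 8.071\ \text{ms}}{10\,000} \approx 0.42\ \mu\text{s} \\
t_{\text{unnest/EXCEPT}} &= \frac{142.531\ \text{ms} - 8.071\ \text{ms}}{10\,000} \approx 13.4\ \mu\text{s}
\end{aligned}
$$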
Of course the intarray operator is much faster (roughly 30x), but I find it amazing that Postgres can zap through a correlated subquery (which involves a hash set operation and more) in 13.4 µs.