Remove unique elements only

Question

There are many resources on how to remove duplicates and similar problems, but I can't find any on removing unique elements. I'm using SWI-Prolog, but I don't want to use built-ins to achieve this.

That is, calling remove_unique([1, 2, 2, 3, 4, 5, 7, 6, 7], X). should happily result in X = [2, 2, 7, 7].

The obvious solution is something along the lines of:

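% count(E, Xs, A): A is the number of occurrences of E in the list Xs.
count(_, [], 0) :- !.
count(E, [E | Es], A) :- !,   % head matches E: commit, so the match is never skipped
  count(E, Es, A0),
  A is A0 + 1.
count(E, [_ | Es], A) :-      % head differs from E: count in the tail only
  count(E, Es, A).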

is_unique(E, Xs) :-
  count(E, Xs, 1).

remove_unique(L, R) :- remove_unique(L, L, R).
remove_unique([], _, []) :- !.
remove_unique([X | Xs], O, R) :-
  is_unique(X, O), !,
  remove_unique(Xs, O, R).
remove_unique([X | Xs], O, [X | R]) :-
  remove_unique(Xs, O, R).

It should quickly become apparent why this isn't an ideal solution: count is O(n), and so is is_unique, as it just uses count. I could improve this by failing as soon as we find a second occurrence, but the worst case is still O(n).

So then we come to remove_unique. For every element we check whether the current element is_unique in O. If the test fails, the element gets added to the resulting list in the next clause. Since this runs in O(n²), we get a lot of inferences. While I don't think we can speed it up in the worst case, can we do better than this naïve solution? The only improvement that I can clearly see is to change count to something that fails as soon as more than one occurrence is found.
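A minimal sketch of the early-failing check described above, assuming the list is ground. The names occurs_once/2 and absent/2 are hypothetical and not part of the original code; they could stand in for is_unique/2. The overall worst case stays at O(n²), since a unique element still forces a full scan.

```prolog
% occurs_once(E, Xs): E occurs exactly once in the ground list Xs.
% Fails on the empty list, because then E does not occur at all.
occurs_once(E, [X|Xs]) :-
    (   E == X
    ->  absent(E, Xs)        % first occurrence found: no second one is allowed
    ;   occurs_once(E, Xs)   % keep looking for the first occurrence
    ).

% absent(E, Xs): E does not occur in Xs; fails at the first occurrence.
absent(_, []).
absent(E, [X|Xs]) :-
    E \== X,
    absent(E, Xs).
```

With this in place, is_unique(E, Xs) could simply call occurs_once(E, Xs), which stops scanning as soon as a second occurrence shows up.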

Answer 1:

Using tpartition/4 in tandem with if_/3 and (=)/3, we define remove_unique/2 like this:

remove_unique([], []).
remove_unique([E|Xs0], Ys0) :-
   tpartition(=(E), Xs0, Es, Xs),
   if_(Es = [], Ys0 = Ys, append([E|Es], Ys, Ys0)),
   remove_unique(Xs, Ys).

Here's the sample query, as given by the OP:

?- remove_unique([1,2,2,3,4,5,7,6,7], Xs). 
Xs = [2,2,7,7].                       % succeeds deterministically
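The predicates tpartition/4, if_/3 and (=)/3 are not part of plain SWI-Prolog; they are usually obtained from library(reif). For readers who would rather see what they do, here is a simplified sketch of the three, written from their commonly published definitions. It is an assumption that these simplified versions are enough for the query above, and they are not a replacement for the library. dif/2 is the SWI-Prolog built-in.

```prolog
% if_(If_1, Then_0, Else_0): call If_1 with an extra truth-value argument
% and branch on the result, without committing on non-ground terms.
if_(If_1, Then_0, Else_0) :-
    call(If_1, T),
    (   T == true  -> call(Then_0)
    ;   T == false -> call(Else_0)
    ;   throw(error(type_error(boolean, T), _))
    ).

% =(X, Y, T): reified equality. T is true if X and Y are equal, false if
% they differ; for non-ground terms both cases are offered on backtracking.
=(X, Y, T) :-
    (   X == Y -> T = true
    ;   X \= Y -> T = false
    ;   T = true, X = Y
    ;   T = false, dif(X, Y)
    ).

% tpartition(P_2, Xs, Ts, Fs): Ts are the elements of Xs for which the
% reified predicate P_2 yields true, Fs those for which it yields false.
tpartition(P_2, Xs, Ts, Fs) :-
    tpartition_(Xs, P_2, Ts, Fs).

tpartition_([], _, [], []).
tpartition_([X|Xs], P_2, Ts0, Fs0) :-
    if_(call(P_2, X),
        ( Ts0 = [X|Ts], Fs0 = Fs ),
        ( Ts0 = Ts, Fs0 = [X|Fs] )),
    tpartition_(Xs, P_2, Ts, Fs).
```

Note that remove_unique/2 as defined above groups all occurrences of an element at the position of its first occurrence, so the relative order of different repeated elements can change for inputs such as [2,7,2,7].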

Answer 2:

As long as you don't know that the list is sorted in any way, and you want to keep the sequence of the non-unique elements, it seems to me you can't avoid making two passes: first count occurrences, then pick only repeating elements.

What if you use a (self-balancing?) binary tree for counting occurrences and look-up during the second pass? Definitely not O(n²), at least...
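A rough sketch of that two-pass idea, assuming a ground list and using the AVL trees from SWI-Prolog's library(assoc) as the balanced tree; a hand-written tree would serve equally well if library predicates are off limits. The names count_all/3, keep_repeated/3 and remove_unique_assoc/2 are hypothetical. Each lookup and update is O(log n), so the whole thing should run in O(n log n).

```prolog
:- use_module(library(assoc)).

% First pass: record the number of occurrences of every element.
count_all([], Counts, Counts).
count_all([X|Xs], Counts0, Counts) :-
    (   get_assoc(X, Counts0, N0)
    ->  N is N0 + 1
    ;   N = 1
    ),
    put_assoc(X, Counts0, N, Counts1),
    count_all(Xs, Counts1, Counts).

% Second pass: keep only elements seen more than once, in original order.
keep_repeated([], _, []).
keep_repeated([X|Xs], Counts, Ys) :-
    get_assoc(X, Counts, N),
    (   N > 1
    ->  Ys = [X|Ys1]
    ;   Ys = Ys1
    ),
    keep_repeated(Xs, Counts, Ys1).

remove_unique_assoc(Xs, Ys) :-
    empty_assoc(Counts0),
    count_all(Xs, Counts0, Counts),
    keep_repeated(Xs, Counts, Ys).
```

Unlike a partition-based approach, this keeps every repeated element at its original position, so [2,7,2,7] comes back unchanged.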

Source: https://stackoverflow.com/questions/15990666/remove-unique-elements-only

Tags

list

prolog

prolog-dif