Tuples duplicate elimination from a list

Question

Consider the following list of tuples:

val input= List((A,B), 
                (C,B), 
                (B,A))

Assuming that the elements (A,B) and (B,A) are the same, and are therefore duplicates, what is an efficient way (preferably in Scala) to eliminate duplicates from the list above? That is, the desired output is another list:

val deduplicated= List((A,B), 
                       (C,B))

Thanks in advance!

P.S. This is not homework ;)

UPDATE:

Thanks to all! The "set"-solution seems to be the preferable one.

Answer 1:

You could try it with a set, but you need to declare your own tuple class to make it work.

case class MyTuple[A](t: (A, A)) {
  override def hashCode = t._1.hashCode + t._2.hashCode
  override def equals(other: Any) = other match {
    case MyTuple((a, b)) => a.equals(t._1) && b.equals(t._2) || a.equals(t._2) && b.equals(t._1)
    case _ => false
  }
}

val input= List(("A","B"), 
                ("C","B"), 
                ("B","A"))

val output = input.map(MyTuple.apply).toSet.toList.map((mt: MyTuple[String]) => mt.t)
println(output)

Edit: Travis's answer made me realise that there is a nicer way to do this: writing a distinctBy method that works analogously to sortBy.

implicit class extList[T](list: List[T]) {
  def distinctBy[U](f: T => U): List[T] = {
    var set = Set.empty[U]
    var result = List.empty[T]
    for(t <- list) {
      val u = f(t)
      if(!set(u)) {
        result ::= t
        set += u
      }
    }
    result.reverse
  }
}

println(input.distinctBy { case (a, b) => Set((a,b), (b,a)) })
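On Scala 2.13 and later, the standard collections already provide distinctBy, so the extension class may not be needed there. A minimal sketch, assuming Scala 2.13+ (the object name BuiltInDistinctBy and the helper dedupe are made up for illustration), which also uses the simpler key Set(a, b):

```scala
object BuiltInDistinctBy {
  // Assumption: Scala 2.13 or later, where distinctBy is a standard collection method.
  // dedupe is a hypothetical helper name, not part of any library.
  def dedupe[A](pairs: List[(A, A)]): List[(A, A)] =
    // Set(a, b) == Set(b, a), so the set of both elements is an order-insensitive key.
    // distinctBy keeps the first element seen for each key and preserves list order.
    pairs.distinctBy { case (a, b) => Set(a, b) }

  def main(args: Array[String]): Unit = {
    val input = List(("A", "B"), ("C", "B"), ("B", "A"))
    println(dedupe(input)) // List((A,B), (C,B))
  }
}
```

The original tuples are returned unchanged; only the key used for comparison ignores element order.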

Answer 2:

We can use a Set to keep track of elements that we have seen already, while using filter to eliminate duplicates:

def removeDuplicates[T](l: List[(T, T)]) = {
  val set = scala.collection.mutable.Set[(T, T)]()
  l.filter { case t@(x, y) =>
    if (set(t)) false else {
      set += t
      set += ((y, x))
      true
    }
  }
}

When we find a tuple we haven't seen before, we put both it and its swapped counterpart into the set.
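A possible variation on the same idea stores one order-insensitive key per pair instead of two tuples. This is only a sketch: the object name CanonicalKeyFilter is made up for illustration, and it relies on mutable.Set's add method returning true only when the element was not already present.

```scala
import scala.collection.mutable

object CanonicalKeyFilter {
  def removeDuplicates[T](l: List[(T, T)]): List[(T, T)] = {
    // One key per unordered pair: Set(x, y) == Set(y, x).
    val seen = mutable.HashSet.empty[Set[T]]
    // add returns true for a new key (keep the tuple) and false for a repeat (drop it).
    l.filter { case (x, y) => seen.add(Set(x, y)) }
  }

  def main(args: Array[String]): Unit = {
    val input = List(("A", "B"), ("C", "B"), ("B", "A"))
    println(removeDuplicates(input)) // List((A,B), (C,B))
  }
}
```

As with the filter above, the first occurrence of each pair is kept and the original order is preserved.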

Answer 3:

Along the same lines as SpiderPig's answer, here's a solution that makes no use of sets (since going through a set doesn't preserve the order of the original list, which could be an annoyance).

case class MyPimpedTuple(t: Tuple2[String, String]) {
  override def hashCode = t._1.hashCode + t._2.hashCode
  override def equals(other: Any) = other match {
      case MyPimpedTuple((a, b)) => a.equals(t._1) && b.equals(t._2) || a.equals(t._2) && b.equals(t._1)
      case _ => false
  }
}

// Keep the input as plain tuples; the wrapping into MyPimpedTuple happens in the map below.
val input = List(("A","B"), ("C","B"), ("B","A"))

input.map(MyPimpedTuple(_)).distinct.map(_.t)

Example

val input = List(("A","B"), ("C","B"),("B","A"))
//> input: List[(String, String)] = List((A,B), (C,B), (B,A))

val distinctTuples = input.map(MyPimpedTuple(_)).distinct.map(_.t)
//> distinctTuples: List[(String, String)] = List((A,B), (C,B))

Answer 4:

For the sake of completeness, it's possible to do this very simply in a purely functional way with a fold (manually defining equality makes me nervous and I'm not sure mutability buys you much here):

def distinctPairs[A](xs: List[(A, A)]) = xs.foldLeft(List.empty[(A, A)]) {
  case (acc, (a, b)) if acc.contains((a, b)) || acc.contains((b, a)) => acc
  case (acc, p) => acc :+ p
}

This isn't very efficient, since it's searching the list twice for each item (and appending to the list), but that's not too hard to fix:

def distinctPairs[A](xs: List[(A, A)]) = xs.foldLeft(
  (List.empty[(A, A)], Set.empty[(A, A)])
) {
  case (current @ (_, seen), p) if seen(p) => current
  case ((acc, seen), p @ (a, b)) => (p :: acc, seen ++ Set((a, b), (b, a)))
}._1.reverse

Both of these implementations maintain order.

Answer 5:

Consider also relying on the uniqueness of keys in a Map, where each key is the set of a tuple's elements:

def uniq[A](a: List[(A,A)]) = a.map( t => Set(t._1,t._2) -> t ).toMap.values

Not the most efficient, yet simple enough; valid for small collections.
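Note that toMap keeps the last tuple seen for each key, and values returns an Iterable rather than a List. If the first occurrence and the original order matter, one option is an insertion-ordered map. A sketch under that assumption (the names UniqByKey and uniqOrdered are hypothetical):

```scala
import scala.collection.mutable.LinkedHashMap

object UniqByKey {
  def uniqOrdered[A](pairs: List[(A, A)]): List[(A, A)] = {
    // LinkedHashMap iterates in insertion order of its keys.
    val firstSeen = LinkedHashMap.empty[Set[A], (A, A)]
    // getOrElseUpdate only inserts when the key is absent, so the first tuple wins.
    for (p <- pairs) firstSeen.getOrElseUpdate(Set(p._1, p._2), p)
    firstSeen.values.toList
  }

  def main(args: Array[String]): Unit = {
    val input = List(("A", "B"), ("C", "B"), ("B", "A"))
    println(uniqOrdered(input)) // List((A,B), (C,B))
  }
}
```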

Answer 6:

Yes, I would also suggest a set as the target data structure, because a set lookup could be more efficient than two for loops. (Sorry, I am a Clojure guy, and surely this is not the shortest version in Clojure...)

(def data `(("A" "B") ("B" "C") ("B" "A")))
;;(def data `(("A" "B") ("B" "C") ("B" "A") ("C" "D") ("C" "B") ("D" "F")))

(defn eliminator [source]
 (println "Crunching: " source)
  (loop [s source t '#{}]
    (if (empty? s) (reverse t) ;; end
      (if (contains? t (list (last (first s)) (first (first s)))) ; the swapped pair is already in the set
        (recur (rest s) t) ; next iteration
        (recur (rest s) (conj t (first s))))))) ;; add it

Source: https://stackoverflow.com/questions/24851249/tuples-duplicate-elimination-from-a-list

Tags

scala

duplicate-removal

performance