Efficient way to convert Scala Array to Unique Sorted List

长发绾君心 2021-01-04 15:00

Can anybody optimize the following statement in Scala:

// maybe large
val someArray = Array(9, 1, 6, 2, 1, 9, 4, 5, 1, 6, 5, 0, 6) 

// output a sorted list whic         


        
7 answers
  • 2021-01-04 15:34

    How about adding everything to a sorted set?

    val a = scala.collection.immutable.SortedSet(someArray.filter(_ != 0): _*)
    

    Of course, you should benchmark the code to check what is faster, and, more importantly, that this is truly a hot spot.

  • 2021-01-04 15:42

    For efficiency, depending on your value of large:

    val a = someArray.toSet.filter(_ > 0).toArray
    java.util.Arrays.sort(a) // quicksort, in place; mutable data structures bad :-)
    // a is now Array(1, 2, 4, 5, 6, 9)
    

    Note that this does the sort using qsort on an unboxed array.

  • 2021-01-04 15:45

    I'm not in a position to measure, but some more suggestions...

    Sorting the array in place before converting to a list might well be more efficient, and you might look at removing dups from the sorted list manually, as they will be grouped together. The cost of removing 0's before or after the sort will also depend on their ratio to the other entries.
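
    A minimal sketch of that idea, not measured either. The object and method names are made up for illustration, and it sorts a copy so that the caller's array stays untouched (drop the clone() to sort truly in place):

    ```scala
    object SortThenDedupSketch {
      // Hypothetical helper: sort first, then drop zeros and duplicates in a single pass.
      def sortedUnique(input: Array[Int]): List[Int] = {
        val a = input.clone()        // assumption: the caller's array must not be mutated
        java.util.Arrays.sort(a)     // ascending sort on the primitive array
        var result = List.empty[Int]
        var i = a.length - 1
        // Walk from the largest element down; prepending keeps the list ascending.
        while (i >= 0) {
          val x = a(i)
          // Duplicates are adjacent after sorting, so comparing with the head is enough.
          if (x != 0 && (result.isEmpty || result.head != x)) result = x :: result
          i -= 1
        }
        result
      }
    }
    ```

    For the array in the question, SortThenDedupSketch.sortedUnique(someArray) gives List(1, 2, 4, 5, 6, 9). Here the zeros are skipped after the sort; whether filtering them out first is cheaper depends on how many there are.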

  • 2021-01-04 15:52

    I haven't measured, but I'm with Duncan: sort in place, then use something like:

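    util.Sorting.quickSort(array) // sorts the array in place
    array.foldRight(List.empty[Int]) {
      case (a, b) =>
        // equal values are adjacent after sorting, so comparing with the head is enough
        if (b.nonEmpty && b.head == a) b
        else a :: b
    }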
    

    In theory this should be pretty efficient.

  • 2021-01-04 15:52

    Without benchmarking I can't be sure, but I imagine the following is pretty efficient:

    val list = collection.SortedSet(someArray.filter(_>0) :_*).toList
    

    Also try adding .par after someArray in your version. It's not guaranteed to be quicker, but it might be. You should run a benchmark and experiment.

    sort is deprecated. Use .sortWith(_ > _) instead.

  • 2021-01-04 15:57

    This simple line is one of the fastest solutions so far:

    someArray.toList.filter (_ > 0).sortWith (_ > _).distinct
    

    but according to my measurements, the clear winner so far is Jed Wesley-Smith's code. That may change once Rex's code is fixed.

    [Figure: bench diagram]

    Typical disclaimer 1 + 2:

    1. I modified each solution to accept an Array and return a List.
    2. Typical benchmark considerations:
      • This was random, uniformly distributed data. For 1 million elements, I created an Array of 1 million ints between 0 and 1 million. With more or fewer zeros and duplicates, the results might vary.
      • It might depend on the machine etc. I used a single-core CPU, Intel-Linux-32bit, jdk-1.6, scala 2.9.0.1
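
    To make the setup concrete, here is a rough sketch of how such input data and a naive timing could look. It is not the benchcoat code; randomData, timeSeconds, the fixed seed and the half-open range [0, n) are assumptions made for illustration, and there is no JIT warm-up, so the numbers it prints are only indicative:

    ```scala
    import scala.util.Random

    object BenchSketch {
      // Hypothetical helper: n random ints drawn uniformly from [0, n).
      def randomData(n: Int, seed: Long = 42L): Array[Int] = {
        val rnd = new Random(seed)
        Array.fill(n)(rnd.nextInt(n))
      }

      // Hypothetical helper: wall-clock time of a single run, in seconds (no warm-up).
      def timeSeconds[A](body: => A): (A, Double) = {
        val start = System.nanoTime()
        val result = body
        (result, (System.nanoTime() - start) / 1e9)
      }

      def main(args: Array[String]): Unit = {
        // Same range as the x-axis of the graph: 100 000 to 1 000 000 elements.
        for (n <- 100000 to 1000000 by 100000) {
          val data = randomData(n)
          val (list, secs) = timeSeconds(data.toList.filter(_ > 0).sortWith(_ > _).distinct)
          println(f"$n%8d elements: $secs%.3f s, ${list.size} unique values")
        }
      }
    }
    ```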

    Here is the underlying benchcoat-code and the concrete code to produce the graph (gnuplot). Y-axis: time in seconds. X-axis: 100 000 to 1 000 000 elements in Array.

    update:

    After finding the problem with Rex's code, his code is as fast as Jed's, but its last operation is a conversion of his Array to a List (to fulfill my benchmark interface). Using a var result = List[Int]() and result = someArray(i) :: result speeds his code up, so that it is about twice as fast as Jed's code.
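
    Roughly, the prepend variant looks like the sketch below. It is reconstructed from the description above rather than copied from the benchmarked code, and the object and method names are invented:

    ```scala
    object PrependSketch {
      // Hypothetical helper: dedupe via a Set, sort the primitive array, then build the List by hand.
      def uniqueSortedDesc(someArray: Array[Int]): List[Int] = {
        val a = someArray.toSet.filter(_ > 0).toArray
        java.util.Arrays.sort(a)       // ascending, in place
        var result = List.empty[Int]
        var i = 0
        while (i < a.length) {
          result = a(i) :: result      // O(1) prepend; ascending input yields a descending list
          i += 1
        }
        result
      }
    }
    ```

    Because the elements are prepended while walking the ascending array, the List comes out in descending order, the same order that sortWith(_ > _) produces. For the array in the question, the result is List(9, 6, 5, 4, 2, 1).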

    Another, maybe interesting, finding: if I rearrange my code from the order filter/sort/distinct (fsd) to any of the others (dsf, dfs, ...), none of the 6 possible orderings differs significantly.
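
    For reference, the six orderings can be generated mechanically. The sketch below only checks that they all produce the same list and says nothing about timing; OrderSketch, steps and results are names chosen for illustration:

    ```scala
    object OrderSketch {
      val someArray = Array(9, 1, 6, 2, 1, 9, 4, 5, 1, 6, 5, 0, 6)

      // The three steps of the one-liner, as standalone functions.
      val steps: List[List[Int] => List[Int]] = List(
        (xs: List[Int]) => xs.filter(_ > 0),
        (xs: List[Int]) => xs.sortWith(_ > _),
        (xs: List[Int]) => xs.distinct
      )

      // Apply the steps in every possible order (3! = 6 permutations).
      val results: List[List[Int]] =
        steps.permutations
          .map(order => order.foldLeft(someArray.toList)((acc, step) => step(acc)))
          .toList
    }
    ```

    Every entry of OrderSketch.results is List(9, 6, 5, 4, 2, 1), so the orderings differ only in how much work each step has to do.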
