Process Array in parallel using GCD

青春惊慌失措 2020-12-09 11:22

I have a large array that I would like to process by handing slices of it to a few asynchronous tasks. As a proof of concept, I have written the following code:

5 answers
  • 2020-12-09 11:45

    This is a slightly different take on the approach in @Eduardo's answer, using the Array type's withUnsafeMutableBufferPointer<R>(body: (inout UnsafeMutableBufferPointer<T>) -> R) -> R method. That method's documentation states:

    Call body(p), where p is a pointer to the Array's mutable contiguous storage. If no such storage exists, it is first created.

    Often, the optimizer can eliminate bounds- and uniqueness-checks within an array algorithm, but when that fails, invoking the same algorithm on body's argument lets you trade safety for speed.

    That second paragraph seems to be exactly what's happening here, so using this method might be more "idiomatic" in Swift, whatever that means:

    func calcSummary() {
        let group = DispatchGroup()
        let queue = DispatchQueue.global(qos: .userInitiated)
    
        summary.withUnsafeMutableBufferPointer { buffer in
            // `buffer` is an inout parameter, so copy it before the
            // escaping async closures capture it.
            let summaryMem = buffer
            for i in 0 ..< 10 {
                queue.async(group: group) {
                    let base = i * 50000
                    for x in base ..< base + 50000 {
                        summaryMem[i] += self.array[x]
                    }
                }
            }
            // The pointer is only valid inside this closure, so wait for
            // every task to finish before letting it return.
            group.wait()
        }
    
        print(summary)
    }
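For experimenting outside the asker's class, here is a self-contained sketch of the same buffer-pointer technique. The function name `chunkSums(_:chunks:)` and the sample data are hypothetical, and it assumes Swift 5 on a platform where `Dispatch` is available. It waits on the group before the `withUnsafeMutableBufferPointer` closure returns, because the buffer pointer is only valid inside that closure.

```swift
import Dispatch

/// Sums `values` in `chunks` roughly equal slices, one GCD task per slice,
/// writing each slice's total straight into the result's storage.
func chunkSums(_ values: [Int], chunks: Int) -> [Int] {
    precondition(chunks > 0, "need at least one chunk")
    var sums = [Int](repeating: 0, count: chunks)
    let group = DispatchGroup()
    let queue = DispatchQueue.global(qos: .userInitiated)

    sums.withUnsafeMutableBufferPointer { buffer in
        // Copy the inout parameter so the escaping closures can capture it.
        let sumsMem = buffer
        for i in 0 ..< chunks {
            queue.async(group: group) {
                // Integer division keeps slice sizes within one of each other.
                let start = i * values.count / chunks
                let end = (i + 1) * values.count / chunks
                var total = 0
                for x in start ..< end {
                    total += values[x]
                }
                // Each task owns exactly one element, so no two tasks
                // write the same memory.
                sumsMem[i] = total
            }
        }
        // Block until every task is done before the pointer goes away.
        group.wait()
    }
    return sums
}

print(chunkSums(Array(1...100), chunks: 4)) // prints [325, 950, 1575, 2200]
```

Summing into a local `total` and writing once per task also keeps the hot loop from hammering shared memory.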
    
  • 2020-12-09 11:46

    When you use the += operator, the LHS is an inout parameter -- I think you're getting race conditions when, as you mention in your update, Swift moves around the array for optimization. I was able to get it to work by summing the chunk in a local variable, then simply assigning to the right index in summary:

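    func calcSummary() {
        let group = DispatchGroup()
        let queue = DispatchQueue.global(qos: .userInitiated)
        // Serial queue that makes each task's single write to `summary` thread-safe.
        let writeQueue = DispatchQueue(label: "calcSummary.write")
    
        for i in 0 ..< 10 {
            queue.async(group: group) {
                let base = i * 50000
                var sum = 0
                for x in base ..< base + 50000 {
                    sum += self.array[x]
                }
                // Array is not thread-safe, so serialize the one assignment.
                writeQueue.sync { [sum] in
                    self.summary[i] = sum
                }
            }
        }
    
        group.notify(queue: queue) {
            print(self.summary)
        }
    }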
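A self-contained sketch of the local-accumulator idea, assuming Swift 5 with Foundation available for `NSLock`. The names `LockedResults` and `lockedChunkSums(_:chunks:)` are hypothetical. Each task sums its slice into a local variable and touches the shared array only once, under a lock, since concurrent writes to a Swift `Array` are not thread-safe even at distinct indices.

```swift
import Dispatch
import Foundation

/// Lock-protected holder for the per-chunk results (hypothetical helper).
final class LockedResults {
    private var storage: [Int]
    private let lock = NSLock()

    init(count: Int) {
        storage = [Int](repeating: 0, count: count)
    }

    func set(_ value: Int, at index: Int) {
        lock.lock()
        defer { lock.unlock() }
        storage[index] = value
    }

    var snapshot: [Int] {
        lock.lock()
        defer { lock.unlock() }
        return storage
    }
}

/// Sums each slice into a task-local variable, then publishes the total
/// with a single locked write.
func lockedChunkSums(_ values: [Int], chunks: Int) -> [Int] {
    precondition(chunks > 0, "need at least one chunk")
    let results = LockedResults(count: chunks)
    let group = DispatchGroup()
    let queue = DispatchQueue.global(qos: .userInitiated)

    for i in 0 ..< chunks {
        queue.async(group: group) {
            let start = i * values.count / chunks
            let end = (i + 1) * values.count / chunks
            // The hot loop only touches a local, so it never takes the lock.
            var sum = 0
            for x in start ..< end {
                sum += values[x]
            }
            results.set(sum, at: i)
        }
    }
    group.wait()
    return results.snapshot
}

print(lockedChunkSums(Array(1...100), chunks: 4)) // prints [325, 950, 1575, 2200]
```

Because the lock is taken once per slice rather than once per element, it should add little overhead compared with the summing loop itself.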
    
  • 2020-12-09 11:54

    Any solution that assigns the i'th element of the array concurrently risks a race condition (Swift's Array is not thread-safe). On the other hand, dispatching to the same queue (in this case main) before each update solves the problem but makes the whole operation slower. The only reason I see for taking either of these two approaches is if the array (summary) cannot wait for all concurrent operations to finish.

    Otherwise, perform the concurrent operations on a local copy and assign it to summary upon completion. No race condition, no performance hit:

    Swift 4.2+

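    import Dispatch
    
    func calcSummary(of array: [Int]) -> [Int] {
        var summary = [Int](repeating: 0, count: array.count)
    
        let iterations = 10 // number of parallel operations
    
        // Writing through Array's subscript from several threads is a data race,
        // even at distinct indices; writing distinct elements through the buffer
        // pointer is not. concurrentPerform returns only after every iteration
        // has finished, so the pointer stays valid throughout.
        summary.withUnsafeMutableBufferPointer { buffer in
            let summaryMem = buffer
            DispatchQueue.concurrentPerform(iterations: iterations) { index in
                let start = index * array.count / iterations
                let end = (index + 1) * array.count / iterations
    
                for i in start..<end {
                    // Do stuff to get the i'th element
                    summaryMem[i] = Int.random(in: 0..<array.count)
                }
            }
        }
    
        return summary
    }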
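The start/end arithmetic in that loop splits `array.count` elements into `iterations` contiguous, non-overlapping slices even when the count is not evenly divisible; slice sizes differ by at most one. A hypothetical helper, `chunkRange(index:count:iterations:)`, makes the split easy to inspect:

```swift
/// Returns the half-open index range handled by iteration `index`
/// when `count` elements are split across `iterations` workers.
func chunkRange(index: Int, count: Int, iterations: Int) -> Range<Int> {
    let start = index * count / iterations
    let end = (index + 1) * count / iterations
    return start ..< end
}

// 10 elements over 3 workers: the remainder is absorbed, nothing is skipped.
let ranges = (0 ..< 3).map { chunkRange(index: $0, count: 10, iterations: 3) }
print(ranges.map { $0.count }) // prints [3, 3, 4]
```

Each slice starts exactly where the previous one ends, which is why no element is processed twice or missed.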
    

    I've answered a similar question here about simply initializing an array after computing on another array.
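That linked answer is not reproduced here. As a sketch of one way to do it, assuming Swift 5.1+ for `Array(unsafeUninitializedCapacity:initializingWith:)`, the result can be written in parallel straight into uninitialized storage, which skips the zero-filling step. `parallelMap(_:iterations:_:)` is a hypothetical name, and plain assignment into the uninitialized buffer is only appropriate because `Int` is a trivial type.

```swift
import Dispatch

/// Builds a new array whose i'th element is `transform(source[i])`,
/// filling uninitialized storage from several threads at once.
func parallelMap(_ source: [Int], iterations: Int = 8, _ transform: (Int) -> Int) -> [Int] {
    precondition(iterations > 0, "need at least one iteration")
    let count = source.count
    return [Int](unsafeUninitializedCapacity: count) { buffer, initializedCount in
        let out = buffer // copy the inout parameter before sharing it
        DispatchQueue.concurrentPerform(iterations: iterations) { index in
            let start = index * count / iterations
            let end = (index + 1) * count / iterations
            for i in start ..< end {
                // Int is trivial, so plain assignment initializes the slot.
                out[i] = transform(source[i])
            }
        }
        // concurrentPerform has returned, so every slot has been written.
        initializedCount = count
    }
}

print(parallelMap([1, 2, 3, 4], iterations: 2) { $0 * 10 }) // prints [10, 20, 30, 40]
```

For non-trivial element types, each slot would need `initialize(to:)` instead of assignment.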

  • 2020-12-09 11:56

    You can also use concurrentPerform(iterations: Int, execute work: (Int) -> Swift.Void) (since Swift 3).

    It has a much simpler syntax and waits for all iterations to finish before returning:

    DispatchQueue.concurrentPerform(iterations: iterations) { i in
        performOperation(i)
    }
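To see that `concurrentPerform` is synchronous, the sketch below counts finished iterations with a lock-protected counter and reads the count immediately after the call returns. The `CompletionCounter` class is hypothetical, standing in for whatever `performOperation(i)` does, and the sketch assumes Swift 5 with Foundation available for `NSLock`.

```swift
import Dispatch
import Foundation

/// Lock-protected counter used to observe how many iterations ran.
final class CompletionCounter {
    private let lock = NSLock()
    private var value = 0

    func increment() {
        lock.lock()
        defer { lock.unlock() }
        value += 1
    }

    var count: Int {
        lock.lock()
        defer { lock.unlock() }
        return value
    }
}

let iterations = 16
let counter = CompletionCounter()

DispatchQueue.concurrentPerform(iterations: iterations) { _ in
    // performOperation(i) would go here; this sketch only records completion.
    counter.increment()
}

// The call has already returned, so every iteration has finished:
// no DispatchGroup, semaphore, or notify callback is required.
print(counter.count) // prints 16
```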
    
  • 2020-12-09 12:00

    I think Nate is right: there are race conditions with the summary variable. To fix it, I had the tasks write through raw memory instead of through the array:

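    func calcSummary() {
        let group = DispatchGroup()
        let queue = DispatchQueue.global(qos: .userInitiated)
    
        // A pointer taken with `&summary` is only valid for the duration of the
        // call it is passed to, so allocate a buffer that outlives the tasks.
        let summaryMem = UnsafeMutableBufferPointer<Int>.allocate(capacity: 10)
        summaryMem.initialize(repeating: 0)
    
        for i in 0 ..< 10 {
            queue.async(group: group) {
                let base = i * 50000
                for x in base ..< base + 50000 {
                    summaryMem[i] += self.array[x]
                }
            }
        }
    
        group.notify(queue: queue) {
            // All tasks are done: copy the results back and free the buffer.
            self.summary = Array(summaryMem)
            summaryMem.deallocate()
            print(self.summary)
        }
    }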
    

    This works (so far).

    EDIT: Mike S makes a very good point in his comment below. I have also found this blog post, which sheds some light on the problem.
