How are you taking advantage of Multicore?

后端未结

关注

 22  2085

As someone in the world of HPC who came from the world of enterprise web development, I\'m always curious to see how developers back in the \"real world\" are taking advanta

相关标签:

22条回答

忘掉有多难

2020-12-12 11:17

I work in C# with .Net Threads. You can combine object-oriented encapsulation with Thread management.

I've read some posts from Peter talking about a new book from Packt Publishing and I've found the following article in Packt Publishing web page:

http://www.packtpub.com/article/simplifying-parallelism-complexity-c-sharp

I've read Concurrent Programming with Windows, Joe Duffy's book. Now, I am waiting for "C# 2008 and 2005 Threaded Programming", Hillar's book - http://www.amazon.com/2008-2005-Threaded-Programming-Beginners/dp/1847197108/ref=pd_rhf_p_t_2

I agree with Szundi "No silver bullet"!

0 讨论(0)
发布评论:

提交评论
- 加载中...
迷失自我

2020-12-12 11:18

We're having a lot of success with task parallelism in .NET 4 using F#. Our customers are crying out for multicore support because they don't want their n-1 cores idle!

0 讨论(0)
发布评论:

提交评论
- 加载中...
时光取名叫无心

2020-12-12 11:18

I've decided to take advantage of multiple cores in an implementation of the DEFLATE algorithm. MArc Adler did something similar in C code with PIGZ (parallel gzip). I've delivered the philosophical equivalent, but in a managed code library, in DotNetZip v1.9. This is not a port of PIGZ, but a similar idea, implemented independently.

The idea behind DEFLATE is to scan a block of data, look for repeated sequences, build a "dictionary" that maps a short "code" to each of those repeated sequences, then emit a byte stream where each instance of one of the repeated sequences is replaced by a "code" from the dictionary.

Because building the dictionary is CPU intensive, DEFLATE is a perfect candidate for parallelization. i've taken a Map+Reduce type approach, where I divide the incoming uncompressed bytestreeam into a set of smaller blocks (map), say 64k each, and then compress those independently. Then I concatenate the resulting blocks together (reduce). Each 64k block is compressed independently, on its own thread, without regard for the other blocks.

On a dual-core machine, this approach compresses in about 54% of the time of the traditional serial approach. On server-class machines, with more cores available, it can potentially deliver even better results; with no server machine, I haven't tested it personally, but people tell me it's fast.

There's runtime (cpu) overhead associated to the management of multiple threads, runtime memory overhead associated to the buffers for each thead, and data overhead associated to concatenating the blocks. So this approach pays off only for larger bytestreams. In my tests, above 512k, it can pay off. Below that, it is better to use a serial approach.

DotNetZip is delivered as a library. My goal was to make all of this transparent. So the library automatically uses the extra threads when the buffer is above 512kb. There's nothing the application has to do, in order to use threads. It just works, and when threads are used, it's magically faster. I think this is a reasonable approach to take for most libbraries being consumed by applications.

It would be nice for the computer to be smart about automatically and dynamically exploiting resources on parallizable algorithms, but the reality today is that apps designers have to explicitly code the parallelization in.

0 讨论(0)
发布评论:

提交评论
- 加载中...
傲寒

2020-12-12 11:21

I work in medical imaging and image processing.

We're handling multiple cores in much the same way we handled single cores-- we have multiple threads already in the applications we write in order to have a responsive UI.

However, because we can now, we're taking strong looks at implementing most of our image processing operations in either CUDA or OpenMP. The Intel Compiler provides a lot of good sample code for OpenMP, and is just a much more mature product than CUDA, and provides a much larger installed base, so we're probably going to go with that.

What we tend to do for expensive (ie, more than a second) operations is to fork that operation off into another process, if we can. That way, the main UI remains responsive. If we can't, or it's just far too inconvenient or slow to move that much memory around, the operation is still in a thread, and then that operation can itself spawn multiple threads.

The key for us is to make sure that we don't hit concurrency bottlenecks. We develop in .NET, which means that UI updates have to be done from an Invoke call to the UI in order to have the main thread update the UI.

Maybe I'm lazy, but really, I don't want to have to spend too much time figuring a lot of this stuff out when it comes to parallelizing things like matrix inversions and the like. A lot of really smart people have spent a lot of time making that stuff fast like nitrous, and I just want to take what they've done and call it. Something like CUDA has an interesting interface for image processing (of course, that's what it's defined for), but it's still too immature for that kind of plug-and-play programming. If I or another developer get a lot of spare time, we might give it a try. So instead, we'll just go with OpenMP to make our processing faster (and that's definitely on the development roadmap for the next few months).

0 讨论(0)
发布评论:

提交评论
- 加载中...
余生分开走

2020-12-12 11:21

I'm in image processing. We're taking advantage of multicore where possible by processing images in slices doled out to different threads.

0 讨论(0)
发布评论:

提交评论
- 加载中...
抹茶落季

2020-12-12 11:21

I'm using and programming on a Mac. Grand Central Dispatch for the win. The Ars Technica review of Snow Leopard has a lot of interesting things to say about multicore programming and where people (or at least Apple) are going with it.

0 讨论(0)
发布评论:

提交评论
- 加载中...