How are you taking advantage of Multicore?

囚心锁ツ 2020-12-12 10:24

As someone in the world of HPC who came from the world of enterprise web development, I'm always curious to see how developers back in the "real world" are taking advantage of multicore.

22 Answers
  • 2020-12-12 11:08

    We created the VivaMP code analyzer for detecting errors in parallel OpenMP programs.

    VivaMP is a lint-like static C/C++ code analyzer meant to indicate errors in parallel programs based on OpenMP technology. The VivaMP static analyzer adds a great deal to the abilities of existing compilers: it diagnoses parallel code that contains errors or is a likely source of such errors. The analyzer is integrated into the Visual Studio 2005/2008 development environment.

    VivaMP – a tool for OpenMP

    32 OpenMP Traps For C++ Developers
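    As an illustration of the kind of defect such an analyzer looks for (this is not taken from VivaMP's documentation; the loop and variable names are made up), here is a classic OpenMP trap in C++ and its fix:

        #include <omp.h>
        #include <cstdio>

        int main() {
            const int N = 1000000;
            long long sum = 0;

            // BUG: 'sum' is shared by default, so concurrent "sum += i" is a data race:
            //   #pragma omp parallel for
            //   for (int i = 0; i < N; ++i) sum += i;

            // Fix: declare a reduction, so each thread accumulates a private copy
            // and the partial sums are combined at the end of the parallel region.
            #pragma omp parallel for reduction(+:sum)
            for (int i = 0; i < N; ++i)
                sum += i;

            std::printf("sum = %lld\n", sum);
            return 0;
        }

    Without the reduction clause the threads race on sum; it is exactly the sort of mistake that is easy to write and hard to spot in a code review, which is where a static checker earns its keep.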

  • 2020-12-12 11:08

    I believe that "Cycles are an engineer's best friend".

    My company provides a commercial tool for analyzing and transforming very large software systems in many computer languages. "Large" means 10-30 million lines of code. The tool is the DMS Software Reengineering Toolkit (DMS for short).

    Analyses (and even transformations) on such huge systems take a long time: our points-to analyzer for C code takes 90 CPU hours on an x86-64 with 16 GB of RAM. Engineers want answers faster than that.

    Consequently, we implemented DMS in PARLANSE, a parallel programming language of our own design, intended to harness small-scale multicore shared memory systems.

    The key ideas behind PARLANSE are: a) let the programmer expose parallelism, b) let the compiler choose which parts it can realize, c) keep context switching to an absolute minimum. Static partial orders over computations are an easy way to help achieve all three: easy to state, relatively easy to measure costs, easy for the compiler to schedule computations. (Writing parallel quicksort with this is trivial.)
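    PARLANSE itself isn't shown here; the following is only a sketch of the "expose the parallelism, let the implementation schedule it" idea, written in standard C++ with std::async (the function name and the cutoff value are my own illustration, not anything from DMS):

        #include <algorithm>
        #include <future>
        #include <vector>

        // Illustrative parallel quicksort: the two recursive calls are independent,
        // so one half of the range can be handed to the runtime as a separate task.
        void parallel_qsort(std::vector<int>::iterator first,
                            std::vector<int>::iterator last) {
            const auto n = last - first;
            if (n < 2) return;

            // Three-way partition around a pivot taken from the middle of the range.
            const int pivot = *(first + n / 2);
            auto mid1 = std::partition(first, last,
                                       [pivot](int x) { return x < pivot; });
            auto mid2 = std::partition(mid1, last,
                                       [pivot](int x) { return x == pivot; });

            if (n > 10000) {
                // Expose the parallelism: sort the left part in another task,
                // the right part in this one, then wait for the task to finish.
                auto left = std::async(std::launch::async, parallel_qsort, first, mid1);
                parallel_qsort(mid2, last);
                left.wait();
            } else {
                // Below the cutoff, task overhead would outweigh the gain.
                parallel_qsort(first, mid1);
                parallel_qsort(mid2, last);
            }
        }

    Call it as parallel_qsort(v.begin(), v.end()); below the (arbitrary) cutoff the recursion stays sequential so the cost of spawning tasks does not swamp the useful work.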

    Unfortunately, we did this in 1996 :-( The last few years have finally been a vindication; I can now get 8-core machines at Fry's for under $1K and 24-core machines for about the same price as a small car (and likely to drop rapidly).

    The good news is that DMS is now fairly mature, and there are a number of key internal mechanisms in DMS which take advantage of this, notably an entire class of analyzers called "attribute grammars", which we write using a domain-specific language which is NOT PARLANSE. DMS compiles these attribute grammars into PARLANSE and then they are executed in parallel. Our C++ front end uses attribute grammars, and is about 100K SLOC; it is compiled into 800K SLOC of parallel PARLANSE code that actually works reliably.

    Now (June 2009), we are pretty busy making DMS useful, and don't always have enough time to harness the parallelism well. Hence the 90-hour points-to analysis. We are working on parallelizing that, and have reasonable hope of a 10-20x speedup.

    We believe that in the long run, harnessing SMP well will make workstations far more friendly to engineers asking hard questions. As well they should.

  • 2020-12-12 11:10
    1. At the moment it doesn't affect my work that much, to be honest. I'm more in the 'preparation stage', learning about the technologies and language features that make this possible.
    2. I don't have one particular domain, but I've encountered domains like math (where multi-core is essential), data sort/search (where divide & conquer on multi-core is helpful) and multi-computer requirements (e.g., a requirement that a back-up station's processing power is used for something).
    3. This depends on what language I'm working in. Obviously in C#, my hands are tied with a not-yet-ready implementation of Parallel Extensions that does seem to boost performance, until you start comparing the same algorithms against OpenMP (perhaps not a fair comparison). So on .NET it's going to be an easy ride with some Parallel.For refactorings and the like (see the sketch after this list).
      Where things get really interesting is with C++, because the performance you can squeeze out of things like OpenMP is staggering compared to .NET. In fact, OpenMP surprised me a lot, because I didn't expect it to work so efficiently. Well, I guess its developers have had a lot of time to polish it. I also like that it is available in Visual Studio out of the box, unlike TBB, for which you have to pay.
      As for MPI, I use PureMPI.net for little home projects (I have a LAN) to fool around with computations that one machine can't quite handle. I've never used MPI commercially, but I do know that MKL has some MPI-optimized functions, which might be interesting to look at for anyone needing them.
    4. I plan to do 'frivolous computing', i.e. use extra cores for precomputation of results that might or might not be needed - RAM permitting, of course. I also intend to delve into costly algorithms and approaches that most end users' machines right now cannot handle.
    5. As for domains not benefiting from parallelization... well, one can always find something. One thing I am concerned about is decent support in .NET, though regrettably I have given up hope that speeds similar to C++ can be attained.
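    To make point 3 concrete, here is a minimal sketch of the kind of loop refactoring meant there, on the C++/OpenMP side (the array and loop body are made up for illustration; the .NET equivalent would be a Parallel.For over the same body):

        #include <omp.h>
        #include <cmath>
        #include <cstdio>
        #include <vector>

        int main() {
            const int n = 1 << 22;
            std::vector<double> in(n, 1.0), out(n);

            // Sequential version:
            //   for (int i = 0; i < n; ++i) out[i] = std::sqrt(in[i]) * 2.0;
            //
            // Parallel version: the iterations are independent, so a single pragma
            // splits them across the available cores. (The signed int loop index is
            // what the OpenMP 2.0 support shipped with Visual Studio expects.)
            #pragma omp parallel for
            for (int i = 0; i < n; ++i)
                out[i] = std::sqrt(in[i]) * 2.0;

            std::printf("out[0] = %f\n", out[0]);
            return 0;
        }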
  • 2020-12-12 11:11

    My research work includes work on compilers and on spam filtering. I also do a lot of 'personal productivity' Unix stuff. Plus I write and use software to administer classes that I teach, which includes grading, testing student code, tracking grades, and myriad other trivia.

    1. Multicore affects me not at all except as a research problem for compilers to support other applications. But those problems lie primarily in the run-time system, not the compiler.
    2. At great trouble and expense, Dave Wortman showed around 1990 that you could parallelize a compiler to keep four processors busy. Nobody I know has ever repeated the experiment. Most compilers are fast enough to run single-threaded. And it's much easier to run your sequential compiler on several different source files in parallel than it is to make your compiler itself parallel. For spam filtering, learning is an inherently sequential process. And even an older machine can learn hundreds of messages a second, so even a large corpus can be learned in under a minute. Again, training is fast enough.
    3. The only significant way I have of exploiting parallel machines is using parallel make. It is a great boon, and big builds are easy to parallelize. Make does almost all the work automatically. The only other thing I can remember is using parallelism to time long-running student code by farming it out to a bunch of lab machines, which I could do in good conscience because I was only clobbering a single core per machine, so using only 1/4 of CPU resources. Oh, and I wrote a Lua script that will use all 4 cores when ripping MP3 files with lame. That script was a lot of work to get right.
    4. I will ignore tens, hundreds, and thousands of cores. The first time I was told "parallel machines are coming; you must get ready" was 1984. It was true then and is true today that parallel programming is a domain for highly skilled specialists. The only thing that has changed is that today manufacturers are forcing us to pay for parallel hardware whether we want it or not. But just because the hardware is paid for doesn't mean it's free to use. The programming models are awful, and making the thread/mutex model work, let alone perform well, is an expensive job even if the hardware is free. I expect most programmers to ignore parallelism and quietly get on about their business. When a skilled specialist comes along with a parallel make or a great computer game, I will quietly applaud and make use of their efforts. If I want performance for my own apps I will concentrate on reducing memory allocations and ignore parallelism.
    5. Parallelism is really hard. Most domains are hard to parallelize. A widely reusable exception like parallel make is cause for much rejoicing.

    Summary (which I heard from a keynote speaker who works for a leading CPU manufacturer): the industry backed into multicore because they couldn't keep making machines run faster and hotter and they didn't know what to do with the extra transistors. Now they're desperate to find a way to make multicore profitable because if they don't have profits, they can't build the next generation of fab lines. The gravy train is over, and we might actually have to start paying attention to software costs.

    Many people who are serious about parallelism are ignoring these toy 4-core or even 32-core machines in favor of GPUs with 128 processors or more. My guess is that the real action is going to be there.

  • 2020-12-12 11:11

    I think this trend will first persuade some developers, and then most of them will see that parallelization is a really complex task. I expect some design patterns to emerge to take care of this complexity. Not low-level ones, but architectural patterns which will make it hard to do something wrong.

    For example, I expect messaging patterns to gain popularity, because they are inherently asynchronous and you don't have to think about deadlocks or mutexes.
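    Not any particular framework, but a minimal sketch of that idea in C++: the locking lives inside a small queue class, and the rest of the code exchanges messages instead of sharing state (all of the class and function names here are my own illustration):

        #include <condition_variable>
        #include <cstdio>
        #include <mutex>
        #include <queue>
        #include <string>
        #include <thread>

        // A tiny blocking message queue; the mutex and condition variable are an
        // implementation detail hidden from the code that uses it.
        template <typename T>
        class MessageQueue {
        public:
            void send(T msg) {
                { std::lock_guard<std::mutex> lk(m_); q_.push(std::move(msg)); }
                cv_.notify_one();
            }
            T receive() {
                std::unique_lock<std::mutex> lk(m_);
                cv_.wait(lk, [this] { return !q_.empty(); });
                T msg = std::move(q_.front());
                q_.pop();
                return msg;
            }
        private:
            std::mutex m_;
            std::condition_variable cv_;
            std::queue<T> q_;
        };

        int main() {
            MessageQueue<std::string> inbox;

            // The worker owns its own state and only reacts to messages, so the
            // application code never touches a mutex directly.
            std::thread worker([&inbox] {
                for (;;) {
                    std::string msg = inbox.receive();
                    if (msg == "quit") break;
                    std::printf("worker handled: %s\n", msg.c_str());
                }
            });

            inbox.send("job 1");
            inbox.send("job 2");
            inbox.send("quit");
            worker.join();
            return 0;
        }

    The worker never shares mutable state with the main thread, so there is nothing to deadlock on at the application level; the queue is the only synchronization point.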

  • 2020-12-12 11:16

    You say "For web applications it's very, very easy: ignore it. Unless you've got some code that really begs to be done in parallel you can simply write old-style single-threaded code and be happy."

    I am working with web applications and I do need to take full advantage of parallelism. I understand your point. However, we must prepare for the multicore revolution. Ignoring it is like ignoring the GUI revolution in the '90s.

    Are we still developing for DOS? We must tackle multicore or we'll be dead in a few years.
