What is faster- Java or C# (or good old C)? [closed]

后端未结

关注

 26  2194

情话喂你

相关标签:

26条回答

旧巷少年郎

2021-01-31 10:05

I would consider what everyone else uses - not the folks on this site, but the folks who write the same kind of massively parallel, or super high-performance applications.

I find they all write their code in C/C++. So, just for this fact alone (ie. regardless of any speed issues between the languages), I would go with C/C++. The tools they use and have developed will be of much more use to you if you're writing in the same language.

Aside from that, I've found C# apps to have somewhat less than optimal performance in some areas, multithreading is one. .NET will try to keep you safe from thread problems (probably a good thing in most cases), but this will cause your specific case problems (to test: try writing a simple loop that accesses a shared object using lots of threads. Run that on a single core PC and you get better performance than if you run it on a multiple core box - .net is adding its own locks to make sure you don't muck it up)(I used Jon Skeet's singleton benchmark. The static lock on took 1.5sec on my old laptop, 8.5s on my superfast desktop, the lock version is even worse, try it yourself)

The next point is that with C you tend to access memory and data directly - nothing gets in the way, with C#/Java you will use some of the many classes that are provided. These will be good in the general case, but you're after the best, most efficient way to access this (which, for your case is a big deal with multi-terabytes of data, those classes were not designed with those datasets in mind, they were designed for the common cases everyone else uses), so again, you would be safer using C for this - you'll never get the GC getting clogged up by a class that creates new strings internally when you read a couple of terabytes of data if you write it in C!

So it may appear that C#/Java can give you benefits over a native application, but I think you'll find those benefits are only realised for the kind of line-of-business applications that are commonly written.

0 讨论(0)
发布评论:

提交评论
- 加载中...
难免孤独

2021-01-31 10:11
You say "the code is multithreaded" which implies that the algorithms are parallelisable. Also, you save the "data sets are several terabytes in size".

Optimising is all about finding and eliminating bottlenecks.

The obvious bottleneck is the bandwidth to the data sets. Given the size of the data, I'm guessing that the data is held on a server rather than on a desktop machine. You haven't given any details of the algorithms you're using. Is the time taken by the algorithm greater than the time taken to read/write the data/results? Does the algorithm work on subsets of the total data?

I'm going to assume that the algorithm works on chunks of data rather than the whole dataset.

You have two scenarios to consider:
1. The algorithm takes more time to process the data than it does to get the data. In this case, you need to optimise the algorithm.
2. The algorithm takes less time to process the data than it does to get the data. In this case, you need to increase the bandwidth between the algorithm and the data.
In the first case, you need a developer that can write good assembler code to get the most out of the processors you're using, leveraging SIMD, GPUs and multicores if they're available. Whatever you do, don't just crank up the number of threads because as soon as the number of threads exceeds the number of cores, your code goes slower! This due to the added overhead of switching thread contexts. Another option is to use a SETI like distributed processing system (how many PCs in your organisation are used for admin purposes - think of all that spare processing power!). C#/Java, as bh213 mentioned, can be an order of magnitude slower than well written C/C++ using SIMD, etc. But that is a niche skillset these days.

In the latter case, where you're limited by bandwidth, then you need to improve the network connecting the data to the processor. Here, make sure you're using the latest ethernet equipment - 1Gbps everywhere (PC cards, switches, routers, etc). Don't use wireless as that's slower. If there's lots of other traffic, consider a dedicated network in parallel with the 'office' network. Consider storing the data closer to the clients - for every five or so clients use a dedicated server connected directly to each client which mirrors the data from the server.

If saving a few percent of processing time saves "tens of thousands of dollars" then seriously consider getting a consultant in, two actually - one software, one network. They should easily pay for themselves in the savings made. I'm sure there's many here that are suitably qualified to help.

But if reducing cost is the ultimate goal, then consider Google's approach - write code that keeps the CPU ticking over below 100%. This saves energy directly and indirectly through reduced cooling, thus costing less. You'll want more bang for your buck so it's C/C++ again - Java/C# have more overhead, overhead = more CPU work = more energy/heat = more cost.

So, in summary, when it comes to saving money there's a lot more to it than what language you're going to choose.
0 讨论(0)
发布评论:

提交评论
- 加载中...
离开以前

2021-01-31 10:13

One thing to notice is that IF your application(s) would benefit of lazy evaluation a functional programming language like Haskell may yield speedups of a totally different magnitude than the theretically optimal structured/OO code just by not evaluating unnecessary branches.

Also, if you are talking about the monetary benefit of better performance, don't forget to add the cost of maintaing your software into the equation.

0 讨论(0)
发布评论:

提交评论
- 加载中...
南笙

2021-01-31 10:14

I'm sorry, but that is not a simple question. It would depend a lot on what exactly was going on. C# is certainly no slouch, and you'd be hard-pressed to say "java is faster" or "C# is faster". C is a very different beast... it maybe has the potential to be faster - if you get it right; but in most cases it'll be about the same, but much harder to write.

It also depends how you do it - locking strategies, how you do the parallelization, the main code body, etc.

Re JIT - you could use NGEN to flatten this, but yes; if you are hitting the same code it should be JITted very early on.

One very useful feature of C#/Java (over C) is that they have the potential to make better use of the local CPU (optimizations etc), without you having to worry about it.

Also - with .NET, consider things like "Parallel Extensions" (to be bundled in 4.0), which gives you a much stronger threading story (compared to .NET without PFX).

0 讨论(0)
发布评论:

提交评论
- 加载中...
难免孤独

2021-01-31 10:14
Don't worry about language; parallelize!

If you have a highly multithreaded, data-intensive scientific code, then I don't think worrying about language is the biggest issue for you. I think you should concentrate on making your application parallel, especially making it scale past a single node. This will get you far more performance than just switching languages.

As long as you're confined to a single node, you're going to be starved for compute power and bandwidth for your app. On upcoming many-core machines, it's not clear that you'll have the bandwidth you need to do data-intensive computing on all the cores. You can do computationally intensive work (like a GPU does), but you may not be able to feed all the cores if you need to stream a lot of data to every one of them.

I think you should consider two options:
1. MapReduce
  Your problem sounds like a good match for something like Hadoop, which is designed for very data-intensive jobs.
  
  Hadoop has scaled to 10,000 nodes on Linux, and you can shunt your work off either to someone else's (e.g. Amazon's, Microsoft's) or your own compute cloud. It's written in Java, so as far as porting goes, you can either call your existing C code from within Java, or you can port the whole thing to Java.
2. MPI
  If you don't want to bother porting to MapReduce, or if for some reason your parallel paradigm doesn't fit the MapReduce model, you could consider adapting your app to use MPI. This would also allow you to scale out to (potentially thousands) of cores. MPI is the de-facto standard for computationally intensive, distributed-memory applications, and I believe there are Java bindings, but mostly people use MPI with C, C++, and Fortran. So you could keep your code in C and focus on parallelizing the performance-intensive parts. Take a look at OpenMPI for starters if you are interested.
0 讨论(0)
发布评论:

提交评论
- 加载中...
被撕碎了的回忆

2021-01-31 10:14
The most important things are already said here. I would add:

The developer utilizes a language which the compiler(s) utilize(s) to generate machine instructions which the processor(s) utilize(s) to use system resources. A program will be "fast" when ALL parts of the chain perform optimally.

So for the "best" language choice:
- take that language which you are best able to control and
- which is able to instruct the compiler sufficiently to
- generate nearly optimal machine code so that
- the processor on the target machine is able to utilize processing resources optimally.
If you are not a performance expert you will have a hard time to archieve 'peak performance' within ANY language. Possibly C++ still provides the most options to control the machine instructions (especially SSE extensions a.s.o).

I suggest to orient on the well known 80:20 rule. This is fairly well true for all: the hardware, the languages/platforms and the developer efforts.

Developers have always relied on the hardware to fix all performance issues automatically due to an upgrade to a faster processor f.e.. What might have worked in the past will not work in the (nearest) future. The developer now has the responsibility to structure her programs accordingly for parallelized execution. Languages for virtual machines and virtual runtime environments will show some advantage here. And even without massive parallelization there is little to no reason why C# or Java shouldn't succeed similar well as C++.

@Edit: See this comparison of C#, Matlab and FORTRAN, where FORTRAN does not win alone!
0 讨论(0)
发布评论:

提交评论
- 加载中...

热议问题