I'm designing a dedicated syslog-processing daemon for Linux that needs to be robust and scalable, and I'm debating multithreading vs. multiprocessing.
The obvious objection
It depends on which programming language you want to use (and which libraries). Personally, I would choose multithreading, since I know the problems associated with threads (and how to solve them).
Multiprocessing might help if you want to run the daemon on multiple machines and distribute the load among them, but I don't think that's a major concern here.
If you want robustness, use multi-processing.
The processes will share the logging load between them. Sooner or later, a logging request will hit a bug and crash the logger. With multi-processing, you only lose one process and so only that one logging request (which you couldn't have handled anyway, because of the bug).
Multi-threading is more vulnerable to crashes, since one fatal bug (a segfault, for example) takes down your single process and every thread in it.
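To make the isolation argument concrete, here is a minimal Python sketch, assuming Linux and the "fork" start method. `handle_request` and the "poison" message are hypothetical stand-ins for a buggy log handler; SIGKILL substitutes for a real segfault so the demo leaves no core file. A child is spawned per message purely for clarity, not as a pool design. The parent records the child's death and carries on, whereas the same `os.kill` call in a thread would terminate the whole daemon.

```python
import multiprocessing as mp
import os
import signal

def handle_request(msg):
    # Hypothetical handler with a fatal bug: the "poison" message kills the
    # process outright. SIGKILL stands in for a segfault in this demo.
    if msg == "poison":
        os.kill(os.getpid(), signal.SIGKILL)

def run_isolated(messages):
    """Handle each message in its own child process; collect exit codes."""
    ctx = mp.get_context("fork")  # Linux start method, as in the question
    codes = {}
    for msg in messages:
        p = ctx.Process(target=handle_request, args=(msg,))
        p.start()
        p.join()
        codes[msg] = p.exitcode  # 0 = clean exit, -N = killed by signal N
    return codes

if __name__ == "__main__":
    # The parent survives the poisoned request and handles the rest.
    print(run_isolated(["ok-1", "poison", "ok-2"]))
    # {'ok-1': 0, 'poison': -9, 'ok-2': 0}
```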
Multi-processing is in some ways more technically challenging, since you have to balance the workload across processes, which may entail using shared memory.
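One simple way to balance load without hand-managed shared memory is a single work queue that idle workers pull from; this is a swapped-in technique, not the only option. A minimal sketch, assuming Python's `multiprocessing.Queue` (which is built on a pipe plus a lock) and the Linux "fork" start method. The upper-casing is a hypothetical placeholder for real log processing.

```python
import multiprocessing as mp

def worker(jobs, results):
    # Each worker takes the next line only when it is idle, so a slow line
    # never stalls lines that another worker could be handling.
    for line in iter(jobs.get, None):  # None is the shutdown sentinel
        results.put((mp.current_process().name, line.upper()))

def process_lines(lines, n_workers=3):
    ctx = mp.get_context("fork")
    jobs, results = ctx.Queue(), ctx.Queue()
    procs = [ctx.Process(target=worker, args=(jobs, results))
             for _ in range(n_workers)]
    for p in procs:
        p.start()
    for line in lines:
        jobs.put(line)
    for _ in procs:
        jobs.put(None)  # one sentinel per worker
    # Drain results before joining, so no worker blocks on a full queue.
    out = [results.get() for _ in lines]
    for p in procs:
        p.join()
    return out
```

The returned list pairs each processed line with the name of the worker that handled it, which shows how the lines were spread across processes.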
Thanks everyone for your feedback.
I have decided on a multi-process architecture, similar to the Apache web server. The processes will scale nicely on multi-processor/core systems. Communications will be performed with pipes or sockets.
Processes will be ready to use in a process-pool so there's no process spawning cost.
The performance hit will be negligible in comparison to the robustness I'll gain.
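A pre-forked pool in the spirit of Apache's prefork model can be sketched as follows, assuming Linux and a UDP syslog input. The parent binds one socket, forks every worker up front, and each worker blocks on the inherited socket; the kernel hands each datagram to exactly one waiting worker. The ephemeral port (instead of UDP 514), the "quit" message, and the report pipe back to the parent are hypothetical demo scaffolding.

```python
import os
import socket

def worker_loop(sock, report_fd):
    """Body of one pre-forked worker: block on the shared socket."""
    while True:
        data, _addr = sock.recvfrom(4096)
        if data == b"quit":  # hypothetical shutdown message for the demo
            return
        # Tell the parent which worker handled which message.
        os.write(report_fd, b"%d %s\n" % (os.getpid(), data))

def run_prefork_pool(messages, n_workers=4):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("127.0.0.1", 0))  # ephemeral port; a real daemon uses 514
    read_fd, write_fd = os.pipe()
    pids = []
    for _ in range(n_workers):  # fork the whole pool before any traffic
        pid = os.fork()
        if pid == 0:  # child: inherits sock and the pipe's write end
            os.close(read_fd)
            try:
                worker_loop(sock, write_fd)
            finally:
                os._exit(0)  # never fall back into the parent's code
        pids.append(pid)
    os.close(write_fd)  # parent keeps only the read end

    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    for msg in messages:
        client.sendto(msg, sock.getsockname())
    for _ in pids:  # one "quit" per worker, queued after all real messages
        client.sendto(b"quit", sock.getsockname())
    client.close()

    report = b""
    while chunk := os.read(read_fd, 4096):  # EOF once all workers exit
        report += chunk
    os.close(read_fd)
    for pid in pids:
        os.waitpid(pid, 0)
    sock.close()
    return [line.split(b" ", 1) for line in report.splitlines()]
```

Because the workers exist before the first message arrives, no fork happens on the request path, which is the "no process spawning cost" property described above.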
One question is whether you need either. I don't know the details of your requirements, but a single-threaded app using select(2) may fit your needs without the disadvantages of either processes or threads. This requires that you centralize all of your I/O in one place, most likely dispatching to other modules via callbacks, but that isn't all that hard unless you have a lot of libraries that want to do their own I/O and can't be restructured this way.
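The centralized-I/O idea can be sketched with Python's `select.select`, a thin wrapper over select(2). This is a minimal sketch under assumptions: the `EventLoop` class, the two UDP sockets (a syslog input and a hypothetical control channel), and the "reload" command are illustrative only. A single call waits on every socket, and each ready socket is dispatched to its registered callback.

```python
import select
import socket

class EventLoop:
    """Single-threaded dispatcher: one select() call, one callback per socket."""

    def __init__(self):
        self.handlers = {}  # socket -> callback(socket)

    def register(self, sock, callback):
        self.handlers[sock] = callback

    def run_once(self, timeout=None):
        readable, _, _ = select.select(list(self.handlers), [], [], timeout)
        for sock in readable:
            self.handlers[sock](sock)
        return len(readable)

def make_udp_socket():
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.bind(("127.0.0.1", 0))  # ephemeral port for the demo
    return s

def demo():
    loop = EventLoop()
    log_sock, ctl_sock = make_udp_socket(), make_udp_socket()
    received = []
    loop.register(log_sock, lambda s: received.append(("syslog", s.recv(4096))))
    loop.register(ctl_sock, lambda s: received.append(("control", s.recv(4096))))
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client.sendto(b"<13>hello", log_sock.getsockname())  # sample syslog line
    client.sendto(b"reload", ctl_sock.getsockname())  # hypothetical command
    for _ in range(10):  # bounded so a lost datagram can't hang the demo
        if len(received) == 2:
            break
        loop.run_once(timeout=1.0)
    for s in (client, log_sock, ctl_sock):
        s.close()
    return received
```

Callbacks must not block, since a slow handler stalls every other socket; that is the restructuring cost mentioned above.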
You've left out too many details. Based on what you've stated so far, the choice is irrelevant: there is nothing inherently more buggy about multithreading than multiprocessing, and you're missing why these techniques have such a reputation. If you aren't sharing data, there isn't much of a problem to be had (there may be other issues, of course, but we need details to decide about those). The platform also matters: on UNIX-like operating systems, processes are pretty lightweight anyway.
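Where threads do share data, the usual remedy is to guard every access with a lock; threads that touch only their own data need none. A minimal sketch, assuming a hypothetical per-host message counter shared by all worker threads:

```python
import threading
from collections import Counter

class HostCounter:
    """Per-host message counts shared by all worker threads."""

    def __init__(self):
        self._counts = Counter()
        self._lock = threading.Lock()

    def record(self, host):
        # "+= 1" is a read-modify-write; without the lock, two threads can
        # interleave between the read and the write and lose an update.
        with self._lock:
            self._counts[host] += 1

    def snapshot(self):
        with self._lock:
            return dict(self._counts)

def count_in_threads(hosts, n_threads=4, repeats=1000):
    counter = HostCounter()

    def work():
        for _ in range(repeats):
            for h in hosts:
                counter.record(h)

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.snapshot()
```

Every shared structure needs this kind of discipline, and a single forgotten lock is the class of bug that gives threads their reputation.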
However, there are other issues to consider. What kind of system(s) will you be running on? You definitely don't want to spawn several processes on a uniprocessor system, since you aren't going to get much benefit, though that depends on some other details you could specify. If you describe the nature of the problem you are trying to solve, we can help further.
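The core-count point can be turned into a sizing rule for the pool. The sketch below is a hypothetical heuristic, not an established formula: it assumes CPU-bound workers gain nothing beyond one per usable core, while workers that mostly block on I/O can usefully outnumber the cores. `os.sched_getaffinity` is Linux-only, hence the fallback.

```python
import os

def usable_cpus():
    # CPUs this process may actually run on (respects taskset/cpuset limits);
    # sched_getaffinity is Linux-only, so fall back to os.cpu_count().
    try:
        return len(os.sched_getaffinity(0))
    except AttributeError:
        return os.cpu_count() or 1

def pool_size(cpus, io_bound):
    # Hypothetical heuristic: one worker per core for CPU-bound parsing,
    # twice that when workers spend most of their time blocked on I/O.
    return cpus * 2 if io_bound else cpus
```

On a uniprocessor running CPU-bound work this yields a single worker, which matches the point above.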
Do you need to share frequently updated data between the instances, where IPC would be too expensive? In that case, multithreading is probably better. Otherwise, you have to weigh whether the robustness of separate processes or the ease of thread creation and communication matters more to you.
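To illustrate the cost side: threads can update an ordinary in-memory object, but processes need explicit shared memory or IPC for every update. A minimal sketch, assuming Python's `multiprocessing.Value` (one option among pipes, sockets, and mmap) and the Linux "fork" start method; the message counter is hypothetical.

```python
import multiprocessing as mp

def count_messages(total, n_lines):
    # Every update takes a cross-process lock on a shared-memory integer;
    # with frequent updates, this locking becomes the overhead to weigh.
    for _ in range(n_lines):
        with total.get_lock():
            total.value += 1

def run_shared_counter(n_procs=4, n_lines=1000):
    ctx = mp.get_context("fork")
    total = ctx.Value("q", 0)  # 64-bit int in shared memory, with its own lock
    procs = [ctx.Process(target=count_messages, args=(total, n_lines))
             for _ in range(n_procs)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return total.value
```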