Does multithreading always yield better performance than single threading?

自闭症患者 2020-12-25 08:03

I know the answer is no; here is an example: "Why single thread is faster than multithreading in Java?".

So when processing a task in a thread is triv

7 answers
  • 2020-12-25 08:28

    This is a very good question about threading and its link to the real world, meaning the available physical CPU(s) with their cores and hyperthreads.

    1. Multiple threads may let you do things in parallel if your CPU has more than one core available. In an ideal world, calculating some primes, for example, might be 4 times faster with 4 threads if your CPU has 4 cores available and your algorithm really works in parallel.
    2. If you start more threads than there are cores available, your OS's thread management will spend more and more time on thread switches, so your CPU(s) are used less efficiently.
    3. If the compiler, the runtime and/or the CPU caches have to deal with more than one thread accessing the same memory area, they operate in a different, less optimized mode. As long as the compiler/runtime can be sure that only one thread accesses the data, it can avoid writing data out to external RAM too often and can use the CPU's L1 cache efficiently. If not, it has to use synchronization (e.g. semaphores) and cached data has to be flushed from the L1/L2 cache to RAM more often.

    So my lessons learned from highly parallel multithreading have been:

    • If possible, use single-threaded, shared-nothing processes to be more efficient
    • If threads are required, decouple the shared data access as much as possible
    • If possible, don't allocate more busy worker threads than there are cores available
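    As a minimal sketch of the last two points (not the code used for the measurements in this answer), a thread pool can be sized to the number of available cores, with each worker summing only its own slice of the data. The class name `CoreSizedPool` and the method `parallelSum` are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical example class: one worker per core, no shared mutable state.
class CoreSizedPool {

    /** Sums the array with `workers` tasks; each task reads only its own slice. */
    static long parallelSum(int[] data, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            int chunk = (data.length + workers - 1) / workers; // ceiling division
            List<Future<Long>> parts = new ArrayList<>();
            for (int w = 0; w < workers; w++) {
                final int from = Math.min(data.length, w * chunk);
                final int to = Math.min(data.length, from + chunk);
                Callable<Long> task = () -> {
                    long local = 0;               // private to this task
                    for (int i = from; i < to; i++) local += data[i];
                    return local;
                };
                parts.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> part : parts) total += part.get(); // merge once per task
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        int[] data = new int[10_000_000];
        for (int i = 0; i < data.length; i++) data[i] = i % 7;
        System.out.println(cores + " workers, sum = " + parallelSum(data, cores));
    }
}
```

    Note that `availableProcessors()` reports logical processors, which may include hyperthreads, so the best pool size for a CPU-bound job is still worth measuring.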

    Here is a small program (JavaFX) to play with. It:

    • Allocates a byte array of size 100,000,000, filled with random bytes
    • Provides a method that counts the number of bits set in this array
    • The method can count the bits of every n-th byte only
    • count(0,1) counts the bits of all bytes
    • count(0,4) counts the bits of bytes 0, 4, 8, ..., which allows parallel, interleaved counting

    Using a MacPro (4 cores) results in:

    1. Running one thread, count(0,1) needs 1326ms to count all 399993625 bits
    2. Running two threads, count(0,2) and count(1,2) in parallel needs 920ms
    3. Running four threads, needs 618ms
    4. Running eight threads, needs 631ms


    Changing how the count is accumulated, e.g. by incrementing a single shared integer (AtomicInteger or synchronized) from all threads, dramatically changes how well many threads perform.
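    To try that effect in isolation, here is a hedged, console-only sketch (separate from the JavaFX program in this answer). The class `SharedCounterDemo` and its `countBits` method are hypothetical; the `shared` flag switches between updating one AtomicInteger for every byte and counting locally with a single update per thread. Timings depend on the machine, so none are claimed here.

```java
import java.util.Random;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical example class comparing a contended shared counter with thread-local counting.
class SharedCounterDemo {

    /**
     * Counts the set bits of `data` with `threads` interleaved workers.
     * shared == true : every byte's bit count is added to one AtomicInteger (contended).
     * shared == false: each worker counts locally and publishes its result once.
     */
    static int countBits(byte[] data, int threads, boolean shared) {
        AtomicInteger total = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int start = t;
            workers[t] = new Thread(() -> {
                int local = 0;
                for (int i = start; i < data.length; i += threads) {
                    int bits = Integer.bitCount(data[i] & 0xFF);
                    if (shared) total.addAndGet(bits); else local += bits;
                }
                total.addAndGet(local); // adds 0 in the shared variant
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return total.get();
    }

    public static void main(String[] args) {
        byte[] data = new byte[20_000_000];
        new Random(4711).nextBytes(data);
        for (boolean shared : new boolean[]{false, true}) {
            long begin = System.nanoTime();
            int bits = countBits(data, 4, shared);
            long ms = (System.nanoTime() - begin) / 1_000_000;
            System.out.println((shared ? "shared counter: " : "local counters: ") + ms + " ms (" + bits + " bits)");
        }
    }
}
```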

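    import java.util.Random;
    import java.util.concurrent.atomic.AtomicInteger;

    import javafx.animation.KeyFrame;
    import javafx.animation.Timeline;
    import javafx.application.Application;
    import javafx.beans.property.BooleanProperty;
    import javafx.beans.property.SimpleBooleanProperty;
    import javafx.scene.Scene;
    import javafx.scene.control.Button;
    import javafx.scene.control.Label;
    import javafx.scene.control.ProgressBar;
    import javafx.scene.layout.HBox;
    import javafx.scene.layout.VBox;
    import javafx.scene.paint.Color;
    import javafx.stage.Stage;
    import javafx.util.Duration;

    public class MulithreadingEffects extends Application {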
        static class ParallelProgressBar extends ProgressBar {
            AtomicInteger myDoneCount = new AtomicInteger();
            int           myTotalCount;
            Timeline      myWhatcher = new Timeline(new KeyFrame(Duration.millis(10), e -> update()));
            BooleanProperty running = new SimpleBooleanProperty(false);
    
            public void update() {
                setProgress(1.0*myDoneCount.get()/myTotalCount);
                if (myDoneCount.get() >= myTotalCount) {
                    myWhatcher.stop();
                    myTotalCount = 0;
                    running.set(false);
                }
            }
    
            public boolean isRunning() { return myTotalCount > 0; }
            public BooleanProperty runningProperty() { return running; }
    
            public void start(int totalCount) {
                myDoneCount.set(0);
                myTotalCount = totalCount;
                setProgress(0.0);
                myWhatcher.setCycleCount(Timeline.INDEFINITE);
                myWhatcher.play();
                running.set(true);
            }
    
            public void add(int n) {
                myDoneCount.addAndGet(n);
            }
        }
    
        int mySize = 100000000;
        byte[] inData = new byte[mySize];
        ParallelProgressBar globalProgressBar = new ParallelProgressBar();
        BooleanProperty iamReady = new SimpleBooleanProperty(false);
        AtomicInteger myCounter = new AtomicInteger(0);
    
        void count(int start, int step) {
            new Thread(""+start){
                public void run() {
                    int count = 0;
                    int loops = 0;
                    for (int i = start; i < mySize; i+=step) {
                        for (int m = 0x80; m > 0; m >>=1) {
                            if ((inData[i] & m) > 0) count++;
                        }
                        if (loops++ > 99) {
                            globalProgressBar.add(loops);
                            loops = 0;
                        }
                    }
                    myCounter.addAndGet(count);
                    globalProgressBar.add(loops);
                }
            }.start();
        }
    
        void pcount(Label result, int n) {
            result.setText("("+n+")");
            globalProgressBar.start(mySize);
            long start = System.currentTimeMillis();
            myCounter.set(0);
            globalProgressBar.runningProperty().addListener((p,o,v) -> {
                if (!v) {
                    long ms = System.currentTimeMillis()-start;
                    result.setText(""+ms+" ms ("+myCounter.get()+")");
                }
            });
            for (int t = 0; t < n; t++) count(t, n);
        }
    
        void testParallel(VBox box) {
            HBox hbox = new HBox();
    
            Label result = new Label("-");
            for (int i : new int[]{1, 2, 4, 8}) {
                Button run = new Button(""+i);
                run.setOnAction( e -> {
                    if (globalProgressBar.isRunning()) return;
                    pcount(result, i);
                });
                hbox.getChildren().add(run);
            }
    
            hbox.getChildren().addAll(result);
            box.getChildren().addAll(globalProgressBar, hbox);
        }
    
    
        @Override
        public void start(Stage primaryStage) throws Exception {        
            primaryStage.setTitle("ProgressBar's");
    
            globalProgressBar.start(mySize);
            new Thread("Prepare"){
                public void run() {
                    iamReady.set(false);
                    Random random = new Random();
                    random.setSeed(4711);
                    for (int i = 0; i < mySize; i++) {
                        inData[i] = (byte)random.nextInt(256);
                        globalProgressBar.add(1);
                    }
                    iamReady.set(true);
                }
            }.start();
    
            VBox box = new VBox();
            Scene scene = new Scene(box,400,80,Color.WHITE);
            primaryStage.setScene(scene);
    
            testParallel(box);
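            // The original called GUIHelper.allowImageDrag(box) here. GUIHelper is not part of
            // this listing and is not needed for the measurement, so the call is omitted.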
    
            primaryStage.show();   
        }
    
        public static void main(String[] args) { launch(args); }
    }
    
  • 2020-12-25 08:35

    The overhead comes not only from thread creation but also from communication between threads. Another thing to note is that synchronizing all threads on, for example, a single object can make the program behave much like single-threaded execution.
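    The following sketch illustrates the second point under simple assumptions: every worker does all of its work inside a `synchronized` block on the same object, and the code records how many workers are ever inside that block at once. The class `SingleLockDemo` and its method are hypothetical names used only for this illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical example class: all workers contend for one lock.
class SingleLockDemo {

    /**
     * Starts `threads` workers that each enter the same synchronized block
     * `iterations` times, and returns the largest number of workers that were
     * ever inside the block at the same moment.
     */
    static int maxConcurrentInsideLock(int threads, int iterations) {
        final Object lock = new Object();
        AtomicInteger inside = new AtomicInteger();
        AtomicInteger maxSeen = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < iterations; i++) {
                    synchronized (lock) {                         // only one thread gets past this line
                        int now = inside.incrementAndGet();
                        maxSeen.accumulateAndGet(now, Math::max); // remember the peak
                        inside.decrementAndGet();
                    }
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return maxSeen.get();
    }

    public static void main(String[] args) {
        System.out.println("peak threads inside the lock: " + maxConcurrentInsideLock(8, 100_000));
    }
}
```

    Because the lock admits one thread at a time, the peak is 1 no matter how many threads are started; the extra threads only add waiting and switching on top of the work.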

  • 2020-12-25 08:39

    Not all algorithms can be processed in parallel (algorithms that are strictly sequential; where P=0 in Amdahl's law) or at least not efficiently (see P-complete). Other algorithms are more suitable for parallel execution (extreme cases are called "embarrassingly parallel").

    A naive implementation of a parallel algorithm can be less efficient in terms of complexity or space compared to a similar sequential algorithm. If there is no obvious way to parallelize an algorithm so that it will get a speedup, you may need to choose another similar parallel algorithm that solves the same problem but can be more or less efficient. If you ignore thread/process creation and direct inter-process communication overhead, there can still be other limiting factors when using shared resources like IO bottlenecks or increased paging caused by higher memory consumption.

    When should we decide to give up multithreading and only use a single thread to accomplish our goal?

    When deciding between single and multithreading, the time needed to change the implementation and the added complexity for developers should be taken into account. If there is only a small gain from using multiple threads, you could argue that the increased maintenance costs that multithreaded applications usually cause are not worth the speedup.

  • 2020-12-25 08:41

    Threading is about taking advantage of idle resources to handle more work. If you have no idle resources, multi-threading has no advantages, so the overhead would actually make your overall runtime longer.

    For example, say you have a collection of tasks to perform and they are CPU-intensive calculations. If you have a single CPU, multi-threading probably wouldn't speed that process up (though you never know until you test). I would expect it to slow down slightly. You are changing how the work is split up, but not the capacity. If you have 4 tasks to do on a single CPU, doing them serially is 1 * 4. If you do them in parallel, you'll come out to basically 4 * 1, which is the same, plus the overhead of merging results and context switching.

    Now, if you have multiple CPUs, then running CPU-intensive tasks in multiple threads would allow you to tap unused resources, so more gets done per unit time.

    Also, think about other resources. If you have 4 tasks which query a database, running them in parallel helps if the database has extra resources to handle them all. Though, you are also adding more work, which removes resources from the database server, so I probably wouldn't do that.

    Now, let's say we need to make web service calls to 3 external systems and none of the calls have input dependent on each other. Doing them in parallel with multiple threads means that we don't have to wait for one to end before the other starts. It also means that running them in parallel won't negatively impact each task. This would be a great use case for multi-threading.
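    A hedged sketch of that last scenario: the three "web service calls" below are simulated with `Thread.sleep`, since no real endpoints are given here, and `ParallelCalls`, `fakeCall` and `callAll` are hypothetical names. The calls are started together and the results are collected once all three have finished.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical example class: three independent blocking calls run side by side.
class ParallelCalls {

    /** Stand-in for a blocking call to an external system; it only sleeps. */
    static String fakeCall(String system, long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return system + ":ok";
    }

    /** Starts all three calls at once and waits for every result. */
    static String callAll(long millisPerCall) {
        ExecutorService pool = Executors.newFixedThreadPool(3);
        try {
            CompletableFuture<String> a = CompletableFuture.supplyAsync(() -> fakeCall("A", millisPerCall), pool);
            CompletableFuture<String> b = CompletableFuture.supplyAsync(() -> fakeCall("B", millisPerCall), pool);
            CompletableFuture<String> c = CompletableFuture.supplyAsync(() -> fakeCall("C", millisPerCall), pool);
            return a.join() + " " + b.join() + " " + c.join();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long begin = System.nanoTime();
        String result = callAll(300);
        long ms = (System.nanoTime() - begin) / 1_000_000;
        // The waits overlap, so the total should be close to one call rather than three.
        System.out.println(result + " after " + ms + " ms");
    }
}
```

    The threads spend almost all of their time waiting rather than computing, which is why this pays off even on a single core.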

  • 2020-12-25 08:49

    Are there more cases where a single thread will be faster than multithreading?

    In a GUI application you will benefit from multithreading. At the most basic level, you will be updating the front end as well as the data the front end is presenting. If you're running something basic like Hello World then, as you showed, it would just add overhead.

    That question is very broad... Do you count unit tests as applications? If so, then there are probably more applications that use a single thread, because any complex system will (hopefully) have at least 1 unit test. Do you count every Hello World style program as a different application or the same? If an application is deleted, does it still count?

    As you can see, I can't give a good response other than that you would have to narrow the scope of your question to get a meaningful answer. That being said, there may be a statistic out there that I'm unaware of.

    When should we decide to give up multithreading and only use a single thread to accomplish our goal?

    When a single thread will perform 'better' by whatever metric you think is important.

    Can your problem be broken into parts that can be processed simultaneously? Not in a contrived way, like breaking Hello World into two threads where one thread waits on the other to print, but in a way that lets 2+ threads accomplish the task more efficiently than one?

    Just because a task is easily parallelizable doesn't mean that it should be. I could multithread an application that trawled thousands of news sites constantly to get me my news. For me personally this would suck, because it would eat up my bandwidth and I wouldn't be able to play my FPS. For CNN this might be exactly what they want, and they will build a mainframe to accommodate it.

    Can you narrow your questions?

  • 2020-12-25 08:51

    As already mentioned in a comment by @Jim Mischel, you can use Amdahl's law to calculate this. Amdahl's law states that the speedup gained from adding processors to solve a task is

    S(N) = 1 / ((1 - P) + P/N)

    where

    N is the number of processors, and

    P is the fraction of the code that can be executed in parallel (0 .. 1)

    Now if T is the time it takes to execute the task on a single processor, and O is the total 'overhead' time (create and set up a second thread, communication, ...), a single thread is faster if

    T < T/S(2) + O

    or, after reordering, if

    O/T > P/2

    When the ratio Overhead / Execution Time is greater than P/2, a single thread is faster.
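    The condition can be checked numerically. This small sketch assumes the standard form of Amdahl's law, S(N) = 1 / ((1 - P) + P/N), and uses hypothetical names (`AmdahlBreakEven`, `speedup`, `singleThreadFaster`); the values of T, O and P in `main` are made up for illustration.

```java
// Hypothetical example class evaluating T < T/S(2) + O for given T, O and P.
class AmdahlBreakEven {

    /** Amdahl's law: speedup with n processors when fraction p of the work is parallel. */
    static double speedup(double p, int n) {
        return 1.0 / ((1.0 - p) + p / n);
    }

    /** True if one thread beats two threads once the overhead o is included. */
    static boolean singleThreadFaster(double t, double o, double p) {
        return t < t / speedup(p, 2) + o;
    }

    public static void main(String[] args) {
        double t = 100.0; // single-threaded run time (illustrative units)
        double p = 0.5;   // half of the work can run in parallel, so P/2 = 0.25
        for (double o : new double[]{20.0, 30.0}) {
            System.out.println("O/T = " + (o / t) + ", single thread faster: " + singleThreadFaster(t, o, p));
        }
    }
}
```

    With P = 0.5 the threshold is O/T = 0.25, so an overhead of 20 still favors two threads while an overhead of 30 favors the single thread.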
