Matlab matrix multiplication speed

感情败类 2021-01-05 06:44

I was wondering how MATLAB can multiply two matrices so fast. When multiplying two NxN matrices, N^3 multiplications are performed. Even with the Strassen algorithm it take

2 answers
  • 2021-01-05 06:58

    It's a combination of several things:

    • Matlab does indeed multi-thread.
    • The core is heavily optimized with vector instructions.

    Here are the numbers on my machine, a Core i7 920 @ 3.5 GHz (4 cores):

    >> a = rand(10000);
    >> b = rand(10000);
    >> tic;a*b;toc
    Elapsed time is 52.624931 seconds.
    

    Task Manager shows 4 cores of CPU usage.

    Now for some math:

    Number of multiplies = 10000^3 = 1,000,000,000,000 = 10^12
    
    Max multiplies in 52.6 secs =
        (3.5 GHz) * (4 cores) * (2 mul/cycle via SSE) * (52.6 secs) = 1.47 * 10^12
    

    So Matlab is achieving about 1 / 1.47 = 68% efficiency of the maximum possible CPU throughput.
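
    The estimate above can be reproduced in a few lines. This is only a sketch of the arithmetic, using the figures quoted in this answer; the variable names are illustrative, and the 2 multiplies per cycle per core is this answer's assumption about SSE throughput, not a measured value.

    ```matlab
    % Figures quoted in the answer for the Core i7 920 run
    N           = 10000;       % matrix dimension
    clockHz     = 3.5e9;       % clock frequency in Hz
    cores       = 4;           % cores shown busy in Task Manager
    mulPerCycle = 2;           % multiplies per cycle per core via SSE (assumed)
    elapsed     = 52.624931;   % measured seconds for a*b

    multiplies     = N^3;                                    % work actually done
    peakMultiplies = clockHz * cores * mulPerCycle * elapsed; % upper bound in that time
    efficiency     = multiplies / peakMultiplies;

    fprintf('multiplies performed: %.3g\n', multiplies);
    fprintf('peak multiplies:      %.3g\n', peakMultiplies);
    fprintf('efficiency:           %.0f%%\n', 100 * efficiency);
    ```

    Swapping in your own clock rate, core count, and timing gives the same kind of sanity check for another machine.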

    I see nothing out of the ordinary.

  • 2021-01-05 07:00

    To check whether MATLAB is using multi-threading, limit the thread count with this command and compare timings:

    maxNumCompThreads(n)
    

    This sets the maximum number of computational threads to n. I have a Core i7-2620M, which has a base frequency of 2.7 GHz and a turbo mode of 3.4 GHz. The CPU has two cores. Let's see:

    A = rand(5000);
    B = rand(5000);
    maxNumCompThreads(1);
    tic; C=A*B; toc
    Elapsed time is 10.167093 seconds.
    
    maxNumCompThreads(2);
    tic; C=A*B; toc
    Elapsed time is 5.864663 seconds.
    

    So there is multi-threading.
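
    The same comparison can be scripted so that the original thread setting is restored afterwards. A minimal sketch, assuming a MATLAB release that provides timeit; the matrix size of 2000 is an arbitrary choice to keep the run short, and the variable names are illustrative.

    ```matlab
    A = rand(2000);
    B = rand(2000);

    original = maxNumCompThreads;     % query the current maximum thread count
    counts   = [1 original];          % compare one thread against the current setting
    times    = zeros(size(counts));

    for k = 1:numel(counts)
        maxNumCompThreads(counts(k)); % limit the number of computational threads
        times(k) = timeit(@() A*B);   % typical time over several runs
    end
    maxNumCompThreads(original);      % restore the original setting

    for k = 1:numel(counts)
        fprintf('%d thread(s): %.3f s (speedup %.2fx)\n', ...
                counts(k), times(k), times(1) / times(k));
    end
    ```

    Using timeit rather than a single tic/toc pair reduces the effect of one-off noise such as warm-up or other processes.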

    Let's look at the single-thread results. A*B performs approximately 5000^3 multiplications and as many additions. So the performance of the single-threaded code is

    5000^3*2/10.167 = 24.6 GFLOP/s
    

    Now the CPU. At 3.4 GHz, Sandy Bridge can do at most 8 FLOPs per cycle with AVX:

    3.4 [Gcycles/second] * 8 [FLOPs/cycle] = 27.2 GFLOP/s peak performance
    

    So single-core performance is around 90% of peak, which is to be expected for this problem.
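
    To repeat this kind of estimate on another machine, the achieved rate can be measured directly and compared with a peak computed from the CPU's specifications. A rough sketch: clockGHz and flopsPerCycle below are placeholders taken from this answer's Sandy Bridge example and must be replaced with the values for your own CPU, and the matrix size is arbitrary.

    ```matlab
    N = 2000;
    A = rand(N);
    B = rand(N);

    clockGHz      = 3.4;   % assumed clock in GHz; replace with your CPU's value
    flopsPerCycle = 8;     % assumed per-core FLOPs per cycle; replace as well

    original = maxNumCompThreads(1);   % force one thread, keep the previous setting
    t = timeit(@() A*B);               % seconds for one single-threaded product
    maxNumCompThreads(original);       % restore the previous setting

    achieved = 2 * N^3 / t / 1e9;      % N^3 multiplies plus N^3 adds, in GFLOP/s
    peak     = clockGHz * flopsPerCycle;

    fprintf('achieved: %.1f GFLOP/s\n', achieved);
    fprintf('peak:     %.1f GFLOP/s\n', peak);
    fprintf('fraction of peak: %.0f%%\n', 100 * achieved / peak);
    ```

    If the reported fraction exceeds 100%, the assumed clock or FLOPs per cycle is too low for the CPU in question.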

    You really need to look deeply into the capabilities of your CPU to get accurate performance estimates.
