Why do sliced threads affect realtime encoding so much when using ffmpeg x264?

-上瘾入骨i 2021-02-20 03:26

I'm using ffmpeg with libx264 to encode a 720p screen capture from X11 in realtime at 30 fps. When I use the -tune zerolatency parameter, the average encode ti

1 answer
  • 2021-02-20 04:15

    The documentation shows that frame-based threading has better throughput than slice-based. It also notes that the latter doesn't scale well due to parts of the encoder that are serial.

    Speedup vs. number of encoding threads for the veryfast preset (non-realtime):

    threads  speedup       psnr
          slice frame   slice  frame
    x264 --preset veryfast --tune psnr --crf 30
     1:   1.00x 1.00x  +0.000 +0.000
     2:   1.41x 2.29x  -0.005 -0.002
     3:   1.70x 3.65x  -0.035 +0.000
     4:   1.96x 3.97x  -0.029 -0.001
     5:   2.10x 3.98x  -0.047 -0.002
     6:   2.29x 3.97x  -0.060 +0.001
     7:   2.36x 3.98x  -0.057 -0.001
     8:   2.43x 3.98x  -0.067 -0.001
     9:         3.96x         +0.000
    10:         3.99x         +0.000
    11:         4.00x         +0.001
    12:         4.00x         +0.001
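
To make the scaling gap easier to see, here is a minimal sketch that turns the speedup columns above into per-thread efficiency (speedup divided by thread count). The numbers are copied from the table; the dictionary and function names are my own and not part of x264.

```python
# Speedup figures copied from the table above (x264 --preset veryfast).
# Slice-based results are only listed for 1 to 8 threads.
SLICE_SPEEDUP = {1: 1.00, 2: 1.41, 3: 1.70, 4: 1.96,
                 5: 2.10, 6: 2.29, 7: 2.36, 8: 2.43}
FRAME_SPEEDUP = {1: 1.00, 2: 2.29, 3: 3.65, 4: 3.97,
                 5: 3.98, 6: 3.97, 7: 3.98, 8: 3.98,
                 9: 3.96, 10: 3.99, 11: 4.00, 12: 4.00}


def parallel_efficiency(speedup: float, threads: int) -> float:
    """Speedup per thread; 1.0 would mean perfect linear scaling."""
    if threads < 1:
        raise ValueError("threads must be >= 1")
    return speedup / threads


if __name__ == "__main__":
    print("threads  slice_eff  frame_eff")
    for n in sorted(SLICE_SPEEDUP):
        s = parallel_efficiency(SLICE_SPEEDUP[n], n)
        f = parallel_efficiency(FRAME_SPEEDUP[n], n)
        print(f"{n:7d}  {s:9.2f}  {f:9.2f}")
```

At 4 threads this gives roughly 0.49 for slice threading against roughly 0.99 for frame threading, which is the throughput gap described above.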
    

    The main difference seems to be that frame threading adds frame latency, as it needs different frames to work on, while with slice-based threading all threads work on the same frame. In realtime encoding, unlike offline encoding, the encoder would have to wait for more frames to arrive to fill the pipeline.

    Normal threading, also known as frame-based threading, uses a clever staggered-frame system for parallelism. But it comes at a cost: as mentioned earlier, every extra thread requires one more frame of latency. Slice-based threading has no such issue: every frame is split into slices, each slice encoded on one core, and then the result slapped together to make the final frame. Its maximum efficiency is much lower for a variety of reasons, but it allows at least some parallelism without an increase in latency.

    From: Diary of an x264 Developer
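
As a rough illustration of the "one more frame of latency per extra thread" rule quoted above, the sketch below estimates the added delay at a given frame rate. It is a simplified model of my own: it assumes exactly one frame of delay per thread beyond the first and ignores lookahead, B-frames and capture or network delay. The function name is hypothetical.

```python
def frame_threading_latency_ms(threads: int, fps: float) -> float:
    """Estimated extra latency from frame-based threading, in milliseconds.

    Assumption: each thread beyond the first adds one frame of delay,
    and one frame lasts 1000 / fps milliseconds.
    """
    if threads < 1:
        raise ValueError("threads must be >= 1")
    if fps <= 0:
        raise ValueError("fps must be positive")
    extra_frames = threads - 1
    return extra_frames * 1000.0 / fps


if __name__ == "__main__":
    # 30 fps is the frame rate from the question.
    for n in (1, 2, 4, 8):
        print(f"{n} threads: ~{frame_threading_latency_ms(n, 30):.0f} ms added")
```

Under this model, 4 frame threads at 30 fps add about 100 ms before the first frame can leave the encoder, while slice-based threading adds none.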

    Sliceless threading: example with 2 threads. Start encoding frame #0. When it's half done, start encoding frame #1. Thread #1 now only has access to the top half of its reference frame, since the rest hasn't been encoded yet. So it has to restrict the motion search range. But that's probably ok (unless you use lots of threads on a small frame), since it's pretty rare to have such long vertical motion vectors. After a little while, both threads have encoded one row of macroblocks, so thread #1 still gets to use motion range = +/- 1/2 frame height. Later yet, thread #0 finishes frame #0, and moves on to frame #2. Thread #0 now gets motion restrictions, and thread #1 is unrestricted.

    From: http://web.archive.org/web/20150307123140/http://akuvian.org/src/x264/sliceless_threads.txt

    Therefore it makes sense to enable sliced threads with -tune zerolatency, as you need to send each frame as soon as possible rather than encode frames efficiently (performance- and quality-wise).
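
If you want to compare the two modes yourself, something like the sketch below could build the two ffmpeg command lines. It only assembles the arguments and does not run anything. Assumptions on my side: the display `:0.0`, the 1280x720 capture size, the veryfast preset, the thread count and the output file `out.ts` are placeholders, and I am relying on `-x264-params sliced-threads=0` overriding the sliced threading that `-tune zerolatency` turns on. Check this against your own ffmpeg/x264 build.

```python
import shlex
from typing import List


def build_ffmpeg_cmd(sliced_threads: bool, threads: int = 4) -> List[str]:
    """Assemble an ffmpeg x11grab -> libx264 command line (not executed here)."""
    cmd = [
        "ffmpeg",
        "-f", "x11grab",            # X11 screen capture input
        "-framerate", "30",         # 30 fps, as in the question
        "-video_size", "1280x720",  # assumed 720p capture area
        "-i", ":0.0",               # assumed display name
        "-c:v", "libx264",
        "-preset", "veryfast",      # assumed preset
        "-tune", "zerolatency",     # enables sliced threads by default
        "-threads", str(threads),
    ]
    if not sliced_threads:
        # Assumption: this switches x264 back to frame-based threading.
        cmd += ["-x264-params", "sliced-threads=0"]
    cmd += ["-f", "mpegts", "out.ts"]  # hypothetical output target
    return cmd


if __name__ == "__main__":
    for mode in (True, False):
        args = build_ffmpeg_cmd(sliced_threads=mode)
        print(" ".join(shlex.quote(a) for a in args))
```

Timing both variants on the same capture should show the trade-off: frame threading gets more throughput, sliced threading gets each frame out sooner.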

    Conversely, using too many threads can hurt performance, as the overhead of managing them can exceed the potential gains.
