Can the ffmpeg av libs return an accurate PTS?

無奈伤痛 2020-12-29 14:28

I'm working with an MPEG stream that uses an IBBP... GOP sequence. The (DTS,PTS) values returned for the first 4 AVPackets are as follows: I=(0,3) B=(1,1)

3 answers
  • 2020-12-29 14:57

    I'm fairly certain you are getting accurate values. It might help if you think of an MPEG stream as, well, a stream. In that case, prior to the IBBPBB that you see there would normally be another GOP. Maybe something like this (using the same notation as the original question):

    P(-3,-2)  B(-2,-1)  B(-1,0)
    

    Basically, the B frames that follow the I frame depend on that I frame and on the last P frame of the previous GOP.

    While it makes logical sense for a video to start off with this:

    Start GOP: IPBBPBBPBB...
    

    Later on it must be

    Start GOP: IBBPBBPBBPBB
    Start GOP: IBBPBBPBBPBB
    Start GOP: IBB... 
    

    Remember that decoding any B frame requires a complete frame before it and after it. So each pair of B frames should be displayed before the I or P frame just prior to it in the file.

    FFMPEG may just have forgone the "special case" of first GOP.

    Since the first two B frames don't have a prior frame to manipulate, you should be able to safely discard them. Just rebase your timestamps off of the first I frame and adjust the audio stream the same amount.

    Whether this will actually result in a loss of frames will depend on FFMPEG's implementation, but the worst-case scenario is that you lose about 83 milliseconds (2 frames at 24 frames/sec).
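The rebasing step described above can be sketched as follows. This is only an illustration in plain Python, not FFmpeg code: the first two (DTS,PTS) pairs come from the question, while the remaining packets, the 24 frames/sec rate, and the helper name `rebase_on_first_i_frame` are assumptions made for the example.

```python
from fractions import Fraction

# Packets in file order as (frame_type, dts, pts). The first two pairs are
# from the question; the rest are assumed values for illustration only.
packets = [("I", 0, 3), ("B", 1, 1), ("B", 2, 2),
           ("P", 3, 6), ("B", 4, 4), ("B", 5, 5)]

def rebase_on_first_i_frame(packets):
    """Drop packets displayed before the first I frame and shift the rest
    so that the first I frame has PTS 0. Hypothetical helper."""
    first_i_pts = next(pts for kind, _dts, pts in packets if kind == "I")
    kept = [(k, d, p) for k, d, p in packets if p >= first_i_pts]
    rebased = [(k, d - first_i_pts, p - first_i_pts) for k, d, p in kept]
    return rebased, len(packets) - len(kept)

rebased, dropped = rebase_on_first_i_frame(packets)

# Duration of the dropped frames at an assumed 24 frames/sec.
lost_ms = float(Fraction(dropped, 24) * 1000)
print(rebased, dropped, round(lost_ms, 1))
```

With these assumed values the two leading B frames are dropped, which is where the figure of roughly 83 ms comes from. The audio stream would then be trimmed or shifted by the same amount.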

  • 2020-12-29 14:59

    I think I finally figured out what's going on based on a comment made in http://www.dranger.com/ffmpeg/tutorial05.html:

    ffmpeg reorders the packets so that the DTS of the packet being processed by avcodec_decode_video() will always be the same as the PTS of the frame it returns

    Translation: If I feed a packet into avcodec_decode_video() that has a PTS of 12, avcodec_decode_video() will not return the decoded frame contained in that packet until I feed it a later packet that has a DTS of 12. If the packet's PTS is the same as its DTS, then the packet given is the same as the frame returned. If the packet's PTS is 2 frames later than its DTS, then avcodec_decode_video() will delay the frame and not return it until I provide 2 more packets.
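A toy model of that rule may make it easier to follow. The sketch below is not FFmpeg's implementation and calls no FFmpeg API; it only encodes the behavior described here (a frame is returned when a packet whose DTS equals that frame's PTS is fed in). The packet values and the function name `simulate_decoder` are assumptions chosen for illustration.

```python
def simulate_decoder(packets):
    """Feed (frame_type, dts, pts) packets in order and record what a decoder
    following the 'returned frame PTS == fed packet DTS' rule would output."""
    pending = {}   # decoded frames waiting for their turn, keyed by PTS
    returned = []
    for kind, dts, pts in packets:
        pending[pts] = kind
        # Output the frame whose PTS matches this packet's DTS, if any.
        returned.append(pending.pop(dts, None))
    return returned

# Assumed packets in decode order: B frames have PTS == DTS, while I and P
# frames are presented later than they are decoded.
packets = [("I", 0, 1), ("P", 1, 4), ("B", 2, 2), ("B", 3, 3),
           ("P", 4, 7), ("B", 5, 5), ("B", 6, 6)]

returned = simulate_decoder(packets)
for (kind, dts, pts), out in zip(packets, returned):
    print(f"fed {kind} dts={dts} pts={pts} -> returned {out}")
```

In this model the first call returns nothing, the B frames come straight back, and each P frame is held until the packet with the matching DTS arrives.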

    Based on this behavior, I'm guessing that av_read_frame() is maybe reordering the packets from IPBB to IBBP so that avcodec_decode_video() only has to buffer the P frames for 3 frames instead of 5. For example, the difference between the input and the output of the P frame with this ordering is 3 (6 - 3):

    |                  I B B P B B P
    |             DTS: 0 1 2 3 4 5 6
    | decode() result:       I B B P
    

    vs. a difference of 5 with the standard ordering (6 - 1):

    |                  I P B B P B B
    |             DTS: 0 1 2 3 4 5 6
    | decode() result:       I B B P
    

    <shrug/> but that is pure conjecture.

  • 2020-12-29 15:00

    Ok, scratch my previous confused reply.

    For an IBBPBBI movie, you'd expect the PTSes to look like this (in decoding order):

    0, 3, 1, 2, 6, 4, 5, ...
    

    corresponding to the frames

    I, P, B, B, I, B, B, ...
    

    So you appear to be missing an I at the start of your sequence but otherwise the timestamps look correct.
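As a quick check of the sequence above, sorting the decode-order frames by PTS should give back the display order. This is a minimal sketch in plain Python using only the values listed in this answer; the variable names are illustrative.

```python
# Frames and PTS values in decoding order, as listed above.
frame_types = ["I", "P", "B", "B", "I", "B", "B"]
pts_values = [0, 3, 1, 2, 6, 4, 5]

# Sorting by PTS turns decoding order into presentation order.
presentation = sorted(zip(frame_types, pts_values), key=lambda frame: frame[1])
display_order = "".join(kind for kind, _pts in presentation)

print(display_order)
```

The result is the IBBPBBI pattern, with the PTS values running from 0 to 6 without gaps.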
