How to encode a video from several images generated in a C++ program without writing the separate frame images to disk?

北海茫月 2020-12-07 15:51

I am writing a C++ program in which a sequence of N different frames is generated after performing some operations implemented therein. After each frame is completed, I write it out to disk as an image file and, once all of them are done, I encode them into a video by invoking the ffmpeg command-line tool. I would like to skip the intermediate image files and encode the video directly from the frames generated in memory.

4 Answers
  • 2020-12-07 16:24

    After some intense struggle, I finally managed to make it work after learning a bit about how to use the FFmpeg and libx264 C APIs for my specific purpose, thanks to the useful information that some users provided on this site and some others, as well as some of FFmpeg's documentation examples. For the sake of illustration, the details are presented next.

    First of all, the libx264 C library was compiled and, after that, FFmpeg was built with the configure options --enable-gpl --enable-libx264. Now let us get to the code. The relevant part of the code that achieves the requested purpose is the following:

    Includes:

    #include <stdint.h>
    extern "C"{
    #include <x264.h>
    #include <libswscale/swscale.h>
    #include <libavcodec/avcodec.h>
    #include <libavutil/mathematics.h>
    #include <libavformat/avformat.h>
    #include <libavutil/opt.h>
    }
    

    LDFLAGS on Makefile:

    -lx264 -lswscale -lavutil -lavformat -lavcodec
    

    Inner code (for the sake of simplicity, error checking is omitted and variables are declared where they are first needed rather than at the beginning, for better understanding):

    av_register_all(); // Loads the whole database of available codecs and formats.
    
    struct SwsContext* convertCtx = sws_getContext(width, height, AV_PIX_FMT_RGB24, width, height, AV_PIX_FMT_YUV420P, SWS_FAST_BILINEAR, NULL, NULL, NULL); // Preparing to convert my generated RGB images to YUV frames.
    
    // Preparing the data concerning the format and codec in order to write properly the header, frame data and end of file.
    const char *fmtext = "mp4";
    char filename[32];
    snprintf(filename, sizeof(filename), "GeneratedVideo.%s", fmtext); // Build the output file name in a fixed-size buffer.
    AVOutputFormat * fmt = av_guess_format(fmtext, NULL, NULL);
    AVFormatContext *oc = NULL;
    avformat_alloc_output_context2(&oc, NULL, NULL, filename);
    AVStream * stream = avformat_new_stream(oc, 0);
    AVCodec *codec=NULL;
    AVCodecContext *c= NULL;
    int ret;
    
    codec = avcodec_find_encoder_by_name("libx264");
    
    // Setting up the codec:
    AVDictionary *opt = NULL;
    av_dict_set( &opt, "preset", "slow", 0 );
    av_dict_set( &opt, "crf", "20", 0 );
    avcodec_get_context_defaults3(stream->codec, codec);
    c = avcodec_alloc_context3(codec);
    c->width = width;
    c->height = height;
    c->pix_fmt = AV_PIX_FMT_YUV420P;
    c->time_base = (AVRational){1, 25}; // The encoder requires a time base; 25 fps here, matching the stream time base set below.
    
    // Setting up the format, its stream(s), linking with the codec(s) and write the header:
    if (oc->oformat->flags & AVFMT_GLOBALHEADER) // Some formats require a global header.
        c->flags |= AV_CODEC_FLAG_GLOBAL_HEADER;
    avcodec_open2( c, codec, &opt );
    av_dict_free(&opt);
    stream->time_base=(AVRational){1, 25};
    stream->codec = c; // Once the codec is set up, we need to let the container know which codec each stream is using; here, the only (video) stream.
    av_dump_format(oc, 0, filename, 1);
    avio_open(&oc->pb, filename, AVIO_FLAG_WRITE);
    ret=avformat_write_header(oc, &opt);
    av_dict_free(&opt); 
    
    // Preparing the containers of the frame data:
    AVFrame *rgbpic, *yuvpic;
    
    // Allocating memory for each RGB frame, which will be lately converted to YUV:
    rgbpic=av_frame_alloc();
    rgbpic->format=AV_PIX_FMT_RGB24;
    rgbpic->width=width;
    rgbpic->height=height;
    ret=av_frame_get_buffer(rgbpic, 1);
    
    // Allocating memory for each conversion output YUV frame:
    yuvpic=av_frame_alloc();
    yuvpic->format=AV_PIX_FMT_YUV420P;
    yuvpic->width=width;
    yuvpic->height=height;
    ret=av_frame_get_buffer(yuvpic, 1);
    
    // After the format, codec and general frame data are set, we write the video in the frame generation loop:
    // std::vector<uint8_t> B(width*height*3);
    

    The commented-out vector above has the same structure as the one I described in my question; however, the RGB data is stored in the AVFrames in a specific way. Therefore, for the sake of exposition, let us assume we have instead a pointer to a structure of the form uint8_t[3] Matrix(int, int), whose way of accessing the color values of the pixel at a given coordinate (x, y) is Matrix(x, y)->Red, Matrix(x, y)->Green and Matrix(x, y)->Blue, giving, respectively, the red, green and blue values of the pixel at (x, y). The first argument stands for the horizontal position, increasing from left to right, and the second one for the vertical position, increasing from top to bottom.
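
    For concreteness, here is a minimal sketch of such a structure. It is not part of the original code: the names Pixel and Matrix and the row-major std::vector storage are illustrative assumptions; any container exposing the same access pattern will do.

    #include <cstdint>
    #include <vector>

    struct Pixel { uint8_t Red, Green, Blue; };

    class Matrix {
    public:
        Matrix(int width, int height) : w(width), pixels(width * height) {}
        // Access the pixel at column x (left to right) and row y (top to bottom).
        Pixel* operator()(int x, int y) { return &pixels[y * w + x]; }
    private:
        int w;
        std::vector<Pixel> pixels;
    };

    With this definition, generateframe(B, i) is expected to fill B, and B(x, y)->Red (and the other channels) reads the values back, exactly as done in the loop below.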

    That said, the for loop that transfers the data, encodes and writes each frame would be the following:

    Matrix B(width, height);
    int i, x, y;
    int got_output;
    AVPacket pkt;
    for (i=0; i<N; i++)
    {
        generateframe(B, i); // This one is the function that generates a different frame for each i.
        // The AVFrame data will be stored as RGBRGBRGB... row-wise, from left to right and from top to bottom, hence we have to proceed as follows:
        for (y=0; y<height; y++)
        {
            for (x=0; x<width; x++)
            {
                // rgbpic->linesize[0] is the number of bytes per row (3*width for packed RGB24).
                rgbpic->data[0][y*rgbpic->linesize[0]+3*x]=B(x, y)->Red;
                rgbpic->data[0][y*rgbpic->linesize[0]+3*x+1]=B(x, y)->Green;
                rgbpic->data[0][y*rgbpic->linesize[0]+3*x+2]=B(x, y)->Blue;
            }
        }
        sws_scale(convertCtx, rgbpic->data, rgbpic->linesize, 0, height, yuvpic->data, yuvpic->linesize); // Not actually scaling anything, just converting the RGB data to YUV and storing it in yuvpic.
        av_init_packet(&pkt);
        pkt.data = NULL;
        pkt.size = 0;
        yuvpic->pts = i; // The frame PTS is expressed in a reference unit of our own choosing, unrelated to the container format; here we simply use the frame number.
        ret=avcodec_encode_video2(c, &pkt, yuvpic, &got_output);
        if (got_output)
        {
            fflush(stdout);
            av_packet_rescale_ts(&pkt, (AVRational){1, 25}, stream->time_base); // Rescale the packet PTS/DTS from our frame unit (1/FPS, second argument) to the time base used by the selected container (third argument).
            pkt.stream_index = stream->index;
            printf("Write frame %6d (size=%6d)\n", i, pkt.size);
            av_interleaved_write_frame(oc, &pkt); // Write the encoded frame to the mp4 file.
            av_packet_unref(&pkt);
        }
    }
    // Writing the delayed frames:
    for (got_output = 1; got_output; i++) {
        ret = avcodec_encode_video2(c, &pkt, NULL, &got_output);
        if (got_output) {
            fflush(stdout);
            av_packet_rescale_ts(&pkt, (AVRational){1, 25}, stream->time_base);
            pkt.stream_index = stream->index;
            printf("Write frame %6d (size=%6d)\n", i, pkt.size);
            av_interleaved_write_frame(oc, &pkt);
            av_packet_unref(&pkt);
        }
    }
    av_write_trailer(oc); // Writing the end of the file.
    if (!(fmt->flags & AVFMT_NOFILE))
        avio_closep(&oc->pb); // Closing the file.
    avcodec_close(stream->codec);
    // Freeing all the allocated memory:
    sws_freeContext(convertCtx);
    av_frame_free(&rgbpic);
    av_frame_free(&yuvpic);
    avformat_free_context(oc);
    

    Side notes:

    For future reference, since the information available on the net concerning the time stamps (PTS/DTS) looks so confusing, I will also explain how I managed to solve the issues by setting the proper values. Setting these values incorrectly caused the output to be much bigger than the one obtained with the ffmpeg command-line tool, because the frame data was being redundantly written at smaller time intervals than the ones actually set by the FPS.

    First of all, it should be remarked that when encoding there are two kinds of time stamps: one associated with the frame (PTS), set at the pre-encoding stage, and two associated with the packet (PTS and DTS), set at the post-encoding stage. In the first case, it looks like the frame PTS values can be assigned using a custom unit of reference (with the only restriction that they must be equally spaced if one wants a constant FPS), so one can take, for instance, the frame number as we did in the code above. In the second case, we have to take into account the following parameters:

    • The time base of the output format container, in our case mp4 (=12800 Hz), whose information is held in stream->time_base.
    • The desired FPS of the video.
    • Whether or not the encoder generates B-frames (if it does not, the packet PTS and DTS must be set to the same value, but it is more complicated if it does, as in this example). See this answer to another related question for more references.

    The key here is that luckily it is not necessary to struggle with the computation of these quantities, as libav provides a function that computes the correct time stamps for the packet from the aforementioned data:

    av_packet_rescale_ts(AVPacket *pkt, AVRational tb_src, AVRational tb_dst) // tb_src: the unit in which the frame PTS was set (1/FPS here); tb_dst: the stream's time base.
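
    To make the numbers concrete (a sketch using the values from above: 25 fps and the mp4 time base of 1/12800 s), frame number i, whose PTS is i in units of 1/25 s, becomes a packet PTS of 512*i in units of 1/12800 s, since 12800/25 = 512. The helper below is hypothetical and only illustrates the arithmetic that av_packet_rescale_ts applies to the packet's PTS, DTS and duration:

    // Hypothetical helper: the PTS of frame 'frame_index' after rescaling to the mp4 stream time base.
    // av_rescale_q is declared in libavutil/mathematics.h, which is already included above.
    int64_t packet_pts_for_frame(int64_t frame_index)
    {
        AVRational tb_src = {1, 25};    // Unit in which the frame PTS was expressed (1/FPS).
        AVRational tb_dst = {1, 12800}; // The mp4 stream time base (stream->time_base).
        return av_rescale_q(frame_index, tb_src, tb_dst); // e.g. frame 3 -> 1536
    }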
    

    Thanks to these considerations, I was finally able to generate a sane output container and essentially the same compression rate as the one obtained with the command-line tool, which were the two remaining issues before investigating more deeply how the format header and trailer, as well as the time stamps, are properly set.

  • 2020-12-07 16:26

    avcodec_encode_video2 and avcodec_encode_audio2 seem to be deprecated. The current version of FFmpeg (4.2) provides a new API: avcodec_send_frame and avcodec_receive_packet.
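
    For reference, here is a rough sketch of how the per-frame encoding step from the accepted answer could look with the send/receive API. The function name encode_and_write is mine, and c, stream and oc are assumed to be the codec context, stream and format context from the accepted answer:

    // Sends one raw frame (or NULL to flush) and writes every packet the encoder returns.
    static int encode_and_write(AVCodecContext *c, AVFrame *frame, AVStream *stream, AVFormatContext *oc)
    {
        int ret = avcodec_send_frame(c, frame); // frame == NULL starts flushing the encoder.
        if (ret < 0)
            return ret;
        while (ret >= 0) {
            AVPacket pkt;
            av_init_packet(&pkt);
            pkt.data = NULL;
            pkt.size = 0;
            ret = avcodec_receive_packet(c, &pkt);
            if (ret == AVERROR(EAGAIN) || ret == AVERROR_EOF)
                return 0; // The encoder needs more input, or has been fully flushed.
            if (ret < 0)
                return ret;
            av_packet_rescale_ts(&pkt, (AVRational){1, 25}, stream->time_base);
            pkt.stream_index = stream->index;
            ret = av_interleaved_write_frame(oc, &pkt);
            av_packet_unref(&pkt);
        }
        return ret;
    }

    In the main loop one would then call encode_and_write(c, yuvpic, stream, oc) for every frame and encode_and_write(c, NULL, stream, oc) once at the end, replacing both avcodec_encode_video2 loops.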

  • 2020-12-07 16:32

    Thanks for your excellent work, @ksb496!

    One minor improvement:

    c=avcodec_alloc_context3(codec);
    

    should be better written as:

    c = stream->codec;
    

    to avoid a memory leak.

    If you don't mind, I've uploaded the complete ready-to-deploy library onto GitHub: https://github.com/apc-llc/moviemaker-cpp.git

  • 2020-12-07 16:32

    Thanks to ksb496 I managed to do this task, but in my case I needed to change some code to make it work as expected. I thought it might help others, so I decided to share it (with a two-year delay :D).

    I had an RGB buffer filled by a DirectShow sample grabber that I needed to make a video from. The RGB-to-YUV conversion from the answer above didn't do the job for me. I did it like this:

    // Copy the raw buffer into the AVFrame, filling the frame rows from the bottom up (a vertical flip of the source buffer):
    int stride = m_width * 3;
    int index = 0;
    for (int y = 0; y < m_height; y++) {
        for (int x = 0; x < stride; x++) {
            int j = (size - ((y + 1)*stride)) + x;
            m_rgbpic->data[0][j] = data[index];
            ++index;
        }
    }
    

    The data variable here is my RGB buffer (a simple BYTE*) and size is the buffer size in bytes. The loop fills the RGB AVFrame starting from the bottom left and going to the top right.

    The other thing is that my version of FFmpeg didn't have the av_packet_rescale_ts function. It is the latest version, but the FFmpeg docs don't say this function is deprecated anywhere, so I guess this might be the case for Windows only. Anyway, I used av_rescale_q instead, which does the same job, like this:

    AVPacket pkt;
    pkt.pts = av_rescale_q(pkt.pts, { 1, 25 }, m_stream->time_base);
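
    One caveat: av_packet_rescale_ts rescales the packet's PTS, DTS and duration together, so when replacing it with av_rescale_q the other fields should be converted too. A sketch, assuming the same member names as above:

    AVRational src_tb = { 1, 25 }; // The unit in which the frame PTS was set (1/FPS).
    pkt.pts = av_rescale_q(pkt.pts, src_tb, m_stream->time_base);
    if (pkt.dts != AV_NOPTS_VALUE)
        pkt.dts = av_rescale_q(pkt.dts, src_tb, m_stream->time_base);
    if (pkt.duration > 0)
        pkt.duration = av_rescale_q(pkt.duration, src_tb, m_stream->time_base);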
    

    And the last thing: with this kind of buffer, I needed to set up my swsContext with BGR24 instead of RGB24, like this:

    m_convert_ctx = sws_getContext(width, height, AV_PIX_FMT_BGR24, width, height,
            AV_PIX_FMT_YUV420P, SWS_FAST_BILINEAR, nullptr, nullptr, nullptr);
    