I am working on capturing and streaming audio to an RTMP server at the moment. I work under macOS (in Xcode), so for capturing the audio sample buffer I use the AVFoundation framework.
I also ended up here after having a similar problem. I'm reading audio and video from a Blackmagic DeckLink SDI card in 720p50, which means I had 960 samples per video frame (48k/50fps) that I wanted to encode together with the video. I got really weird audio when sending only 960 samples to aacenc, and it didn't really complain about this fact either.
I started to use AVAudioFifo (see ffmpeg/doc/examples/transcode_aac.c) and kept adding samples to it until I had enough to satisfy aacenc. This will mean some samples play slightly late, I guess, since the pts will be set on 1024 samples when the first 960 should really have another value. But it's not really noticeable as far as I can hear/see.
If anyone ended up here: I had the same issue, and just as @Mohit pointed out, for AAC each audio frame has to be broken down into chunks of 1024 samples.
example:
// 1024 samples of interleaved S16 mono = 2048 bytes per chunk
int chunk_bytes = 1024 * sizeof(int16_t);
uint8_t *buffer = (uint8_t *) malloc(chunk_bytes);
AVFrame *frame = av_frame_alloc();
frame->nb_samples = 1024;
while (fread(buffer, chunk_bytes, 1, fp) == 1) {
    frame->data[0] = buffer;
    // ... send frame to the encoder here ...
}
You have to break the sample buffer into chunks of size 1024. I did this for recording MP3 on Android; for more info follow these links: link1, link2
I got a similar problem. I was encoding PCM packets to AAC, and the length of the PCM packets was sometimes smaller than 1024.
If I encoded a packet smaller than 1024, the audio would play too slow; on the other hand, if I threw it away, the audio would get faster. From my observation, the swr_convert function didn't do any automatic buffering.
I ended up with a buffering scheme: packets are filled into a 1024-sample buffer, and the buffer gets encoded and cleared every time it's full.
The function to fill the buffer is below:
// put frame data into buffer of fixed size
bool ffmpegHelper::putAudioBuffer(const AVFrame *pAvFrameIn, AVFrame **pAvFrameBuffer,
                                  AVCodecContext *dec_ctx, int frame_size, int &k0) {
  // prepare pFrameAudio
  if (!(*pAvFrameBuffer)) {
    if (!(*pAvFrameBuffer = av_frame_alloc())) {
      av_log(NULL, AV_LOG_ERROR, "Alloc frame failed\n");
      return false;
    } else {
      (*pAvFrameBuffer)->format = dec_ctx->sample_fmt;
      (*pAvFrameBuffer)->channels = dec_ctx->channels;
      (*pAvFrameBuffer)->sample_rate = dec_ctx->sample_rate;
      (*pAvFrameBuffer)->nb_samples = frame_size;
      int ret = av_frame_get_buffer(*pAvFrameBuffer, 0);
      if (ret < 0) {
        char err[500];
        av_log(NULL, AV_LOG_ERROR, "get audio buffer failed: %s\n",
               av_make_error_string(err, AV_ERROR_MAX_STRING_SIZE, ret));
        return false;
      }
      (*pAvFrameBuffer)->nb_samples = 0;
      (*pAvFrameBuffer)->pts = pAvFrameIn->pts;
    }
  }

  // copy input data to buffer
  int n_channels = pAvFrameIn->channels;
  int new_samples = min(pAvFrameIn->nb_samples - k0,
                        frame_size - (*pAvFrameBuffer)->nb_samples);
  int k1 = (*pAvFrameBuffer)->nb_samples;

  if (pAvFrameIn->format == AV_SAMPLE_FMT_S16) {
    int16_t *d_in = (int16_t *)pAvFrameIn->data[0];
    d_in += n_channels * k0;
    int16_t *d_out = (int16_t *)(*pAvFrameBuffer)->data[0];
    d_out += n_channels * k1;

    for (int i = 0; i < new_samples; ++i) {
      for (int j = 0; j < pAvFrameIn->channels; ++j) {
        *d_out++ = *d_in++;
      }
    }
  } else {
    printf("not handled format for audio buffer\n");
    return false;
  }

  (*pAvFrameBuffer)->nb_samples += new_samples;
  k0 += new_samples;
  return true;
}
And the loop to fill the buffer and encode is below:
// transcoding needed
int got_frame;
AVMediaType stream_type;
// decode the packet (do it yourself)
decodePacket(packet, dec_ctx, &pAvFrame_, got_frame);

if (enc_ctx->codec_type == AVMEDIA_TYPE_AUDIO) {
  ret = 0;
  // break audio packet down to buffer
  if (enc_ctx->frame_size > 0) {
    int k = 0;
    while (k < pAvFrame_->nb_samples) {
      if (!putAudioBuffer(pAvFrame_, &pFrameAudio_, dec_ctx, enc_ctx->frame_size, k))
        return false;
      if (pFrameAudio_->nb_samples == enc_ctx->frame_size) {
        // the buffer is full, encode it (do it yourself)
        ret = encodeFrame(pFrameAudio_, stream_index, got_frame, false);
        if (ret < 0)
          return false;
        pFrameAudio_->pts += enc_ctx->frame_size;
        pFrameAudio_->nb_samples = 0;
      }
    }
  } else {
    ret = encodeFrame(pAvFrame_, stream_index, got_frame, false);
  }
} else {
  // encode packet directly
  ret = encodeFrame(pAvFrame_, stream_index, got_frame, false);
}