I am working on capturing and streaming audio to RTMP server at a moment. I work under MacOS (in Xcode), so for capturing audio sample-buffer I use AVFoundation-framework.
If anyone ended up here, I had the same issue, and just as @Mohit pointed out for AAC each audio frame has to be broken down into 1024 bytes chunks.
example:
uint8_t *buffer = (uint8_t*) malloc(1024);
AVFrame *frame = av_frame_alloc();
while((fread(buffer, 1024, 1, fp)) == 1) {
frame->data[0] = buffer;
}