I have an uncompressed .wav file that I turn into a 96 kbps MP3 file:
ffmpeg.exe -i song.wav -vn -b:a 96000 -ac 2 -ar 48000 -acodec libmp3lame -y song.mp3
In FFmpeg's LAME wrapper (libavcodec/libmp3lame.c), the delay added at the start of the stream is set as:
avctx->initial_padding = lame_get_encoder_delay(s->gfp) + 528 + 1;
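As a rough worked example, the same expression can be evaluated outside FFmpeg. This is a minimal sketch with hypothetical helper names. It assumes an encoder delay of 576 samples, a value LAME commonly reports but which is not guaranteed; in FFmpeg the real number comes from lame_get_encoder_delay(). Reading the constant 528 as the decoder delay described in the FAQ is an interpretation, not something the FFmpeg line states.

```c
/* Hypothetical helper mirroring FFmpeg's expression:
 * LAME's reported encoder delay + 528 (presumably the decoder
 * filterbank delay from the LAME FAQ) + 1 extra sample. */
int ffmpeg_initial_padding(int lame_encoder_delay)
{
    return lame_encoder_delay + 528 + 1;
}

/* Convert a padding length in samples (per channel) to milliseconds. */
double padding_ms(int samples, int sample_rate)
{
    return 1000.0 * samples / sample_rate;
}
```

With the assumed 576-sample encoder delay, ffmpeg_initial_padding(576) gives 1105 samples, which at -ar 48000 is roughly 23.0 ms of leading padding.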
From the FAQ of the LAME project:
2. Why does LAME add silence to the beginning of each song?
DECODER DELAY AT START OF FILE:
All decoders I have tested introduce a delay of 528 samples. That is, after decoding an mp3 file, the output will have 528 samples of 0's appended to the front. This is because the standard MDCT/filterbank routines used by the ISO have a 528 sample delay. It would be possible to write an MDCT/filterbank routine with a 0 sample delay (see description of Takehiro's MDCT/filterbank routine used in LAME encoding below) but I don't know that anyone has done this. Furthermore, because of the overlapped nature of MDCT frames, the first half of the first granule (1 granule=576 samples) doesn't have a previous frame to overlap with, resulting in attenuation of the first N samples. The value of N depends on the window type. For "STOP_TYPE" and "SHORT_TYPE", N=96, while for "START_TYPE" and "NORMAL_TYPE", N=288. The first frame produced by LAME 3.56 and up will always be of STOP_TYPE or SHORT_TYPE.
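The window-type rule above can be written as a small lookup. This is only a sketch: the enum names are taken from the FAQ wording, while their numeric order and the helper name attenuated_samples() are hypothetical choices for illustration.

```c
/* Window (block) types as named in the LAME FAQ.
 * The numeric order here is an illustrative assumption. */
enum block_type { NORMAL_TYPE, START_TYPE, SHORT_TYPE, STOP_TYPE };

/* Leading samples attenuated in the first granule because there is
 * no previous frame to overlap with (values as stated in the FAQ). */
int attenuated_samples(enum block_type t)
{
    switch (t) {
    case STOP_TYPE:
    case SHORT_TYPE:
        return 96;
    case START_TYPE:
    case NORMAL_TYPE:
    default:
        return 288; /* half of a 576-sample granule */
    }
}
```

Since the FAQ says LAME 3.56 and later always starts with a STOP_TYPE or SHORT_TYPE frame, this lookup would give 96 attenuated samples for LAME output.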
ENCODER DELAY AT START OF FILE:
ISO based encoders (BladeEnc, 8hz-mp3, etc) use an MDCT/filterbank routine similar to the one used in decoding, and thus also introduce their own 528 sample delay. A .wav file encoded & decoded will have a 1056 sample delay (1056 samples will be appended to the beginning).
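To make the round-trip arithmetic concrete: an ISO-based encoder and decoder each contribute 528 samples, for 1056 in total. The sketch below skips that many leading sample frames from a decoded buffer. The helper names are hypothetical, the buffer is assumed to be interleaved 16-bit PCM, and the FAQ's sample counts are assumed to be per-channel sample frames. For LAME/FFmpeg output the amount to skip would presumably differ, and this sketch does not read it from the file.

```c
#include <stddef.h>
#include <stdint.h>

#define ISO_ENCODER_DELAY 528 /* encoder filterbank delay (per the FAQ) */
#define ISO_DECODER_DELAY 528 /* decoder filterbank delay (per the FAQ) */

/* Total leading delay after an ISO-style encode/decode round trip. */
size_t iso_roundtrip_delay(void)
{
    return ISO_ENCODER_DELAY + ISO_DECODER_DELAY; /* 1056 sample frames */
}

/* Skip `delay` leading sample frames of interleaved PCM.
 * Returns a pointer into `pcm` and stores the number of frames left
 * in *remaining; if the buffer is shorter than the delay, none remain. */
const int16_t *skip_leading_delay(const int16_t *pcm, size_t frames,
                                  int channels, size_t delay,
                                  size_t *remaining)
{
    if (delay >= frames) {
        *remaining = 0;
        return pcm + frames * (size_t)channels; /* one past the end */
    }
    *remaining = frames - delay;
    return pcm + delay * (size_t)channels;
}
```

For example, a 2000-frame stereo buffer passed through skip_leading_delay() with iso_roundtrip_delay() leaves 944 frames, starting at element 2112 of the interleaved array.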
The delay in your output doesn't exactly match the figures in the FAQ, probably because of implementation details I'm not aware of, but it is expected behavior, not a bug.