When playing back the decoded audio, I've managed to produce a variety of sounds from gurgling to screeching to demonic chants. The closest of which sounds similar to being pla
Looking at your code, I assume you misunderstood the meaning of "frame length". You are passing the number of bytes, but the frame length is a number of samples, and it depends directly on how the file was encoded.
An audio file recorded at 48000 Hz has 48000 samples per second. Each sample is usually a 16-bit integer (2 bytes), which means you have 48000 * 2 = 96000 bytes per second per channel in the non-encoded form (PCM/WAV).
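To make that arithmetic concrete, here is a small sketch in Python. The function name `pcm_bytes_per_second` is made up for illustration, and it assumes uncompressed PCM with 16-bit samples unless you say otherwise.

```python
def pcm_bytes_per_second(sample_rate_hz: int,
                         bytes_per_sample: int = 2,
                         channels: int = 1) -> int:
    """Raw PCM data rate: samples per second * bytes per sample * channels."""
    return sample_rate_hz * bytes_per_sample * channels


mono_rate = pcm_bytes_per_second(48000)                # 16-bit mono
stereo_rate = pcm_bytes_per_second(48000, channels=2)  # 16-bit stereo
print(mono_rate, stereo_rate)
```

Note that this is the size of the decoded audio only. The encoded data is much smaller, so its byte count tells you nothing about how many samples it holds.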
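An audio encoder such as Opus takes a block of consecutive samples and encodes them together into one packet. THIS block is the frame, and its length is counted in samples, not in bytes of encoded data. For Opus at 48 kHz the possible frame sizes are 120, 240, 480, 960, 1920, and 2880 samples, which correspond to 2.5, 5, 10, 20, 40, and 60 ms of audio.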
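The relation between frame duration, frame size, and output buffer size can be sketched like this in Python. The helper names are hypothetical and not part of any Opus binding. I am assuming your decoder wants the frame size as samples per channel and writes 16-bit PCM, so check that against the library you actually use.

```python
# Frame durations Opus supports, in milliseconds.
OPUS_FRAME_DURATIONS_MS = (2.5, 5, 10, 20, 40, 60)


def frame_size_in_samples(sample_rate_hz: int, duration_ms: float) -> int:
    """Samples per channel contained in one frame of the given duration."""
    return int(sample_rate_hz * duration_ms / 1000)


def decoded_frame_bytes(frame_size: int,
                        channels: int = 1,
                        bytes_per_sample: int = 2) -> int:
    """Bytes of PCM produced by decoding one frame (assumes 16-bit samples)."""
    return frame_size * channels * bytes_per_sample


sizes = [frame_size_in_samples(48000, d) for d in OPUS_FRAME_DURATIONS_MS]
print(sizes)

# Buffer needed for the largest 48 kHz frame in stereo.
print(decoded_frame_bytes(max(sizes), channels=2))
```

If you do not know which duration the encoder used, sizing the output buffer for the largest frame is the safe choice, because a smaller frame still fits.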