Question
I have a server that encodes real-time voice into mono or stereo MP3 using libmp3lame and sends it chunk by chunk over a WebSocket.
I'm trying to build an Android app that receives those MP3 chunks and plays them with the most appropriate audio player Android offers. I went with AudioTrack since it seems easy to feed chunks to the player and it is "stream" oriented (I'm writing byte arrays to the track, not playing a full song stored locally on the phone).
Since AudioTrack does not support compressed audio formats (such as MP3), I have to decode those chunks into PCM before playing them. I'm using the well-known JLayer library for this real-time decoding. With that in place, I can write each decoded sample to my AudioTrack and hear what the server is sending.
My problem is that the received/played audio is badly choppy. (I can understand everything the speaker is saying, but the quality is bad, as if the speaker had a "robotic voice".)
Here is the code I'm using to receive/decode/play those byte[].
public void addSample(byte[] data) throws BitstreamException, DecoderException, IOException {
    // JLayer decoder
    Decoder decoder = new Decoder();
    // Input stream wrapping the byte[] voice data
    InputStream bis = new ByteArrayInputStream(data);
    ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
    Bitstream bits = new Bitstream(bis);
    // Decoding MP3 data into PCM, into a PCM buffer
    SampleBuffer pcmBuffer = (SampleBuffer) decoder.decodeFrame(bits.readFrame(), bits);
    // Writing the PCM buffer data to the AudioTrack to play it
    mTrack.write(pcmBuffer.getBuffer(), 0, pcmBuffer.getBufferLength());
    bits.closeFrame();
}
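For comparison, here is a minimal sketch of the same method that keeps decoding until readFrame() returns null, in case a single WebSocket chunk can hold more than one MP3 frame (as written, addSample only decodes the first frame of each chunk). The method name is made up:

// Sketch only: same as addSample, but decodes every frame found in the chunk.
// Needs javazoom.jl.decoder.{Bitstream, Decoder, Header, SampleBuffer}.
public void addSampleAllFrames(byte[] data) throws BitstreamException, DecoderException {
    Decoder decoder = new Decoder();
    Bitstream bits = new Bitstream(new ByteArrayInputStream(data));
    Header frameHeader;
    // readFrame() returns null once no further frame can be parsed from this chunk
    while ((frameHeader = bits.readFrame()) != null) {
        SampleBuffer pcmBuffer = (SampleBuffer) decoder.decodeFrame(frameHeader, bits);
        mTrack.write(pcmBuffer.getBuffer(), 0, pcmBuffer.getBufferLength());
        bits.closeFrame();
    }
}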
And here is my AudioTrack initialization
mTrack = new AudioTrack.Builder()
        .setAudioAttributes(new AudioAttributes.Builder()
                .setUsage(AudioAttributes.USAGE_MEDIA)
                .setContentType(AudioAttributes.CONTENT_TYPE_SPEECH)
                .build())
        .setAudioFormat(new AudioFormat.Builder()
                .setEncoding(AudioFormat.ENCODING_PCM_16BIT)
                .setSampleRate(48000)
                .setChannelMask(AudioFormat.CHANNEL_OUT_STEREO)
                .build())
        .setBufferSizeInBytes(AudioTrack.getMinBufferSize(48000, AudioFormat.CHANNEL_OUT_STEREO, AudioFormat.ENCODING_PCM_16BIT))
        .build();
mTrack.play();
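In case it matters, here is a sketch of how the format could be taken from the first decoded frame header instead of hard-coding 48000 Hz stereo, using JLayer's Header.frequency() and Header.mode() (the helper name is made up):

// Sketch: build the AudioTrack from the first frame's header instead of hard-coded values.
private AudioTrack buildTrackFromHeader(Header header) {
    int sampleRate = header.frequency();
    int channelMask = (header.mode() == Header.SINGLE_CHANNEL)
            ? AudioFormat.CHANNEL_OUT_MONO
            : AudioFormat.CHANNEL_OUT_STEREO;
    AudioTrack track = new AudioTrack.Builder()
            .setAudioAttributes(new AudioAttributes.Builder()
                    .setUsage(AudioAttributes.USAGE_MEDIA)
                    .setContentType(AudioAttributes.CONTENT_TYPE_SPEECH)
                    .build())
            .setAudioFormat(new AudioFormat.Builder()
                    .setEncoding(AudioFormat.ENCODING_PCM_16BIT)
                    .setSampleRate(sampleRate)
                    .setChannelMask(channelMask)
                    .build())
            .setBufferSizeInBytes(AudioTrack.getMinBufferSize(sampleRate, channelMask, AudioFormat.ENCODING_PCM_16BIT))
            .build();
    track.play();
    return track;
}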
To understand what was happening, I tried to log each value contained in the pcmBuffer. It turns out that a large part of that data is 0 at the very beginning of the buffer (I'd say 1/5 of the buffer is 0, all of it located at the beginning). I then used an oscilloscope to capture the signal my Android phone was receiving. Here is the result:
As you can see, each frame is present, but has some "blank" or 0 data values. Those 0s at the beginning of each frame make the signal choppy and pretty annoying to listen to.
I have no idea whether this comes from the MP3 signal itself, the way I'm playing it, AudioTrack, JLayer, or the way I'm decoding it. So if anyone has an idea, it would be really awesome.
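For reference, the logging on the pcmBuffer mentioned above was roughly this (helper name made up), counting how many zero samples each decoded frame starts with:

// Debug helper sketch: count the zero samples at the start of a decoded frame.
private static int countLeadingZeros(SampleBuffer pcmBuffer) {
    short[] pcm = pcmBuffer.getBuffer();
    int length = pcmBuffer.getBufferLength();
    int zeros = 0;
    while (zeros < length && pcm[zeros] == 0) {
        zeros++;
    }
    return zeros;
}

// Called right after decodeFrame():
// System.out.println("leading zeros: " + countLeadingZeros(pcmBuffer) + " / " + pcmBuffer.getBufferLength());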
EDIT:
I found out something interesting. By decoding each frame header I can access a lot of information, such as the duration in ms of each frame. I logged it:
System.out.println(bits.readFrame().ms_per_frame());
I found out that each of my frames is 24 ms long. Looking back at the oscilloscope, I can see that each frame does indeed take 24 ms, but the beginning/end of each frame is filled with 0s. So first of all, is it a decoding problem? If not, how can I get a clean signal without a small dropout in each frame? I've been printing all the data each frame gives me, and each frame starts with a lot of zeros. How am I supposed to get a clean signal if each frame has some kind of audio void?
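For completeness, a sketch of the per-frame header logging (keeping the Header returned by readFrame() in a local variable so the same object is used for both the log and decodeFrame()):

// Sketch of the per-frame header logging described above.
Header frameHeader = bits.readFrame();
if (frameHeader != null) {
    System.out.println("ms per frame: " + frameHeader.ms_per_frame()
            + ", sample rate: " + frameHeader.frequency() + " Hz");
    SampleBuffer pcmBuffer = (SampleBuffer) decoder.decodeFrame(frameHeader, bits);
    mTrack.write(pcmBuffer.getBuffer(), 0, pcmBuffer.getBufferLength());
    bits.closeFrame();
}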
If I print the MP3 data I'm receiving for each frame (96 bytes), the first four bytes (probably the header?) always have the same values: "-1, -5, 20, -60". Then there is a fifth byte that is always equal to 0, and sometimes a sixth byte that is also equal to 0. Should I be removing those?
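A small sketch of dumping the first bytes of each chunk in hex (helper name made up); the values -1, -5, 20, -60 correspond to FF FB 14 C4, where FF FB looks like the usual MPEG-1 Layer III frame sync:

// Sketch: dump the first few bytes of a received chunk in hex to inspect the frame header.
private static String toHex(byte[] data, int count) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < Math.min(count, data.length); i++) {
        sb.append(String.format("%02X ", data[i] & 0xFF));
    }
    return sb.toString().trim();
}

// Example: System.out.println(toHex(data, 6)); // prints e.g. "FF FB 14 C4 00 00"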
Source: https://stackoverflow.com/questions/47414321/android-mp3-jlayer-missing-data