When I was trying to parse XML using SAX over sockets, I came across a strange occurrence. On analysis, I noticed that DataOutputStream adds 2 bytes in front of my data.
The output of DataOutputStream.writeUTF() is a custom format, intended to be read by DataInputStream.readUTF().
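To see the prefix for yourself, here is a small sketch that writes a short string with writeUTF into a ByteArrayOutputStream, prints the first two bytes, and then reads the value back with DataInputStream.readUTF(). The class and helper names (WriteUtfPrefix, writeUtfBytes, readUtf) are made up for illustration:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class WriteUtfPrefix {
    // Returns exactly what DataOutputStream.writeUTF puts on the wire for s.
    static byte[] writeUtfBytes(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // Decodes writeUTF output the way it is meant to be read: with readUTF.
    static String readUtf(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            return in.readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = writeUtfBytes("<a/>");
        System.out.println(bytes.length);                     // 6
        System.out.printf("%02X %02X%n", bytes[0], bytes[1]); // 00 04
        System.out.println(readUtf(bytes));                   // <a/>
    }
}
```

The first two bytes are the big-endian length (4), followed by the four ASCII bytes of the string; readUTF consumes that length and hands back the original text.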
The Javadoc of the writeUTF method you are calling says:
Writes a string to the underlying output stream using modified UTF-8 encoding in a machine-independent manner.
First, two bytes are written to the output stream as if by the writeShort method giving the number of bytes to follow. This value is the number of bytes actually written out, not the length of the string. Following the length, each character of the string is output, in sequence, using the modified UTF-8 encoding for the character. If no exception is thrown, the counter written is incremented by the total number of bytes written to the output stream. This will be at least two plus the length of str, and at most two plus thrice the length of str.
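The "modified" part matters too: writeUTF encodes U+0000 as two bytes and writes each half of a surrogate pair as its own 3-byte sequence, so its payload can differ from what String.getBytes(StandardCharsets.UTF_8) produces. A minimal sketch comparing the two (ModifiedUtf8Sizes and its methods are hypothetical names):

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class ModifiedUtf8Sizes {
    // Payload bytes writeUTF emits for s, excluding the 2-byte length prefix.
    static int modifiedUtf8Length(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.size() - 2;
    }

    // Bytes produced by standard UTF-8 encoding of s.
    static int standardUtf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // ASCII, e-acute, euro sign, NUL, and one emoji stored as a surrogate pair.
        String[] samples = {"abc", "\u00E9", "\u20AC", "\u0000", "\uD83D\uDE00"};
        for (String s : samples) {
            System.out.printf("length=%d modified=%d standard=%d%n",
                    s.length(), modifiedUtf8Length(s), standardUtf8Length(s));
        }
    }
}
```

Run as written, the NUL sample should show 2 modified bytes against 1 standard byte, and the emoji (String.length() of 2) should show 6 against 4, which together with the 2-byte prefix is exactly the "two plus thrice the length" upper bound.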
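Always use the same type of stream for reading and writing data. If you are feeding the stream directly into a SAX parser, you should not use a DataOutputStream: the parser expects nothing but XML bytes, and it will read the two-byte length prefix as part of the document.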
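A quick way to see why DataOutputStream and a SAX parser do not mix is to hand the same document to the JDK's built-in SAXParser twice, once as plain UTF-8 bytes and once as writeUTF output. This is a sketch with hypothetical names (SaxFraming, rootElement, viaWriteUtf). For a short document the first length byte is 0x00, which is not a legal XML character, so the second parse should fail:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class SaxFraming {
    // Parses xml with the JDK's default SAX parser and returns the root element's
    // name, or null if the parser rejected the input as malformed.
    static String rootElement(byte[] xml) {
        StringBuilder root = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                if (root.length() == 0) {
                    root.append(qName); // first element seen is the root
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new ByteArrayInputStream(xml), handler);
            return root.toString();
        } catch (SAXException e) {
            return null; // e.g. stray length bytes in front of the document
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // The bytes a DataOutputStream.writeUTF call would put on the wire.
    static byte[] viaWriteUtf(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) {
        String doc = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><order id=\"1\"/>";
        System.out.println(rootElement(doc.getBytes(StandardCharsets.UTF_8))); // order
        System.out.println(rootElement(viaWriteUtf(doc)));                     // null
    }
}
```

Returning null on SAXException keeps the sketch short; real code would probably want to log or propagate the parse error instead.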
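Just write the string's UTF-8 bytes directly:
BufferedOutputStream bos = new BufferedOutputStream(socket.getOutputStream());
bos.write(os.getBytes(StandardCharsets.UTF_8)); // os is the XML document as a String
bos.flush(); // BufferedOutputStream holds bytes until it is flushed or closed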
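Here is a sketch of both ends over a loopback socket, assuming one XML document per connection (SaxOverSocket and sendAndParse are illustrative names, not an existing API). The receiving side passes the socket's input stream straight to the SAX parser. The sending side writes raw UTF-8 bytes as in the snippet above and then shuts down its output, because the parser keeps reading until end of stream and would otherwise typically sit waiting for more data:

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class SaxOverSocket {
    // Sends xml over a loopback TCP connection as raw UTF-8 bytes and returns the
    // element names the receiving SAX parser reported, in document order.
    static List<String> sendAndParse(String xml) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback)) {
            // Sending side: no DataOutputStream, just the document's bytes.
            Thread sender = new Thread(() -> {
                try (Socket socket = new Socket(loopback, server.getLocalPort())) {
                    OutputStream bos = new BufferedOutputStream(socket.getOutputStream());
                    bos.write(xml.getBytes(StandardCharsets.UTF_8));
                    bos.flush();
                    // The parser reads until end of stream, so signal it explicitly.
                    socket.shutdownOutput();
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            sender.start();

            // Receiving side: the socket's input stream goes straight into SAX.
            List<String> elements = new ArrayList<>();
            try (Socket peer = server.accept(); InputStream in = peer.getInputStream()) {
                SAXParserFactory.newInstance().newSAXParser().parse(in, new DefaultHandler() {
                    @Override
                    public void startElement(String uri, String localName,
                                             String qName, Attributes attributes) {
                        elements.add(qName);
                    }
                });
            }
            sender.join();
            return elements;
        } catch (Exception e) {
            throw new IllegalStateException("XML round trip failed", e);
        }
    }

    public static void main(String[] args) {
        String doc = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><order><item/></order>";
        System.out.println(sendAndParse(doc)); // [order, item]
    }
}
```

Sending several documents over one connection would need some framing of your own, which this sketch does not attempt.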