Question
We built a Java REST API to receive event data (such as a click on a buy button) and write that data to HDFS. Essentially, we open a stream for every host that is sending data (in JSON), or reuse an existing one, enrich the data with a timestamp, an event name and the hostname, and write it into an (FS)DataOutputStream:
1 public synchronized void writeToFile(String filename, String hostname, String content) throws IOException {
2 FSDataOutputStream stream = registry.getStream(filename, hostname);
3 stream.writeBytes(content);
4 stream.hflush();
5 }
First, we used stream.writeChars(content) in line 3, resulting in files like:
.{.".m.e.s.s.a.g.e.".:.".h.e.l.l.o.".}
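Looking into the implementation of DataOutputStream.writeChars(String s), you see that every char is written as two bytes: first the high byte (obtained with an 8-bit shift to the right), then the low byte. For ASCII characters the high byte is 0x00, hence the leading zero byte before every character. I don't understand why it is done this way.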
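To make the two-bytes-per-char behaviour visible, here is a minimal sketch. It uses a plain DataOutputStream over a ByteArrayOutputStream as a stand-in for the HDFS stream (FSDataOutputStream extends DataOutputStream); the class and method names are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical demo class, not part of the original REST API.
final class WriteCharsDemo {

    // Hex dump of what DataOutputStream.writeChars emits for the given string.
    static String writeCharsHex(String s) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeChars(s); // each char becomes two bytes, high byte first
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : buffer.toByteArray()) {
            hex.append(String.format("%02x ", b & 0xff));
        }
        return hex.toString().trim();
    }

    public static void main(String[] args) {
        // prints: 00 7b 00 22 00 6d
        System.out.println(writeCharsHex("{\"m"));
    }
}
```

The output is effectively UTF-16 in big-endian byte order without a byte order mark, which is why a viewer that expects single-byte text shows a dot before every character.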
Then I tried stream.writeUTF(content) in line 3, and the files looked much better:
.W{"message":"hello"}
But there were still a few bytes too many. Looking into the code, writeUTF(String s) first writes the number of bytes in s as a two-byte length prefix, and then the string itself. So .W represents the number of bytes in the event data. This was confirmed when varying the length of the event data produced different leading characters in the file.
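The length prefix can be inspected with a small sketch, again using a plain DataOutputStream over an in-memory buffer as a stand-in for the HDFS stream. The class and method names are hypothetical. If the same rule applies to the file above, .W would be the bytes 0x00 0x57, i.e. a length of 87, which presumably was the size of the enriched event.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical demo class, not part of the original REST API.
final class WriteUtfDemo {

    // All bytes produced by DataOutputStream.writeUTF for the given string.
    static byte[] writeUtf(String s) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Decodes the big-endian unsigned 16-bit length that writeUTF puts first.
    static int lengthPrefix(byte[] written) {
        return ((written[0] & 0xff) << 8) | (written[1] & 0xff);
    }

    public static void main(String[] args) {
        byte[] written = writeUtf("{\"message\":\"hello\"}");
        // prints: prefix=19 total=21
        System.out.println("prefix=" + lengthPrefix(written) + " total=" + written.length);
    }
}
```

Because the prefix is only 16 bits wide, writeUTF also refuses strings whose encoded form exceeds 65535 bytes, so it is meant to be paired with DataInputStream.readUTF rather than used for plain text files.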
So, as a last resort, I tried stream.writeBytes(content). Here everything looked fine:
{"message":"hello"}
until special characters came into play:
{"message":"hallöchen"} became {"message":"hall.chen"}. writeBytes discards the high 8 bits of each character before writing it. I think I need some UTF-8 functionality to write these chars correctly.
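The difference between the truncated byte and a proper UTF-8 encoding can be shown side by side. This is a sketch with hypothetical class and method names, using an in-memory DataOutputStream instead of the HDFS stream.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original REST API.
final class WriteBytesDemo {

    // Hex dump of what DataOutputStream.writeBytes emits (low byte of each char only).
    static String writeBytesHex(String s) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeBytes(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return toHex(buffer.toByteArray());
    }

    // Hex dump of the string's UTF-8 encoding, for comparison.
    static String utf8Hex(String s) {
        return toHex(s.getBytes(StandardCharsets.UTF_8));
    }

    private static String toHex(byte[] bytes) {
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) {
            hex.append(String.format("%02x ", b & 0xff));
        }
        return hex.toString().trim();
    }

    public static void main(String[] args) {
        String umlaut = "\u00f6"; // the letter o with diaeresis, as in "hallöchen"
        System.out.println(writeBytesHex(umlaut)); // prints: f6
        System.out.println(utf8Hex(umlaut));       // prints: c3 b6
    }
}
```

A lone 0xf6 byte is not a valid UTF-8 sequence, so a UTF-8 viewer renders it as a placeholder. For characters above U+00FF the loss is worse: the euro sign U+20AC, for example, is written as the single byte 0xac.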
So, now I'm kind of lost. How can I solve that?
Answer 1:
After reading "Why does DataOutputStream.writeUTF() add additional 2 bytes at the beginning?", I felt that the FSDataOutputStream methods mentioned above would not work for this. A quick (and maybe dirty) solution is this:
3 byte[] contentAsBytes = content.getBytes("UTF-8");
4 for (byte singleByte : contentAsBytes) {
5 stream.writeByte(singleByte);
6 }
A cleaner way would be not to use the FSDataOutputStream, but I couldn't find an alternative. Any hint is still appreciated.
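A possible simplification of the per-byte loop: since FSDataOutputStream extends java.io.DataOutputStream, it also inherits write(byte[]), so the encoded array can be handed over in one call. The sketch below uses a plain DataOutputStream over an in-memory buffer as a stand-in for the HDFS stream. The class and method names are hypothetical, and the behaviour against a real HDFS stream is an assumption that has not been tested here.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original REST API.
final class Utf8WriteDemo {

    // Encodes the content as UTF-8 and writes the whole array in a single call.
    static void writeUtf8(DataOutputStream stream, String content) throws IOException {
        byte[] contentAsBytes = content.getBytes(StandardCharsets.UTF_8);
        stream.write(contentAsBytes);
    }

    // Helper for the demo: writes to an in-memory stream and returns the raw bytes.
    static byte[] encodeThroughStream(String content) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            writeUtf8(out, content);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        byte[] written = encodeThroughStream("{\"message\":\"hall\u00f6chen\"}");
        // Decoding the bytes as UTF-8 gives back the original JSON, umlaut included.
        System.out.println(new String(written, StandardCharsets.UTF_8));
    }
}
```

In writeToFile, this would replace line 3 with the two statements from writeUtf8, followed by the existing stream.hflush(). Using StandardCharsets.UTF_8 instead of the string "UTF-8" also avoids the checked UnsupportedEncodingException.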
Answer 2:
Have you tried wrapping the FSDataOutputStream in a java.io.PrintStream and using its print methods? It is a long shot, but let me know if that works for you.
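A sketch of what that wrapping might look like, assuming the PrintStream is created with an explicit UTF-8 encoding (without it, PrintStream falls back to the platform default charset). A ByteArrayOutputStream stands in for the FSDataOutputStream here, and the class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original REST API.
final class PrintStreamDemo {

    // Wraps the target stream in a PrintStream that encodes text as UTF-8.
    static PrintStream utf8PrintStream(OutputStream target) {
        try {
            return new PrintStream(target, false, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    // Prints the content and reports whether the write succeeded.
    // The PrintStream is deliberately not closed, so the target stays open for reuse.
    static boolean printUtf8(OutputStream target, String content) {
        PrintStream printer = utf8PrintStream(target);
        printer.print(content);
        // checkError() flushes the PrintStream and returns true if a write failed.
        return !printer.checkError();
    }

    public static void main(String[] args) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        boolean ok = printUtf8(buffer, "{\"message\":\"hall\u00f6chen\"}");
        System.out.println(ok + " " + new String(buffer.toByteArray(), StandardCharsets.UTF_8));
    }
}
```

Two caveats apply: PrintStream swallows IOExceptions, so checkError() is the only way to notice a failed write, and flushing the PrintStream does not replace the hflush() call on the underlying FSDataOutputStream.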
Source: https://stackoverflow.com/questions/19687576/unwanted-chars-written-from-java-rest-api-to-hadoopdfs-using-fsdataoutputstream