I am interfacing with a Java application via Python. I need to be able to construct byte sequences which contain UTF-8 strings. Java uses a modified UTF-8 encoding in
I know this question is very old, but I still want to contribute, since I ran into the same problem and solved it.
I found the implementation of this modified UTF-8 in the OpenJDK sources and translated it to Python. Here is a link to the gist I created.
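For readers who cannot open the gist, here is a minimal sketch of the encoding side. It is not the code from the gist, just my reading of how Java's modified UTF-8 differs from standard UTF-8: U+0000 becomes the two bytes 0xC0 0x80, and characters above U+FFFF are written as a surrogate pair with each surrogate encoded as three bytes. The function names `encode_modified_utf8` and `write_utf` are made up for this example, and the two-byte length prefix in `write_utf` assumes the Java side reads the string with `DataInput.readUTF`.

```python
import struct


def encode_modified_utf8(text: str) -> bytes:
    """Encode text the way Java's modified UTF-8 does (hypothetical helper)."""
    out = bytearray()
    # Java strings are sequences of UTF-16 code units, so split the text
    # into 16-bit units first; "surrogatepass" lets lone surrogates through.
    utf16 = text.encode("utf-16-be", "surrogatepass")
    for i in range(0, len(utf16), 2):
        unit = (utf16[i] << 8) | utf16[i + 1]
        if unit == 0:
            # NUL is written as an overlong two-byte form, never as 0x00.
            out += b"\xc0\x80"
        elif unit <= 0x7F:
            out.append(unit)
        elif unit <= 0x7FF:
            out.append(0xC0 | (unit >> 6))
            out.append(0x80 | (unit & 0x3F))
        else:
            # Covers the rest of the BMP and each half of a surrogate pair.
            out.append(0xE0 | (unit >> 12))
            out.append(0x80 | ((unit >> 6) & 0x3F))
            out.append(0x80 | (unit & 0x3F))
    return bytes(out)


def write_utf(text: str) -> bytes:
    """Prefix the encoded bytes with a big-endian unsigned 16-bit length.

    Assumption: the receiver uses DataInput.readUTF, which expects this prefix.
    """
    payload = encode_modified_utf8(text)
    if len(payload) > 0xFFFF:
        raise ValueError("encoded string is longer than 65535 bytes")
    return struct.pack(">H", len(payload)) + payload
```

For plain ASCII and most BMP text the output is identical to `text.encode("utf-8")`; it only differs for NUL characters and for characters outside the BMP, such as emoji.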