Java JNI: Passing multibyte characters from Java to C

Submitted by 我与影子孤独终老i on 2019-12-11 18:40:52

Question


I'm once again messing around with the Java Native Interface, and I've run into another interesting problem. I'm sending a file path to C via JNI and then doing some I/O. The characters I have the most trouble with are 'äåö'. Here is a short demo program with exactly the same problem:

Java:

public class java {

  private static native void printBytes(String text);
  static{
    System.loadLibrary("dll");
  }

  public static void main(String[] args){
    printBytes("C:/Users/ä-å-ö/Documents/Bla.txt");
  }
}

C:

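#include "java.h"
#include <jni.h>
#include <stdio.h>

JNIEXPORT void JNICALL Java_java_printBytes(JNIEnv *env, jclass cls, jstring text){
  /* GetStringUTFChars returns the string as (modified) UTF-8 bytes */
  const char *text_input = (*env)->GetStringUTFChars(env, text, NULL);
  if (text_input == NULL) {
    return; /* an OutOfMemoryError is already pending */
  }
  /* length in bytes, not characters: "ä" alone takes 2 bytes */
  jsize size = (*env)->GetStringUTFLength(env, text);
  (void)size; /* not used in this demo */
  printf("%s\n", text_input);
  (*env)->ReleaseStringUTFChars(env, text, text_input);
}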

Output: C:/Users/├ñ-├Ñ-├Â/Documents/Bla.txt

This is NOT the result I want; I would like it to print the same string as in Java.


Answer 1:


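You are dealing with platform-specific character encoding issues. Although the standard C printf should be able to handle multibyte (UTF-8) encoded strings, the one provided by Windows/MSVC is anything but standard and cannot. On a standards-conforming non-Windows platform I would expect your code to work. The string coming from Java is in UTF-8 (strictly, JNI's modified UTF-8, which is identical to standard UTF-8 for characters such as ä, å and ö), a multibyte encoding, whereas the MS printf and the Windows console treat every byte as one character in the console's single-byte code page. This works for ASCII characters because they have the same single-byte values in UTF-8. It does not work for characters outside ASCII: ä, for example, is encoded as the two bytes 0xC3 0xA4, which code page 850 displays as ├ñ, exactly as in your output.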
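
To make the byte-versus-character mismatch concrete, here is a small sketch (the helper names utf8_codepoints and utf8_hex are hypothetical, not part of JNI or the CRT). It counts code points by skipping UTF-8 continuation bytes (bit pattern 10xxxxxx) and dumps the raw bytes as hex. The strings are written with \x escapes so the result does not depend on the source file's encoding.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Count code points in a UTF-8 string: every byte that is NOT a
   continuation byte (10xxxxxx) starts a new character. */
size_t utf8_codepoints(const char *s)
{
    size_t count = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; ++p) {
        if ((*p & 0xC0) != 0x80)
            ++count;
    }
    return count;
}

/* Write the bytes of s into out as space-separated hex pairs, e.g. "C3 A4".
   Stops early (still NUL-terminated) if out is too small. */
void utf8_hex(const char *s, char *out, size_t outsz)
{
    size_t used = 0;
    if (outsz == 0)
        return;
    out[0] = '\0';
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; ++p) {
        int n = snprintf(out + used, outsz - used, used ? " %02X" : "%02X", (unsigned)*p);
        if (n < 0 || (size_t)n >= outsz - used)
            break; /* output truncated */
        used += (size_t)n;
    }
}
```

For the demo path segment "ä-å-ö" (bytes C3 A4 2D C3 A5 2D C3 B6), strlen reports 8 bytes while utf8_codepoints reports 5 characters; a console that shows one glyph per byte therefore prints 8 glyphs, ├ñ-├Ñ-├Â.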

Basically, you need to either convert your string to wide characters in Java (text.getBytes(Charset.forName("UTF-16LE"))) and pass it from Java to C as a byte array, or convert the multibyte string to wide characters in C after receiving it (MultiByteToWideChar(CP_UTF8, ...)). Then you can use printf("%S", ...) or wprintf(L"%s", ...) to output it (these are MSVC's meanings of %S and %s; %ls works in both MSVC and standard C).
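
To show what the C-side conversion produces, here is a minimal portable sketch of UTF-8 to UTF-16 decoding. It is a hand-written illustration with a hypothetical name (utf8_to_utf16), not the Windows API; on Windows you would call MultiByteToWideChar(CP_UTF8, ...) instead, which yields the same code units because wchar_t is 16 bits there. It is simplified: it does not reject overlong forms or encoded surrogates.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode NUL-terminated UTF-8 into UTF-16 code units.
   Returns the number of units written, or -1 on malformed input
   or when out (outlen units) is too small. */
long utf8_to_utf16(const char *in, uint16_t *out, size_t outlen)
{
    const unsigned char *p = (const unsigned char *)in;
    size_t n = 0;
    while (*p) {
        uint32_t cp;
        int extra;
        if (*p < 0x80)                { cp = *p;        extra = 0; } /* ASCII */
        else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; extra = 1; } /* 110xxxxx */
        else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; } /* 1110xxxx */
        else if ((*p & 0xF8) == 0xF0) { cp = *p & 0x07; extra = 3; } /* 11110xxx */
        else return -1; /* stray continuation byte or invalid lead byte */
        ++p;
        for (int i = 0; i < extra; ++i, ++p) {
            if ((*p & 0xC0) != 0x80)
                return -1; /* truncated sequence (also stops at the NUL) */
            cp = (cp << 6) | (*p & 0x3F);
        }
        if (cp > 0x10FFFF)
            return -1;
        if (cp >= 0x10000) { /* outside the BMP: emit a surrogate pair */
            if (n + 2 > outlen)
                return -1;
            cp -= 0x10000;
            out[n++] = (uint16_t)(0xD800 + (cp >> 10));
            out[n++] = (uint16_t)(0xDC00 + (cp & 0x3FF));
        } else {
            if (n + 1 > outlen)
                return -1;
            out[n++] = (uint16_t)cp;
        }
    }
    return (long)n;
}
```

The two bytes C3 A4 decode to the single unit 0x00E4 (ä). Because the sketch passes encoded surrogates through unchanged, it also accepts JNI's modified UTF-8, where characters outside the BMP arrive as two 3-byte surrogate sequences.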

See "Printing UTF-8 strings with printf - wide vs. multibyte string literals" for more information. Also note that the answer there says you have to set Unicode output mode with _setmode if you want Unicode output on the Windows console.
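
Putting the pieces together, here is a sketch of a print helper (hypothetical name print_utf8_line) that follows this advice on Windows: convert with MultiByteToWideChar(CP_UTF8, ...), switch stdout to _O_U16TEXT with _setmode, print with wprintf, then restore the previous mode. It assumes MSVC or MinGW for the _WIN32 branch; elsewhere it falls back to printing the UTF-8 bytes unchanged, on the assumption that the terminal is UTF-8.

```c
#include <stdio.h>
#include <stdlib.h>

#ifdef _WIN32
#include <windows.h>
#include <io.h>    /* _setmode, _fileno */
#include <fcntl.h> /* _O_U16TEXT */
#include <wchar.h>
#endif

/* Print a NUL-terminated UTF-8 string plus newline. Returns 0 or -1. */
int print_utf8_line(const char *utf8)
{
#ifdef _WIN32
    /* First call asks for the size in wchar_t units, including the NUL. */
    int len = MultiByteToWideChar(CP_UTF8, 0, utf8, -1, NULL, 0);
    if (len == 0)
        return -1;
    wchar_t *wide = malloc((size_t)len * sizeof *wide);
    if (wide == NULL)
        return -1;
    if (MultiByteToWideChar(CP_UTF8, 0, utf8, -1, wide, len) == 0) {
        free(wide);
        return -1;
    }
    fflush(stdout);
    /* UTF-16 mode lets the console receive the characters themselves. */
    int old_mode = _setmode(_fileno(stdout), _O_U16TEXT);
    if (old_mode == -1) {
        free(wide);
        return -1;
    }
    int rc = wprintf(L"%ls\n", wide) < 0 ? -1 : 0;
    fflush(stdout);
    _setmode(_fileno(stdout), old_mode); /* narrow printf is unsafe in U16 mode */
    free(wide);
    return rc;
#else
    /* Assumes a UTF-8 terminal, so the bytes can be written as-is. */
    return printf("%s\n", utf8) < 0 ? -1 : 0;
#endif
}
```

In the question's Java_java_printBytes, the printf line could then become print_utf8_line(text_input);. Restoring the old mode matters because printing narrow strings to a stream in _O_U16TEXT mode is not allowed by the MS CRT.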

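Also note that I don't believe the JNI specification guarantees that the buffer returned by GetStringUTFChars is NUL-terminated (GetStringUTFLength gives its length in bytes), but it has been too long for me to be sure.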
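
If you would rather not depend on a terminator at all, one option is to bound the output by the byte count from GetStringUTFLength using the "%.*s" precision, which the C standard allows on arrays that are not NUL-terminated. A minimal sketch with a hypothetical helper that formats into a buffer so it is easy to check:

```c
#include <stdio.h>
#include <string.h>

/* Format exactly len bytes of s into out (always NUL-terminated when
   outsz > 0). "%.*s" never reads past len bytes, so s need not be
   NUL-terminated. Returns snprintf's result, or -1 if len is negative. */
int format_counted(char *out, size_t outsz, const char *s, int len)
{
    if (len < 0)
        return -1;
    return snprintf(out, outsz, "%.*s", len, s);
}
```

Inside the JNI function the equivalent direct call would be printf("%.*s\n", (int)size, text_input);. Note that this only addresses the terminator question; it does not change how the Windows console decodes the bytes.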



Source: https://stackoverflow.com/questions/22054617/java-jni-passing-multibyte-characters-from-java-to-c
