I have embedded a Python interpreter in a C program. Suppose the C program reads some bytes from a file into a char array and learns (somehow) that the bytes represent text
Try calling PyErr_Print() in the "if (!py_string)
" clause. Perhaps the python exception will give you some more information.
PyString_Decode does this:
PyObject *PyString_Decode(const char *s,
Py_ssize_t size,
const char *encoding,
const char *errors)
{
PyObject *v, *str;
str = PyString_FromStringAndSize(s, size);
if (str == NULL)
return NULL;
v = PyString_AsDecodedString(str, encoding, errors);
Py_DECREF(str);
return v;
}
IOW, it does basically what you're doing in your second example - converts to a string, then decode the string. The problem here arises from PyString_AsDecodedString, rather than PyString_AsDecodedObject. PyString_AsDecodedString does PyString_AsDecodedObject, but then tries to convert the resulting unicode object into a string object with the default encoding (for you, looks like that's ASCII). That's where it fails.
I believe you'll need to do two calls - but you can use PyString_AsDecodedObject rather than calling the python "decode" method. Something like:
#include <Python.h>
#include <stdio.h>
int main(int argc, char *argv[])
{
char c_string[] = { (char)0x93, 0 };
PyObject *py_string, *py_unicode;
Py_Initialize();
py_string = PyString_FromStringAndSize(c_string, 1);
if (!py_string) {
PyErr_Print();
return 1;
}
py_unicode = PyString_AsDecodedObject(py_string, "windows_1252", "replace");
Py_DECREF(py_string);
return 0;
}
I'm not entirely sure what the reasoning behind PyString_Decode working this way is. A very old thread on python-dev seems to indicate that it has something to do with chaining the output, but since the Python methods don't do the same, I'm not sure if that's still relevant.
You don't want to decode the string into a Unicode representation, you just want to treat it as an array of bytes, right?
Just use PyString_FromString
:
char *cstring;
PyObject *pystring = PyString_FromString(cstring);
That's all. Now you have a Python str()
object. See docs here: https://docs.python.org/2/c-api/string.html
I'm a little bit confused about how to specify "str" or "unicode." They are quite different if you have non-ASCII characters. If you want to decode a C string and you know exactly what character set it's in, then yes, PyString_DecodeString
is a good place to start.