I am using Google Cloud Datalab for the first time to build a classifier for a Kaggle competition, but I am stuck trying to write a CSV file containing the pre-processed training data to Cloud Storage using the google.datalab.storage API.
The file contains strings with Unicode characters, which causes write_stream on a Storage object to fail with the error: Failed to process HTTP response.
Here is simplified code that only tries to write a single string:
from google.datalab import Context
import google.datalab.storage as storage
project = Context.default().project_id
bucket_name = project
bucket_object = storage.Bucket(bucket_name)
file_object = bucket_object.object('x.txt')
test_string = 'Congratulations from me as well, use the tools well. \xc2\xa0\xc2\xb7 talk'
#test_string = 'Congratulations from me as well, use the tools well. talk'
print type(test_string)
print len(test_string)
test_string = test_string.decode('utf-8')
print type(test_string)
print len(test_string)
test_string = test_string.encode('utf-8')
print type(test_string)
print len(test_string)
try:
    file_object.write_stream(test_string, 'text/plain')
except Exception as e:
    print e
Output:
<type 'str'>
62
<type 'unicode'>
60
<type 'str'>
62
Failed to process HTTP response.
If I use the string without the Unicode characters, the Storage object is created and the string is written to the file. It makes no difference whether I try to write the decoded (unicode) version or the UTF-8 encoded (str) one. The content type ('text/plain' or 'application/octet-stream') also makes no difference.
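The length discrepancy in the output is easy to reproduce outside of Datalab: the two non-ASCII characters each occupy one code point but two bytes in UTF-8, so the character count (60) and byte count (62) differ. My guess, not confirmed by any documentation, is that write_stream derives a Content-Length from one of these while sending the other, which would explain the HTTP failure:

```python
# Standalone repro of the length discrepancy (no GCP needed).
# U+00A0 (non-breaking space) and U+00B7 (middle dot) are one code
# point each, but two bytes each in UTF-8.
text = u'Congratulations from me as well, use the tools well. \u00a0\u00b7 talk'
data = text.encode('utf-8')

print(len(text))  # 60 code points
print(len(data))  # 62 bytes
```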
I would appreciate any help or ideas on how to solve this, especially since the google.datalab.storage API is barely documented (like most things on GCP).
Thx.
Source: https://stackoverflow.com/questions/49122740/google-cloud-datalab-error-writing-to-cloud-storage