I am new to Google Colaboratory (Colab) and to using PyDrive with it. I am trying to load the data in 'CAS_num_strings', which I wrote to a pickle file in a specific directory on my Google Drive from Colab as follows:
pickle.dump(CAS_num_strings,open('CAS_num_strings.p', 'wb'))
dump_meta = {'title': 'CAS.pkl', 'parents': [{'id':'1UEqIADV_tHic1Le0zlT25iYB7T6dBpBj'}]}
pkl_dump = drive.CreateFile(dump_meta)
pkl_dump.SetContentFile('CAS_num_strings.p')
pkl_dump.Upload()
print(pkl_dump.get('id'))
Here 'id':'1UEqIADV_tHic1Le0zlT25iYB7T6dBpBj' places the file in the specific parent folder identified by that id. The last print command gives me the output:
'1ZgZfEaKgqGnuBD40CY8zg0MCiqKmi1vH'
So I am able to create and upload the pickle file, whose id is '1ZgZfEaKgqGnuBD40CY8zg0MCiqKmi1vH'. Now I want to load this pickle file in another Colab script for a different purpose. To load it, I use the following commands:
cas_strings = drive.CreateFile({'id':'1ZgZfEaKgqGnuBD40CY8zg0MCiqKmi1vH'})
print('title: %s, mimeType: %s' % (cas_strings['title'], cas_strings['mimeType']))
print('Downloaded content "{}"'.format(cas_strings.GetContentString()))
This gives me the output:
title: CAS.pkl, mimeType: text/x-pascal
---------------------------------------------------------------------------
UnicodeDecodeError Traceback (most recent call last)
<ipython-input-9-a80d9de0fecf> in <module>()
30 cas_strings = drive.CreateFile({'id':'1ZgZfEaKgqGnuBD40CY8zg0MCiqKmi1vH'})
31 print('title: %s, mimeType: %s' % (cas_strings['title'], cas_strings['mimeType']))
---> 32 print('Downloaded content "{}"'.format(cas_strings.GetContentString()))
33
34
/usr/local/lib/python3.6/dist-packages/pydrive/files.py in GetContentString(self, mimetype, encoding, remove_bom)
192 self.has_bom == remove_bom:
193 self.FetchContent(mimetype, remove_bom)
--> 194 return self.content.getvalue().decode(encoding)
195
196 def GetContentFile(self, filename, mimetype=None, remove_bom=False):
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte
As you can see, it finds the file CAS.pkl but cannot decode the data. I want to resolve this error. I understand that encoding/decoding works smoothly during normal pickle dumping and loading with the 'wb' and 'rb' options. However, in the present case I can't load the data back from the pickle file that the previous step created on Google Drive. The error seems to come from my not being able to specify how to decode the data at "return self.content.getvalue().decode(encoding)". I can't work out from the Drive reference (https://developers.google.com/drive/v2/reference/files#resource-representations) which keywords/metadata tags to modify. Any help is appreciated. Thanks.
The problem is that GetContentString only works if the contents are a valid UTF-8 string (docs), and your pickle is not.
Unfortunately, you'll have to do a little extra work, since there's no GetContentBytes: you have to save the contents to a file and read them back out. Here's a working example:
https://colab.research.google.com/drive/1gmh21OrJL0Dv49z28soYq_YcqKEnaQ1X
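To see why the decode fails without touching Drive at all, here is a minimal sketch using only the standard library. The sample list is made up and merely stands in for CAS_num_strings. It shows that a pickle written with protocol 2 or later begins with byte 0x80, which is exactly the byte the traceback complains about:

```python
import pickle

# Hypothetical sample data standing in for CAS_num_strings.
sample = ['50-00-0', '64-17-5']

# Pickle protocol 2 and later start the stream with the PROTO opcode (0x80).
payload = pickle.dumps(sample, protocol=pickle.HIGHEST_PROTOCOL)
first_byte = payload[0]

# Decoding the raw pickle bytes as UTF-8 fails, because 0x80 cannot
# start a UTF-8 sequence.
try:
    payload.decode('utf-8')
    decode_error = None
except UnicodeDecodeError as exc:
    decode_error = exc

# Treating the payload as bytes and unpickling it works fine.
restored = pickle.loads(payload)
```

So the file on Drive is most likely intact; it just has to be read back as bytes rather than as text.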
Actually, I found an elegant answer with a little help from my friends. Instead of GetContentString, I use GetContentFile, which is the counterpart of SetContentFile. This downloads the file into the current workspace, from which I can read it like any pickle file. Finally, the data loads into cas_nums without problems.
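import pickle
# `drive` is the authenticated GoogleDrive instance set up earlier in the notebook.
cas_strings = drive.CreateFile({'id':'1ZgZfEaKgqGnuBD40CY8zg0MCiqKmi1vH'})
print('title: %s, mimeType: %s' % (cas_strings['title'], cas_strings['mimeType']))
cas_strings.GetContentFile(cas_strings['title'])  # saves the raw bytes locally as CAS.pkl
with open(cas_strings['title'], 'rb') as f:
    cas_nums = pickle.load(f)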
More details can be found in the PyDrive documentation, in the section "Download file content": http://pythonhosted.org/PyDrive/filemanagement.html#download-file-content
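If the download should not leave a copy of the file in the working directory, the same idea can be wrapped in a small helper. This is only a sketch: load_pickle_via_file is a hypothetical name, not part of PyDrive, and it assumes the object passed in exposes GetContentFile(filename) the way a PyDrive file object does.

```python
import os
import pickle
import tempfile


def load_pickle_via_file(drive_file):
    """Download drive_file to a temporary path and unpickle it.

    drive_file is assumed to expose GetContentFile(filename), as a
    PyDrive file object does. This helper is hypothetical.
    """
    fd, path = tempfile.mkstemp(suffix='.pkl')
    os.close(fd)  # GetContentFile opens the path itself
    try:
        drive_file.GetContentFile(path)
        with open(path, 'rb') as handle:
            return pickle.load(handle)
    finally:
        os.remove(path)  # clean up the temporary copy
```

With the file from above, this would be called as cas_nums = load_pickle_via_file(cas_strings). As with any pickle, only load files you created or otherwise trust.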
Source: https://stackoverflow.com/questions/49145328/unicodedecodeerror-utf-8-codec-cant-decode-byte-0x80-while-loading-pickle