I am fetching data from a catalog and it's giving data in bytes format.
Bytes data:
b'\x80\x00\x00\x00\n\x00\x00%\x83\xa0\x08\x01\x00\
For an encoding error such as
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
or a similar one, you just have to open the database dump file (the one with the .json extension) in an editor, change its encoding to UTF-8 (for example, in VS Code you can change it from the status bar at the bottom right), and save the file.
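If you would rather not do the conversion in an editor, it can also be scripted. This is only a minimal sketch, and it assumes the dump was written as UTF-16: a 0xff byte at position 0 is consistent with a UTF-16 byte order mark, but check your own file first. The helper name reencode_to_utf8 is made up for this example.

```python
from pathlib import Path


def reencode_to_utf8(path, source_encoding="utf-16"):
    """Rewrite the file at `path` as UTF-8, decoding it from `source_encoding`.

    Assumption: the file really is in `source_encoding`; if it is not,
    decode() raises UnicodeDecodeError and the file is left untouched.
    """
    raw = Path(path).read_bytes()
    text = raw.decode(source_encoding)  # the "utf-16" codec consumes the BOM
    Path(path).write_bytes(text.encode("utf-8"))
    return text
```

You would call it as reencode_to_utf8("store/dumps/store.json") and then continue with the git steps.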
Now run
$ git status
and you'll get a result like this:
On branch master
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: store/dumps/store.json
Untracked files:
(use "git add <file>..." to include in what will be committed)
.gitignore
no changes added to commit (use "git add" and/or "git commit -a")
or like this:
On branch master
Changes to be committed:
(use "git restore --staged <file>..." to unstage)
modified: store/dumps/store.json
Untracked files:
(use "git add <file>..." to include in what will be committed)
.gitignore
In the first case, you just have to stage the file:
$ git add store/dumps/
The second case doesn't need this step.
Now, in both cases, commit the changes with
$ git commit -m "launching to production"
The console will print a message summarizing the additions and changes.
You then have to rebuild the app with
$ git push heroku master
(for Heroku users)
After the build, you just have to load the database again with
$ heroku run python manage.py loaddata store/dumps/store.json
which will install the objects.
Apologies for my English.
You can try ignoring the bytes that cannot be decoded:
blobs.decode('utf-8', 'ignore')
It's not a great solution, though: the way you're generating the bytes object seems to have some issues. Maybe UTF-8 is not the proper encoding for your data.
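To see what the error handler actually does, here is a small sketch with made-up sample bytes (not your data). It compares 'ignore', which silently drops the bad byte, with 'replace', which leaves a visible U+FFFD marker so you can tell that something was lost.

```python
# Made-up sample: valid UTF-8 text with one stray byte (0xff) in the middle.
data = b"caf\xc3\xa9 \xff ok"

ignored = data.decode("utf-8", "ignore")    # drops the undecodable byte
replaced = data.decode("utf-8", "replace")  # marks it with U+FFFD instead

print(repr(ignored))
print(repr(replaced))
```

If many characters disappear or get replaced, that is a strong hint the data is not text in UTF-8 at all.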
The UTF-8 encoding has some built-in redundancy that serves at least two purposes:
Start bytes match one of these 4 patterns (written in binary, with dots standing for the bits that carry actual data)
0.......
110.....
1110....
11110...
whereas continuation bytes (0 to 3 of them follow each start byte) always have this form
10......
If this encoding is not respected, it is safe to say that the data is not UTF-8, e.g. because corruption occurred during a transfer.
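The bit patterns above can be checked with a few shifts. This is just an illustrative sketch (the function name classify is made up), and it only looks at the leading bits of a single byte; a real decoder additionally rejects a few byte values that fit these patterns, so treat it as a teaching aid rather than a validator.

```python
def classify(byte):
    """Name the UTF-8 role of one byte value (0-255) from its leading bits."""
    if byte >> 7 == 0b0:
        return "start (1-byte character)"
    if byte >> 6 == 0b10:
        return "continuation"
    if byte >> 5 == 0b110:
        return "start (2-byte character)"
    if byte >> 4 == 0b1110:
        return "start (3-byte character)"
    if byte >> 3 == 0b11110:
        return "start (4-byte character)"
    return "invalid"  # matches none of the five patterns


# Show the role of every byte in a short, valid UTF-8 string.
for b in "é€".encode("utf-8"):
    print(f"{b:08b}  {classify(b)}")
```

Calling classify(0x80) returns "continuation", which is why a UTF-8 decoder cannot accept that byte at the start of a character.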
Why is it possible to say that data starting with b'\x80' cannot be UTF-8?
Already the very first byte violates the encoding: 0x80 is 10000000 in binary, so it can only be a continuation byte, never a start byte. This is exactly what your error message says:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte
And even if you skip this one, you get another problem a few bytes later at b'%\x83', so most likely you are either trying to decode the wrong data or assuming the wrong encoding.
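You can let Python point out every offending position instead of reading the bytes by eye. The sketch below uses only the visible prefix of the value from the question (the rest is cut off there), and the helper name invalid_positions is made up for this example.

```python
# Only the visible prefix of the bytes value from the question.
prefix = b"\x80\x00\x00\x00\n\x00\x00%\x83\xa0\x08\x01\x00"


def invalid_positions(data):
    """Return the start offset of every span the strict UTF-8 decoder rejects."""
    bad = []
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode("utf-8")
            break  # the rest decodes cleanly
        except UnicodeDecodeError as err:
            bad.append(pos + err.start)  # offset in the original data
            pos += err.end               # resume after the rejected span
    return bad


print(invalid_positions(prefix))  # [0, 8, 9]
```

Offset 0 is the 0x80 byte and offset 8 is the 0x83 that follows the "%"; with that many failures in so few bytes, the value looks like binary data rather than mis-encoded text.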