Apparently, the following is the valid syntax:
my_string = b\'The string\'
I would like to know:
b
In addition to what others have said, note that a single character in unicode can consist of multiple bytes.
The way unicode works is that it took the old ASCII format (7-bit code that looks like 0xxx xxxx) and added multi-bytes sequences where all bytes start with 1 (1xxx xxxx) to represent characters beyond ASCII so that Unicode would be backwards-compatible with ASCII.
>>> len('Öl') # German word for 'oil' with 2 characters
2
>>> 'Öl'.encode('UTF-8') # convert str to bytes
b'\xc3\x96l'
>>> len('Öl'.encode('UTF-8')) # 3 bytes encode 2 characters !
3
To quote the Python 2.x documentation:
A prefix of 'b' or 'B' is ignored in Python 2; it indicates that the literal should become a bytes literal in Python 3 (e.g. when code is automatically converted with 2to3). A 'u' or 'b' prefix may be followed by an 'r' prefix.
The Python 3 documentation states:
Bytes literals are always prefixed with 'b' or 'B'; they produce an instance of the bytes type instead of the str type. They may only contain ASCII characters; bytes with a numeric value of 128 or greater must be expressed with escapes.
You can use JSON to convert it to dictionary
import json
data = b'{"key":"value"}'
print(json.loads(data))
{"key":"value"}
FLASK:
This is an example from flask. Run this on terminal line:
import requests
requests.post(url='http://localhost(example)/',json={'key':'value'})
In flask/routes.py
@app.route('/', methods=['POST'])
def api_script_add():
print(request.data) # --> b'{"hi":"Hello"}'
print(json.loads(request.data))
return json.loads(request.data)
{'key':'value'}
It turns it into a bytes
literal (or str
in 2.x), and is valid for 2.6+.
The r
prefix causes backslashes to be "uninterpreted" (not ignored, and the difference does matter).
Python 3.x makes a clear distinction between the types:
str
= '...'
literals = a sequence of Unicode characters (Latin-1, UCS-2 or UCS-4, depending on the widest character in the string)bytes
= b'...'
literals = a sequence of octets (integers between 0 and 255)If you're familiar with:
str
as String
and bytes
as byte[]
;str
as NVARCHAR
and bytes
as BINARY
or BLOB
;str
as REG_SZ
and bytes
as REG_BINARY
.If you're familiar with C(++), then forget everything you've learned about char
and strings, because a character is not a byte. That idea is long obsolete.
You use str
when you want to represent text.
print('שלום עולם')
You use bytes
when you want to represent low-level binary data like structs.
NaN = struct.unpack('>d', b'\xff\xf8\x00\x00\x00\x00\x00\x00')[0]
You can encode a str
to a bytes
object.
>>> '\uFEFF'.encode('UTF-8')
b'\xef\xbb\xbf'
And you can decode a bytes
into a str
.
>>> b'\xE2\x82\xAC'.decode('UTF-8')
'€'
But you can't freely mix the two types.
>>> b'\xEF\xBB\xBF' + 'Text with a UTF-8 BOM'
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: can't concat bytes to str
The b'...'
notation is somewhat confusing in that it allows the bytes 0x01-0x7F to be specified with ASCII characters instead of hex numbers.
>>> b'A' == b'\x41'
True
But I must emphasize, a character is not a byte.
>>> 'A' == b'A'
False
Pre-3.0 versions of Python lacked this kind of distinction between text and binary data. Instead, there was:
unicode
= u'...'
literals = sequence of Unicode characters = 3.x str
str
= '...'
literals = sequences of confounded bytes/characters
struct.pack
output.In order to ease the 2.x-to-3.x transition, the b'...'
literal syntax was backported to Python 2.6, in order to allow distinguishing binary strings (which should be bytes
in 3.x) from text strings (which should be str
in 3.x). The b
prefix does nothing in 2.x, but tells the 2to3
script not to convert it to a Unicode string in 3.x.
So yes, b'...'
literals in Python have the same purpose that they do in PHP.
Also, just out of curiosity, are there more symbols than the b and u that do other things?
The r
prefix creates a raw string (e.g., r'\t'
is a backslash + t
instead of a tab), and triple quotes '''...'''
or """..."""
allow multi-line string literals.
The b denotes a byte string.
Bytes are the actual data. Strings are an abstraction.
If you had multi-character string object and you took a single character, it would be a string, and it might be more than 1 byte in size depending on encoding.
If took 1 byte with a byte string, you'd get a single 8-bit value from 0-255 and it might not represent a complete character if those characters due to encoding were > 1 byte.
TBH I'd use strings unless I had some specific low level reason to use bytes.