Character Encoding and Transcoding | 易学教程

Things to know

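Python 2's default encoding is ASCII. Python 3's default source encoding is UTF-8, and its str type holds Unicode text.
In Python 3, encode converts a string (str) into bytes in the target encoding, and decode converts bytes back into a string (str).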
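
The type change can be seen in a minimal sketch, assuming Python 3. The variable names are illustrative only.

```python
# Python 3: encode() turns str into bytes, decode() turns bytes back into str.
text = "编码"                      # str: Unicode text
data = text.encode("utf-8")        # bytes: the same text encoded as UTF-8
restored = data.decode("utf-8")    # str again

print(type(text).__name__, type(data).__name__, type(restored).__name__)  # str bytes str
print(len(text), len(data))        # 2 6 (2 characters, 6 UTF-8 bytes)
```

Note that len() counts characters for a str but bytes for a bytes object, which is why the two numbers differ.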

Conversion principle


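Every conversion between encodings goes through Unicode as the intermediate form.
Converting UTF-8 to GB2312:
First, decode the UTF-8 bytes into Unicode.
Then, encode the Unicode text as GB2312.
Converting GB2312 to UTF-8:
First, decode the GB2312 bytes into Unicode.
Then, encode the Unicode text as UTF-8.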
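
The two-step rule above can be wrapped in a small helper. This is a sketch assuming Python 3; `transcode` is a hypothetical name chosen for illustration, not a standard library function.

```python
def transcode(data: bytes, source: str, target: str) -> bytes:
    """Re-encode bytes from one encoding to another via Unicode text.

    `transcode` is an illustrative helper, not part of the standard library.
    """
    text = data.decode(source)      # step 1: bytes -> Unicode str
    return text.encode(target)      # step 2: Unicode str -> bytes


utf8_bytes = "庆余年".encode("utf-8")
gb_bytes = transcode(utf8_bytes, "utf-8", "gb2312")   # UTF-8 -> GB2312
back = transcode(gb_bytes, "gb2312", "utf-8")         # GB2312 -> UTF-8

print(gb_bytes)
print(back == utf8_bytes)   # True: the round trip restores the original bytes
```

There is no direct bytes-to-bytes step here: each direction decodes first and encodes second.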

Hands-on (Python 3)

import sys
print('default\t', sys.getdefaultencoding())              # default encoding
text = '庆余年很好看哈'                                    # a str holds Unicode text
text_utf8 = text.encode('utf-8')
text_gb2312 = text_utf8.decode('utf-8').encode('gb2312')  # UTF-8 -> Unicode -> GB2312
text_gbk = text.encode('gbk')
print('unicode\t', text)
print('utf-8\t', text_utf8)
print('gb2312\t', text_gb2312)
print('gbk\t\t', text_gbk)

Output:

default  utf-8
unicode  庆余年很好看哈
utf-8  b'\xe5\xba\x86\xe4\xbd\x99\xe5\xb9\xb4\xe5\xbe\x88\xe5\xa5\xbd\xe7\x9c\x8b\xe5\x93\x88'
gb2312  b'\xc7\xec\xd3\xe0\xc4\xea\xba\xdc\xba\xc3\xbf\xb4\xb9\xfe'
gbk   b'\xc7\xec\xd3\xe0\xc4\xea\xba\xdc\xba\xc3\xbf\xb4\xb9\xfe'
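
The GB2312 and GBK results are identical because GBK is an extension of GB2312, so characters in GB2312 keep the same bytes. The following sketch, assuming Python 3, checks that. It also shows that decoding with a codec other than the one used for encoding can fail. The variable names are illustrative.

```python
text = '庆余年很好看哈'
gb2312_bytes = text.encode('gb2312')
gbk_bytes = text.encode('gbk')

# Every character here is in GB2312, so both codecs give the same bytes.
print(gb2312_bytes == gbk_bytes)            # True

# Decoding with the matching codec restores the original text.
print(gbk_bytes.decode('gbk') == text)      # True

# Decoding with the wrong codec raises an error for these bytes.
decode_failed = False
try:
    gbk_bytes.decode('utf-8')
except UnicodeDecodeError as exc:
    decode_failed = True
    print('cannot decode as utf-8:', exc.reason)
```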

Hands-on (Python 2)

# -*- coding: utf-8 -*-
import sys

text = "我爱北京天安门"                       # byte string, encoded as UTF-8 (the declared source encoding)
print(sys.getdefaultencoding())              # default encoding
text_unicode = text.decode('utf-8')          # UTF-8 -> Unicode
text_gb2312 = text_unicode.encode('gb2312')  # Unicode -> GB2312
print(text_unicode)
print(text_gb2312)

Output (the last line is readable only on a console that decodes bytes as GB2312/GBK):

ascii
我爱北京天安门
我爱北京天安门

Notes

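1. In Python 3, a string (str) is always Unicode text. Changing the system encoding does not alter that.
2. In Python 2, the default source encoding is ASCII. To use non-ASCII characters in a source file, declare the file's encoding on its first or second line, as follows:

# -*- coding: utf-8 -*-

3. In Python 2, a byte string (str) literal is stored in the same encoding as the one declared for the source file.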
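
The first note can be checked with a short sketch, assuming Python 3. A str is a sequence of Unicode code points, and a byte count only appears once an encoding is chosen. The variable names are illustrative.

```python
import sys

text = '我爱北京天安门'

print(sys.getdefaultencoding())    # utf-8 on Python 3
print(len(text))                   # 7 characters, independent of any encoding
print(hex(ord(text[0])))           # 0x6211, the Unicode code point of the first character

# The byte length depends on the encoding chosen at encode() time.
utf8_len = len(text.encode('utf-8'))
gb2312_len = len(text.encode('gb2312'))
print(utf8_len, gb2312_len)        # 21 14
```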

Source: CSDN

Author: 小嘿黑15斤

Link: https://blog.csdn.net/weixin_45590490/article/details/103662490
