Converting from bytes to French text in Python
I am cleaning the monolingual French Europarl corpus (http://data.statmt.org/wmt19/translation-task/fr-de/monolingual/europarl-v7.fr.gz). The raw data comes as a .gz file, which I downloaded using wget. I want to extract the text and see what it looks like before processing the corpus further. Using the following code to extract the text from the gzip file, I obtained data of type bytes:

    import gzip

    with gzip.open(file_path, 'rb') as f_in:
        print('type(f_in)=', type(f_in))
        text = f_in.read()
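Opening a gzip file in 'rb' mode always yields bytes, so the result has to be decoded into str. There are two usual ways to do this. You can call .decode() on the bytes, or you can open the file in text mode ('rt') and pass an encoding so that gzip.open decodes while it reads. The sketch below shows both. It assumes the file is UTF-8 encoded, which has not been checked here for this particular file. The helper names read_gzip_text and decode_bytes are hypothetical, and so are the sample sentences. A small temporary file stands in for the real corpus.

```python
import gzip
import os
import tempfile


def read_gzip_text(file_path, encoding="utf-8"):
    """Read a gzip-compressed text file and return its contents as str.

    In 'rt' mode, gzip.open wraps the decompressed byte stream in a text
    layer that decodes it with `encoding` (assumed UTF-8 here).
    """
    with gzip.open(file_path, "rt", encoding=encoding) as f_in:
        return f_in.read()


def decode_bytes(raw, encoding="utf-8"):
    """Decode bytes that were already read in 'rb' mode into str."""
    return raw.decode(encoding)


if __name__ == "__main__":
    # Hypothetical French sample with accented characters; not from Europarl.
    sample = "Le café est prêt.\nÇa marche très bien.\n"
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.fr.gz")
        # Write the sample as UTF-8 inside a gzip container.
        with gzip.open(path, "wt", encoding="utf-8") as f_out:
            f_out.write(sample)

        # Option 1: read in 'rb' mode as in the question, then decode.
        with gzip.open(path, "rb") as f_in:
            raw = f_in.read()
        print(type(raw))   # <class 'bytes'>
        text = decode_bytes(raw)
        print(type(text))  # <class 'str'>

        # Option 2: let gzip.open do the decoding in text mode.
        assert read_gzip_text(path) == text
        print(text)
```

For a file the size of the Europarl corpus, it may be better to iterate over the text-mode file line by line (for line in f_in:) than to call read(), because read() holds the whole decompressed text in memory at once. If the UTF-8 assumption is wrong, the decode step raises UnicodeDecodeError. In that case, check the file's real encoding and pass it explicitly.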