BigQuery job fails with “Bad character (ASCII 0) encountered.”

孤街浪徒 2021-01-22 17:55

I have a job that is failing with the error

Line:14222274 / Field:1, Bad character (ASCII 0) encountered. Rest of file not processed.

3 Answers
  • 2021-01-22 18:36

    I had a similar problem when trying to load a compressed file (saved in Google Cloud Storage) into BigQuery. These are the logs:

    File: 0 / Offset:4563403089 / Line:328480 / Field:21: Bad character (ASCII 0) encountered: field starts with:  (error code: invalid)
    File: 0 / Offset:4563403089 / Line:328485 / Field:21: Bad character (ASCII 0) encountered: field starts with:  (error code: invalid)
    File: 0 / Offset:4563403089 / Line:328490 / Field:21: Bad character (ASCII 0) encountered: field starts with:  (error code: invalid)
    File: 0 / Offset:4563403089 / Line:328511 / Field:21: Bad character (ASCII 0) encountered: field starts with:  (error code: invalid)
    File: 0 / Offset:4563403089 / Line:328517 / Field:21: Bad character (ASCII 0) encountered: field starts with:  (error code: invalid) 
    

    To resolve the problem, I removed the ASCII 0 characters from the compressed file. To do so, I ran the following command from a Compute Engine instance with the Cloud SDK installed:

    gsutil cp gs://bucket_987234/compress_file.gz - | gunzip | tr -d '\000' | gsutil cp - gs://bucket_987234/uncompress_and_clean_file

    By using pipes, I avoid having to hold everything on the local disk (1 GB compressed + 52 GB uncompressed). The first command downloads the compressed file from Storage, the second decompresses it, the third removes the ASCII 0 characters, and the fourth uploads the result back to Storage.

    I don't compress the result when uploading it back to Storage, because BigQuery loads an uncompressed file faster. After that, I can load the data into BigQuery without problems.
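
    Once the cleaned, uncompressed file is back in Storage, the load itself can be done with the bq CLI. A minimal sketch, assuming a comma-separated file with a header row; the dataset, table, and schema names are placeholders, not from the original post:

    # Load the cleaned CSV from Storage into BigQuery.
    # mydataset, mytable, and the schema are hypothetical placeholders.
    bq load \
        --source_format=CSV \
        --skip_leading_rows=1 \
        mydataset.mytable \
        gs://bucket_987234/uncompress_and_clean_file \
        col1:STRING,col2:INTEGER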

  • 2021-01-22 18:46

    Which utility did you use to compress the file?

    I saw this issue when I compressed my CSV file in ZIP format (on Windows). Google BigQuery seems to accept only the gzip format.

    Make sure to compress your CSV using gzip. If you are on Windows, 7-Zip is a great utility that lets you compress to gzip; on Unix, gzip is standard.
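
    For example, on Linux or macOS (the file and bucket names are placeholders):

    # Compress the CSV with gzip (not ZIP); -c writes to stdout so the
    # original file is kept, then upload the .gz to Cloud Storage.
    gzip -c mydata.csv > mydata.csv.gz
    gsutil cp mydata.csv.gz gs://my_bucket/mydata.csv.gz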

  • 2021-01-22 18:47

    Bad character (ASCII 0) encountered. Rest of file not processed.

    This clearly states that you have a UTF-16 character there that cannot be decoded; UTF-16 encodes ASCII characters with an extra 0x00 byte, which shows up as ASCII 0. The BigQuery service supports only the UTF-8 and latin1 text encodings, so the file is expected to be UTF-8 encoded.

    There are only 14222273 lines in the file, so the line number that is printed in the error message is one line past the end of the file.

    You probably have a UTF-16 encoded tab character at the end of the file, which cannot be decoded.


    Solution: use the -a or --ascii flag with the gzip command. BigQuery will then decode the file correctly.
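
    To confirm what those trailing bytes actually are, you can inspect the end of the decompressed stream with od. A minimal sketch, assuming the gzipped object from the question (the object path is a placeholder); UTF-16 text shows up as NUL (\0) bytes interleaved with the characters:

    # Stream the object, decompress, and dump the last 16 bytes as characters.
    gsutil cp gs://my_bucket/myfile.gz - | gunzip | tail -c 16 | od -c

    If \0 bytes appear, re-encoding the file to UTF-8 before loading (for example with iconv -f UTF-16 -t UTF-8) is another way to fix it.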
