BigQuery job fails with “Bad character (ASCII 0) encountered.”

孤街浪徒 2021-01-22 17:55

I have a job that is failing with the error

Line:14222274 / Field:1, Bad character (ASCII 0) encountered. Rest of file not processed.

3 Answers
  • 2021-01-22 18:36

    I had a similar problem when trying to load a compressed file (saved in Google Cloud Storage) into BigQuery. These are the logs:

    File: 0 / Offset:4563403089 / Line:328480 / Field:21: Bad character (ASCII 0) encountered: field starts with:  (error code: invalid)
    File: 0 / Offset:4563403089 / Line:328485 / Field:21: Bad character (ASCII 0) encountered: field starts with:  (error code: invalid)
    File: 0 / Offset:4563403089 / Line:328490 / Field:21: Bad character (ASCII 0) encountered: field starts with:  (error code: invalid)
    File: 0 / Offset:4563403089 / Line:328511 / Field:21: Bad character (ASCII 0) encountered: field starts with:  (error code: invalid)
    File: 0 / Offset:4563403089 / Line:328517 / Field:21: Bad character (ASCII 0) encountered: field starts with:  (error code: invalid) 
    

    To resolve the problem, I removed the ASCII 0 characters from the compressed file. To do so, I ran the following command from a Compute Engine instance with the Cloud SDK installed:

    gsutil cp gs://bucket_987234/compress_file.gz - | gunzip | tr -d '\000' | gsutil cp - gs://bucket_987234/uncompress_and_clean_file

    By using pipes, I avoid having to hold everything on the local disk (1 GB compressed + 52 GB uncompressed). The first command downloads the compressed file from Storage, the second decompresses it, the third removes the ASCII 0 characters, and the fourth uploads the result back to Storage.

    I don't compress the result when uploading it back to Storage, because BigQuery loads an uncompressed file faster. After that, I can load the data into BigQuery without problems.
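
    Once the cleaned, uncompressed file is back in Storage, the load itself can be done with the bq CLI. A minimal sketch, assuming a comma-separated file with a header row; the dataset, table, and schema names are placeholders, not from the original post:

    # Load the cleaned CSV from Storage into BigQuery.
    # mydataset, mytable, and the schema are hypothetical placeholders.
    bq load \
        --source_format=CSV \
        --skip_leading_rows=1 \
        mydataset.mytable \
        gs://bucket_987234/uncompress_and_clean_file \
        col1:STRING,col2:INTEGER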

  • 2021-01-22 18:46

    Which utility did you use to compress the file?

    I saw this issue when I compressed my CSV file in ZIP format (on Windows). Google BigQuery seems to accept only the gzip format.

    Make sure to compress your CSV using gzip. If you are on Windows, 7-Zip is a great utility that lets you compress to gzip; on Unix, gzip is standard.
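
    For example, on Linux or macOS (the file and bucket names are placeholders):

    # Compress the CSV with gzip (not ZIP); -c writes to stdout so the
    # original file is kept, then upload the .gz to Cloud Storage.
    gzip -c mydata.csv > mydata.csv.gz
    gsutil cp mydata.csv.gz gs://my_bucket/mydata.csv.gz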

  • 2021-01-22 18:47

    Bad character (ASCII 0) encountered. Rest of file not processed.

    This clearly states that you have a UTF-16 character there that cannot be decoded; UTF-16 encodes ASCII characters with an extra 0x00 byte, which shows up as ASCII 0. The BigQuery service supports only the UTF-8 and latin1 text encodings, so the file is expected to be UTF-8 encoded.

    There are only 14222273 lines in the file, so the line number that is printed in the error message is one line past the end of the file.

    You probably have a UTF-16 encoded tab character at the end of the file, which cannot be decoded.


    Solution: use the -a or --ascii flag with the gzip command. BigQuery will then decode the file correctly.
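
    To confirm what those trailing bytes actually are, you can inspect the end of the decompressed stream with od. A minimal sketch, assuming the gzipped object from the question (the object path is a placeholder); UTF-16 text shows up as NUL (\0) bytes interleaved with the characters:

    # Stream the object, decompress, and dump the last 16 bytes as characters.
    gsutil cp gs://my_bucket/myfile.gz - | gunzip | tail -c 16 | od -c

    If \0 bytes appear, re-encoding the file to UTF-8 before loading (for example with iconv -f UTF-16 -t UTF-8) is another way to fix it.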
