Google Big Query Error: CSV table encountered too many errors, giving up. Row: 1 errors: 1

Frontend · Unresolved · 8 answers · 2350 views
Asked by 星月不相逢 on 2020-12-04 02:26

I am trying to run a query on a 12 GB CSV file loaded into Google BigQuery, but I can't run any query on the dataset. I am not sure if the dataset was loaded correctly. It shows the error: "CSV table encountered too many errors, giving up. Row: 1 errors: 1".

8 Answers
  • 2020-12-04 02:53

    job.errors contains detailed errors for the job.

    This doesn't appear to be documented anywhere, but you can see it in the source code: https://googlecloudplatform.github.io/google-cloud-python/0.20.0/_modules/google/cloud/bigquery/job.html (search for _AsyncJob).

    So your wait_for_job code could look like this:

    import time

    def wait_for_job(job):
        while True:
            job.reload()  # refresh the job state from the API
            if job.state == 'DONE':
                if job.error_result:
                    # job.errors carries the detailed per-row errors
                    raise RuntimeError(job.errors)
                return
            time.sleep(1)  # poll once per second
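
    For reference, here is a hypothetical sketch of what job.errors typically contains once a load job fails. The field names (reason, location, message) follow the BigQuery REST API's ErrorProto, but the sample values are made up:

```python
# Hypothetical sample of job.errors after a failed CSV load.
# Field names follow the BigQuery REST API's ErrorProto;
# the values are made up for illustration.
errors = [
    {"reason": "invalid",
     "location": "gs://bucket/file.csv",
     "message": "Too many values in row starting at position: 0"},
    {"reason": "invalid",
     "location": "gs://bucket/file.csv",
     "message": "Could not parse 'abc' as double"},
]

# A compact one-line summary of the error list:
summary = "; ".join(e["message"] for e in errors)
print(summary)
```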
    
  • 2020-12-04 02:55

    To get more info on the errors, try this from the CLI:

    bq show -j <jobid>

    It prints the status and/or detailed error information.

    To list all the job IDs: bq ls -j
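
    bq show also supports machine-readable output (bq show --format=prettyjson -j <jobid>), whose status section you can inspect programmatically. A sketch, using a made-up excerpt of such a payload:

```python
import json

# Made-up excerpt mimicking the "status" section of
# `bq show --format=prettyjson -j <jobid>` output.
payload = json.loads("""
{
  "status": {
    "state": "DONE",
    "errorResult": {
      "reason": "invalid",
      "message": "CSV table encountered too many errors, giving up. Row: 1 errors: 1"
    },
    "errors": [
      {"reason": "invalid",
       "location": "gs://bucket/file.csv",
       "message": "Too many values in row starting at position: 0"}
    ]
  }
}
""")

# A job is failed when it is DONE and errorResult is present;
# the errors list then holds the per-row details.
status = payload["status"]
if status["state"] == "DONE" and "errorResult" in status:
    for e in status.get("errors", []):
        print(e["location"], "-", e["message"])
```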

  • 2020-12-04 02:55

    I had the same issue following the instructions in the GCP docs.

    It failed on the second bq load, but not the first.

    I found that repeating the job in the BigQuery web interface with the "ignore unknown values" option selected made it succeed.

    I have not spotted any errors in the data yet, but I am just getting started looking at it.
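
    The same setting is available outside the web UI: the bq CLI flag is --ignore_unknown_values, and the REST API load-configuration field is ignoreUnknownValues. A minimal sketch of such a job configuration as a plain dict (project, dataset, table and bucket names are placeholders):

```python
# Minimal sketch of a REST-style load-job configuration with
# ignoreUnknownValues set (equivalent to --ignore_unknown_values
# in the bq CLI, or "Ignore unknown values" in the web UI).
# All names below are placeholders.
load_config = {
    "configuration": {
        "load": {
            "sourceUris": ["gs://my-bucket/data.csv"],
            "destinationTable": {
                "projectId": "my-project",
                "datasetId": "my_dataset",
                "tableId": "my_table",
            },
            "sourceFormat": "CSV",
            "ignoreUnknownValues": True,
        }
    }
}
print(load_config["configuration"]["load"]["ignoreUnknownValues"])
```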

  • 2020-12-04 02:57

    Another trick: if you use CSV files with a header line and want to load them with a defined schema, you need to add the option --skip_leading_rows=1 to the load command (example: bq load --skip_leading_rows=1 --source_format=CSV ...).

    Without this option, BigQuery will parse your first row (the header line) as a data row, which may lead to a TYPE MISMATCH error (e.g. a column's defined schema type is FLOAT, but its column name is a STRING, so bq load tries to parse the column name as a FLOAT value).
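
    That header-as-data failure mode is easy to reproduce with plain Python's csv module (sample data made up): float() fails on the header cell in the same way BigQuery's type check does.

```python
import csv
import io

# A tiny CSV with a header line, standing in for the real file.
data = "price,qty\n1.5,2\n3.0,4\n"
rows = list(csv.reader(io.StringIO(data)))

# Treating the header as a data row fails the FLOAT parse,
# analogous to BigQuery's type-mismatch error.
try:
    float(rows[0][0])  # float("price")
    header_parsed = True
except ValueError:
    header_parsed = False

# Skipping the leading row (what --skip_leading_rows=1 does)
# parses cleanly.
prices = [float(r[0]) for r in rows[1:]]
print(header_parsed, prices)
```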

  • 2020-12-04 02:57

    Seems to be a known bug at Google. They have already made the fix, but have not pushed it to production yet. https://code.google.com/p/google-bigquery/issues/detail?id=621

  • 2020-12-04 03:01

    So it looks like you're querying against a CSV file that hasn't actually been loaded into BigQuery; it is just being pointed to by a federated (external) table backed by a file in Google Cloud Storage.

    It looks like there were errors in the underlying CSV file:

    Too many value in row starting at position:11398444388 in file:gs://syntheticpopulation-storage/Alldatamerged_Allgrps.csv
    Too many value in row starting at position:9252859186 in file:gs://syntheticpopulation-storage/Alldatamerged_Allgrps.csv
    ...
    

    Please let me know if this is enough to diagnose the issue. I believe you can see those messages as warnings on the query job if you look at the query history.

    I've filed three bugs internally:

    1. Poor grammar in the error message.
    2. Error messages stemming from problems in federated tables are not diagnosable because they don't tell you what table has the problem.
    3. Error messages from problems in federated tables aren't actionable in the UI, since the information about what went wrong is in the warning stream, which is not displayed.
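
    The check behind those "Too many value in row" messages is simple to reproduce: a row has more columns than the schema expects. A self-contained sketch (sample data made up; assumes no quoted commas):

```python
import csv
import io

# Made-up CSV where row 3 has one value too many.
sample = "a,b,c\n1,2,3\n4,5,6,7\n8,9,10\n"
expected_cols = 3

# Flag every row whose column count differs from the schema's --
# the condition behind "Too many values in row ..." errors.
bad_rows = [(lineno, row)
            for lineno, row in enumerate(csv.reader(io.StringIO(sample)), start=1)
            if len(row) != expected_cols]
print(bad_rows)
```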