I am trying to run a query on a 12 GB CSV file loaded into Google BigQuery, but I can't run any query on the dataset. I am not sure if the dataset is loaded correctly. It shows a
job.errors contains the detailed errors for the job.
This doesn't appear to be documented anywhere, but you can see it in the source code: https://googlecloudplatform.github.io/google-cloud-python/0.20.0/_modules/google/cloud/bigquery/job.html (Ctrl+F for _AsyncJob).
So your wait_for_job code could look like this:

    import time

    def wait_for_job(job):
        # Poll until the job reaches the DONE state, then surface any errors.
        while True:
            job.reload()  # refresh the job's state from the BigQuery API
            if job.state == 'DONE':
                if job.error_result:
                    raise RuntimeError(job.errors)
                return
            time.sleep(1)
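For context, here is how you might invoke it after starting a load job. This is a rough sketch against the 0.20-era google-cloud-python API linked above; the dataset, table, job name, and GCS URI are all hypothetical:

    from google.cloud import bigquery

    client = bigquery.Client()
    table = client.dataset('my_dataset').table('my_table')  # hypothetical names
    job = client.load_table_from_storage(
        'my-load-job-001',            # hypothetical job name
        table,
        'gs://my-bucket/data.csv')    # hypothetical source URI
    job.begin()        # submit the load job
    wait_for_job(job)  # raises RuntimeError with job.errors on failure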
To get more info on the errors, try this from the CLI:

    bq show -j <jobid>

It prints the status and/or detailed error information.
To list all the job ids:

    bq ls -j
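Putting the two together, a typical debugging pass looks like this (the job id below is hypothetical; yours will come from the bq ls -j output):

    bq ls -j                            # list recent job ids and their states
    bq show -j bqjob_r1234_00000001     # hypothetical id; prints status and errors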
I had the same issue following the instructions in the GCP docs. It failed on the second bq load, but not the first.
I found that repeating the job in the BigQuery web interface with the ignore unknown values option selected fixed it. I have not spotted any errors in the data yet, but I am just getting started looking at it.
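The same option is available from the command line as a flag on bq load. A minimal sketch, assuming a hypothetical dataset, table, and bucket:

    bq load --ignore_unknown_values --source_format=CSV \
        mydataset.mytable gs://my-bucket/data.csv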
Another trick: if you use CSV files with a header line and want to load them with a defined schema, you need to add the option --skip_leading_rows=1 to the submit command (example: bq load --skip_leading_rows=1 --source_format=CSV ...). Without this option, BigQuery will parse your first row (the header line) as a data row, which may lead to a TYPE MISMATCH error (e.g., your defined schema for a column is FLOAT, but the column's name is a STRING, so bq load tries to parse the column name as a FLOAT value).
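Here is a fuller sketch of such a load command; the dataset, table, bucket, and two-column schema are all hypothetical:

    bq load --skip_leading_rows=1 --source_format=CSV \
        mydataset.mytable \
        gs://my-bucket/data.csv \
        name:STRING,price:FLOAT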
This seems to be a known bug at Google. They have already made the fix, but have not pushed it to production yet: https://code.google.com/p/google-bigquery/issues/detail?id=621
So it looks like you're querying a CSV file that hasn't actually been loaded into BigQuery; it is only referenced by a federated table, and the data itself still lives in Google Cloud Storage.
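For anyone unfamiliar with federated tables, they can be created over a file in GCS with bq mk; a minimal sketch, where the schema and table name are hypothetical and only the GCS URI comes from the error messages below:

    bq mk --external_table_definition=col1:STRING,col2:FLOAT@CSV=gs://syntheticpopulation-storage/Alldatamerged_Allgrps.csv \
        mydataset.my_federated_table

BigQuery then scans the file in place on every query instead of reading data loaded into BigQuery storage.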
It looks like there were errors in the underlying CSV file:
Too many value in row starting at position:11398444388 in file:gs://syntheticpopulation-storage/Alldatamerged_Allgrps.csv
Too many value in row starting at position:9252859186 in file:gs://syntheticpopulation-storage/Alldatamerged_Allgrps.csv
...
Please let me know if this is enough to diagnose the issue. I believe you can see those messages as warnings on the query job if you look at the query history.
I've filed three bugs internally: