Max file count using a BigQuery Data Transfer job

耗尽温柔 submitted on 2020-07-23 06:40:27

Question


I have about 54,000 files in my GCP bucket. When I try to schedule a BigQuery Data Transfer job to move the files from the bucket into BigQuery, I get the following error:

Error code 9 : Transfer Run limits exceeded. Max size: 15.00 TB. Max file count: 10000. Found: size = 267065994 B (0.00 TB) ; file count = 54824.

I thought the max file count was 10 million.
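The numbers in the error can be reproduced with a quick check. This is a minimal sketch that assumes the per-run limits are exactly the two the error message reports (15 TB, treated here as 10^12 bytes per TB, and 10,000 files); `check_transfer_run` is a hypothetical helper, not part of any Google library:

```python
# Per-run limits as reported in the error message (assumed; TB taken as 10**12 bytes).
MAX_RUN_BYTES = 15 * 10**12  # "Max size: 15.00 TB"
MAX_RUN_FILES = 10_000       # "Max file count: 10000"


def check_transfer_run(total_bytes: int, file_count: int) -> list[str]:
    """Return a description of every per-run limit the run would exceed."""
    problems = []
    if total_bytes > MAX_RUN_BYTES:
        problems.append(f"size {total_bytes} B exceeds {MAX_RUN_BYTES} B")
    if file_count > MAX_RUN_FILES:
        problems.append(f"file count {file_count} exceeds {MAX_RUN_FILES}")
    return problems


# Values copied from the error message in the question.
print(check_transfer_run(267_065_994, 54_824))
# ['file count 54824 exceeds 10000']
```

The data is tiny (about 0.27 GB); only the file count trips the limit.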


Answer 1:


I think the BigQuery Data Transfer Service lists all the files matching the wildcard and then uses that list to load them. So it is the same as providing the full list to bq load ..., and therefore it reaches the 10,000-URI limit. This is probably necessary because the transfer service skips files that were already loaded, so it needs to look at them one by one to decide which ones to actually load.
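The behavior this answer guesses at can be modeled in a few lines. This is a hypothetical sketch, not the transfer service's actual code: expand the wildcard into explicit URIs, drop files an earlier run already loaded, and count what is left against the 10,000-URI limit. `fnmatch` only approximates Cloud Storage wildcard matching, and the bucket and object names are made up:

```python
from collections.abc import Iterable
from fnmatch import fnmatchcase

MAX_URIS = 10_000  # per-load-job source URI limit discussed in the answer


def pending_uris(bucket: str, pattern: str, object_names: Iterable[str],
                 already_loaded: set[str]) -> list[str]:
    """Expand a wildcard into explicit gs:// URIs, skipping files loaded earlier."""
    uris = []
    for name in object_names:
        if fnmatchcase(name, pattern):  # approximation of GCS wildcard matching
            uri = f"gs://{bucket}/{name}"
            if uri not in already_loaded:
                uris.append(uri)
    return uris


# Hypothetical bucket listing with the same file count as in the question.
names = [f"data/part-{i:05d}.csv" for i in range(54_824)]
uris = pending_uris("my-bucket", "data/*.csv", names, already_loaded=set())
print(len(uris), len(uris) > MAX_URIS)  # 54824 True
```

Once the wildcard is expanded into an explicit list, the single pattern becomes 54,824 URIs, which is why the per-job URI limit applies.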

I think your only option is to schedule a job yourself and load the files directly into BigQuery, for example with Cloud Composer, or with a small Cloud Run service invoked by Cloud Scheduler.
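A job of that kind could list the bucket itself and submit several load jobs of at most 10,000 URIs each. This is a minimal sketch using the `google-cloud-storage` and `google-cloud-bigquery` client libraries; the function names, the CSV format with a header row, and the append-only write mode are assumptions, and unlike the transfer service it does not skip files loaded by earlier runs:

```python
from collections.abc import Iterator, Sequence

MAX_URIS_PER_JOB = 10_000  # per-load-job source URI limit


def batches(items: Sequence[str], size: int = MAX_URIS_PER_JOB) -> Iterator[list[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def load_prefix(bucket_name: str, prefix: str, table_id: str) -> None:
    """List every object under a prefix and load it with several load jobs."""
    # Imported here so the batching helper above works without the client libraries.
    from google.cloud import bigquery, storage

    uris = [
        f"gs://{bucket_name}/{blob.name}"
        for blob in storage.Client().list_blobs(bucket_name, prefix=prefix)
        if not blob.name.endswith("/")  # skip "folder" placeholder objects
    ]

    bq_client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,  # assumption: CSV with a header row
        skip_leading_rows=1,
        autodetect=True,
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    )
    for batch in batches(uris):
        job = bq_client.load_table_from_uri(batch, table_id, job_config=job_config)
        job.result()  # wait for this job; raises if the load failed
```

With 54,824 files this submits six load jobs (five of 10,000 URIs and one of 4,824). The function could be the body of a Cloud Run service called by Cloud Scheduler or a task in a Cloud Composer DAG; that wiring is not shown.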




Answer 2:


The error message "Transfer Run limits exceeded", mentioned above, is related to a known limit for load jobs in BigQuery. Unfortunately, this is a hard limit and cannot be changed. There is an open feature request to increase it, but for now there is no ETA for its implementation.

The main recommendation is to split the single operation into multiple processes, each sending requests that don't exceed this limit. That covers the main question: "Why do I see this error message, and how can I avoid it?"
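Splitting the work can be made concrete by packing files into batches that respect both limits from the error message. A minimal greedy sketch, assuming file sizes are already known (for example from a bucket listing) and that each request may hold at most 10,000 files and 15 TB; `SourceFile` and `plan_batches` are hypothetical names:

```python
from typing import NamedTuple


class SourceFile(NamedTuple):
    uri: str
    size_bytes: int


def plan_batches(files, max_files=10_000, max_bytes=15 * 10**12):
    """Greedily pack files, in order, into batches within both per-request limits."""
    batches, current, current_bytes = [], [], 0
    for f in files:
        if f.size_bytes > max_bytes:
            raise ValueError(f"{f.uri} alone exceeds the size limit")
        # Start a new batch if adding this file would break either limit.
        if current and (len(current) == max_files
                        or current_bytes + f.size_bytes > max_bytes):
            batches.append(current)
            current, current_bytes = [], 0
        current.append(f)
        current_bytes += f.size_bytes
    if current:
        batches.append(current)
    return batches


# 4,871 B is roughly the average file size implied by the question's error message.
files = [SourceFile(f"gs://my-bucket/data/part-{i:05d}.csv", 4_871) for i in range(54_824)]
print([len(b) for b in plan_batches(files)])
# [10000, 10000, 10000, 10000, 10000, 4824]
```

Each resulting batch could then be sent as its own request, by a separate process or sequentially.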

It is natural to ask next: "How can I automate these actions or make them easier?" For that, I can think of bringing in more products:

  • Dataflow, which helps you process the data that will be added to BigQuery. This is where you can send multiple requests.

  • Pub/Sub, which helps you listen for events and automate when the processing starts.
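As one illustration of the Pub/Sub idea, Cloud Storage can publish a notification for each new object, and a subscriber can load that object on arrival, so no single request comes close to 10,000 files. This is a minimal sketch, assuming a Cloud Storage notification is already configured to publish to a subscription; the function names and CSV input are assumptions:

```python
from collections.abc import Mapping
from typing import Optional


def uri_from_notification(attributes: Mapping[str, str]) -> Optional[str]:
    """Return the gs:// URI of a newly written object, or None for other events."""
    if attributes.get("eventType") != "OBJECT_FINALIZE":
        return None
    return f"gs://{attributes['bucketId']}/{attributes['objectId']}"


def listen(project_id: str, subscription_id: str, table_id: str) -> None:
    """Pull Cloud Storage notifications and load each new file into BigQuery."""
    # Imported here so the parsing helper above works without the client libraries.
    from google.cloud import bigquery, pubsub_v1

    bq_client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,  # assumption: CSV input
        autodetect=True,
    )
    subscriber = pubsub_v1.SubscriberClient()
    path = subscriber.subscription_path(project_id, subscription_id)

    def callback(message) -> None:
        uri = uri_from_notification(message.attributes)
        if uri is not None:
            bq_client.load_table_from_uri(uri, table_id, job_config=job_config).result()
        message.ack()  # reached only if the load (if any) succeeded

    future = subscriber.subscribe(path, callback=callback)
    future.result()  # block until cancelled or an error occurs
```

One load job per file is the simplest design, but load jobs are themselves subject to quotas, so with many files you would likely buffer URIs and load them in groups instead.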

Please take a look at this suggested implementation, where the scenario above is described in more detail.

Hope this is helpful! :)



Source: https://stackoverflow.com/questions/62495378/max-file-count-using-big-query-data-transfer-job
