Question
What is the best way to schedule BigQuery jobs?
BigQuery doesn't offer a direct approach, and the best option I found from searching is the App Engine cron service, but as I understand it, I would have to create a web application to use it.
My use case is to run some aggregations over clicks and impressions, daily or weekly, and use the results in our admin portal.
I used Hive as a data warehouse before and Oozie as our scheduler.
Is there a way to accomplish the same logic with BigQuery?
Answer 1:
Unfortunately, there is no built-in scheduler within BigQuery, although the engineering team takes feature requests!
However, there are a few interesting alternatives.
- As you mentioned, using the cron service from App Engine would absolutely work: you could write a small, simple web service that invokes the query you want on a regular cadence. This service will not be web-facing, so the charges should remain extremely small.
- Apache Airflow is a workflow scheduler that I have been playing around with, and it looks very promising; it lets you define more complex data manipulation tasks across a variety of cloud services in Python and execute them on whatever cadence you choose. Very handy.
- Regular Cron - if you have a server available to you, you could just set up a basic cron job that uses the 'bq' command line tool to execute whatever queries you want and save the results to tables in BQ.
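To make the regular cron option a bit more concrete, here is a minimal sketch of a script that cron could call once a day. It only builds and runs a `bq query` command; the table names (`analytics.events`, `analytics.daily_stats_YYYYMMDD`) and the columns `event_type` and `event_time` are made up for illustration, so swap in your own schema. It assumes the `bq` tool is installed and already authenticated on the server.

```python
#!/usr/bin/env python3
"""Run a daily aggregation through the bq CLI; meant to be invoked by cron."""
import datetime
import shlex
import subprocess
import sys

# Hypothetical names, not from the original post: adjust to your project.
SOURCE_TABLE = "analytics.events"
DEST_DATASET = "analytics"
DEST_TABLE_PREFIX = "daily_stats_"


def build_query(day):
    """Return standard SQL that counts clicks and impressions for one day."""
    return (
        "SELECT event_type, COUNT(*) AS total "
        f"FROM `{SOURCE_TABLE}` "
        f"WHERE DATE(event_time) = '{day.isoformat()}' "
        "AND event_type IN ('click', 'impression') "
        "GROUP BY event_type"
    )


def build_bq_command(day):
    """Build the argv list for `bq query`, writing results to a dated table."""
    dest = f"{DEST_DATASET}.{DEST_TABLE_PREFIX}{day.strftime('%Y%m%d')}"
    return [
        "bq", "query",
        "--use_legacy_sql=false",         # use standard SQL
        f"--destination_table={dest}",    # save the results to a table in BQ
        "--replace",                      # overwrite the table on a re-run
        build_query(day),
    ]


def main(argv):
    yesterday = datetime.date.today() - datetime.timedelta(days=1)
    cmd = build_bq_command(yesterday)
    if "--run" in argv:
        # check=True raises on a non-zero exit, so cron sees the failure.
        subprocess.run(cmd, check=True)
    else:
        # Dry run by default: print the command instead of executing it.
        print(shlex.join(cmd))


if __name__ == "__main__":
    main(sys.argv[1:])
```

A crontab line such as `0 3 * * * python3 /path/to/daily_stats.py --run` (the path is a placeholder) would then run it every day at 03:00. Without `--run` the script just prints the command, which is handy for checking the query before scheduling it.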
Hope that helps! I'm positive there are other options as well, just wanted to give you a few.
Source: https://stackoverflow.com/questions/46584097/cron-bigquery-jobs