The biggest chunk of my BigQuery billing comes from query consumption. I am trying to optimize this by understanding which datasets/tables consume the most.
It might be easier to use the INFORMATION_SCHEMA.JOBS_BY_* views, because you don't have to set up Stackdriver logging and can use them right away.
Example taken & modified from How to monitor query costs in Google BigQuery:
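-- Byte-to-GB/TB divisors and the US on-demand price ($5/TB at the time of writing)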
DECLARE gb_divisor INT64 DEFAULT 1024*1024*1024;
DECLARE tb_divisor INT64 DEFAULT gb_divisor*1024;
DECLARE cost_per_tb_in_dollar INT64 DEFAULT 5;
DECLARE cost_factor FLOAT64 DEFAULT cost_per_tb_in_dollar / tb_divisor;
SELECT
ROUND(SUM(total_bytes_processed) / gb_divisor,2) as bytes_processed_in_gb,
ROUND(SUM(IF(cache_hit != true, total_bytes_processed, 0)) * cost_factor,4) as cost_in_dollar,
user_email
FROM (
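  -- one JOBS_BY_USER branch per project whose jobs should be included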
(SELECT * FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_USER)
UNION ALL
(SELECT * FROM `other-project.region-us`.INFORMATION_SCHEMA.JOBS_BY_USER)
)
WHERE
DATE(creation_time) BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY) and CURRENT_DATE()
GROUP BY
user_email
Some caveats:
- You have to UNION ALL all of the projects that you use explicitly.
- JOBS_BY_USER did not work for me on my private account (supposedly because my login email is @googlemail and BigQuery stores my email as @gmail).
- The WHERE condition needs to be adjusted to your billing period (instead of the last 30 days).
- DECLARE cost_per_tb_in_dollar INT64 DEFAULT 5; reflects only US costs - other regions might have different costs - see https://cloud.google.com/bigquery/pricing#on_demand_pricing
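Since the original question asks about per-dataset/per-table consumption rather than per-user, you can also group on the referenced_tables column of the JOBS_BY_* views instead of user_email. Below is a minimal sketch of that idea, assuming you have permission to read JOBS_BY_PROJECT; note that a query touching several tables has its full total_bytes_processed attributed to each of them here, so the per-table numbers can overcount.
DECLARE gb_divisor INT64 DEFAULT 1024*1024*1024;
DECLARE tb_divisor INT64 DEFAULT gb_divisor*1024;
DECLARE cost_per_tb_in_dollar INT64 DEFAULT 5;
DECLARE cost_factor FLOAT64 DEFAULT cost_per_tb_in_dollar / tb_divisor;
SELECT
  ref.dataset_id,
  ref.table_id,
  ROUND(SUM(total_bytes_processed) / gb_divisor, 2) AS bytes_processed_in_gb,
  ROUND(SUM(IF(cache_hit != true, total_bytes_processed, 0)) * cost_factor, 4) AS cost_in_dollar
FROM
  `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT,
  -- each job lists the tables it read in the referenced_tables array
  UNNEST(referenced_tables) AS ref
WHERE
  DATE(creation_time) BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY) AND CURRENT_DATE()
GROUP BY
  ref.dataset_id, ref.table_id
ORDER BY
  cost_in_dollar DESC;
The same caveats as above apply (one view per region/project, billing period, regional pricing).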