Understanding GCP Dataproc billing and how it is affected by labels

江枫思渺然 提交于 2019-12-24 00:59:51

问题


I'm trying to make sure I have a clear understanding of how my organisation gets billed for Google Cloud Platform Dataproc.

We have exported our billing history to BigQuery so that we can analyse it. This morning we had two dataproc clusters running and the screenshot below shows a subset of the billing history for those two clusters. I have filtered on labels.key = "goog-dataproc-cluster-uuid" or labels.key = "goog-dataproc-cluster-name" or labels.key = "goog-dataproc-location". Here is a subset of the results

I've drawn boxes around the costs for two kinds of sku. Lets's take a look at the Standard Intel N1 16 VCPU running in EMEA items.

I only have two clusters yet for each of those two clusters there are three lines. The reason is that there are three labels applied to each dataproc cluster, hence the costs 1.271852 & 3.815556 appear three times each.

My simple question then is...how do I get the total cost of my dataproc clusters? Do I add up all of these numbers (thus implying that the total cost is split equally over all of the labels) or do I take just one of the values (implying that the cost is repeated for each label)?


Here's another way of phrasing my question. Does this query give the total cost of running cluster data-dev-dataplatform-dataproc for one day:

SELECT  sum(cost)
FROM [dh-billing-179310:billing.gcp_billing_export_XXXXXXXX] 
WHERE labels.key = "goog-dataproc-cluster-name"
  and labels.value = "data-dev-dataplatform-dataproc" 
  and usage_start_time >= "2018-07-05 00:00:00"
  and usage_end_time <= "2018-07-06 00:00:00"

or do I need to include other labels in order to get the total cost?


回答1:


In that flattened view of billing export data, the cost is repeated for each label; you should pick a single label value for any particular calculation. If you're trying to calculate the Dataproc total, it's probably most convenient to use one of the Dataproc-inserted "goog-dataproc-*" labels.

The idea here is that you can use different sets of labels to easily organize your total Dataproc-related costs attributed to any given subproject, so that you can then filter your billing queries along different dimensions.



来源:https://stackoverflow.com/questions/51213085/understanding-gcp-dataproc-billing-and-how-it-is-affected-by-labels

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!