Google BigQuery Underlying Architecture

时光说笑 2021-02-04 11:05

So I just started messing around with Google BigQuery about 10 minutes ago, and I was wondering if anyone is aware of the underlying architecture that they're using to store th

1 answer
  • 2021-02-04 11:59

    There are no indexes... every query is a table scan. The query architecture is described here. Your data is stored in a proprietary columnar format called ColumnIO on Colossus (a successor to GFS). Colossus replicates the data within a datacenter and your data is also replicated to other geographic regions to make sure it stays available even if a Google datacenter goes offline.
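
To make "no indexes, every query is a table scan" a bit more concrete, here is a minimal Python sketch of a scan over column-oriented data. It is only an illustration: the `table` layout, `scan_sum`, and the column names are made up for this example and say nothing about how ColumnIO actually lays out bytes. The point is that the scan visits every row position, but only reads the columns the query mentions.

```python
from typing import Callable, Dict, List

# Hypothetical in-memory column store: one list per column, all the same length.
# This is NOT the ColumnIO format, just a stand-in for "data stored by column".
ColumnStore = Dict[str, List]


def scan_sum(table: ColumnStore, filter_col: str,
             predicate: Callable[[object], bool], sum_col: str) -> int:
    """Full scan without an index: walk every row position of two columns."""
    total = 0
    for value, amount in zip(table[filter_col], table[sum_col]):
        if predicate(value):
            total += amount
    return total


table = {
    "country": ["US", "DE", "US", "JP"],
    "bytes": [100, 250, 50, 75],
    "payload": ["...", "...", "...", "..."],  # never read by the query below
}

# Roughly: SELECT SUM(bytes) FROM table WHERE country = 'US'
result = scan_sum(table, "country", lambda c: c == "US", "bytes")
print(result)
```

Because the data is organized by column, the `payload` column is never touched here, which is a large part of why scanning everything can still be practical.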

    To answer your specific questions:

    • While data may be temporarily stored in Bigtable, all data is stored long-term in Colossus (for now!).
    • New data added to BigQuery is encrypted at rest (that is, whenever it is written out to permanent storage). It is also encrypted when sent over the network.
    • As mentioned, no indexes, so there are no strategies for rebuilding the index. Depending on how you add data to your table, your table may be coalesced, which means rewriting the underlying files in a more efficient manner.
    • Colossus underlies a massive amount of Google data across a wide range of services, and ColumnIO is a standard throughout Google. I would call both of these technologies mature.
    • That said, you should consider it a black box. All of the details here may change as storage systems at Google mature or architectures change. Either way, it should always "just work" (within SLA caveats, of course).
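
The coalescing mentioned in the list above can be pictured with a small Python sketch. This is an assumption-laden toy, not BigQuery's actual process: `ColumnFile` and `coalesce` are hypothetical names, and it assumes every small file shares the same columns. The idea is just that many small appends get rewritten as one larger file, column by column, with row order preserved.

```python
from typing import Dict, List

# Hypothetical representation of one small columnar file: column name -> values.
ColumnFile = Dict[str, List]


def coalesce(files: List[ColumnFile]) -> ColumnFile:
    """Rewrite many small files as one, concatenating each column in file order.

    Assumes all files have the same set of columns.
    """
    merged: ColumnFile = {}
    for f in files:
        for name, values in f.items():
            merged.setdefault(name, []).extend(values)
    return merged


# Three small appends become a single file with three rows per column.
small_files = [
    {"id": [1], "name": ["a"]},
    {"id": [2], "name": ["b"]},
    {"id": [3], "name": ["c"]},
]
merged = coalesce(small_files)
print(merged)
```

The input files are left unmodified, so in this sketch the "rewrite" produces a new file rather than editing the old ones in place.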

    If you're interested in more details about how BigQuery works under the covers or how to use it effectively, here is a shameless plug for our book on the subject which is due out in June.
