Does a Secondary index lock anything when it is being created?

Posted by 拥有回忆 on 2021-02-10 06:29:23

Question


Given the following table schema:

CREATE TABLE Record (
    -- uuidv4
    recordId STRING(36) NOT NULL,
    -- uuidv4
    userId STRING(36),
    isActive BOOL,
    lastUpdate TIMESTAMP NOT NULL OPTIONS (allow_commit_timestamp=true),
    ...
) PRIMARY KEY (recordId)

CREATE NULL_FILTERED INDEX RecordByUser 
ON Record (userId, isActive)

For every record created, we write an entry to the index so that we can fetch all of a user's records by their userId. Depending on what is needed, the index could carry an extra STORING clause with additional columns.
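
As an illustration of the STORING clause mentioned above, a variant of the index might look like the following sketch. Storing lastUpdate is an assumption chosen for illustration; any non-key column of Record could appear there.

```sql
-- Sketch only: same key columns as RecordByUser, plus a stored copy of
-- lastUpdate (an assumed choice) so that queries needing only these
-- columns can be served from the index without reading the base table.
CREATE NULL_FILTERED INDEX RecordByUser
ON Record (userId, isActive)
STORING (lastUpdate);
```

Each stored column adds to what is written to the index on every insert or update of those columns, so it trades extra write and storage cost for cheaper reads.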

My understanding is that as I add records to the Record table, Spanner will trigger a write to the index. Since the index is non-interleaved, the index data may have a different locality from the original record.

Under that assumption, will the write to the secondary index lock the Record table until it is completed, or does one not affect the other?

I'm going to guess they are totally independent since an index can be created after the fact and Spanner will trigger a backfill operation that does not affect the operational status of the Record table.

Writing the index does have to take some resources from the node(s), though, so I would imagine that is the real limitation. Under a high-write scenario for the Record table, we would effectively be invoking a second write to the RecordByUser index as well, consuming a bit more of the node(s)' write throughput capacity.

So the act of adding to a Secondary Index doesn't require any locking on the source table (Record in this case). The primary concern would be the write throughput and any hotspots from those writes. For example, if we indexed on a timestamp as the first part of the index, the writes to the index would bunch up. Is my understanding here correct?

During the act of creating the index on an existing table, does the backfill process hold an exclusive lock on the index, like Postgres for example:

https://www.postgresql.org/docs/current/index-locking.html

Or can new writes land in the index during the secondary index creation while backfill is taking place?

I can imagine a backfill process on Spanner's end of things that takes a read snapshot and starts writing. Given Spanner's fancy clocks, if it encounters a row in the index newer than the row it is attempting to write, it just drops the old row on the floor and carries on.


Answer 1:


Thanks for the question. Google engineer here to help.

+1 to chainicko@'s answer on the general locking mechanism. The table is not "locked", in the sense that you can still read from and write to the original table while the backfill is running.

Reads and queries against the index itself are not allowed during the backfill, but writes to the original table are. New writes are added to the index concurrently. After the backfill, Spanner makes sure that only the latest data is returned when the index is queried.

As for the example of "indexed on a timestamp as the first part of the index": it creates a hotspot on the index, so it would still have a negative impact on the system as a whole, even though it does not lock the original table.
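
To make the hotspot concrete, the sketch below contrasts an index keyed on the commit timestamp with a sharded variant. The index names, the ShardId column, and the shard count of 16 are all assumptions for illustration, and the sketch assumes a GoogleSQL-dialect database where FARM_FINGERPRINT and stored generated columns are available.

```sql
-- Hotspot-prone: lastUpdate only ever increases, so every new index
-- entry lands at the end of the index key space.
CREATE INDEX RecordByLastUpdate
ON Record (lastUpdate);

-- Hypothetical mitigation: derive a small shard number from recordId.
-- ABS is applied after MOD so the value stays in the range 0..15.
ALTER TABLE Record ADD COLUMN ShardId INT64
  AS (ABS(MOD(FARM_FINGERPRINT(recordId), 16))) STORED;

-- Leading with ShardId spreads consecutive timestamps across 16 key ranges.
CREATE INDEX RecordByShardAndLastUpdate
ON Record (ShardId, lastUpdate);
```

The trade-off is on the read side: a time-range query against the sharded index has to cover every ShardId value rather than scanning one contiguous range.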



Source: https://stackoverflow.com/questions/62501756/does-a-secondry-index-lock-anything-when-it-is-being-created
