问题
Given the following table schema:
CREATE TABLE Record (
-- uuidv4
recordId STRING(36) NOT NULL,
-- uuidv4
userId STRING(36),
isActive BOOL
lastUpdate TIMESTAMP NOT NULL OPTIONS (allow_commit_timestamp=true)
...
) PRIMARY KEY (recordId)
CREATE NULL_FILTERED INDEX RecordByUser
ON Record (userId, isActive)
For every record created we make a record (in the index) to be able able to get all of a user's records by their userId. Depending on what may be needed there could be an extra STORING
clause with additional information columns.
My understanding is that as I add records to the Record
table, Spanner will trigger a write to the index. Since the index is non-interleaved the data itself may have a different locality to the original record.
Under that assumption, will that write to the secondary index lock the Record
table until it is completed or does one not affect the other?
I'm going to guess they are totally independent since an index can be created after the fact and Spanner will trigger a backfill operation that does not affect the operational status of the Record
table.
The act of writing the index has to take some resources though from the node(s) so I would imagine that is really the limitation. Under a high write scenario for the Record
table, we would also be effectively invoking a second write for the Index table RecordByUser
consuming a bit more of the node(s) write throughput capacity.
So the act of adding to a Secondary Index doesn't require any locking on the source table (Record
in this case). The primary concern would be the write throughput and any hotspots from those writes. For example, if we indexed on a timestamp as the first part of the index, the writes to the index would bunch up. Is my understanding here correct?
During the act of creating the index on an existing table, does the backfill process hold an exclusive lock on the index, like Postgres for example:
https://www.postgresql.org/docs/current/index-locking.html
Or can new writes land in the index during the secondary index creation while backfill is taking place?
I can imagine a backfill process on spanners end of things that takes a read snapshot and starts writing. Given Spanners fancy clocks if it encounters a row in the index newer than the row it is attempting to write, it just drops the old row on the floor and carries on.
回答1:
Thanks for the question. Google engineer here for the help.
+1 to chainicko@ answer for the general locking mechanism. It is not "locked" in the sense that you can still read/write the original table despite the backfill is still running.
Read/query to the index itself are not allowed during the backfill. But writes to the original table are allowed. New writes are added to the index concurrently. After the backfill, Spanner will make sure only the latest data will be presented when queried.
As for the example of "indexed on a timestamp as the first part of the index", since it creates a hotspot on the index, so it would still have a negative impact on the system as a whole, even though it does not lock the original table.
来源:https://stackoverflow.com/questions/62501756/does-a-secondry-index-lock-anything-when-it-is-being-created