SELECT bottleneck and INSERT INTO ... SELECT doesn't work on CockroachDB

浪尽此生 submitted on 2019-12-13 03:14:29


I have to combine 2 tables with the query below, and 'table2' holds 15GB of data. The query fails with the error shown. I set max-sql-memory=.80, and I still don't know how to solve this. When I execute the query with a LIMIT 50000 option, it works! Even 'select * from table2' shows the same error, so I think there is a SELECT bottleneck somewhere. Also, it seems unusual that while this query runs, the latency of only 1 of the 3 nodes goes up. (AWS EC2, i3.xlarge type)

▶ Query
insert into table1 ( InvoiceID, PayerAccountId, LinkedAccountId, RecordType, RecordId, ProductName ) select InvoiceID, PayerAccountId, LinkedAccountId, RecordType, RecordId, ProductName from table2;

▶ Error
driver: bad connection
warning: connection lost!
opening new connection: all session settings will be lost

▶ Log
W180919 04:59:20.452985 186 storage/raft_transport.go:465 [n3] raft transport stream to node 2 failed: rpc error: code = Unavailable desc = transport is closing
W180919 04:59:20.452996 190 vendor/ grpc: addrConn.createTransport failed to connect to { 0 }. Err :connection error: desc = "transport: Error while dialing cannot reuse client connection". Reconnecting...


If I'm understanding your question correctly, you're using a single statement to read ~15GB of data from table2 and insert it into table1. Unfortunately, as you've discovered, this won't work. See limits for a single statement, which covers exactly this scenario. Setting --max-sql-memory=.80 will not help and will most likely hurt: our memory tracking is not precise, so CockroachDB needs some breathing room. The "bad connection" warning and the error you found in the logs are both symptoms of a Cockroach process having crashed, presumably because it ran out of memory.

If you need to copy the data from table2 to table1 transactionally, then you're a bit out of luck at this time. While you could try using an explicit transaction and breaking the single INSERT statement into multiple statements, you'll very likely run into transaction size limits. If you can handle performing the copy non-transactionally, then I'd suggest breaking the INSERT into pieces. Something along the lines of:

INSERT INTO table1 (...) SELECT ... FROM table2 WHERE InvoiceID > $1 ORDER BY InvoiceID LIMIT 10000 RETURNING InvoiceID

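The idea here is to copy in 10k-row batches. Use the RETURNING InvoiceID clause to track the largest InvoiceID copied so far, and start the next insert from there. For this to work, each batch has to be read in InvoiceID order (an ORDER BY InvoiceID before the LIMIT) and InvoiceID has to be unique in table2; otherwise rows at a batch boundary can be skipped.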
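As a rough sketch of the batching loop described above (not a tested CockroachDB client): the Python below assumes InvoiceID is unique and sortable, and that you supply a hypothetical `run_batch(sql, params)` helper which executes one statement, commits it, and returns the list of InvoiceID values from the RETURNING clause. The `%s` placeholders assume a psycopg2-style driver; the `$1` form in the statement above is the equivalent for drivers that use numbered parameters.

```python
from typing import Any, Callable, Sequence, Tuple

# One batch of the copy. ORDER BY makes "the largest ID returned" a safe
# resume point; %s placeholders are an assumption about the driver in use.
BATCH_SQL = """
INSERT INTO table1 (InvoiceID, PayerAccountId, LinkedAccountId,
                    RecordType, RecordId, ProductName)
SELECT InvoiceID, PayerAccountId, LinkedAccountId,
       RecordType, RecordId, ProductName
FROM table2
WHERE InvoiceID > %s
ORDER BY InvoiceID
LIMIT %s
RETURNING InvoiceID
"""


def copy_in_batches(
    run_batch: Callable[[str, Tuple[Any, int]], Sequence[Any]],
    start_after: Any,
    batch_size: int = 10000,
) -> int:
    """Copy table2 into table1 one batch at a time; return rows copied.

    run_batch is a hypothetical caller-supplied function: it runs the
    statement with the given parameters, commits, and returns the
    InvoiceID values produced by RETURNING.
    start_after must sort below every InvoiceID in table2.
    """
    last_id = start_after
    total = 0
    while True:
        ids = run_batch(BATCH_SQL, (last_id, batch_size))
        total += len(ids)
        # A short (or empty) batch means table2 has been exhausted.
        if len(ids) < batch_size:
            return total
        # Resume the next batch just past the largest ID copied so far.
        last_id = max(ids)
```

Each batch commits on its own, so the copy as a whole is not transactional: if the loop is interrupted, table1 holds a prefix of table2 and the copy can be resumed by passing the largest InvoiceID already present in table1 as `start_after`.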

