Question
I have been using the Spark Data Source to write to Kudu from Parquet, and the write performance is terrible: about 12,000 rows/second, with each row roughly 160 bytes.
We have 7 Kudu nodes, each with 24 cores, 64 GB RAM, and 12 SATA disks. None of the resources seems to be the bottleneck: tserver CPU usage is ~3-4 cores, RAM usage ~10 GB, and there is no disk congestion.
Still, I see that most of the time write requests are stuck queuing. Any ideas are appreciated.
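For context, the write path is roughly the following (a minimal sketch using the kudu-spark KuduContext API; the master addresses, Parquet path, and table name are placeholders, not our actual values):

import org.apache.kudu.spark.kudu.KuduContext

// assumes `spark` is an active SparkSession (e.g., in spark-shell);
// the path, master addresses, and table name below are placeholders
val df = spark.read.parquet("hdfs:///path/to/input.parquet")

val kuduContext = new KuduContext(
  "kudu-master-1:7051,kudu-master-2:7051,kudu-master-3:7051",
  spark.sparkContext)

// inserts each DataFrame row into the (pre-created) Kudu table
kuduContext.insertRows(df, "impala::db.target_table")

Below is a representative warning and trace from one of the tablet server logs; note the 13-second queue time: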
W0811 12:34:03.526340 7753 rpcz_store.cc:251] Call kudu.tserver.TabletServerService.Write from 10.60.170.18:10000 (ReqId={client: 81ae6f3c6e1b4d9493ea95f87ccd1dfa, seq_no=9365, attempt_no=1}) took 13255ms (client timeout 10000).
W0811 12:34:03.526489 7753 rpcz_store.cc:255] Trace:
0811 12:33:50.270477 (+ 0us) service_pool.cc:163] Inserting onto call queue
0811 12:33:50.270497 (+ 20us) service_pool.cc:222] Handling call
0811 12:34:03.526316 (+13255819us) inbound_call.cc:157] Queueing success response
Related trace 'txn':
0811 12:34:03.328337 (+ 0us) write_transaction.cc:101] PREPARE: Starting
0811 12:34:03.328563 (+ 226us) write_transaction.cc:268] Acquiring schema lock in shared mode
0811 12:34:03.328564 (+ 1us) write_transaction.cc:271] Acquired schema lock
0811 12:34:03.328564 (+ 0us) tablet.cc:400] PREPARE: Decoding operations
0811 12:34:03.328742 (+ 178us) tablet.cc:422] PREPARE: Acquiring locks for 24 operations
0811 12:34:03.447163 (+118421us) lock_manager.cc:377] Waited 118408us for lock on <redacted>
0811 12:34:03.447203 (+ 40us) tablet.cc:426] PREPARE: locks acquired
0811 12:34:03.447203 (+ 0us) write_transaction.cc:126] PREPARE: finished.
0811 12:34:03.447361 (+ 158us) write_transaction.cc:136] Start()
0811 12:34:03.447366 (+ 5us) write_transaction.cc:141] Timestamp: P: 1533965643563964 usec, L: 6
0811 12:34:03.447674 (+ 308us) log.cc:582] Serialized 64909 byte log entry
0811 12:34:03.449561 (+ 1887us) write_transaction.cc:149] APPLY: Starting
0811 12:34:03.526238 (+ 76677us) tablet_metrics.cc:365] ProbeStats: bloom_lookups=48,key_file_lookups=48,delta_file_lookups=24,mrs_lookups=0
0811 12:34:03.526260 (+ 22us) log.cc:582] Serialized 237 byte log entry
0811 12:34:03.526268 (+ 8us) write_transaction.cc:309] Releasing row and schema locks
0811 12:34:03.526280 (+ 12us) write_transaction.cc:277] Released schema lock
0811 12:34:03.526300 (+ 20us) write_transaction.cc:196] FINISH: updating metrics
Metrics: {"child_traces":[["txn",{"apply.queue_time_us":11,"cfile_cache_hit":205,"cfile_cache_hit_bytes":21900627,"num_ops":24,"prepare.queue_time_us":13057291,"prepare.run_cpu_time_us":1017,"prepare.run_wall_time_us":119378,"raft.queue_time_us":71,"raft.run_cpu_time_us":303,"raft.run_wall_time_us":304,"replication_time_us":2170,"row_lock_wait_count":1,"row_lock_wait_us":118408,"spinlock_wait_cycles":45824}]]}
Answer 1:
The first challenge was that it took forever to ingest a 23M-row table with 200 columns into Kudu (4 hash partitions by primary key). Precisely, it took a whopping 58 minutes, which translates to roughly 6,600 rows per second. I could not believe Kudu was that slow, and we double-checked the installation and configuration docs. Unfortunately, we had trusted the defaults, and as I found out on the Kudu Slack channel (thanks, Will Berkeley!), there are two parameters that need to be tweaked. Specifically:
memory_limit_hard_bytes
controls the total amount of memory the Kudu daemon is allowed to use.
maintenance_manager_num_threads
the number of maintenance threads; the recommendation is to set it to 1/3 of the number of disks used by Kudu.
The defaults with the CDH Kudu parcel were quite terrible: Kudu was limited to 1 GB of memory and was using only one maintenance thread. We set the latter to 4 (12 drives / 3) and the former to 0 (dynamic allocation). CM did not want to accept 0 for memory_limit_hard_bytes, so we had to use a CM safety valve to override it. Once that was done and Kudu was restarted, my first 23M-row table finished in 240 seconds (~95k rows per second), which is much better! A CTAS from Impala to Impala Parquet took only 60 seconds.
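For reference, the final override looked roughly like this (placed in the tserver gflagfile via the CM safety valve; the exact safety-valve field name depends on your CM version):

--memory_limit_hard_bytes=0
--maintenance_manager_num_threads=4

A value of 0 for memory_limit_hard_bytes lets the tserver size its memory limit from the machine's physical RAM, which is what "dynamic allocation" refers to above.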
Answer 2:
It turns out it was due to duplication in our data. We were using a field in which about 1.2 million rows share the same value (an empty string) as the primary key in Kudu. So Kudu updated that same key 1.2 million times, and each time it had to acquire the row lock, hence the drop in ingest speed over time.
After removing the rows with the duplicated key, the ingest speed increased about 10x.
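If you want to catch this up front, a quick sanity check is to count duplicate primary-key values in the source DataFrame before writing (a sketch; df is the DataFrame read from Parquet as in the question, and "key_col" is a placeholder for your primary-key column):

import org.apache.spark.sql.functions.{col, desc}

// every key with count > 1 turns into repeated writes of the same row on the
// Kudu side, each waiting on the same row lock (row_lock_wait_us in the trace)
val dupKeys = df.groupBy("key_col")
  .count()
  .filter(col("count") > 1)
  .orderBy(desc("count"))

dupKeys.show(20, false)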
Source: https://stackoverflow.com/questions/51815419/apache-kudu-slow-insert-high-queuing-time