问题
I am getting an unacceptable low performance with the galera setup i created. In my setup there are 2 nodes in active-active and i am doing read/writes on both the nodes in a round robin fashion using HA-proxy load balancer.
I was easily able to get over 10000 TPS on my application with the single mariadb server with the below configuration: 36 vpcu, 60 GB RAM, SSD, 10Gig dedicated pipe
With galera i am hardly getting 3500 TPS although i am using 2 nodes(36vcpu, 60 GB RAM) of DB load balanced by ha-proxy. For information, ha-proxy is hosted as a standalone node on a different server. I have removed ha-proxy as of now but there is no improvement in performance.
Can someone please suggest some tuning parameters in my.cnf i should consider to tune this severely under-performing setup.
I am using the below my.cnf file:
回答1:
I was easily able to get over 10000 TPS on my application with the single mariadb server with the below configuration: 36 vpcu, 60 GB RAM, SSD, 10Gig dedicated pipe
With galera i am hardly getting 3500 TPS although i am using 2 nodes(36vcpu, 60 GB RAM) of DB load balanced by ha-proxy.
Clusters based on Galera are not designed to scale writes as I see you intend to do; In fact, as Rick mentioned above: sending writes to multiple nodes for the same tables will end up causing certification conflicts that will reflect as deadlocks for your application, adding huge overhead.
I am getting an unacceptable low performance with the galera setup i created. In my setup there are 2 nodes in active-active and i am doing read/writes on both the nodes in a round robin fashion using HA-proxy load balancer.
Please send all writes to a single node and see if that improves performane; There will always be some overhead due to the nature of virtually-synchronous replication that Galera uses, which literally adds network overhead to each write you perform (albeit true clock-based parallel replication will offset this impact quite a bit, still you are bound to see slightly lower throughput volumes).
Also make sure to keep your transactions short and COMMIT as soon as you are done with an atomic unit of work, since replication-certification process is single-threaded and will stall writes on the other nodes (if you see that your writer node shows transactions wsrep pre-commit stage that means the other nodes are doing certification for a large transaction or that the node is suffering performance problems of some sort -swap, full disk, abusively large reads, etc.
Hope that helps, and let us know how it goes when you move to single node.
回答2:
Turn off the QC:
query_cache_size = 0 -- not 22 bytes
query_cache_type = OFF -- QC is incompatible with Galera
Increase innodb_io_capacity
How far apart (ping time) are the two nodes?
Suggest you pretend that it is Master-Slave. That is, have HAProxy send all traffic to one node, leaving the other as a hot backup. Certain things can run faster in this mode; I don't know about your app.
来源:https://stackoverflow.com/questions/41610698/getting-a-very-bad-performance-with-galera-as-compared-to-a-standalone-mariadb-s