Google Dataproc timing out and killing executors
Question: I have a Google Dataproc Spark cluster set up with one master node and 16 worker nodes. The master has 2 CPUs and 13 GB of memory, and each worker has 2 CPUs and 3.5 GB of memory. I am running a rather network-intensive job in which I have an array of 16 objects, and I split this array into 16 partitions so that each worker gets one object. The objects make about 2.5 million web requests and aggregate them to send back to the master. Each request returns a Solr response of less than 50k. One field (an