Question
I want to pull data from about 1500 remote Oracle tables with Spark, using a multi-threaded application that picks up one table per thread (or maybe 10 tables per thread) and launches a Spark job to read each table.
From the official Spark site https://spark.apache.org/docs/latest/job-scheduling.html it's clear that this can work...
...cluster managers that Spark runs on provide facilities for scheduling across applications. Second, within each Spark application, multiple “jobs” (Spark actions) may be running concurrently if they were submitted by different threads. This is common if your application is serving requests over the network. Spark includes a fair scheduler to schedule resources within each SparkContext.
However, you might have noticed in the SO post "Concurrent job Execution in Spark" that there was no accepted answer to this similar question, and the most upvoted answer starts with
This is not really in the spirit of Spark
- Everyone knows it's not in the "spirit" of Spark
- Who cares what is the spirit of Spark? That doesn't actually mean anything
Has anyone gotten something like this to work before? Did you have to do anything special? Just wanted some pointers before I wasted a lot of work hours prototyping. I would really appreciate any help on this!
Answer 1:
The Spark context is thread-safe, so it's possible to call it from many threads in parallel. (I am doing this in production.)
One thing to be aware of is that you should limit the number of threads you have running, because:
1. executor memory is shared between all threads, so you might get an OOM or constantly swap data in and out of the cache
2. CPU is limited, so running more tasks than you have cores won't bring any improvement
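One way to cap the number of driver threads is a fixed-size thread pool. The following is only a minimal sketch, not the setup the answer runs in production: runBounded, the table names, and the pool size are hypothetical, and the job passed in is a stand-in for whatever Spark action you would run per table.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

object BoundedDriverThreads {
  // Runs `job` once per table name on at most `maxThreads` driver threads
  // and blocks until every job has finished. Results keep the input order.
  // All names here are hypothetical.
  def runBounded[A](tables: Seq[String], maxThreads: Int)(job: String => A): Seq[A] = {
    val pool = Executors.newFixedThreadPool(maxThreads)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    try {
      val futures = tables.map(t => Future(job(t)))
      Await.result(Future.sequence(futures), Duration.Inf)
    } finally {
      pool.shutdown()
    }
  }

  def main(args: Array[String]): Unit = {
    val tables = (1 to 20).map(i => s"TABLE_$i")
    // Stand-in for a Spark action, e.g. reading the table over JDBC and writing it out.
    val sizes = runBounded(tables, maxThreads = 4)(t => t.length)
    println(sizes.sum)
  }
}
```

With a shared SparkSession in scope, the function passed to runBounded would call it directly; the pool size then bounds how many Spark jobs are in flight at once, which is the limit described in the two points above.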
Answer 2:
You do not need to submit your jobs from one multithreaded application (although I see no reason why you could not do so). Just submit your jobs as individual processes. Have a script that submits all those jobs one at a time and pushes each process to the background, or submit in yarn-cluster mode. Your scheduler (YARN, Mesos, Spark cluster) will simply make some of your jobs wait when it has no room for all of them to run at the same time, based on memory and/or CPU availability.
Note that I only see a benefit to your approach if you truly process your tables using multiple partitions, not just one as I have seen many times. Also, because you need to process that many tables, I am not sure how much, if at all, you will benefit. Depending on what you do with the table data, it might be simpler to just run multiple single-threaded, non-Spark jobs.
Also see @cowbert's note.
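To illustrate the point about multiple partitions: Spark's JDBC data source reads a table through a single partition unless it is given partitionColumn, lowerBound, upperBound and numPartitions. The sketch below only builds that option map; the URL, table, column, and bounds are made-up placeholders, and the build helper is hypothetical.

```scala
object PartitionedJdbcOptions {
  // Builds the option map for a partitioned JDBC read. The keys are options
  // of Spark's JDBC data source; every value passed in is a placeholder.
  def build(url: String, table: String, column: String,
            lower: Long, upper: Long, numPartitions: Int): Map[String, String] =
    Map(
      "url" -> url,
      "dbtable" -> table,
      "partitionColumn" -> column,   // must be a numeric, date, or timestamp column
      "lowerBound" -> lower.toString, // bounds set the partition stride, they do not filter rows
      "upperBound" -> upper.toString,
      "numPartitions" -> numPartitions.toString
    )

  def main(args: Array[String]): Unit = {
    val opts = build("jdbc:oracle:thin:@//dbhost:1521/SERVICE",
      "SCHEMA.ORDERS", "ORDER_ID", 1L, 1000000L, 8)
    // With a SparkSession named `spark` in scope, the read would look like:
    //   val df = spark.read.format("jdbc").options(opts).load()
    opts.toSeq.sortBy(_._1).foreach { case (k, v) => println(s"$k=$v") }
  }
}
```

Keep in mind that numPartitions also sets how many JDBC connections one read may open at a time, so with many tables read concurrently the load on the Oracle side adds up quickly.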
Answer 3:
I agree with @lev. I had wondered about this for a long time, so I wrote a small program to confirm that it works. Please note: to control how many workers (tasks) each driver thread uses, you need to limit the number of partitions of the DataFrame/Dataset with coalesce.
Here is the example code:
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

object SparkMultiThreadExample extends App {
  // Total task slots available to the local master.
  val TOTAL_WORKERS = 10
  // Partitions (and so concurrent tasks) that each driver thread's job may use.
  val NUMBER_OF_WORKERS_PER_DRIVER = 2

  val sparkConf = new SparkConf()
  sparkConf.setMaster(s"local[${TOTAL_WORKERS}]")
  val spark = SparkSession.builder().config(sparkConf).getOrCreate()
  val list1 = (0 until 10).toList
  import spark.implicits._

  // .par runs the outer loop on several driver threads; each one submits its own Spark job.
  // (On Scala 2.13, .par needs the separate scala-parallel-collections module.)
  list1.par.foreach(t => {
    spark.createDataset(list1).coalesce(NUMBER_OF_WORKERS_PER_DRIVER).foreach(i => {
      println(s"${Thread.currentThread()}, Driver thread ${t}: This is inside worker ${i} ")
      Thread.sleep(1000)
      println(s"FINISH ${Thread.currentThread()} Driver thread ${t}: This is inside worker ${i} ")
    })
  })
}
OUTPUT:
Thread[Executor task launch worker for task 0,5,main], Driver thread 0: This is inside worker 0
Thread[Executor task launch worker for task 4,5,main], Driver thread 3: This is inside worker 0
Thread[Executor task launch worker for task 7,5,main], Driver thread 5: This is inside worker 5
Thread[Executor task launch worker for task 1,5,main], Driver thread 0: This is inside worker 5
Thread[Executor task launch worker for task 3,5,main], Driver thread 2: This is inside worker 5
Thread[Executor task launch worker for task 6,5,main], Driver thread 5: This is inside worker 0
Thread[Executor task launch worker for task 2,5,main], Driver thread 2: This is inside worker 0
Thread[Executor task launch worker for task 5,5,main], Driver thread 3: This is inside worker 5
Thread[Executor task launch worker for task 9,5,main], Driver thread 4: This is inside worker 5
Thread[Executor task launch worker for task 8,5,main], Driver thread 4: This is inside worker 0
FINISH Thread[Executor task launch worker for task 0,5,main] Driver thread 0: This is inside worker 0
FINISH Thread[Executor task launch worker for task 7,5,main] Driver thread 5: This is inside worker 5
FINISH Thread[Executor task launch worker for task 4,5,main] Driver thread 3: This is inside worker 0
FINISH Thread[Executor task launch worker for task 3,5,main] Driver thread 2: This is inside worker 5
FINISH Thread[Executor task launch worker for task 1,5,main] Driver thread 0: This is inside worker 5
Thread[Executor task launch worker for task 3,5,main], Driver thread 2: This is inside worker 6
Thread[Executor task launch worker for task 4,5,main], Driver thread 3: This is inside worker 1
Thread[Executor task launch worker for task 1,5,main], Driver thread 0: This is inside worker 6
Thread[Executor task launch worker for task 0,5,main], Driver thread 0: This is inside worker 1
Thread[Executor task launch worker for task 7,5,main], Driver thread 5: This is inside worker 6
FINISH Thread[Executor task launch worker for task 2,5,main] Driver thread 2: This is inside worker 0
FINISH Thread[Executor task launch worker for task 5,5,main] Driver thread 3: This is inside worker 5
Thread[Executor task launch worker for task 2,5,main], Driver thread 2: This is inside worker 1
FINISH Thread[Executor task launch worker for task 9,5,main] Driver thread 4: This is inside worker 5
FINISH Thread[Executor task launch worker for task 6,5,main] Driver thread 5: This is inside worker 0
Thread[Executor task launch worker for task 9,5,main], Driver thread 4: This is inside worker 6
Thread[Executor task launch worker for task 5,5,main], Driver thread 3: This is inside worker 6
FINISH Thread[Executor task launch worker for task 8,5,main] Driver thread 4: This is inside worker 0
Thread[Executor task launch worker for task 6,5,main], Driver thread 5: This is inside worker 1
Thread[Executor task launch worker for task 8,5,main], Driver thread 4: This is inside worker 1
FINISH Thread[Executor task launch worker for task 3,5,main] Driver thread 2: This is inside worker 6
FINISH Thread[Executor task launch worker for task 4,5,main] Driver thread 3: This is inside worker 1
FINISH Thread[Executor task launch worker for task 1,5,main] Driver thread 0: This is inside worker 6
Thread[Executor task launch worker for task 4,5,main], Driver thread 3: This is inside worker 2
Thread[Executor task launch worker for task 3,5,main], Driver thread 2: This is inside worker 7
FINISH Thread[Executor task launch worker for task 0,5,main] Driver thread 0: This is inside worker 1
Thread[Executor task launch worker for task 1,5,main], Driver thread 0: This is inside worker 7
FINISH Thread[Executor task launch worker for task 7,5,main] Driver thread 5: This is inside worker 6
Thread[Executor task launch worker for task 0,5,main], Driver thread 0: This is inside worker 2
Thread[Executor task launch worker for task 7,5,main], Driver thread 5: This is inside worker 7
FINISH Thread[Executor task launch worker for task 2,5,main] Driver thread 2: This is inside worker 1
Thread[Executor task launch worker for task 2,5,main], Driver thread 2: This is inside worker 2
FINISH Thread[Executor task launch worker for task 9,5,main] Driver thread 4: This is inside worker 6
Thread[Executor task launch worker for task 9,5,main], Driver thread 4: This is inside worker 7
FINISH Thread[Executor task launch worker for task 5,5,main] Driver thread 3: This is inside worker 6
Thread[Executor task launch worker for task 5,5,main], Driver thread 3: This is inside worker 7
FINISH Thread[Executor task launch worker for task 6,5,main] Driver thread 5: This is inside worker 1
Thread[Executor task launch worker for task 6,5,main], Driver thread 5: This is inside worker 2
FINISH Thread[Executor task launch worker for task 8,5,main] Driver thread 4: This is inside worker 1
Thread[Executor task launch worker for task 8,5,main], Driver thread 4: This is inside worker 2
FINISH Thread[Executor task launch worker for task 4,5,main] Driver thread 3: This is inside worker 2
FINISH Thread[Executor task launch worker for task 7,5,main] Driver thread 5: This is inside worker 7
FINISH Thread[Executor task launch worker for task 0,5,main] Driver thread 0: This is inside worker 2
FINISH Thread[Executor task launch worker for task 1,5,main] Driver thread 0: This is inside worker 7
FINISH Thread[Executor task launch worker for task 3,5,main] Driver thread 2: This is inside worker 7
Thread[Executor task launch worker for task 7,5,main], Driver thread 5: This is inside worker 8
Thread[Executor task launch worker for task 4,5,main], Driver thread 3: This is inside worker 3
Thread[Executor task launch worker for task 3,5,main], Driver thread 2: This is inside worker 8
Thread[Executor task launch worker for task 0,5,main], Driver thread 0: This is inside worker 3
Thread[Executor task launch worker for task 1,5,main], Driver thread 0: This is inside worker 8
FINISH Thread[Executor task launch worker for task 2,5,main] Driver thread 2: This is inside worker 2
Thread[Executor task launch worker for task 2,5,main], Driver thread 2: This is inside worker 3
FINISH Thread[Executor task launch worker for task 9,5,main] Driver thread 4: This is inside worker 7
FINISH Thread[Executor task launch worker for task 5,5,main] Driver thread 3: This is inside worker 7
Thread[Executor task launch worker for task 9,5,main], Driver thread 4: This is inside worker 8
Thread[Executor task launch worker for task 5,5,main], Driver thread 3: This is inside worker 8
FINISH Thread[Executor task launch worker for task 6,5,main] Driver thread 5: This is inside worker 2
FINISH Thread[Executor task launch worker for task 8,5,main] Driver thread 4: This is inside worker 2
Thread[Executor task launch worker for task 6,5,main], Driver thread 5: This is inside worker 3
Thread[Executor task launch worker for task 8,5,main], Driver thread 4: This is inside worker 3
FINISH Thread[Executor task launch worker for task 7,5,main] Driver thread 5: This is inside worker 8
FINISH Thread[Executor task launch worker for task 4,5,main] Driver thread 3: This is inside worker 3
FINISH Thread[Executor task launch worker for task 0,5,main] Driver thread 0: This is inside worker 3
FINISH Thread[Executor task launch worker for task 3,5,main] Driver thread 2: This is inside worker 8
Thread[Executor task launch worker for task 0,5,main], Driver thread 0: This is inside worker 4
Thread[Executor task launch worker for task 3,5,main], Driver thread 2: This is inside worker 9
Thread[Executor task launch worker for task 4,5,main], Driver thread 3: This is inside worker 4
Thread[Executor task launch worker for task 7,5,main], Driver thread 5: This is inside worker 9
FINISH Thread[Executor task launch worker for task 1,5,main] Driver thread 0: This is inside worker 8
Thread[Executor task launch worker for task 1,5,main], Driver thread 0: This is inside worker 9
FINISH Thread[Executor task launch worker for task 2,5,main] Driver thread 2: This is inside worker 3
Thread[Executor task launch worker for task 2,5,main], Driver thread 2: This is inside worker 4
FINISH Thread[Executor task launch worker for task 9,5,main] Driver thread 4: This is inside worker 8
FINISH Thread[Executor task launch worker for task 5,5,main] Driver thread 3: This is inside worker 8
Thread[Executor task launch worker for task 9,5,main], Driver thread 4: This is inside worker 9
FINISH Thread[Executor task launch worker for task 6,5,main] Driver thread 5: This is inside worker 3
FINISH Thread[Executor task launch worker for task 8,5,main] Driver thread 4: This is inside worker 3
Thread[Executor task launch worker for task 5,5,main], Driver thread 3: This is inside worker 9
Thread[Executor task launch worker for task 8,5,main], Driver thread 4: This is inside worker 4
Thread[Executor task launch worker for task 6,5,main], Driver thread 5: This is inside worker 4
FINISH Thread[Executor task launch worker for task 0,5,main] Driver thread 0: This is inside worker 4
FINISH Thread[Executor task launch worker for task 4,5,main] Driver thread 3: This is inside worker 4
FINISH Thread[Executor task launch worker for task 3,5,main] Driver thread 2: This is inside worker 9
FINISH Thread[Executor task launch worker for task 7,5,main] Driver thread 5: This is inside worker 9
FINISH Thread[Executor task launch worker for task 1,5,main] Driver thread 0: This is inside worker 9
FINISH Thread[Executor task launch worker for task 2,5,main] Driver thread 2: This is inside worker 4
FINISH Thread[Executor task launch worker for task 9,5,main] Driver thread 4: This is inside worker 9
FINISH Thread[Executor task launch worker for task 5,5,main] Driver thread 3: This is inside worker 9
FINISH Thread[Executor task launch worker for task 6,5,main] Driver thread 5: This is inside worker 4
FINISH Thread[Executor task launch worker for task 8,5,main] Driver thread 4: This is inside worker 4
Thread[Executor task launch worker for task 11,5,main], Driver thread 7: This is inside worker 5
Thread[Executor task launch worker for task 10,5,main], Driver thread 7: This is inside worker 0
Thread[Executor task launch worker for task 12,5,main], Driver thread 6: This is inside worker 0
Thread[Executor task launch worker for task 13,5,main], Driver thread 6: This is inside worker 5
Thread[Executor task launch worker for task 14,5,main], Driver thread 1: This is inside worker 0
Thread[Executor task launch worker for task 15,5,main], Driver thread 1: This is inside worker 5
Thread[Executor task launch worker for task 16,5,main], Driver thread 8: This is inside worker 0
Thread[Executor task launch worker for task 17,5,main], Driver thread 8: This is inside worker 5
Thread[Executor task launch worker for task 19,5,main], Driver thread 9: This is inside worker 5
Thread[Executor task launch worker for task 18,5,main], Driver thread 9: This is inside worker 0
FINISH Thread[Executor task launch worker for task 11,5,main] Driver thread 7: This is inside worker 5
Thread[Executor task launch worker for task 11,5,main], Driver thread 7: This is inside worker 6
FINISH Thread[Executor task launch worker for task 10,5,main] Driver thread 7: This is inside worker 0
Thread[Executor task launch worker for task 10,5,main], Driver thread 7: This is inside worker 1
FINISH Thread[Executor task launch worker for task 12,5,main] Driver thread 6: This is inside worker 0
Thread[Executor task launch worker for task 12,5,main], Driver thread 6: This is inside worker 1
FINISH Thread[Executor task launch worker for task 13,5,main] Driver thread 6: This is inside worker 5
Thread[Executor task launch worker for task 13,5,main], Driver thread 6: This is inside worker 6
FINISH Thread[Executor task launch worker for task 14,5,main] Driver thread 1: This is inside worker 0
Thread[Executor task launch worker for task 14,5,main], Driver thread 1: This is inside worker 1
FINISH Thread[Executor task launch worker for task 15,5,main] Driver thread 1: This is inside worker 5
Thread[Executor task launch worker for task 15,5,main], Driver thread 1: This is inside worker 6
FINISH Thread[Executor task launch worker for task 16,5,main] Driver thread 8: This is inside worker 0
Thread[Executor task launch worker for task 16,5,main], Driver thread 8: This is inside worker 1
FINISH Thread[Executor task launch worker for task 17,5,main] Driver thread 8: This is inside worker 5
Thread[Executor task launch worker for task 17,5,main], Driver thread 8: This is inside worker 6
FINISH Thread[Executor task launch worker for task 19,5,main] Driver thread 9: This is inside worker 5
Thread[Executor task launch worker for task 19,5,main], Driver thread 9: This is inside worker 6
FINISH Thread[Executor task launch worker for task 18,5,main] Driver thread 9: This is inside worker 0
Thread[Executor task launch worker for task 18,5,main], Driver thread 9: This is inside worker 1
FINISH Thread[Executor task launch worker for task 11,5,main] Driver thread 7: This is inside worker 6
Thread[Executor task launch worker for task 11,5,main], Driver thread 7: This is inside worker 7
FINISH Thread[Executor task launch worker for task 10,5,main] Driver thread 7: This is inside worker 1
Thread[Executor task launch worker for task 10,5,main], Driver thread 7: This is inside worker 2
FINISH Thread[Executor task launch worker for task 12,5,main] Driver thread 6: This is inside worker 1
Thread[Executor task launch worker for task 12,5,main], Driver thread 6: This is inside worker 2
FINISH Thread[Executor task launch worker for task 13,5,main] Driver thread 6: This is inside worker 6
Thread[Executor task launch worker for task 13,5,main], Driver thread 6: This is inside worker 7
FINISH Thread[Executor task launch worker for task 14,5,main] Driver thread 1: This is inside worker 1
Thread[Executor task launch worker for task 14,5,main], Driver thread 1: This is inside worker 2
FINISH Thread[Executor task launch worker for task 15,5,main] Driver thread 1: This is inside worker 6
Thread[Executor task launch worker for task 15,5,main], Driver thread 1: This is inside worker 7
FINISH Thread[Executor task launch worker for task 16,5,main] Driver thread 8: This is inside worker 1
Thread[Executor task launch worker for task 16,5,main], Driver thread 8: This is inside worker 2
FINISH Thread[Executor task launch worker for task 17,5,main] Driver thread 8: This is inside worker 6
Thread[Executor task launch worker for task 17,5,main], Driver thread 8: This is inside worker 7
FINISH Thread[Executor task launch worker for task 19,5,main] Driver thread 9: This is inside worker 6
Thread[Executor task launch worker for task 19,5,main], Driver thread 9: This is inside worker 7
FINISH Thread[Executor task launch worker for task 18,5,main] Driver thread 9: This is inside worker 1
Thread[Executor task launch worker for task 18,5,main], Driver thread 9: This is inside worker 2
FINISH Thread[Executor task launch worker for task 11,5,main] Driver thread 7: This is inside worker 7
Thread[Executor task launch worker for task 11,5,main], Driver thread 7: This is inside worker 8
FINISH Thread[Executor task launch worker for task 10,5,main] Driver thread 7: This is inside worker 2
Thread[Executor task launch worker for task 10,5,main], Driver thread 7: This is inside worker 3
FINISH Thread[Executor task launch worker for task 12,5,main] Driver thread 6: This is inside worker 2
Thread[Executor task launch worker for task 12,5,main], Driver thread 6: This is inside worker 3
FINISH Thread[Executor task launch worker for task 13,5,main] Driver thread 6: This is inside worker 7
Thread[Executor task launch worker for task 13,5,main], Driver thread 6: This is inside worker 8
FINISH Thread[Executor task launch worker for task 14,5,main] Driver thread 1: This is inside worker 2
Thread[Executor task launch worker for task 14,5,main], Driver thread 1: This is inside worker 3
FINISH Thread[Executor task launch worker for task 15,5,main] Driver thread 1: This is inside worker 7
Thread[Executor task launch worker for task 15,5,main], Driver thread 1: This is inside worker 8
FINISH Thread[Executor task launch worker for task 16,5,main] Driver thread 8: This is inside worker 2
Thread[Executor task launch worker for task 16,5,main], Driver thread 8: This is inside worker 3
FINISH Thread[Executor task launch worker for task 17,5,main] Driver thread 8: This is inside worker 7
Thread[Executor task launch worker for task 17,5,main], Driver thread 8: This is inside worker 8
FINISH Thread[Executor task launch worker for task 19,5,main] Driver thread 9: This is inside worker 7
Thread[Executor task launch worker for task 19,5,main], Driver thread 9: This is inside worker 8
FINISH Thread[Executor task launch worker for task 18,5,main] Driver thread 9: This is inside worker 2
Thread[Executor task launch worker for task 18,5,main], Driver thread 9: This is inside worker 3
FINISH Thread[Executor task launch worker for task 11,5,main] Driver thread 7: This is inside worker 8
Thread[Executor task launch worker for task 11,5,main], Driver thread 7: This is inside worker 9
FINISH Thread[Executor task launch worker for task 10,5,main] Driver thread 7: This is inside worker 3
Thread[Executor task launch worker for task 10,5,main], Driver thread 7: This is inside worker 4
FINISH Thread[Executor task launch worker for task 12,5,main] Driver thread 6: This is inside worker 3
Thread[Executor task launch worker for task 12,5,main], Driver thread 6: This is inside worker 4
FINISH Thread[Executor task launch worker for task 13,5,main] Driver thread 6: This is inside worker 8
Thread[Executor task launch worker for task 13,5,main], Driver thread 6: This is inside worker 9
FINISH Thread[Executor task launch worker for task 14,5,main] Driver thread 1: This is inside worker 3
Thread[Executor task launch worker for task 14,5,main], Driver thread 1: This is inside worker 4
FINISH Thread[Executor task launch worker for task 15,5,main] Driver thread 1: This is inside worker 8
Thread[Executor task launch worker for task 15,5,main], Driver thread 1: This is inside worker 9
FINISH Thread[Executor task launch worker for task 16,5,main] Driver thread 8: This is inside worker 3
Thread[Executor task launch worker for task 16,5,main], Driver thread 8: This is inside worker 4
FINISH Thread[Executor task launch worker for task 17,5,main] Driver thread 8: This is inside worker 8
Thread[Executor task launch worker for task 17,5,main], Driver thread 8: This is inside worker 9
FINISH Thread[Executor task launch worker for task 19,5,main] Driver thread 9: This is inside worker 8
Thread[Executor task launch worker for task 19,5,main], Driver thread 9: This is inside worker 9
FINISH Thread[Executor task launch worker for task 18,5,main] Driver thread 9: This is inside worker 3
Thread[Executor task launch worker for task 18,5,main], Driver thread 9: This is inside worker 4
FINISH Thread[Executor task launch worker for task 11,5,main] Driver thread 7: This is inside worker 9
FINISH Thread[Executor task launch worker for task 10,5,main] Driver thread 7: This is inside worker 4
FINISH Thread[Executor task launch worker for task 12,5,main] Driver thread 6: This is inside worker 4
FINISH Thread[Executor task launch worker for task 13,5,main] Driver thread 6: This is inside worker 9
FINISH Thread[Executor task launch worker for task 14,5,main] Driver thread 1: This is inside worker 4
FINISH Thread[Executor task launch worker for task 15,5,main] Driver thread 1: This is inside worker 9
FINISH Thread[Executor task launch worker for task 16,5,main] Driver thread 8: This is inside worker 4
FINISH Thread[Executor task launch worker for task 17,5,main] Driver thread 8: This is inside worker 9
FINISH Thread[Executor task launch worker for task 19,5,main] Driver thread 9: This is inside worker 9
FINISH Thread[Executor task launch worker for task 18,5,main] Driver thread 9: This is inside worker 4
Source: https://stackoverflow.com/questions/47842048/launching-apache-spark-sql-jobs-from-multi-threaded-driver