How to get ID of a map task in Spark?

Backend · 2 answers · 1107 views
有刺的猬 — asked 2020-11-29 12:08

Is there a way to get the ID of a map task in Spark? For example, if each map task calls a user-defined function, can I get the ID of that map task from within that user-defined function?

2 Answers
  • 2020-11-29 12:47

    I believe TaskContext.taskAttemptId is what you want. You can get the current task's context within a function via TaskContext.get.
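For instance, a minimal sketch (assuming a live `SparkContext` named `sc`, as in a Spark shell) that reads the attempt ID from inside each task:

    import org.apache.spark.TaskContext

    // Each partition is processed by one task; TaskContext.get returns
    // the context of the task currently running this closure.
    sc.parallelize(1 to 8, 4).mapPartitions { iter =>
        val ctx = TaskContext.get
        Iterator(s"taskAttemptId: ${ctx.taskAttemptId}, partition: ${ctx.partitionId}")
    }.collect.foreach(println)

Note that `taskAttemptId` is unique per task attempt, so a retried task gets a new ID; use `partitionId` if you want a stable identifier for the data slice.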

  • 2020-11-29 12:55

I am not sure what you mean by the ID of a map task, but you can access task information through TaskContext:

    import org.apache.spark.TaskContext
    
    sc.parallelize(Seq[Int](), 4).mapPartitions(_ => {
        // TaskContext.get returns the context of the currently running task
        val ctx = TaskContext.get
        val stageId = ctx.stageId      // ID of the stage this task belongs to
        val partId = ctx.partitionId   // index of the partition this task processes
        val hostname = java.net.InetAddress.getLocalHost().getHostName()
        Iterator(s"Stage: $stageId, Partition: $partId, Host: $hostname")
    }).collect.foreach(println)
    

Similar functionality was added to PySpark in Spark 2.2.0 (SPARK-18576):

    from pyspark import TaskContext
    import socket
    
    def task_info(*_):
        # TaskContext() returns the context of the currently running task
        ctx = TaskContext()
        return ["Stage: {0}, Partition: {1}, Host: {2}".format(
            ctx.stageId(), ctx.partitionId(), socket.gethostname())]
    
    for x in sc.parallelize([], 4).mapPartitions(task_info).collect():
        print(x)
    