Spark: Broadcast variables: It appears that you are attempting to reference SparkContext from a broadcast variable, action, or transformation

一生所求 2020-11-27 21:11
class ProdsTransformer:

    def __init__(self):
        self.products_lookup_hmap = {}
        self.broadcast_products_lookup_map = None

    def create_broadcast_var         


        
1 Answer
  • 2020-11-27 21:32

    Because your map lambda references the object that contains your broadcast variable, Spark will attempt to serialize the whole object and ship it to the workers. Since that object also holds a reference to the SparkContext, you get the error. Instead of this:

    pairs = distinct_users_projected.map(lambda x: (x.user_id, pt.broadcast_products_lookup_map.value[x.Prod_ID]))
    

    Try this:

    bcast = pt.broadcast_products_lookup_map
    pairs = distinct_users_projected.map(lambda x: (x.user_id, bcast.value[x.Prod_ID]))
    

    The latter avoids the reference to the object (pt) so that Spark only needs to ship the broadcast variable.
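
To see why the second form helps, it can be useful to look at what each lambda actually closes over. The following is only a rough sketch in plain Python, with no Spark involved: `FakeSparkContext`, `FakeBroadcast`, `captured_objects`, and `build_mappers` are hypothetical stand-ins invented for illustration, and the `sc` attribute on `ProdsTransformer` is an assumption about how the original class holds its context. It only inspects closure variables; Spark's real closure serializer does more than this.

```python
class FakeSparkContext:
    """Hypothetical stand-in for SparkContext (not the real class)."""


class FakeBroadcast:
    """Hypothetical stand-in for a broadcast variable handle."""

    def __init__(self, value):
        self.value = value


class ProdsTransformer:
    def __init__(self, sc):
        # Assumption: the object keeps a driver-only context handle.
        self.sc = sc
        self.broadcast_products_lookup_map = FakeBroadcast({"p1": "Widget"})


def captured_objects(fn):
    """Return {name: object} for everything the function closes over."""
    cells = fn.__closure__ or ()
    return {
        name: cell.cell_contents
        for name, cell in zip(fn.__code__.co_freevars, cells)
    }


def build_mappers(pt):
    # Problematic form: the lambda closes over the whole `pt` object.
    bad = lambda prod_id: pt.broadcast_products_lookup_map.value[prod_id]
    # Suggested form: copy the handle to a local, then close over only that.
    bcast = pt.broadcast_products_lookup_map
    good = lambda prod_id: bcast.value[prod_id]
    return bad, good


pt = ProdsTransformer(FakeSparkContext())
bad, good = build_mappers(pt)
bad_captures = captured_objects(bad)
good_captures = captured_objects(good)

print(sorted(bad_captures))   # names captured by the problematic lambda
print(sorted(good_captures))  # names captured by the suggested lambda
```

Both lambdas return the same lookup result, but the first one drags `pt` (and through it the context handle) along, while the second carries only the broadcast handle.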
