Question
I would like to know how to specify a custom profiler class in PySpark for Spark version 2+. Under 1.6, I know I can do so like this:
sc = SparkContext('local', 'test', profiler_cls=MyProfiler)  # profiler_cls takes a class, not a string
but when I create the SparkSession in 2.0 I don't explicitly have access to the SparkContext. Can someone please advise how to do this for Spark 2.0+?
Answer 1:
SparkSession can be initialized with an existing SparkContext, for example:
from pyspark import SparkContext
from pyspark.sql import SparkSession
from pyspark.profiler import BasicProfiler

# Build the SparkContext with the desired profiler class,
# then wrap it in a SparkSession.
spark = SparkSession(SparkContext('local', 'test', profiler_cls=BasicProfiler))
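For illustration, here is a minimal sketch of wiring a genuinely custom profiler through this pattern. The MyCustomProfiler class is hypothetical; it subclasses BasicProfiler and overrides show() to change how collected stats are printed. Note that profile data is only collected when spark.python.profile is enabled:

from pyspark import SparkConf, SparkContext
from pyspark.sql import SparkSession
from pyspark.profiler import BasicProfiler

# Hypothetical custom profiler: subclass BasicProfiler and
# override show() to customize how profiles are reported.
class MyCustomProfiler(BasicProfiler):
    def show(self, id):
        print("My custom profiles for RDD: %s" % id)

# Profiling must be switched on via this config for any
# profiler class to collect data.
conf = SparkConf().set("spark.python.profile", "true")
sc = SparkContext('local', 'test', conf=conf, profiler_cls=MyCustomProfiler)
spark = SparkSession(sc)

# Run some work, then dump the per-RDD profiles;
# spark.sparkContext is the same sc passed in above.
spark.sparkContext.parallelize(range(1000)).count()
spark.sparkContext.show_profiles()
spark.stop()

The underlying SparkContext remains reachable afterwards through the spark.sparkContext property, so anything you could do against sc in 1.6 still works in 2.0+.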
Source: https://stackoverflow.com/questions/42676078/specifiying-custom-profilers-for-pyspark-running-spark-2-0