apache-spark-1.6

udf No TypeTag available for type string

你。 Submitted on 2019-11-28 11:29:07
Question: I don't understand a behavior of Spark. I create a UDF which returns an Integer, like below:

import org.apache.spark.sql.SQLContext
import org.apache.spark.{SparkConf, SparkContext}

object Show {
  def main(args: Array[String]): Unit = {
    val (sc, sqlContext) = iniSparkConf("test")
    val testInt_udf = sqlContext.udf.register("testInt_udf", testInt _)
  }

  def iniSparkConf(appName: String): (SparkContext, SQLContext) = {
    val conf = new SparkConf().setAppName(appName)//.setExecutorEnv("spark.ui.port",
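The "No TypeTag available for type string" message is a Scala compile-time failure, not a Spark runtime error: `sqlContext.udf.register` is declared (roughly) as `register[RT: TypeTag, A1: TypeTag](name: String, func: A1 => RT)`, so the compiler must materialize an implicit `TypeTag` for every type in the UDF's signature. A lowercase `string` (a typo for `String`) or an abstract, unresolved type has no tag. A Spark-free sketch of the mechanism (the object and helper names here are illustrative, not from the question):

```scala
import scala.reflect.runtime.universe.{TypeTag, typeTag}

object TypeTagDemo {
  // Makes the implicit-TypeTag requirement of udf.register visible:
  // this compiles only for types the compiler can build a tag for.
  def requireTag[T: TypeTag]: String = typeTag[T].tpe.toString

  def main(args: Array[String]): Unit = {
    // `String` (capital S) has a TypeTag, so this compiles and runs.
    println(requireTag[String])

    // Writing the argument or return type as lowercase `string`, or
    // using an abstract type member, gives the compiler nothing to
    // materialize a tag from and yields exactly:
    //   No TypeTag available for type string
  }
}
```

In the question's code, checking that `testInt`'s signature uses `String`/`Int` (capitalized Scala types) rather than a stray alias is the usual fix.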

Spark CrossValidatorModel access other models than the bestModel?

南笙酒味 Submitted on 2019-11-27 16:04:55
I am using Spark 1.6.1. Currently I am using a CrossValidator to train my ML Pipeline with various parameters. After the training process I can use the bestModel property of the CrossValidatorModel to get the model that performed best during the cross validation. Are the other models of the cross validation automatically discarded, or can I select a model that performed worse than the bestModel? I am asking because I am using the F1 score metric for the cross validation, but I am also interested in the weightedRecall of all of the models, not just of the model that performed best during the cross validation.
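In Spark 1.6 the sub-models are discarded after fitting; only the best model and the per-candidate averaged metrics survive (the `collectSubModels` switch only appeared later, in Spark 2.3). What can still be inspected is `avgMetrics`, which lines up index-for-index with `getEstimatorParamMaps`. A minimal sketch, assuming an already configured `cv: CrossValidator` and a `training` DataFrame (both are assumptions, not from the question):

```scala
import org.apache.spark.ml.tuning.{CrossValidator, CrossValidatorModel}
import org.apache.spark.sql.DataFrame

def inspectCandidates(cv: CrossValidator, training: DataFrame): Unit = {
  val cvModel: CrossValidatorModel = cv.fit(training)

  // avgMetrics(i) is the cross-validated metric (here F1) obtained with
  // getEstimatorParamMaps(i); zipping them exposes every candidate,
  // not just the best one.
  cv.getEstimatorParamMaps.zip(cvModel.avgMetrics)
    .sortBy(-_._2)
    .foreach { case (params, f1) => println(s"F1 = $f1 for $params") }

  // The fitted sub-models themselves are gone in 1.6: to get the
  // weightedRecall of a non-best candidate, refit the estimator with
  // that ParamMap and run a second evaluator on a held-out set.
}
```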

Where is the reference for options for writing or reading per format?

安稳与你 Submitted on 2019-11-27 12:29:53
I use Spark 1.6.1. We are trying to write an ORC file to HDFS using HiveContext and DataFrameWriter. While we can use df.write().orc(<path>), we would rather do something like df.write().options(Map("format" -> "orc", "path" -> "/some_path")). This is so that we have the flexibility to change the format or root path depending on the application that uses this helper library. Where can we find a reference for the options that can be passed into the DataFrameWriter? I found nothing in the docs here: https://spark.apache.org/docs/1.6.0/api/java/org/apache/spark/sql/DataFrameWriter.html#options(java
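There is no single table of option keys in the 1.6 API docs because the recognized keys depend on the data source: each format interprets its own options. The format itself is not an option at all; it is selected with .format(...), and the path can be supplied either via .save(path) or .option("path", ...). A minimal sketch of the kind of helper described above (the helper name and its parameters are assumptions, not an existing API):

```scala
import org.apache.spark.sql.DataFrame

// Hypothetical helper: the output format and root path come from the
// calling application's configuration instead of being hard-coded.
def writeAs(df: DataFrame,
            format: String,
            path: String,
            extraOptions: Map[String, String] = Map.empty): Unit = {
  df.write
    .format(format)        // e.g. "orc", "parquet", "json"
    .options(extraOptions) // source-specific keys, not "format"/"path"
    .save(path)            // root path on HDFS
}
```

For the option keys each format accepts, the per-source documentation (e.g. the Spark SQL programming guide's data sources section) is the reference, rather than the DataFrameWriter scaladoc.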

