Trying to use map on a Spark DataFrame

Submitted by こ雲淡風輕ζ on 2020-06-24 22:24:07

Question


I recently started experimenting with both Spark and Java. I initially went through the well-known WordCount example using RDDs and everything went as expected. Now I am trying to implement my own example, but using DataFrames instead of RDDs.

So I am reading a dataset from a file with:

DataFrame df = sqlContext.read()
        .format("com.databricks.spark.csv")
        .option("inferSchema", "true")
        .option("delimiter", ";")
        .option("header", "true")
        .load(inputFilePath);

and then I try to select a specific column and apply a simple transformation to every row, like this:

df = df.select("start")
        .map(text -> text + "asd");

But compilation fails on the second line with an error I don't fully understand (the start column is inferred to be of type string):

Multiple non-overriding abstract methods found in interface scala.Function1

Why is my lambda function treated as a Scala function and what does the error message actually mean?


Answer 1:


If you use the select function on a DataFrame, you get a DataFrame back. Your function is then applied to the Row datatype, not to the value inside the row. So you should extract the value first, like this:

df.select("start").map(el -> el.getString(0) + "asd")

But then you will get an RDD back as the return value, not a DataFrame.
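
In the Java API specifically, the lambda also needs to be disambiguated. Here is a minimal sketch (not from the original answer): with Spark 1.x you would drop down to a JavaRDD, while with Spark 2.x (where DataFrame is Dataset<Row>) you can pass a MapFunction together with an Encoder.

import org.apache.spark.api.java.function.MapFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;

// Spark 1.x: convert to a JavaRDD first, so the lambda resolves to a Java Function
// JavaRDD<String> rdd = df.select("start").javaRDD()
//         .map(row -> row.getString(0) + "asd");

// Spark 2.x: map directly on the Dataset, supplying an Encoder for the result type
Dataset<String> result = df.select("start")
        .map((MapFunction<Row, String>) row -> row.getString(0) + "asd",
             Encoders.STRING());

The cast to MapFunction is what tells the compiler to use the Java overload instead of the Scala one that takes scala.Function1, which is where the error message in the question comes from.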




Answer 2:


I would use concat to achieve this:

df.withColumn('start', concat(col('start'), lit('asd')))

As you're mapping the same text twice, I'm not sure if you're also looking to replace the first part of the string; but if you are, I would do:

df.withColumn('start', concat(
                      when(col('start') == 'text', lit('new'))
                      .otherwise(col('start')),
                      lit('asd')
                     ))

This solution scales better with big data, as it concatenates two columns instead of iterating over the values row by row.
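
Since the question itself uses the Java API, a rough Java equivalent of the snippet above would look like this (a sketch, assuming the static helpers in org.apache.spark.sql.functions and the same literal values as in the answer):

import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.concat;
import static org.apache.spark.sql.functions.lit;
import static org.apache.spark.sql.functions.when;

// Replace "text" with "new" in the start column, then append "asd"
df = df.withColumn("start",
        concat(when(col("start").equalTo("text"), lit("new"))
                   .otherwise(col("start")),
               lit("asd")));

Because this stays entirely inside the column expression API, no lambda, Encoder, or RDD conversion is needed, and the result is still a DataFrame.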



Source: https://stackoverflow.com/questions/42561084/trying-to-use-map-on-a-spark-dataframe
