I need to define custom methods on DataFrame. What is the best way to do it? The solution should be scalable, as I intend to define a significant number of custom methods.
There is a slightly simpler approach: just declare MyClass as implicit
implicit class MyClass(df: DataFrame) { def myMethod = ... }
This automatically creates the implicit conversion method (also called MyClass). You can also make it a value class by adding extends AnyVal (the constructor parameter must then be a val, as in implicit class MyClass(val df: DataFrame) extends AnyVal). This avoids some overhead by not actually creating a MyClass instance at runtime, but it is very unlikely to matter in practice.
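A minimal sketch of the value-class variant, assuming spark-sql is on the classpath. The names DataFrameSyntax and selectColumn are hypothetical; the implicit class sits inside an object because implicit classes cannot be declared at the top level.

```scala
import org.apache.spark.sql.DataFrame

// Hypothetical object name. A value class must be top-level or a member
// of a statically accessible object (not of a class or trait).
object DataFrameSyntax {
  // `extends AnyVal` requires the single constructor parameter to be a `val`.
  implicit class MyClass(val df: DataFrame) extends AnyVal {
    // Hypothetical example method: keep only the named column.
    def selectColumn(name: String): DataFrame = df.select(df(name))
  }
}
```

After import DataFrameSyntax._, a call such as df.selectColumn("a") usually compiles to a static call on the underlying DataFrame rather than allocating a MyClass wrapper.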
Finally, putting MyClass into a package object lets you use the new methods anywhere in that package without importing MyClass, which may be a benefit or a drawback for you.
Your way is the way to go (see [1]). Even though I solved it a little differently, the approach stays similar:
import org.apache.spark.sql.DataFrame
import scala.language.implicitConversions

object ExtraDataFrameOperations {
  object implicits {
    // explicit return type, as recommended for implicit conversions
    implicit def dFWithExtraOperations(df: DataFrame): DFWithExtraOperations = DFWithExtraOperations(df)
  }
}

case class DFWithExtraOperations(df: DataFrame) {
  def customMethod(param: String): DataFrame = {
    // do something fancy with the df
    // or delegate to some implementation
    //
    // here, just as an illustrative example: do a select
    df.select(df(param))
  }
}
To use the new customMethod method on a DataFrame:
import ExtraDataFrameOperations.implicits._
val df = ...
val otherDF = df.customMethod("hello")
Instead of using an implicit method (see above), you can also use an implicit class:
import org.apache.spark.sql.DataFrame

object ExtraDataFrameOperations {
  implicit class DFWithExtraOperations(df: DataFrame) {
    def customMethod(param: String): DataFrame = {
      // do something fancy with the df
      // or delegate to some implementation
      //
      // here, just as an illustrative example: do a select
      df.select(df(param))
    }
  }
}
import ExtraDataFrameOperations._
val df = ...
val otherDF = df.customMethod("hello")
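The comment "delegate to some implementation" can be taken one step further, which may help once there are many methods: keep the logic in plain functions and let the implicit class only forward to them. This is a sketch, not part of the original answer; DataFrameFunctions and customMethodT are hypothetical names, and spark-sql is assumed to be on the classpath.

```scala
import org.apache.spark.sql.DataFrame

// Hypothetical home for the real logic: plain functions that are easy to
// unit test without any implicit machinery.
object DataFrameFunctions {
  def customMethod(df: DataFrame, param: String): DataFrame = df.select(df(param))

  // Curried form, so it can be passed to Dataset.transform.
  def customMethodT(param: String)(df: DataFrame): DataFrame = customMethod(df, param)
}

object ExtraDataFrameOperations {
  implicit class DFWithExtraOperations(df: DataFrame) {
    // The extension method only forwards to the plain function.
    def customMethod(param: String): DataFrame = DataFrameFunctions.customMethod(df, param)
  }
}
```

With import ExtraDataFrameOperations._ the call df.customMethod("hello") works as before, and without any import the same logic is available as df.transform(DataFrameFunctions.customMethodT("hello")).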
In case you want to avoid the additional import, turn the object ExtraDataFrameOperations into a package object (named after that package) and store it in a file called package.scala within your package.
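A sketch of the package-object variant, assuming a hypothetical package com.example.myapp and spark-sql on the classpath. In a real project the first part would live on its own in com/example/myapp/package.scala; here braced package clauses put both parts in one file so the sketch compiles as-is. The package object is called myapp because it must carry the package's name.

```scala
// Normally the contents of com/example/myapp/package.scala.
package com.example {

  import org.apache.spark.sql.DataFrame

  package object myapp {
    implicit class DFWithExtraOperations(df: DataFrame) {
      def customMethod(param: String): DataFrame = df.select(df(param))
    }
  }
}

// Any other file in the same (hypothetical) package.
package com.example.myapp {

  import org.apache.spark.sql.DataFrame

  object Usage {
    // No import needed: members of the package object are in scope
    // everywhere inside com.example.myapp.
    def run(df: DataFrame): DataFrame = df.customMethod("hello")
  }
}
```

Code outside com.example.myapp still needs import com.example.myapp._ to see the methods.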
[1] The original blog post "Pimp my library" by M. Odersky is available at http://www.artima.com/weblogs/viewpost.jsp?thread=179766
I think you should add an implicit conversion between DataFrame and your custom wrapper, but use an implicit class; this should be the easiest to use, and you will keep your custom methods in one common place.
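import org.apache.spark.sql.DataFrame

// Implicit classes cannot be top-level; the enclosing object name is hypothetical.
object DataFrameImplicits {
  implicit class WrappedDataFrame(val df: DataFrame) {
    // Scala syntax: name first, then type (not Java's "String arg1")
    def customMethod(arg1: String, arg2: Int): Unit = {
      // ...[do your stuff here]
    }
    // ...[other methods you consider useful, getters, setters, whatever]...
  }
}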
If the implicit wrapper is in scope (for example, imported into the file where you call it), you can use a normal DataFrame as if it were your wrapper, i.e.:
df.customMethod("test", 100)
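Since the question mentions a significant number of methods, one way to keep a single implicit class from growing unwieldy is to split the methods into several groups and mix them into one object, so callers still need only one import. This is a sketch under that assumption; the trait, class, object, and method names are all hypothetical, and spark-sql is assumed to be on the classpath.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, lit}

// Hypothetical group 1: methods that add or change columns.
trait ColumnOps {
  implicit class ColumnOpsSyntax(df: DataFrame) {
    def withConstant(name: String, value: Int): DataFrame = df.withColumn(name, lit(value))
  }
}

// Hypothetical group 2: methods that filter rows.
trait FilterOps {
  implicit class FilterOpsSyntax(df: DataFrame) {
    def nonNull(name: String): DataFrame = df.filter(col(name).isNotNull)
  }
}

// One object mixes in every group; `import DataFrameExtensions._`
// brings all of the implicit classes into scope at once.
object DataFrameExtensions extends ColumnOps with FilterOps
```

Each group can live in its own file. One caveat of this design: if two groups define a method with the same name, calls to it become ambiguous, so names must stay unique across groups.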