What is the best way to define custom methods on a DataFrame?

感情败类 2020-12-13 21:51

I need to define custom methods on a DataFrame. What is the best way to do it? The solution should be scalable, as I intend to define a significant number of custom methods.

3 Answers
  • 2020-12-13 22:15

    There is a slightly simpler approach: just declare MyClass itself as an implicit class:

    implicit class MyClass(df: DataFrame) { def myMethod = ... }
    

    This automatically creates the implicit conversion method (also called MyClass). You can also make it a value class by adding extends AnyVal (the constructor parameter must then be a val, as in implicit class MyClass(val df: DataFrame) extends AnyVal). This avoids some overhead because, in most cases, no MyClass instance is actually created at runtime, but it is very unlikely to matter in practice.
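    A minimal sketch of the points above that runs without Spark: Frame is a hypothetical stand-in for DataFrame, and columnCount is a made-up method. It also shows that the compiler-generated conversion RichFrame can be called explicitly, like any other method.

```scala
// Hypothetical stand-in for Spark's DataFrame, so this sketch runs without Spark.
final case class Frame(columns: Map[String, Vector[Int]])

object FrameSyntax {
  // Declaring the class itself as implicit: the compiler also generates
  // `implicit def RichFrame(frame: Frame): RichFrame`.
  // Appending `extends AnyVal` (the parameter is already a val) would make
  // RichFrame a value class; that is only allowed if FrameSyntax is top-level.
  implicit class RichFrame(val frame: Frame) {
    // hypothetical example method: number of columns
    def columnCount: Int = frame.columns.size
  }
}

import FrameSyntax._

val f = Frame(Map("a" -> Vector(1, 2), "b" -> Vector(3, 4)))
val viaExtension = f.columnCount              // compiler rewrites this to RichFrame(f).columnCount
val viaConversion = RichFrame(f).columnCount  // the generated conversion method, called explicitly
```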

    Finally, putting MyClass into a package object lets you use the new methods anywhere in that package without importing MyClass, which may be a benefit or a drawback for you.

  • 2020-12-13 22:20

    Your way is the way to go (see [1]). Even though I solved it a little differently, the approach stays similar:

    Possibility 1

    Implicits

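    import org.apache.spark.sql.DataFrame
    import scala.language.implicitConversions // silences the feature warning for implicit def conversions

    object ExtraDataFrameOperations {
      object implicits {
        // an explicit result type is recommended for implicit conversions
        implicit def dFWithExtraOperations(df: DataFrame): DFWithExtraOperations =
          DFWithExtraOperations(df)
      }
    }

    case class DFWithExtraOperations(df: DataFrame) {
      def customMethod(param: String): DataFrame = {
        // do something fancy with the df
        // or delegate to some implementation
        //
        // here, just as an illustrating example: select the column named by param
        df.select(df(param))
      }
    }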
    

    Usage

    To use the new customMethod method on a DataFrame:

    import ExtraDataFrameOperations.implicits._
    val df = ...
    val otherDF = df.customMethod("hello")
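    Because customMethod returns a DataFrame, methods written this way can be chained: the implicit conversion is simply applied again before each call. The sketch below follows Possibility 1 but assumes a hypothetical Frame case class in place of Spark's DataFrame (so it runs without Spark); incremented is a made-up second method.

```scala
import scala.language.implicitConversions

// Hypothetical stand-in for Spark's DataFrame (not the Spark API).
final case class Frame(columns: Map[String, Vector[Int]]) {
  def select(names: String*): Frame =
    Frame(columns.filter { case (name, _) => names.contains(name) })
}

// Wrapper holding the custom methods, as in Possibility 1.
final case class FrameWithExtraOperations(frame: Frame) {
  // Keep only the named column (mirrors the df.select(df(param)) example).
  def customMethod(param: String): Frame = frame.select(param)

  // Hypothetical second method: add 1 to every value of one column.
  def incremented(column: String): Frame =
    Frame(frame.columns.transform((name, values) => if (name == column) values.map(_ + 1) else values))
}

object ExtraFrameOperations {
  object implicits {
    implicit def frameWithExtraOperations(frame: Frame): FrameWithExtraOperations =
      FrameWithExtraOperations(frame)
  }
}

import ExtraFrameOperations.implicits._

val frame = Frame(Map("hello" -> Vector(1, 2), "other" -> Vector(5)))
// Each call returns a Frame, which is converted again for the next call.
val result = frame.incremented("hello").customMethod("hello")
```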
    

    Possibility 2

    Instead of using an implicit method (see above), you can also use an implicit class:

    Implicit class

    import org.apache.spark.sql.DataFrame

    object ExtraDataFrameOperations {
      implicit class DFWithExtraOperations(df: DataFrame) {
        def customMethod(param: String): DataFrame = {
          // do something fancy with the df
          // or delegate to some implementation
          //
          // here, just as an illustrating example: select the column named by param
          df.select(df(param))
        }
      }
    }
    

    Usage

    import ExtraDataFrameOperations._
    val df = ...
    val otherDF = df.customMethod("hello")
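    The question asks for an approach that scales to many methods. One way to organize that with implicit classes (a sketch, not taken from the answers) is to group related methods into several implicit classes inside the same object, so a single import still brings all of them into scope. Frame and all method names here are hypothetical, chosen so the sketch runs without Spark.

```scala
// Hypothetical stand-in for Spark's DataFrame (not the Spark API).
final case class Frame(columns: Map[String, Vector[Int]])

object ExtraFrameOperations {
  // Column-related helpers (hypothetical examples).
  implicit class ColumnOps(frame: Frame) {
    def columnNames: Set[String] = frame.columns.keySet
    def hasColumn(name: String): Boolean = frame.columns.contains(name)
  }

  // Aggregation helpers (hypothetical examples).
  implicit class StatsOps(frame: Frame) {
    def columnSum(name: String): Int = frame.columns.getOrElse(name, Vector.empty).sum
  }
}

// One import brings every group of methods into scope.
import ExtraFrameOperations._

val frame = Frame(Map("a" -> Vector(1, 2, 3), "b" -> Vector(10)))
```

    Since each method name exists in only one of the classes, the compiler picks the matching conversion for every call; defining the same method name in two of them would make such calls ambiguous.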
    

    Remark

    If you want to avoid the additional import, turn the object ExtraDataFrameOperations into a package object (which takes the name of its package rather than ExtraDataFrameOperations) and store it in a file called package.scala within your package.

    Official documentation / references

    [1] The original blog post "Pimp my Library" by M. Odersky is available at http://www.artima.com/weblogs/viewpost.jsp?thread=179766

  • 2020-12-13 22:23

    I think you should add an implicit conversion from DataFrame to your custom wrapper, but use an implicit class: this should be the easiest to use, and it keeps all your custom methods in one common place.

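    import org.apache.spark.sql.DataFrame

    // an implicit class cannot be top-level: put it inside an object, class or package object
    implicit class WrappedDataFrame(val df: DataFrame) {
      def customMethod(arg1: String, arg2: Int): Unit = {
        ??? // [do your stuff here]
      }
      // ...[other methods you consider useful, getters, setters, whatever]...
    }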
    

    If the implicit wrapper is in scope where you use the DataFrame (for example, imported from the object that defines it), you can call its methods on a normal DataFrame as if it were your wrapper, i.e.:

    df.customMethod("test", 100)
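    A runnable sketch of the corrected wrapper, using a hypothetical Frame stand-in instead of Spark's DataFrame. Here customMethod returns a Frame, and its behaviour (keep column arg1, truncated to its first arg2 values) is made up for illustration. The import at the call site is what puts the wrapper in scope.

```scala
// Hypothetical stand-in for Spark's DataFrame (not the Spark API).
final case class Frame(columns: Map[String, Vector[Int]])

object FrameWrappers {
  // Implicit classes cannot be top-level, so this one lives inside an object.
  implicit class WrappedFrame(val frame: Frame) {
    // Hypothetical semantics: keep only column arg1, truncated to its first arg2 values.
    def customMethod(arg1: String, arg2: Int): Frame =
      Frame(
        frame.columns
          .filter { case (name, _) => name == arg1 }
          .transform((_, values) => values.take(arg2))
      )
  }
}

// Bringing the wrapper into scope lets a plain Frame use its methods.
import FrameWrappers._

val frame = Frame(Map("test" -> Vector(1, 2, 3), "other" -> Vector(4)))
val result = frame.customMethod("test", 2)
```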
