Can Spark and the ScalaNLP library Breeze be used together?

 ̄綄美尐妖づ 提交于 2019-12-08 07:52:27

问题


I'm developing a Scala-based extreme learning machine, in Apache Spark. My model has to be a Spark Estimator and use the Spark framework in order to fit into the machine learning pipeline. Does anyone know if Breeze can be used in tandem with Spark? All of my data is in Spark data frames and conceivably I could import it using Breeze, use Breeze DenseVectors as the data structure then convert to a DataFrame for the Estimator part. The advantage of Breeze is that it has a function pinv for the Moore-Penrose pseudo-inverse, which is an inverse for a non-square matrix. There is no equivalent function in the Spark MLlib, as far as I can see. I have no idea whether it's possible to convert Breeze tensors to Spark DataFrames so if anyone has experience of this it would be really useful. Thanks!


回答1:


  • Breeze can be used with Spark. In fact is used internally for many MLLib functions, but required conversions are not exposed as public. You can add your own conversions and use Breeze to process individual records.

    For example for Vectors you can find conversion code:

    • SparseVector.asBreeze
    • DenseVector.asBreeze
    • Vector.fromBreeze

    For Matrices please see asBreeze / fromBreeze in Matrices.scala

  • It cannot however, be used on distributed data structures. Breeze objects use low level libraries, which cannot be used for distributed processing. Therefore DataFrame - Breeze objects conversions are possible only if you collect data to the driver and are limited to the scenarios where data can be stored in the driver memory.

  • There exist other libraries, like SysteML, which integrate with Spark and provide more comprehensive linear algebra routines on distributed objects.



来源:https://stackoverflow.com/questions/45898093/can-spark-and-the-scalanlp-library-breeze-be-used-together

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!