I am using IBM Watson Studio (Default spark python environment) and trying to convert a Keras model to systemml DML and train it on Spark.
!pip install systemml
import systemml
This executes just fine. But this:
from systemml import mllearn
throws SyntaxError: import * only allowed at module level
Also, dir(systemml) doesn't show mllearn.
I tried to install it from http://www.romeokienzler.com/systemml-1.0.0-SNAPSHOT-python.tar.gz and https://sparktc.ibmcloud.com/repo/latest/systemml-1.0.0-SNAPSHOT-python.tar.gz and a git clone but was unsuccessful. What am I doing wrong?
You need to call dir(systemml.mllearn) to see the mllearn functions.
>>> dir(systemml.mllearn)
['Caffe2DML', 'Keras2DML', 'LinearRegression', 'LogisticRegression',
'NaiveBayes', 'SVM', '__all__', '__builtins__', '__doc__', '__file__',
'__name__', '__package__', '__path__', 'estimators']
Please install SystemML 1.2 from pypi.org. 1.2 is the latest release from Aug. 2018. Release 1.0 only had experimental support.
Can you please try to only import MLContext, just to see whether loading the main SystemML jar file works, and what version your installation uses?
>>> from systemml import MLContext
>>> ml = MLContext(sc)
Welcome to Apache SystemML!
Version 1.2.0
>>> print (ml.buildTime())
2018-08-17 05:58:31 UTC
>>> from sklearn import datasets, neighbors
>>> from systemml.mllearn import LogisticRegression
>>> digits = datasets.load_digits()
>>> X_digits = digits.data
>>> y_digits = digits.target
>>> n_samples = len(X_digits)
>>> X_train = X_digits[:int(.9 * n_samples)]
>>> y_train = y_digits[:int(.9 * n_samples)]
>>> X_test = X_digits[int(.9 * n_samples):]
>>> y_test = y_digits[int(.9 * n_samples):]
>>> logistic = LogisticRegression(spark)
>>> print('LogisticRegression score: %f' % logistic.fit(X_train, y_train).score(X_test, y_test))
18/10/20 00:15:52 WARN BaseSystemMLEstimatorOrModel: SystemML local memory budget:5097 mb. Approximate free memory available on the driver JVM:416 mb.
18/10/20 00:15:52 WARN StatementBlock: WARNING: [line 81:0] -> maxinneriter -- Variable maxinneriter defined with different value type in if and else clause.
18/10/20 00:15:53 WARN SparkExecutionContext: Configuration parameter spark.driver.maxResultSize set to 1 GB. You can set it through Spark default configuration setting either to 0 (unlimited) or to available memory budget of size 4 GB.
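If you only want to know whether the installed package meets the 1.2 minimum mentioned above, without starting a Spark context, something along these lines may help. It is a rough sketch, not part of SystemML: the helper names version_tuple, is_at_least and installed_version are made up for illustration, and the lookup assumes importlib.metadata is available (Python 3.8+), returning None otherwise.

```python
import re


def version_tuple(version):
    """Turn '1.2.0' or '1.0.0-SNAPSHOT' into a tuple of ints, e.g. (1, 2, 0)."""
    numeric = re.match(r"\d+(?:\.\d+)*", version)
    if numeric is None:
        raise ValueError("no leading version number in %r" % version)
    return tuple(int(part) for part in numeric.group(0).split("."))


def is_at_least(version, minimum="1.2"):
    """Compare only the numeric part; a '-SNAPSHOT' suffix is ignored."""
    return version_tuple(version) >= version_tuple(minimum)


def installed_version(package="systemml"):
    """Return the installed version string, or None if it cannot be determined."""
    try:
        # Assumption: Python 3.8+; older kernels fall through to None.
        from importlib.metadata import PackageNotFoundError, version
    except ImportError:
        return None
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("systemml version could not be determined")
    else:
        print(found, "meets the 1.2 minimum:", is_at_least(found))
```

Because the suffix is ignored, a 1.3.0-SNAPSHOT build counts as newer than 1.2 here, which matches how the snapshot is used later in this thread.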
The code works with the Python 2.7 kernel, but not with the Python 3.5 kernel. The commit https://github.com/apache/systemml/commit/9e7ee19a45102f7cbb37507da25b1ba0641868fd fixes the issue for Python 3.5. If you want to fix the older released version in your local environment, please follow two steps:
A. Fix for the indentation requirement of Python 3.5:
pip install autopep8
find /<location>/systemml/ -name '*.py' | xargs autopep8 --in-place --aggressive
find /<location>/systemml/mllearn/ -name '*.py' | xargs autopep8 --in-place --aggressive
You can find the <location> using pip show systemml.
B. Fix for the stricter Python 3.5 syntax: in mllearn/estimator.py, replace the line
from .keras2caffe import *
with
import keras
from .keras2caffe import convertKerasToCaffeNetwork, convertKerasToCaffeSolver, convertKerasToSystemMLModel
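To see why step B is needed, here is a small self-contained illustration. It does not touch SystemML: it compiles two throwaway snippets that use os.path as a stand-in module, and the names star_source, explicit_source and compiles are invented for this example. Python 3 rejects a star import inside a function at compile time, while the explicit form is accepted.

```python
# A star import inside a function body, as in the older estimator.py.
star_source = '''
def build():
    from os.path import *
    return join("a", "b")
'''

# The same function with the names imported explicitly.
explicit_source = '''
def build():
    from os.path import join
    return join("a", "b")
'''


def compiles(source):
    """Return True if Python can compile the source, False on SyntaxError."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError as err:
        print("SyntaxError:", err.msg)
        return False


star_ok = compiles(star_source)
explicit_ok = compiles(explicit_source)
print("star import inside function compiles:", star_ok)
print("explicit import inside function compiles:", explicit_ok)
```

Since the error is raised when the module is compiled, the import of systemml.mllearn fails even if the Keras code path is never called.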
The fix has already been committed, so it will ship with the next release, 1.3.0. If you do not want to wait, you can build and install the latest version yourself:
git clone https://github.com/apache/systemml.git
cd systemml
mvn package -P distribution
pip install target/systemml-1.3.0-SNAPSHOT-python.tar.gz
Finally, this works if you are using an IBM Cloud notebook:
! pip install --upgrade https://github.com/niketanpansare/future_of_data/raw/master/systemml-1.3.0-SNAPSHOT-python.tar.gz
!ln -s -f /home/spark/shared/user-libs/python3/systemml/systemml-java/systemml-1.3.0-SNAPSHOT-extra.jar ~/user-libs/spark2/systemml-1.3.0-SNAPSHOT-extra.jar
!ln -s -f /home/spark/shared/user-libs/python3/systemml/systemml-java/systemml-1.3.0-SNAPSHOT.jar ~/user-libs/spark2/systemml-1.3.0-SNAPSHOT.jar