I have some interesting user data. It gives some information on the timeliness of certain tasks the users were asked to perform. I am trying to find out, if late
As a heads up for the future, you'll generally get faster (and better) responses if you provide a publicly available dataset with your attempted plotting code, since we don't have 'April.csv'. You can also leave out your data-wrangling code for 'April.csv'. With that said...
Sebastian Raschka created the mlxtend package, which has a pretty awesome plotting function for doing this. It uses matplotlib under the hood.
import numpy as np
import pandas as pd
from sklearn import svm
from mlxtend.plotting import plot_decision_regions
import matplotlib.pyplot as plt
# Create arbitrary dataset for example
df = pd.DataFrame({'Planned_End': np.random.uniform(low=-5, high=5, size=50),
                   'Actual_End': np.random.uniform(low=-1, high=1, size=50),
                   # np.random.random_integers is deprecated; randint excludes
                   # `high`, so high=3 still yields the labels 0, 1 and 2
                   'Late': np.random.randint(low=0, high=3, size=50)}
                  )
# Fit Support Vector Machine Classifier
X = df[['Planned_End', 'Actual_End']]
y = df['Late']
clf = svm.SVC(decision_function_shape='ovo')
clf.fit(X.values, y.values)
# Plot Decision Region using mlxtend's awesome plotting function
plot_decision_regions(X=X.values,
                      y=y.values,
                      clf=clf,
                      legend=2)
# Update plot object with X/Y axis labels and Figure Title
plt.xlabel(X.columns[0], size=14)
plt.ylabel(X.columns[1], size=14)
plt.title('SVM Decision Region Boundary', size=16)
# Display the figure (needed when running as a script rather than in a notebook)
plt.show()
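If you'd rather not add a dependency, the usual manual approach is to predict over a dense grid of points and hand the result to matplotlib's contourf. Here's a rough sketch of that idea. To be clear, this is not mlxtend's actual implementation, and `decision_grid`, `margin`, `step`, and `toy_predict` are names I made up for illustration.

```python
import numpy as np


def decision_grid(predict, X, margin=1.0, step=0.05):
    """Evaluate `predict` on a dense grid covering the 2-D points in X.

    `predict` is any callable mapping an (n, 2) array to n class labels,
    e.g. the `predict` method of a fitted classifier.
    """
    # Pad the data range a little so points don't sit on the plot edge
    x_min, x_max = X[:, 0].min() - margin, X[:, 0].max() + margin
    y_min, y_max = X[:, 1].min() - margin, X[:, 1].max() + margin
    xx, yy = np.meshgrid(np.arange(x_min, x_max, step),
                         np.arange(y_min, y_max, step))
    # Flatten the grid into (n, 2) points, predict, then restore grid shape
    grid_points = np.c_[xx.ravel(), yy.ravel()]
    zz = np.asarray(predict(grid_points)).reshape(xx.shape)
    return xx, yy, zz


# Hypothetical stand-in "classifier" so the sketch runs without scikit-learn:
# label 1 where the first feature is positive, else 0.
def toy_predict(points):
    return (points[:, 0] > 0).astype(int)


X_demo = np.array([[-2.0, -0.5], [1.5, 0.3], [3.0, -0.8], [-4.0, 0.9]])
xx, yy, zz = decision_grid(toy_predict, X_demo)
```

With the fitted SVC from the example above, you would call `decision_grid(clf.predict, X.values)` instead, then draw the regions with `plt.contourf(xx, yy, zz, alpha=0.3)` and overlay the data with `plt.scatter(X.values[:, 0], X.values[:, 1], c=y.values)`. A smaller `step` gives smoother boundaries at the cost of more predictions.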