How to load CSV data in scikit-learn and use it for Naive Bayes classification


I am trying to load custom data to perform Naive Bayes (NB) classification in scikit-learn. I need help loading the sample data into scikit-learn and then running NB on it. How to load categorical values for targe


1 Answer

生来不讨喜

2021-02-10 10:14

The following should get you started; you will need pandas and numpy. You can load your .csv file into a data frame and pass that to the model. You also need to define targets (0 for negatives and 1 for positives, assuming binary classification), depending on what you are trying to separate.

from sklearn.naive_bayes import GaussianNB
import pandas as pd
import numpy as np

# create a data frame containing your data; each column can be accessed
# by df['column name']
df = pd.read_csv('/your/path/yourFile.csv')

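# class names indexed by target code: 0 = Negatives, 1 = Positives
target_names = np.array(['Negatives', 'Positives'])

# targets needs one integer code (0 or 1) per row; 'label' is a placeholder
# for the column that holds the class in your file (keep it out of the features)
targets = df['label'].to_numpy().astype(int)

# add columns to your data frame
df['is_train'] = np.random.uniform(0, 1, len(df)) <= 0.75
# pd.Factor no longer exists in pandas; Categorical.from_codes does the same job
df['Type'] = pd.Categorical.from_codes(targets, categories=target_names)
df['Targets'] = targets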

# define training and test sets
train = df[df['is_train']]
test = df[~df['is_train']]

trainTargets = np.array(train['Targets']).astype(int)
testTargets = np.array(test['Targets']).astype(int)

# columns you want to model
features = df.columns[0:7]

# create a Gaussian Naive Bayes classifier with default parameters
gnb = GaussianNB()

# train the model, then predict on the same training rows
y_gnb = gnb.fit(train[features], trainTargets).predict(train[features])
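
The question also asks about categorical target values, and the snippet above defines a test set without using it. Below is a minimal sketch of one way to handle both. It is not part of the original answer. The inline CSV text, the column names f1, f2 and label, and the use of pd.factorize and train_test_split are all assumptions made for illustration; swap in your own file and columns.

```python
import io

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Made-up stand-in for your CSV file; the column names are hypothetical.
csv_text = """f1,f2,label
1.0,2.1,neg
1.2,1.9,neg
0.9,2.0,neg
1.1,2.2,neg
5.0,6.1,pos
5.2,5.9,pos
4.9,6.0,pos
5.1,6.2,pos
"""
df = pd.read_csv(io.StringIO(csv_text))

# Turn the string labels into integer codes (sorted, so neg -> 0, pos -> 1).
# class_names[code] maps a code back to its label.
codes, class_names = pd.factorize(df["label"], sort=True)

# Feature columns only; the label column stays out of X.
X = df[["f1", "f2"]]

# Hold out 25% of the rows, keeping both classes present in each split.
X_train, X_test, y_train, y_test = train_test_split(
    X, codes, test_size=0.25, stratify=codes, random_state=0
)

# Fit on the training rows, then evaluate on the held-out rows.
gnb = GaussianNB().fit(X_train, y_train)
predicted = gnb.predict(X_test)
accuracy = gnb.score(X_test, y_test)

# Map the predicted integer codes back to the original label strings.
predicted_names = class_names[predicted]
print(accuracy, list(predicted_names))
```

Scoring on the held-out rows gives a fairer estimate than predicting on the training rows, since the model has not seen them.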
